Fun with Azure DB

This week I've been having fun with Azure DB.  For those who haven't experienced Microsoft's cloudy SQL offering, this is a DaaS (Database-as-a-Service) where one can fire up a SQL Server database instance in the cloud and connect to it from anywhere, using any compatible client.

How does this differ from a normal VM?  Well, this is just the database instance - so, you can connect to it using SSMS or the tool of your choice.  However, there's no underlying OS, no file structures, no Windows authentication.

This means a few differences with your traditional on-tin database.  For one, the performance you receive is subject to the number of DTUs - Database Transactional Units - that you purchase as part of your package.  Each Azure DB instance will operate up to a maximum number of simultaneous DTUs.  This means you can, under heavy workload, start running into performance problems and mysterious wait types, like the LOG_RATE_GOVERNOR wait which essentially means - buy more DTUs.

As part of some academic work I'm doing, I was using Azure DB this week to demonstrate that a particular query ran better on a de-normalised schema than in a normalised one.  Using Azure DB to do this was painful.  I found performance so variable - the same query, with no changes, would run to different durations ranging from 8 seconds to over 40 seconds.  When I checked the wait types, these were either the aforementioned or SOS_SCHEDULER_YIELD.  Checking Azure stats showed I kept hitting my limit of 20 DTUs:

Bear in mind I'm purchasing a reasonably small Standard setup costing around £25/month, so I accept I'm not buying a Cray supercomputer.  But I was a little disappointed to watch Azure behaving like a potato with two wires sticking out of the top, when I was asking it to, as I recall, query over a single column in a table of around 6m rows (around 200MB of actual data) and group by count.  Even when I added an index this took an unacceptably long time.  The wait types were mostly CPU and log quota bound.

Alas, this was a pattern that repeated itself.  Performance was so variable that I found it frustrating to use.  And there's a couple of unwanted side effects, too - like the propensity of the Azure connection to reconnect at random intervals and assign me a new connection SPID, confusing even SSMS:

Oh, and did I mention that backing up your Azure DB to a 'normal' backup file is virtually impossible?  Meaning that to get your data out, you must use SSIS or export it as a .BACPAC file - great, providing you have no heaps.  You don't have heaps, do you?

In conclusion, I would recommend using this with caution.  It's probably useful for a certain group of use-cases, but fails in providing some of the basic functionality we have gotten used to - the ability to interact with the OS, the use of SQL Agent as a scheduling engine - in fact, you can check out further limitations here, here and here.  Personally, I'll be sticking to the real version.


Popular Posts