Tuesday, July 21, 2009

SQL Azure 10GB Limit: Implications & Workarounds

I’ve taken my first deep look at Azure to answer some specific questions that are covered in detail below. My overall impression: Azure (and cloud computing in general) is a powerful paradigm shift for our industry, with the end-result being more emphasis on pure software engineering and less emphasis on operations. Solution providers, consumers, and businesses not in the infrastructure outsourcing field all stand to benefit under this re-alignment – and we should all embrace it. I’m personally very excited to start delivering applications under this new mindset; I think concerns about the longevity of the initiative (or Microsoft’s dedication to it) are misplaced.

As a permanent fixture, Azure represents the end of the “keeping the lights on” mentality that has plagued IT:

“When IT stops being an enabler and simply acts as a cost center or a "necessary evil", then entertaining new projects that leverage one of Gartner Top 10 strategic technologies is unrealistic and doomed for failure or hardship at a minimum. Why? Because this mentality is reactive and sometimes even defensive. This mentality also does not encourage investing in the future whether that is in training employees, establishing and investing in architecture, or addressing problems with long term solutions. Instead, people are rewarded for quick fix fire fighting heroics and at the end of the day it is the user community that suffers. The users get band-aids on top of already outdated systems and are often forced to work in ways in which the technology and systems dictate, instead of the other way around.

It gets worse for the employees in an IT department in charge of "keeping the lights on". Their resumes become stale as their skills are not updated to reflect the newer technologies that innovative companies seek. The reactive nature of the culture can squash innovation and people who have valid solutions may not bring them forward because it would require more than a quick fix. In the end these types of shops become like an assembly line in nature where people clock in, work their shift, and clock out. This is not what most of us envisioned when we enthusiastically enrolled in IT related curriculums in college back in the day.”

Whether it takes 2, 5, or 10 years to permeate isn’t important, but Microsoft’s expectation that 50% of its server business will move online in the next 5 years is telling.

Before delving into the specific questions, I would highly recommend the presentation by Brandon Watson on the state and drivers behind cloud computing [13]; it paints an interesting picture of Amazon’s seasonality and Google’s ad dependency as important factors in their desire to offload infrastructure costs. Microsoft likewise developed its strategy out of necessity rather than choice, which further underscores its importance. (Microsoft’s Business Productivity Online Suite isn’t yet running on Azure, but as adoption increases, so too will the need for on-demand scale.)

Implications and workarounds for the 10GB database limit

The workarounds, for the time being, boil down to sharding (a term popularized by Google engineers for a method of horizontal partitioning), integration with on-premise SQL, or better leverage of Azure Storage. There’s a great article that captures some interesting discussion around this limit [11] and the ramifications for backups; however, the Azure FAQ [14] seems to suggest that backups-in-the-cloud are included. That most TSQL constructs are supported, with a few obvious exceptions [4], is certainly good news. But the 10GB limit is nonetheless a disappointment in the v1 announcement – it is by far the largest barrier to adoption, particularly for employee-facing applications that collaborate with several internal systems. As a colleague of mine pointed out: being able to connect to on-premise SQL is one thing, but the effort involved in matching the performance of co-located servers, particularly with chatty applications, is another. For self-contained, consumer-facing applications, migration is much more realistic. Remember, storage outside of SQL doesn’t have an upper limit, capacity-wise, but its per-transaction pricing does offer less cost certainty [1]. If you don’t have existing database assets to leverage, you may opt to build your data model on Azure Storage directly, but you would be sacrificing portability and toolset in the process.
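To make the sharding option concrete, here is a minimal sketch of hash-based routing across several capped databases. Everything in it is an assumption for illustration: the "appdata" name prefix, the 80% fill target, and the choice of a customer ID as the shard key are hypothetical, and nothing here is an Azure API.

```python
import hashlib
import math

SHARD_CAPACITY_GB = 10  # the per-database cap discussed above


def shards_needed(total_gb, headroom=0.8):
    """Databases required if each is filled to at most `headroom` of the cap.

    The 80% default is an arbitrary safety margin, not a platform rule.
    """
    usable_gb = SHARD_CAPACITY_GB * headroom
    return max(1, math.ceil(total_gb / usable_gb))


def shard_for(key, shard_count):
    """Map a shard key (e.g. a customer ID) to a stable shard index."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def database_name(key, shard_count, prefix="appdata"):
    """Hypothetical naming scheme: appdata_00, appdata_01, ..."""
    return f"{prefix}_{shard_for(key, shard_count):02d}"


if __name__ == "__main__":
    count = shards_needed(35)  # 35GB of data at 8GB usable per shard
    print(count, database_name("customer-1042", count))
```

The trade-offs are the usual ones for this approach: queries that span shard keys have to be fanned out and merged by the application, and with plain modulo hashing, changing the shard count moves most keys to a different database.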

On the ability to integrate with on-premise SQL [18]:

“In particular, Nigel did a really nice demo on the compatibility SDS v1 will have with on-premise SQL and how that will make porting existing applications to a cloud environment really easy.  A very strong message particularly in conjunction with the ability to run web and worker processes in Windows Azure Compute.  Together, these capabilities really provide a great foundation for delivering next generation services based applications.”

How does fault tolerance work, and are you charged for separate Web Role instances required?

Whether separately-billed instances of the Web Role (required to satisfy the 99.95% requirement in the SLA) are needed for high availability is a bit unclear [9]. It seems, on the surface, that a single role can satisfy the SLA requirements through configuration alone, without additional fees [17]:

“The role definition also includes the number of instances that the fabric should deploy, either a specific number or a lower and upper bound. Finally, constraints may be specified, including how many role instances may run in the same node, whether instances of different roles should be co-located, and how to allocate instances across update domains and fault domains.

Update domains and fault domains are optional but useful features. Update domains are used to partition the service during upgrades. A role may specify the number of update domains its instances should be deployed in.

During rolling OS upgrades and service updates, only one update domain will be upgraded at a time. While an update domain is being upgraded, the fabric won't route live traffic to any roles or load balancers inside it. After an update domain has been fully upgraded and all services in it report that they're healthy, the fabric returns them to service and starts on the next update domain.

Fault domains are similar to update domains. They're basically disjoint failure zones within a single datacenter. They're determined according to the datacenter topology, based on things like rack and switch configurations, which make certain classes of failures likely to affect well-defined groups of nodes. As with update domains, a role may specify the number of fault domains its instances should be deployed in.”
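The rolling upgrade described in the quoted passage can be modelled with a toy simulation. This is only a sketch: the round-robin placement and the "web_N" instance names are assumptions, the real fabric's allocation logic is not documented in the sources here, and the model says nothing about how instances are billed.

```python
def assign_update_domains(instance_count, domain_count):
    """Spread instances round-robin across update domains (assumed policy)."""
    domains = {d: [] for d in range(domain_count)}
    for i in range(instance_count):
        domains[i % domain_count].append(f"web_{i}")
    return domains


def min_in_service_during_upgrade(instance_count, domain_count):
    """Fewest instances still taking traffic while any one domain is upgraded."""
    domains = assign_update_domains(instance_count, domain_count)
    lowest = instance_count
    for members in domains.values():
        # Every instance in the domain being upgraded is out of rotation.
        in_service = instance_count - len(members)
        lowest = min(lowest, in_service)
    return lowest


if __name__ == "__main__":
    for instances, domains in [(1, 1), (2, 2), (3, 2), (4, 2)]:
        print(instances, domains, min_in_service_during_upgrade(instances, domains))
```

Under these assumptions a single instance always drops to zero in service while its domain is upgraded, whereas two instances in two domains keep one in rotation, which is consistent with the "two or more role instances" wording of the SLA quoted in [8].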

Pricing and Licensing

Note that we haven’t seen the subscription model or volume licensing just yet [1]:

“Microsoft will offer Azure services in three different ways, including a consumption (pay-as-you-go) model, a subscription model for resellers, and volume licensing to enterprises. The so-called consumption model appears to be priced comparably with the current leader in this market, Amazon.com, which offers a variety of online services, including hosted Windows and Linux services.

Microsoft will charge 12 cents per hour for computer infrastructure services, 15 cents per gigabyte for storage, and 10 cents per 10,000 storage transactions. Users of Microsoft's cloud-based database, SQL Azure, will incur monthly charges of $9.99 for the Web Edition, which supports up to 1GB databases, and $99.99 for the Business Edition, which allows up to 10GB databases. The web developer-oriented .NET Services will cost 15 cents per 100,000 message operations. Additional bandwidth charges apply across the three services as well: 10 cents per gigabyte for incoming data and 15 cents per gigabyte for outgoing data.”
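For a rough feel of what these rates add up to, here is a small estimator built only from the figures in the quoted passage. Several things are assumptions: a 730-hour month, storage billed per GB per month, and the function and parameter names themselves. The storage-transaction rate is a parameter because the quote above gives 10 cents per 10,000 while the Session section of this post cites $0.01/10K; check the pricing page [1] before relying on either.

```python
HOURS_PER_MONTH = 730  # assumption: average month length

# Monthly SQL Azure prices from the quoted passage.
SQL_EDITION_PRICES = {None: 0.0, "web": 9.99, "business": 99.99}


def monthly_cost(instances, storage_gb, storage_transactions,
                 gb_in, gb_out, sql_edition=None, price_per_10k_tx=0.10):
    """Estimate a monthly bill in dollars from the quoted consumption rates."""
    compute = instances * HOURS_PER_MONTH * 0.12   # $0.12 per instance-hour
    storage = storage_gb * 0.15                    # $0.15 per GB (assumed monthly)
    transactions = storage_transactions / 10_000 * price_per_10k_tx
    bandwidth = gb_in * 0.10 + gb_out * 0.15       # $0.10 in, $0.15 out per GB
    total = compute + storage + transactions + bandwidth
    return round(total + SQL_EDITION_PRICES[sql_edition], 2)


if __name__ == "__main__":
    # Hypothetical workload: 2 instances, 50GB storage, 1M storage
    # transactions, 10GB in, 40GB out, one Business Edition database.
    print(monthly_cost(2, 50, 1_000_000, 10, 40, sql_edition="business"))
```

For the hypothetical workload in the example, compute dominates the bill (two always-on instances come to about $175 of the total), which is worth keeping in mind when weighing the question of extra instances for availability.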

Session, Cache Data

This is a bit of a paradigm shift for the typical .NET developer: you no longer have in-process memory (a good thing for scalability) – so where do values in HttpContext.Cache and HttpContext.Session go? The SDK has providers [2] that plug into your application “seamlessly” to store these values in Azure Storage (tables and blobs) – but there must be a performance penalty associated with this, not to mention the $0.01/10K transactions and $0.15/GB cost associated with reading from and writing to this store. (I put “seamlessly” in quotes because objects crossing the wire have to be serializable, so some minor work may be required there.) Whether Velocity eventually serves this role remains to be seen; there is a lot of desire to see this happen, mostly for the richness of the API [21]. As for the local 250GB available to each web role, it appears to offer little to developers, as it’s not synchronized across multiple instances.
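The serialization caveat and the per-transaction cost can both be seen in a language-neutral sketch of an out-of-process session store. This is not the SDK's provider: the RemoteSessionStore class is hypothetical, an in-memory dict stands in for Azure Storage, and Python's pickle stands in for .NET serialization.

```python
import pickle
import threading


class RemoteSessionStore:
    """Hypothetical stand-in for a session provider backed by remote storage."""

    def __init__(self):
        self._blobs = {}       # pretend this dict lives across the wire
        self.transactions = 0  # each read or write would be a billable call

    def set(self, session_id, key, value):
        try:
            payload = pickle.dumps(value)  # values must serialize to leave the process
        except (pickle.PicklingError, TypeError, AttributeError) as exc:
            raise ValueError(f"{key!r} cannot cross the wire: {exc}") from exc
        self._blobs[(session_id, key)] = payload
        self.transactions += 1

    def get(self, session_id, key):
        self.transactions += 1
        payload = self._blobs.get((session_id, key))
        return None if payload is None else pickle.loads(payload)


if __name__ == "__main__":
    store = RemoteSessionStore()
    store.set("s1", "cart", [1, 2])
    print(store.get("s1", "cart"), store.transactions)
    try:
        store.set("s1", "lock", threading.Lock())  # not serializable
    except ValueError as err:
        print(err)
```

Plain data round-trips without changes, while anything holding a live resource (a lock here; a connection or file handle would behave similarly) has to be reworked before it can live in session state. The transaction counter is a reminder that every page touching session data adds to the storage bill.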

On-Premise Azure

There are no plans for an on-premise version of Azure (storage or otherwise), as far as I can see [20].

Exception Handling

I don’t have a good handle yet on how this is best applied. Some providers will likely emerge that channel diagnostic messages into Azure Storage where they can later be searched, grouped and presented by various ‘Storage Explorer’ like applications [6].

Azure Medium Trust Implications

No major limitations in the custom Medium Trust policy used by Azure, from what I can see [16]: external TCP connections and local writes both seem to be permitted, allowing a full range of possibilities around external integration scenarios using WebRequest, FtpWebRequest, SmtpClient [15] and the like.

Sample Deployment of AzureBright: http://www.radicallyfrugal.com

What better test-drive than to take an MVC application and deploy it to the cloud, under a custom domain?

What is AzureBright [10]?

“AzureBright Blog & Forum are two cloud-hosted applications, created for the new CloudApp() contest. Goals: 1. To have a fast, clean (html), scalable, Web 2.0 look and feel, and SEO Friendly, Blog and Forum (StackOverflow.com-like functionality) for the .net community. 2. To share development knowledge and experiences, of new challenges encountered with Windows Azure Service’s Platform.

Technologies used: Windows Azure Storage (Table, Queue, Blob), ASP.NET MVC framework (C#), AJAX using jQuery/json.

Challenges encountered: Paging over data. “Order by” and “Group by” data, Leverage Concurrency, Update/Insert/Delete Batch Transactions, Page Views, and other counters, Voting, Spam, Multiple Worker messages, Membership (manage users and roles)…”

References:

[1] – Pricing/Reselling

http://www.microsoft.com/azure/pricing.mspx
http://windowsitpro.com/article/articleid/102509/microsoft-answers-windows-7-azure-rtm-questions.html

[2] – Session/Cache Providers

http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/2d1340ed-0ad0-456a-b069-aa6b85672102
http://www.aaronlerch.com/blog/2008/11/01/run-aspnet-mvc-on-windows-azure/

[3] – Troubleshooting

http://stackoverflow.com/questions/444528/what-is-my-azure-table-storage-accountname
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/cbf60a9a-861f-475d-90c6-ba8ad3282d2f

[4] – TSQL Support in SQL Data Services

http://blogs.msdn.com/ssds/archive/2009/07/07/9823115.aspx

[5] – Queue Performance (good blog...)

http://cid-61aea8168d26ea6b.profile.live.com/
http://bstineman.spaces.live.com/blog/cns!61AEA8168D26EA6B!312.entry

[6] – Storage Explorer

http://azurestorageexplorer.codeplex.com/
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/d4e8f1be-297b-47ff-8135-c26ceafea9c2
http://davidpallmann.blogspot.com/

[7] – 10GB Limit

http://www.techcrunch.com/2009/07/14/microsofts-azure-gets-a-business-model-and-an-official-release-date/
http://www.sharepointevolved.com/blog/Lists/Categories/Category.aspx?Name=Micorosft%20Azure
http://blogs.msdn.com/windowsazure/archive/2009/07/14/confirming-commercial-availability-and-announcing-business-model.aspx

[8] – SLA

Windows Azure:

Windows Azure has separate SLA’s for compute and storage. For compute, we guarantee that when you deploy two or more role instances in different fault and upgrade domains your Internet facing roles will have external connectivity at least 99.95% of the time. Additionally, we will monitor all of your individual role instances and detect within two minutes when a role instance’s process is not running and initiate corrective action (we will publish by PDC the full details of our uptime promise for individual role instances). For storage, we guarantee that at least 99.9% of the time we will successfully process correctly formatted requests that we receive to add, update, read and delete data. We also guarantee that your storage accounts will have connectivity to our Internet gateway.

[9] – Fault & Update Domains

http://www.azureusergroup.com/profiles/blogs/fault-domains-and-upgrade

[10] – AzureBright

http://azurebright.codeplex.com/
http://www.newcloudapp.com/vote.aspx

[11] – Brent Ozar (SQL Server DBA)

http://www.brentozar.com/archive/2009/07/sql-azure-pricing-10-for-1gb-100-for-10gb/

[12] – TechCrunch Thread

http://www.techcrunch.com/2009/07/14/microsofts-azure-gets-a-business-model-and-an-official-release-date/

[13] – Excellent Presentation by Brandon Watson

http://www.manyniches.com/cloudcomputing/cloud-platforms-whats-going-on/

[14] – Azure FAQ

http://www.microsoft.com/azure/faq.mspx

[15] – Windows Azure - Sending SMTP Emails!

http://blogs.msdn.com/davidlem/archive/2009/01/08/windows-azure-sending-smtp-emails.aspx
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/ef7a0cd5-5b2c-4ab2-902b-d39cc4b69913

[16] – Windows Azure SDK Trust Policy Reference

http://msdn.microsoft.com/en-us/library/dd179369.aspx

[17] – Deep Technical Details:

http://snarfed.org/space/windows+azure+details

[18] – SQL Azure Team Blog

http://blogs.msdn.com/ssds/archive/2009/03/20/9493117.aspx

[19] – Estimation Spreadsheet:

http://blogs.codes-sources.com/redo/archive/2009/07/14/azure-services-platform-windows-azure-sql-services-net-services-calculez-vous-meme-votre-prix-d-hebergement-mensuel.aspx

[20] – On-premise Azure ruled out:

http://blogs.zdnet.com/microsoft/?p=2340

[21] – Velocity on Azure:

http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/8b4fc071-fe1e-40d2-987a-d6a95b13a34b
http://silverlightuk.blogspot.com/2008/11/windows-azure-cache-based-session.html
http://social.msdn.microsoft.com/Forums/en-US/velocity/thread/a7bbaaa7-de2a-4b24-b41f-cf30f6fb592e
http://blogs.msdn.com/davidlem/archive/2008/12/16/windows-azure-what-happens-in-the-data-center.aspx

4 comments

Anonymous said...

Hey Nariman... Do you have any success stories to share?? Any sites worth showcasing that have used either Azure or EC2 to deliver the on-demand-scalability-at-pennies-gb vision?

Nariman Haghighi said...

A great summary of what people are doing with Azure (in the UK) was just recently posted:
http://blogs.msdn.com/ukisvdev/archive/2009/08/12/people-are-doing-some-interesting-things-with-windows-azure-in-the-uk.aspx

vintana said...

Any word or insights on how to migrate data to SQL Azure?

Nariman Haghighi said...

The following posts cover the limited options you have for importing database content into SQL Azure:

On Using SSIS and ST.EXE:
http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/dd375420-bd55-425f-b400-d8dad734a13c

On importing into Table Storage:
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/7969f6c5-c74d-4b5d-a572-8a1052adf176

Some background on TDS:
http://www.azurejournal.com/2009/05/sql-data-services-your-database-in-the-cloud/

EC2 offers more attractive options around physically shipping data; expect to see advancements along these lines when the size limitations around SDS are eased.