Tuesday, July 21, 2009

SQL Azure 10GB Limit: Implications & Workarounds

I’ve taken my first deep look at Azure to answer some specific questions that are covered in detail below. My overall impression: Azure (and cloud computing in general) is a powerful paradigm shift for our industry, with the end-result being more emphasis on pure software engineering and less emphasis on operations. Solution providers, consumers, and businesses not in the infrastructure outsourcing field all stand to benefit under this re-alignment – and we should all embrace it. I’m personally very excited to start delivering applications under this new mindset; I think concerns about the longevity of the initiative (or Microsoft’s dedication to it) are misplaced.

As a permanent fixture, Azure represents the end of the “keeping the lights on” mentality that has plagued IT:

“When IT stops being an enabler and simply acts as a cost center or a "necessary evil", then entertaining new projects that leverage one of Gartner Top 10 strategic technologies is unrealistic and doomed for failure or hardship at a minimum. Why? Because this mentality is reactive and sometimes even defensive. This mentality also does not encourage investing in the future whether that is in training employees, establishing and investing in architecture, or addressing problems with long term solutions. Instead, people are rewarded for quick fix fire fighting heroics and at the end of the day it is the user community that suffers. The users get band-aids on top of already outdated systems and are often forced to work in ways in which the technology and systems dictate, instead of the other way around.

It gets worse for the employees in an IT department in charge of "keeping the lights on". Their resumes become stale as their skills are not updated to reflect the newer technologies that innovative companies seek. The reactive nature of the culture can squash innovation and people who have valid solutions may not bring them forward because it would require more than a quick fix. In the end these types of shops become like an assembly line in nature where people clock in, work their shift, and clock out. This is not what most of us envisioned when we enthusiastically enrolled in IT related curriculums in college back in the day.”

Whether it takes 2, 5, or 10 years to permeate isn’t the important part; that Microsoft sees 50% of its server business moving online in the next 5 years is a telling approximation.

Before delving into the specific questions, I would highly recommend the presentation by Brandon Watson on the state of, and drivers behind, cloud computing – it paints an interesting picture around Amazon’s seasonality and Google’s ad dependency as important factors in their desire to offload infrastructure costs. Likewise, Microsoft too developed its strategy out of necessity and not choice, which further underscores its importance. (Microsoft’s Business Productivity Online Suite isn’t yet running off of Azure, but as adoption increases so too will the need for on-demand scale.)

Implications and workarounds for the 10GB database limit

The workarounds, for the time being, boil down to: sharding (a term coined by Google engineers to refer to a method of horizontal partitioning), integration with on-premise SQL, or better leverage of Azure Storage. There’s a great article that captures some interesting discussion around this limit [11] and the ramifications for backups; however, the Azure FAQ seems to suggest that backups-in-the-cloud are included. That most T-SQL constructs are supported, with a few obvious exceptions [4], is certainly good news. But the 10GB limit is nonetheless a disappointment in the v1 announcement – it is by far the largest barrier to adoption, particularly for employee-facing applications that collaborate with several internal systems. As a colleague of mine pointed out: that you can connect to on-premise SQL is one thing, but the amount of effort involved to match the performance of co-located servers, particularly with chatty applications, is another. For self-contained, consumer-facing applications, migration is much more realistic. Remember, storage outside of SQL doesn’t have an upper limit, capacity-wise, but its per-transaction pricing does offer less cost-certainty [1]. If you don’t have existing database assets to leverage, you may opt to build your data model on Azure Storage directly, but you would be sacrificing portability and toolset in the process.
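To make the sharding idea concrete, here is a minimal sketch of hash-based routing across several sub-10GB databases. The shard count and connection strings are hypothetical, and a real implementation would also need a strategy for cross-shard queries and rebalancing.

```python
# Minimal sharding sketch: route each shard key (e.g. a customer ID)
# to one of N databases, each of which stays under the 10GB cap.
# The connection strings below are invented for illustration.
import hashlib

SHARD_CONNECTIONS = [
    "Server=db0.example.net;Database=app_shard0",
    "Server=db1.example.net;Database=app_shard1",
    "Server=db2.example.net;Database=app_shard2",
]

def shard_for(key: str) -> str:
    """Map a shard key to a connection string.

    A stable hash keeps a given key on the same shard across calls and
    processes; Python's built-in hash() is salted per-process, so a
    fixed digest like md5 is used instead.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = digest[0] % len(SHARD_CONNECTIONS)
    return SHARD_CONNECTIONS[index]
```

The trade-off, as with any horizontal partitioning, is that joins and aggregates spanning shards move out of the database and into application code.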

On the ability to integrate with on-premise SQL [18]:

“In particular, Nigel did a really nice demo on the compatibility SDS v1 will have with on-premise SQL and how that will make porting existing applications to a cloud environment really easy.  A very strong message particularly in conjunction with the ability to run web and worker processes in Windows Azure Compute.  Together, these capabilities really provide a great foundation for delivering next generation services based applications.”

How does fault tolerance work, and are you charged for separate Web Role instances required?

Whether separately-billed instances of the Web Role are required for high availability (to satisfy the 99.95% target in the SLA) is a bit unclear [9]. It seems, on the surface, that a single role can satisfy the SLA requirements through configuration alone, without additional fees [17]:

“The role definition also includes the number of instances that the fabric should deploy, either a specific number or a lower and upper bound. Finally, constraints may be specified, including how many role instances may run in the same node, whether instances of different roles should be co-located, and how to allocate instances across update domains and fault domains.

Update domains and fault domains are optional but useful features. Update domains are used to partition the service during upgrades. A role may specify the number of update domains its instances should be deployed in.

During rolling OS upgrades and service updates, only one update domain will be upgraded at a time. While an update domain is being upgraded, the fabric won't route live traffic to any roles or load balancers inside it. After an update domain has been fully upgraded and all services in it report that they're healthy, the fabric returns them to service and starts on the next update domain.

Fault domains are similar to update domains. They're basically disjoint failure zones within a single datacenter. They're determined according to the datacenter topology, based on things like rack and switch configurations, which make certain classes of failures likely to affect well-defined groups of nodes. As with update domains, a role may specify the number of fault domains its instances should be deployed in.”
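As a rough illustration of the configuration-only approach described above, a service configuration along these lines asks the fabric for two instances, which it then spreads across fault and update domains. This is a sketch only: the service and role names are illustrative, and the exact schema is defined by the Azure SDK.

```xml
<!-- Sketch of a service configuration requesting two web role
     instances; the fabric distributes them across fault and update
     domains automatically. Names are illustrative. -->
<ServiceConfiguration serviceName="MyService">
  <Role name="WebRole">
    <Instances count="2" />
  </Role>
</ServiceConfiguration>
```

Whether those two instances are billed as two sets of compute hours is exactly the open question above.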

Pricing and Licensing

Note that we haven’t seen the subscription model or volume licensing just yet [1]:

“Microsoft will offer Azure services in three different ways, including a consumption (pay-as-you-go) model, a subscription model for resellers, and volume licensing to enterprises. The so-called consumption model appears to be priced comparably with the current leader in this market, Amazon.com, which offers a variety of online services, including hosted Windows and Linux services.

Microsoft will charge 12 cents per hour for computer infrastructure services, 15 cents per gigabyte for storage, and 10 cents per 10,000 storage transactions. Users of Microsoft's cloud-based database, SQL Azure, will incur monthly charges of $9.99 for the Web Edition, which supports up to 1GB databases, and $99.99 for the Business Edition, which allows up to 10GB databases. The web developer-oriented .NET Services will cost 15 cents per 100,000 message operations. Additional bandwidth charges apply across the three services as well: 10 cents per gigabyte for incoming data and 15 cents per gigabyte for outgoing data.”
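Plugging the quoted consumption rates into a quick back-of-envelope calculator makes the cost structure easier to reason about; the usage figures below are made up for illustration.

```python
# Back-of-envelope monthly estimate from the consumption rates quoted
# above. The usage inputs are invented for illustration; SQL Azure's
# flat $9.99/$99.99 monthly fee would be added separately.
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(instances, storage_gb, storage_txns, gb_in, gb_out):
    compute   = instances * HOURS_PER_MONTH * 0.12  # $0.12 per instance-hour
    storage   = storage_gb * 0.15                   # $0.15 per GB stored
    txns      = storage_txns / 10_000 * 0.10        # $0.10 per 10,000 transactions
    bandwidth = gb_in * 0.10 + gb_out * 0.15        # $0.10/GB in, $0.15/GB out
    return compute + storage + txns + bandwidth

# e.g. two web role instances, 10GB stored, 1M storage transactions,
# 5GB inbound / 50GB outbound in a month:
estimate = monthly_cost(2, 10, 1_000_000, 5, 50)
```

Note how compute hours dominate at these rates: even an idle pair of instances runs about $175/month before any traffic.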

Session, Cache Data

This is a bit of a paradigm shift for the typical .NET developer: you no longer have in-process memory (a good thing for scalability) – so where do values in HttpContext.Cache and HttpContext.Session go? The SDK has providers [2] that plug into your application “seamlessly” to store these values in Azure Storage (tables and blobs) – but there must be a performance penalty associated with this, not to mention the $0.10/10K-transaction and $0.15/GB costs associated with reading/writing to this store. (I use “seamlessly” because objects crossing the wire have to be serializable, so some minor work may be required there.) Whether Velocity eventually serves this role remains to be seen; there’s lots of desire to see this happen, mostly for the richness of the API [21]. As for the local 250GB available to each web role, it doesn’t appear to offer much use to developers, as it’s not synchronized across multiple instances.
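To illustrate why “seamlessly” gets the quotes, here is a language-neutral sketch (in Python, with an in-memory dict standing in for Azure Table/Blob storage) of what any out-of-process session store must do: serialize on the way out, deserialize on the way back.

```python
# Any object that leaves the process for a shared store must be
# serializable. The dict below is a stand-in for the remote store;
# the class itself is a hypothetical illustration, not an SDK API.
import pickle

class RemoteSessionStore:
    def __init__(self):
        self._backing = {}   # stand-in for Azure Table/Blob storage

    def set(self, session_id, key, value):
        # pickle.dumps raises for non-serializable objects (open
        # sockets, file handles, ...) -- that is the "minor work"
        # required: making your session state serializable.
        self._backing[(session_id, key)] = pickle.dumps(value)

    def get(self, session_id, key):
        blob = self._backing.get((session_id, key))
        return pickle.loads(blob) if blob is not None else None
```

Every get/set also becomes a billable storage transaction plus a network round trip, which is where the performance and cost penalties come from.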

On-Premise Azure

There are no plans for an on-premise version of Azure (storage or otherwise), as far as I can see [20].

Exception Handling

I don’t have a good handle yet on how this is best applied. Some providers will likely emerge that channel diagnostic messages into Azure Storage, where they can later be searched, grouped, and presented by various ‘Storage Explorer’-like applications [6].

Azure Medium Trust Implications

No major limitations in the custom Medium Trust policy used by Azure, from what I can see [16]: external TCP connections and local writes both seem to be permitted, allowing a full range of possibilities around external integration scenarios using WebRequest, FtpWebRequest, SmtpClient [15] and the like.

Sample Deployment of AzureBright: http://www.radicallyfrugal.com

What better test-drive than to take an MVC application and deploy it to the cloud, under a custom domain?

What is AzureBright [1]?

“AzureBright Blog & Forum are two cloud-hosted applications, created for the new CloudApp() contest. Goals: 1. To have a fast, clean (HTML), scalable, Web 2.0 look-and-feel, SEO-friendly blog and forum (StackOverflow.com-like functionality) for the .NET community. 2. To share development knowledge and experiences of new challenges encountered with the Windows Azure Services Platform.

Technologies used: Windows Azure Storage (Table, Queue, Blob), ASP.NET MVC framework (C#), AJAX using jQuery/json.

Challenges encountered: Paging over data, “Order by” and “Group by” data, Leveraging concurrency, Update/Insert/Delete batch transactions, Page views and other counters, Voting, Spam, Multiple worker messages, Membership (managing users and roles)…”


[1] – Pricing/Reselling


[2] – Session/Cache Providers


[3] – Troubleshooting


[4] – TSQL Support in SQL Data Services


[5] – Queue Performance (good blog...)


[6] – Storage Explorer


[7] – 10GB Limit


[8] – SLA

Windows Azure:

Windows Azure has separate SLAs for compute and storage. For compute, we guarantee that when you deploy two or more role instances in different fault and upgrade domains your Internet facing roles will have external connectivity at least 99.95% of the time. Additionally, we will monitor all of your individual role instances and detect within two minutes when a role instance’s process is not running and initiate corrective action (we will publish by PDC the full details of our uptime promise for individual role instances). For storage, we guarantee that at least 99.9% of the time we will successfully process correctly formatted requests that we receive to add, update, read and delete data. We also guarantee that your storage accounts will have connectivity to our Internet gateway.

[9] – Fault & Update Domains


[10] – AzureBright

[11] – Brent Ozar (SQL Server DBA)


[12] – TechCrunch Thread


[13] – Excellent Presentation by Brandon Watson


[14] – Azure FAQ


[15] – Windows Azure - Sending SMTP Emails!


[16] – Windows Azure SDK Trust Policy Reference


[17] – Deep Technical Details:


[18] – SQL Azure Team Blog


[19] – Estimation Spreadsheet:


[20] – On-premise Azure ruled out:


[21] – Velocity on Azure:


Friday, July 17, 2009

PopUrls, PopFly & The Future of Aggregation with Yahoo Pipes

Hot on the heels of Google, Microsoft announced this afternoon that it too would be discontinuing its own Mashup Editor, PopFly [1]. Naturally, no one is happier to rejoice in the demise of these two experiments than Yahoo, whose own Pipes offering continues to thrive [2].

Having test-driven Pipes recently, I’m not too surprised. Pipes’ success, much like Twitter’s, lies in its simplicity and elegance; they’ve focused on delivering core data that serves as the essential building block for integrated applications. It’s a simple way to build a unit of logic that’s truly boundary-less.

Now take an aggregation service of today like PopUrls [3], for instance:


I’m hard-pressed to see why this exact same data, supplied by public RSS interfaces, couldn’t be constructed in a personalized fashion using Pipes, in a way that can be consumed by existing readers.

What’s really the value-add here, beyond the interface?

And if there is a value-add, why isn’t it exposed for consumption by others?

Yahoo Pipes carries immense potential for the wider development community and it’s no surprise that it’s the lone standing ‘Mashup Facilitator’ among the leaders.

[1] – Washington Post

[2] – Pipes Blog

[3] – PopUrls (Thomas Marban)


Tuesday, July 14, 2009

Hosting Your Site on a Content Distribution Network (CDN)

What if you were able to deploy a completely updatable static HTML site in minutes and have it distributed to a global audience for pennies a GB?

A thought occurred to me the other day: offerings like Google Sites and SharePoint Online, though not branded as CDNs per se, likely enjoy reliable access times globally; and both allow arbitrary files to be addressable at user-defined URLs using CNAME records. However, with Sites at least, *.HTML, *.CSS, and *.JS files are explicitly prohibited, leaving users struggling to achieve designs not permitted by the default templates and relegating the products to internal-use scenarios.

But imagine if you had free rein over static files that leverage the same infrastructure. Imagine if a CDN could host your entire website and not just its static elements. Sure, Azure (pricing just announced [14]), EC2 and AppEngine carry promise for offloading applications to the cloud, but the serving of static sites still seems better suited to well-established CDNs.

Welcome to the world that I’ve been immersed in for the last few days: evaluating CDNs as a cost-effective approach for hosting a static site with a gradual ramp-up to ~2M visitors a day. Solution requirements: high-availability, high-scalability and cost-predictability.

Traditional players (Limelight, Akamai and Level 3 – who collectively control 80% of the market) are slowly being challenged by a new breed of pay-as-you-go ‘budget’ CDNs courtesy of pioneers like Amazon, SimpleCDN and the like. Even Limelight re-sellers like Rackspace are undercutting aggressively while offering users a cheap alternative to the same network. In terms of high-level feature set, they’re all roughly comparable; key differences lie mostly in the quality and reach of the distribution networks via so-called “edge” servers around the world.

But there are massive price differences [3] – as large as 10-20x between the least expensive (SimpleCDN) and the most expensive (Akamai). It’s not clear what the noteworthy differentiator between the leading networks is (or whether there is one) – I think the jury’s still out and, in practice, it likely depends largely on the account and their desire to accommodate you. Attempting to quantify performance differences between CDNs is a seriously challenged science [10]. The conclusion that “..if you want to really test a CDN’s performance, and see what it will do for your content and your users, you’ve got to run a trial” seems to be the only take-away. And the challengers aren’t without their share of criticism either [6] – both are relatively new (Amazon’s CloudFront is still in beta and doesn’t offer an SLA; it has had some outages) and both have relatively limited distribution networks (SimpleCDN has 10 points of presence in the US/Europe and Amazon has 14 that cover Asia as well [9]).

Another consideration: separation of storage from delivery – normally, the CDN can either host your content and charge you a storage fee ($0.15/GB – this is pretty standard across the lot, including Azure) or it can access content that’s already hosted and distribute it. With Amazon, you must use their storage service (S3) to be eligible for the CDN, and they will charge you the transfer rates for propagating content from S3 to the CloudFront CDN. SimpleCDN has a product in beta (Lightening) that supports already-hosted content, but their mature offering, like Amazon’s, also relies on a built-in storage service (also at $0.15/GB).

But hosting your own content defeats the purpose of freeing yourself from infrastructure and high-availability concerns; you would have to ensure that the source content is always available or risk requests from the CDN (upon cache expiration) resulting in 404s (this behaviour was confirmed for Limelight at least). And this brings me to my next point…

Cache-Control Headers – Why they matter

Cache-control headers on source content (whether hosted by you or a storage service) are respected by the CDN distribution network. This is the only mechanism you have to expire content located on edge servers; otherwise it defaults to something like 24 hours. Normally, there is no programmatic API to invalidate cached files on edge servers [5], which creates a tricky balance: to be able to update something, you must define the TTL when you create the file, before it propagates to the edge servers. This is particularly relevant for URLs that can’t be versioned. You can’t selectively update something: either you pay the price of updating it regularly or give up the ability entirely. In our case, we’re looking to employ the CDN in a unique way to host the entire site. In order to do so, we need to refresh HTML every few minutes, and cache-controls on the source files allow us to do this. At the same time, we want to employ standard caching practices to ensure that images and CSS files are cached all the way out at the clients, and that only HTML requests older than, say, x minutes reach the edge servers and propagate back to the source server.
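The header policy above can be sketched in a few lines; the TTL values are illustrative, not prescriptive.

```python
# Sketch of the caching policy: short TTLs on HTML so edge servers
# re-fetch it within minutes, long TTLs on static assets so they stay
# cached all the way out at the client. TTL values are illustrative.
HTML_TTL = 300                 # seconds: HTML refreshed every 5 minutes
STATIC_TTL = 30 * 24 * 3600    # seconds: images/CSS effectively immutable

def cache_header(path: str) -> str:
    """Return the Cache-Control header to set on a source file."""
    if path.endswith((".html", "/")):
        return f"Cache-Control: public, max-age={HTML_TTL}"
    return f"Cache-Control: public, max-age={STATIC_TTL}"
```

The long-TTL branch only works if static URLs never need in-place updates, which is why versioned asset URLs pair naturally with this scheme.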

Can you run the entire website on a CDN?

I’d like to think so, and we’re still trying to see if there are any hidden catches in this scheme. One concern that we’ve run into: how do we ensure that domain.com and www.domain.com both redirect to the index page www.domain.com/index.html? Remember, this isn’t a web server where you can just go into IIS and configure the default page. These are CNAME records that point to the custom CDN sub-domain that’s been provisioned for your particular account. In my discussions with Limelight, they mentioned that their engineers can apply this as a custom one-time configuration, which is a nice touch. For SimpleCDN and Amazon, I think your only option is to set up DNS web-forward records so that both domain.com and www.domain.com point to content.domain.com/index.html, with the CNAME record for content.domain.com pointing to your CDN. Under this workaround, trying to access content.domain.com directly will generate a 403 [11]. The lack of an “index file” option in the pay-as-you-go CDNs could be a bit of a hindrance if you’re URL-sensitive.

And where does the HTML come from?

An internal ASP.NET site generates the desired HTML; a scheduled service consumes this HTML and publishes to the storage service (via FTP or SOAP APIs) on a regular interval, with cache headers to expire the content as needed. This publication scheme itself needs some level of redundancy for a solution demanding high-availability, but instead of failures propagating to your end-users they’ll simply result in stale content.
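The publication step itself is simple enough to sketch. The `fetch` and `upload` callables below are hypothetical stand-ins for the internal site and the storage service’s FTP/SOAP API.

```python
# Sketch of the publication scheme: render HTML from the internal site,
# push it to the storage origin with a short expiry so edge servers
# re-fetch within minutes. `fetch` and `upload` are injected so the
# transport (HTTP, FTP, SOAP) stays out of the core logic.
import time

def publish_once(fetch, upload):
    html = fetch()  # e.g. an HTTP GET against the internal ASP.NET site
    # A short max-age makes edge caches treat the page as stale quickly.
    upload("index.html", html, headers={"Cache-Control": "max-age=300"})

def run_forever(fetch, upload, interval_seconds=300):
    # A publication failure results in stale content for users rather
    # than errors -- an acceptable degradation for this design.
    while True:
        try:
            publish_once(fetch, upload)
        except Exception:
            pass  # a real deployment would log and alert here
        time.sleep(interval_seconds)
```

Running two independent publishers against the same origin is one cheap way to add the redundancy mentioned above, since the upload is idempotent.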


Traditional hosting doesn’t stand a chance of competing with these prices and is more than 2x the cost even at the Limelight tier. Managed hosting providers will likely charge bandwidth on the 95th percentile of the transfer rate, as opposed to the cumulative transfer amount. This means they take the highest 5% of hourly/daily Mbps readings, drop them, and use the next highest Mbps to calculate your monthly invoice. Say you had a commitment of 3-5Mbps, burstable to 100Mbps, at an overage charge of $350 per 1Mbps. This carries tremendous cost-risk for a solution that may spike above the 5% barrier but otherwise remain near the initial commitment level. And if you need a gigabit burstable network, then costs effectively double from there. By their own admission, managed hosting providers currently lack support for on-demand scaling and are inherently unequipped to compete in these scenarios; not to mention they would require a minimum 12-month commitment that would discourage event-based site campaigns that are active for only a portion of the year. (Limelight also has a minimum 12-month commitment, but they offer so-called ‘bucket pricing’ that gives you a fixed $/GB rate that can be used for shorter durations.)
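To see why 95th-percentile billing is risky for spiky traffic, here is a sketch of the calculation with made-up sample data.

```python
# Sketch of 95th-percentile billing: sort the sampled Mbps readings,
# discard the top 5%, and bill on the highest remaining sample.
# The sample data is invented for illustration.
def billable_mbps(samples):
    ordered = sorted(samples)
    cutoff = int(len(ordered) * 0.95) - 1  # index of the 95th-percentile sample
    return ordered[max(cutoff, 0)]

# 20 samples: steady ~4 Mbps with a single 100 Mbps spike.
# One spike out of 20 sits inside the dropped 5%, so it costs nothing...
quiet_month = [4] * 19 + [100]
# ...but two spikes out of 20 exceed the 5% allowance, and the bill
# jumps to the full burst rate.
spiky_month = [4] * 18 + [100, 100]
```

A campaign whose traffic bursts for more than ~36 hours a month (5% of 730) therefore pays burst rates for the whole month, which is exactly the cost-risk described above.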

Rackspace Cloud (Formerly Mosso)

We looked briefly at Rackspace for their strategic partnership with Limelight Networks. While access to the full range of the Limelight reach (60 points of presence globally) at $0.22/GB is compelling, we had no choice but to dismiss this option for two reasons: lack of CNAME support and lack of direct uploads to Limelight. That we would have to upload to Limelight via Rackspace introduces a new point of failure into a solution that relies on a frequent update schedule. Others have also highlighted the lack of SSL support with Rackspace [12].

Limelight Networks

I’d also like to take this opportunity to acknowledge the unusually strong first impression that Limelight customer service had on me; these guys are a class act all the way, from quick turn-arounds to their willingness to accommodate our specifics and their not being afraid to refer us to resellers when the scenario warranted it.


Performance-wise, Limelight is head and shoulders above the field surveyed here [12] [13] (by one report, ~4.2x faster than CloudFront in US/Europe, ~1.9x faster in Asia and ~3.1x faster globally). If CNAME and SSL aren’t necessary, then Cloud Files is a compelling access point to this network. I think all CDNs could support read-only (i.e. HTTP GET) sites just as easily; if you need to support POSTs, you have to figure out a way to pass those directly to the source server. Released 3-4 months ago, LimelightSITE seems to be specifically geared towards this scenario. Akamai also offers a white paper that describes their take on the same concept (scaling dynamic applications).


Aside from the J2EE “EdgeComputing” support, I think these options are both geared towards scaling the serving of content (even if it’s personalized) and not scaling the processing of data (like an Azure or EC2). 

More Info:

If you’re at all interested in keeping up with these subjects you can also subscribe to www.cdnevangelist.com or www.businessofvideo.com for excellent coverage.


[1] – Tools for Amazon’s CloudFront (CloudBerry S3 Explorer & S3 Fox, a FireFox Plug-in)

[3] – Simple CDN

Lightening will syndicate content that you host vs. StormFront charging for $149/GB or $.15/MB.

[4] – Amazon CloudFront vs. CacheFly - Performance Benchmark at Sinopop.net

[5] – Cache-Control Headers

[6] – CloudFront vs. SimpleCDN (Community Commentary)

[7] – CDN Pricing Pressure

[9] – Q: Where are the edge locations used by Amazon CloudFront?

[10] – The Microsoft CDN Case Study

[11] – ListBucketResult: reply for a CDN request to null

[12] – Cloud Files (Rackspace Cloud)

[13] – Cloud Files vs. Cloud Front (Performance Reports from Pingdom)

[14] - Microsoft announces Azure pricing, details