Maarten Balliauw {blog}

ASP.NET, ASP.NET MVC, Windows Azure, PHP, ...

Microsoft Azure cloud plugin for TeamCity (dabbling in Java code)

If you follow me on Twitter, you may have seen me in several stages of anger at Java. After two weeks of learning, experimenting, coding and even getting it all to compile, I’m proud to announce an initial, very early preview of my Microsoft Azure cloud plugin for TeamCity.

This plugin provides Microsoft Azure cloud support for TeamCity. By configuring a Microsoft Azure cloud in TeamCity, a set of known virtual build agents can be started and stopped on demand by the TeamCity server so we can benefit from Microsoft Azure’s cost model (a stopped VM is almost free) and scaling model (only start new instances when we need them).

Curious to try it? Be aware that this is all still very early, alpha-version software, so use it with caution. I wanted to get an early preview out to gather some comments on it. Here are the installation steps:

  • Download the plugin ZIP file from the latest GitHub release.
  • Copy it to the TeamCity plugins folder.
  • Restart the TeamCity server and verify the plugin was installed from Administration | Plugins List.

Creating a new cloud profile

From TeamCity’s Administration | Agent Cloud, we can create a new cloud configuration making use of the Microsoft Azure plugin for TeamCity. All we have to do is select “Microsoft Azure” as the cloud type and enter the requested details.

TeamCity agent on Azure VM

Once we enter some preconfigured and pre-provisioned VM names, we’re good to save and profit.

Known issue: only one Microsoft Azure cloud configuration can be created per TeamCity server because the KeyStore being configured by the plugin only stores one management certificate. Contribute a fix if you feel up for it!

What’s up?

From Agents | Cloud, we can now see which VM instances are stopped/running on Microsoft Azure.

Start stop TeamCity agent on Azure

Known issue: the displayed VM status is not always current. The VM status is read from TeamCity's last known status, not from Microsoft Azure. Again, contribute a fix if you feel up for it.

What is there to come?

That’s pretty much it for now. I told you, it’s early. In my ideal world, there would also be a possibility to launch VM instances from a predefined image and destroy them when no longer needed. I would also love to convert it all to Kotlin, as I still don’t like Java as a language and Kotlin looks really nice. And ideally, the crude UI I did for the plugin should be much nicer too.

Happy building in the cloud!

Windows Azure Storage magic with Shared Access Signatures

When building cloud applications on Windows Azure, it’s always a good thing to delegate as much work to specialized services as possible. File downloads are one good example: these can be streamed directly from Windows Azure blob storage to your client, without having to pass through a web application hosted on Windows Azure Cloud Services or Web Sites. Why occupy the web server with copying data from a request stream to a response stream? Let blob storage handle it!

When thinking this through, some issues may come to mind. Here are a few:

  • How can I keep this blob secure? I don’t want to give everyone access to it!
  • How can I keep the download URL on my web server, track the number of downloads (or enforce other security rules) and still benefit from offloading the download to blob storage?
  • How can the blob be stored in a way that is clear to my application (e.g. a customer ID or something), yet give it a friendly name when downloading?

Let’s answer these!

Meet Shared Access Signatures

Keeping blobs secure is pretty easy on Windows Azure Blob Storage, but it’s also sort of an all-or-nothing story… Either you make all blobs in a container private, or you make them public.
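
To make that all-or-nothing switch concrete, here’s a minimal sketch (assuming the 2.x storage client library; the container name matches the example below):

CloudStorageAccount account = // your storage account connection here
var client = account.CreateCloudBlobClient();
var container = client.GetContainerReference("files");

// all-or-nothing: either every blob in the container is publicly readable...
container.SetPermissions(new BlobContainerPermissions
{
    PublicAccess = BlobContainerPublicAccessType.Blob
});

// ...or the container is private and no blob can be read without credentials
container.SetPermissions(new BlobContainerPermissions
{
    PublicAccess = BlobContainerPublicAccessType.Off
});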

Not to worry though! Using Shared Access Signatures it is possible to grant temporary privileges on a blob, for read and write access. Here’s a code snippet that will grant read access to the blob named helloworld.txt, residing in a private container named files, during the next minute:

CloudStorageAccount account = // your storage account connection here
var client = account.CreateCloudBlobClient();
var container = client.GetContainerReference("files");
var blob = container.GetBlockBlobReference("helloworld.txt");

var builder = new UriBuilder(blob.Uri);
builder.Query = blob.GetSharedAccessSignature(
    new SharedAccessBlobPolicy
    {
        Permissions = SharedAccessBlobPermissions.Read,
        SharedAccessStartTime = new DateTimeOffset(DateTime.UtcNow.AddMinutes(-5)),
        SharedAccessExpiryTime = new DateTimeOffset(DateTime.UtcNow.AddMinutes(1))
    }).TrimStart('?');

var signedBlobUrl = builder.Uri;

Note I’m giving access starting 5 minutes ago, just to make sure any clock skew along the way is ignored within a reasonable time window.

There we go: our blob is secured and by passing along the signedBlobUrl to our user, he or she can start downloading our blob without having access to any other blobs at all.

Meet HTTP redirects

Shared Access Signatures are really cool, but the generated URLs are… “fugly”: not pretty or easy to remember. Well, there is this thing called HTTP redirects, right? Here’s an ASP.NET MVC action method that checks if the user is authenticated, queries a repository for the correct filename, generates the shared access signature and redirects us to the actual download.

[Authorize]
[EnsureInvoiceAccessibleForUser]
public ActionResult DownloadInvoice(string invoiceId)
{
    // Fetch invoice
    var invoice = InvoiceService.RetrieveInvoice(invoiceId);
    if (invoice == null)
    {
        return new HttpNotFoundResult();
    }

    // We can do other things here: track # downloads, ...

    // Build shared access signature
    CloudStorageAccount account = // your storage account connection here
    var client = account.CreateCloudBlobClient();
    var container = client.GetContainerReference("invoices");
    var blob = container.GetBlockBlobReference(invoice.CustomerId + "-" + invoice.InvoiceId);

    var builder = new UriBuilder(blob.Uri);
    builder.Query = blob.GetSharedAccessSignature(
        new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessStartTime = new DateTimeOffset(DateTime.UtcNow.AddMinutes(-5)),
            SharedAccessExpiryTime = new DateTimeOffset(DateTime.UtcNow.AddMinutes(1))
        }).TrimStart('?');
    var signedBlobUrl = builder.Uri;

    // Redirect
    return Redirect(signedBlobUrl.ToString());
}

This gives us the best of both worlds: our web application can still verify access and run some business logic on it, yet we can offload the file download to blob storage.

Meet the Shared Access Signature content disposition header

Often, storage is a technical thing where we choose technical filenames for the things we store, instead of human-readable or human-friendly file names. In the example above, users will get a very strange filename to download: the customer id and invoice id concatenated, with no .pdf file extension or anything else. Users may get confused by this, or have problems opening the file because their browser will not recognize it as a PDF.

Last November, a feature was added to blob storage which enables us to let a blob be whatever we want it to be: support for setting some additional headers on a blob through the Shared Access Signature.

The following headers can be specified on-the-fly, through the shared access signature:

  • Cache-Control
  • Content-Disposition
  • Content-Encoding
  • Content-Language
  • Content-Type

Here’s how to generate a meaningful Shared Access Signature in the previous example, where we specify a human-readable filename for the resulting download, as well as a custom content type:

builder.Query = blob.GetSharedAccessSignature(
    new SharedAccessBlobPolicy
    {
        Permissions = SharedAccessBlobPermissions.Read,
        SharedAccessStartTime = new DateTimeOffset(DateTime.UtcNow.AddMinutes(-5)),
        SharedAccessExpiryTime = new DateTimeOffset(DateTime.UtcNow.AddMinutes(1))
    },
    new SharedAccessBlobHeaders
    {
        ContentDisposition = "attachment; filename=" + customer.DisplayName + "-invoice-" + invoice.InvoiceId + ".pdf",
        ContentType = "application/pdf"
    }).TrimStart('?');

Note: for this feature to work, the service version for the storage account must be set to the latest one, using the DefaultServiceVersion on the blob client. Here’s an example:

CloudStorageAccount account = // your storage account connection here
var client = account.CreateCloudBlobClient();
var serviceProperties = client.GetServiceProperties();
serviceProperties.DefaultServiceVersion = "2013-08-15";
client.SetServiceProperties(serviceProperties);

Combining all these techniques, we can do some analytics and business logic in our web application and offload the boring file and network I/O to blob storage.

Enjoy!

Using the Windows Azure Content Delivery Network (CDN)

With the Windows Azure Content Delivery Network (CDN) released as a preview, I thought it was a good time to write up some details about how to work with it. The CDN can be used for offloading content to a globally distributed network of servers, ensuring faster throughput to your end users.

Note: this is a modified and updated version of my article at ACloudyPlace.com roughly two years ago. I have added information on how to work with ASP.NET MVC bundling and the Windows Azure CDN, updated screenshots and so on.

Reasons for using a CDN

There are a number of reasons to use a CDN. One of the obvious reasons lies in the nature of the CDN itself: a CDN is globally distributed and caches static content on edge nodes, closer to the end user. If a user accesses your web application and some of the files are cached on the CDN, the end user will download those files directly from the CDN, experiencing less latency in their request.

Windows Azure CDN graphically

Another reason for using the CDN is throughput. If you look at a typical webpage, about 20% of it is HTML which was dynamically rendered based on the user’s request. The other 80% goes to static files like images, CSS, JavaScript and so forth. Your server has to read those static files from disk and write them on the response stream, both actions which take away some of the resources available on your virtual machine. By moving static content to the CDN, your virtual machine will have more capacity available for generating dynamic content.

Enabling the Windows Azure CDN

The Windows Azure CDN is built for two services that are available in your subscription: storage and cloud services. The easiest way to get started with the CDN is by using the Windows Azure Management Portal. From the New menu at the bottom, select App Services | CDN | Quick Create.

Enabling Windows Azure CDN

From the dropdown that is shown, select either a storage account or a cloud service which will serve as the source of our CDN edge data. After clicking Create, the CDN will be initialized. This may take up to 60 minutes because the settings you’ve just applied may take that long to propagate to all CDN edge locations globally (over 24 locations was the last number I read). Your CDN will be assigned a URL in the form of http://<id>.vo.msecnd.net.

Once the CDN endpoint is created, there are some options that can be managed. Currently they are somewhat limited but I’m pretty sure this will expand. For now, you can for example assign a custom domain name to the CDN by clicking the “Manage Domains” button in the toolbar.

Manage the Windows Azure CDN - Add custom domain

Note that the CDN works using HTTP by default, but HTTPS is supported as well and can be enabled through the management portal. Unfortunately, SSL is using a certificate that Microsoft provides and there’s currently no option to use your own, making it hard to use a custom domain name and HTTPS.

Serving blob storage content through the CDN

Let’s start and offload our static content (CSS, images, JavaScript) to the Windows Azure CDN using a storage account as the source for CDN content. In an ASP.NET MVC project, edit the _Layout.cshtml view. Instead of using the bundles for CSS and scripts, let’s include them manually from a URL hosted on your newly created CDN:

<!DOCTYPE html>
<html>
<head>
    <title>@ViewBag.Title</title>
    <link href="http://az172665.vo.msecnd.net/static/Content/Site.css" rel="stylesheet" type="text/css" />
    <script src="http://az172665.vo.msecnd.net/static/Scripts/jquery-1.8.2.min.js" type="text/javascript"></script>
</head>
<!-- more HTML -->
</html>

Note that the CDN URL includes a reference to a folder named “static”.

If you now run this application, you’ll find no CSS or JavaScript applied. The reason for this is obvious: we have specified the URL to our CDN but haven’t uploaded any files to our storage account backing the CDN.

Where are our styles?

Uploading files to the CDN is easy. All you need is a public blob container and some blobs hosted in there. You can use tools like Cerebrata’s Cloud Storage Studio or upload the files from code. For example, I’ve created an action method taking care of uploading static content for me:

[HttpPost, ActionName("Synchronize")]
public ActionResult Synchronize_Post()
{
    var account = CloudStorageAccount.Parse(
        ConfigurationManager.AppSettings["StorageConnectionString"]);
    var client = account.CreateCloudBlobClient();

    var container = client.GetContainerReference("static");
    container.CreateIfNotExist();
    container.SetPermissions(
        new BlobContainerPermissions {
            PublicAccess = BlobContainerPublicAccessType.Blob });

    var approot = HostingEnvironment.MapPath("~/");
    var files = new List<string>();
    files.AddRange(Directory.EnumerateFiles(
        HostingEnvironment.MapPath("~/Content"), "*", SearchOption.AllDirectories));
    files.AddRange(Directory.EnumerateFiles(
        HostingEnvironment.MapPath("~/Scripts"), "*", SearchOption.AllDirectories));

    foreach (var file in files)
    {
        var contentType = "application/octet-stream";

        // note: Path.GetExtension returns the extension including the leading dot
        switch (Path.GetExtension(file))
        {
            case ".png": contentType = "image/png"; break;
            case ".css": contentType = "text/css"; break;
            case ".js": contentType = "text/javascript"; break;
        }

        var blob = container.GetBlobReference(file.Replace(approot, ""));
        blob.Properties.ContentType = contentType;
        blob.Properties.CacheControl = "public, max-age=3600";
        blob.UploadFile(file);
        blob.SetProperties();
    }

    ViewBag.Message = "Contents have been synchronized with the CDN.";

    return View();
}

There are two very important lines of code in there. The first one, container.SetPermissions, ensures that the blob storage container we’re uploading to allows public access. The Windows Azure CDN can only cache blobs stored in public containers.

The second important line of code, blob.Properties.CacheControl, is more interesting. How does the Windows Azure CDN know how long a blob should be cached on each edge node? By default, each blob will be cached for roughly 72 hours. This has some important consequences. First, you cannot invalidate the cache and have to wait for content expiration to occur. Second, the CDN will possibly refresh your blob every 72 hours.

As a general best practice, make sure that you specify the Cache-Control HTTP header for every blob you want to have cached on the CDN. If you want to have the possibility to update content every hour, make sure you specify a low TTL of, say, 3600 seconds. If you want less traffic to occur between the CDN and your storage account, specify a longer TTL of a few days or even a few weeks.

Another best practice is to address CDN URLs using a version number. Since the CDN can create a separate cache of a blob based on the query string, appending a version number to the URL may make it easier to refresh contents in the CDN based on the version of your application. For example, main.css?v1 and main.css?v2 may return different versions of main.css cached on the CDN edge node. Do note that the query string support is opt-in and should be enabled through the management portal. Here’s a quick code snippet which appends the AssemblyVersion to the CDN URLs to version content based on the deployed application version:

@{
    var version = System.Reflection.Assembly.GetAssembly(
        typeof(WindowsAzureCdn.Web.Controllers.HomeController))
        .GetName().Version.ToString();
}
<!DOCTYPE html>
<html>
<head>
    <title>@ViewBag.Title</title>
    <link href="http://az172729.vo.msecnd.net/static/Content/Site.css?@version" rel="stylesheet" type="text/css" />
    <script src="http://az172729.vo.msecnd.net/static/Scripts/jquery-1.8.2.min.js?@version" type="text/javascript"></script>
</head>
<!-- more HTML -->
</html>

Using cloud services with the CDN

So far we’ve seen how you can offload static content to the Windows Azure CDN. We can upload blobs to a storage account and have them cached on different edge nodes around the globe. Did you know you can also use your cloud service as a source for files cached on the CDN? The only thing to do is, again, go to the Windows Azure Management Portal and ensure the CDN is enabled for the cloud service you want to use.

Serving static content through the CDN

The main difference with using a storage account as the source for the CDN is that the CDN will look into the /cdn/* folder on your cloud service to retrieve its contents. There are two options for doing this: either moving static content to the /cdn folder, or using IIS URL rewriting to “fake” a /cdn folder.

When using ASP.NET MVC’s bundling features, we’ll have to modify the bundle configuration in BundleConfig.cs. First, we’ll have to set bundles.UseCdn to true. Next, we’ll have to provide the URL to the CDN version of our bundles. Here’s a snippet which does just that for the Content/css bundle. We’re still working with a version number to make sure we can update the CDN contents for every deployment of our application.

var version = System.Reflection.Assembly.GetAssembly(typeof(BundleConfig)).GetName().Version.ToString();
var cdnUrl = "http://az170459.vo.msecnd.net/{0}?" + version;

bundles.UseCdn = true;
bundles.Add(new StyleBundle("~/Content/css", string.Format(cdnUrl, "Content/css")).Include("~/Content/site.css"));

Note that this time, the CDN URL does not include any reference to a blob container.

Whether you are using bundling or not, the trick to actually benefiting from the CDN is to request URLs straight from the CDN instead of from your server.

Exposing static content to the CDN with IIS URL rewriting

The Windows Azure CDN only looks at the /cdn folder as a source of files to cache. This means that if you simply copy your static content into the /cdn folder, you’re finished. Your web application and the CDN will play happily together. But this means the static content really has to be static. In the previous example of using ASP.NET MVC bundling, our static “bundles” aren’t really static…

An alternative to copying static content to a /cdn folder explicitly is to use IIS URL rewriting. IIS URL rewriting is enabled on Windows Azure by default and can be configured to translate a /cdn URL to a / URL. For example, if the CDN requests the /cdn/Content/css bundle, IIS URL rewriting will simply serve the /Content/css bundle leaving you with no additional work.

To configure IIS URL rewriting, add a <rewrite> section under the <system.webServer> section in Web.config:

<system.webServer>
  <!-- More settings -->

  <rewrite>
    <rules>
      <rule name="RewriteIncomingCdnRequest" stopProcessing="true">
        <match url="^cdn/(.*)$" />
        <action type="Rewrite" url="{R:1}" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

As a side note, you can also configure an outbound rule in IIS URL rewriting to automatically modify your HTML into using the Windows Azure CDN. Do know that this option is only supported when not using dynamic content compression and adds additional workload to your web server due to having to parse and modify your outgoing HTML.
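
As a sketch of what such an outbound rule could look like (hypothetical values: the CDN hostname is the one from the bundling example above, and the tag filter and precondition should be tuned to your pages):

<rewrite>
  <outboundRules>
    <rule name="RewriteStaticUrlsToCdn" preCondition="IsHtml">
      <!-- rewrite href/src attributes pointing at local static content to the CDN -->
      <match filterByTags="A, Link, Script, Img" pattern="^/(Content|Scripts)/(.*)$" />
      <action type="Rewrite" value="http://az170459.vo.msecnd.net/{R:1}/{R:2}" />
    </rule>
    <preConditions>
      <preCondition name="IsHtml">
        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
      </preCondition>
    </preConditions>
  </outboundRules>
</rewrite>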

Serving dynamic content through the CDN

Some dynamic content is static in a sense. For example, generating an image on the server or generating a PDF report based on the same inputs. Why would you generate those files over and over again? This kind of content is a perfect candidate to cache on the CDN as well!

Imagine you have an ASP.NET MVC action method which generates an image based on a given string. For every different string the output will be different; however, if someone uses the same input string, the generated image will be exactly the same.

As an example, we’ll be using this action method in a view to display the page title as an image. Here’s the view’s Razor code:

@{
    ViewBag.Title = "Home Page";
}

<h2><img src="/Home/GenerateImage/@ViewBag.Message" alt="@ViewBag.Message" /></h2>
<p>
    To learn more about ASP.NET MVC visit <a href="http://asp.net/mvc" title="ASP.NET MVC Website">http://asp.net/mvc</a>.
</p>

In the previous section, we’ve seen how an IIS rewrite rule can map all incoming requests from the CDN. The same rule can be applied here: if the CDN requests /cdn/Home/GenerateImage/Welcome, IIS will rewrite this to /Home/GenerateImage/Welcome and render the image once and cache it on the CDN from then on.

As mentioned earlier, a best practice is to specify the Cache-Control HTTP header. This can be done in our action method by using the [OutputCache] attribute, specifying the time-to-live in seconds:

[OutputCache(VaryByParam = "*", Duration = 3600, Location = OutputCacheLocation.Downstream)]
public ActionResult GenerateImage(string id)
{
    // ... generate image ...

    return File(image, "image/png");
}

We would now only have to generate this image once for every different string requested. The Windows Azure CDN will take care of all intermediate caching.

Conclusion

The Windows Azure CDN is one of the building blocks to create fault-tolerant, reliable and fast applications running on Windows Azure. By caching static content on the CDN, the web server has more resources available to process other requests. Next to that, users will experience faster loading of your applications because content is delivered from a server closer to their location.

Enjoy!

An autoscaling build farm using TeamCity and Windows Azure

Cloud computing is often referred to as a cost saver due to its billing models. If we can move seasonal workloads to the cloud, cost reduction will follow. No matter if it’s truly seasonal (e.g. a temporarily high workload around the holidays) or “daily seasonal”, where workloads differ depending on the time of day, these workloads have “cloud” written all over them.

A workload that may be seasonal is the workload done by build servers. Take TeamCity for example. A TeamCity server instruments a pool of build agents that are either idle or compiling source code into binaries. Depending on how your team is structured and when people work, there is a big chance that pool of build agents is doing nothing for several hours every day, except incurring cost. What if we could move the build agents to a platform like Windows Azure and have them autoscale, depending on the actual load on the build farm?

Creating a build agent virtual machine

The first step in setting this brilliant scheme in motion is to set up a build agent virtual machine. We can select any virtual machine image we want for our build agent, even upload our own VHDs if needed. I'm selecting a Windows Server 2012 image here, but if you need a different OS for your build agent you can select that instead.

During the creation of this build agent, there is nothing special we should do. We can select a small/medium/large/extra large instance, give ourselves an administrator password and so on. The only important step here is that we set up an endpoint for the TeamCity build agent, listening on TCP port 9090.

Open load balancer endpoint

Once the machine is started, we will have to install all required prerequisites for our build agent. We can connect using remote desktop (or SSH if it’s a Linux machine). On the machine I have here, I installed all .NET framework versions to ensure I'm able to build .NET projects. On a build machine for Java, we would install the correct runtimes and JDKs for our projects. Anything, really, as long as it is needed for the sources we’ll be building. On Windows, I typically use Web Platform Installer and Chocolatey to get this done in as automated a fashion as possible.

Web Platform Installer in action

Installing TeamCity build agent

In order for our build agent to communicate with the TeamCity server, we have to install the build agent. We can do this by navigating to our TeamCity server from within the virtual machine and using the Install Build Agents link on the Agents page.

Installing build agent

On a Windows server, we can use the Windows Installer but we can also use Java Web Start or even simply extract a ZIP file. This last option can be useful on a Linux machine, for example.

Installing the build agent is pretty much a next, next, finish operation. The only important thing is that we run the agent as a Windows service (or have it automatically start at boot time on other operating systems). We also want to specify the URL to our TeamCity server as well as the port on which the build agent will listen for incoming data from TeamCity. Note that this port should be the one opened in the load balancer earlier, in the case of this machine port 9090.

Specify port and server

Before starting the build agent, make sure the local firewall allows incoming connections. Through the Windows firewall, allow incoming connections for port 9090 (and while we’re at it, for a range of ports so we can easily clone this machine and not care about the firewall anymore).

Windows Firewall configuration

If we now start the build agent service, it should connect to our TeamCity server. Under the agents tab, we should be seeing a new unauthorized agent popping up. If that works, we’re good to go with our build agent farm.

A new unauthorized build agent shows up

Don’t shut down the machine just yet, we still need to prepare it for creating a build agent image.

Creating a build agent image

While still connected through remote desktop, open a command prompt and run the sysprep /generalize command from the c:\windows\system32\sysprep folder. On Linux, there’s a similar option in the Windows Azure agent. Sysprep ensures the machine can be cloned into a new machine, getting its own settings like a hostname and IP address. A non-sysprepped machine can thus never be cloned.
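
For reference, a typical invocation from an elevated command prompt looks like this (/oobe and /shutdown are the usual companions of /generalize; you can also shut the machine down from the portal afterwards):

cd c:\windows\system32\sysprep
sysprep.exe /generalize /oobe /shutdown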

Sysprep our machine

Once finished, our RDP connection should be gone and our machine can be shut down in the Windows Azure Management Portal. In fact, it must be shut down. Once that is done, we can use the Capture button and transform our virtual machine into a template we can create new virtual machines from.

Capturing a virtual machine image

The capturing process will take a couple of minutes and results in there being no build server virtual machine left in the Windows Azure Management Portal. Is that bad? No: we can now start cloning the machine and create multiple copies, all having the exact same configuration and components installed.

Setting up multiple build agent machines

The next thing we want to have is multiple build agent machines. From the Windows Azure Management Portal, we can create them, not based on a platform image but using the image created during the previous step.

Using a virtual machine image as the base

The virtual machine configuration can be whatever we want. Do we want extra small instances or extra large? It’s all up to us and our credit card. On the next page, we have to specify some more important details. First, we have to select or create a cloud service. This will be the DNS host name under which all of our build agents are going to live. We also have to specify the availability set, in essence a setting telling Windows Azure never to unplug power or networking for all machines in this set at the same time.

Selecting cloud service and availability set

We will be creating a couple of machines, so it’s important to get the next page right. Since all our machines will share the same hostname and IP address to the outside world, our build agents have to listen on different TCP ports. Make sure that the first agent maps port 9090 to port 9090, the second one 9091 to 9091 and so on. Not doing this will mess with your mind afterwards when troubleshooting.

Endpoint configuration

Finish the process, let Windows Azure start the machine and create a new one. Important: same cloud service, same availability set and correct endpoint mappings!

Configuring the build agents

Once we have several machines running, we have to connect to them using remote desktop again. This can be done through the portal. Once in, locate the build agent configuration file (c:\BuildAgent\conf\buildAgent.properties in a default installation) and set the port number on which it listens to the one that was mapped as an external endpoint. Again, agent one will listen on port 9090, agent number two on 9091 and so on. We can also set a better name for the build agent, in my case I’ve chosen to go with “agent2”. Very inspirational and all.
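
For example, the relevant lines in buildAgent.properties for the second agent would look something like this (the server URL is a placeholder):

serverUrl=http://<your-teamcity-server>/
name=agent2
ownPort=9091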

Setting build agent name and port

Save and restart the build agent (or the machine). The TeamCity server should now start listing all build agents.

Windows Azure build agents for TeamCity

Make sure to authorize them all, as we want to be sure they can connect to TeamCity server later on. Once that has been done and all build agents are listed here, we can shut them all down except for one. We want to have something running, right?

Configuring autoscaling

A good question at this point: why did we have to move all these machines under the same cloud service? The reason is simple: we want to autoscale our farm, and this can only be done within one cloud service. From the cloud service, click the Scale tab and start configuring.

Autoscaling configuration

For this post, I’ve chosen the following values:

  • Autoscale based on CPU
  • Have a minimum of one instance, and a maximum of, well… all of them.
  • The target CPU range is 0 to 10. For a production environment this will typically be between 60 and 80, or 40 and 80, depending on the chosen machine size for the build agents. Windows Azure will trigger an autoscaling operation when we go outside this range; a small, low range triggers a scale operation much faster, while bigger numbers mean a slower response.
  • The “scale up by” and “scale down by” values, as well as the number of minutes to wait after the previous operation, are up to you. If you want a build agent to remain online for 30 minutes after it has been started, even if CPU usage drops, set it to 30 minutes. If 2 machines should be started at once, increase that number as well.

Scaling will happen based on the average CPU percentage of all running machines. If our builds run at 100% CPU all the time on our agent we can set the thresholds a bit higher. If builds are only taking 20% we might want to run multiple agents on one machine or decrease the scaling thresholds a bit. Want to measure CPU utilization for a given build? Better read up on the TeamCity Performance Monitor then.

Putting it to the test

Putting it to the test shouldn’t be that hard. Start some builds and make sure the agent gets loaded with builds. Once we hit the CPU threshold, Windows Azure will launch a virtual machine that was previously turned off.

Windows Azure autoscaling in action

Once it has booted, we will also see it surface on the TeamCity server.

Build agents on TeamCity

Once the load goes down again, Windows Azure will shut down machines that are below the thresholds and make sure they no longer incur costs. Which is pretty impressive!

If a development team triggers a massive amount of builds during the day, Windows Azure will pretty soon scale out to a higher number of virtual build agents. And at night, when only some builds are being triggered, it will scale back to fewer instances. If, for example, we manage to run machines for only 12 hours a day instead of 24, our build farm’s price goes down by half.

TeamCity’s architecture, as well as the way Windows Azure works, makes this cost reduction possible. It’s also fun to set up, and it gives us a wide range of options (how about a Windows Server 2012 farm, a Linux farm, and so on).

Enjoy!

Windows Azure Traffic Manager Explained

With yesterday’s announcement on Windows Azure Traffic Manager surfacing in the management portal (as a preview), I thought it was a good moment to recap this more than 2 year old service. Windows Azure Traffic Manager allows you to control the distribution of network traffic to your Cloud Services and VMs hosted within Windows Azure.

What is Traffic Manager?

The Windows Azure Traffic Manager provides several methods of distributing internet traffic among two or more cloud services or VMs, all accessible with the same URL, in one or more Windows Azure datacenters. At its core, it is basically a distributed DNS service that knows which Windows Azure services are sitting behind the traffic manager URL and distributes requests based on three possible profiles:

  • Failover: all traffic is mapped to one Windows Azure service, unless it fails. It then directs all traffic to the failover Windows Azure service.
  • Performance: all traffic is mapped to the Windows Azure service “closest” (in routing terms) to the client requesting it. This will direct users from the US to one of the US datacenters, European users will probably end up in one of the European datacenters and Asian users, well, somewhere in the Asian datacenters.
  • Round-robin: just distribute requests between the various Windows Azure services defined in the Traffic Manager policy.

Now I’ve started this post with the slightly bitchy tone that “this service has been around for over two years”. And that’s true! It has been in the old management portal for ages and hasn’t left the preview stage since. However, don’t think nothing happened with this service: next to using Traffic Manager for cloud services, we can now also use it to distribute traffic across VMs and across datacenters. What about a SharePoint farm deployed in multiple datacenters, using Traffic Manager to distribute traffic geographically?

Why should I care?

We’ve seen it before: clouds being down. Amazon EC2, Google, Windows Azure, … They all have had their glitches. With any cloud going down, whether completely or partially, it seems a lot of websites “in the cloud” are down at that time. Most comments you read on Twitter at those times are along the lines of “outrageous!” and “don’t go cloud!”. While I understand these comments, I think they are wrong. These “clouds” can fail. They are even designed to fail, and often provide components and services that allow you to cope with these failures. You just have to expect failure at some point in time and build it into your application.

Yes, I just told you to expect failure when going to the cloud. But don’t consider a failing cloud a bad cloud or a cloud that is down. For your application, a “failing” cloud or server or database should be nothing more than a scaling operation. The only thing is: it’s scaling down to zero. If you design your application so that it can scale out, you should also plan for scaling “in”, eventually to zero. Use different availability zones on Amazon, and if you’re a Windows Azure user you are protected by fault domains within the datacenter, and Traffic Manager can save your behind cross-datacenter. Use it!

My thoughts on Traffic Manager

Let’s come back to that “2 year old service”. Don’t let that, or the fact that it is still a preview, hold you back from using Traffic Manager. Our MyGet web application has been making use of it since it was first introduced. While in the beginning we used it for performance reasons (routing US traffic to a US datacenter and EU traffic to an EU datacenter), we’ve changed strategy and are now using it as a failover to the North Europe datacenter, in which nothing is deployed. The screenshot below highlights a degradation (because there indeed is no deployment in North Europe currently).

MyGet Windows Azure Traffic Manager

But why failover to a datacenter in which no deployments are done? Well, because if the West Europe datacenter were to fail, we can simply spin up a new deployment in North Europe. Yes, there will be some downtime, but the last thing we want in such a situation is downtime from DNS propagation taking too long. Now we simply map www.myget.org to our Traffic Manager domain, and whenever we need to switch, Traffic Manager takes care of the DNS part.
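
In DNS terms, that mapping is nothing more than a CNAME record pointing at the Traffic Manager domain (the Traffic Manager domain shown here is hypothetical):

www.myget.org.    IN    CNAME    myget.trafficmanager.net.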

In general, Traffic Manager has probably been the most stable service in the Windows Azure platform. I haven’t experienced any issues so far with Traffic Manager over more than two years, preview mode or not.

Enjoy!

Update: Alexandre Brisebois, a colleague MVP, has some additional insights to share.

Throttling ASP.NET Web API calls

Many APIs out there, such as GitHub’s API, have a concept called “rate limiting” or “throttling” in place. Rate limiting is used to prevent clients from issuing too many requests over a short amount of time to your API. For example, we can limit anonymous API clients to a maximum of 60 requests per hour, whereas we can allow more requests for authenticated clients. But how can we implement this?

Intercepting API calls to enforce throttling

Just like ASP.NET MVC, ASP.NET Web API allows us to write action filters. An action filter is an attribute that you can apply to a controller action, an entire controller and even to all controllers in a project. The attribute modifies the way in which the action is executed by intercepting calls to it. Sounds like a great approach, right?

Well… yes! Implementing throttling as an action filter would make sense, although in my opinion it has some disadvantages:

  • We have to implement it as an IAuthorizationFilter to make sure it hooks into the pipeline before most other action filters. This feels kind of dirty but it would do the trick as throttling is some sort of “authorization” to make a number of requests to the API.
  • It gets executed quite late in the overall ASP.NET Web API pipeline. While not a big problem, perhaps we want to skip executing certain portions of code whenever throttling occurs.

So while it makes sense to implement throttling as an action filter, I would prefer plugging it earlier in the pipeline. Luckily for us, ASP.NET Web API also provides the concept of message handlers. They accept an HTTP request and return an HTTP response and plug into the pipeline quite early. Here’s a sample throttling message handler:

public class ThrottlingHandler
    : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // helper that extracts the client IP address
        var identifier = request.GetClientIpAddress();

        long currentRequests = 1;
        long maxRequestsPerHour = 60;

        if (HttpContext.Current.Cache[string.Format("throttling_{0}", identifier)] != null)
        {
            currentRequests = (long)HttpContext.Current.Cache[string.Format("throttling_{0}", identifier)] + 1;
            HttpContext.Current.Cache[string.Format("throttling_{0}", identifier)] = currentRequests;
        }
        else
        {
            HttpContext.Current.Cache.Add(string.Format("throttling_{0}", identifier), currentRequests,
                null, Cache.NoAbsoluteExpiration, TimeSpan.FromHours(1),
                CacheItemPriority.Low, null);
        }

        Task<HttpResponseMessage> response = null;
        if (currentRequests > maxRequestsPerHour)
        {
            response = CreateResponse(request, HttpStatusCode.Conflict, "You are being throttled.");
        }
        else
        {
            response = base.SendAsync(request, cancellationToken);
        }

        return response;
    }

    protected Task<HttpResponseMessage> CreateResponse(HttpRequestMessage request, HttpStatusCode statusCode, string message)
    {
        var tsc = new TaskCompletionSource<HttpResponseMessage>();
        var response = request.CreateResponse(statusCode);
        response.ReasonPhrase = message;
        response.Content = new StringContent(message);
        tsc.SetResult(response);
        return tsc.Task;
    }
}

We have to register it as well, which we can do when our application starts:

config.MessageHandlers.Add(new ThrottlingHandler());

The throttling handler above isn’t ideal. It’s not very extensible nor does it allow scaling out on a web farm. And it’s bound to being hosted in ASP.NET on IIS. It’s bad! Since there’s already a great project called WebApiContrib, I decided to contribute a better throttling handler to it.

Using the WebApiContrib ThrottlingHandler

The easiest way of using the ThrottlingHandler is by registering it using simple parameters like the following, which throttles every user at 60 requests per hour:

config.MessageHandlers.Add(new ThrottlingHandler(
    new InMemoryThrottleStore(),
    id => 60,
    TimeSpan.FromHours(1)));

The IThrottleStore interface stores the identifier plus the current number of requests. There’s only an in-memory store available, but you can easily extend it to store this in a distributed cache or a database.

What’s interesting is we can change how our ThrottlingHandler behaves quite easily. Let’s give a specific IP address a better rate limit:

config.MessageHandlers.Add(new ThrottlingHandler(
    new InMemoryThrottleStore(),
    id =>
    {
        if (id == "10.0.0.1")
        {
            return 5000;
        }
        return 60;
    },
    TimeSpan.FromHours(1)));

Wait… Are you telling me this is all IP based? Well yes, by default. But overriding the ThrottlingHandler allows you to do funky things! Here’s a wireframe:

public class MyThrottlingHandler : ThrottlingHandler
{
    // ...

    protected override string GetUserIdentifier(HttpRequestMessage request)
    {
        // your user id generation logic here
    }
}

By implementing the GetUserIdentifier method, we can for example return an IP address for unauthenticated users and their username for authenticated users. We can then decide on the throttling quota based on username.
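
Here’s a minimal sketch of such an override (assuming the constructor parameters shown earlier; the fallback to the client IP address mirrors the default behaviour):

public class MyThrottlingHandler : ThrottlingHandler
{
    public MyThrottlingHandler()
        : base(new InMemoryThrottleStore(), id => 60, TimeSpan.FromHours(1))
    {
    }

    protected override string GetUserIdentifier(HttpRequestMessage request)
    {
        // throttle authenticated users per username...
        var principal = System.Threading.Thread.CurrentPrincipal;
        if (principal != null && principal.Identity != null && principal.Identity.IsAuthenticated)
        {
            return principal.Identity.Name;
        }

        // ...and anonymous users per IP address
        return request.GetClientIpAddress();
    }
}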

Once in use, the ThrottlingHandler injects two HTTP headers into every response, informing the client about the rate limit:

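For illustration, a response would look along these lines (the header names here are illustrative; check the handler’s source for the exact ones):

HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42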

Enjoy! And do check out WebApiContrib; it contains almost all the extensions to ASP.NET Web API you will ever need!

Storing user uploads in Windows Azure blob storage

On one of the mailing lists I follow, an interesting question came up: “We want to write a VSTO plugin for Outlook which copies attachments to blob storage. What’s the best way to do this? What about security?”. Shortly thereafter, an answer came around: “That can be done directly from the client. And storage credentials can be encrypted for use in your VSTO plugin.”

While that’s certainly a solution to the problem, it’s not the best. Let’s try and answer…

What’s the best way to upload data to blob storage directly from the client?

The first solution that comes to mind is implementing the following flow: the client authenticates and uploads data to your service which then stores the upload on blob storage.

Upload data to blob storage

While that is in fact a valid solution, think about the following: you are creating an expensive layer in your application that just sits there copying data from one network connection to another. If you have to scale this solution, you will have to scale out the service layer in between. If you want redundancy, you need at least two machines for doing this simple copy operation… A better approach would be one where the client authenticates with your service and then uploads the data directly to blob storage.

Upload data to blob storage using shared access signature

This approach allows you to have a “cheap” service layer: it can even run on the free version of Windows Azure Web Sites if you have a low traffic volume. You don’t have to scale out the service layer once your number of clients grows (at least, not for the uploading scenario). But how would you handle uploading to blob storage from a security point of view…

What about security? Shared access signatures!

The first suggested answer on the mailing list was this: “(…) storage credentials can be encrypted for use in your VSTO plugin.” That’s true, but you only have 2 access keys to storage. It’s like giving the master key of your house to someone you don’t know. It’s encrypted, sure, but still, the master key is at the client and that’s a potential risk. The solution? Using a shared access signature!

Shared access signatures (SAS) allow us to separate the code that signs a request from the code that executes it. It basically is a set of query string parameters attached to a blob (or container!) URL that serves as the authentication ticket to blob storage. Of course, these parameters are signed using the real storage access key, so that no-one can change this signature without knowing the master key. And that’s the scenario we want to support…

On the service side, the place where you’ll be authenticating your user, you can create a Web API method (or ASMX or WCF or whatever you feel like) similar to this one:

public class UploadController : ApiController
{
    [Authorize]
    public string Put(string fileName)
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;
        var blobClient = account.CreateCloudBlobClient();
        var blobContainer = blobClient.GetContainerReference("uploads");
        blobContainer.CreateIfNotExists();

        var blob = blobContainer.GetBlockBlobReference("customer1-" + fileName);

        var uriBuilder = new UriBuilder(blob.Uri);
        uriBuilder.Query = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Write,
            SharedAccessStartTime = DateTime.UtcNow,
            SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5)
        }).Substring(1);

        return uriBuilder.ToString();
    }
}

This method does a couple of things:

  • Authenticate the client using your authentication mechanism
  • Create a blob reference (not the actual blob, just a URL)
  • Sign the blob URL with write access, valid from now until now + 5 minutes. That should give the client 5 minutes to start the upload.

On the client side, in our VSTO plugin, the only thing left to do is call this method with a filename. The web service will create a shared access signature for a non-existing blob and return it to the client. The VSTO plugin can then use this signed blob URL to perform the upload:

Uri url = new Uri("http://...../uploads/customer1-test.txt?sv=2012-02-12&st=2012-12-18T08%3A11%3A57Z&se=2012-12-18T08%3A16%3A57Z&sr=b&sp=w&sig=Rb5sHlwRAJp7mELGBiog%2F1t0qYcdA9glaJGryFocj88%3D");

var blob = new CloudBlockBlob(url);
blob.Properties.ContentType = "text/plain";

using (var data = new MemoryStream(
    Encoding.UTF8.GetBytes("Hello, world!")))
{
    blob.UploadFromStream(data);
}

Easy, secure and scalable. Enjoy!

Sending e-mail from Windows Azure

Note: this blog post used to be an article for the Windows Azure Roadtrip website. Since that one no longer exists, I decided to post the articles on my blog as well. Find the source code for this post here: 04 SendingEmailsFromTheCloud.zip (922.27 kb).

When a user subscribes, you send him a thank-you e-mail. When his account expires, you send him a warning message containing a link to purchase a new subscription. When he places an order, you send him an order confirmation. I think you get the picture: a fairly common scenario in almost any application is sending out e-mails.

Now, why would I spend a blog post on sending out an e-mail? Well, for two reasons. First, Windows Azure doesn’t have a built-in mail server. No reason to panic! I’ll explain why and how to overcome this in a minute. Second, I want to demonstrate a technique that will make your applications a lot more scalable and resilient to errors.

E-mail services for Windows Azure

Windows Azure doesn’t have a built-in mail server. And for good reasons: if you deploy your application to Windows Azure, it will be hosted on an IP address which previously belonged to someone else. It’s a shared platform, after all. Now, what if some obscure person used Windows Azure to send out a number of spam messages? Chances are your newly-acquired IP address has already been blacklisted, and any e-mail you send from it ends up in people’s spam filters.

All that is fine, but of course, you still want to send out e-mails. If you have your own SMTP server, you can simply configure your .NET application hosted on Windows Azure to make use of it. There are a number of so-called SMTP relay services out there as well; even Belgian hosters like Combell, Hostbasket or OVH offer this service. Microsoft has also partnered with SendGrid to provide an officially-supported service for sending out e-mails. Windows Azure customers receive a special offer of 25,000 free e-mails per month from them, which makes it a great service to get started with sending e-mails from your applications. I’ll be using SendGrid in this blog post.

Asynchronous operations

I said earlier that I wanted to show you two things: sending e-mails and building scalable and fault-resilient applications. This can be done using asynchronous operations. No, I don’t mean AJAX. What I mean is that you should create loosely-coupled applications.

Imagine that I was to send out an e-mail whenever a user registers. If the mail server is not available for that millisecond when I want to use it, the send fails and I might have to show an error message to my user (or even worse: a YSOD). Why would that happen? Because my application logic expects that it can communicate with a mail server in a synchronous manner.

The front-end communicating with the mail server synchronously

Now let’s remove that expectation. If we introduce a queue in between both services, the front-end can keep accepting registrations even when the mail server is down. And when it’s back up, the queue will be processed and e-mails will be sent out. Also, if you experience high loads, simply scale out the front-end and add more servers there. More e-mail messages will end up in the queue, but they are guaranteed to be processed in the future at the pace of the mail server. With synchronous communication, the mail service would probably experience high loads or even go down when a large number of front-end servers is added.

A queue decoupling the front-end from the mail server

Show me the code!

Let’s combine the two approaches described earlier in this post: sending out e-mails over an asynchronous service. Before we start, make sure you have a SendGrid account (free!). Next, familiarise yourself with Windows Azure storage queues using this simple tutorial.

In a fresh Windows Azure web role, I’ve created a quick-and-dirty user registration form:

A quick-and-dirty user registration form

Nothing fancy, just a form that takes a post to an ASP.NET MVC action method. This action method stores the user in a database and adds a message to a queue named emailconfirmation. Here’s the code for this action method:

[HttpPost, ActionName("Register")]
public ActionResult Register_Post(RegistrationModel model)
{
    if (ModelState.IsValid)
    {
        // ... store the user in the database ...

        // serialize the model
        var serializer = new JavaScriptSerializer();
        var modelAsString = serializer.Serialize(model);

        // emailconfirmation queue
        var account = CloudStorageAccount.FromConfigurationSetting("StorageConnection");
        var queueClient = account.CreateCloudQueueClient();
        var queue = queueClient.GetQueueReference("emailconfirmation");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(modelAsString));

        return RedirectToAction("Thanks");
    }

    return View(model);
}

As you can see, it’s not difficult to work with queues. You just enter some data in a message and push it onto the queue. In the code above, I’ve serialized the registration model containing my newly-created user’s name and e-mail to the JSON format (using JavaScriptSerializer). A message can contain binary or textual data: as long as it’s less than 64 KB in data size, the message can be added to a queue.

Being cheap with Web Workers

When boning up on Windows Azure, you’ve probably read about so-called Worker Roles, virtual machines that are able to run your back-end code. The problem I see with Worker Roles is that they are expensive to start with. If your application has 100 users and your back-end load is low, why would you reserve an entire server to run that back-end code? The cloud and Windows Azure are all about scalability and using a “Web Worker” will be much more cost-efficient to start with - until you have a large user base, that is.

A Worker Role consists of a class that inherits the RoleEntryPoint class. It looks something along these lines:

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // ...
        return base.OnStart();
    }

    public override void Run()
    {
        while (true)
        {
            // ...
        }
    }
}

You can run this same code in a Web Role too! And that’s what I mean by a Web Worker: by simply adding this class which inherits RoleEntryPoint to your Web Role, it will act as both a Web and Worker role in one machine.

Call me cheap, but I think this is a nice hidden gem. The best part about this is that whenever your application’s load requires a separate virtual machine running the worker role code, you can simply drag and drop this file from the Web Role to the Worker Role and scale out your application as it grows.

Did you send that e-mail already?

Now that we have a pending e-mail message in our queue and we know we can reduce costs using a web worker, let’s get our e-mail across the wire. First of all, using SendGrid as our external e-mail service offers us a giant development speed advantage, since they are distributing their API client as a NuGet package. In Visual Studio, right-click your web role project and click the “Library Package Manager” menu. In the dialog (shown below), search for Sendgrid and install the package found. This will take a couple of seconds: it will download the SendGrid API client and will add an assembly reference to your project.

Installing the SendGrid NuGet package

All that’s left to do is write the code that reads out the messages from the queue and sends the e-mails using SendGrid. Here’s the queue reading:

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
        {
            string value = "";
            if (RoleEnvironment.IsAvailable)
            {
                value = RoleEnvironment.GetConfigurationSettingValue(configName);
            }
            else
            {
                value = ConfigurationManager.AppSettings[configName];
            }
            configSetter(value);
        });

        return base.OnStart();
    }

    public override void Run()
    {
        // emailconfirmation queue
        var account = CloudStorageAccount.FromConfigurationSetting("StorageConnection");
        var queueClient = account.CreateCloudQueueClient();
        var queue = queueClient.GetQueueReference("emailconfirmation");
        queue.CreateIfNotExist();

        while (true)
        {
            var message = queue.GetMessage();
            if (message != null)
            {
                // ... process the message ...

                // mark the message as processed
                queue.DeleteMessage(message);
            }
            else
            {
                Thread.Sleep(TimeSpan.FromSeconds(30));
            }
        }
    }
}

As you can see, reading from the queue is very straightforward. You use a storage account, get a queue reference from it and then, in an infinite loop, you fetch a message from the queue. If a message is present, process it. If not, sleep for 30 seconds. On a side note: why wait 30 seconds for every poll? Well, Windows Azure will bill you per 100,000 requests to your storage account. It’s a small amount, around 0.01 cent, but it may add up quickly if this code is polling your queue continuously on an 8 core machine… Bottom line: on any cloud platform, try to architect for cost as well.
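
To put rough numbers on that (illustrative arithmetic only): polling once every 30 seconds amounts to fewer than 90,000 requests a month, while a tight loop doing even 100 polls per second per core on an 8-core machine produces over two billion requests a month, more than 20,000 times as many billable transactions.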

Now that we have our message, we can deserialize it and create a new e-mail that can be sent out using SendGrid:

// deserialize the model
var serializer = new JavaScriptSerializer();
var model = serializer.Deserialize<RegistrationModel>(message.AsString);

// create a new email object using SendGrid
var email = SendGrid.GenerateInstance();
email.From = new MailAddress("maarten@example.com", "Maarten");
email.AddTo(model.Email);
email.Subject = "Welcome to Maarten's Awesome Service!";
email.Html = string.Format(
    "<html><p>Hello {0},</p><p>Welcome to Maarten's Awesome Service!</p>" +
    "<p>Best regards, <br />Maarten</p></html>", model.Name);

var transportInstance = REST.GetInstance(new NetworkCredential("username", "password"));
transportInstance.Deliver(email);

// mark the message as processed
queue.DeleteMessage(message);

Sending e-mail using SendGrid boils down to getting a new e-mail message instance from the SendGrid API client, passing it the e-mail details (from, to, body, etc.) and handing it your SendGrid username and password upon delivery.

One last thing: notice we’re only deleting the message from the queue after processing has succeeded. This ensures the message is actually processed. If for some reason the worker role crashes during processing, the message becomes visible again on the queue and will be picked up by another worker role reading this queue. That way, messages are never lost and are guaranteed to be processed at least once.
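One caveat of at-least-once processing: a message that consistently crashes your worker (a “poison message”) will keep reappearing forever. The storage client exposes how many times a message has been dequeued, so a simple guard is possible. A minimal sketch; the threshold of 5 is arbitrary:

var message = queue.GetMessage();
if (message != null)
{
    if (message.DequeueCount > 5)
    {
        // this message has failed too many times before:
        // log it somewhere for inspection and get it out of the way
        queue.DeleteMessage(message);
    }
    else
    {
        // ... normal processing ...
        queue.DeleteMessage(message);
    }
}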

What PartitionKey and RowKey are for in Windows Azure Table Storage

For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain…

Windows Azure storage partitions

All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there’s much more to tell about it, the reason it scales is its partitioning logic. Whenever you store something in Windows Azure storage, it is located on some partition in the system. Partitions are used for scale-out in the system. Imagine there are only 3 physical machines used for storing data in Windows Azure storage:

Windows Azure Storage partition

Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine:

Windows Azure storage partition

By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available.

Partitions are different for every storage mechanism:

  • In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system.
  • In queues, every queue is a separate partition.
  • In tables, it’s different: you decide how data is co-located in the system.

PartitionKey in Table Storage

In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of a single storage machine for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput, as lots of entities sit in the same partition.

Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point.
Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in…

RowKey in Table Storage

A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition.

So in essence, a RowKey is just the identifier of an entity within a partition.
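To make this concrete, here’s a minimal sketch of a table entity using the storage client’s TableServiceEntity base class. The OrderEntity name and its properties are made up for this example:

public class OrderEntity : TableServiceEntity
{
    public OrderEntity() { }

    public OrderEntity(string customerId, string orderId)
    {
        PartitionKey = customerId; // co-locates all orders for one customer
        RowKey = orderId;          // must be unique within that partition
    }

    public DateTime OrderDate { get; set; }
    public double Total { get; set; }
}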

PartitionKey and RowKey and performance

Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string as it does matter for performance.

The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition.

Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query.

Less fast: querying on only RowKey. Doing this gives table storage no pointer to which partition to search in, resulting in a query that possibly spans multiple partitions, and possibly multiple storage nodes as well. Within a partition, searching on RowKey is still pretty fast as it’s a unique index.

Slow: searching on other properties (again, this spans multiple partitions, and there’s no index on these properties).
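In code, the difference between these query shapes is easy to see. Here’s a sketch using the 1.x storage client against development storage, assuming the hypothetical OrderEntity from the sketch above, stored in a table named "Order":

var account = CloudStorageAccount.DevelopmentStorageAccount;
var context = account.CreateCloudTableClient().GetDataServiceContext();

// fastest: point query on PartitionKey + RowKey
var order = (from o in context.CreateQuery<OrderEntity>("Order")
             where o.PartitionKey == "customer-42" && o.RowKey == "order-1001"
             select o).AsTableServiceQuery().FirstOrDefault();

// fast: single-partition scan on PartitionKey only
var customerOrders = (from o in context.CreateQuery<OrderEntity>("Order")
                      where o.PartitionKey == "customer-42"
                      select o).AsTableServiceQuery().ToList();

// slow: RowKey (or any other property) without PartitionKey - may scan all partitions
var byRowKey = (from o in context.CreateQuery<OrderEntity>("Order")
                where o.RowKey == "order-1001"
                select o).AsTableServiceQuery().ToList();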

Note that Windows Azure storage may decide to group partitions in so-called "Range partitions" - see http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx.

In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets.

Deciding on PartitionKey and RowKey

Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)?

Let’s use three tables: Customer, Order and Orderline.

An ideal setup may be this one, depending on how you want to query everything:

  • Customer (PK: sales region, RK: customer id) – enables fast searches on region and on customer id
  • Order (PK: customer id, RK: order id) – allows me to quickly fetch all orders for a specific customer (as they are co-located in one partition), while still allowing fast querying on a specific order id
  • Orderline (PK: order id, RK: order line id) – allows fast querying on both order id and order line id

Of course, depending on the system you are building, the following may be a better setup:

  • Customer (PK: customer id, RK: display name) – enables fast searches on customer id and display name
  • Order (PK: customer id, RK: order id) – allows me to quickly fetch all orders for a specific customer (as they are co-located in one partition), while still allowing fast querying on a specific order id
  • Orderline (PK: order id, RK: item id) – allows fast querying on both order id and the item bought, provided that one order can only contain one order line for a specific item (PK + RK must be unique)

You see? Choose them wisely, depending on your queries. And maybe an important sidenote: don’t be afraid of denormalizing your data and storing data twice in a different format, supporting more query variations.
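As a sketch of that denormalization idea, reusing the hypothetical OrderEntity from earlier: write the same order twice, into two tables with different partition keys, so that both query paths stay fast. Table names, key values and the account are made up for this example:

var account = CloudStorageAccount.DevelopmentStorageAccount;
var tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist("OrdersByCustomer");
tableClient.CreateTableIfNotExist("OrdersByRegion");

var context = tableClient.GetDataServiceContext();

// the same order is written twice, with a different PartitionKey each time:
// once partitioned per customer, once per sales region
context.AddObject("OrdersByCustomer", new OrderEntity("customer-42", "order-1001") { Total = 19.95 });
context.AddObject("OrdersByRegion", new OrderEntity("region-emea", "order-1001") { Total = 19.95 });

context.SaveChangesWithRetries();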

There’s one additional “index”

That’s right! People have been asking Microsoft for a secondary index. And it’s already there… The table name itself! Take our customer – order – orderline sample again…

Having a Customer table containing all customers may be interesting to search within that data. But having one Orders table containing every order for every customer may not be the ideal solution. Maybe you want to create an orders table per customer? Doing that, you can easily query on the customer id (it’s in the table name), and within that customer’s order table, you can have more detail in PK and RK.

And there's one more: your account name. Split data over multiple storage accounts and you have yet another "partition".

Conclusion

In conclusion? Choose PartitionKey and RowKey wisely. The more meaningful they are to your application or business domain, the faster your queries will be and the more efficiently table storage will work in the long run.

Hands-on Windows Azure Services for Windows

A couple of weeks ago, Microsoft announced their Windows Azure Services for Windows Server. If you’ve ever heard about the Windows Azure Appliance (which is vaporware imho :-)), you’ll be interested to see that the Windows Azure Services for Windows Server in fact bring the Windows Azure services to your datacenter. It’s still a Technical Preview, but I took the plunge and installed this on a bunch of virtual machines I had lying around. In this post, I’ll share some impressions, ideas, pains and speculations.

Why would you run Windows Azure Services in your own datacenter? Why not! You will make your developers happy because they have access to all services they are getting to know and getting to love. You’ll be able to provide self-service access to SQL Server, MySQL, shared hosting and virtual machines. You decide on the quota. And if you’re a server hugger like a lot of companies in Belgium: you can keep hugging your servers. I’ll elaborate more on the “why?” further in this blog post.

Note: Currently only SQL Server, MySQL, Web Sites and Virtual Machines are supported in Windows Azure Services for Windows Server. Not storage, not ACS, not Service Bus, not...

You can sign up for my “I read your blog” plan at http://cloud.balliauw.net and create your SQL Server databases on the fly! (I’ll keep this running for a couple of days; if it’s offline, you’re too late. Update: it’s down.)

My setup

Since I did not have enough capacity to run enough virtual machines (you need at least four!) on my machine, I decided to deploy the Windows Azure Services for Windows Server on a series of virtual machines in Windows Azure’s IaaS offering.

You will need servers for the following roles:

  • Controller node (the management portal your users will be using)
  • SQL Server (can be hosted on the controller node)
  • Storage server (can be on the controller node as well)

If you want to host Windows Azure Websites (shared hosting):

  • At least one load balancer node (will route HTTP(S) traffic to a frontend node)
  • At least one frontend node (will host web sites, more frontends = more websites / redundancy)
  • At least one publisher node (will serve FTP and Web Deploy)

If you want to host Virtual Machines:

  • A System Center 2012 SP1 CTP2 node (managing VM’s)
  • At least one Hyper-V server (running VM’s)

Being a true ITPro (forgot the <irony /> element there…), I decided I did not want to host those virtual machines on the public Internet. Instead, I created a Windows Azure Virtual Network. Knowing CIDR notation (<irony />), I quickly crafted the BalliauwCloud virtual network: 172.16.240.0/24.

So, a private network… Then again, I wanted to be able to access some of the resources hosted in my cloud from the Internet, so I decided to open up some ports in Windows Azure’s load balancer and firewall so that my users could use the SQL Server both internally (172.16.240.9) and externally (sql1.cloud.balliauw.net). Same with high-density shared hosting in the form of Windows Azure Web Sites, by the way.

Being a Visio pro (no <irony /> there!), here’s a schematic overview of what I set up:

Windows Azure Services for Windows Server - Virtual Network

Nice, huh? Even nicer is my to-be diagram, where I also link creating Hyper-V machines to this portal (not there yet…):

Virtual machines

My setup experience

I followed the detailed step-by-step installation guide and completed the installation as described. Not a great success! The Windows Azure Websites feature requires a file share and I forgot to open up a firewall port for it. The result? A failed setup. I restarted the setup and ended up with a 500 Internal Server Terror a couple of times. Help!

Being a Technical Preview product, there is no support for cleaning / restarting a failed setup. Luckily, someone hooked me up with the team at Microsoft who built this and thanks to Andrew (thanks, Andrew!), I was able to continue my setup.

If everything works out for your setup: enjoy! If not, here are some troubleshooting tips:

Keep an eye on the C:\inetpub\MgmtSvc-ConfigSite\trace.txt log file: it holds valuable information. So does the event log (Applications and Services Log > Microsoft > Windows > Antares).

If you’re also experiencing issues and want to retry installation, here are the steps to clean your installation:

  1. On the controller node: stop services:
    net stop w3svc
    net stop WebFarmService
    net stop ResourceMetering
    net stop QuotaEnforcement
  2. In IIS Manager (inetmgr), clean up the Hosting Administration REST API service. Under site MgmtSvc-WebSites:
    - Remove IIS application HostingAdministration (just the app, NOT the site itself)
    - Remove physical files: C:\inetpub\MgmtSvc-WebSites\HostingAdministration
  3. Drop databases, and logins by running the SQL script: C:\inetpub\MgmtSvc-ConfigSite\Drop-MgmtSvcDatabases.sql
  4. (Optional, but helped in my case) Repair permissions
    PowerShell.exe -c "Add-PSSnapin WebHostingSnapin ; Set-ReadAccessToAsymmetricKeys IIS_IUSRS"
  5. Clean up registry keys by deleting the three folders under the following registry key (NOT the key itself, just the child folders):
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\IIS Extensions\Web Hosting Framework

    Delete these folders: HostingAdmin, Metering, Security
  6. Restart IIS
    net start w3svc
  7. Re-run the installation via https://localhost:30101/

Configuration

After installation comes configuration. Configuration depends on the services you want to offer. I’m greedy, so I wanted to provide them all. First, I registered my SQL Server and told the Windows Azure Services for Windows Server management portal that I have about 80 GB to spare for hosting my users’ databases. I did the same with MySQL (the setup is similar):

Windows Azure Services for Windows Server SQL Server

You can add more SQL Servers and even define groups. For example, if you have a SQL Server which can be used for development purposes, add that one. If you have a high-end, failover setup for production, you can add that as a separate group so that only designated users can create databases on that SQL Server cluster of yours.

For Windows Azure Web Sites, I deployed one node of every role that was required:

Windows Azure Services for Windows Server Web Sites

What I liked about this setup is that if I want to add one of these roles, the only thing required is a fresh Windows Server 2008 R2 or 2012 machine. No need to configure the machine: the Windows Azure Services for Windows Server management portal does that for me. All I have to do as an administrator to grow my pool of shared resources is spin up a machine and enter its IP address; the management portal takes care of the installation, linking, etc.

Windows Azure Services for Windows Server - Adding a role

The final step in offering services to my users is creating at least one plan they can subscribe to. Plans define the services provided as well as the quota on these services. Here’s an example quota configuration for SQL Server in my “Cloud Basics” plan:

Windows Azure Services for Windows Server Manage plans

Plans can be private (you assign them to a user) or public (users can self-subscribe, optionally only when they have a specific access code).

End-user experience

As an end user, I can have a plan. Either I enroll myself or an administrator enrolls me. You can sign up for my “I read your blog plan” at http://cloud.balliauw.net and create your SQL Server databases on the fly! (I’ll keep this running for a couple of days, if it’s offline you’re too late).

Sign up for Windows Azure Services for Windows Server

Side note: as an administrator, you can modify this page. It’s a bunch of ASP.NET MVC .cshtml files located under C:\inetpub\MgmtSvc-TenantSite\Views.

After signing in, you’ll be given access to a portal which resembles Windows Azure’s portal. You’ll have an at-a-glance look at all services you are using and can optionally just delete your account. Here’s the initial portal:

Windows Azure Services for Windows Server customer portal

You’ll be able to manage services yourself, for example create a new SQL Server database:

Windows Azure Services for Windows Server create database

After creating a database, you can see the connection information from within the portal:

Windows Azure Services for Windows Server connection string

Just imagine you could create databases on-the-fly, whenever you need them, in your internal infrastructure. Without an administrator having to intervene. Without creating a support ticket or filing a formal request…

Speculations

I’m not sure if I’m supposed to disclose this information, but… The following paragraphs are based on what I can see in the installation of my “private cloud” using Windows Azure Services for Windows Server.

  • I have a suspicion that the public cloud services can be linked into Windows Azure Services for Windows Server. The SQL Server database for this management portal contains various additional tables, such as a table in which SQL Azure servers can be added to a pool linked to a plan. My guess is that you’ll be able to spread users and plans between public cloud (maybe your cheap test databases can go there) and private cloud (production applications run on a SQL Server cluster in your basement).
  • The management portals are clearly built with extensibility in mind. Yes, I’ve cracked open some assemblies using ILSpy; yes, I’ve opened some of the XML configuration files in there. I expect the recently announced Service Bus for Windows Server to pop up in this product as well. And who knows, maybe a nice SDK to create your own services embedded in this portal, so that users can create mailboxes as they please. Or link to a VMware cloud; I know they have management APIs.

Conclusion

I opened this post with a “why?”; let’s end it with that question. Why would you want to use this? The product was announced on Microsoft’s hosting subsite, but the product name (Windows Azure Services for Windows Server) and my experience with it so far make me think that this product is a fit for any enterprise!

You make your developers happy, because they get access to the services they are getting to know and love. You provide self-service access to SQL Server, MySQL, shared hosting and virtual machines. You decide on the quota. You manage it. The only thing you don’t have to manage is the actual provisioning of services: users help themselves through the self-service possibilities in Windows Azure Services for Windows Server.

Want your departments to be able to quickly set up a WordPress or Drupal site? No problem: using Web Sites, they are up and running. And depending on the front-end role you assign them, you can even put them on the internet, your intranet or both. (Note: this is possible through some PowerShell scripting; by default there’s just one pool of servers.)

The fact that there is support for server groups (say, development servers and high-end SQL Server clusters or 8-core IIS machines running your web applications) makes it easy for administrators to grant access to specific resources while some other resources are reserved for production applications. And I suspect this will extend to the public cloud making it possible to go hybrid if you wish. Some services out there, some in your basement.

I’m keeping an eye on this one.
