An autoscaling build farm using TeamCity and Windows Azure

Edit on GitHub
Autoscaling... myself!NOTE: While the content is this blog post will still work, JetBrains now has a plugin that is the recommended way of working with TeamCity and build agents on Azure. Please check this blog post to learn more about it.


Cloud computing is often referred to as a cost saver due to its billing models. If we can move workloads that are seasonal to the cloud, cost reduction is something that will come. No matter if it’s really “seasonal seasonal” (e.g. a temporary high workload around the holidays) or “daily seasonal” where workloads are different depending on the time of day, these workloads have written cloud all over them.

A workload that may be seasonal is the workload done by build servers. Take TeamCity for example. A TeamCity server instruments a pool of build agents that are either idle or compiling source code into binaries. Depending on how your team is structured and when people work, there is a big chance that pool of build agents is doing nothing for several hours every day, except incurring cost. What if we could move the build agents to a platform like Windows Azure and have them autoscale, depending on the actual load on the build farm?

Creating a build agent virtual machine

The first step in setting this brilliant scheme in motion is to set up a build agent virtual machine. We can select any virtual machine image we want for our build agent, even upload our own vhd’s if needed. I'm selecting a Windows Server 2012 image here but if you need a different OS for your build agent you can select that instead.

During the creation of this build agent, there is nothing special we should do. We can select a small/medium/large/extra large instance, give ourselves an administrator password and so on. The only important step here is that we set up an endpoint for the TeamCity build agent, listening on TCP port 9090.

Open load balancer endpoint

Once the machine is started, we will have to install all required prerequisites for our build agent. We can connect using remote desktop (or SSH if it’s a Linux machine). On the machine I have here, I installed all .NET framework versions to ensure I'm able to build .NET projects. On a build machine for Java, we would install the correct runtimes and JDK's for our projects. Anything, really, if it is needed for the sources we’ll be building. On Windows, I typically use Web Platform Installer and Chocolatey to get this done as automated as possible.

Web Platform Installer in action

Installing TeamCity build agent

In order for our build agent to communicate with the TeamCity server, we have to install the build agent. We can do this by navigating to our TeamCity server from within the virtual machine and use the Install Build Agents link from the Agents page.

Installing build agent

On a Windows server, we can use the Windows Installer but we can also use Java Web Start or even simply extract a ZIP file. This last option can be useful on a Linux machine, for example.

Installing the build agent is pretty much a next, next, finish operation. The only important thing is that we run the agent as a Windows service (or have it automatically start at boot time on other operating systems). We also want to specify the URL to our TeamCity server as well as the port on which the build agent will listen for incoming data from TeamCity. Note that this port should be the one opened in the load balancer earlier, in the case of this machine port 9090.

Specify port and server

Before starting the build agent, make sure the local firewall allows incoming connections. Through the Windows firewall, allow incoming connections for port 9090 (and while we’re at it, for a range of ports so we can easily clone this machine and not care about the firewall anymore).

Windows Firewall configuration

If we now start the build agent service, it should connect to our TeamCity server. Under the agents tab, we should be seeing a new unauthorized agent popping up. If that works, we’re good to go with our build agent farm.

A new unauthorized build agent shows up

Don’t shut down the machine just yet, we still need to prepare it for creating a build agent image.

Creating a build agent image

While still connected through remote desktop, open a command prompt and run the sysprep /generalize command from the c:\windows\system32\sysprep folder. On Linux, there’s a similar option in the Windows Azure agent. Sysprep ensures the machine can be cloned into a new machine, getting its own settings like a hostname and IP address. A non-sysprepped machine can thus never be cloned.

Sysprep our machine

Once finished, our RDP connection should be gone and our machine can be shutdown in the Windows Azure Management Portal. In fact, it must be shut down. Once that is done, we can use the Capture button and transform our virtual machine into a template we can create new virtual machines from.Capturing a virtual machine image

The capturing process will take a couple of minutes and results in having no more build server virtual machine to be found in the Windows Azure Management Portal. Is that bad? No, we can now start cloning the machine and create multiple, all having the exact same configuration and components installed.

Setting up multiple build agent machines

The next thing we want to have is multiple build agent machines. From the Windows Azure Management portal, we can create them. Not based on a platform image but using the image created during the previous step.

Using a virtual machine image as the base

The virtual machine configuration can be whatever we want. Do we want extra small instances or extra large? It’s all up to us and our credit card. On the next page, we have to specify some more important details. First, we have to select or create a cloud service. This will be the DNS host name under which all of our build agents are going to live. We also have to specify the affinity group, in essence a setting telling Windows Azure to never unplug power or networking for all machines in this group at the same time.

Selecting cloud service and availability set

We will be creating a couple of machines, so it’s important to get the next page right. Since all our machines will share the same hostname and IP address to the outside world, our build agents have to listen on different TCP ports. Make sure that the first agent maps port 9090 to port 9090, the second one 9091 to 9091 and so on. Not doing this will mess with your mind afterwards when troubleshooting.

Endpoint configuration

Finish the process, let Windows Azure start the machine and create a new one. Important: same cloud service, same availability set and correct endpoint mappings!

Configuring the build agents

Once we have several machines running, we have to connect to them using remote desktop again. This can be done through the portal. Once in, locate the build agent configuration file (c:\BuildAgent\conf\buildAgent.properties in a default installation) and set the port number on which it listens to the one that was mapped as an external endpoint. Again, agent one will listen on port 9090, agent number two on 9091 and so on. We can also set a better name for the build agent, in my case I’ve chosen to go with “agent2”. Very inspirational and all.

Setting build agent name and port

Save and restart the build agent (or the machine). The TeamCity server should now start listing all build agents.

Windows Azure build agents for TeamCity

Make sure to authorize them all, as we want to be sure they can connect to TeamCity server later on. Once that has been done and all build agents are listed here, we can shut them all down except for one. We want to have something running, right?

Configuring autoscaling

It might have been a good question: why did we have to move all these machines under the same cloud service? The reason is simple: we wanted to autoscale our farm and this can only be done within one cloud service. From the cloud service, click the Scale tab and start configuring.

Autoscaling configuration

For this post, I’ve chosen the following values:

  • Autoscale based on CPU
  • Have a minimum of one instance, and a maximum of, well… all of them.
  • The target CPU range is 0 to 10. For a production environment this will typically be between 60 and 80 or 40 and 80, depending on the chosen machine size for the build agents. Windows Azure will trigger an autoscaling operation if we go outside this range, having a small and low range means it will trigger a scale operation much faster. Bigger numbers means slower to respond.
  • Scale up by and scale down by as well as the number of minutes to wait after the previous operations are up to you. If you want a build agent to remain online for 30 minutes after it has been started, even if CPU usage drops, set it to 30 minutes. If 2 machines should be started at once, increase that number as well.

Scaling will happen based on the average CPU percentage of all running machines. If our builds run at 100% CPU all the time on our agent we can set the thresholds a bit higher. If builds are only taking 20% we might want to run multiple agents on one machine or decrease the scaling thresholds a bit. Want to measure CPU utilization for a given build? Better read up on the TeamCity Performance Monitor then.

Putting it to the test

Putting it to the test shouldn’t be that hard. Start some builds and make sure the agent gets loaded with builds. Once we hit the CPU threshold, Windows Azure will launch a virtual machine that was previously turned off.

Windows Azure autoscaling in action

Once it has booted, we will also see it surface on the TeamCity server.

Build agents on TeamCity

Once the load goes down again, Windows Azure will shutdown machines that are below the thresholds and make sure they don’t incur costs any longer. Which is pretty impressive!

If a development team triggers a massive amount of builds during the day, Windows Azure will pretty soon scale out to a higher number of virtual build agents. And at night when there are only some builds being triggered, it will scale back to lesser instances. If, for example, we manage to run machines only for 12 hours instead of 24 hours a day, that means our build farm’s price goes down by half.

TeamCity’s architecture as well as the way Windows Azure works makes this cost reduction possible. It’s also fun to set up, it gives us a wide range of options (how about a Windows Server 2012 farm, a Linux farm and so on).

Enjoy!

This is an imported post. It was imported from my old blog using an automated tool and may contain formatting errors and/or broken images.

Leave a Comment

avatar

10 responses

  1. Avatar for TomasJansson
    TomasJansson August 5th, 2013

    Nice post. I'm not sure if I got everything but let me try to understand. First you have a cloud service, then you have multiple machines associated with that cloud service that is pre-configured. When scaling those machines is turned off or on, is that right?

    If that is the case I have a follow up question. I thought that Windows Azure charged you per virtual machine you have, it doesn't matter if it is up and running or not. Is it different when using a cloud service?

  2. Avatar for Maarten Balliauw
    Maarten Balliauw August 5th, 2013

    The autoscaler indeed turns off (or on) virtual machines.

    Stopped virtual machines (in IaaS) are no longer billed for, except the storage they use (see http://weblogs.asp.net/scot.... Which means if the autoscaling scales down to one instance, you only pay one instance. From that same post, billing is by the minute.

  3. Avatar for TomasJansson
    TomasJansson August 5th, 2013

    Nice! I've missed that post. Thank you for the information.

  4. Avatar for Piers Williams
    Piers Williams August 12th, 2013

    Wooah there. I think you've overlooked the fairly significant limitation that you need to licence your build agents over and above the 3 you get with the server. So you can do 'autoscaling' to reduce paying for noop CPU hours, but you can't burst beyond your agent licence count.

  5. Avatar for Maarten Balliauw
    Maarten Balliauw August 12th, 2013

    Agreed

  6. Avatar for davidinnz
    davidinnz May 23rd, 2014

    Hi Maarten, this is a very useful article. Are you running your TeamCity Server (the one that the BuildAgents are talking to) in Azure too? Or is it on-premise? Does it make any difference?

  7. Avatar for Maarten Balliauw
    Maarten Balliauw May 23rd, 2014

    That doesn't really matter. The communication between server and agent should be possible of course. I have my TeamCity and agents running in Azure, also did a setup where the server was in Amazon EC2 and the agent in Azure.

  8. Avatar for torque
    torque October 17th, 2014

    Maarten, what size server are you using for the TeamCity Server - I'm thinking A0 Basic is sufficient (the lowest possible)

  9. Avatar for Maarten Balliauw
    Maarten Balliauw October 18th, 2014

    It depends on your build. I have builds that run on A0 Basic, for more complex builds a D-series may be better with builds on SSD. Best to test some and see which provide you with a reasonably fast build.
    Also: keep an eye on blog.jetbrains.com/teamcity, they are cooking on something better than this blog post here...

  10. Avatar for torque
    torque October 18th, 2014

    What happens if you go over 3? Does it just not use the 4th machine and never cause another burst?