Maarten Balliauw {blog}

ASP.NET MVC, Microsoft Azure, PHP, web development ...

NAVIGATION - SEARCH

Using FTP to access Windows Azure Blob Storage

A while ago, I did a blog post on creating an external facing Azure Worker Role endpoint, listening for incoming TCP connections. After doing that post, I had the idea of building a Windows Azure FTP server that served as a bridge to blob storage. Lack of time, other things to do, you name it: I did not work on that idea. Until now, that is.

Being a lazy developer, I did not start from scratch: writing an FTP server may be something that has been done before, and yes: “Binging” for “ Csharp FTP server” led me to this article on CodeGuru.com. Luckily, the author of the article had the idea of abstraction in mind: he did not build his software on top of a real file system, no, he did an abstraction. This would mean I would only have to host this thing in a worker role somehow and add some classes working with blobs and not with files. Cool!

kick it on DotNetKicks.com Shout it

Demo of the FTP to Blob Storage bridge

Well, you can try this one yourself actually… But let’s start with a disclaimer: I’m not logging your account details when you log in. Next, I’m not allowing you to transfer more than 10MB of data per day. If you require this, feel free to contact me and I’ll give you more traffic quotas.

Open up your favourite FTP client (like FileZilla), and open up an FTP connection to ftp.cloudapp.net. Don’t forget to use your Windows Azure storage account name as the username and the storage account key as the password. Connect, and you’ll be greeted in a nice way:

Windows Azure Blob FileZilla

The folders you are seeing are your blob storage containers. Feel free to browse your storage account and:

  • Create, remove and rename blob containers.
  • Create, remove and rename folders inside a container. Note that a .placeholder file will be created when doing this.
  • Upload and download blobs.

Feels like regular FTP, right? There’s more though… Using the Windows Azure storage API, you can also choose if a blob container is private or public. Why not do this using the FTP client? Right-click a blob container,  pick “File permissions…” and here you are: the public read permission is the one that you can use to control access to a blob container.

Change container permission through FTP

Show me the code!

Well… No! I think it’s not stable enough for releasing it to public yet. But what I will do is share some off my struggles I faced while developing on this.

Struggle #1: Quotas

As you may have noticed: I’m not allowing data transfers of more than 10 MB per day per storage account. This is not much, but I did not want  to go pay for other people’s traffic that comes trough a demo app. However: every command you send, every action you take, is generating traffic. I had to choose how this would be logged and persisted.

The strategy used is that all transferred bytes are counted and stored in a cache in the worker role. I created a dedicated thread that monitors this cache, and regularly persists the traffic log in blob storage. There is no fixed interval in which this happens, it just happens. I’m not sure yet that this is the best way to do it, but I feel it is a good mix between intensity of logging and intensity of an expensive write to blob storage.

Struggle #2: Sleep

This is not a technical struggle. Since I had fun, I dedicated a lot of time to this thing, mainly in fine-tuning, testing, testing with multiple concurrent clients, … I learnt that System.Net has some cool classes and also learnt that TcpClient that are closed should also be disposed. Otherwise, the socket will not be released and no new connections will be accepted after a while. Anyway: it caused a lack of sleep. The solution to this was drinking more coffee, just at the moment where I actually was drinking less coffee for over a month or two. I will have to go to coffee-rehab again…

Struggle #3: FTP PASV mode

I will not assume you know this because I also didn’t know the exact difference… When a client connects to an FTP server, it will have 2 connections with that server. One on the standard FTP TCP port 21 used for sending commands back and forth, and one on another TCP port used for transferring data. This second connection can be an active one or a passive one.

The main difference between active and passive FTP lies in the direction of the connection: with active FTP, the FTP server opens a connection to a TCP port on the client, while with passive FTP, the client will open a connection to another TCP port on the server. Here’s more details on that:

The Basics

FTP is a TCP based service exclusively. There is no UDP component to FTP. FTP is an unusual service in that it utilizes two ports, a 'data' port and a 'command' port (also known as the control port). Traditionally these are port 21 for the command port and port 20 for the data port. The confusion begins however, when we find that depending on the mode, the data port is not always on port 20.

Active FTP

In active mode FTP the client connects from a random unprivileged port (N > 1023) to the FTP server's command port, port 21. Then, the client starts listening to port N+1 and sends the FTP command PORT N+1 to the FTP server. The server will then connect back to the client's specified data port from its local data port, which is port 20.

From the server-side firewall's standpoint, to support active mode FTP the following communication channels need to be opened:

  • FTP server's port 21 from anywhere (Client initiates connection)
  • FTP server's port 21 to ports > 1023 (Server responds to client's control port)
  • FTP server's port 20 to ports > 1023 (Server initiates data connection to client's data port)
  • FTP server's port 20 from ports > 1023 (Client sends ACKs to server's data port)

When drawn out, the connection appears as follows:

In step 1, the client's command port contacts the server's command port and sends the command PORT 1027. The server then sends an ACK back to the client's command port in step 2. In step 3 the server initiates a connection on its local data port to the data port the client specified earlier. Finally, the client sends an ACK back as shown in step 4.

The main problem with active mode FTP actually falls on the client side. The FTP client doesn't make the actual connection to the data port of the server--it simply tells the server what port it is listening on and the server connects back to the specified port on the client. From the client side firewall this appears to be an outside system initiating a connection to an internal client--something that is usually blocked.

Passive FTP

In order to resolve the issue of the server initiating the connection to the client a different method for FTP connections was developed. This was known as passive mode, or PASV, after the command used by the client to tell the server it is in passive mode.

In passive mode FTP the client initiates both connections to the server, solving the problem of firewalls filtering the incoming data port connection to the client from the server. When opening an FTP connection, the client opens two random unprivileged ports locally (N > 1023 and N+1). The first port contacts the server on port 21, but instead of then issuing a PORT command and allowing the server to connect back to its data port, the client will issue the PASV command. The result of this is that the server then opens a random unprivileged port (P > 1023) and sends the PORT P command back to the client. The client then initiates the connection from port N+1 to port P on the server to transfer data.

From the server-side firewall's standpoint, to support passive mode FTP the following communication channels need to be opened:

  • FTP server's port 21 from anywhere (Client initiates connection)
  • FTP server's port 21 to ports > 1023 (Server responds to client's control port)
  • FTP server's ports > 1023 from anywhere (Client initiates data connection to random port specified by server)
  • FTP server's ports > 1023 to remote ports > 1023 (Server sends ACKs (and data) to client's data port)

When drawn, a passive mode FTP connection looks like this:

In step 1, the client contacts the server on the command port and issues the PASV command. The server then replies in step 2 with PORT 2024, telling the client which port it is listening to for the data connection. In step 3 the client then initiates the data connection from its data port to the specified server data port. Finally, the server sends back an ACK in step 4 to the client's data port.

While passive mode FTP solves many of the problems from the client side, it opens up a whole range of problems on the server side. The biggest issue is the need to allow any remote connection to high numbered ports on the server. Fortunately, many FTP daemons, including the popular WU-FTPD allow the administrator to specify a range of ports which the FTP server will use. See Appendix 1 for more information.

The second issue involves supporting and troubleshooting clients which do (or do not) support passive mode. As an example, the command line FTP utility provided with Solaris does not support passive mode, necessitating a third-party FTP client, such as ncftp.

With the massive popularity of the World Wide Web, many people prefer to use their web browser as an FTP client. Most browsers only support passive mode when accessing ftp:// URLs. This can either be good or bad depending on what the servers and firewalls are configured to support.

(from http://slacksite.com/other/ftp.html)

Clear enough? Good! In order to support passive FTP, the Windows Azure worker role should be listening on more ports than only port 21. After doing some research, I found that most FTP servers allow specifying the passive FTP port range. Opening a range of over 1000 TCP ports is also something most FTP servers seem to do. Good, I tried this one on Windows Azure, deployed it and… found out that you can only define a maximum of 5 public endpoints per deployment.

This led me to re-implementing PASV mode, opening a new port on demand from a pool of 4 public endpoints defined. Again, I deployed this one but this failed as well: there was too much of a delay in opening a new TcpListener on the fly.

Option three seemed to work: I have a TcpListener open on TCP port 20 all the time and try to dispatch incoming connections immediately. There’s also a downside to this: if users send a lot of PASV requests, there will be a lot of unused connections that may cause the application to crash. So I did a trick here as well: close listening connections after a short delay.

Conclusion

Feel free to use the service and if you require more than 10 MB traffic a day, feel free to contact me. I can specify traffic quotas per storage account and may increase traffic quotas for yours.

kick it on DotNetKicks.com Shout it

MEF will not get easier, it’s cool as ICE

Over the past few weeks, several people asked me to show them how to use MEF (Managed Extensibility Framework), some of them seemed to have some difficulties with the concept of MEF. I tried explaining that it will not get easier than it is currently, hence the title of this blog post. MEF is based on 3 keywords: export, import, compose. Since these 3 words all start with a letter that can be combined to a word, and MEF is cool, here’s a hint on how to remember it: MEF is cool as ICE!

kick it on DotNetKicks.com

Imagine the following:

You want to construct a shed somewhere in your back yard. There’s tools to accomplish that, such as a hammer and a saw. There’s also material, such as nails and wooden boards.

Let’s go for this! Here’s a piece of code to build the shed:

[code:c#]

public class Maarten
{
    public void Execute(string command)
    {
        if (command == “build-a-shed”)
        {
          List<ITool> tools = new List<ITool>
          {
            new Hammer(),
            new Saw()
          };

          List<IMaterial> material = new List<IMaterial>
          {
            new BoxOfNails(),
            new WoodenBoards()
          };

          BuildAShedCommand task = new BuildAShedCommand(tools, material);
          task.Execute();
        }
    }
}

[/code]

That’s a lot of work, building a shed! Imagine you had someone to do the above for you, someone who gathers your tools spread around somewhere in the house, goes to the DIY-store and gets a box of nails, … This is where MEF comes in to place.

Compose

Let’s start with the last component of the MEF paradigm: composition. Let’s not look for tools in the garage (and the attic), let’s not go to the DIY store, let’s “outsource” this task to someone cheap: MEF. Cheap because it will be in .NET 4.0, not because it’s, well, “cheap”. Here’s how the outsourcing would be done:

[code:c#]

public class Maarten
{
    public void Execute(string command)
    {
        if (command == “build-a-shed”)
        {
          // Tell MEF to look for stuff in my house, maybe I still have nails and wooden boards as well
          AssemblyCatalog catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly());
          CompositionContainer container = new CompositionContainer(catalog);

          // Start the job, and ask MEF to find my tools and material
          BuildAShedCommand task = new BuildAShedCommand();
          container.ComposeParts(task);
          task.Execute();
        }
    }
}

[/code]

Cleaner, no? The only thing I have to do is start the job, which is more fun when my tools and material are in reach. The ComposeParts() call figures out where my tools and material are. However, MEF's stable composition promise will only do that if it can find ("satisfy") all required imports. And MEF will not know all of this automatically. Tools and material should be labeled. And that’s where the next word comes in play: export.

Export

Export, or the ExportAttribute to be precise, is a marker for MEF to tell that you want to export the type or property on which the attribute is placed. Really think of it like a label. Let’s label our hammer:

[code:c#]

[Export(typeof(ITool))]
public class Hammer : ITool
{
  // ...
}

[/code]

The same should be done for the saw, the box of nails and the wooden boards. Remember to put a different label color on the tools and the material, otherwise MEF will think that sawing should be done with a box of nails.

Import

Of course, MEF can go ahead and gather tools and material, but it will not know what to do with it unless you give it a hint. And that’s where the ImportAttribute (and the ImportManyAttribute) come in handy. I will have to tell MEF that the tools go on one stack, the material goes on another one. Here’s how:

[code:c#]

public class BuildAShedCommand : ICommand
{
  [ImportMany(typeof(ITool))]
  public IEnumerable<ITool> Tools { get; set; }

  [ImportMany(typeof(IMaterial))]
  public IEnumerable<IMaterial> Materials { get; set; }

  // ...
}

[/code]

Conclusion

Easy, no? Of course, MEF can do a lot more than this. For instance, you can specify that a certain import is only valid for exports of a specific type and specific metadata: I can have a small and a large hammer, both ITool. For building a shed, I will require the large hammer though.

Another cool feature is creating your own export provider (example at TheCodeJunkie.com). And if ICE does not make sense, try the Zoo example.

kick it on DotNetKicks.com