Sunday, January 13, 2008

Changing the WAN: Virtualizing the Core

Maybe it is time for a completely new paradigm for providing Wide-Area Networking (WAN) services. With this as a premise, let us first take a short trip down networking and computing memory lanes.

In the beginning there was only analog, and this was good. It was good for creating the first telephone networks that connected a nation. It drove the technology of analog amplification, manual and then automatic switching, and eventually connected the world with voice communication. However, we should not forget that it was really teletype that began the real-time connection of the states of the Union and of countries around the world. It is easy to forget that teletype was the first digital network and the first networking technology to encompass ideas such as store-and-forward and message addresses (but I digress here). Oh, why not a bit more history: IT&T ordered 50 PDP-1 computers from DEC to automate the torn-tape processing needed to route telegraph messages.

On the next day of networking, motivated by the need to connect computers to people remotely, namely teleprocessing, the development of the Modulator-Demodulator (MODEM) opened virtually any analog phone line to communication with a remote computer. Thus, the first nationwide teleprocessing application, the SABRE airline reservation system, was able to connect people to ticket agents to reservation computers and change the business of flying (and ultimately show the way to commerce at the speed of light). During this time, data networks, from a digital perspective, existed only at customer locations. Jumping way ahead, the development of packet switching, the ARPAnet, NSFnet, and the Internet moved these activities from the edge of the network and created something called the network core. Since then (and actually before the Internet Protocol, with Layer 2 technologies such as X.25 and Frame Relay), non-trivial activities have taken place within the core of the network that are unseen by the users of that network.

Anyone looking to buy, or already using, a commercial backbone network service such as a Multiprotocol Label Switching (MPLS) Virtual Private Network (VPN) naturally asks many questions. Examples include: What is happening in the core and how is it affecting my service? If I buy network services from a provider, how do I know what I am going to get in terms of performance, and how can that information be provided? If I have special requirements, can my provider accommodate them, and if so, how quickly? This is the essence of the change necessary in WAN services, and it is the same story that moved the computer information processing world from batch, to timesharing, and now to wide-scale compute resource virtualization.

What does this mean? In the early days of computers, the computer was more or less a personal computer. It was personal in the sense that although it was shared among many different users, when a user was actually using the machine, the machine was dedicated for some blocked amount of time to just that one user. This was quickly seen as inefficient, and computer operating systems were created that could handle a batch of jobs. This moved the user away from the computer, creating programs off-line and submitting them in a batch queue for eventual execution, improving the utilization of the expensive computer. Two alternatives were then explored as the next direction. The first was the creation of multi-programming and computer timesharing. In this mode, multiple users could access the computer on-line and develop programs or run existing applications. Many businesses sprang up in the late 1960s and into the 1980s to sell time on expensive, centrally located computers. Timesharing was originally touted as providing the user an experience just as if they had the computer to themselves. But of course, this was not really the case: the timesharing users did not have control of the computer; they just had the ability to run user-level applications and store files. Applications that wanted to communicate with the outside world or needed a new feature from the operating system were simply not possible.

The real approach to improving what could be done effectively with a computer was the development of true personal computers. Cost-effective desktop machines provided users with the ability and control to do anything, and without the pesky interaction with a systems administrator. Whole new worlds of applications were developed, right down to the networked personal computer on which I am writing this message and the machine on which you are reading it. However, these machines also matured to the point where their processing performance and storage capabilities put them right back into the role of the good old central computer system, something that today we call a server.

So, now just like in the batch and timesharing days, we have computers that are essentially centrally located to provide common functions for thousands, if not millions, of users. And this leads us right back to the original problem: not being able to make systems changes to enable new applications while traditional timeshared applications are running on these servers. With the fanfare of a revolution, virtual machine software was one of the biggest news items in 2007 (another history aside: IBM invented this technology with the introduction of the virtual machine operating system in the 1970s). Why? Because it can rapidly change the way that server farms are managed, upgraded, shared, and protected. Because each user of these virtualized machines sees a whole machine and can tailor their operating system environment to what they need. They can obtain performance information and more finely control the interaction between different application sets, improving security and flexibility at the same time.

If you have gotten this far, then you are probably wondering what this has to do with WAN services, and rightly so. But the analogy of batch to timesharing to virtual machines is a progression that networking needs to complete if it is to address those users that require more status information and control over the shared commercial resources they use to build their networks.

Here we go. The initial flat Internet Protocol address space created by using core IP routers defined the core network as a batch environment. All addresses went into a global IP route table, and thus the Internet connecting billions was born. However, as in a batch system, everyone sees everyone else on the network, and except at the edge there is no way to provide security between users. Clearly, this was not acceptable, so IPSec tunnel technology, an edge-to-edge solution, was created, but it was not seen as robust enough for enterprise use (this is why Layer 2 services such as ATM and Frame Relay continued in spite of IP technology). The next leap was the development of the MPLS VPN. This system is similar in concept to timesharing computer resources. Users don't see or control the router, but they do have a virtual route table that separates each user from other users.
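To make the contrast concrete, here is a minimal sketch, in Python and with purely hypothetical names (not any vendor's API), of the difference between the flat "batch" model, where every prefix lands in one shared global table, and the MPLS VPN "timesharing" model, where each customer's routes live in a private virtual route table:

    # Hypothetical illustration only: one shared global route table versus
    # per-customer virtual route tables (VRFs).

    class FlatCore:
        """Flat IP core: every prefix from every customer lands in one shared table."""
        def __init__(self):
            self.global_table = {}                    # prefix -> next hop, visible to all

        def advertise(self, customer, prefix, next_hop):
            self.global_table[prefix] = next_hop

        def lookup(self, customer, prefix):
            # Any customer can resolve any other customer's prefix.
            return self.global_table.get(prefix)

    class MplsVpnCore:
        """MPLS VPN core: each customer is confined to its own virtual route table."""
        def __init__(self):
            self.vrfs = {}                            # customer -> private route table

        def advertise(self, customer, prefix, next_hop):
            self.vrfs.setdefault(customer, {})[prefix] = next_hop

        def lookup(self, customer, prefix):
            # A lookup only ever consults the caller's own table.
            return self.vrfs.get(customer, {}).get(prefix)

    core = MplsVpnCore()
    core.advertise("acme", "10.1.0.0/16", "PE1")
    core.advertise("globex", "10.1.0.0/16", "PE7")    # same prefix, no conflict
    print(core.lookup("acme", "10.1.0.0/16"))         # PE1
    print(core.lookup("globex", "10.1.0.0/16"))       # PE7
    print(core.lookup("acme", "172.16.0.0/24"))       # None: not in acme's table

Overlapping private address space is exactly the case the flat table cannot handle and the per-customer tables handle for free.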

Now, what is the last step and why is it so important? With virtual machines, companies can control or obtain computing resources and customize each virtual environment to meet the application's needs. In the current MPLS VPN space, the user gets whatever features the networking provider has decided to provide. If the user wants additional features or performance reporting, the user must either take what is given, find another service provider that will meet the requirement, or make a feature request and wait for the provider to deliver on that request. This is exactly why new network hardware features do not find their way into the hands of users. Everything that must be done at the systems level on the core network must go through a carrier's typically lengthy productization process. Even then, some features will be included and others left for the future, creating an endless cycle of customers waiting for what they want.

How can the game change? Carriers need to get out of being timeshared router system administrators and let customers define what they need and how they want to operate their WAN networks. Let customers have access to virtual routers in the WAN. Let the customer define what routing protocol to use, how to configure user interfaces, what SNMP variables to read, and the number of class-of-service queues and their configuration. This approach makes networking look like the virtual server farm. The server farm is characterized by physical parameters, such as memory, processing speed, and storage. Users are then assigned resources on a set of virtual machines and can then do whatever they want. Doing this with a data network is very much the same. The physical parameters are network connectivity and bandwidth, and each user is assigned resources on a set of virtual routers. The server farm provider only needs to test that the virtual machine software ensures that one virtual machine cannot crash another. The network provider need only do the same thing, ensuring that one virtual router cannot affect another.
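As a rough sketch of what that allocation model could look like, again in Python and with hypothetical names (this is not a real provisioning API): the provider only tracks physical capacity and enforces isolation, while everything inside a slice, the routing protocol, the class-of-service queues, the SNMP variables polled, belongs to the customer.

    # Hypothetical illustration only: the provider allocates bandwidth slices and
    # enforces isolation; each customer configures its own virtual router.

    from dataclasses import dataclass, field

    @dataclass
    class VirtualRouter:
        customer: str
        bandwidth_mbps: int
        routing_protocol: str = "static"               # customer's choice: OSPF, BGP, ...
        cos_queues: int = 1                            # customer-defined class-of-service queues
        snmp_vars: list = field(default_factory=list)  # variables the customer polls

    @dataclass
    class PhysicalRouter:
        name: str
        capacity_mbps: int
        slices: list = field(default_factory=list)

        def allocate(self, customer, bandwidth_mbps):
            # The one thing the provider must enforce: a slice cannot starve the others.
            used = sum(vr.bandwidth_mbps for vr in self.slices)
            if used + bandwidth_mbps > self.capacity_mbps:
                raise ValueError(f"{self.name}: not enough capacity for {customer}")
            vr = VirtualRouter(customer, bandwidth_mbps)
            self.slices.append(vr)
            return vr

    pe = PhysicalRouter("core-pe-1", capacity_mbps=10_000)
    acme = pe.allocate("acme", 2_000)
    acme.routing_protocol = "OSPF"                     # the customer, not the carrier, decides
    acme.cos_queues = 4
    acme.snmp_vars = ["ifInOctets", "ifOutOctets"]     # standard IF-MIB counters
    globex = pe.allocate("globex", 4_000)
    globex.routing_protocol = "BGP"

In a model like this, the carrier's test surface shrinks to the allocation step and the isolation guarantee; everything after it is the customer's own configuration.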

This is the proverbial win-win for the service provider and the customer. The service provider only needs to do what any shared-resource owner needs to do: allocate virtualized resources, provide capacity upgrades (e.g., higher-capacity routers and more backbone bandwidth), and monitor the health of the virtual routers provided to customers. Testing now becomes simpler, as only the virtual router mechanism and a set of baseline router features need to be tested. The customer can now use the baseline router features any way they want, and since the customer has leased a slice of a set of virtual routers, support for other features within the router goes back directly to the router vendor. This gives hardware providers an incentive to make new features available, as those features can actually be used, without the year or more of lag that normally delays the deployment of a feature in a commercial service provider core.

The time is upon us to stop the over-control of commercial backbone networks and place the appropriate control, that is, virtual control, in the hands of the people who really use the network. Carriers that do not develop this capability for large-scale enterprise customers may find themselves in the same place as the failed computer timesharing companies of the 1980s.