Monday, November 10, 2008

Elegance and Brute Force: Who Wins?

As computer applications become more demanding of network and computing resources, what is the best way to ensure that performance requirements are met while other objectives, such as greening, are also achieved?  I contend that being green cannot stop at the equipment: it must be expanded to network data transport services, and it must now include improving applications that use resource-wasteful network protocols as well.

Over the past decade, I have witnessed the continual evolution of optical networking technology, from raw transport up through user services.  At each level in the protocol stack, and in the applications themselves, what is being done to improve the efficiency, or elegance, of the environment?  Clearly, much of the technology press is alive with issues related to data center efficiency.  The old days, of not too long ago, of adding a new server every time a new application was introduced into an enterprise Information Technology (IT) infrastructure are rapidly disappearing - the old brute force approach.  In fact, there are companies whose entire business model is structured around using technology such as Virtual Machine Managers (VMMs) to reduce the number of discrete servers in an enterprise's data center.  This reduction in servers, and the increase in utilization of the remaining servers, can often reduce data center costs, and the associated power and cooling, by 50% or more.

So, these data center changes have real impact on IT costs, but why do we stop there?  Why are we not always looking for, or demanding, efficiency built into all aspects of the end-to-end service?  For example, at business and at home, lighting is being switched to Compact Fluorescent Lights (CFLs), which reduce lighting costs by 60% or more; hybrid cars of various types are being introduced to improve fuel efficiency for transportation; and more people than ever are taking mass transit and teleworking – all to reduce energy costs.

However, this is not a place to stop.  As I have written previously in my entry, Earth Hour for the Internet, telecommunications providers are under significant pressure to improve their transport and data infrastructures to improve the energy cost per bit transported.  In the optical transport space this means that service providers are choosing equipment that crams more data into each optical channel (e.g., going from 10 Gbps to 40 Gbps) and more channels into each optical line system (e.g., from 32 to over 100).  In the data space, this means moving to Ethernet interfaces and switches to reduce the cost and power needed to provide data services.  Service providers are starting to consider replacing the full-blown core routers (i.e., the Provider or "P" routers) in their core Multi-Protocol Label Switching (MPLS) backbones with MPLS-capable switches.  These switches are approximately 33% of the cost, 50% of the weight, and use 50% less power than the status quo.

So the data center is being worked, and service provider transport and data networks are being addressed - what else is there?  Well, it's the application, stupid!  And, as we all know, "Stupid is as stupid does," and many applications use network resources inefficiently.  Two dimensions of application performance have traditionally been how fast the servers provide the data for the applications and how fast the network provides connectivity to the end-user locations.  The power and space efficiency of the servers is being addressed by data center improvements, and data network connectivity is being improved by service providers as discussed above.  However, is this all there is?  Absolutely not.

As applications have grown in richness (e.g., from text, to graphics, to pictures, to sound, and to video), the bandwidth required for satisfactory performance, from the user's point-of-view, has continued to increase.  In addition, Internet Protocol (IP) applications such as Voice over IP (VoIP) and video conferencing are increasing bandwidth demands, along with applications that never end, like Internet Radio, other streaming services, and even what used to be regarded as humble email.

The brute force approach, and essentially the approach used 95% of the time, is to continue to increase the raw bandwidth provided to each end-user location.  Of course, this is exactly what network service providers want customers to do.  However, from the perspective of the customer, the cost of increasing the capacity of enterprise-class dedicated access and services is extreme.  For example, going from a T1 access speed (approximately 1.5 Mbps raw) to the next step, 2xT1 access, generally means an increase in access costs of 100% and in data services costs of around 80%.  This means that the next application that goes on the network may be the one that breaks the proverbial camel's back.  So, the introduction of a new medical imaging application or collaboration application could increase your monthly network costs by 90%.  This may be untenable, as the cost of deploying the application plus the cost of maintaining the network for it may break an organization's budget.

But there is another area where brute force is in play.  For example, an IP packet that goes from a customer location to a server, as part of a single file transfer, carries the full source and destination addresses of the customer's computer and the server over and over again - and in fact the move to IPv6 will make matters even worse.  What's more, many applications are "chatty": they send many packets between the client application and the server, establishing new communication sessions each time, which brings another whole set of inefficiencies.  One can say that applications are not developed with green in mind, and neither were the basic data communication protocols (e.g., TCP and IP) created to be green.
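To make the overhead concrete, here is a rough back-of-the-envelope sketch (my own illustration, not a measurement) using standard minimum header sizes and an assumed per-packet payload.  It shows how much of a simple file transfer is spent re-sending the same addressing and control information, and how IPv6's larger fixed header nudges that figure upward:

```python
# Rough back-of-the-envelope sketch (not a measurement): estimate how much of a
# file transfer is repeated protocol header rather than payload, using standard
# minimum header sizes and an assumed payload size per packet.

ETH_HEADER = 18      # Ethernet header + FCS, bytes
IPV4_HEADER = 20     # minimum IPv4 header, bytes
IPV6_HEADER = 40     # fixed IPv6 header, bytes
TCP_HEADER = 20      # minimum TCP header, bytes
PAYLOAD = 1460       # assumed TCP payload per packet, bytes

def overhead_fraction(ip_header: int, file_bytes: int) -> float:
    """Fraction of bytes on the wire that are headers, not file data."""
    packets = -(-file_bytes // PAYLOAD)          # ceiling division
    header_bytes = packets * (ETH_HEADER + ip_header + TCP_HEADER)
    return header_bytes / (header_bytes + file_bytes)

for name, hdr in [("IPv4", IPV4_HEADER), ("IPv6", IPV6_HEADER)]:
    frac = overhead_fraction(hdr, file_bytes=10 * 1024 * 1024)   # 10 MB transfer
    print(f"{name}: ~{frac:.1%} of wire bytes are repeated headers")
```

And this count ignores ACKs and session setup entirely, which is exactly where chatty applications make things worse: every new session adds handshake packets that carry no payload at all.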

However, all is not lost, as we can return to elegance as a way to improve this sad state of affairs.  Just as VMM approaches have improved the efficiency of server usage, the vast inefficiency of most applications, and of the protocols they use to communicate over a network, can also be addressed.

There are two areas where this can be attacked, and quite frankly should be attacked.  The first is to demand that developers, especially of client-server based applications, develop against a new green standard for information flow.  Just as there are usability, security, and performance requirements for applications, a green standard for information flow between client applications and their supporting servers needs to be put in place.  These standards would limit the number of extraneous network flows created and limit the amount of redundant information that flows between the client and the server.

Barring the unlikely scenario that green network use and implementation makes an impact on application development any time soon, other approaches, specifically WAN Acceleration appliances, may be a valuable addition to a growing network.  These systems understand how to improve the performance of TCP/IP using well-known techniques such as header compression, as well as novel techniques that reduce the amount of protocol chatter, on an application-by-application basis, by confining wasteful behavior to the high-performance local LAN.  In addition, some perform sophisticated caching that recognizes data that has been sent over the WAN before, so that it does not have to be sent in its entirety again.
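As an illustration of the caching idea (a minimal sketch of the general technique, not any vendor's implementation), consider an appliance that chunks the byte stream, keys each chunk by a hash, and sends only a short reference when the far side has already seen that chunk:

```python
# Minimal sketch of the data-deduplication idea behind some WAN accelerators:
# chunk the byte stream, key each chunk by a hash, and send only a short
# reference when the far-side appliance has already seen that chunk.
# This illustrates the concept only; it is not any vendor's implementation.

import hashlib

CHUNK = 8 * 1024                 # assumed chunk size (8 KB)
REF_SIZE = 32                    # bytes sent when a chunk is already cached

def bytes_on_wan(stream: bytes, cache: set[bytes]) -> int:
    """Return bytes that would cross the WAN after dedup against 'cache'."""
    sent = 0
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        digest = hashlib.sha256(chunk).digest()
        if digest in cache:
            sent += REF_SIZE         # already seen: send a reference only
        else:
            cache.add(digest)
            sent += len(chunk)       # first time: send the full chunk
    return sent

cache: set[bytes] = set()
report = b"quarterly report body " * 50_000       # ~1.1 MB of data
first = bytes_on_wan(report, cache)               # first transfer: full size
second = bytes_on_wan(report, cache)              # re-send: mostly references
print(f"first pass: {first} bytes, second pass: {second} bytes on the WAN")
```

On the second pass the wire carries only a few kilobytes of references instead of the full megabyte, which is the intuition behind the reductions claimed below.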

WAN Acceleration systems have been field proven to reduce WAN requirements by 50% or more while at the same time improving the end-user experience.  This means that with an investment of around 25% of one year's network costs in WAN Acceleration equipment, enterprises may be able to postpone the 80%+ increase in monthly network service costs into the future, saving big bucks.
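A quick back-of-the-envelope check of that claim, treating the post's percentages as given and assuming a purely hypothetical $10,000-per-month WAN bill:

```python
# Back-of-the-envelope check of the claim above, using the post's rough figures
# as assumptions: a hypothetical enterprise spending $10k/month on WAN services,
# facing an ~80% jump to add capacity, versus a one-time WAN acceleration
# purchase of ~25% of one year's network spend.

monthly_wan_cost = 10_000                    # assumed current spend, $/month
upgrade_factor = 1.80                        # ~80% increase for the next tier
accel_capex = 0.25 * (12 * monthly_wan_cost) # ~25% of one year's network cost

extra_per_month = monthly_wan_cost * (upgrade_factor - 1)
months_to_break_even = accel_capex / extra_per_month
print(f"WAN acceleration capex: ${accel_capex:,.0f}")
print(f"Avoided upgrade cost:   ${extra_per_month:,.0f}/month")
print(f"Break-even after about {months_to_break_even:.1f} months of deferral")
```

Under those assumed numbers, the appliance pays for itself in a handful of months of deferred upgrade costs.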

Reduce, Re-use, and Re-cycle is the mantra of waste reduction in the physical world.  So, to get green:

·         Reduce:

o   Use Virtual Machine techniques to reduce the number of physical servers and to tailor workloads to ensure good resource utilization, and therefore efficiency

o   Use WAN acceleration to reduce unnecessary traffic flows between end-users and applications, reducing the need for WAN bandwidth and therefore improving efficiency

o   Whether you buy or develop applications, demand that bandwidth and protocol efficiency are part of the requirements for the software

·         Re-use:

o   Find hardware that has multiple functions so that it can be used or re-used for new applications.  Buy a firewall that can also be used for additional services (e.g., IDPS, anti-virus, etc.)

o   Use software or appliances that re-use information that has already been sent from a client to a server (or in the other direction).  Stop sending the same information back and forth; it only clogs WAN pipes

·         Re-cycle:

o   Stop having to re-do all your applications to improve performance.  Find a solution that enables WAN performance improvement to eliminate or delay adding additional expensive bandwidth without having to re-write, re-engineer, and re-buy equipment and applications.

Monday, November 3, 2008

Constructive Non-Complacency: Making a Technical Organization Succeed

Although it may not make you especially popular with some, if you want your organization to succeed and "make things happen" then I believe that practicing and executing Constructive Non-Complacency is an approach worth considering.

Of course you are asking what exactly this means. Did Wes find this in one of the hundreds of leadership books that he reads voraciously? Is this something he learned from a Qwest CEO whispering into his ear? Or, has Wes simply taken leave of his senses? The answer is no. It is based on what I have personally experienced in the last decade and on trying to figure out what went right.

There are two essential components to this concept. First and foremost is the word "constructive". Everything that a leader does, and everything done by the organization the leader is responsible for, must be done in a constructive manner. This includes professional behavior toward peers, vendor partners, and of course customers. Constructive means that the leader's organization needs to build solutions, build connections between people, help each other through tough personal and business issues, and look for help when needed. Nothing is so important that it cannot be done in a professional manner in pursuit of excellence and business success. When leadership and staff are perceived as constructive, people begin to listen and engage in suggested solutions to problems.

The second part of the theme is non-complacency. And, you are right that it means that leaders and their organizations cannot be satisfied with merely the status quo. It means that leaders and their staff search for answers even when problems look too hard. It means that no matter how good a month, quarter, or year the business is having, the team is always looking for more growth. It means that when internal or external roadblocks are hit, the organization pursues with vigor (of course at all times obeying the "constructive" part of the theme) the answers and commitments that are needed for success. It means that a leader's phone (office, mobile, and home), email, and messaging are always ready to receive issues, take the necessary actions, and perform the necessary escalations to get the job done - and the entire organization knows to do this as well. It means that every member of the team understands the urgency, feels these issues screaming in their gut for solutions, and knows that issues cannot be left to fester without resolution. Finally, it means that everyone at every level, from leadership to staff, is responsible for continually searching for better ways to do their jobs.

An organization that has bought into Constructive Non-Complacency will ask questions about why something cannot get done and will more rapidly weed out the real reason someone says no to an idea or approach that is critical for customer success. With an organization that drives for answers in a constructive manner, issues and roadblocks will crumble, and customers will understand that you and your organization care about their success - and reward you with more business.

Sunday, October 5, 2008

Some thoughts on the state of telecommunications in the USA

Not too long ago, your phone from the Phone Company could be any color you liked as long as it was black. Now, the choices for voice communications are legion. For hardwired service we have good old Plain Old Telephone Service (POTS) and Voice over IP (e.g., Vonage); we also have Skype, wireless connections using cellular technology (i.e., GSM, CDMA), and now VoIP over WiMAX (at least in Baltimore). We can ask whether all of this is actually good for us.

Well, of course, having multiple ways of doing the same thing is a way of fostering competition and technological innovation. If companies did not work on developing VoIP technology, and if Internet Service Providers (ISPs) did not work on improving their quality of service, then we would not have the technology tools, systems, and services that enable the rich communications environment we now enjoy.

But, is all of this good for us and for the industry? It's hard to say, and I am not going to draw a firm conclusion here. But let's look at some of the issues that this diversity brings:

  1. As indicated previously, multiple technical directions foster innovation and new technologies, services, and applications
  2. However, with multiple approaches do we dilute the ability to achieve economies of scale in the deployment of any technology?

VoIP and its benefits are an example of issue one. However, there are areas where issue two may have an impact for years to come.

What will happen with the current dramatic decrease in wired POTS users? Since these networks are predominately the old Bell Operating Companies' systems, what happens when the user base falls below the threshold at which the cost of serving the remaining users becomes prohibitively expensive?

  • Do we turn off the old system and force the remaining users onto a new technology, or do we simply cut them off? This is exactly the scenario that is playing out with the elimination of Analog (i.e., NTSC) television transmission, scheduled to be virtually retired next year.
  • Who pays for this?
  • Which company is going to be the first to demand the required changes in the current regulatory environment?

Currently, in the United States of America, we have at least three established mobile cellular technologies at play, and now there is a fourth. The first three are GSM (used virtually everywhere around the world), CDMA (used almost exclusively in the USA), and iDEN (used almost exclusively in the USA and some areas of Latin America). Now, we have added WiMAX as a very distant fourth, trying to take a share of the mobile data user base and fighting against the so-called 3G services provided by GSM and CDMA - good luck.

These standards compete in the US for a user population of more than 150 million users. Currently this means that the largest provider has only on the order of 50 million customers. In a few years, if it is not already true, this will not even be the size of the smallest cellular provider in China. We can identify some serious issues:

  • Is the competition in the US going to provide better services, or leave us behind as our mobile voice and data services are diluted over carriers that will not have the capital to expand their services to meet emerging application requirements?
  • Will the USA fall behind in advanced wireless applications as availability of high-speed data services becomes fragmented and non-ubiquitous?

Finally, how do we balance the requirements of wireless against wired? Clearly, no current technology (or any consistent with known physics) is going to provide mobile gigabit services delivered to tens of thousands of users in a city. How then do we start valuing the contribution of traditional wireline providers? And how do we incent companies to provide the 20 Mbps to each business and home that will be needed to keep up with countries like South Korea and Japan?

Some issues:

  • Building fiber into buildings costs a lot of money. However, permitting, franchise fees, and a general lack of government support make this more expensive and sometimes impossible
  • The apparent business environment for investing in companies trying to build new metropolitan infrastructure is extremely poor at best
  • The business case for building is getting worse and worse

I will finish this entry with a short summary of the "worse and worse," which may lead to a new post later. The phone company invests heavily in deploying fiber to the home and business. The business case for the service includes the sale of voice, data, and video services. However, they do their job only too well: they enable 20 Mbps or so to their customers, and (as may well be the case in several years) almost none of those customers use the video or voice services provided by the service provider, because customers can get their TV shows directly from the source of the material and can use one of multiple VoIP providers.

What happens in this scenario?

  • Does the phone company stop investing in the fiber deployment?
  • Doesn't the same situation exist for cable companies?
  • Is there room for other companies to make an investment?

The bottom line is the simple question: Where is the value that enables investment? And, with the government now dealing with other monetary crises, can companies make the investment without at least more cooperative local, county, and state governments making it easier and less costly to install new facilities?

Sunday, June 22, 2008

Going Horizontal on the Vertical

In this case, Going Horizontal on the Vertical is not a catch phrase from the original Outer Limits television series, where the announcer calmly proclaims that "we control the horizontal and we control the vertical". The point of this post is that no matter how diligent telecommunications carriers think they are when they develop products, or how diligent customers are when they write statements of objectives, the result is invariably a set of vertical services or requirements that leaves the ultimate reason for the services in the first place out of the equation - the mission of the organization.

Why are we in this situation? Isn’t it clear that the purpose of network services is to provide an infrastructure and Information Technology (IT) resources to support the mission of the organization, company, or government agency?

Our first look will be from the perspective of a telecommunications service provider. Historically, and fundamentally, there are two sets of organizations that play a role in the development of the services offered: the Product Management organization and the Technology Management organization. Note that neither of these organizations' names has anything to do with a customer-focused solution. They are focused on an individual product and on a set of technologies to enable that product. It actually goes further south from there - and although some aspects of this have improved over time, an example from my direct professional experience will shed some light on just how bad things were.

In the beginning, technology created Asynchronous Transfer Mode (ATM) and Frame Relay (FR) - well, not really in the beginning, as there was Morse code, Baudot, X.25, Switched 56K, etc., etc. Then along came the Internet and Internet Protocol services. Let's deal with ATM and FR first. As a product manager, what was the approach to selling the product? Well, instead of really understanding the problems that customers want to solve by using a network-based data service, you focus on what the box manufacturers and standards bodies are telling you. You specify your product not in terms of a set of configurations that can be used to solve a series of customer-focused network problems, but in terms absolutely incomprehensible except by bowing at the altar of some network engineer. You've got PVCs, SVCs, DLCIs, VPI/VCIs, CBR, ABR, PCR, SCR, MBR, VBR, VBR-rt, VBR-nrt, UBR, and so on and so on. As sales documentation you take the glossies from your favorite box vendor, and maybe something from a nearly otherwise useless book on ATM, and make assertions like: "CBR is great for video, VBR-rt is great for voice, ABR is great for who knows what, UBR is great for pesky email you don't really want to get, etc."

Now let's bring in the customer, who decides that if the carriers are going to declare technical mumbo-jumbo war in describing their products, then they are going to have to get their own technical mumbo-jumbo experts to counter. So, instead of the customer discussing what they are trying to accomplish, they create Requests for Proposals that look as if someone picked up the ATM UNI 3.1 specification, pulled out every nearly useless and arcane feature, and made each one a mandatory requirement in the carrier's response. All this time, the actual mission and reason for the network is nowhere to be seen.

This mismatch between the way that the product is provided and priced to the customer and the real customer need demonstrates that pure technology is not likely to yield customer happiness.

Everyone knew that the reason that people did not like ATM and FR was that there was always a mismatch between the configuration control of the ATM and FR network and the almost exclusively private Internet Protocol (IP) network created at the edge by the customer. So, when public IP services came onto the scene, the Holy Grail was at hand and everything would be happy if you just used the Internet.

So, instead of recognizing that IP services were just another form of data transport, the approach used since time immemorial - creating a new product management organization to care for and feed the new IP services product - was implemented. And worse, it was put in opposition to the existing data services product management. This led to the inevitable clash of product managers arguing that "my product is better than yours because IP is golden and ATM is old junk".

Because of this clash of religions, each product continued to be managed as an independent entity, and each was given its own network management environment to support the product's technical features. Even if these were combined into a unified portal, each data product was treated individually. These management systems reinforced the technical attributes of the services but continued to ignore the reality of the customer's mission. Even worse, service providers that had a managed Hosting service generally had a completely separate hosted services portal, completely focused on the hosting environment and completely oblivious to the network environment that is needed to support complex applications.

In an attempt to remedy part of this issue, the basic approach of carriers was to start developing Managed Network Services. These services were generally based on the carrier creating a Network Operations Center (NOC) that focused on the edge devices of the network. This carrier NOC approach is generally limited to whether the edge routers of a customer are working as designed and to providing a single point-of-contact for trouble management of the underlying carrier-based data transport network. In some cases, these NOCs may use the existing vertical service portals (e.g., to help determine performance issues with an MPLS VPN), or receive a data feed (e.g., in XML format) of the actual data elements that describe the performance of the underlying transport network. However, as previously mentioned, this is a limited view of the enterprise, as the network may be working, but some critical application is failing.

In some cases, this may not be a significant issue, as the operation watching a Web server will address a server failure independently of the network service. But the increase in the richness (i.e., voice and video) of Web services means that this independence between network performance and Web-server operation is disappearing, and it calls out for a unified environment in which the impact of network performance on application performance is clear and well reported.

How do we get horizontal on the vertical? There are two cases: one where Network Management services are provided by the carrier, and one where this function is either in-house to the enterprise or outsourced to a third party.

Carriers must start recognizing that they are actually application enablers and that their products are not the application in themselves. This means that all the vertical tools that are created to monitor and report on the technical aspects of a service (e.g., latency, jitter, packet loss, QoS operation, etc.) are actually components in the proper operation of an end-to-end application. Providing this data in a Web page is a start, but these services need to have an Application Programming Interface (API) that enables the performance parameters to be used to create the "big picture" of the status of an application. This API can be used by the carrier in an integrated environment that combines multiple services, for example network and hosting, into an application view for their customers. In addition, this API can be used to provide a third party with the information necessary to customize a network management platform that integrates information from various sources (i.e., internal to the enterprise and from service providers) to provide the application situational awareness needed to effectively operate the enterprise.
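As a sketch of what such an API could enable (the field names, thresholds, and feed format here are hypothetical, not any carrier's actual interface), a third party could roll the individual service feeds up into a single application-level status:

```python
# Hedged sketch of the "big picture" idea: pull per-service performance feeds
# (the field names and thresholds here are hypothetical, not any carrier's API)
# and roll them up into a single application-level status.

from dataclasses import dataclass

@dataclass
class ServiceMetrics:
    name: str          # e.g., "MPLS VPN site A-B", "hosting: web tier"
    latency_ms: float
    jitter_ms: float
    packet_loss_pct: float

# Hypothetical thresholds for an interactive application.
THRESHOLDS = {"latency_ms": 150.0, "jitter_ms": 30.0, "packet_loss_pct": 1.0}

def application_status(feeds: list[ServiceMetrics]) -> str:
    """Roll component feeds up into one application-level verdict."""
    degraded = [
        f.name for f in feeds
        if f.latency_ms > THRESHOLDS["latency_ms"]
        or f.jitter_ms > THRESHOLDS["jitter_ms"]
        or f.packet_loss_pct > THRESHOLDS["packet_loss_pct"]
    ]
    return "OK" if not degraded else "DEGRADED via: " + ", ".join(degraded)

feeds = [
    ServiceMetrics("MPLS VPN: HQ to data center", 42.0, 4.0, 0.1),
    ServiceMetrics("Hosting: application tier", 3.0, 0.5, 0.0),
    ServiceMetrics("MPLS VPN: branch 12", 210.0, 55.0, 2.4),
]
print(application_status(feeds))
```

The point is not the thresholds themselves but that the carrier's per-service data becomes an input to the customer's application view rather than a dead end in a vertical portal.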

Another aspect that is generally not integrated into these systems is the so-called Change Management process, which encompasses items such as service-affecting network upgrades. Currently, this is yet another process for each product, and it is generally not integrated either into the vertical management portal for individual services or into a consistent Change Management portal for enterprise customers.

Carriers have the ability to reach into their systems for network data, security services data, trouble management information, change management information, and managed application performance data. They are uniquely positioned to take the high ground in developing Web portals that can be rapidly customized using the data feeds from the multiple services being provided. This can create real application performance awareness for the enterprise and incorporate the information and visibility necessary to understand the impact of each element of the services being provided on the applications, enabling more rapid root cause analysis and therefore more rapid trouble management response and repair.

Getting Horizontal on the Vertical really means understanding customers better and the inter-related nature of their use of services and applications. In my view, this is the ultimate Vertical penetration you can achieve in any customer’s operation.

Saturday, June 7, 2008

It comes at you when you least expect it...

Just when it appears that you've got everything figured out, technology has a way of throwing you a curve ball. These curve balls can sweep away, virtually overnight, conventional wisdom built up over years. Such changes are many times the actions of a "beast" that has been unleashed unknowingly and will eventually change everything. One beast changed the world of computing, and now the question is whether a new beast will kill traditional core routing approaches and change network architecture.

Let's start with the traditional example – the beast that changed computers. Once upon a time, computers were measured in tons and kilowatts, and each design's total production was measured at most in the tens of thousands of units. Then, the beast was unleashed. At the same time IBM was producing a very successful line of mainframe computers, an almost-aside operation created not the first-ever personal computer, but the first personal computer that did not sell just in the thousands - it sold in the millions. The beast unleashed here was the production of millions of microprocessors. Because of tremendous production volume and the demand for more and more performance, this beast rapidly transformed where computing power was found. No longer was it found only in the single computer behind the glass window, or even in the computer room of a department; this beast was now everywhere. And everywhere meant that economy of scale and the huge revenue opportunity for producing faster and better processors started a real exponential ride.

This beast destroyed empires that could not rapidly adapt. Examples include DEC, Data General, CDC, and many others. Others that could, like IBM, changed both the technology of their mainframes (from ECL to CMOS) and the entire direction of the company.

Today, each of us has more than a supercomputer of the recent past on our desks or laps, costing only a few hundred dollars. If we now turn our attention to data networking, is there a beast, and if so, what and where is it? If we draw a parallel to computing, I contend it is the data networking core. Consisting primarily of large routers, these "mainframe" routers cost hundreds of thousands of dollars – millions when fully populated, just like the old mainframes – and moreover are in production runs of a few thousand units.

It seems pretty clear that Ethernet technology is the beast. Most people are familiar with the Ethernet technology that is used to connect computers to a LAN – and more importantly with the fact that you can go to a local office supply or computer store and buy hubs and switches right off the shelf. With 1 Gbps interfaces costing tens of dollars and 10 Gbps interfaces costing around two thousand dollars, this technology beast has dropped the cost of communications equipment dramatically. So if this is the beast, what does it replace? Well, the simple fact is, as I discussed in an earlier post, the cost of an Ethernet port on a carrier-class switch is almost 30-80 times cheaper than the equivalent 10G SONET-based port and almost 10-50 times less expensive than routers with 10G Ethernet interfaces.

So, what does this mean? With the huge increase in bandwidth needs for Internet and enterprise network services, is the current router-based architecture the best way to go? If we draw the conclusion from the computer analogy above, there are hundreds of thousands, if not millions, of high-performance Ethernet switches being sold, compared to most likely two orders of magnitude fewer large-scale carrier routers. Again, just as in the microprocessor example, massive production and standards drove increased performance and features that now serve everything from the laptop to the server room (and supercomputers). Will the Ethernet beast run down the same path? Does it really have the right environment to be game changing?

Well, there are several factors. Just as there is competition on price and performance for processors, there is the same competition on the high-volume fundamental chipsets for Ethernet interfaces and switching. Of course, this is not the entire story, but with the incredible performance of embedded processors and the availability of open source software, all the elements are in place.

This complete environment of high-performance, cheap fundamental switching hardware, high-performance processing for higher-level protocols, and open source software for many of the protocols required by an Ethernet switch (and features for routers as well) is now available. Although at one time open source software was not looked on too warmly by industry, commercial enterprises that have productized open source material and provide traditional software support have changed this attitude, and open source is now viewed as able to support mission critical functions.

The dramatic combination of available hardware and software is the new reality that may have profound and dramatic impact on the current approach to carrier-class MPLS services. Why is this true? In short, it is the need to cost effectively scale. The largest routers today, if we restrict ourselves to a single standard rack, have approximately 160 10G Ethernet ports. The largest and most dense Ethernet switches can be stacked to provide 300 or more 10G Ethernet ports in the same space and with much less power consumption (so, as with almost everything, there is a green angle to this as well). Of course, we have to ask whether these switches will have the stability and software features needed to build carrier-class infrastructure.

The argument that they will is pretty simple. When computer manufacturers believed that microprocessors were toys because of word size, lack of memory management features, and rudimentary operating systems, it did not take very long until microprocessors were the guts not only of laptops and desktops, but of mainframe computers as well, with all the needed features and more. In the Ethernet world, it is going to be a lot easier to continue to upgrade Ethernet switches with features such as MPLS Fast ReRoute (FRR) and VRF features for VPNs that will rival and then supplant traditional router approaches. This environment will also enable new approaches, such as flow-based Layer 2 routing protocols, per-flow authorization for security, and others.

In fact, with research projects such as Clean Slate Internet at Stanford University and the National Science Foundation’s Global Environment for Network Innovations (GENI), there are researchers all around the country that are in a position to take these hardware and software tools and create new components and new models for Internet services.

The battle of the LAN was won by Ethernet, the battle of the Edge is being won by Ethernet, so how far away is the battle for the core of the network? In fact it has already begun and for those whose network core router cheese is based on an old-style computer mainframe approach, it may move faster than you think. What will it do to networking equipment manufacturers? And, what will it do to network service providers?

Saturday, March 29, 2008

Earth Hour for the Internet

Yesterday was the first Earth Hour day where we are supposed to reduce energy consumption by turning off the lights for one hour in a worldwide energy conservation effort. There has been talk about the Internet being part of the solution for saving the planet (e.g., by enabling telecommuting), but is the infrastructure of the Internet really following "Green" principles?

Let's explore some of the places where energy is used in creating Internet services and get a view of how green they are relative to where they could be. First, there is the optical transport layer. This is the layer that provides the "wavelengths" that are used to create the backbone of the Internet. It also provides the wavelengths that are used to create optical rings for high-availability SONET-based private lines and the Ethernet needed to backhaul customer traffic to Internet edge routers.

Efficient use of power for these systems has always been a design goal and customer criteria. The reason for this is quite simple. Components of these systems (e.g., terminals and optical amplifiers) are located at hundreds of facilities that are in many cases literally in the middle of nowhere (and in some cases, 60 miles from nowhere on the way to nowhere). Controlling power consumption is critical to creating high-availability services, as the power needed by these systems determines the maximum run-time on batteries and generators.

The next place, and the place that today uses more and more power, is the data layer. Whereas optical transport systems may have power consumption of around two to four kilowatts per rack, high-end core routers have power consumption in the range of 10 kilowatts per rack. Of course, it is not only the power that matters. Virtually every kilowatt of power input to a router becomes heat that must be removed using HVAC systems that use yet more power. Today, the leading router vendors ship routers with approximately 100 10 Gbps ports per rack, yielding about 100 watts per port.

With the rapid growth of the Internet, each additional 10G of cross sectional bandwidth of the network consumes a significant amount of power. Are we doing everything that can be done to reduce the power consumption of the Internet?

First, let's look at the optical transport system. The major power consumption in the optical transport system is the amplification that takes place roughly every 100 kilometers. Technologies exist today that can reduce the number of amplifiers. These include new optical fibers with less attenuation, leading to less span loss and making it possible to skip existing amplifier locations.
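A simple illustration of why lower span loss translates into fewer amplifier sites follows; the fiber attenuation figures and the 22 dB span budget are assumptions chosen for illustration, not vendor specifications:

```python
# Simple illustration of why lower-loss fiber lets a carrier skip amplifier
# huts: for a fixed per-span loss budget, lower attenuation stretches each span.
# The fiber figures and the 22 dB budget are assumptions for illustration only.

import math

ROUTE_KM = 1000            # length of a hypothetical long-haul route
SPAN_BUDGET_DB = 22.0      # assumed optical loss an amplifier span can tolerate

for label, db_per_km in [("conventional fiber", 0.22), ("lower-loss fiber", 0.17)]:
    max_span_km = SPAN_BUDGET_DB / db_per_km
    amps = math.ceil(ROUTE_KM / max_span_km) - 1   # in-line amps between ends
    print(f"{label}: spans up to ~{max_span_km:.0f} km, ~{amps} in-line amplifiers")
```

Under those assumptions, the lower-loss fiber stretches each span from roughly 100 km to roughly 130 km, dropping a 1,000 km route from nine in-line amplifiers to seven.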

Second, let's examine the core of an Internet backbone network. For most Tier 1 providers it is comprised of approximately 20 core locations throughout the United States. To add 10 Gbps of backbone bandwidth across the network, while retaining the typical goal of keeping any route through the network to at most three core routers, you have to add on the order of 100 backbone circuits. This means over 200 ports need to be added, or over 20 kilowatts of power (not including cooling requirements). With the explosive growth of Internet traffic (discussed in a previous post), power consumption is growing right along with traffic. With traffic today growing at over a 50% compounded annual rate, we are talking about a hot topic.
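Reconstructing that arithmetic, with the post's figures treated as assumptions (and the roughly 20-watt-per-port Ethernet switch figure cited later in this post used for comparison):

```python
# Reconstruction of the rough arithmetic above, treating the post's figures as
# assumptions: on the order of 100 backbone circuits per 10G of added
# cross-sectional bandwidth, two ports per circuit, and ~100 W per router port.

BACKBONE_CIRCUITS_PER_10G = 100      # from the post's estimate
PORTS_PER_CIRCUIT = 2                # one port at each end of a circuit
WATTS_PER_ROUTER_PORT = 100          # ~10 kW racks with ~100 10G ports
WATTS_PER_SWITCH_PORT = 20           # figure for some Ethernet switches, cited below

ports = BACKBONE_CIRCUITS_PER_10G * PORTS_PER_CIRCUIT
router_kw = ports * WATTS_PER_ROUTER_PORT / 1000
switch_kw = ports * WATTS_PER_SWITCH_PORT / 1000
print(f"Adding 10G of cross-section: ~{ports} ports")
print(f"Router-based core: ~{router_kw:.0f} kW (before cooling)")
print(f"Ethernet-switch core: ~{switch_kw:.0f} kW, about "
      f"{1 - switch_kw / router_kw:.0%} less")
```

The same 10G of added cross-section works out to roughly 20 kW with routers versus roughly 4 kW with the assumed switch figure, which is where the power argument for the alternative architecture below comes from.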

As with optical transport, there are possibilities for reducing the power needed in the Internet core. Most carriers today use full-blown routers (e.g., Cisco CRS-1 and Juniper T-series) to provide backbone MPLS switching and IP routing services. The general reason for this is that these platforms have the significant features and proven reliability needed to create a robust and highly available network.

The obvious question is whether there is another core architecture that can provide the same backbone capabilities, but do it with less power. The short answer is yes, but the longer answer still requires some additional evaluation. One approach is to use Ethernet switching instead of router-based MPLS. From a power perspective, some of these devices use less than 20 watts per 10G port, or approximately 80% less electricity than a full-blown router.

However, there is no free lunch. There are reasons that high-performance routers have been used instead of Ethernet switches. These include technical features, operational issues, and robustness. Technical features include limitations on complex access control lists and rate limiting, which are tools that are commonly used to provide protection of network element control planes. Operational issues include the lack of comprehensive Ethernet OAM tools, making it difficult to perform fault detection and isolation, and to identify the root cause of poor performance. Finally, using Ethernet switches still requires backbone protection mechanisms that ensure high-availability backbone services. Today, much of this is done via MPLS Fast Re-Route, and there are Ethernet switches that provide this capability. There are other protection mechanisms, but they are either not robust enough or the technique is not proven on a nationwide scale. Other important features, such as hitless software and hardware upgrades, need improvement.

Finally, BusinessWeek, in its March 20, 2008 issue, has a detailed article (also commented on by Bill St. Arnaud) about the issue of powering and cooling data centers. It is in these data centers that the applications we know and love, such as eBay, Google, YouTube, Yahoo, and others, find their life. Finding "Green" locations, such as Iceland with its geothermal power, and technologies to reduce power consumption is clearly on the minds of corporate executives eager both to reduce costs and to make a little positive PR at the same time.

As is already apparent in the data center business, perhaps the most important question is whether there is an economic advantage for the major Internet providers to move toward more power-efficient transport of IP packets. There is always a significant amount of organizational and technical inertia that keeps network providers from radically changing their approach. However, with the cost of energy increasing and the rapid growth in Internet demand, the additional capital investment needed to keep pace may open up a significant opportunity to move toward both greener technology and greener architectures.

Monday, February 25, 2008

Bandwidth-On-Demand, What's up?

There has been a lively debate lately on the benefits or fallacy of Bandwidth-on-Demand (BoD) schemes. The bottom line is that this debate has been going on since the beginning of time for all telecommunications services, and it comes right down to a simple proposition: what fraction of the system bandwidth is needed to meet the customer's requirement?

Let's dissect what this means. Once upon a time, the telephone represented the peak in technology for providing communications between two points, and initially this represented a kind of BoD, as a customer could request, first through an operator and then via a "dial", to have a circuit set up between two points. We then moved on to data communications technologies such as ISDN, X.25, and Frame Relay. Along with the Public Switched Telephone Network, these technologies enabled customers to set up defined bandwidth between customer end-points.

The reason for this allocation is straightforward: the bandwidth desired by a customer represented a significant fraction of the available system bandwidth. Because of this, any purely statistical best-effort system would lead to unacceptable performance for the customer. The idea of a traffic contract and end-to-end allocation of guaranteed bandwidth made it much easier to convince the customer that an end-to-end dedicated circuit, for example a switched T1 or nx64 Kbps ISDN circuit, was not necessary to ensure their application would work.

The Internet, for the most part, reflects a different approach, which is to kill the problem with backbone bandwidth and ensure that user requirements are a small fraction of the network's capabilities. This all worked, except that the Research and Education community has traditionally been the driving force not only in technology, but also in bandwidth use. It appeared, until the last couple of years, that the R&E community would continue this role, with networks and application requirements that spanned 20 to 40 Gbps of nationwide cross-sectional bandwidth. As I stated in my earlier posts, the Internet bust is now over, and Qwest, as well as other carriers, is now faced with Internet bandwidth requirements that are growing at 50% or more per year. So, what stagnated as a typical Tier 1 20 Gbps nationwide backbone for years is now adding that much capacity every month, if not significantly more - and it is accelerating.

What this all means is that public network infrastructure is growing at a huge compounded rate, and unless the R&E community's requirements grow at the same rate, the fraction of the commercial network infrastructure's resources that R&E networks occupy will rapidly diminish. In fact, if this continues, R&E networks will become a set of large but relatively ordinary customers. This is true at both the optical and data layers.
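A simple compounding sketch makes the dilution point; the starting sizes here are my own assumptions, with commercial growth held at 50% per year and the R&E cross-section held flat:

```python
# Compounding sketch of the dilution argument: if a commercial backbone grows
# ~50% per year while an R&E network's cross-section stays roughly flat, the
# R&E requirement quickly becomes a small fraction of commercial capacity.
# Starting sizes are assumptions for illustration only.

commercial_gbps = 40.0      # assumed commercial cross-section at the start
re_gbps = 30.0              # assumed R&E cross-section (20-40 Gbps per the post)
growth = 1.50               # ~50% compounded annual growth for commercial capacity

for year in range(0, 6):
    fraction = re_gbps / (commercial_gbps * growth ** year)
    print(f"year {year}: R&E requirement is ~{fraction:.0%} of the commercial backbone")
```

Under those assumptions, a requirement that starts at a substantial fraction of a commercial backbone shrinks to around a tenth of it within five years.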

So, the bottom line is that commercial bandwidth capabilities are exploding, and at some point in the near future R&E traffic may be only a large bump in the road, leading to the conclusion that BoD (or switched multi-gigabit pipes) is not necessary. However, this is not the last word, as there are reasons other than cost that drive the creation of special network services.

As I do with our customers, it is up to the members of the R&E community to identify these special requirements and make an argument about why commercial network technology, or services, is not keeping pace with their needs. There are many potential reasons, but over time the threshold for a real difference may continue to rise.

Sunday, January 13, 2008

Changing the WAN: Virtualizing the Core

Maybe it is time for a completely new paradigm of providing Wide-Area Networking (WAN) services. With this as a premise, first let us take a short trip down networking and computing memory lanes.

In the beginning, there was only analog, and this was good. It was good for creating the first telephone networks that connected a nation. It drove the technology of analog amplification, manual and then automatic switching, and eventually connected the world with voice communication. However, we should not forget that it was really teletype that began the real-time connection of the States of the Union and of countries around the world. It is easy to forget that teletype was the first digital network and the first networking technology to encompass ideas such as store-and-forward and message addresses – but I digress. Oh, why not a bit more history: IT&T ordered 50 PDP-1 computers from DEC to automate the torn-tape processing needed to route telegraph messages.

On the next day of networking, motivated by the need to connect computers to people remotely - namely teleprocessing - the development of the Modulator-Demodulator (MODEM) opened virtually any analog phone line to communication with a remote computer. Thus, the first nationwide teleprocessing application, the SABRE airline reservation system, was able to connect people to ticket agents to reservation computers and change the business of flying (and ultimately showed the way to commerce at the speed of light). During this time, data networks, from a digital perspective, were something that existed only at customer locations. Jumping way ahead, the development of packet switching, the ARPAnet, the NSFnet, and the Internet moved these activities away from the edge and created something called the network core. Since then (and actually before Internet Protocol networks, with Layer 2 technologies such as X.25 and Frame Relay), non-trivial activities take place within the core of the network that are unseen by users of that network.

One naturally asks many questions if you are looking to buy, or are a user of, a commercial backbone network service such as a Multi-Protocol Label Switched (MPLS) Virtual Private Network (VPN). Examples include: What is happening in the core and how is it affecting my service? If I buy network services from a provider, how do I know what I am going to get in terms of performance, and how can that information be provided? If I have special requirements, can my provider accommodate them, and if yes, how quickly? This is the essence of the change necessary in WAN services, and it is the same story that moved the computer information processing world from batch, to timesharing, and now to wide-scale compute resource virtualization.

What does this mean? In the early days of computers, the computer was more or less a personal computer. It was personal in the sense that although it was shared among many different users, when a user was actually using the machine, the machine was dedicated for some blocked amount of time just to that one user. Quickly, this was seen as inefficient, and computer operating systems were created that could handle a batch of jobs. This moved the user away from the computer - creating programs off-line and submitting them in a batch queue for eventual execution - improving the utilization of the expensive computer. Two alternatives were then explored as the next direction. The first was the creation of multi-programming and computer timesharing. In this mode, multiple users could access the computer on-line and develop programs or run existing applications. Many businesses sprang up in the late 1960s and into the 1980s to sell time on expensive, centrally located computers. Timesharing was originally touted as providing the user an experience just as if they had the computer to themselves. But of course, this was not really the case, as timesharing users did not have control of the computer; they just had the ability to run user-level applications and store files. Applications that wanted to communicate with the outside world or needed a new feature from the operating system were just not possible.

The real approach to improving what could be done effectively with a computer was the development of true personal computers. Cost-effective desktop machines provided users with the ability and control to do anything, and without the pesky interaction with a systems administrator. Whole new worlds of applications were developed, right down to the networked personal computer on which I am writing this message and the machine on which you are reading it. However, these machines also matured to the point where their processing performance and storage capabilities put them right back into what the good old central computer system looked like - something that today we call a server.

So, now, just as in the batch and timesharing days, we have computers that are essentially centrally located to provide common functions for thousands, if not millions, of users. And this leads us right back to the original problem of not being able to make systems changes to enable new applications while traditional timeshared applications are running on these servers. With the fanfare of a revolution, virtual machine software was one of the biggest news items in 2007 (another history aside - IBM invented this technology with the introduction of the virtual machine operating system in the 1970s). Why? Because it can rapidly change the way that server farms are managed, upgraded, shared, and protected. Because each user of these virtualized machines sees a whole machine and can tailor their operating system environment to what they need. They can obtain performance information and more finely control the interaction between different application sets, improving security and flexibility at the same time.

If you have gotten this far, then you are probably wondering what this has to do with WAN services and rightly so. But, the analogy of batch to timesharing to virtual machines is a direction that networking needs to complete if it is to address those users that require more status and control over the shared commercial resources they use to build their network.

Here we go. The initial flat Internet Protocol address space created by using core IP routers defined the core network as a batch environment. All addresses went into a global IP route table, and thus the Internet connecting billions was born. However, as in a batch system, everyone sees everyone else on the network, and except at the edge there is no way to provide security between users. Clearly, this was not acceptable, so IPSec tunnel technology - an edge-to-edge solution - was created, but it was not seen as robust enough for enterprise use (this is why Layer 2 services such as ATM and Frame Relay continued in spite of IP technology). The next leap was the development of the MPLS VPN. This system is similar in concept to timesharing computer resources. Users don't see or control the router, but they do have a virtual route table that separates each user from other users.

Now, what is the last step and why is it so important? With virtual machines, companies can control or obtain computing resources and customize each virtual environment to meet the application's needs. In the current MPLS VPN space, the user gets whatever features the networking provider has decided to provide. If there are additional features or performance reporting desired by the user, the user must either take what is given, find another service provider that will meet the requirement, or make a feature request and wait for the provider to deliver on that request. This is exactly the reason why network hardware features do not find their way into the hands of the users. Everything that must be done at the systems level on the core network must go through a carrier's typically lengthy productization processes. Even then, some features will be included and others left out for the future, making an endless cycle of customers waiting for what they want.

How can the game change? Carriers need to get out of being timeshared router system administrators and let customers define what they need and how they want to operate their WAN networks. Let customers have access to virtual routers in the WAN. Let the customer define what routing protocol to use, how to configure user interfaces, what SNMP variables to read, and the number of class-of-service queues and their configuration. This approach makes networking look like the virtual server farm. The server farm is characterized by physical parameters, such as memory, processing speed, and storage. Users are assigned resources on a set of virtual machines and can then do whatever they want. Doing this with a data network is very much the same. The physical parameters are network connectivity and bandwidth, and each user is assigned resources on a set of virtual routers. The server farm provider only needs to test that the virtual machine software ensures that one virtual machine cannot crash another virtual machine. The network provider need only do the same thing, ensuring that one virtual router cannot affect another.
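To make the idea a bit more tangible, here is a purely hypothetical sketch of what a customer-defined virtual router slice might look like as a data model; every field, name, and limit is illustrative, not any carrier's offering:

```python
# Hypothetical sketch of the "virtual router slice" idea: the carrier allocates
# physical resources and enforces isolation, while the customer defines the
# behavior. Every field and name here is illustrative only.

from dataclasses import dataclass, field

@dataclass
class VirtualRouterSlice:
    customer: str
    routing_protocol: str                 # e.g., "OSPF" or "BGP", customer's choice
    interfaces: dict[str, str]            # interface name -> customer IP/mask
    cos_queues: list[str]                 # class-of-service queues, customer-defined
    snmp_oids: list[str] = field(default_factory=list)   # counters exposed to customer

    def validate(self, max_queues: int = 8) -> None:
        """The carrier's only real job: enforce the slice's isolation limits."""
        if len(self.cos_queues) > max_queues:
            raise ValueError("queue count exceeds the allocated slice resources")

slice_a = VirtualRouterSlice(
    customer="Acme Manufacturing",
    routing_protocol="OSPF",
    interfaces={"ge-0/0/1.100": "10.20.0.1/30"},
    cos_queues=["voice", "video", "best-effort"],
    snmp_oids=["ifInOctets", "ifOutOctets"],
)
slice_a.validate()
print(f"{slice_a.customer}: {slice_a.routing_protocol}, {len(slice_a.cos_queues)} CoS queues")
```

The carrier validates only the resource and isolation limits; everything inside the slice is the customer's to configure, which is the division of labor the paragraph above argues for.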

This is the proverbial win-win for the service provider and the customer. The service provider only needs to do what any shared resource owner needs to do: allocate virtualized resources, provide capacity upgrades (e.g., higher capacity routers and more backbone bandwidth), and monitor the health of the virtual routers provided to customers. Testing now becomes simpler, as only the virtual router mechanism and a set of baseline router features need to be tested. The customer can now use the baseline router features any way they want, and since the customer has leased a slice of a set of virtual routers, support for other features within the router goes back directly to the router vendor. This gives hardware providers an incentive to make new features available, as those features can actually be used - without the year or more lag that normally delays the deployment of a feature in a commercial service provider core.

The time is upon us to stop the over-control of commercial backbone networks and place the appropriate – that is, virtual – control in the hands of the people that really use the network. Carriers that do not develop this capability for large-scale enterprise customers may find themselves in the same place as the failed computer timesharing companies of the 1980s.