Sunday, May 6, 2012

The Internet Kill Switch


For many of us, the question “Where were you when the lights went out?” still means: where were you stuck during the November 9, 1965 blackout in New York City?  Even though I am older than I look, my only fleeting memory is that my Mom would not let me open the refrigerator and that we had to leave our apartment in the Bronx to hunt down some milk.

However dramatic it was, the main effects that day were people stuck on the subway or stranded because all of the traffic lights were out.  Of note, phone service was not impacted.  The economic impact was that people could not get to factories or their offices, but commerce outside of the region was only minimally affected.  New procedures were put in place to protect against a recurrence of such a massive blackout affecting over 10 million people.  However, these and subsequent technology and procedure changes did not stop:
  • The July 13, 1977 blackout that mostly affected New York City
  • An even larger blackout on August 14, 2003 that affected over 45 million in the USA and 10 million in Canada
At our recent EIS 2012 conference at the Global Learning Center at Georgia Tech, I pondered the question of the impact of an Internet blackout and its effect on our lives, commerce, and the safety of the nation.  This was triggered by observing the following as I boarded my flight from Dulles to Hartsfield-Jackson:
The Internet Kill Switch on an airplane built before the Internet!
To put this into context, this is on a McDonnell Douglas MD-88 aircraft.  The MD-88 deliveries started in 1987 with the last delivered in 1997.  So, this plane is around 20 years old.  Surrounded by potentiometer controls for sound, the INTERNET SWITCH MUST BE ON AT ALL TIMES label stands out as a metaphor that the Internet surrounds us almost everywhere we go, even at 35,000 feet.  I just hope that this is never connected to anything related to the control of the plane!

At the conference I asked what the impact would have been if the Internet had failed during different eras.  I started with the 1980s:

The network before the Network
Clearly, in 1980, the impact of a failure of the Internet (really at this point the ARPANET, there being no Internet yet) was virtually zero.  The networks for the government (e.g., command and control, space flight, etc.) and the nascent networks for commercial purposes had essentially zero commonality.  In fact, most of the services were still provided by dedicated analog phone lines and the emerging digital transmission of the T1 and DS3 family.  The ability for nefarious activity, particularly sourced from one location, to affect these archaic and diverse systems was virtually zero.

I stepped through the 1990s (the common consensus was that nothing “happened” in the ’90s), quickly pointing out the emergence of the Internet and the beginnings of an Internet economy.  However, even in the 1990s, flipping the Internet to the “off” position, especially early on, would have been an inconvenience for many, but of major impact to only a very few.

Skimming the 2000s, I pointed out that in spite of Time Magazine’s cover, the millennium really started in 2001 (there was no year zero).  However, at this point, the Internet economy, although it burst a bubble, started to really change the way that people worked and lived.  It became integral to commerce, to reaching customers, and to radically new business models.

Now, as we reach the 2010s, the impact is everywhere.  There are around a billion devices that attach to the Internet.  In many countries, wireless data services reach places the wired infrastructure never did.  In fact, even in the USA, more than 30% do not even have a dedicated home phone.  And even for those that still do, the technology in the background may be using the same technology and network infrastructure as the Internet.  Beyond iPads and Android devices, Internet-based services have now moved directly into our cars and may again change business models (Mobile Technology Killing Satellite Radio).

So, what happens now if the Internet Kill Switch is moved to the ON position (that is, turning the Internet off)?  Of course, the issues affecting the performance of the Internet could come in several different varieties, or a combination of them:
  • One of the Tier 1 carriers has a common-mode failure that causes its part of the Internet to fail.  This would cause a significant disruption to customers and businesses, but one would believe that this would be taken care of fairly quickly.  In 1990, AT&T’s digital voice network had a several-hour outage that was caused by issues in its Signaling System 7 (SS7) control network.  The root of the problem was that the same bad code (there is a difference between ‘=’ and ‘==’ in the programming language ‘C’) was common across the SS7 nodes in the network.  Today, most Tier 1 providers’ “core” MPLS/IP networks are built using a single vendor.  Service providers test like crazy to find potential problems, and they treat all software upgrades with suspicion and perform significant testing.  Reloading and restarting the backbone network could take hours or a few days, but overall is under the control of the service provider and its vendors.
  • The “bad guys” find magic bullet packets, sequences, or other vulnerabilities in the Internet that can be used to cause significant routing problems to propagate around multiple Internet backbone providers through their peering points.  Other vectors include compromising the typical Out-Of-Band (OOB) control network that is used to configure and monitor network devices.  This could cause one or more Tier 1 providers to start closing peering points, resetting routers, or even taking their network down to reload clean software and firmware on their routers.  The service provider would probably take steps to ensure the isolation of its management network, and then the restart of the backbone routers is under its control and that of its vendors.
  • The other Internet Kill Switch, the one that gets talked about, is one under some sort of control of the government.  What criteria would be used?  Even the first item above, one that has nothing to do with hacking or Internet terrorism, could look like an attack on the provider’s network.  Do we flip the Internet into the OFF position and cause self-inflicted damage?
Clearly, the Internet now is so pervasive that it has become an essential utility for virtually every aspect of life.  Even local Internet outages cause significant disruption in commerce.  However, if we go back Before the Internet (B.I.), the exact same was said about our phone system and our railroad network.  I am not that old, but maybe there was discussion about a Telephone System Kill Switch?  Or, do we make the generally valid assumption that major companies that form the Tier 1 structure of the Internet do the right thing?  That is, it is in their commercial interest (corporate existence) to make their service as robust and reliable as possible to keep and attract customers?

The top Internet providers should show how they prepare for serious events, both natural and man-made.  This includes providing time estimates for restoring services in a variety of situations, and explaining how this is coordinated with the other major network providers.  This is the only way to estimate the risk to our daily Internet conveniences and, more importantly, to our country’s economic health.  For example, the plans put into place by network providers ensured that during the massive 2003 blackout their equipment remained powered, and emergency deliveries of fuel for generators essentially went along as planned.  Do we have equally well-coordinated plans among the service providers for other issues:
  • Route repair coordination and optical transport bandwidth sharing?
  • Point-of-presence facility failure?
  • Depot and sharing of routers and other equipment in an emergency?
  • Response to large scale attacks at peering points?
  • Coordination with government to ensure situation awareness and prioritization of restoration efforts?
  • Coordination to limit the cross Tier 1 impacts to major application service providers to protect e-commerce?
I am not looking forward to the time when the topic of the day is: “Where were you when the Internet went out?”  In a future post, we'll look at "Where were you when the Cloud went out?"