Skype offered a detailed explanation Wednesday for the “critical failure” that prevented millions of Skype users worldwide from placing or receiving calls last week. According to the VoIP operator, a bug in an older version of the Skype for Windows client software caused 25 to 30 percent of the service’s supernodes to fail.
Like any other peer-to-peer (P2P) network, Skype relies on supernodes, which take on responsibilities beyond those of regular nodes: they act as directories, support other Skype clients and create local clusters, noted Skype CIO Lars Rabbe.
“Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again,” Rabbe wrote in a blog post. “As a result, the P2P network was left with 25 percent to 30 percent fewer supernodes than normal,” which caused “a disproportionate load on the remaining available supernodes.”
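To get a rough sense of the scale involved, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration only: it assumes total demand stayed constant and was spread evenly across the surviving supernodes, which Skype's account does not state, and the function name `load_multiplier` is hypothetical.

```python
def load_multiplier(failed_fraction: float) -> float:
    """Relative load on each surviving supernode after some fraction fails.

    Assumption (not from Skype's account): total demand is unchanged and
    is shared evenly among the supernodes that remain available.
    """
    if not 0 <= failed_fraction < 1:
        raise ValueError("failed_fraction must be in the range [0, 1)")
    return 1.0 / (1.0 - failed_fraction)


# The two ends of the range Skype reported: 25 to 30 percent of supernodes lost.
for fraction in (0.25, 0.30):
    print(f"{fraction:.0%} failed -> {load_multiplier(fraction):.2f}x load per remaining supernode")
```

Under that simplified model, each surviving supernode would carry roughly a third to two-fifths more load than normal. The real increase may well have differed, since the even-sharing assumption is a simplification.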
Outages Are Inevitable
To recover the core system functionality as quickly as possible, Skype used resources normally dedicated to supporting group video calling to deploy supernodes, Rabbe explained. “Over the course of Thursday night and Friday morning, we returned these to their normal use and restored group video calling functionality in time for Christmas,” he added.
Last week’s outage underlined the risks involved with P2P-based communication services. “Many users felt lost, often also in corporations that rely on Skype for communication with remote clients and employees,” noted Gartner Vice President Andrea Di Maio. “I guess that we have to learn,” he wrote, that “nothing will ever be 100 percent reliable.”
Online outages are inevitable — as the service disruptions that occurred at Facebook, Twitter and Google’s Gmail service have already demonstrated this year. Still, Skype’s latest snafu came as an unpleasant pre-Christmas surprise for many business users, which Di Maio found…