Sky didn’t fall but Google’s cloud did; massive outage allegedly triggered by ‘configuration change’

by WorldTribune Staff, June 4, 2019

The Internet was having a bad day on June 2, and a Drudge Report headline suggested the U.S. government was involved. As with anything involving bureaucracy and high technology, the answer to the question of what happened got lost in a cloud of words.

A massive outage affected major tech brands that use Google Cloud as well as Google services including YouTube, Gmail, Google Search, G Suite, Google Drive, and Google Docs.

ZeroHedge tweeted that “The notification gateway to advise Google that cloud is down, is also on the cloud… and is down.”

Facebook, Instagram and Snapchat were also affected and, if that wasn’t bad enough, Pokemon Go was down.

Benjamin Treynor Sloss, Google’s VP of engineering, said in a blog post that the root cause of the outage was a configuration change intended for a small group of servers in one region that was wrongly applied to a larger number of servers across several neighboring regions.

“Overall, YouTube measured a 10 percent drop in global views during the incident, while Google Cloud Storage measured a 30 percent reduction in traffic,” said Sloss.

An “especially annoying” side effect of Google Cloud’s downtime “was that Nest-branded smart home products for some users just failed to work,” Fast Company noted. “According to reports from Twitter, many people were unable to use their Nest thermostats, Nest smart locks, and Nest cameras during the downtime. This essentially meant that because of a cloud storage outage, people were prevented from getting inside their homes, using their AC, and monitoring their babies.”

The downtime “goes a long way to showing what can happen in an age when smart home technology requires always being connected to the cloud,” Fast Company said.

Sloss noted that, while Google’s engineers detected the issue “within seconds,” it took “far longer” than the company’s target of a few minutes to remediate the problem, in part because the network congestion hampered engineers’ ability to restore the correct configurations.

Additionally, as one Google employee explained in a Hacker News post, the disruption took down internal tools that Google engineers had been using to communicate with each other about the outage.

