cyber Archives - AstrotalkUK

Suddenly, at 16:51 BST on 4th October 2021, Facebook disappeared from the Internet for all of its 3 billion users, no matter where in the world they were. There was no warning, and the experience was the same for the head of a large commercial organisation as it was for a first-year university student using a low-cost Android phone. Users of Instagram and WhatsApp, also owned by Facebook, suffered the same experience. The outage started at around 16:50 BST and service returned at around 22:20 BST. The impact was high because Facebook, a single company, is so large.

Facebook availability. Source: Cloudflare

The “what and why” is gradually emerging. The most surprising thing for me is that it was NOT a cyber attack. There was no malicious software, no ransomware, no DDoS, and no hackers or disgruntled former employees. However, by chance, around the time of the outage, Frances Haugen, a former Facebook employee in the US and now a whistleblower, was giving testimony to Congress that Facebook prioritised profit over harm to children.

Facebook explained the cause in their 259-word blog post: “Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication”. Many independent sources, including Reuters and Cloudflare, also provided explanations.

The failure that prevented users from accessing Facebook also obstructed Facebook engineers attempting to fix it. Apparently, the systems used by Facebook for physical and logical access to its own buildings were also affected by the same outage.

In simple terms, the error involved two of the internet’s many interconnected sub-systems: the Domain Name System (DNS) and the Border Gateway Protocol (BGP). The DNS converts a domain name like facebook.com to the IP address of a server (one of many around the world) hosting the Facebook application. BGP provides the routing information that networks on the Internet use to reach one another. In this case, it allows data from one Facebook data centre in, say, South Africa to find another in Norway.

Like signs on the motorway, BGP provides drivers with directions to their destination. The “configuration change” that went wrong on 4th October meant that suddenly all the motorway signs (the BGP routes) went blank (and DNS could no longer see Facebook). The drivers could not see how to get to their destination and the traffic came to a halt.
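The motorway analogy can be sketched as a toy model in Python. This is only an illustration of the idea, not a description of Facebook's network: the domain name, the addresses (taken from the ranges reserved for documentation) and the class and function names are all made up for the example.

```python
import ipaddress

# Hypothetical toy data: documentation-range addresses, not Facebook's real ones.
DNS_RECORDS = {"example-social.test": "203.0.113.10"}
NAMESERVER_IP = "198.51.100.53"


class RoutingTable:
    """A toy stand-in for the routes the rest of the Internet learns via BGP."""

    def __init__(self):
        self.prefixes = set()

    def announce(self, prefix):
        self.prefixes.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix):
        self.prefixes.discard(ipaddress.ip_network(prefix))

    def is_reachable(self, address):
        ip = ipaddress.ip_address(address)
        return any(ip in prefix for prefix in self.prefixes)


def resolve(name, routes):
    """Return the IP for a name, or None if the nameserver cannot be reached."""
    if not routes.is_reachable(NAMESERVER_IP):
        return None  # the DNS query never arrives: the "sign" is blank
    return DNS_RECORDS.get(name)


def fetch(name, routes):
    """Describe what a user would experience when trying to connect."""
    ip = resolve(name, routes)
    if ip is None:
        return "DNS lookup failed"
    if not routes.is_reachable(ip):
        return "no route to host"
    return f"connected to {ip}"


routes = RoutingTable()
routes.announce("198.51.100.0/24")  # prefix containing the nameserver
routes.announce("203.0.113.0/24")   # prefix containing the web servers
before = fetch("example-social.test", routes)

routes.withdraw("198.51.100.0/24")  # the faulty "configuration change"
routes.withdraw("203.0.113.0/24")
after = fetch("example-social.test", routes)

print(before)
print(after)
```

Once the routes are withdrawn, the servers themselves are untouched, yet nobody can find the nameserver, so the very first step of reaching the site fails.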

Although the outage lasted for just 6 hours, it had a huge global impact on individuals, businesses and governments that rely on Facebook for communication, data transfer, payments and education.

Facebook did not explain why this update, something they would have done many times in the past, went awry. It is unclear whether this was a planned or an unscheduled update, or why there was no simple rollback mechanism in place for exactly these eventualities.
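The kind of safeguard the paragraph above has in mind, usually called a rollback, can be sketched in a few lines. This is a minimal illustration under simple assumptions, not how Facebook manages its routers: the configuration dictionary, the faulty change and the health check are all hypothetical.

```python
import copy


def apply_with_rollback(config, change, health_check):
    """Apply `change` to a copy of `config`; keep it only if the check passes."""
    candidate = copy.deepcopy(config)
    change(candidate)
    if health_check(candidate):
        return candidate, "applied"
    return config, "rolled back"  # the live configuration was never modified


# Hypothetical router configuration, for illustration only.
live = {"announced_prefixes": ["198.51.100.0/24", "203.0.113.0/24"]}


def bad_change(cfg):
    cfg["announced_prefixes"].clear()  # withdraws every route by mistake


def still_announces_routes(cfg):
    return len(cfg["announced_prefixes"]) > 0


result, status = apply_with_rollback(live, bad_change, still_announces_routes)
print(status, result)
```

The design choice here is to test the change on a copy before it goes live. In a real network the hard part is that the tools needed to undo a change may themselves depend on the network that has just been broken.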

However, independent security specialists cannot rule out the possibility of sabotage or other sinister activity.

This outage was limited to one company, albeit with a huge user base. A similar outage for Google, Amazon or Apple would potentially have a larger impact, affecting many more applications and businesses. The internet was designed and built around TCP/IP (Transmission Control Protocol / Internet Protocol). It has resilience at its core. That resilience still stands. This incident illustrates the age-old problem of too many eggs (users) in a single basket (Facebook).

Quick update.

Downdetector recorded a further Facebook outage starting late on October 8th and running into the early hours of the 9th. This was a far less significant outage that lasted just a couple of hours and probably had a different cause than Monday’s. Here is how CNN reported it.

Facebook has provided a further update explaining the 4th October outage.

A new year and a new cyber threat. This time the vulnerabilities are baked into the design of the microprocessors delivering most of the IT services on the planet. Virtually all devices, independent of operating system or installed applications, could be affected. It is not just laptops and PCs but almost all devices, including tablets, smartphones and virtual servers, and the impact extends to all vendors including Microsoft, Google, Amazon and Apple.

The vulnerabilities come from serious security flaws in “speculative execution”, a technique that enhances the performance of modern processors made by Intel, AMD and ARM. The vulnerabilities, with their snazzy names Meltdown and Spectre, were discovered and reported to microprocessor manufacturers in June 2017 by Google’s Project Zero team, along with two papers (Spectre and Meltdown) published by independent researchers around the same time. The difference between Spectre and Meltdown is summarised by https://meltdownattack.com/ as

“Meltdown breaks the mechanism that keeps applications from accessing arbitrary system memory. Consequently, applications can access system memory. Spectre tricks other applications into accessing arbitrary locations in their memory. Both attacks use side channels to obtain the information from the accessed memory location.”
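The side-channel idea in that summary can be illustrated with a toy model. This is a deliberately simplified simulation, not a working exploit: Python cannot observe a real processor cache, so the `ToyCpu` class, its timing values and the three-byte secret are all invented for the sketch. It shows only the principle that a read which is officially refused can still leave a measurable trace.

```python
class ToyCpu:
    """A deliberately simplified model; real processors are far more complex."""

    def __init__(self, secret_memory):
        self._secret = secret_memory  # memory the program is not allowed to read
        self.cache = set()            # probe-array lines currently "cached"

    def read_secret(self, address):
        """The read is refused, but the speculative work done before the
        permission check leaves a trace in the cache."""
        value = self._secret[address]
        self.cache.add(value)         # models a speculative touch of probe[value]
        raise PermissionError("access denied")

    def access_time(self, line):
        """Cached lines are fast (1 unit); uncached lines are slow (100 units)."""
        return 1 if line in self.cache else 100


def leak_byte(cpu, address):
    cpu.cache.clear()                 # start with an empty probe array
    try:
        cpu.read_secret(address)
    except PermissionError:
        pass                          # the attacker never sees the value directly
    timings = [cpu.access_time(line) for line in range(256)]
    return timings.index(min(timings))  # the one fast line reveals the byte


cpu = ToyCpu(b"key")
recovered = bytes(leak_byte(cpu, i) for i in range(3))
print(recovered)
```

The attacker in this model is never handed the secret. It is inferred purely from which of the 256 probe lines responds quickly, which is why this class of attack is called a side channel.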

Spectre is not easy to exploit but has no fix. Meltdown is arguably the more critical of the two because it can be exploited in a cloud computing environment. Over the last decade, cloud computing services have blossomed and now deliver most of the popular applications used by online consumers, governments and industry. Multi-tenancy is the mechanism by which cloud service providers share computer resources (including processor, memory and storage) between multiple customers whilst ensuring secure segregation between them. Meltdown has the potential to undermine this fundamental principle of user segregation in a cloud-based service. Attackers in one cloud-based tenant can exploit Meltdown to access and download data (at 503 KB/s) from a neighbouring tenant that they are not authorised to see.

The flaws were due to be publicised next week but some news agencies, including The Register, published on Tuesday 2nd January. The vulnerabilities are not easy to exploit and, according to the NCSC, no exploits have yet been reported, but eventually the cost of the fix will be humongous. The proposed software fixes reduce CPU performance by around 20% to 30%. Commercially that may be too high a price to pay. At present, the ultimate fix appears to be a hardware one.
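To see why a 20% to 30% slowdown is commercially painful, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, a fleet of 1,000 servers whose workload scales linearly with CPU performance; the function name and the fleet size are hypothetical.

```python
import math


def servers_needed(current_servers, slowdown_percent):
    """Servers required to keep the same total throughput after a slowdown.

    Each server retains (100 - slowdown_percent)% of its performance, so the
    fleet must grow by a factor of 100 / (100 - slowdown_percent).
    """
    return math.ceil(current_servers * 100 / (100 - slowdown_percent))


for slowdown_percent in (20, 30):
    extra = servers_needed(1000, slowdown_percent) - 1000
    print(f"{slowdown_percent}% slowdown: {extra} extra servers")
```

Note that losing 30% of performance requires about 43% more hardware, not 30% more, because the replacement servers are slowed down too.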

Facebook Outage – The bigger they are the harder they fall

New Year and New Cyber Vulnerabilities – Spectre and Meltdown
