On Wednesday, March 9th, at approximately 08:15 CET, one of the servers in our London data center suffered an outage. This disrupted service for all websites hosted on that server.
Whenever a significant event affects our customers, we conduct a thorough examination to understand the root cause and to develop a plan for improving our systems and procedures. To that end, we want to provide a summary of what happened and to reassure you that we are working diligently to prevent similar outages in the future.
Earlier that day, at 04:10 CET, we applied urgent security updates to the Linux kernel. We apply kernel security updates like these frequently, and they usually take no longer than 10 to 30 seconds to complete. We actively monitored the server's performance afterward, and it was performing as expected.
We had already applied the same update to about 40 servers without any problems. However, about four hours later, at approximately 08:15 CET, MariaDB began to crash, escalating to a full outage by 09:10 CET.
Our Operations team began working on the problem and quickly found that the MariaDB logs had been corrupted.
As a precaution in case the data turned out to be permanently damaged, we started a full restore from the backup server to a spare server. Meanwhile, Operations continued working to recover the corrupted databases. By 11:20 CET we had confirmed the full recovery of 97% of the affected databases. Unfortunately, the remaining 3% had to be restored from the backup server.
At 13:00 CET all databases had been recovered or restored, and the incident was closed.
Our investigation identified the root cause as incompatible firmware versions. Going forward, we will add further steps to ensure that such incompatibilities are mitigated and taken care of separately from emergency security updates.
We understand that outages disrupt your life and your business, and we take our responsibility to you very seriously. We sincerely apologize for the disruption and any inconvenience this may have caused you.
Please allow me to take this opportunity to thank you for your business and provide my personal assurance that we are dedicated to meeting our commitment to you.