Connectivity down in OSL
Incident Report for Servebolt
Postmortem

On Monday, May 4th, at approximately 18:07, we experienced a network outage at our Oslo data center. This resulted in service disruption for all websites hosted in this data center.

After any significant event that affects our customers, we conduct an extensive examination to understand the root cause and develop a course of action to improve our systems and procedures. To that end, we want to provide a synopsis of what occurred, along with our reassurance that we are working diligently to mitigate and prevent future outages.

Here's what happened

A microblade switch in our Oslo data center crashed, causing internal network loops, which in turn resulted in a partial outage for certain servers in the data center.

A large portion of the servers came back online automatically right away. A small portion did not. We identified the root of the issue at 18:28 and implemented a fix at 18:36. By 19:18, the fix had brought all servers back to full capacity.

Here's what we're doing

We are in the process of modifying the current network architecture to prevent or reduce the impact of any single device failure by improving our monitoring and failover triggering. As part of this, we are adding core switches to considerably strengthen the network.

Outages disrupt your life and your business. We understand and we take our responsibility to you very seriously.

Please allow me to take this opportunity to thank you for your business and provide my personal assurance that we are dedicated to meeting our commitment to you.

Sincerely,

Erlend Eide
CEO

Servebolt.com

Posted May 07, 2020 - 10:34 CEST

Resolved
This incident has been resolved.
Posted May 04, 2020 - 20:53 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 04, 2020 - 19:18 CEST
Update
Most servers are now back up, but we are still missing a few.
Posted May 04, 2020 - 18:36 CEST
Identified
The issue has been identified and a fix is being implemented.
Posted May 04, 2020 - 18:32 CEST
Investigating
We are currently investigating this issue.
Posted May 04, 2020 - 18:07 CEST
This incident affected: admin.servebolt.com and Servebolt Cloud OSL.