Status Updates - Dediserve

RCA - Singapore HV1 issues

Scheduled on: 18/04/2018 22:00:00
Status: Resolved
Fault / Issue
Estimated finish: 19/04/2018 06:00:00

Dear Dediserve Singapore users,

Some of you may have experienced trouble connecting to your VMs in our Singapore datacentre last night between 22:00 GMT and 06:00 GMT. We apologise for the interruption. The details of the issue are set out below.

What was the issue?:
At 22:00 GMT on 18 April, the BL460c compute blade that handled the role of "Hypervisor 1" began reporting critical issues with its motherboard. As a result, we had to act immediately and shut down the hypervisor so that it could be replaced and the integrity of the service ensured.

What was the root cause?:
The motherboard of the host (Hypervisor 1) entered a critical state, indicating imminent hardware failure.

Why was there downtime?:
To ensure the integrity of the data, we had to power down the host immediately while the data stored on it was still in a good and viable state. Attempting to migrate VMs while the motherboard was flagging issues could have been disastrous had power failed during migration.

What has been done to address this?:
The previous blade has been replaced with a new, upgraded host, which now has dual 12-core CPUs rather than dual 8-core CPUs.

Why did this take the time it did?:
We first had to ensure the integrity of all stored data before ordering the replacement of the host. Our engineers then had to arrive on-site to perform the work. Once the work was complete, we verified the data again before bringing the VMs back online.

What was the impact?:
The impact was limited to VM unavailability. Thanks to the diligence of Dediserve staff during this period, the integrity of all data associated with this host was preserved.

Will this happen again?:
All Dediserve hardware is rigorously tested prior to deployment, and whilst every effort is made to avoid hardware failures, they do still occur. It is very rare to see a motherboard degrade instantly from OK to critical, and we do not expect to see this issue again with this host.

Again, apologies for the issues experienced.
