Storage slowdown on Ceph platform
Incident Report for Civo
Resolved
The platform has remained stable overnight and throughout today. We are now closing this incident. Thank you for your patience and understanding on this matter.
Posted Jul 15, 2020 - 14:09 BST
Update
Ceph has now finished balancing data and the cluster is back in a healthy status. We will continue to monitor the platform for the next 24 hours to make sure everything is working as expected. Once again, we thank you all for your patience.
Posted Jul 14, 2020 - 15:32 BST
Update
The system is still balancing data over the remaining healthy disks. Access to instances running on this platform will still be slow, as Ceph is handling real-world requests while also trying to balance data over healthy disks. As soon as disk access requests are back to a reliably fast speed, we will update this thread.
Posted Jul 14, 2020 - 13:29 BST
Update
Unfortunately, another disk in the platform has failed. Data is now rebalancing again and requests to the platform are going to be slow. We will update this thread when we have more information.
Posted Jul 14, 2020 - 10:33 BST
Monitoring
Ceph has now recovered and the platform is fully accepting reads and writes. We are still monitoring this, as it is still very much in the early stages of being stable again. We will update this thread once we are happy that Ceph is fully stable and everything is as fast as it should be. Once again, we apologise for the inconvenience caused to you and appreciate your patience with this.
Posted Jul 14, 2020 - 09:25 BST
Update
We now have the Ceph cluster out of an "Error" status and into a "Warning" status. The platform is now allowing reads and writes again; however, access is still very slow. Ceph is still migrating data over to healthy disks. We are having to let Ceph do the migration and do not want to intervene in this, as it could cause more issues. As soon as Ceph is operating normally again we will update this thread. We are sorry for all of the disruption to users on this storage platform.
Posted Jul 14, 2020 - 07:38 BST
Update
The Ceph platform is still trying to rebalance data over its disks and is blocking requests whilst it does so. We are trying to see if we can speed up the process without causing any more harm to the recovery. As soon as we have more information we will update this thread.
Posted Jul 14, 2020 - 04:22 BST
Update
During this time, please avoid restarting your instance if at all possible. Thanks.
Posted Jul 13, 2020 - 23:57 BST
Identified
We are currently investigating a slowdown in storage access to our Ceph platform. Customers on this platform may notice slower-than-normal access to their instances. It appears some disks have failed on the platform and the data is currently rebalancing over the healthy disks. We will update this thread when more information is available. We apologise for any inconvenience caused.
Posted Jul 13, 2020 - 23:16 BST
This incident affected: Ceph.