This incident is now over and services should be working as normal, although file access may be a bit slow while Ceph rebalances and recovers.

The original cause seems to have been a bad optical cable in the datacenter. We're preparing an incident doc and I'll send that along in a follow-up email.

-Andrew + wmcs team


On 6/11/24 10:15 AM, Andrew Bogott wrote:
There is an as-yet undiagnosed issue with our storage system (Ceph) which is causing serious failures throughout Cloud VPS and Toolforge.

Multiple people are working on the issue, so watch this list for updates.  Sorry for the downtime!

-Andrew + wmcs team

_______________________________________________
Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
