This incident is now over and services should be working as normal,
although file access may be a bit slow while Ceph rebalances and recovers.
The original cause seems to have been a bad optical cable in the
datacenter. We're preparing an incident doc and I'll send that along in
a follow-up email.
-Andrew + wmcs team
On 6/11/24 10:15 AM, Andrew Bogott wrote:
There is an as-yet-undiagnosed issue with our storage system (Ceph)
which is causing serious failures throughout Cloud VPS and Toolforge.
Multiple people are working on the issue, so watch this list for
updates. Sorry for the downtime!
-Andrew + wmcs team
_______________________________________________
Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/