tl;dr: a random assortment of security group/firewall rules were briefly broken which caused networking issues for some cloud-vps services. This is now resolved.

This outage began at 12:51 PMUTC, was mostly resolved at  13:25 UTC and full resolved at 13:35UTC.

Full (still short) story:

As part of ongoing efforts to make all of cloud-vps work with IPv6, we just now ran an automated script to expand existing securitygroup rules to ipv6 access. A bug in that script effectively destroyed existing rules rather than updating them, which caused many open doors to unceremoniously slam shut.

In order to recover from this as quickly as possible, we restored a backup of the network database from earlier today (04:28 UTC). This restored service but also wiped out any networking changes that might have been made during that time. This means that if you happen to have changed your security groups since then, you will need to repeat your changes.

The still-in-progress incident report can be found here: https://wikitech.wikimedia.org/wiki/Incidents/2025-05-07_cloud-vps_security_groups_deleted

- Andrew +WMCS staff, Slightly embarassed
_______________________________________________
Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/

Reply via email to