tl;dr: a random assortment of security group/firewall rules were briefly
broken which caused networking issues for some cloud-vps services. This
is now resolved.
This outage began at 12:51 PMUTC, was mostly resolved at 13:25 UTC and
full resolved at 13:35UTC.
Full (still short) story:
As part of ongoing efforts to make all of cloud-vps work with IPv6, we
just now ran an automated script to expand existing securitygroup rules
to ipv6 access. A bug in that script effectively destroyed existing
rules rather than updating them, which caused many open doors to
unceremoniously slam shut.
In order to recover from this as quickly as possible, we restored a
backup of the network database from earlier today (04:28 UTC). This
restored service but also wiped out any networking changes that might
have been made during that time. This means that if you happen to have
changed your security groups since then, you will need to repeat your
changes.
The still-in-progress incident report can be found here:
https://wikitech.wikimedia.org/wiki/Incidents/2025-05-07_cloud-vps_security_groups_deleted
- Andrew +WMCS staff, Slightly embarassed
_______________________________________________
Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/