Hello!

Next week we'll be rebuilding and upgrading the hardware that provides DNS service to cloud-vps and toolforge. The rebuilds will start at 14:00 UTC, and the whole process may take 2-3 hours. DNS lookups will likely be somewhat slower during that time, as clients fail over from the server being rebuilt to the one that is still working. In theory, there should be few other user-facing effects from these upgrades.

In practice, though, this isn't something that we've done for quite a while, and touching DNS is always risky since it underlies pretty much everything. Here are some things to be ready for:

- As a precaution, we'll be disabling Horizon during the window to prevent new VMs or DNS changes from landing in an inconsistent state.

- Some badly-behaved DNS clients won't fail over properly and will report errors when their primary DNS server is down.

- Puppet will almost certainly experience transient failures, since Puppet is known to be one of those badly-behaved clients.

- If things go very badly, there may be periods of total DNS outage, which would cause many WMCS-hosted services to fail. There's no particular reason that this /should/ happen, but it is the worst-case scenario.

For additional context, the phabricator task for this work is https://phabricator.wikimedia.org/T253780


- Andrew + the WMCS team


_______________________________________________
Wikimedia Cloud Services announce mailing list
cloud-annou...@lists.wikimedia.org (formerly labs-annou...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
