Hello everyone,

Please join us in celebrating a very successful Datacenter Switchover.
This switch to our data center in Texas was run by Scott French, the
newest addition to the SRE Service Operations team. This instance of
the Switchover continues the tradition of successful switchovers and
was completed without a hitch, with a read-only period of 2 minutes
and 46 seconds.

For context, the Site Reliability Engineering (SRE) team runs a planned
data center switchover periodically, moving all wikis from our primary
data center (for this instance, Virginia) to the secondary data center
(for this instance, Texas). This is an important periodic test of our
tools and procedures, to ensure the wikis will continue to be
available even in the event of major technical issues. It also gives
all our SRE and ops teams a chance to do maintenance and upgrades on
systems that normally run 24 hours a day.

The switchover process requires a brief read-only period for all
Foundation-hosted wikis, which started at 14:58 UTC on Wednesday,
September 25th, and lasted 2 minutes and 46 seconds. All our public and
private wikis continued to be available for reading as usual. Users
saw a notification of the upcoming maintenance, and anyone still
editing was asked to try again in a few minutes.

As with the previous Switchover, I've been trying to discern the
effect of the Switchover in many of the graphs we use to monitor the
infrastructure at https://grafana.wikimedia.org/. In most, it's
impossible to spot the event. We consider this very nice and attribute
it to various improvements made over the years by many teams, in and
outside SRE.

This switchover is our first where external and internal traffic flows
exclusively to MediaWiki on Kubernetes, a fact that makes me
personally pretty happy.

As per our newer process, we no longer have a Switchback. We will be
staying in Texas as our primary data center for the next 6 months,
switching back to Virginia on Wednesday, March 19. Per the same
process, we'll also be in Single DC for the next week, going back to
MultiDC on Wednesday, October 2nd.

As always, my deepest thanks to all people that have helped with this,
in one way or another, ranging from the person running point, to all
SREs and developers/deployers participating or having contributed, to
people in Movement Communications for helping with the messaging.

To report any issues, you can reach us in #wikimedia-sre on IRC, or
file a Phabricator ticket with the datacenter-switchover tag; we'll be
monitoring closely for reports of trouble. (If you're new to Phab,
there's more information at Phabricator/Help.) The switchover, its
preparation, and follow-up actions are tracked in Phabricator
Task T370962.

-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
