Hello everyone,

Short version:

We will be upgrading the eqiad Wikikube kubernetes
<https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters#WikiKube> cluster
to 1.31 on Wednesday 2025-10-01 starting at 10:00 UTC
<https://zonestamp.toolforge.org/1759312800>, ending at 15:00 UTC
<https://zonestamp.toolforge.org/1759330800>.

Toolhub will be down during this maintenance.

If you are deploying services to the eqiad Wikikube kubernetes cluster:

   - Deployments will be unavailable during the maintenance. DO NOT DEPLOY.
   - SRE will redeploy all services
   - SRE will announce the end of maintenance, at which point the cluster
     will be usable again

---

Object: Kubernetes upgrade to 1.31

Target: eqiad Wikikube cluster

Maintenance window: 2025-10-01 10:00
<https://zonestamp.toolforge.org/1759312800>-15:00
<https://zonestamp.toolforge.org/1759330800> UTC

Tracking task: Phabricator task T405703, "Update wikikube eqiad to
kubernetes 1.31" <https://phabricator.wikimedia.org/T405703>

Operational channel: IRC #wikimedia-sre
<https://web.libera.chat/gamja/?nick=Guest#wikimedia-sre>, announcements
will be made to IRC #wikimedia-operations
<https://web.libera.chat/gamja/?nick=Guest#wikimedia-operations>

Operating team: SRE ServiceOps (contact IRC #wikimedia-serviceops
<https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>)

Impact:

Users:

   - Toolhub will be down for the duration of the window.
   - No user impact for other services.

Deployers:

   - Deployments to the target cluster will be unavailable. This includes
     MediaWiki backports and deployments. DO NOT DEPLOY.
   - The following deployment windows are cancelled:
      - Services: Citoid/Zotero 11:00 UTC
        <https://zonestamp.toolforge.org/1759316400>
      - UTC Afternoon Backport Window 13:00 UTC
        <https://zonestamp.toolforge.org/1759323600>
      - Wikifunctions Services UTC Afternoon 14:00 UTC
        <https://zonestamp.toolforge.org/1759327200>
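
For deployers who want to check a planned deployment time against the
window, here is a minimal Python sketch. It assumes the zonestamp links
are simply the Unix epoch (in seconds) of the UTC time, which matches the
links for the window start and end in this email. The helper names
(zonestamp_url, in_maintenance) are hypothetical and not part of any SRE
tooling, and treating the window end as exclusive is an assumption made
for the example.

```python
from datetime import datetime, timezone

# Maintenance window from this announcement, in UTC.
# Assumption: the end of the window is treated as exclusive.
WINDOW_START = datetime(2025, 10, 1, 10, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 10, 1, 15, 0, tzinfo=timezone.utc)


def zonestamp_url(moment: datetime) -> str:
    """Build a zonestamp link from a timezone-aware datetime (hypothetical helper)."""
    return f"https://zonestamp.toolforge.org/{int(moment.timestamp())}"


def in_maintenance(moment: datetime) -> bool:
    """Return True when the given moment falls inside the maintenance window."""
    return WINDOW_START <= moment < WINDOW_END


# Start times of the cancelled deployment windows, as listed in this email.
CANCELLED = {
    "Services: Citoid/Zotero": datetime(2025, 10, 1, 11, 0, tzinfo=timezone.utc),
    "UTC Afternoon Backport Window": datetime(2025, 10, 1, 13, 0, tzinfo=timezone.utc),
    "Wikifunctions Services UTC Afternoon": datetime(2025, 10, 1, 14, 0, tzinfo=timezone.utc),
}

if __name__ == "__main__":
    for name, start in CANCELLED.items():
        status = "inside window" if in_maintenance(start) else "outside window"
        print(f"{name}: {zonestamp_url(start)} ({status})")
```

Every listed window starts inside the maintenance window, which is why
all three are cancelled.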

Process:

All steps handled by SRE ServiceOps

   - Maintenance start is announced on #wikimedia-operations and as a reply
     to this email chain
   - All deployments are stopped
   - SRE ServiceOps ensures all current versions of deployments can be safely
     deployed
   - Maintenance begins and should take a couple of hours
   - Toolhub downtime starts
   - Cluster is wiped and upgraded
   - Toolhub is redeployed first to minimize downtime
   - Toolhub downtime ends
   - SRE ServiceOps redeploys all target cluster services
   - Maintenance end is announced on #wikimedia-operations and as a reply to
     this email chain
   - Deployments resume

Rationale:

The date was chosen for convenience: because of the data center switchover
process <https://wikitech.wikimedia.org/wiki/Switch_Datacenter>, eqiad is
currently fully depooled and receiving almost no traffic. eqiad is scheduled
to be repooled on 2025-10-02 <https://zonestamp.toolforge.org/1759417200>,
after which the upgrade would be more complicated. With eqiad already
drained, we expect no visible user impact other than the Toolhub downtime.

SRE ServiceOps will check that all services can be safely deployed before
the maintenance, and will redeploy all services before marking the cluster
as usable. Deployers are not required to redeploy their services unless
SRE ServiceOps has asked them to do so.

During last week’s switchover <https://phabricator.wikimedia.org/T399891>,
Toolhub remained in eqiad. This means a downtime of up to a few hours is
expected and unavoidable. To minimize Toolhub’s downtime, we will
prioritize its redeployment during the initialization phase.



Thank you for your understanding and support! If you have any questions
regarding this process, please respond to this email, comment on
Phabricator task T405703, "Update wikikube eqiad to kubernetes 1.31"
<https://phabricator.wikimedia.org/T405703>, or reach out directly to me
(IRC nickname claime on #wikimedia-serviceops
<https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>).

On behalf of SRE ServiceOps,

-- 
Clément 'claime' Goubert (they/them)
Senior SRE
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
