Hi everyone,

TL;DR

SRE will be co-owning, together with Thomas Chin of Data Platform
Engineering, the service-utils
<https://www.mediawiki.org/wiki/Service-utils> Node.js utilities library,
service-runner <https://www.mediawiki.org/wiki/Service-runner>'s spiritual
successor and replacement. If you are thinking about starting a new Node.js
service to be deployed in production, using service-utils is expected to
greatly reduce friction, speed up development and allow you to focus on
your business needs instead of how to satisfy SRE requirements for getting
the service deployed. If you already own one or more service-runner powered
services, consider migrating to service-utils at your convenience.

Background

Service-runner <https://github.com/wikimedia/service-runner> is a library
that provides generalized runtime facilities for Node.js services,
including:

   - a standard worker cluster setup with restarts,
   - a generalized YAML config format with support for running multiple
     services in a single process,
   - runtime facilities for
      - logging
      - metrics reporting
      - rate limiting.

Its use in Foundation-produced code and microservices is thought to have
increased the productivity of developer and SRE teams by abstracting and
automating away the facilities listed above, which are important but
unrelated to the actual goals of the developer teams. Ongoing support from
the owning team, the Services team (which no longer exists), also played a
pivotal role.

Of the 20 Node.js apps (powering 25 services), 15 currently use
service-runner (the rest are either already pioneering service-utils or
trying out other solutions).

Unfortunately, service-runner is also abandoned. Node.js developers have
noticed, and the Phabricator workboard
<https://phabricator.wikimedia.org/project/view/1062/> makes this evident,
with task titles such as "service-runner has vulnerable and outdated
dependencies" and "service-runner depends on preq, a wrapper of request,
which is deprecated". Some efforts have been made to find a replacement.

Thomas Chin has been kind enough to create service-utils, a library
designed to be compatible (if not a drop-in replacement in many cases) with
service-runner. However, maintaining this library is neither Thomas's nor
the Data Platform Engineering team's mission. This has, unsurprisingly, led
to adoption that is lower than hoped for (by SRE at least).

Next steps

Starting in Q2 2025-2026, SRE will be:

   - Becoming more familiar with the code base.
   - Updating documentation as needed.
   - Helping with ongoing maintenance of the library.
   - Providing help and guidance for migration from service-runner to
     service-utils.
   - Going through the backlog of work tracked in the corresponding
     Phabricator work boards for the 2 projects and implementing,
     resolving, declining or stalling tasks as deemed appropriate, always
     after discussions with relevant stakeholders.
   - Contributing the following features SRE is interested in:
      - Abstracting away talking to the service mesh.
      - Finishing testing and rolling out support for OpenTelemetry.
   - Announcing the full deprecation of service-runner, the archival of its
     code repositories and its removal from library repositories (i.e. npm)
     once all in-scope code bases have been migrated over.

FAQ

Is this a full drop-in replacement?

No. It's close enough for most use cases. However, service-runner is a
decade-old code base. Some of the functionality it provides no longer makes
sense in 2025 (e.g. a worker cluster setup is not needed in a Kubernetes
environment), makes assumptions about the infrastructure that no longer
apply (e.g. rate limiting), or relies on entirely abandoned and
unsalvageable libraries (e.g. kad, which implements rate limiting). Those
features are not and will not be supported in service-utils.

When should my team migrate to/adopt this?

If you start a new Node.js service at the WMF, go for this from day 1. If
you already run a service-runner powered service, migrate at your earliest
convenience.

Who do I talk to if I want some functionality implemented before I adopt
this?

The fastest path forward is probably to come to either #wikimedia-sre on IRC
or #talk-to-sre on Slack. SRE Service operations will respond.

Is this tracked in the annual plan?

Yes, most work is already under KR WE6.2, aka Production Readiness
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2025-2026/Product_%26_Technology_OKRs#Accelerate_Path_to_Product_Outcomes_(WE6)>.
Some parts of the work described above will be ongoing and tracked under
Annual Essential Work.
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
