TLDR

Graphite is deprecated.  Existing metrics are being actively migrated to
Prometheus.

The SRE Observability team asks that no new metrics be deployed to
Graphite, as the service will be transitioned to a read-only state by the
end of Q3 FY2024/25 (March 2025).

For additional details, please see the tracking task
https://phabricator.wikimedia.org/T228380 or the roadmap here:
https://wikitech.wikimedia.org/wiki/Graphite/Deprecation_Roadmap.

------------------------------

Context

The SRE Observability team has been operating Prometheus in production for
several years, and it offers several operational benefits over Graphite.
After a long period of observation and usage, the team has determined that
migrating MediaWiki (MW) off Graphite keeps us on a supported, scalable
metrics platform that allows more effective, multidimensional metrics
analysis and storage.

Prometheus also provides more robust data labeling, storage, and querying
capabilities. This initiative is fundamental to unifying our metrics,
enhancing monitoring, improving MW observability, and reducing tool
fragmentation.

Background

Last (fiscal) year, the SRE Observability team set out to test whether a
new metrics interface was viable <https://phabricator.wikimedia.org/T240685>
[1] and determined that long-term platform sustainability demanded
migrating MediaWiki metrics to Prometheus
<https://wikitech.wikimedia.org/wiki/Prometheus> [2]. To this end, the team
decided to utilize StatsLib, a new, internally developed,
Prometheus-capable metrics interface
<https://phabricator.wikimedia.org/T350592> [3].

By the end of FY23/24 Q2, the team had successfully tested the component in
production, and by the end of FY23/24 Q4, it had advanced the migration by
about 42% in total metrics volume. We aim to reach ~60% migration by
the end of Q1 FY 2024/2025. See the Graphite metrics volume migration dashboard
<https://grafana.wikimedia.org/d/nCxX65cSk/mediawiki-statslib-migration?orgId=1>
[4].

To improve MW ecosystem sustainability, we aim to complete the migration
of all active production metrics that are in use by dashboards or alerts
to Prometheus this fiscal year.

For scoping purposes, we define as “in-use” any metric emitted to Graphite
that is mapped to a dashboard panel or alert active in Grafana. See the
Graphite Utilization Dashboard
<https://grafana.wikimedia.org/d/K6DEOo5Ik/grafana-graphite-datasource-utilization?orgId=1>
[5] for details.
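
As a rough illustration of this scoping rule, the Python sketch below marks
a metric as in-use when at least one Grafana panel or alert query matches
it. The metric names, the query patterns, and the helper names (`matches`,
`in_use_metrics`) are hypothetical, and the node-by-node glob matching is a
simplifying assumption, not how the utilization dashboard actually computes
its figures.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, metric: str) -> bool:
    """Match a dotted metric path against a dotted glob, one node at a time.

    Assumption: a wildcard never spans a dot, so the pattern and the metric
    must have the same number of dot-separated nodes.
    """
    pattern_nodes = pattern.split(".")
    metric_nodes = metric.split(".")
    if len(pattern_nodes) != len(metric_nodes):
        return False
    return all(
        fnmatchcase(metric_node, pattern_node)
        for pattern_node, metric_node in zip(pattern_nodes, metric_nodes)
    )


def in_use_metrics(emitted, query_patterns):
    """Return the emitted metrics referenced by at least one query pattern."""
    return sorted(
        metric
        for metric in emitted
        if any(matches(pattern, metric) for pattern in query_patterns)
    )


# Hypothetical metric names and a hypothetical dashboard query pattern.
emitted = [
    "MediaWiki.edit.success.count",
    "MediaWiki.edit.failure.count",
    "MediaWiki.legacy.unused.rate",
]
query_patterns = ["MediaWiki.edit.*.count"]

print(in_use_metrics(emitted, query_patterns))
```

Under this definition, a metric that no pattern references (here the
hypothetical `MediaWiki.legacy.unused.rate`) would be a candidate for
removal rather than migration.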

RFC on Prometheus as a better interface for MW metrics: T249164
<https://phabricator.wikimedia.org/T249164> [6].

Notice and Action Required

The team plans to enable read-only mode on the Graphite cluster by the end
of Q3 FY 2024/2025 and begin the formal deprecation of Graphite in
production <https://phabricator.wikimedia.org/T228380> [7].

We are asking all teams and maintainers to review
https://phabricator.wikimedia.org/T350592 [3] and related subtasks, claim
the metrics and components under their care, disable or remove any unused
metrics and dashboards first, and then follow the migration process outlined
in the task for any relevant in-use metrics before the end of Q3 FY 2024/2025
(March 2025). After this date, Graphite will be read-only, and no new data
will be ingested.

Graphite will continue to be available for another year to provide
historical data in read-only mode while new history is recorded in
Prometheus. For additional details, please see the tracking task T228380
<https://phabricator.wikimedia.org/T228380> [7] or the roadmap here:
https://wikitech.wikimedia.org/wiki/Graphite/Deprecation_Roadmap [8].

Related Links:

[1] https://phabricator.wikimedia.org/T240685

[2] https://wikitech.wikimedia.org/wiki/Prometheus

[3] https://phabricator.wikimedia.org/T350592

[4] https://grafana.wikimedia.org/d/nCxX65cSk/mediawiki-statslib-migration?orgId=1

[5] https://grafana.wikimedia.org/d/K6DEOo5Ik/grafana-graphite-datasource-utilization?orgId=1

[6] https://phabricator.wikimedia.org/T249164

[7] https://phabricator.wikimedia.org/T228380

[8] https://wikitech.wikimedia.org/wiki/Graphite/Deprecation_Roadmap


Thank you,
Leo Mata.

*Leo Mata* (he/him)
Engineering Manager - Observability
Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Wikitech-l mailing list -- [email protected]