dannyl1u opened a new issue, #42881: URL: https://github.com/apache/airflow/issues/42881
### Description

From @ferruzzi:

> Currently, when you add a new metric to the codebase, you must also manually update the docs page. The docs page inevitably gets out of date and misses some details. We want an automated system to generate the docs page based on the actual metrics. There are also known instances where the same metric is being created and emitted in more than one place, causing duplicate data. These will have to be fixed manually, and an automated check could (stretch goal?) include checking for identical or "too similar" names while collecting the names for the docs page.
>
> Phase 1
>
> Situation:
>
> We support multiple different Metrics backends [0]. The two main ones are StatsD and OpenTelemetry. This is managed through an interface class [1] which is implemented for each backend (examples: StatsD [2] and OTel [3]). StatsD was the only supported backend well into Airflow 2.x and the entire codebase was designed with StatsD in mind, so it was a good chunk of work to abstract it out, and there are a few remaining tasks to perfect the new implementation.
>
> Task 1:
>
> StatsD has a name length limit of around 300 characters. OTel limits names to 34 characters, but allows tagging. Our temporary solution was to emit almost everything twice, once in the long format for StatsD and again in the short format with tags for OTel. We also had to add code [4] to make sure the name is safe for OTel, and other hacks to make it work.
>
> The first task in this project is to understand the difference in how the two implementations handle their names and then add a "get_name" method to the interface: `def get_name(metric_name: str, tags: dict[str, str])`. In the statsd_logger [2] implementation it will concatenate the tags onto the name, and in the OTel implementation it will just return the name.
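
To illustrate the proposal quoted above, here is a minimal sketch of how the two `get_name` implementations might differ. It is not the actual Airflow code: the class names `BaseNameFormatter`, `StatsdNameFormatter`, and `OtelNameFormatter` are hypothetical, the tag values are made up, and joining tag values with dots in insertion order is an assumption about what "concatenate the tags onto the name" means.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BaseNameFormatter(ABC):
    """Hypothetical stand-in for the metrics interface class."""

    @abstractmethod
    def get_name(self, metric_name: str, tags: Optional[Dict[str, str]] = None) -> str:
        """Return the name the backend should emit for this metric."""


class StatsdNameFormatter(BaseNameFormatter):
    """StatsD-style: fold the tag values into one long dotted name."""

    def get_name(self, metric_name: str, tags: Optional[Dict[str, str]] = None) -> str:
        if not tags:
            return metric_name
        # Assumption: tag values are appended in insertion order, dot-separated.
        return ".".join([metric_name, *(str(value) for value in tags.values())])


class OtelNameFormatter(BaseNameFormatter):
    """OTel-style: keep the short name; tags travel separately as attributes."""

    def get_name(self, metric_name: str, tags: Optional[Dict[str, str]] = None) -> str:
        return metric_name


# Illustrative tag values only.
tags = {"job_id": "42", "dag_id": "example_dag", "task_id": "extract", "return_code": "0"}
statsd_name = StatsdNameFormatter().get_name("local_task_job.task_exit", tags)
otel_name = OtelNameFormatter().get_name("local_task_job.task_exit", tags)
print(statsd_name)
print(otel_name)
```

With something like this in place, a caller would pass the short name and the tags once, and each backend would derive the name it needs.
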
> Once that is implemented, it can be used in the various emit methods (`incr`, `decr`, etc.) instead of all the name validation code. Then search the code for places where we are emitting things more than once and clean them up.
>
> Example:
>
> You can see an example in local_task_job_runner [5]. We emit `local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>` for StatsD, but that results in a name too long for OTel, so we also emit the short form `local_task_job.task_exit` with tags, and the name validation method [4] in the OTel implementation catches the one that is too long and just swallows it. What we should do instead is pass `incr()` the name and the tags and let StatsD and OTel handle them accordingly.
>
> [0] https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#metric-descriptions
> [1] https://github.com/apache/airflow/blob/main/airflow/metrics/base_stats_logger.py
> [2] https://github.com/apache/airflow/blob/main/airflow/metrics/statsd_logger.py
> [3] https://github.com/apache/airflow/blob/main/airflow/metrics/otel_logger.py
> [4] https://github.com/apache/airflow/blob/main/airflow/metrics/otel_logger.py#L128
> [5] https://github.com/apache/airflow/blob/main/airflow/jobs/local_task_job_runner.py#L352

### Use case/motivation

From @ferruzzi:

> Currently, when you add a new metric to the codebase, you must also manually update the docs page. The docs page inevitably gets out of date and misses some details. We want an automated system to generate the docs page based on the actual metrics. There are also known instances where the same metric is being created and emitted in more than one place, causing duplicate data. These will have to be fixed manually, and an automated check could (stretch goal?) include checking for identical or "too similar" names while collecting the names for the docs page.

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org