Following up on the discussion in the call this morning. We were running tight on time and I didn't want to derail the remaining conversation, but wanted to follow up here. There will be a separate thread on whether we intend to keep StatsD, either as a default or the secondary, or remove it entirely. That's a whole other discussion and someone else will be starting that thread. But I wanted to cover a related aspect of that which would likely derail that conversation and may be useful before that discussion.
Context: When I added the OTel Metrics support, I ran into an issue where OTel metric names have a much shorter character limit than StatsD metric names. This is at least in part because OTel supports tags and StatsD does not (out of the box). So any metric which was being emitted in StatsD but whose name was too long for OTel is getting double-emitted under the old (long) name and the new (truncated, but with tags) name side by side. Example: The StatsD metric name local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code> has several values embedded in it which can each be an unknown length. OTel can't emit that name but can emit metric local_task_job.task_exit with the job_id, dag_id, task_id, and return_code values attached as a dict called 'tags'. This means that there are a number of places throughout the code where we are creating multiple metrics for the same thing with slightly different names. Proposed solution: If we decide to drop StatsD entirely, then there is no issue, we just go through and remove the longer names wherever this happened. If we choose to keep StatsD support, then my proposed solution is to add an abstract method to BaseStatsLogger called something like "build_name()" which is implemented in each of the metrics loggers (StatsD, OTel, and DataDog). In the case of OTel (and DataDog? I'm not sure on this one and would need to confirm), it would just return the name as provided and the metric can be emitted with tags. In the StatsD class, it would return the name with any tags concatenated onto it; something like `return f"{self.name}_{"_".join([f"{key}_{value}" for key, value in self.tags.items()])}`. Then we can keep support for both options with predictable names (python dicts have been ordered since 3.(6? 8?) so the names should always parse out predictably) and simplify the codebase accordingly. The main down side is that it would almost certainly break at least some existing StatsD names, but maybe with 3.0 that is less of a concern. One positive side effect of doing it this way is that every metric will be created with tags so if we add a mandatory "description" field to the metric object then we could auto-generate the table on the metrics doc page. [1] [1] https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html - ferruzzi