I think that is a good point, but I have a conceptually different
proposal. From the coding perspective it might be similar in terms of
implementation and complexity, but long term it will much better
reflect how "Airflow as a Platform" is currently implemented.

My proposal is very close to what was at the center of our recent
discussions on the OpenLineage integration. I think we should invert
the architecture.

IMHO what we currently have as a common "Stats" interface should be
deprecated, and the OpenTelemetry API should be THE common metrics API
that Airflow uses as its "common interface" for metrics (and that
anyone who would like to use Airflow metrics should implement). Why
should we define our own "Stats" API and choose which implementation
should handle it? We already have the OpenTelemetry API, and through
OpenTelemetry configuration we can choose which collector to use to
export the metrics.
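
To make the inversion concrete, here is a minimal stdlib-only Python
sketch. It does not use the real OpenTelemetry packages: `Meter`,
`Counter`, `EXPORTERS`, `COLLECTED` and the
`AIRFLOW_SKETCH_METRICS_EXPORTER` setting are all hypothetical names
standing in for the OTel API and its configuration. The point is only
the shape: the calling code knows one metrics API, and configuration
decides where the metrics end up.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# One data point: (metric name, value, attributes).
Point = Tuple[str, float, Dict[str, str]]
Exporter = Callable[[Point], None]

# Data points captured by the in-memory exporter below.
COLLECTED: List[Point] = []


def console_exporter(point: Point) -> None:
    """Print each data point; stands in for any real backend."""
    print(point)


def memory_exporter(point: Point) -> None:
    """Keep data points in memory, handy for tests."""
    COLLECTED.append(point)


# Hypothetical registry: a provider package could add its own entry here.
EXPORTERS: Dict[str, Exporter] = {
    "console": console_exporter,
    "memory": memory_exporter,
}


class Counter:
    """Hypothetical stand-in for an OTel-style counter instrument."""

    def __init__(self, name: str, exporter: Exporter) -> None:
        self._name = name
        self._exporter = exporter

    def add(self, amount: float, attributes: Optional[Dict[str, str]] = None) -> None:
        self._exporter((self._name, amount, dict(attributes or {})))


class Meter:
    """The only object instrumented code needs to know about."""

    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter

    def create_counter(self, name: str) -> Counter:
        return Counter(name, self._exporter)


def meter_from_config(env: Mapping[str, str] = os.environ) -> Meter:
    """Pick the exporter by name from configuration (assumed setting name)."""
    name = env.get("AIRFLOW_SKETCH_METRICS_EXPORTER", "console")
    return Meter(EXPORTERS[name])
```

With this shape, a call such as
`meter_from_config().create_counter("ti.finish").add(1)` stays the same
no matter which backend is configured; only the registry entry and the
setting change.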

So just to rephrase it - our current Statsd/DataDog Statsd
implementations should be merely our own simple custom OpenTelemetry
collectors - they should collect the metrics which Airflow sends via
the OTEL API and send them to Statsd/DataDog Statsd in the same way
current metrics are sent. But those should only be used for
backwards compatibility reasons - we should not aim to implement a
generic "fully-featured/reusable" Statsd or DataDog Statsd collector -
just provide a bare minimum that mimics current behaviour for
backwards compatibility. That should be a considerably smaller task.
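
Such a bare-minimum bridge could be as small as the sketch below. In
OpenTelemetry SDK terms this role is usually called a metric exporter
rather than a collector. The sketch is stdlib-only and does not
implement the real OTel exporter interface: `StatsdBridge`,
`to_statsd_line`, the `(name, value, kind)` tuples and the default
prefix are hypothetical. Only the StatsD line format (`name:value|c`
for counters, `|g` for gauges, `|ms` for timers) and the default UDP
port 8125 come from StatsD itself.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

# StatsD type suffixes for the metric kinds handled by this sketch.
_STATSD_TYPES = {"counter": "c", "gauge": "g", "timer": "ms"}


def to_statsd_line(name: str, value: float, kind: str, prefix: str = "airflow") -> str:
    """Format one data point as a StatsD line: '<prefix>.<name>:<value>|<type>'."""
    try:
        suffix = _STATSD_TYPES[kind]
    except KeyError:
        raise ValueError(f"unsupported metric kind: {kind!r}") from None
    return f"{prefix}.{name}:{value}|{suffix}"


class StatsdBridge:
    """Sends already-collected data points to a StatsD daemon over UDP."""

    def __init__(
        self,
        host: str = "localhost",
        port: int = 8125,
        send: Optional[Callable[[bytes], None]] = None,
    ) -> None:
        if send is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

            def send(payload: bytes) -> None:
                # UDP is fire-and-forget, like a typical StatsD client.
                sock.sendto(payload, (host, port))

        self._send = send

    def export(self, points: Iterable[Tuple[str, float, str]]) -> None:
        """Send each (name, value, kind) data point as one StatsD datagram."""
        for name, value, kind in points:
            self._send(to_statsd_line(name, value, kind).encode("ascii"))
```

The `send` parameter is injectable only so the formatting can be
checked without a running StatsD daemon; a DataDog variant would
differ mainly in how tags are appended to the line.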

If we do it this way, then it is rather simple, I think. OpenTelemetry
might or might not be a separate provider (eventually we might not
need one). Airflow core should provide the "open telemetry"
functionality - initialization, configuration, some common code,
etc. This might well live in an `airflow.otel` package - no need for
a separate provider there, I think.

Then any external entity (including provider packages) might provide a
collector implementation that will collect the metrics and export
them. In our case (for the backported Statsd/DataDog Statsd) those
would be the StatsD and DataDog providers, each providing a collector
that can be configured as the collector used by Airflow's
OpenTelemetry. But the assumption is that those who have already
integrated with OpenTelemetry (Grafana, New Relic, Amazon, Google)
should already have a collector that users can just configure and
use. So, for example, I do not even expect anything in the AWS
provider to collect the metrics from Airflow - there should be an
existing CloudWatch OTEL collector that collects the metrics and sends
them to CloudWatch. The most that should be in the AWS provider -
possibly - is documentation on how to enable the collector for
CloudWatch and a dependency that pulls in the CloudWatch collector
package.

I hope what I am writing makes sense :).

J.

On Thu, Mar 9, 2023 at 10:42 PM Ferruzzi, Dennis
<ferru...@amazon.com.invalid> wrote:
>
> Hi folks.  I am working on adding support for OpenTelemetry based on AIP-49 
> and I think we have come to a point where it is worth discussing options.
>
> Currently `airflow/stats.py`[1] contains classes for a base/NoStats option as 
> well as Statsd and DataDog, and I will be adding in another option for OTel.  
> I think it is getting to the point where we may want to break these out like 
> we do with provider packages and let the users install only the metrics 
> backend(s) they want.
>
>
> If we do, then where would they live?  Should they fall in with the service 
> provider packages at `airflow/providers/{statsd | datadog | otel}/`, or maybe 
> a new location like `airflow/stats/providers/{statsd | datadog | otel}`?  If 
> we make the move, then we would also need to sort out how to handle the 
> change.  Perhaps the provider packages for the existing options should be 
> bundled with core for now and later moved to fully-separated like all the 
> other provide packages?
>
> I'd love to hear what you folks think.
>  - ferruzzi
>
> [1] https://github.com/apache/airflow/blob/main/airflow/stats.py
>
