Yep. Absolutely. We are at the stage now where we need to find out (and I have planned to look at this over the weekend) why the OpenTelemetry auto-instrumentation in Melodie's PR does not seem to auto-instrument our Flask integration. We chose Flask as the first integration because it should be "easy", but for whatever reason auto-instrumentation, even in the `--debug` mode of airflow, does not seem to work despite everything seemingly being "correct".
I plan to take a look at it today and we can discuss it in Melodie's PR. It would be fantastic if we could work on it together :).

J.

On Sat, Jan 8, 2022 at 12:09 PM melodie ezeani <fluxi...@gmail.com> wrote:
>
> Hi nick,
>
> You can look at the PR or clone my fork, try running it in your local
> environment, and see if there’s any way we can improve on the
> auto-instrumentation.
> Would love to get feedback.
> Thank you
>
> On Sat, 8 Jan 2022 at 12:19 AM, <nick@shook.family> wrote:
>>
>> hi all, been lurking for a while - this is my first post.
>>
>> what I like about open telemetry is that you can send all telemetry traces
>> to STDOUT (or any logs) which you can then pipe to many log forwarders of
>> choice. imo this is the easiest way to set it up and a default that should
>> work in the vast majority of airflow use cases.
>>
>> the PR looks like a great start! what can I do to help?
>> ---
>> nick
>>
>> On Jan 7, 2022, at 14:37, Elad Kalif <elad...@apache.org> wrote:
>>
>> Hi Howard,
>>
>> We actually have an Outreachy intern (Melodie) who is working on researching
>> how open-telemetry can be integrated with Airflow.
>> Draft PR for demo: https://github.com/apache/airflow/pull/20677
>> This is an initial effort for a POC.
>> Maybe you can work together on this?
>>
>>
>> On Sat, Jan 8, 2022 at 12:19 AM Howard Yoo
>> <howard....@astronomer.io.invalid> wrote:
>>>
>>> Hi all,
>>>
>>> I’m a staff product manager at Astronomer, and wanted to post this email
>>> according to the guide from
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
>>> .
>>>
>>> Currently, the main method to publish telemetry data out of airflow is
>>> through its StatsD implementation:
>>> https://github.com/apache/airflow/blob/main/airflow/stats.py , and
>>> currently airflow supports two flavors of StatsD: the original one and
>>> Datadog’s dogstatsd implementation.
>>>
>>> Through this implementation, we have the following list of metrics that
>>> are available for other popular monitoring tools to collect, monitor,
>>> visualize, and alert on:
>>> https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html
>>>
>>> There are a number of limitations in airflow’s current implementation of
>>> its metrics using StatsD.
>>> 1. StatsD is based on a simple metrics format that does not support richer
>>> contexts. Its metric name can contain some of those contexts (such as dag
>>> id, task id, etc.), but those are limited because they have to be
>>> part of the metric name itself. A better approach would be to
>>> attach ‘tags’ to the metrics data to add more context.
>>> 2. StatsD also uses UDP as its main network protocol, but UDP
>>> is simple and does not guarantee reliable transmission of the payload.
>>> Moreover, many monitoring tools are moving to more modern protocols
>>> such as HTTPS to send out metrics.
>>> 3. StatsD does support ‘counter,’ ‘gauge,’ and ‘timer,’ but does not
>>> support distributed traces and log ingestion.
>>>
>>> Due to the above reasons, I have been looking at OpenTelemetry
>>> (https://github.com/open-telemetry) as a potential replacement for
>>> airflow’s current telemetry instrumentation. OpenTelemetry is a product of
>>> OpenTracing and OpenCensus, and is quickly gaining momentum in terms of
>>> ‘standardization’ of the means of producing and delivering telemetry data:
>>> not only metrics, but distributed traces as well as logs. The technology is
>>> also geared towards better monitoring of cloud-native software. Many
>>> monitoring tool vendors support OpenTelemetry (Tanzu, Datadog,
>>> Honeycomb, Lightstep, etc.) and OpenTelemetry’s modular architecture is
>>> designed to be compatible with existing legacy instrumentations.
>>> There are also stable Python SDKs and APIs to easily implement it in
>>> airflow.
>>>
>>> Therefore, I’d like to propose improving the metrics and telemetry
>>> capabilities of airflow by adding configuration for and support of
>>> OpenTelemetry, so that, while maintaining backward compatibility with the
>>> existing StatsD-based metrics, we would also have an opportunity to base
>>> distributed traces and logs on it. That would make it easier for any
>>> OpenTelemetry-compatible tool to monitor airflow with richer
>>> information.
>>>
>>> If you have been thinking of a need to improve the current metrics
>>> capabilities of airflow, and have been thinking of standards like
>>> OpenTelemetry, please feel free to join the thread and provide any opinions
>>> or feedback. I also generally think that we may need to review our current
>>> list of metrics and assess whether they are really useful in terms of
>>> monitoring and observability of airflow. There are things that we might
>>> want to add, such as more executor-related metrics, scheduler-related
>>> metrics, as well as operator and even DB and XCOM related metrics, to
>>> better assess the health of airflow and make this information helpful for
>>> faster troubleshooting and problem resolution.
>>>
>>> Thanks and regards,
>>> Howard
>>
>>
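
A small illustration of limitation 1 from Howard's list above, as I understand it: with plain StatsD the context (dag id, task id) has to be baked into the metric name, while a tag-capable format such as the DogStatsD datagram extension carries it as separate key/value pairs. This is only a rough sketch. The helper names `statsd_line` and `dogstatsd_line`, the `task.duration` metric name, and the dag/task ids are made up for illustration and are not Airflow APIs.

```python
def statsd_line(name, value, metric_type):
    """Format a plain StatsD datagram: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def dogstatsd_line(name, value, metric_type, tags):
    """Format a DogStatsD-style datagram, which appends |#key:value,... tags."""
    tag_part = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
    return f"{name}:{value}|{metric_type}|#{tag_part}"


# Hypothetical identifiers, for illustration only.
dag_id, task_id = "example_dag", "extract"

# Plain StatsD: the context is encoded in the metric name itself,
# so every dag/task pair becomes a distinct metric name.
plain = statsd_line(f"dag.{dag_id}.{task_id}.duration", 1250, "ms")

# Tagged: one stable metric name, with the context attached as tags.
tagged = dogstatsd_line(
    "task.duration", 1250, "ms", {"dag_id": dag_id, "task_id": task_id}
)

print(plain)
print(tagged)
```

With tags, a monitoring backend can group or filter on `dag_id` and `task_id` without having to parse them back out of the metric name.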