Yep. Absolutely. We are at the stage now (and this is something I have
planned to look at this weekend) where we need to see why OpenTelemetry
auto-instrumentation in Melodie's PR does not seem to auto-instrument our
Flask integration. We chose Flask as the first integration because it
should be "easy", but for whatever reason auto-instrumentation - even in
the `--debug` mode of airflow - does not seem to work, despite everything
seemingly being "correct".

I plan to take a look at it today, and we can discuss it in Melodie's
PR. It would be fantastic if we could work on it together :).

J.

On Sat, Jan 8, 2022 at 12:09 PM melodie ezeani <fluxi...@gmail.com> wrote:
>
> Hi Nick,
>
> You can look at the PR, or clone my fork and try running it in your local
> environment, to see if there’s any way we can improve on the
> auto-instrumentation.
> Would love to get feedback.
> Thank you
>
> On Sat, 8 Jan 2022 at 12:19 AM, <nick@shook.family> wrote:
>>
>> hi all, been lurking for a while - this is my first post.
>>
>> what I like about open telemetry is that you can send all telemetry traces 
>> to STDOUT (or any log stream), which you can then pipe to the log forwarder 
>> of your choice. imo this is the easiest way to set it up, and a default that 
>> should work in the vast majority of airflow use cases.
>>
>> the PR looks like a great start! what can I do to help?
>> ---
>> nick
>>
>> On Jan 7, 2022, at 14:37, Elad Kalif <elad...@apache.org> wrote:
>>
>> Hi Howard,
>>
>> We actually have an Outreachy intern (Melodie) who is working on researching 
>> how OpenTelemetry can be integrated with Airflow.
>> Draft PR for demo : https://github.com/apache/airflow/pull/20677
>> This is an initial effort for a POC.
>> Maybe you can work together on this?
>>
>>
>> On Sat, Jan 8, 2022 at 12:19 AM Howard Yoo 
>> <howard....@astronomer.io.invalid> wrote:
>>>
>>> Hi all,
>>>
>>> I’m a staff product manager at Astronomer, and wanted to post this email 
>>> according to the guide from 
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
>>>  .
>>>
>>> Currently, the main method of publishing telemetry data out of airflow is 
>>> through its StatsD implementation: 
>>> https://github.com/apache/airflow/blob/main/airflow/stats.py . At present, 
>>> airflow supports two flavors of StatsD: the original one, and Datadog’s 
>>> DogStatsD implementation.
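(For reference, the StatsD integration mentioned above is switched on through airflow configuration, roughly like this - option names as in recent airflow versions under the `[metrics]` section; adjust host and port to your deployment:)

```ini
[metrics]
# Emit airflow metrics over UDP to a StatsD daemon.
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
# For Datadog's flavor instead, enable dogstatsd:
# statsd_datadog_enabled = True
```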
>>>
>>> Through this implementation, airflow exposes the following list of metrics, 
>>> which popular monitoring tools can collect, visualize, and alert on: 
>>> https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html
>>>
>>> There are a number of limitations in airflow’s current implementation of 
>>> its metrics using StatsD.
>>> 1. StatsD is based on a simple metrics format that does not support richer 
>>> context. Some of that context (such as dag id, task id, etc.) is encoded 
>>> into the metric name itself, which is limiting because it has to fit the 
>>> formatting constraints of a metric name. A better approach would be to 
>>> attach ‘tags’ to the metrics data to carry the additional context.
>>> 2. StatsD uses UDP as its main network protocol, but UDP is simple and does 
>>> not guarantee reliable delivery of the payload. Moreover, many monitoring 
>>> systems are moving to more modern transports such as HTTPS to send out 
>>> metrics.
>>> 3. StatsD supports ‘counter’, ‘gauge’, and ‘timer’ metrics, but does not 
>>> support distributed traces or log ingestion.
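(To make limitation 1 concrete, here is a sketch, in plain Python with no client library, of the two wire formats: vanilla StatsD has to bake context into the metric name, while the DogStatsD extension carries it as tags. The metric and tag names are hypothetical.)

```python
# Illustration of limitation 1 above: vanilla StatsD must bake context into
# the metric name, while the DogStatsD extension carries it as tags.
# Plain Python, no client library - these strings follow the two wire formats.
from typing import Optional

def statsd_counter(name: str, value: int = 1) -> str:
    """Vanilla StatsD counter: any context is embedded in the name itself."""
    return f"{name}:{value}|c"

def dogstatsd_counter(name: str, value: int = 1,
                      tags: Optional[dict] = None) -> str:
    """DogStatsD counter: context travels as |#key:value,... tags."""
    line = f"{name}:{value}|c"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line

# One distinct metric name per dag/task pair (hypothetical metric names):
print(statsd_counter("airflow.ti_successes.my_dag.my_task"))
# -> airflow.ti_successes.my_dag.my_task:1|c

# One metric, filterable by dag_id/task_id tags:
print(dogstatsd_counter("airflow.ti_successes",
                        tags={"dag_id": "my_dag", "task_id": "my_task"}))
# -> airflow.ti_successes:1|c|#dag_id:my_dag,task_id:my_task
```

With the tagged form, a monitoring backend can aggregate or filter one metric by dag_id or task_id, instead of tracking an unbounded set of metric names.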
>>>
>>> Due to the above reasons, I have been looking at OpenTelemetry 
>>> (https://github.com/open-telemetry) as a potential replacement for 
>>> airflow’s current telemetry instrumentation. OpenTelemetry is the merger of 
>>> OpenTracing and OpenCensus, and is quickly gaining momentum as a 
>>> ‘standard’ way of producing and delivering telemetry data - not only 
>>> metrics, but also distributed traces and logs. The technology is also 
>>> geared towards better monitoring of cloud-native software. Many monitoring 
>>> tool vendors support OpenTelemetry (Tanzu, Datadog, Honeycomb, Lightstep, 
>>> etc.), and OpenTelemetry’s modular architecture is designed to be 
>>> compatible with existing legacy instrumentations. There are also stable 
>>> Python SDKs and APIs that make it easy to implement in airflow.
>>>
>>> Therefore, I’d like to propose improving the metrics and telemetry 
>>> capability of airflow by adding configuration and support for 
>>> OpenTelemetry. While maintaining backward compatibility with the existing 
>>> StatsD-based metrics, we would also have the opportunity to base 
>>> distributed traces and logs on it, so that any OpenTelemetry-compatible 
>>> tool could monitor airflow with richer information.
>>>
>>> If you have been thinking that the current metrics capabilities of airflow 
>>> need improvement, and have been considering standards like OpenTelemetry, 
>>> please feel free to join the thread and provide opinions or feedback. I 
>>> also generally think that we may need to review our current list of 
>>> metrics and assess whether they are really useful for monitoring and 
>>> observability of airflow. There are things we might want to add, such as 
>>> more executor-related and scheduler-related metrics, as well as operator 
>>> and even DB- and XCom-related metrics, to better assess the health of 
>>> airflow and make this information helpful for faster troubleshooting and 
>>> problem resolution.
>>>
>>> Thanks and regards,
>>> Howard
>>
>>
