Thanks Jarek, I'm happy I could help.

On Thu, Feb 3, 2022 at 5:07 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> Actually - maybe even bring it to the state in the 3rd decade of
> the century ;)
>
> On Thu, Feb 3, 2022 at 5:05 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Hello everyone,
>>
>> Just to give some information on the progress and plans in the
>> Open-Telemetry area.
>>
>> I just had a talk with Howard, and we are going to work together on the
>> AIP proposal on "whys", "hows", and "whats" of the Open-Telemetry for
>> Airflow.
>>
>> We have enough information already from the POC work done by Melodie
>> during the internship regarding the "technical capabilities" of the
>> OpenTelemetry and the ways it can be integrated with Airflow so that I
>> think when we join the "Product" vision from Howard and my understanding of
>> the internals of the OpenTelemetry and Airflow, we can come up with a good
>> proposal that might be a great base for discussion and implementation.
>>
>> We will be working on the proposal together - if there is anyone who
>> would like to join now - do let us know and we will join you. But I think
>> relatively soon we will publish an AIP proposal that will start the "real"
>> discussion. I know there are many people interested so we might add a
>> dedicated channel in slack and maybe run a couple of demos/presentations of
>> the proposal before we send it up for voting.
>>
>> Looking forward to getting this one sorted out, I think we have a chance
>> together to bring Airflow telemetry to the state in-sync with the state of
>> the telemetry in the 2nd decade of XXIst century ;)
>>
>> Thanks Melodie for all the investigation and research there! This
>> internship was a really great start and gave me a lot of confidence on the
>> next steps we can take there.
>>
>> J.
>>
>>
>> On Wed, Jan 12, 2022 at 4:21 AM Howard Yoo
>> <howard....@astronomer.io.invalid> wrote:
>>
>>> I am very much interested in how we can improve
>>> not only the instrumentation by using OpenTelemetry, but also
>>> think about how we can make the existing metrics list better.
>>>
>>> For example, perhaps in the future we can provide things like how
>>> much CPU, memory, and disk I/O a task instance is using, by utilizing
>>> Python’s psutil package as mentioned here
>>> (https://stackoverflow.com/questions/16326529/python-get-process-names-cpu-mem-usage-and-peak-mem-usage-in-windows),
>>> because local task jobs are essentially subprocesses. By utilizing
>>> OpenTelemetry, we could even collect host metrics and platform metrics
>>> that are outside of the boundary of Airflow more easily - and even have
>>> them collected by the OTEL collector agent at the same time.
>>>
>>> I would be very happy if this internship project can also include
>>> collecting metrics in addition to the tracing, and think about how it
>>> can be extended to cover more than what’s provided out of the box.
>>>
>>> - Howard
>>>
>>> On 2022/01/10 21:22:51 Jarek Potiuk wrote:
>>> > > Also, I do have feedback that the current metrics list and what they
>>> > > track are not really that useful
>>> >
>>> > Fully agree.
>>> >
>>> > > (I mean, there is only so much that one can do with metrics like
>>> > > operator failures and ti failures - since they don’t post any
>>> > > context-specific information) - so while we may be working on making
>>> > > OpenTelemetry available for Airflow, we might also investigate and try
>>> > > improvements on reviewing these metrics and really verify whether these
>>> > > metrics are helpful, and if there can be additional metrics that we can
>>> > > instrument while doing this.
>>> >
>>> > Oh yeah.
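As a rough illustration of the per-task resource idea in Howard's message (a task that runs as a subprocess can have its usage read by the parent), here is a minimal sketch. It deliberately uses the standard-library `resource` module instead of psutil so that it runs without extra dependencies, and it is Unix-only. The function name `run_and_measure` and the dictionary keys are hypothetical; nothing here is Airflow or OpenTelemetry API.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd as a child process and report coarse resource usage.

    Hypothetical helper for illustration only (Unix-only).
    """
    # RUSAGE_CHILDREN covers children that have terminated and been waited for.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(cmd, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    return {
        "returncode": completed.returncode,
        # CPU times and block I/O counts are cumulative, so take the difference.
        "cpu_user_s": after.ru_utime - before.ru_utime,
        "cpu_system_s": after.ru_stime - before.ru_stime,
        "block_reads": after.ru_inblock - before.ru_inblock,
        "block_writes": after.ru_oublock - before.ru_oublock,
        # ru_maxrss is a high-water mark over all waited-for children, not a
        # per-child value; the unit is platform dependent (KiB on Linux,
        # bytes on macOS).
        "peak_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    print(run_and_measure([sys.executable, "-c", "sum(range(10**6))"]))
```

psutil would additionally allow sampling a process while it is still running, which this sketch cannot do; values like these could then be reported as custom metrics alongside the existing ones.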
>>> >
>>> > > I think when we are designing for the distributed traces on Airflow,
>>> > > we should also work on defining what kind of traces would be useful and
>>> > > how to come up with a better naming convention etc. to make things
>>> > > clear and easy to understand.
>>> >
>>> > Absolutely! I think we have a very clear "separation" and actually
>>> > "complementary" work that we should indeed do together!
>>> >
>>> > 1) From the "internship project" that we do together with Melodie, the
>>> > focus is more on the engineering side - "how we can easily integrate
>>> > open-telemetry" with Airflow - seamlessly and in a modular fashion and
>>> > in the way that will be easy to use and test in "development
>>> > environment". It is more about solving all engineering obstacles with
>>> > integration (for example what we learn now is that Open Telemetry
>>> > requires some custom code to account for a "forking" model). Also about
>>> > exposing a lot of low-level metrics that are not airflow specific
>>> > (flask, db access etc - something that really allows debugging "any"
>>> > application deployment, not only Airflow). Then we thought about
>>> > simply adding the "current" metrics that we have in statsd as custom
>>> > ones.
>>> >
>>> > 2) And I understand that your focus is - more "how we can actually make
>>> > a really useful set of Airflow metrics" which is ideally complementing
>>> > the "engineering" part - once we get OT fully integrated we can add
>>> > not only (or maybe even not at all) the current metrics but, once you
>>> > help define "better" metrics, we can simply implement them in OT -
>>> > including some example dashboards etc.
>>> >
>>> > Happy to collaborate on that!
>>> >
>>> > J.
>>> >
>>> >
>>> > > - Howard
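To make the "forking" remark in Jarek's message concrete: state that belongs to one process (an exporter, a connection, a background thread) usually has to be rebuilt in a forked child. The thread does not say what the custom code actually looks like, so the following is only a sketch of the general pattern using the standard library's `os.register_at_fork`. `TelemetryClient`, `get_client` and `owner_pid_seen_in_child` are hypothetical names and are not part of OpenTelemetry or Airflow. It is Unix-only because it calls `os.fork`.

```python
import os


class TelemetryClient:
    """Hypothetical stand-in for a per-process exporter or connection."""

    def __init__(self):
        # Remember which process created this client.
        self.owner_pid = os.getpid()


_client = None


def get_client():
    """Return this process's client, creating it lazily."""
    global _client
    if _client is None:
        _client = TelemetryClient()
    return _client


def _reset_after_fork():
    # Runs in the child right after fork: drop the inherited client so the
    # next get_client() call builds a fresh one owned by the child.
    global _client
    _client = None


os.register_at_fork(after_in_child=_reset_after_fork)


def owner_pid_seen_in_child():
    """Fork once and return (client owner PID in the child, child PID)."""
    get_client()  # the parent creates its own client before forking
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report which PID its client belongs to, then exit at once.
        os.close(read_fd)
        try:
            message = f"{get_client().owner_pid}:{os.getpid()}"
            os.write(write_fd, message.encode())
        finally:
            os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    owner, child = data.split(":")
    return int(owner), int(child)
```

Without the `register_at_fork` hook the child would keep using the client inherited from the parent, so the two PIDs returned above would differ.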