erratic-pattern commented on issue #9415: URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2635477969
> I am not clear what additional benefit more direct tracing integration in datafusion would provide, but I may be missing something

The `tracing` API is more granular than the information provided by the current metrics. It's possible to convert the existing metrics into `tracing` spans, but some information is inaccessible or impossible to trace at the moment.

The main example I can think of is the exact timing of entering/exiting execution of the async tasks. The datafusion metrics record a "start" and "end" timestamp for the whole operator, but they do not record when an operator awaits and gives up control to the executor. The `tracing` API allows for this because a span can be entered and exited multiple times before it is finally closed. This lets you graph out exactly when async tasks are running in relation to each other and for how long, which can be helpful for identifying bottlenecks where a task is waiting a long time for data from another task.

You can kinda use the `elapsed_compute` metric for this purpose if you are only interested in identifying which operators are slow and which are fast, but the added granularity that you get from `tracing` would make it easier to visualize the actual path that control flow takes when tasks are pre-empted.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
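To make the "entered and exited multiple times" point from the comment concrete, here is a minimal std-only sketch. It does not use the `tracing` crate or any DataFusion API: `Timed`, `YieldOnce`, `operator` and `run_demo` are hypothetical names invented for illustration. `Timed` records one (enter, exit) pair per poll, which is roughly the information a span gives you when a future is wrapped with `tracing`'s `Instrument` and the span is entered on every poll. A single start/end timestamp pair would instead cover the whole run, including the time spent suspended.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a span: stores one (enter, exit) pair per poll.
struct Timed<F> {
    inner: Pin<Box<F>>,
    intervals: Vec<(Instant, Instant)>,
}

impl<F: Future> Timed<F> {
    fn new(inner: F) -> Self {
        Timed { inner: Box::pin(inner), intervals: Vec::new() }
    }

    /// Poll the inner future once, recording when control entered and left it.
    fn poll_once(&mut self, cx: &mut Context<'_>) -> Poll<F::Output> {
        let enter = Instant::now();
        let result = self.inner.as_mut().poll(cx);
        self.intervals.push((enter, Instant::now()));
        result
    }
}

/// Returns `Pending` exactly once, modelling an operator waiting for input.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the loop in `run_demo` simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical "operator" that gives up control twice before finishing.
async fn operator() -> u32 {
    let mut rows = 0;
    for _ in 0..2 {
        YieldOnce(false).await; // suspend, as if waiting for a batch
        rows += 1;
    }
    rows
}

/// Drive the operator to completion and return its result together with
/// the busy time of each individual poll.
fn run_demo() -> (u32, Vec<Duration>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut task = Timed::new(operator());
    let rows = loop {
        if let Poll::Ready(rows) = task.poll_once(&mut cx) {
            break rows;
        }
    };
    let busy: Vec<Duration> = task
        .intervals
        .iter()
        .map(|(enter, exit)| exit.duration_since(*enter))
        .collect();
    (rows, busy)
}

fn main() {
    let (rows, busy) = run_demo();
    println!("rows = {rows}, polls = {}", busy.len());
    for (i, d) in busy.iter().enumerate() {
        println!("poll {i}: busy for {d:?}");
    }
}
```

Each entry in `busy` corresponds to one stretch where the task actually held the executor. The gaps between consecutive intervals are the waiting time that a single "start"/"end" pair cannot show.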