erratic-pattern commented on issue #9415:
URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2635477969

   > I am not clear what additional benefit more direct tracing integration in 
datafusion would provide, but I may be missing something
   
   The `tracing` API is more granular than the information provided by the 
current metrics, so while it's possible to convert the existing metrics into 
`tracing` spans, some information is not captured by the metrics and so 
cannot be traced at the moment.
   
   The main example I can think of is the exact timing of entering/exiting 
execution of the async tasks. The datafusion metrics record a "start" and "end" 
timestamp for the whole operator, but they do not record when operators await 
and give up control to the executor. 
   
   The tracing API allows for this because a span can be entered and exited 
multiple times before it is finally closed. This allows you to graph out 
exactly when async tasks are running in relation to each other and for how long, 
which can be helpful for identifying bottlenecks where a task is waiting a long 
time for data from another task. 
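
   To make the enter/exit idea concrete, here is a minimal std-only sketch. It does not use the `tracing` crate or any DataFusion API; `PollTrace`, `YieldTimes`, `NoopWaker` and `drive` are hypothetical names made up for illustration. The wrapper records one (enter, exit) interval per `poll` call, which is roughly the information a span entered on every poll would give you, as opposed to a single start/end pair for the whole operator.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Hypothetical wrapper: records one (enter, exit) interval per poll,
/// instead of a single start/end pair for the whole future.
struct PollTrace<F> {
    inner: Pin<Box<F>>,
    intervals: Vec<(Instant, Instant)>,
}

impl<F: Future> PollTrace<F> {
    fn new(inner: F) -> Self {
        Self { inner: Box::pin(inner), intervals: Vec::new() }
    }

    /// Total time spent inside `poll`, excluding time spent suspended.
    fn busy(&self) -> Duration {
        self.intervals.iter().map(|(enter, exit)| exit.duration_since(*enter)).sum()
    }
}

impl<F: Future> Future for PollTrace<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let enter = Instant::now(); // "enter" the span
        let result = self.inner.as_mut().poll(cx);
        self.intervals.push((enter, Instant::now())); // "exit" the span
        result
    }
}

/// Toy future that returns `Pending` the given number of times, then completes.
struct YieldTimes(u32);

impl Future for YieldTimes {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            Poll::Ready(())
        } else {
            self.0 -= 1;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Waker that does nothing; `drive` below simply polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling driver, standing in for a real async executor.
fn drive<F: Future>(traced: &mut PollTrace<F>) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = Pin::new(&mut *traced).poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut traced = PollTrace::new(YieldTimes(3));
    drive(&mut traced);
    println!("polls: {}, busy: {:?}", traced.intervals.len(), traced.busy());
}
```

   With a real executor, the gaps between consecutive intervals would show when the task was suspended waiting on something else, which is the part a single start/end timestamp cannot show.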
   
   You can kinda use the `elapsed_compute` metric for this purpose, if you are 
only interested in identifying which operators are slow and fast, but the added 
granularity that you get from `tracing` would make it easier to visualize the 
actual path that control flow takes when tasks are pre-empted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

