geoffreyclaude opened a new pull request, #14547:
URL: https://github.com/apache/datafusion/pull/14547

   ## Which issue does this PR close?
   
   Relates to #9415. This does not fully close the issue, but it implements a 
prerequisite.
   
   ## Rationale for this change
   
   This allows DataFusion to integrate with users of the 
[`tracing`](https://docs.rs/tracing/latest/tracing/) crate by propagating the 
trace context as users would expect, without investing in the full integration 
of the `tracing` ecosystem.
   When the (new) `tracing` feature is enabled, all tasks spawned on new 
threads (e.g. those spawned during repartitioning or while reading/writing 
Parquet files) inherit the current tracing span. This propagates the trace 
context across thread boundaries and into external data sources or custom 
exec nodes, so that all generated logs and spans can be linked to the 
expected trace context.
   Previously, tasks spawned on new threads lost the trace context, because 
it is thread-local and must be propagated "manually" to the new thread.
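
To illustrate why thread-local context is lost at a spawn point, here is a minimal std-only sketch. It is an analogy, not the `tracing` API: the `CONTEXT` thread-local and all function names are hypothetical stand-ins for the current span.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical stand-in for the per-thread "current span".
    static CONTEXT: RefCell<Option<String>> = RefCell::new(None);
}

/// Set the context label for the calling thread only.
pub fn set_context(label: &str) {
    CONTEXT.with(|c| *c.borrow_mut() = Some(label.to_string()));
}

/// Read the context label of the calling thread, if any.
pub fn current_context() -> Option<String> {
    CONTEXT.with(|c| c.borrow().clone())
}

/// Spawn a thread without propagation and report what it sees.
/// The new thread gets its own, empty copy of the thread-local.
pub fn observed_without_propagation() -> Option<String> {
    thread::spawn(current_context)
        .join()
        .expect("thread panicked")
}

/// Capture the caller's context before spawning, then reinstall it
/// inside the new thread. This is the "manual" propagation step.
pub fn observed_with_propagation() -> Option<String> {
    let captured = current_context();
    thread::spawn(move || {
        if let Some(label) = captured {
            set_context(&label);
        }
        current_context()
    })
    .join()
    .expect("thread panicked")
}
```

After calling `set_context("query-1")` on the parent thread, `observed_without_propagation()` yields `None`, while `observed_with_propagation()` yields `Some("query-1")`.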
   
   ## What changes are included in this PR?
   
   - Wrap `tokio::task::JoinSet` in a custom `JoinSet` in the common runtime 
so that, when the `tracing` feature is enabled, tasks spawned on new threads 
are instrumented with the current tracing span.
   - Add a new Cargo.toml feature (`tracing`) in the common-runtime crate, 
along with necessary dependency updates.
   - Provide an integration example in 
`datafusion-examples/examples/tracing.rs` that runs a SQL query over the 
`alltypes_tiny_pages_plain.parquet` file to demonstrate end-to-end propagation 
of the tracing context across multiple threads.
   - Update the root `README.md` to reflect the availability and usage of the 
new `tracing` feature.
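
The wrapper idea in the first bullet can be sketched with the standard library alone. This is an assumption-laden analogy rather than the PR's code: `ContextTaskSet`, `CONTEXT`, and the helper functions are hypothetical names, and plain OS threads stand in for tasks in a `tokio::task::JoinSet`.

```rust
use std::cell::RefCell;
use std::thread::{self, JoinHandle};

thread_local! {
    // Hypothetical stand-in for the thread-local current span.
    static CONTEXT: RefCell<Option<String>> = RefCell::new(None);
}

/// Replace the calling thread's context.
pub fn set_context(label: Option<String>) {
    CONTEXT.with(|c| *c.borrow_mut() = label);
}

/// Read the calling thread's context.
pub fn current_context() -> Option<String> {
    CONTEXT.with(|c| c.borrow().clone())
}

/// Hypothetical task set that plays the role of a wrapping `JoinSet`:
/// every task spawned through it inherits the spawner's context.
pub struct ContextTaskSet<T> {
    handles: Vec<JoinHandle<T>>,
}

impl<T: Send + 'static> ContextTaskSet<T> {
    pub fn new() -> Self {
        Self { handles: Vec::new() }
    }

    /// Capture the context at the spawn point and reinstall it in the
    /// new thread before running the task.
    pub fn spawn<F>(&mut self, task: F)
    where
        F: FnOnce() -> T + Send + 'static,
    {
        let captured = current_context();
        self.handles.push(thread::spawn(move || {
            set_context(captured);
            task()
        }));
    }

    /// Wait for every task and return the results in spawn order.
    pub fn join_all(self) -> Vec<T> {
        self.handles
            .into_iter()
            .map(|h| h.join().expect("task panicked"))
            .collect()
    }
}
```

Putting the capture inside `spawn` means call sites stay unchanged: code that already spawns through the wrapper picks up propagation without further edits.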
   
   ## Are these changes tested?
   
   Yes. While there are no dedicated unit tests for this feature, the 
integration example in `datafusion-examples/examples/tracing.rs` serves as a 
comprehensive test. This example executes a query that triggers task spawns 
(such as through repartitioning and Parquet reading) and logs tracing output. 
By reviewing the logs, one can verify that the tracing span context is 
correctly propagated end to end.
   
   ## Are there any user-facing changes?
   
   No changes are expected for users who do not enable the `tracing` feature. 
The performance overhead *should* be nonexistent when the feature is disabled, 
and negligible when enabled.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

