andygrove opened a new pull request, #3734: URL: https://github.com/apache/datafusion-comet/pull/3734
## Which issue does this PR close? No issue. This adds observability to the shuffle write path. ## Rationale for this change The shuffle write path has existing metrics for `repart_time` (partition ID computation), `encode_time` (IPC + compression), and `write_time` (disk I/O), but a large portion of shuffle write time was unaccounted for. Profiling TPC-H 100GB revealed that `interleave_record_batch` (the gather step that pulls rows from buffered batches into per-partition output) accounts for 55% of total shuffle write time, but this was invisible in the existing metrics. ## What changes are included in this PR? Adds two new timing metrics to the shuffle write path: - **`gather_time`**: Time spent in `interleave_record_batch`, which gathers rows from buffered batches into per-partition batches during the write phase. This is the most expensive step in the shuffle write path (55.4% of total time in TPC-H 100GB benchmarks). - **`coalesce_time`**: Time spent in `BatchCoalescer`, which merges small batches before IPC serialization (1.4% of total time). These metrics are wired through from the Rust native code to Spark SQL metrics, appearing alongside existing shuffle metrics in the Spark UI. ### TPC-H 100GB Shuffle Write Breakdown (with new metrics) | Component | Time | % of shuffle write | |-----------|------|--------------------| | gather/interleave | 702.9s | 55.4% | | encoding + compression | 302.9s | 23.9% | | write to disk | 75.8s | 6.0% | | repartition (hash) | 74.1s | 5.8% | | batch coalescing | 17.4s | 1.4% | | unaccounted | 95.2s | 7.5% | The timing overhead is negligible (~0.3s out of 1268s total, using `Instant::now()` which is ~50ns per call). ## How are these changes tested? Existing shuffle tests (`CometNativeShuffleSuite`, `CometExecSuite`) continue to pass. The metrics are additive — they don't change any behavior, only add timing instrumentation to existing code paths. Verified with TPC-H 100GB benchmark that metrics are correctly reported and sum to expected totals. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
