[PR] feat: Add Grace Hash Join operator with spill-to-disk support [datafusion-comet]

via GitHub Sat, 21 Feb 2026 07:18:09 -0800


andygrove opened a new pull request, #3564:
URL: https://github.com/apache/datafusion-comet/pull/3564


   ## Summary
   
   - Adds a new Grace Hash Join operator that partitions both build and probe sides into hash buckets, spilling to disk when memory is tight, then joining each partition independently
   - Supports all join types (Inner, Left, Right, Full, LeftSemi, LeftAnti) with build-side selection
   - Uses incremental Arrow IPC spill I/O via `SpillWriter` for efficient disk access
   - Includes recursive repartitioning (up to 3 levels) for oversized partitions
   - Adds `CometGraceHashJoinExec` Spark operator with production metrics (build/probe/join time, spill count/bytes)
   - Controlled by `spark.comet.exec.graceHashJoin.enabled` (default `false`) and `spark.comet.exec.graceHashJoin.numPartitions` (default `16`)
   - Includes comprehensive deterministic tests, fuzz tests with ParquetGenerator, and join microbenchmarks comparing Spark SMJ, Comet SMJ, Comet Hash Join, and Comet Grace Hash Join
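
For readers unfamiliar with the technique, the partition-then-join idea in the summary can be sketched roughly as follows. This is a minimal in-memory illustration for an inner join only, not the code in this PR: partitions are plain `Vec`s rather than Arrow IPC spill files, and every name here (`Row`, `bucket`, `partition`, `join_partition`, `grace_hash_join`, `max_build_rows`) is hypothetical. The only values taken from the summary are the 3-level repartitioning limit and the idea of a configurable partition count.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical row type for this sketch: (join key, payload).
type Row = (i64, String);
/// Joined output: (key, build payload, probe payload).
type Joined = (i64, String, String);

/// Repartitioning depth limit, taken from the PR summary ("up to 3 levels").
const MAX_REPARTITION_LEVELS: u32 = 3;

/// Pick a bucket for a key. Mixing in `level` changes the hash at each
/// recursion depth, so an oversized partition can actually be split.
fn bucket(key: i64, level: u32, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    level.hash(&mut hasher);
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Split rows into `num_partitions` buckets by hashed key.
/// A real operator would write these buckets to disk under memory pressure.
fn partition(rows: Vec<Row>, level: u32, num_partitions: usize) -> Vec<Vec<Row>> {
    assert!(num_partitions > 0, "need at least one partition");
    let mut parts: Vec<Vec<Row>> = (0..num_partitions).map(|_| Vec::new()).collect();
    for row in rows {
        let b = bucket(row.0, level, num_partitions);
        parts[b].push(row);
    }
    parts
}

/// Classic in-memory inner hash join of one build/probe partition pair.
fn hash_join(build: &[Row], probe: &[Row], out: &mut Vec<Joined>) {
    let mut table: HashMap<i64, Vec<&str>> = HashMap::new();
    for (k, v) in build {
        table.entry(*k).or_default().push(v.as_str());
    }
    for (k, pv) in probe {
        if let Some(matches) = table.get(k) {
            for bv in matches {
                out.push((*k, (*bv).to_string(), pv.clone()));
            }
        }
    }
}

/// Join one partition pair, repartitioning again if the build side is
/// still larger than `max_build_rows` and the depth limit is not reached.
fn join_partition(
    build: Vec<Row>,
    probe: Vec<Row>,
    num_partitions: usize,
    max_build_rows: usize,
    level: u32,
    out: &mut Vec<Joined>,
) {
    if build.len() > max_build_rows && level < MAX_REPARTITION_LEVELS {
        let build_parts = partition(build, level + 1, num_partitions);
        let probe_parts = partition(probe, level + 1, num_partitions);
        for (b, p) in build_parts.into_iter().zip(probe_parts) {
            join_partition(b, p, num_partitions, max_build_rows, level + 1, out);
        }
    } else {
        hash_join(&build, &probe, out);
    }
}

/// Partition both sides once, then join matching partitions independently.
fn grace_hash_join(
    build: Vec<Row>,
    probe: Vec<Row>,
    num_partitions: usize,
    max_build_rows: usize,
) -> Vec<Joined> {
    let mut out = Vec::new();
    let build_parts = partition(build, 0, num_partitions);
    let probe_parts = partition(probe, 0, num_partitions);
    for (b, p) in build_parts.into_iter().zip(probe_parts) {
        join_partition(b, p, num_partitions, max_build_rows, 0, &mut out);
    }
    out
}
```

Because equal keys always hash to the same bucket at a given level, each partition pair can be joined without looking at any other partition. If a single key dominates the build side, repartitioning cannot shrink that partition, which is why the sketch falls back to an in-memory join once the depth limit is reached.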
   
   ## Test plan
   
   - [x] All existing CometJoinSuite tests pass
   - [x] 8 new deterministic tests covering all join types, filters, data types, empty tables, self-join, build side, plan verification, multi-key
   - [x] 2 fuzz tests using ParquetGenerator with and without spilling
   - [x] Rust unit tests for basic join and no-overlap scenarios
   - [x] `cargo clippy` passes with no warnings
   - [x] Join microbenchmark added (`CometJoinBenchmark`)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
