[PR] perf: Pre-resolve type dispatch in sort-merge join comparators [datafusion]

via GitHub Fri, 20 Feb 2026 07:58:45 -0800


andygrove opened a new pull request, #20452:
URL: https://github.com/apache/datafusion/pull/20452


   ## Summary
   
   - Replace per-row runtime `DataType` matching in `is_join_arrays_equal()` and `compare_join_arrays()` with a `JoinComparator` struct that resolves typed comparison function pointers once, during `SortMergeJoinStream` construction
   - Eliminate the overhead of matching on 20+ `DataType` variants for every row comparison in the merge loop
   - `JoinComparator` provides two methods:
     - `compare()`: for merge-loop ordering decisions (whether to advance the streamed or the buffered side)
     - `is_equal()`: for buffered batch key-group expansion
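
The idea in the summary can be illustrated with a minimal, standalone sketch. It is not the code in this PR: `KeyType`, `compare_typed`, and the use of `Vec<T>` behind `&dyn Any` are hypothetical stand-ins for Arrow's `DataType` and array types, and only the names `JoinComparator`, `compare()`, and `is_equal()` come from the description above. The point it shows is that the `match` on the key type runs once in `new()`, while the per-row path only calls already-chosen function pointers.

```rust
use std::any::Any;
use std::cmp::Ordering;

// Hypothetical stand-in for Arrow's `DataType`; the real enum has 20+ variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum KeyType {
    Int32,
    Int64,
    Utf8,
}

// One comparison function per join key column, chosen ahead of time.
type CompareFn = fn(&dyn Any, usize, &dyn Any, usize) -> Ordering;

// Typed comparison: downcast both columns to `Vec<T>` and compare one row each.
// (Assumption: columns are modelled as `Vec<T>`; nulls are ignored in this sketch.)
fn compare_typed<T: Ord + 'static>(l: &dyn Any, li: usize, r: &dyn Any, ri: usize) -> Ordering {
    let l = l.downcast_ref::<Vec<T>>().expect("left column type mismatch");
    let r = r.downcast_ref::<Vec<T>>().expect("right column type mismatch");
    l[li].cmp(&r[ri])
}

struct JoinComparator {
    fns: Vec<CompareFn>,
}

impl JoinComparator {
    // The type dispatch happens here, once per stream, not once per row.
    fn new(key_types: &[KeyType]) -> Self {
        let fns = key_types
            .iter()
            .map(|t| match t {
                KeyType::Int32 => compare_typed::<i32> as CompareFn,
                KeyType::Int64 => compare_typed::<i64> as CompareFn,
                KeyType::Utf8 => compare_typed::<String> as CompareFn,
            })
            .collect();
        Self { fns }
    }

    // Lexicographic comparison over all key columns for one pair of rows.
    fn compare(&self, left: &[&dyn Any], li: usize, right: &[&dyn Any], ri: usize) -> Ordering {
        for (f, (l, r)) in self.fns.iter().zip(left.iter().zip(right.iter())) {
            let ord = f(*l, li, *r, ri);
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    }

    // Equality check built on the same pre-resolved functions.
    fn is_equal(&self, left: &[&dyn Any], li: usize, right: &[&dyn Any], ri: usize) -> bool {
        self.compare(left, li, right, ri) == Ordering::Equal
    }
}
```

In this sketch a caller would build the comparator once from the join key types and then call `compare()` or `is_equal()` with row indices inside the merge loop; the real implementation operates on Arrow arrays and has to handle nulls and sort options, which are left out here.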
   
   ## Benchmark Results
   
   Best of 3 iterations across 20 SMJ benchmark queries (`cargo run --release -p datafusion-benchmarks --bin dfbench -- smj`):
   
   | Query | Description | Baseline (ms) | Optimized (ms) | Change |
   |-------|-------------|---------------|----------------|--------|
   | Q1 | INNER 100K×100K, 1:1 | 3.16 | 2.22 | **-29.9%** |
   | Q2 | INNER 100K×1M, 1:10 | 10.41 | 10.02 | -3.8% |
   | Q3 | INNER 1M×1M, 1:100 | 53.06 | 55.55 | +4.7% |
   | Q4 | INNER 100K×1M, 1:10, 1% filter | 3.33 | 3.20 | -4.2% |
   | Q5 | INNER 1M×1M, 1:100, 10% filter | 11.41 | 11.85 | +3.9% |
   | Q6 | LEFT 100K×1M, 1:10 | 10.10 | 9.97 | -1.3% |
   | Q7 | LEFT 100K×1M, 1:10, 50% filter | 11.75 | 11.91 | +1.4% |
   | Q8 | FULL 100K×100K, 1:10 | 2.53 | 2.52 | -0.4% |
   | Q9 | FULL 100K×1M, 1:10, 10% filter | 11.32 | 11.02 | -2.7% |
   | Q10 | LEFT SEMI 100K×1M, 1:10 | 4.42 | 4.35 | -1.6% |
   | Q11 | LEFT SEMI 100K×1M, 1:10, 1% filter | 3.97 | 3.99 | +0.5% |
   | Q12 | LEFT SEMI 100K×1M, 1:10, 50% filter | 59.28 | 59.15 | -0.2% |
   | Q13 | LEFT SEMI 100K×1M, 1:10, 90% filter | 4.67 | 4.50 | -3.6% |
   | Q14 | LEFT ANTI 100K×1M, 1:10 | 4.40 | 4.34 | -1.6% |
   | Q15 | LEFT ANTI 100K×1M, 1:10, partial | 4.42 | 4.36 | -1.4% |
   | Q16 | LEFT ANTI 100K×100K, 1:1, stress | 2.14 | 2.21 | +3.3% |
   | Q17 | INNER 100K×5M, 1:50, 5% filter | 8.86 | 7.75 | **-12.5%** |
   | Q18 | LEFT SEMI 100K×5M, 1:50, 2% filter | 8.07 | 8.03 | -0.5% |
   | Q19 | LEFT ANTI 100K×5M, 1:50, partial | 19.52 | 18.88 | -3.3% |
   | Q20 | INNER 1M×10M, 1:100 + GROUP BY | 533.16 | 559.54 | +4.9% |
   
   The biggest wins are on comparison-dominated workloads (Q1: 1:1 join, Q17: filtered 1:50 join). High-cardinality joins (Q3, Q5, Q20), where output construction dominates, run 3.9% to 4.9% slower, which is treated here as measurement noise rather than a significant change.
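
For readers checking the table, the following sketch shows one way to read "best of 3" and the Change column. It is an assumption about the method, not the `dfbench` implementation: `best_of` and `percent_change` are hypothetical helpers, and because the table appears to be computed from unrounded timings, recomputing from the rounded millisecond values can differ in the last digit.

```rust
use std::time::{Duration, Instant};

/// Run `f` `iterations` times and return the fastest wall-clock duration.
fn best_of<F: FnMut()>(iterations: usize, mut f: F) -> Duration {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .min()
        .expect("iterations must be at least 1")
}

/// Relative change in percent; negative means the optimized run is faster.
fn percent_change(baseline_ms: f64, optimized_ms: f64) -> f64 {
    (optimized_ms - baseline_ms) / baseline_ms * 100.0
}
```

Applied to the Q17 row, `percent_change(8.86, 7.75)` gives roughly -12.5, matching the table.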
   
   ## Test plan
   
   - [x] All 48 `sort_merge_join` unit tests pass
   - [x] `cargo fmt` clean
   - [x] `cargo clippy` clean (zero warnings)
   - [x] Benchmark comparison shows no regressions beyond noise
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

