andygrove opened a new pull request, #3986:
URL: https://github.com/apache/datafusion-comet/pull/3986

   ## Which issue does this PR close?
   
   Closes #3985.
   
   ## Rationale for this change
   
   `CometSortMergeJoinExec` rejected `TimestampType` join keys via 
`supportedSortMergeJoinEqualType`, forcing a fall-back to Spark whenever a 
SortMergeJoin keyed on a timestamp column. The restriction was historical — 
`TimestampNTZType` was already allowed but `TimestampType` had never been 
enabled.
   
   Naively flipping the guard surfaces an upstream limitation: DataFusion's SMJ 
comparator (`joins/sort_merge_join/stream.rs` `is_join_arrays_equal`, 
`joins/utils.rs` `compare_join_arrays`) only pattern-matches `Timestamp(_, 
None)`. Spark's `TimestampType` serializes to `Timestamp(µs, "UTC")`, hitting 
the fallthrough `not_impl_err!`. Because Spark stores `TimestampType` as 
UTC-normalized microseconds, casting `Timestamp(µs, Some(_))` → `Timestamp(µs, 
None)` is a metadata-only operation that preserves both equality and row order, 
so it is safe to apply at the join-key level.
   
   ## What changes are included in this PR?
   
   - **JVM guard**: add `TimestampType` to `supportedSortMergeJoinEqualType` in 
`operators.scala`.
   - **Native planner workaround**: in the `OpStruct::SortMergeJoin` arm of 
`planner.rs`, introduce a `strip_timestamp_tz` helper that wraps any 
`Timestamp(_, Some(_))` join-key expression in a `CastExpr` to `Timestamp(_, 
None)` before handing the keys to `SortMergeJoinExec::try_new`. `sort_options` 
and the child plans are untouched.
   - **Tests**: replace the obsolete "falls back to Spark" test in 
`CometJoinSuite` with four positive tests exercising the new path:
     - Inner join with `Asia/Kathmandu` session timezone.
     - `LEFT OUTER`, `RIGHT OUTER`, `FULL OUTER` variants.
     - Composite `(string, timestamp)` join key.
     - Nullable timestamp keys (both inner and full-outer variants).
   
   ## How are these changes tested?
   
   All four new tests use `checkSparkAnswerAndOperator` with 
`classOf[CometSortMergeJoinExec]` to assert both correctness against Spark and 
native execution. The full `CometJoinSuite` passes (13/13). `cargo clippy 
--all-targets --workspace -- -D warnings` is clean and `make format` reports no 
changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to