Re: [PR] feat: support multiply_ym_interval with YearMonth interval codegen dispatch [datafusion-comet]

via GitHub Wed, 01 Jul 2026 15:35:33 -0700


andygrove commented on PR #4715:
URL: 
https://github.com/apache/datafusion-comet/pull/4715#issuecomment-4860573590


   Thanks for this. Using the codegen dispatcher for `multiply_ym_interval` is 
the right call, since it runs Spark's own generated code and so matches Spark 
by construction, including the half-up rounding and the overflow behavior. The 
multiplier coverage in the SQL test is nice and broad (int, long, float, 
double, decimal, plus the half-up rounding case and the numeric-on-the-left 
normalization).
   
   A few things worth looking at:
   
   **Rebase on `main`.** This looks like it was branched before #4541 landed, 
and #4541 already added the YearMonth and DayTime interval type plumbing plus 
`make_ym_interval` via codegen dispatch. After a rebase, these pieces here 
become duplicates and will conflict: `types.proto`, `serde.rs`, `Utils.scala`, 
`serializeDataType`, `isSupportedDataType`, the `CometBatchKernelCodegenOutput` 
changes, `CometMakeYMInterval` and its registration, and the `make_ym_interval` 
row in `expressions.md`. Stripping those would leave just the multiply-specific 
work: `MultiplyYMInterval`, the interval **input** support in 
`CometBatchKernelCodegenInput` (which #4541 did not touch and which multiply 
genuinely needs), the interval literals in `literals.scala` and `planner.rs`, 
and the test. That would make the diff much easier to review. It should also 
clear the failing preflight, which is currently tripping on `installation.md`, 
a file this PR does not touch, which is a symptom of the branch being stale
 .
   
   **Is the `supportedDataType` change needed?** #4541 deliberately kept 
intervals out of `QueryPlanSerde.supportedDataType` because codegen dispatch 
does not consult it (it serializes the return type via `serializeDataType`). 
Adding `YearMonthIntervalType` there is a broader claim: it tells Comet that YM 
interval columns can flow through any native operator (scan, filter, project, 
sort, shuffle), not just the dispatch path, and nothing here exercises those 
paths. Is it actually required for `multiply_ym_interval`? If it is, a test 
that pushes a YM interval column through a plain projection or a shuffle would 
give confidence the FFI round-trip is correct. If it is not, dropping it keeps 
the scope tight.
   
   **Overflow test.** `MultiplyYMInterval` always throws on overflow via 
`multiplyExact` / `toIntExact` / `intValueExact`, independent of ANSI mode. The 
dispatched code should raise the same exception, but it is not covered. Could 
we add an `expect_error` case near `Int.MaxValue` to confirm Comet surfaces the 
same error as Spark rather than falling back or wrapping? A null-interval input 
case such as `make_ym_interval(NULL, m) * 2` would also round out the null 
coverage, which currently only exercises null multipliers.
   
   One small heads-up: the `make_ym_interval` row in `expressions.md` also 
changes in #4790, so expect a conflict there.
   
   On benchmarks: since this runs in the JVM via dispatch rather than as a 
native Rust kernel, I do not think the usual native microbenchmark expectation 
applies here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: support multiply_ym_interval with YearMonth interval codegen dispatch [datafusion-comet]

Reply via email to