andygrove commented on PR #4715: URL: https://github.com/apache/datafusion-comet/pull/4715#issuecomment-4860573590
Thanks for this. Using the codegen dispatcher for `multiply_ym_interval` is the right call, since it runs Spark's own generated code and so matches Spark by construction, including the half-up rounding and the overflow behavior. The multiplier coverage in the SQL test is nice and broad (int, long, float, double, decimal, plus the half-up rounding case and the numeric-on-the-left normalization). A few things worth looking at: **Rebase on `main`.** This looks like it was branched before #4541 landed, and #4541 already added the YearMonth and DayTime interval type plumbing plus `make_ym_interval` via codegen dispatch. After a rebase, these pieces here become duplicates and will conflict: `types.proto`, `serde.rs`, `Utils.scala`, `serializeDataType`, `isSupportedDataType`, the `CometBatchKernelCodegenOutput` changes, `CometMakeYMInterval` and its registration, and the `make_ym_interval` row in `expressions.md`. Stripping those would leave just the multiply-specific work: `MultiplyYMInterval`, the interval **input** support in `CometBatchKernelCodegenInput` (which #4541 did not touch and which multiply genuinely needs), the interval literals in `literals.scala` and `planner.rs`, and the test. That would make the diff much easier to review. It should also clear the failing preflight, which is currently tripping on `installation.md`, a file this PR does not touch, which is a symptom of the branch being stale . **Is the `supportedDataType` change needed?** #4541 deliberately kept intervals out of `QueryPlanSerde.supportedDataType` because codegen dispatch does not consult it (it serializes the return type via `serializeDataType`). Adding `YearMonthIntervalType` there is a broader claim: it tells Comet that YM interval columns can flow through any native operator (scan, filter, project, sort, shuffle), not just the dispatch path, and nothing here exercises those paths. Is it actually required for `multiply_ym_interval`? If it is, a test that pushes a YM interval column through a plain projection or a shuffle would give confidence the FFI round-trip is correct. If it is not, dropping it keeps the scope tight. **Overflow test.** `MultiplyYMInterval` always throws on overflow via `multiplyExact` / `toIntExact` / `intValueExact`, independent of ANSI mode. The dispatched code should raise the same exception, but it is not covered. Could we add an `expect_error` case near `Int.MaxValue` to confirm Comet surfaces the same error as Spark rather than falling back or wrapping? A null-interval input case such as `make_ym_interval(NULL, m) * 2` would also round out the null coverage, which currently only exercises null multipliers. One small heads-up: the `make_ym_interval` row in `expressions.md` also changes in #4790, so expect a conflict there. On benchmarks: since this runs in the JVM via dispatch rather than as a native Rust kernel, I do not think the usual native microbenchmark expectation applies here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
