andygrove opened a new pull request, #4645: URL: https://github.com/apache/datafusion-comet/pull/4645
## Which issue does this PR close? Part of #4502 (date/time expression audit follow-ups). Addresses the two high-priority items that are still actionable after the codegen-dispatch migration; the rest are now obsolete (see below). ## Rationale for this change Two of the high-priority findings in #4502 are stale EXPLAIN/compatibility-doc text that no longer matches runtime behavior: 1. `CometUnixTimestamp.getUnsupportedReasons` claimed `TimestampNTZType` is unsupported "because Comet incorrectly applies timezone conversion to TimestampNTZ values". The native `unix_timestamp` path has a dedicated `Timestamp(Microsecond, None)` branch that divides raw microseconds by `1_000_000` with no timezone conversion, which matches Spark (`ToTimestamp.eval` does raw `micros / MICROS_PER_SECOND` for both `TimestampType` and `TimestampNTZType`). The predicate (`isSupportedInputType`) and `getSupportLevel` already treat `TimestampNTZType` as supported; only the reason text was out of date. 2. `CometDateFormat.getCompatibleNotes` described the codegen dispatcher as "disabled (default)". `spark.comet.exec.scalaUDF.codegen.enabled` now defaults to `true`, so the note was inaccurate. Both strings feed the auto-generated compatibility documentation. ## What changes are included in this PR? - `CometUnixTimestamp.getUnsupportedReasons` now reads "Only `DateType`, `TimestampType`, and `TimestampNTZType` inputs are supported.", consistent with `isSupportedInputType` and `getSupportLevel`. - `CometDateFormat.getCompatibleNotes` now states that `spark.comet.exec.scalaUDF.codegen.enabled=true` is the default. The remaining high-priority items in #4502 are obsolete after the codegen-dispatch migration and the `scalaUDF.codegen.enabled` default flipping to `true`, so they are intentionally not changed here: - Item 2 (move `CometDateFormat` gating into `getSupportLevel`): marking non-UTC + whitelisted formats `Incompatible` would now route them to full Spark fallback under the default config instead of through the dispatcher, which produces Spark-identical results. That is a regression, not a fix. - Item 3 (mark Group A expressions `Incompatible` / surface the dispatcher flag in `getCompatibleNotes`): `CometCodegenDispatch` deliberately reports `Compatible()` and omits the flag note, and that is accurate now that the dispatcher is on by default. When it is disabled, `convert` returns `None` with a `withFallbackReason` that shows up in EXPLAIN. - Item 4 (`CometMakeTimestamp` ANSI): `CometMakeTimestamp` is now `CometCodegenDispatch[MakeTimestamp]`, so Spark's own `doGenCode` (which honors `failOnError`) runs in the kernel and throws under ANSI mode; when the dispatcher is disabled it falls back to Spark. ANSI behavior is correct in both configurations. ## How are these changes tested? These are documentation/EXPLAIN-text corrections with no behavior change. The native `unix_timestamp` TimestampNTZ path that the corrected text describes is covered by existing Rust unit tests in `native/spark-expr/src/datetime_funcs/unix_timestamp.rs`. `mvnw compile` is clean for the `spark` module. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
