andygrove opened a new pull request, #4645:
URL: https://github.com/apache/datafusion-comet/pull/4645

   ## Which issue does this PR close?
   
   Part of #4502 (date/time expression audit follow-ups). Addresses the two 
high-priority items that are still actionable after the codegen-dispatch 
migration; the rest are now obsolete (see below).
   
   ## Rationale for this change
   
   Two of the high-priority findings in #4502 are stale 
EXPLAIN/compatibility-doc text that no longer matches runtime behavior:
   
   1. `CometUnixTimestamp.getUnsupportedReasons` claimed `TimestampNTZType` is 
unsupported "because Comet incorrectly applies timezone conversion to 
TimestampNTZ values". The native `unix_timestamp` path has a dedicated 
`Timestamp(Microsecond, None)` branch that divides raw microseconds by 
`1_000_000` with no timezone conversion, which matches Spark 
(`ToTimestamp.eval` does raw `micros / MICROS_PER_SECOND` for both 
`TimestampType` and `TimestampNTZType`). The predicate (`isSupportedInputType`) 
and `getSupportLevel` already treat `TimestampNTZType` as supported; only the 
reason text was out of date.
   
   2. `CometDateFormat.getCompatibleNotes` described the codegen dispatcher as 
"disabled (default)". `spark.comet.exec.scalaUDF.codegen.enabled` now defaults 
to `true`, so the note was inaccurate.
   
   Both strings feed the auto-generated compatibility documentation.
   
   ## What changes are included in this PR?
   
   - `CometUnixTimestamp.getUnsupportedReasons` now reads "Only `DateType`, 
`TimestampType`, and `TimestampNTZType` inputs are supported.", consistent with 
`isSupportedInputType` and `getSupportLevel`.
   - `CometDateFormat.getCompatibleNotes` now states that 
`spark.comet.exec.scalaUDF.codegen.enabled=true` is the default.
   
   The remaining high-priority items in #4502 are obsolete after the 
codegen-dispatch migration and the `scalaUDF.codegen.enabled` default flipping 
to `true`, so they are intentionally not changed here:
   
   - Item 2 (move `CometDateFormat` gating into `getSupportLevel`): marking 
non-UTC + whitelisted formats `Incompatible` would now route them to full Spark 
fallback under the default config instead of through the dispatcher, which 
produces Spark-identical results. That is a regression, not a fix.
   - Item 3 (mark Group A expressions `Incompatible` / surface the dispatcher 
flag in `getCompatibleNotes`): `CometCodegenDispatch` deliberately reports 
`Compatible()` and omits the flag note, and that is accurate now that the 
dispatcher is on by default. When it is disabled, `convert` returns `None` with 
a `withFallbackReason` that shows up in EXPLAIN.
   - Item 4 (`CometMakeTimestamp` ANSI): `CometMakeTimestamp` is now 
`CometCodegenDispatch[MakeTimestamp]`, so Spark's own `doGenCode` (which honors 
`failOnError`) runs in the kernel and throws under ANSI mode; when the 
dispatcher is disabled it falls back to Spark. ANSI behavior is correct in both 
configurations.
   
   ## How are these changes tested?
   
   These are documentation/EXPLAIN-text corrections with no behavior change. 
The native `unix_timestamp` TimestampNTZ path that the corrected text describes 
is covered by existing Rust unit tests in 
`native/spark-expr/src/datetime_funcs/unix_timestamp.rs`. `mvnw compile` is 
clean for the `spark` module.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to