andygrove opened a new pull request, #4649:
URL: https://github.com/apache/datafusion-comet/pull/4649
## Which issue does this PR close?
Closes #.
## Rationale for this change
The expression compatibility guide is auto-generated by
`GenerateDocs.scala`. For every `Incompatible` expression it printed a sentence
like:
> The following incompatibilities cause `Second` to fall back to Spark by
default. Set `spark.comet.expression.Second.allowIncompatible=true` to enable
Comet acceleration despite these differences.
This is no longer accurate for expressions that opt into the
`CodegenDispatchFallback` trait. Those expressions do not fall back to Spark by
default. They stay in Comet's native pipeline via the JVM codegen dispatcher
(running Spark's own generated code) and match Spark exactly. For these
expressions, `allowIncompatible=true` switches to the faster native
implementation that carries the listed differences, rather than enabling Comet
acceleration that was otherwise disabled.
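
The behavior described above can be summarized as a small decision function. This is only an illustrative model of the rationale, not Comet's code: `ExecutionPath`, its three cases, `IncompatibleExprPaths`, and `pathFor` are hypothetical names introduced here.

```scala
// Illustrative model only; none of these names are part of Comet's API.
sealed trait ExecutionPath
// The expression is evaluated by Spark, outside Comet.
case object SparkFallback extends ExecutionPath
// The expression stays in Comet and runs Spark's own generated code.
case object JvmCodegenDispatch extends ExecutionPath
// The expression uses Comet's native implementation, with the listed differences.
case object NativeImplementation extends ExecutionPath

object IncompatibleExprPaths {

  /**
   * Path taken by an expression marked `Incompatible`.
   *
   * @param usesCodegenDispatch whether the expression is enrolled in codegen-dispatch fallback
   * @param allowIncompatible   value of the expression's `allowIncompatible` setting
   */
  def pathFor(usesCodegenDispatch: Boolean, allowIncompatible: Boolean): ExecutionPath =
    if (allowIncompatible) NativeImplementation
    else if (usesCodegenDispatch) JvmCodegenDispatch
    else SparkFallback
}
```

In this model the old sentence corresponds to the `SparkFallback` branch only, which is why it misdescribes expressions that take the `JvmCodegenDispatch` branch by default.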
## What changes are included in this PR?
- `GenerateDocs.scala` now emits different prose for expressions enrolled in
codegen-dispatch fallback. Those expressions document that Comet accelerates
them by default via JVM codegen dispatch (Spark-compatible), and that
`allowIncompatible=true` opts into the faster native path with the listed
differences. Non-dispatch expressions (such as `Cast`, `SortArray`,
`CollectSet`) keep the original "fall back to Spark by default" wording, which
remains accurate for them.
- Refactored the `CategoryNotes` tuple, which had grown too large, into an
`ExprNotes` case class. Aggregate serdes use a separate builder because
`CometAggregateExpressionSerde` is not a subtype of `CometExpressionSerde` and
never participates in codegen dispatch.
- Updated the general intro sentence on the expression compatibility index
pages (`expressions/index.md` and the four per-version
`spark-{3.4,3.5,4.0,4.1}/index.md` pages), which carried the same inaccuracy.
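
As a rough sketch of the first two changes, the prose selection might look like the following. Only the names `ExprNotes` and `CodegenDispatchFallback` come from this PR; the fields of `ExprNotes`, the `ExpressionSerde` stand-in trait, and the `DocProse` object are assumptions made for illustration, and the dispatch wording is a paraphrase rather than the generated text.

```scala
// Hypothetical stand-ins for Comet's serde types; the real traits have members.
trait ExpressionSerde
trait CodegenDispatchFallback // marker name taken from the PR description

// Assumed shape; the actual fields of ExprNotes are not shown in the PR text.
final case class ExprNotes(
    name: String,
    differences: Seq[String],
    usesCodegenDispatch: Boolean)

object DocProse {

  def configKey(name: String): String =
    s"spark.comet.expression.${name}.allowIncompatible"

  // Record whether the serde is enrolled in codegen-dispatch fallback.
  def notesFor(name: String, serde: ExpressionSerde, differences: Seq[String]): ExprNotes =
    ExprNotes(name, differences, serde.isInstanceOf[CodegenDispatchFallback])

  // Choose the introductory sentence for an Incompatible expression.
  def intro(notes: ExprNotes): String = {
    val key = configKey(notes.name)
    if (notes.usesCodegenDispatch) {
      // Paraphrased wording for dispatch-enrolled expressions.
      s"Comet accelerates `${notes.name}` by default via JVM codegen dispatch, " +
        s"which is Spark-compatible. Set `${key}=true` to opt into the faster " +
        "native implementation, which has the following differences."
    } else {
      // Original wording, still accurate for non-dispatch expressions.
      s"The following incompatibilities cause `${notes.name}` to fall back to " +
        s"Spark by default. Set `${key}=true` to enable Comet acceleration " +
        "despite these differences."
    }
  }

  // Intro sentence followed by one bullet per documented difference.
  def render(notes: ExprNotes): String =
    (intro(notes) +: notes.differences.map(d => s"- ${d}")).mkString("\n")
}
```

Keeping the dispatch flag on the notes record, rather than re-inspecting the serde while rendering, lets a separate aggregate builder construct its notes with the flag fixed to `false`.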
## How are these changes tested?
This is a documentation generation change. I ran `GenerateDocs` against a
temporary copy of the templates for the Spark 3.5 profile and confirmed both
prose variants render correctly: codegen-dispatch expressions (for example
`Second`) get the new wording, and non-dispatch expressions (for example
`SortArray`, `CollectSet`) retain the original wording. The Spark module
compiles, and the changed files pass the Spotless and Prettier checks.