[PR] docs: reflect codegen dispatch fallback in expression compatibility guide [datafusion-comet]

via GitHub Sat, 13 Jun 2026 06:46:35 -0700


andygrove opened a new pull request, #4649:
URL: https://github.com/apache/datafusion-comet/pull/4649


   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   The expression compatibility guide is auto-generated by 
`GenerateDocs.scala`. For every `Incompatible` expression it printed a sentence 
like:
   
   > The following incompatibilities cause `Second` to fall back to Spark by 
default. Set `spark.comet.expression.Second.allowIncompatible=true` to enable 
Comet acceleration despite these differences.
   
   This is no longer accurate for expressions that opt into the `CodegenDispatchFallback` trait. Those expressions do not fall back to Spark by default: they stay in Comet's native pipeline via the JVM codegen dispatcher (which runs Spark's own generated code) and match Spark exactly. For these expressions, `allowIncompatible=true` does not enable acceleration that was otherwise disabled. Instead, it switches to the faster native implementation, which carries the listed differences.
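
   The Scala sketch below roughly models which execution path an `Incompatible` expression takes under the behavior described above. The object, trait, and case-object names are hypothetical and not part of Comet's API. Only the two inputs come from this description: whether the expression uses `CodegenDispatchFallback`, and the value of `allowIncompatible`.

   ```scala
   // Hypothetical model of the behavior described in this PR; these names
   // are illustrative and do not exist in Comet.
   object ExecutionPathSketch {
     sealed trait Path
     // The expression is evaluated by Spark instead of Comet.
     case object SparkFallback extends Path
     // Stays in Comet and runs Spark's own generated code (matches Spark).
     case object JvmCodegenDispatch extends Path
     // Comet's faster native implementation, with the listed differences.
     case object NativeWithDifferences extends Path

     /** Path taken by an expression marked `Incompatible`. */
     def pathFor(usesCodegenDispatch: Boolean, allowIncompatible: Boolean): Path =
       if (allowIncompatible) NativeWithDifferences
       else if (usesCodegenDispatch) JvmCodegenDispatch
       else SparkFallback
   }
   ```

   In this model the flag selects the native path for both kinds of expression. What differs is the default it replaces: Spark fallback for non-dispatch expressions, Spark-compatible codegen dispatch for enrolled ones.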
   
   ## What changes are included in this PR?
   
   - `GenerateDocs.scala` now emits different prose for expressions enrolled in codegen-dispatch fallback. Their entries state that Comet accelerates them by default via JVM codegen dispatch (matching Spark), and that `allowIncompatible=true` opts into the faster native path with the listed differences. Non-dispatch expressions (such as `Cast`, `SortArray`, and `CollectSet`) keep the original "fall back to Spark by default" wording, which remains accurate for them.
   - Refactored the `CategoryNotes` tuple, which had grown over time, into an `ExprNotes` case class. Aggregate serdes use a separate builder because `CometAggregateExpressionSerde` is not a subtype of `CometExpressionSerde` and never participates in codegen dispatch.
   - Updated the general intro sentence on the expression compatibility index 
pages (`expressions/index.md` and the four per-version 
`spark-{3.4,3.5,4.0,4.1}/index.md` pages), which carried the same inaccuracy.
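
   A minimal sketch of how a generator could choose between the two prose variants, assuming a hypothetical `ExprNotes` shape with a `usesCodegenDispatch` flag. The field names, the `aggregateNotes` and `render` helpers, and the wording of the codegen-dispatch variant are illustrative assumptions, not the actual contents of `GenerateDocs.scala`. Only the non-dispatch sentence is quoted from the existing guide.

   ```scala
   // Illustrative sketch only: fields, helpers, and the dispatch-variant
   // wording are assumptions, not the real GenerateDocs.scala code.
   object DocProseSketch {
     /** Per-expression notes collected for the compatibility guide. */
     case class ExprNotes(
         name: String,
         incompatibilities: Seq[String],
         usesCodegenDispatch: Boolean)

     /** Separate builder for aggregates, which never use codegen dispatch. */
     def aggregateNotes(name: String, incompatibilities: Seq[String]): ExprNotes =
       ExprNotes(name, incompatibilities, usesCodegenDispatch = false)

     /** Chooses the sentence printed above the list of incompatibilities. */
     def introSentence(notes: ExprNotes): String = {
       val key = s"spark.comet.expression.${notes.name}.allowIncompatible"
       if (notes.usesCodegenDispatch)
         s"Comet accelerates `${notes.name}` by default via JVM codegen dispatch, " +
           s"which matches Spark. Set `$key=true` to use the faster native " +
           "implementation, which has the following differences."
       else
         s"The following incompatibilities cause `${notes.name}` to fall back to " +
           s"Spark by default. Set `$key=true` to enable Comet acceleration " +
           "despite these differences."
     }

     /** Intro sentence followed by one markdown bullet per difference. */
     def render(notes: ExprNotes): String =
       (introSentence(notes) +: notes.incompatibilities.map(d => s"- $d")).mkString("\n")
   }
   ```

   Keeping the dispatch decision as a single boolean on the notes means the aggregate builder can hard-code `false`, mirroring the point that aggregate serdes never participate in codegen dispatch.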
   
   ## How are these changes tested?
   
   This is a documentation generation change. I ran `GenerateDocs` against a 
temporary copy of the templates for the Spark 3.5 profile and confirmed both 
prose variants render correctly: codegen-dispatch expressions (for example 
`Second`) get the new wording, and non-dispatch expressions (for example 
`SortArray`, `CollectSet`) retain the original wording. The Spark module compiles, and the changed files pass the Spotless and Prettier checks.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
