[PR] refactor(shim): add new shim infra with `enableIfVer` macro to avoid shims duplication [datafusion-comet]

via GitHub Mon, 15 Jun 2026 07:02:10 -0700


rluvaton opened a new pull request, #4657:
URL: https://github.com/apache/datafusion-comet/pull/4657


   > [!NOTE]
   > This does not remove all the shims in favor of the new infra as it will be 
very large PR
   > instead it moves some stuff to show how it is being used and the process 
of doing so
   
   A small macro-annotation library that lets a **single source file** carry 
code for multiple
   Spark versions. You annotate a member (`def` / `class` / `object`) with a 
version range, and at *
   *compile time** the macro keeps it, drops it, or tweaks it based on the 
version the build is
   targeting.
   
   ## The problem it solves
   
   Version-specific code used to live in parallel source folders 
(`src/main/spark-3.4/…`,
   `spark-3.5/…`, `spark-4.0/…`) - so a whole file got copied per version just 
because one method
   differed, or because a type like `WindowGroupLimitExec` only exists in Spark 
3.5+. `@enableIfVer`
   lets
   those differences live **inline, in one file**: no duplication, and the 
version logic sits right
   next to the code it gates.
   
   ## How you use it
   
   The targeted versions are supplied by the build per "dimension" (`spark`). 
You write
   a [semver](https://github.com/semver4j/semver4j#-ranges) range
   
   ```scala
   import org.apache.comet.enableIfVer
   
   // keep this method only on Spark 3.5+
   @enableIfVer(spark = ">=3.5.0")
   def onlyOn35Plus(): Unit =
   ...
   
   // two same-named methods, one per version range - exactly one survives per 
build
   @enableIfVer(spark = "<3.5.0") def isWGL(p: SparkPlan): Boolean = false
   @enableIfVer(spark = ">=3.5.0") def isWGL(p: SparkPlan): Boolean = 
p.isInstanceOf[WindowGroupLimitExec]
   
   ```
   
   ### The four behaviors
   
   | Annotation             | On a non-matching version                         
                                                                                
          |
   
|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
   | `@enableIfVer`         | **drops** the member entirely                     
                                                                                
          |
   | `@implementIfVer`      | **keeps the class/object** but empties its body 
and remove the inheritance (so a type that references 3.5-only symbols still 
exists on 3.4) |
   | `@enableOverrideIfVer` | **strips `override`** (for a member that only 
overrides on some versions)                                                     
              |
   
   ### Ranges
   
   Matched by semver4j: full `major.minor.patch` versions with `>` `>=` `<` 
`<=` `=` `!=`,
   space = AND, `||` = OR, `A - B` hyphen ranges, x-ranges / partials and more 
(FYI, it supports `package.json` format)
   
   ## What it gives us
   
   - **No more per-version source-folder duplication** for small differences.
   - **One file, version logic inline** - easier to read and maintain.
   - **Compile-time safety**: non-matching code (and its references to types 
that don't exist in that
     version) is removed before type-checking, so each build only ever sees 
valid code.
   - **Extensible & general**: any number of dimensions, combined with AND/OR, 
plus an explicit
     "not enabled" state - all from one unified `@enableIfVer` namespace.
   
   ## Caveats
   
   - `@enableIfVer` annotates **definitions only** - not statements (e.g. a 
bare `test(...)` call). To
     gate a statement, wrap it in a `def` and annotate that.
   - It cannot be applied to a **top-level class that has a companion object** 
(a macro-paradise
     limitation). For whole top-level classes that are entirely 
version-specific, the version-specific
     source folder is still the right tool.
   
   ## Apache Auron (Blaze)
   
   most of the annotation code taken from Blaze/auron in [sparkver.scala]
   
(https://github.com/apache/auron/blob/a0e2b5f787073c50d24bbce9e7059f2e22e1dbf1/spark-version-annotation-macros/src/main/scala/org/apache/auron/sparkver.scala)
   but updated the syntax of matching versions and replace 
`@sparkverEnableMembers` with
   `@implementIfVer` that remove the inheritance as well - useful for abstract 
class
   
   Blaze uses `/` to split between supported versions
   so for example
   like 
[this](https://github.com/apache/auron/blob/a0e2b5f787073c50d24bbce9e7059f2e22e1dbf1/spark-extension/src/main/scala/org/apache/spark/sql/hive/execution/auron/plan/NativeHiveTableScanBase.scala#L146-L148):
   
   ```scala
   @sparkver("3.0 / 3.1 / 3.2 / 3.3 / 3.4 / 3.5 / 4.0 / 4.1")
   override def simpleString(maxFields: Int): String =
     s"$nodeName (${basedHiveScan.simpleString(maxFields)})"
   ```
   
   This however use semver match syntax that is more widely known
   so the same code for us will be:
   
   ```scala
   @enableIfVer(spark = ">=3.0.0 <4.2.0")
   override def simpleString(maxFields: Int): String =
     s"$nodeName (${basedHiveScan.simpleString(maxFields)})"
   ```
   
   and if you want to automatically have this function for every spark version 
above 3.0 and not only
   until 4.1, you can do:
   
   ```scala
   @enableIfVer(spark = ">=3.0.0")
   override def simpleString(maxFields: Int): String =
     s"$nodeName (${basedHiveScan.simpleString(maxFields)})"
   ```
   
   the syntax is matching the one used in 
[`package.json`](https://docs.npmjs.com/cli/v11/configuring-npm/package-json#dependencies)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] refactor(shim): add new shim infra with `enableIfVer` macro to avoid shims duplication [datafusion-comet]

Reply via email to