rluvaton opened a new pull request, #4657: URL: https://github.com/apache/datafusion-comet/pull/4657
> [!NOTE] > This does not remove all the shims in favor of the new infra as it will be very large PR > instead it moves some stuff to show how it is being used and the process of doing so A small macro-annotation library that lets a **single source file** carry code for multiple Spark versions. You annotate a member (`def` / `class` / `object`) with a version range, and at * *compile time** the macro keeps it, drops it, or tweaks it based on the version the build is targeting. ## The problem it solves Version-specific code used to live in parallel source folders (`src/main/spark-3.4/…`, `spark-3.5/…`, `spark-4.0/…`) - so a whole file got copied per version just because one method differed, or because a type like `WindowGroupLimitExec` only exists in Spark 3.5+. `@enableIfVer` lets those differences live **inline, in one file**: no duplication, and the version logic sits right next to the code it gates. ## How you use it The targeted versions are supplied by the build per "dimension" (`spark`). You write a [semver](https://github.com/semver4j/semver4j#-ranges) range ```scala import org.apache.comet.enableIfVer // keep this method only on Spark 3.5+ @enableIfVer(spark = ">=3.5.0") def onlyOn35Plus(): Unit = ... // two same-named methods, one per version range - exactly one survives per build @enableIfVer(spark = "<3.5.0") def isWGL(p: SparkPlan): Boolean = false @enableIfVer(spark = ">=3.5.0") def isWGL(p: SparkPlan): Boolean = p.isInstanceOf[WindowGroupLimitExec] ``` ### The four behaviors | Annotation | On a non-matching version | |------------------------|---------------------------------------------------------------------------------------------------------------------------------------------| | `@enableIfVer` | **drops** the member entirely | | `@implementIfVer` | **keeps the class/object** but empties its body and remove the inheritance (so a type that references 3.5-only symbols still exists on 3.4) | | `@enableOverrideIfVer` | **strips `override`** (for a member that only overrides on some versions) | ### Ranges Matched by semver4j: full `major.minor.patch` versions with `>` `>=` `<` `<=` `=` `!=`, space = AND, `||` = OR, `A - B` hyphen ranges, x-ranges / partials and more (FYI, it supports `package.json` format) ## What it gives us - **No more per-version source-folder duplication** for small differences. - **One file, version logic inline** - easier to read and maintain. - **Compile-time safety**: non-matching code (and its references to types that don't exist in that version) is removed before type-checking, so each build only ever sees valid code. - **Extensible & general**: any number of dimensions, combined with AND/OR, plus an explicit "not enabled" state - all from one unified `@enableIfVer` namespace. ## Caveats - `@enableIfVer` annotates **definitions only** - not statements (e.g. a bare `test(...)` call). To gate a statement, wrap it in a `def` and annotate that. - It cannot be applied to a **top-level class that has a companion object** (a macro-paradise limitation). For whole top-level classes that are entirely version-specific, the version-specific source folder is still the right tool. ## Apache Auron (Blaze) most of the annotation code taken from Blaze/auron in [sparkver.scala] (https://github.com/apache/auron/blob/a0e2b5f787073c50d24bbce9e7059f2e22e1dbf1/spark-version-annotation-macros/src/main/scala/org/apache/auron/sparkver.scala) but updated the syntax of matching versions and replace `@sparkverEnableMembers` with `@implementIfVer` that remove the inheritance as well - useful for abstract class Blaze uses `/` to split between supported versions so for example like [this](https://github.com/apache/auron/blob/a0e2b5f787073c50d24bbce9e7059f2e22e1dbf1/spark-extension/src/main/scala/org/apache/spark/sql/hive/execution/auron/plan/NativeHiveTableScanBase.scala#L146-L148): ```scala @sparkver("3.0 / 3.1 / 3.2 / 3.3 / 3.4 / 3.5 / 4.0 / 4.1") override def simpleString(maxFields: Int): String = s"$nodeName (${basedHiveScan.simpleString(maxFields)})" ``` This however use semver match syntax that is more widely known so the same code for us will be: ```scala @enableIfVer(spark = ">=3.0.0 <4.2.0") override def simpleString(maxFields: Int): String = s"$nodeName (${basedHiveScan.simpleString(maxFields)})" ``` and if you want to automatically have this function for every spark version above 3.0 and not only until 4.1, you can do: ```scala @enableIfVer(spark = ">=3.0.0") override def simpleString(maxFields: Int): String = s"$nodeName (${basedHiveScan.simpleString(maxFields)})" ``` the syntax is matching the one used in [`package.json`](https://docs.npmjs.com/cli/v11/configuring-npm/package-json#dependencies) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
