dtenedor opened a new pull request, #54579:
URL: https://github.com/apache/spark/pull/54579

   ### What changes were proposed in this pull request?
   
   Add a `ConfigEntryType` sealed-trait enum to `ConfigEntry[T]`, threaded from 
`ConfigBuilder` through `TypedConfigBuilder` and all `create*` methods, so that 
config entries are tagged with their declared type at construction time without 
runtime type probing or exception handling.
   
   Specifically:
   
   - **New `ConfigEntryType` sealed trait** (`ConfigEntry.scala`) with case 
objects `BooleanEntry`, `IntEntry`, `LongEntry`, `DoubleEntry`, `StringEntry`, 
`EnumEntry`, `TimeEntry`, `BytesEntry`, `RegexEntry`, and `OtherEntry`.
   - **`ConfigEntry[T]`** gains a required `val configEntryType: 
ConfigEntryType` constructor parameter, propagated through all five subclasses 
(`ConfigEntryWithDefault`, `ConfigEntryWithDefaultFunction`, 
`ConfigEntryWithDefaultString`, `OptionalConfigEntry`, `FallbackConfigEntry`).
   - **`TypedConfigBuilder[T]`** gains a required `val configEntryType: 
ConfigEntryType` constructor parameter, propagated through `transform`, 
`toSequence`, and all `create*` methods (`createWithDefault`, 
`createWithDefaultFunction`, `createWithDefaultString`, `createOptional`).
   - **Every `ConfigBuilder.*Conf` factory method** (`intConf`, `longConf`, 
`doubleConf`, `booleanConf`, `stringConf`, `enumConf`, `timeConf`, `bytesConf`, 
`regexConf`) passes the appropriate enum variant. `fallbackConf` inherits the 
variant from the fallback entry.
   - **`configEntryType` is a required (non-default) constructor parameter** on 
both `ConfigEntry` and `TypedConfigBuilder`, so the compiler forces every new 
construction site to specify the type explicitly, which prevents silent omission.
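
   The threading described in the list above can be sketched roughly as 
follows. This is a simplified model, not the Spark source: `SimpleEntry`, 
`SimpleTypedBuilder`, and `SimpleBuilder` are hypothetical stand-ins for 
`ConfigEntry`, `TypedConfigBuilder`, and `ConfigBuilder`, and only three of the 
factory methods are shown.

```scala
// Simplified, hypothetical model of the design; not the actual Spark classes.
sealed trait ConfigEntryType
case object BooleanEntry extends ConfigEntryType
case object IntEntry extends ConfigEntryType
case object LongEntry extends ConfigEntryType
case object DoubleEntry extends ConfigEntryType
case object StringEntry extends ConfigEntryType
case object EnumEntry extends ConfigEntryType
case object TimeEntry extends ConfigEntryType
case object BytesEntry extends ConfigEntryType
case object RegexEntry extends ConfigEntryType
case object OtherEntry extends ConfigEntryType

// Stand-in for ConfigEntry[T]: the tag is a required constructor parameter.
final class SimpleEntry[T](
    val key: String,
    val default: T,
    val configEntryType: ConfigEntryType)

// Stand-in for TypedConfigBuilder[T]: carries the tag and passes it to entries.
final class SimpleTypedBuilder[T](
    val key: String,
    val converter: String => T,
    val configEntryType: ConfigEntryType) {

  // Transforming the value does not change the declared type tag.
  def transform(f: T => T): SimpleTypedBuilder[T] =
    new SimpleTypedBuilder[T](key, s => f(converter(s)), configEntryType)

  def createWithDefault(default: T): SimpleEntry[T] =
    new SimpleEntry[T](key, default, configEntryType)
}

// Stand-in for ConfigBuilder: each factory method picks the matching variant.
final class SimpleBuilder(key: String) {
  def booleanConf: SimpleTypedBuilder[Boolean] =
    new SimpleTypedBuilder[Boolean](key, s => s.toBoolean, BooleanEntry)
  def intConf: SimpleTypedBuilder[Int] =
    new SimpleTypedBuilder[Int](key, s => s.toInt, IntEntry)
  def stringConf: SimpleTypedBuilder[String] =
    new SimpleTypedBuilder[String](key, s => s, StringEntry)
}
```

   Because `configEntryType` has no default value in this sketch, omitting it 
at any `new SimpleEntry(...)` or `new SimpleTypedBuilder(...)` site is a 
compile error.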
   
   Using an enum instead of a single `isBooleanEntry: Boolean` flag makes the 
design extensible: callers can match on the specific config type (e.g. to 
optimize access paths differently for boolean vs. numeric entries) without 
adding new boolean fields for each type.
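
   As an illustration of that extensibility, a caller might dispatch on the 
variant along these lines. The `accessPath` function and the strategy labels it 
returns are hypothetical and not part of this PR; the trait is trimmed to a few 
variants to keep the example short.

```scala
// Trimmed, hypothetical copy of the enum; the PR defines more variants.
sealed trait ConfigEntryType
case object BooleanEntry extends ConfigEntryType
case object IntEntry extends ConfigEntryType
case object LongEntry extends ConfigEntryType
case object DoubleEntry extends ConfigEntryType
case object StringEntry extends ConfigEntryType
case object OtherEntry extends ConfigEntryType

object AccessPathSketch {
  // Picks an (illustrative) access strategy from the declared type alone.
  // The trait is sealed, so the compiler can check the match for exhaustiveness.
  def accessPath(t: ConfigEntryType): String = t match {
    case BooleanEntry => "boolean-fast-path"
    case IntEntry | LongEntry | DoubleEntry => "numeric-fast-path"
    case StringEntry | OtherEntry => "generic"
  }
}
```

   With a single `isBooleanEntry` flag, the numeric case in this sketch would 
need a second field; with the enum it is one more `case`.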
   
   ### Why are the changes needed?
   
   Pattern matching on config values at runtime (e.g. `case b: Boolean => ...`) 
or using `isInstanceOf[Boolean]` type tests causes JVM `class_check` 
deoptimizations at megamorphic call sites. By tagging each config entry with 
its declared type at construction time, hot-path config access code can use a 
simple field check instead, avoiding these deoptimizations entirely.
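
   The two styles of check can be contrasted in a minimal sketch. `Entry` and 
`AccessSketch` are hypothetical names, and the code only shows the shape of 
each check; it does not measure or reproduce the JIT behavior described above.

```scala
// Trimmed, hypothetical model; not the actual Spark classes.
sealed trait ConfigEntryType
case object BooleanEntry extends ConfigEntryType
case object StringEntry extends ConfigEntryType

final case class Entry(key: String, value: Any, configEntryType: ConfigEntryType)

object AccessSketch {
  // Type-test style: inspects the runtime class of the stored value.
  def isBooleanByTypeTest(e: Entry): Boolean = e.value.isInstanceOf[Boolean]

  // Tag style: reference comparison against a field fixed at construction time.
  def isBooleanByTag(e: Entry): Boolean = e.configEntryType eq BooleanEntry
}
```

   Both functions give the same answer for well-formed entries; the second 
never looks at the class of `value`.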
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. `ConfigEntryType` and `configEntryType` are `private[spark]`; no public 
API is affected.
   
   ### How was this patch tested?
   
   New unit test suite `RecordConfigAccessSuite` 
(`core/src/test/scala/org/apache/spark/RecordConfigAccessSuite.scala`) with 19 
tests covering:
   
   - Correct `configEntryType` assignment for built-in entries of every type 
(boolean, int, long, double, string, bytes, time).
   - `fallbackConf` inheritance of `configEntryType` from the fallback entry.
   - Preservation of `configEntryType` through all `create*` variants 
(`createWithDefault`, `createWithDefaultString`, `createWithDefaultFunction`, 
`createOptional`).
   - Preservation through `transform`, `checkValue`, and `toSequence`.
   - One test per `ConfigBuilder.*Conf` method confirming the correct enum 
variant.
   - Negative test verifying non-boolean entries do not carry `BooleanEntry`.
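
   A check in the spirit of the `fallbackConf` and negative tests might look 
like the sketch below. It uses hypothetical stand-in classes (`Entry`, 
`FallbackEntry`) and plain `assert` instead of Spark's classes and test 
framework, and the config keys are made up for illustration.

```scala
// Trimmed, hypothetical model; not the actual Spark classes or test suite.
sealed trait ConfigEntryType
case object BooleanEntry extends ConfigEntryType
case object IntEntry extends ConfigEntryType

class Entry[T](val key: String, val configEntryType: ConfigEntryType)

// The fallback entry takes its tag from the entry it falls back to.
final class FallbackEntry[T](key: String, val fallback: Entry[T])
  extends Entry[T](key, fallback.configEntryType)

object FallbackCheck {
  def main(args: Array[String]): Unit = {
    val base = new Entry[Boolean]("spark.example.base", BooleanEntry)
    val derived = new FallbackEntry[Boolean]("spark.example.derived", base)
    // Inheritance: the derived entry carries the same variant as its fallback.
    assert(derived.configEntryType == BooleanEntry)
    // Negative check: an int entry must not be tagged as boolean.
    val count = new Entry[Int]("spark.example.count", IntEntry)
    assert(count.configEntryType != BooleanEntry)
  }
}
```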
   
   Run with:
   
       build/sbt "core/testOnly org.apache.spark.RecordConfigAccessSuite"
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes, `claude-4.6-opus-high`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

