dtenedor opened a new pull request, #54579:
URL: https://github.com/apache/spark/pull/54579
### What changes were proposed in this pull request?
Add a `ConfigEntryType` sealed-trait enum to `ConfigEntry[T]`, threaded from
`ConfigBuilder` through `TypedConfigBuilder` and all `create*` methods, so that
config entries are tagged with their declared type at construction time without
runtime type probing or exception handling.
Specifically:
- **New `ConfigEntryType` sealed trait** (`ConfigEntry.scala`) with case
objects `BooleanEntry`, `IntEntry`, `LongEntry`, `DoubleEntry`, `StringEntry`,
`EnumEntry`, `TimeEntry`, `BytesEntry`, `RegexEntry`, and `OtherEntry`.
- **`ConfigEntry[T]`** gains a required `val configEntryType:
ConfigEntryType` constructor parameter, propagated through all five subclasses
(`ConfigEntryWithDefault`, `ConfigEntryWithDefaultFunction`,
`ConfigEntryWithDefaultString`, `OptionalConfigEntry`, `FallbackConfigEntry`).
- **`TypedConfigBuilder[T]`** gains a required `val configEntryType:
ConfigEntryType` constructor parameter, propagated through `transform`,
`toSequence`, and all `create*` methods (`createWithDefault`,
`createWithDefaultFunction`, `createWithDefaultString`, `createOptional`).
- **Every `ConfigBuilder.*Conf` factory method** (`intConf`, `longConf`,
`doubleConf`, `booleanConf`, `stringConf`, `enumConf`, `timeConf`, `bytesConf`,
`regexConf`) passes the appropriate enum variant. `fallbackConf` inherits the
variant from the fallback entry.
- **`configEntryType` is a required (non-default) constructor parameter** on
both `ConfigEntry` and `TypedConfigBuilder`, so the compiler forces every new
construction site to specify the type explicitly, preventing silent omission.

Using an enum instead of a single `isBooleanEntry: Boolean` flag makes the
design extensible: callers can match on the specific config type (e.g. to
optimize access paths differently for boolean vs. numeric entries) without
adding a new boolean field for each type.
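The threading described above can be illustrated with a small, self-contained Scala sketch. It is a hypothetical, simplified model: the class and method names mirror the PR, but the signatures are reduced (for example, a single `ConfigEntry` class instead of five subclasses, and a `fallbackConf` that copies the fallback's default rather than reading its value lazily). It is not Spark's actual `private[spark]` code. It shows the tag being chosen by each `*Conf` factory, carried through `transform`, `checkValue`, and `toSequence`, attached by the `create*` methods, and inherited by `fallbackConf`.

```scala
// Hypothetical, simplified model of the tagging described in this PR.
// Names mirror the PR; signatures are reduced and are NOT Spark's internals.
object ConfigEntryTypeSketch {

  sealed trait ConfigEntryType
  case object BooleanEntry extends ConfigEntryType
  case object IntEntry extends ConfigEntryType
  case object LongEntry extends ConfigEntryType
  case object DoubleEntry extends ConfigEntryType
  case object StringEntry extends ConfigEntryType
  case object EnumEntry extends ConfigEntryType
  case object TimeEntry extends ConfigEntryType
  case object BytesEntry extends ConfigEntryType
  case object RegexEntry extends ConfigEntryType
  case object OtherEntry extends ConfigEntryType

  // No default for configEntryType: every construction site must pass it.
  class ConfigEntry[T](
      val key: String,
      val defaultValue: Option[T],
      val valueConverter: String => T,
      val configEntryType: ConfigEntryType)

  class TypedConfigBuilder[T](
      val key: String,
      val converter: String => T,
      val configEntryType: ConfigEntryType) {

    // Post-processing keeps T, so the tag is carried over unchanged.
    def transform(fn: T => T): TypedConfigBuilder[T] =
      new TypedConfigBuilder[T](key, s => fn(converter(s)), configEntryType)

    // Validation is a transform that rejects invalid values.
    def checkValue(valid: T => Boolean, msg: String): TypedConfigBuilder[T] =
      transform { v => require(valid(v), s"'$v' in $key is invalid. $msg"); v }

    // Comma-separated list of T; the element type's tag is kept.
    def toSequence: TypedConfigBuilder[Seq[T]] =
      new TypedConfigBuilder[Seq[T]](
        key,
        s => s.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map(converter),
        configEntryType)

    def createWithDefault(default: T): ConfigEntry[T] =
      new ConfigEntry[T](key, Some(default), converter, configEntryType)

    def createOptional: ConfigEntry[Option[T]] =
      new ConfigEntry[Option[T]](key, None, s => Some(converter(s)), configEntryType)
  }

  class ConfigBuilder(key: String) {
    // Each *Conf factory picks the variant matching its value type.
    def booleanConf: TypedConfigBuilder[Boolean] =
      new TypedConfigBuilder[Boolean](key, s => s.trim.toBoolean, BooleanEntry)
    def intConf: TypedConfigBuilder[Int] =
      new TypedConfigBuilder[Int](key, s => s.trim.toInt, IntEntry)
    def stringConf: TypedConfigBuilder[String] =
      new TypedConfigBuilder[String](key, s => s, StringEntry)

    // A fallback entry inherits the tag of the entry it falls back to.
    def fallbackConf[T](fallback: ConfigEntry[T]): ConfigEntry[T] =
      new ConfigEntry[T](key, fallback.defaultValue, fallback.valueConverter,
        fallback.configEntryType)
  }
}
```

For example, in this sketch `new ConfigBuilder("example.n").intConf.checkValue(_ > 0, "must be positive").createWithDefault(1).configEntryType` evaluates to `IntEntry`, and an entry created via `fallbackConf` on a `booleanConf` entry reports `BooleanEntry`.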
### Why are the changes needed?
Pattern matching on config values at runtime (e.g. `case b: Boolean => ...`)
or using `isInstanceOf[Boolean]` type tests causes JVM `class_check`
deoptimizations at megamorphic call sites. By tagging each config entry with
its declared type at construction time, hot-path config access code can use a
simple field check instead, avoiding these deoptimizations entirely.
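To make the contrast concrete, the following hypothetical sketch places a type test on the value (the pattern the PR identifies as a source of `class_check` deoptimizations) next to a check on a tag fixed at construction time. `Entry`, `isBooleanByValue`, `isBooleanByTag`, and `accessPath` are illustrative names, not Spark APIs, and the sketch makes no claim about measured JIT behavior; it only shows the shape of the two code paths.

```scala
// Hypothetical sketch: the "is this boolean?" decision comes from a tag set
// when the entry is built, so the hot path never tests a value's runtime class.
object TagCheckSketch {
  sealed trait ConfigEntryType
  case object BooleanEntry extends ConfigEntryType
  case object IntEntry extends ConfigEntryType
  case object OtherEntry extends ConfigEntryType

  final case class Entry(key: String, configEntryType: ConfigEntryType)

  // Before: a runtime type test on the value (compiles to an instanceof check).
  def isBooleanByValue(value: Any): Boolean = value match {
    case _: Boolean => true
    case _          => false
  }

  // After: a field read plus a reference comparison against a singleton.
  def isBooleanByTag(entry: Entry): Boolean =
    entry.configEntryType eq BooleanEntry

  // Matching on the sealed trait lets the compiler warn when a new variant
  // is added but this dispatch is not updated.
  def accessPath(entry: Entry): String = entry.configEntryType match {
    case BooleanEntry => "boolean"
    case IntEntry     => "numeric"
    case OtherEntry   => "generic"
  }
}
```

The exhaustive match in `accessPath` is the kind of per-type dispatch that an enum supports and a single boolean flag would not.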
### Does this PR introduce _any_ user-facing change?
No. `ConfigEntryType` and `configEntryType` are `private[spark]`; no public
API is affected.
### How was this patch tested?
New unit test suite `RecordConfigAccessSuite`
(`core/src/test/scala/org/apache/spark/RecordConfigAccessSuite.scala`) with 19
tests covering:
- Correct `configEntryType` assignment for built-in entries of every type
(boolean, int, long, double, string, bytes, time).
- `fallbackConf` inheritance of `configEntryType` from the fallback entry.
- Preservation of `configEntryType` through all `create*` variants
(`createWithDefault`, `createWithDefaultString`, `createWithDefaultFunction`,
`createOptional`).
- Preservation through `transform`, `checkValue`, and `toSequence`.
- One test per `ConfigBuilder.*Conf` method confirming the correct enum
variant.
- Negative test verifying that non-boolean entries do not carry `BooleanEntry`.

Run with:

    build/sbt "core/testOnly org.apache.spark.RecordConfigAccessSuite"
### Was this patch authored or co-authored using generative AI tooling?
Yes, `claude-4.6-opus-high`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]