This could be valuable for some things, but I'm skeptical that we would want to rewrite significant pieces of existing code this way.
For example, you mentioned being able to "get rid of hand-written JSON serializers/deserializers". Automatic serialization is fiddly. You end up needing to scatter customizations around the code to annotate fields and add conversions. Then those annotations become the source of truth for serialization. If someone doesn't know what annotation to add to a field or just doesn't consider serialization you can serialize either more than you intended or something incorrect. This doesn't fit well with Iceberg, where the serialization to JSON is documented in the spec and is strict. In addition to making it hard to maintain a JSON format that aligns with the spec, this change isn't really needed. The existing parsers are already written and don't change much. We don't have to worry about updating dependencies and the implications to runtime Jars. But this would introduce new (and unnecessary) risk to integrations. To make that risk more concrete, consider object serialization in Spark. Today, we support Java, JSON, and Kryo serialization. There are places where we've needed to adapt the implementations and stop storing immutable collections for Kryo, or else Kryo can't reconstruct objects. So instead of storing an ImmutableMap, we store a HashMap and wrap it to be unmodifiable. If we were to rewrite classes that get serialized, we'd need to ensure we're not causing regressions where classes become incompatible with serialization that currently works. I don't think I would opt to spend time converting existing classes that we already know work, when those already have working JSON parsers, Java and Kryo serialization, and builders. Introducing big changes to achieve the same functionality is a pretty significant risk and should have a bigger upside than reducing lines of code, in my opinion. There may be uses other than converting existing code and there could be low risk existing classes that we can convert. Using this to support JSON serialization for `FileScanTask` is a good example because we need JSON serialization for Trino, but it doesn't need to conform to a spec. Ryan On Wed, Dec 8, 2021 at 4:37 AM Eduard Tudenhoefner <edu...@dremio.com> wrote: > Hello everyone, > > I was curious what people think about introducing/using the Immutables > <https://immutables.github.io/> library in Iceberg. The Immutables > library (Apache License 2.0) works via annotation processing, where an > abstract value class is annotated with *@Value.Immutable* to generate an > *immutable > implementation class*. > > The main motivation for the Immutables library is outlined below: > > - Use true immutable objects that are type-safe, thread-safe, null-safe > - Get builder classes for free > - reduce amount of boiler-plate code > - Get JSON serialization/deserialization for free (this will also be > helpful for the REST-based Catalog > <https://github.com/apache/iceberg/pull/3424>) since Immutable objects > are serialization ready (including JSON and its binary forms) > - Supports lazy, derived and optional attributes > - Immutable objects are constructed once, in a consistent state, and > can be safely shared > - Will fail if mandatory attributes are missing > - Cannot be sneakily modified when passed to other code > - Immutable objects are naturally thread-safe and can therefore be > safely shared among threads > - No excessive copying > - No excessive synchronization > - Object definitions are pleasant to write and read > - No boilerplate setter and getters > - No IDE-generated hashCode, equals and toString methods required. > However, custom implementation for these methods can still be written > > > I did a small PoC in #3688 <https://github.com/apache/iceberg/pull/3688> > (TableIdentifier/Namespace) > and #3580 <https://github.com/apache/iceberg/pull/3580> (TableMetadata) > but notice that both PRs still don't show the full potential of Immutables > as they only convert objects to being Immutable + using the generated > builders + doing some fairly easy validation checks. > The real benefit gets more obvious when we can get rid of hand-written > JSON serializers/deserializers. > > > Eduard > -- Ryan Blue Tabular