Re: [DISCUSS] Evaluate usage of Immutables library

Ryan Blue Wed, 08 Dec 2021 15:08:31 -0800

This could be valuable for some things, but I'm skeptical that we would
want to rewrite significant pieces of existing code this way.

For example, you mentioned being able to "get rid of hand-written JSON
serializers/deserializers". Automatic serialization is fiddly. You end up
needing to scatter customizations around the code to annotate fields and
add conversions. Then those annotations become the source of truth for
serialization. If someone doesn't know which annotation to add to a field, or
just doesn't consider serialization, the class can end up serializing either
more than intended or something incorrect. This doesn't fit well with Iceberg,
where the serialization to JSON is documented in the spec and is strict.
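
A minimal sketch of that failure mode, using only the JDK. Everything here is
hypothetical and for illustration: `ExampleRef` is not an Iceberg class, the
`snapshot-id` key is just an example field name, and the reflective writer is a
stand-in for automatic serialization in general, not for how Immutables or any
JSON library actually works.

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

// Hypothetical value class; not an Iceberg type.
final class ExampleRef {
  private final String name;
  private final long snapshotId;
  // Internal detail that should never appear in the serialized form.
  private final String cachedDescription;

  ExampleRef(String name, long snapshotId) {
    this.name = name;
    this.snapshotId = snapshotId;
    this.cachedDescription = name + "@" + snapshotId;
  }

  String name() {
    return name;
  }

  long snapshotId() {
    return snapshotId;
  }
}

final class JsonSketch {
  private JsonSketch() {}

  // Stand-in for automatic serialization: writes every declared field under
  // its Java name. No string escaping; this is only a sketch.
  static String reflective(Object obj) {
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Field field : obj.getClass().getDeclaredFields()) {
      field.setAccessible(true);
      Object value;
      try {
        value = field.get(obj);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
      String rendered = value instanceof Number ? value.toString() : "\"" + value + "\"";
      json.add("\"" + field.getName() + "\":" + rendered);
    }
    return json.toString();
  }

  // Hand-written serializer: the field set and the key names are fixed here,
  // so adding a field to the class cannot change the output by accident.
  static String handWritten(ExampleRef ref) {
    return "{\"name\":\"" + ref.name() + "\",\"snapshot-id\":" + ref.snapshotId() + "}";
  }
}
```

With this sketch, `JsonSketch.handWritten(new ExampleRef("main", 42L))` produces
`{"name":"main","snapshot-id":42}`, while the reflective version also writes the
internal `cachedDescription` field and uses the Java field name `snapshotId` as
the key.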

In addition to making it hard to maintain a JSON format that aligns with
the spec, this change isn't really needed. The existing parsers are already
written and don't change much. With them, we don't have to worry about
updating dependencies and the implications for runtime Jars. Adopting a new
library for this would introduce new (and unnecessary) risk to integrations.

To make that risk more concrete, consider object serialization in Spark.
Today, we support Java, JSON, and Kryo serialization. There are places
where we've needed to adapt the implementations and stop storing immutable
collections for Kryo, or else Kryo can't reconstruct objects. So instead of
storing an ImmutableMap, we store a HashMap and wrap it to be unmodifiable.
If we were to rewrite classes that get serialized, we'd need to ensure
we're not causing regressions where classes become incompatible with
serialization that currently works.
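
The HashMap pattern described above can be sketched roughly as follows. This is
an illustration under assumptions, not the actual Iceberg code: the class name
`KryoFriendlyProperties` and its `asMap()` accessor are hypothetical, and
wrapping the map in the accessor is just one possible way to do it.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not an actual Iceberg class.
final class KryoFriendlyProperties implements Serializable {
  private static final long serialVersionUID = 1L;

  // Stored as a plain HashMap so that serializers which rebuild objects from
  // their fields have an ordinary, mutable collection to reconstruct.
  private final HashMap<String, String> properties;

  KryoFriendlyProperties(Map<String, String> properties) {
    // Defensive copy: later changes to the caller's map are not visible here.
    this.properties = new HashMap<>(properties);
  }

  // Callers only ever see a read-only view of the stored map.
  Map<String, String> asMap() {
    return Collections.unmodifiableMap(properties);
  }
}
```

Callers still cannot modify the map, but the stored field is a standard
collection instead of an immutable collection type.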

I don't think I would opt to spend time converting existing classes that we
already know work, when those already have working JSON parsers, Java and
Kryo serialization, and builders. Introducing big changes to achieve the
same functionality is a pretty significant risk and should have a bigger
upside than reducing lines of code, in my opinion.

There may be uses other than converting existing code, and there could be
low-risk existing classes that we can convert. Using this to support JSON
serialization for `FileScanTask` is a good example because we need JSON
serialization for Trino, but it doesn't need to conform to a spec.

Ryan

On Wed, Dec 8, 2021 at 4:37 AM Eduard Tudenhoefner <edu...@dremio.com>
wrote:

> Hello everyone,
>
> I was curious what people think about introducing/using the Immutables
> <https://immutables.github.io/> library in Iceberg. The Immutables
> library (Apache License 2.0) works via annotation processing, where an
> abstract value class is annotated with *@Value.Immutable* to generate an
> *immutable implementation class*.
>
> The main motivation for the Immutables library is outlined below:
>
>    - Use true immutable objects that are type-safe, thread-safe, null-safe
>    - Get builder classes for free
>    - Reduce the amount of boilerplate code
>    - Get JSON serialization/deserialization for free (this will also be
>    helpful for the REST-based Catalog
>    <https://github.com/apache/iceberg/pull/3424>) since Immutable objects
>    are serialization ready (including JSON and its binary forms)
>    - Supports lazy, derived and optional attributes
>    - Immutable objects are constructed once, in a consistent state, and
>    can be safely shared
>       - Will fail if mandatory attributes are missing
>       - Cannot be sneakily modified when passed to other code
>    - Immutable objects are naturally thread-safe and can therefore be
>    safely shared among threads
>       - No excessive copying
>       - No excessive synchronization
>    - Object definitions are pleasant to write and read
>       - No boilerplate setter and getters
>       - No IDE-generated hashCode, equals and toString methods required.
>       However, custom implementation for these methods can still be written
>
>
> I did a small PoC in #3688 <https://github.com/apache/iceberg/pull/3688>
> (TableIdentifier/Namespace) and #3580
> <https://github.com/apache/iceberg/pull/3580> (TableMetadata)
> but notice that both PRs still don't show the full potential of Immutables
> as they only convert objects to being Immutable + using the generated
> builders + doing some fairly easy validation checks.
> The real benefit gets more obvious when we can get rid of hand-written
> JSON serializers/deserializers.
>
>
> Eduard
>

-- 
Ryan Blue
Tabular
