Max,

Can you use the factory methods in Expressions
<https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/Expressions.java#L304-L318>
rather than changing visibility?

Also, I don’t think that making SchemaUpdate public is a good idea. It has
a public interface that you can use and I don’t agree with exposing
internal implementation classes just to write tests. I would expect test
cases to be specific, much like how SchemaUpdate is itself tested.

Ryan

On Mon, May 19, 2025 at 9:41 AM Maximilian Michels <m...@apache.org> wrote:

> Hi,
>
> The Flink Dynamic Iceberg Sink requires a few iceberg-core class
> visibility changes, which I'd like to get your feedback on.
>
> Here is the diff:
>
> https://github.com/apache/iceberg/pull/13032/commits/1804b0ac4ff97c3c943463725e91a1e24b0f8c44
>
> There are three visibility changes:
>
> Change 1: Make NamedReference constructor public
> Change 2: Make UnboundTransform constructor public
> Change 3: Make SchemaUpdate class and constructor public
>
> Change (1) and (2) both fall under the same feature, which is
> rebinding PartitionSpec transforms.
>
> Background: The Dynamic Iceberg Sink can dynamically evolve Schema and
> PartitionSpec of a table. The user provides schema / spec in the input
> record. However, to ensure the changes are safe, we don't directly
> take the user schema / spec, but we rebind it to one of the existing
> schemas / specs. We do that by name-matching the fields in the
> user-provided schema with the fields in the table schemas. This allows
> us to directly use or evolve an existing schema and to preserve the
> internal field ids. Name-based schema evolution is also what the
> Iceberg Schema API offers.
>
> Similarly, user-provided PartitionSpecs need to be rewritten for their
> transforms to get bound to the correct source id, which is one of the
> field ids of the table schema. The UpdatePartitionSpec API works on
> Terms which is what we generate with this code and the class
> visibility changes:
>
> https://github.com/apache/iceberg/blob/61fedb11399ee344f9621362521e8a5071e91c05/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/PartitionSpecEvolution.java#L124
> In order to rebind the provided PartitionSpec to the id of the
> name-matching field in the table schema, we need the constructor of
> NamedReference and UnboundTransform to be public.
>
> Change (3) is used to test the schema update code. We want to test the
> schema evolution code, which uses SchemaUpdate API to make the
> changes. Our tests validate that the applied changes yield the
> expected new schema. Example test code:
>
> https://github.com/apache/iceberg/blob/61fedb11399ee344f9621362521e8a5071e91c05/flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/dynamic/TestEvolveSchemaVisitor.java#L111
>
> I hope that makes sense. I'm curious to hear what you think. Any concerns?
>
> Cheers,
> Max
>

Reply via email to