- Technically would it be possible not to force partition cols into the PK?

I believe this is possible, but it would probably be less performant. The
scan planning section of the spec covers it:
https://iceberg.apache.org/spec/#scan-planning

From the documentation:

"An equality delete file must be applied to a data file when all of the
following are true: The data file’s partition (both spec and partition
values) is equal to the delete file’s partition or the delete file’s
partition spec is unpartitioned"

"In general, deletes are applied only to data files that are older and in
the same partition, except for two special cases: Equality delete files
stored with an unpartitioned spec are applied as global deletes. Otherwise,
delete files do not apply to files in other partitions."
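
To make the rule concrete, here is a rough sketch of how I read the two
quotes above. This is not Iceberg code: FileEntry and
equality_delete_applies are hypothetical names, an unpartitioned spec is
modeled as an empty partition tuple, and "older" is modeled as a strictly
smaller data sequence number, which is my reading of the spec.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for the metadata of a data or delete file."""
    spec_id: int
    partition: Tuple            # () models an unpartitioned spec (assumption)
    data_sequence_number: int


def equality_delete_applies(data_file: FileEntry, delete_file: FileEntry) -> bool:
    """Return True if the equality delete has to be applied to the data file."""
    # "Older" is modeled as a strictly smaller data sequence number.
    is_older = data_file.data_sequence_number < delete_file.data_sequence_number
    # Same partition means same spec and same partition values.
    same_partition = (
        data_file.spec_id == delete_file.spec_id
        and data_file.partition == delete_file.partition
    )
    # Deletes written with an unpartitioned spec act as global deletes.
    is_global = delete_file.partition == ()
    return is_older and (same_partition or is_global)


data = FileEntry(spec_id=1, partition=("2024-03-28",), data_sequence_number=5)
scoped = FileEntry(spec_id=1, partition=("2024-03-29",), data_sequence_number=7)
global_delete = FileEntry(spec_id=0, partition=(), data_sequence_number=7)

print(equality_delete_applies(data, scoped))         # different partition
print(equality_delete_applies(data, global_delete))  # global delete
```

If the partition columns are not part of the PK, the deletes would have to
be written as global ones, and every older data file would need to be
checked against them. That is where I would expect the performance cost to
come from.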

Thanks ismail

On 2024/03/28 13:44:43 Péter Váry wrote:
> Hi Team,
>
> As discussed on yesterday's community sync, I am working on adding a
> possibility to the Flink Iceberg connector to run maintenance tasks on the
> Iceberg tables. This will fix the small files issues and in the long run
> help compacting the high number of positional and equality deletes created
> by Flink tasks writing CDC data to Iceberg tables without the need of Spark
> in the infrastructure.
>
> I did some planning, prototyping and currently trying out the solution on a
> larger scale.
>
> I put together a document how my current solution looks like:
>
> https://docs.google.com/document/d/16g3vR18mVBy8jbFaLjf2JwAANuYOmIwr15yDDxovdnA/edit?usp=sharing
>
> I would love to hear your thoughts and feedback on this to find a good
> final solution.
>
> Thanks,
> Peter
>
