- Technically would it be possible not to force partition cols into the PK?
I believe this is possible, but probably less performant. It is mentioned in the docs:
https://iceberg.apache.org/spec/#scan-planning

From the documentation:

"An equality delete file must be applied to a data file when all of the following are true: The data file’s partition (both spec and partition values) is equal to the delete file’s partition or the delete file’s partition spec is unpartitioned"

"In general, deletes are applied only to data files that are older and in the same partition, except for two special cases: Equality delete files stored with an unpartitioned spec are applied as global deletes. Otherwise, delete files do not apply to files in other partitions."

Thanks,
ismail

On 2024/03/28 13:44:43 Péter Váry wrote:
> Hi Team,
>
> As discussed on yesterday's community sync, I am working on adding a
> possibility to the Flink Iceberg connector to run maintenance tasks on the
> Iceberg tables. This will fix the small files issues and in the long run
> help compacting the high number of positional and equality deletes created
> by Flink tasks writing CDC data to Iceberg tables without the need of Spark
> in the infrastructure.
>
> I did some planning, prototyping and currently trying out the solution on a
> larger scale.
>
> I put together a document how my current solution looks like:
> https://docs.google.com/document/d/16g3vR18mVBy8jbFaLjf2JwAANuYOmIwr15yDDxovdnA/edit?usp=sharing
>
> I would love to hear your thoughts and feedback on this to find a good
> final solution.
>
> Thanks,
> Peter