Hey Iceberg Community,

  I would like to propose a change to the Iceberg commit path that improves
when accumulated data files are written into manifests.

  To start with the problem: when a single snapshot adds a large number of
data files, MergingSnapshotProducer accumulates all of them in memory
before the commit. This unbounded collection can lead to memory pressure
or an OOM failure before the commit, losing all work. Workloads on
wide-schema tables that touch many files and commit atomically in a single
snapshot are particularly susceptible, such as full-table data compaction,
GDPR-like deletion, and large MERGE INTO workloads.

  I would love to get the community's feedback on the proposal:
https://docs.google.com/document/d/1uEH4AWh3PUt4t3oNpNEH6K_zxz2p_PsNWk9k97rmBaw/edit?usp=sharing

Thanks,
Hongyue Zhang
