Hey Iceberg Community,

I would like to propose a change to the Iceberg commit path that improves the timing of when accumulated data files are written into manifests.
To start with the problem: when a single snapshot adds a large number of data files, MergingSnapshotProducer accumulates all of them in memory before the commit. This unbounded collection can lead to memory pressure or an OOM failure before the commit, losing all work. Workloads that touch many files and commit atomically in a single snapshot are particularly susceptible, especially on wide-schema tables; examples include full-table data compaction, GDPR-like deletion, and large MERGE INTO workloads.

Would love to get the community's feedback on https://docs.google.com/document/d/1uEH4AWh3PUt4t3oNpNEH6K_zxz2p_PsNWk9k97rmBaw/edit?usp=sharing

Thanks,
Hongyue Zhang
