julianbradford19-png commented on issue #10623: URL: https://github.com/apache/seatunnel/issues/10623#issuecomment-4090795541
Thank you for bringing this up.

On Thu, Mar 19, 2026, 5:07 AM Daniel Carter ***@***.***> wrote:

> *DanielCarter-stack* left a comment (apache/seatunnel#10623)
> <https://github.com/apache/seatunnel/issues/10623#issuecomment-4089050022>
>
> This is a valid memory management issue when writing large datasets to
> Iceberg with upsert mode enabled.
>
> *Root cause*: IcebergSinkWriter.results accumulates WriteResult objects
> between checkpoints. These hold DataFile[] and DeleteFile[] references
> that are only cleared during snapshotState(), not in prepareCommit().
>
> *Risk factors*:
>
> - *BATCH mode without checkpoint.interval* - snapshotState() may only
>   be called once at job end, causing unbounded accumulation
> - *Large write.target-file-size-bytes* - buffers more data per file
>   before rolling
> - *High-cardinality partitions* - maintains a separate writer per
>   partition
> - *Upsert mode* - generates both data and delete files, doubling
>   memory overhead
>
> *Immediate mitigations*:
>
> 1. Add checkpoint.interval (critical for BATCH mode):
>    env { checkpoint.interval = 300000 }
> 2. Reduce target file size:
>    write.target-file-size-bytes = 134217728 (128MB)
> 3. Disable upsert if not required:
>    iceberg.table.upsert-mode-enabled = false
>
> *Questions to help design a proper fix*:
>
> 1. Are you running in BATCH or STREAMING mode?
> 2. What is your checkpoint.interval setting?
> 3. What is the partition key and approximate number of distinct
>    partitions?
> 4. Can you share a heap dump or thread dump from the OOM?
>
> Potential improvements: memory-based flush thresholds and documenting
> checkpoint requirements for BATCH mode.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
