DanielCarter-stack commented on issue #10623:
URL: https://github.com/apache/seatunnel/issues/10623#issuecomment-4089050022
This is a valid memory management issue when writing large datasets to
Iceberg with upsert mode enabled.
**Root cause**: `IcebergSinkWriter.results` accumulates `WriteResult`
objects between checkpoints. These hold `DataFile[]` and `DeleteFile[]`
references that are only cleared during `snapshotState()`, not in
`prepareCommit()`.
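The accumulation pattern can be sketched as below. This is an illustrative simplification, not the actual SeaTunnel source: the class, method, and field names mirror the ones mentioned above, but `WriteResult` is a stand-in for the Iceberg type that pins `DataFile[]`/`DeleteFile[]` metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simplified stand-ins for SeaTunnel's IcebergSinkWriter
// and Iceberg's WriteResult; not the actual source.
class IcebergSinkWriterSketch {
    // Stand-in for WriteResult, which holds DataFile[]/DeleteFile[] references.
    static final class WriteResult {}

    private final List<WriteResult> results = new ArrayList<>();

    void completeFile() {
        // Each rolled file adds a WriteResult that stays referenced.
        results.add(new WriteResult());
    }

    List<WriteResult> prepareCommit() {
        // Bug pattern: results are handed off but NOT cleared here,
        // so the list keeps growing between checkpoints.
        return new ArrayList<>(results);
    }

    List<WriteResult> snapshotState(long checkpointId) {
        // Only snapshotState() releases the accumulated references; without
        // periodic checkpoints (e.g. BATCH mode) this may run once, at job end.
        List<WriteResult> snapshot = new ArrayList<>(results);
        results.clear();
        return snapshot;
    }

    int pendingResults() {
        return results.size();
    }
}
```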
**Risk factors**:
- **BATCH mode without `checkpoint.interval`** - `snapshotState()` may only
be called once at job end, causing unbounded accumulation
- **Large `write.target-file-size-bytes`** - buffers more data per file
before rolling
- **High-cardinality partitions** - maintains a separate writer per partition
- **Upsert mode** - generates both data and delete files, doubling memory
overhead
**Immediate mitigations**:
1. Add `checkpoint.interval` (critical for BATCH mode): `env {
checkpoint.interval = 300000 }`
2. Reduce target file size: `write.target-file-size-bytes = 134217728`
(128MB)
3. Disable upsert if not required: `iceberg.table.upsert-mode-enabled =
false`
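Pulling the three mitigations together, a job config might look like the sketch below. Option placement is an assumption on my part (in particular, passing `write.target-file-size-bytes` through `iceberg.table.write-props`), so verify it against the SeaTunnel Iceberg sink docs for your version:

```hocon
env {
  # Force periodic checkpoints so snapshotState() releases buffered WriteResults
  checkpoint.interval = 300000   # 5 minutes, in milliseconds
}

sink {
  Iceberg {
    # ... catalog/table options ...

    # Assumed placement: table write properties passed through to Iceberg
    iceberg.table.write-props = {
      # Roll files at 128 MB instead of Iceberg's 512 MB default
      # to cap per-writer buffering
      write.target-file-size-bytes = 134217728
    }

    # Disable upsert if dedup/overwrite semantics are not required,
    # avoiding the extra delete-file overhead
    iceberg.table.upsert-mode-enabled = false
  }
}
```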
**Questions to help design a proper fix**:
1. Are you running in BATCH or STREAMING mode?
2. What is your `checkpoint.interval` setting?
3. What is the partition key and approximate number of distinct partitions?
4. Can you share a heap dump or thread dump from the OOM?
Potential longer-term fixes: clearing `results` in `prepareCommit()` where commit semantics allow it, adding memory-based flush thresholds, and documenting the checkpoint requirement for BATCH mode.
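A memory-based flush threshold could take roughly the following shape. This is a hypothetical sketch, not a proposed patch; the class and method names are mine, and the byte estimate would in practice come from the file sizes recorded in each `WriteResult`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a size-bounded buffer for pending write results.
// The generic element type stands in for Iceberg's WriteResult.
class BoundedResultsBuffer<T> {
    private final List<T> results = new ArrayList<>();
    private long bufferedBytes = 0;
    private final long flushThresholdBytes;

    BoundedResultsBuffer(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Records a result; returns true when the caller should flush/commit. */
    boolean add(T result, long estimatedBytes) {
        results.add(result);
        bufferedBytes += estimatedBytes;
        return bufferedBytes >= flushThresholdBytes;
    }

    /** Hands off and releases everything buffered so far. */
    List<T> drain() {
        List<T> out = new ArrayList<>(results);
        results.clear();
        bufferedBytes = 0;
        return out;
    }
}
```

The writer would call `drain()` and trigger an early commit whenever `add(...)` returns true, instead of waiting for the next checkpoint.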