Hi Team,
I am currently running batch jobs on Flink 1.19.2, and the expected
throughput for a certain step is above 1 TB.
What I have observed is that every time my job gets close to 30%, it
fails with the error below after a few retries. I am running this on
Kubernetes with 6 CPUs and 48 GB of memory for the TaskManager, and
3 CPUs and 32 GB of memory for the JobManager.
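For reference, those resources correspond roughly to the following Flink
settings (a minimal sketch using the standard config keys and assuming a
native Kubernetes deployment; I am only showing the values here, not my
actual deployment manifests):

import org.apache.flink.configuration.Configuration;

Configuration conf = new Configuration();
// TaskManager: 6 CPUs, 48 GB total process memory
conf.setString("kubernetes.taskmanager.cpu", "6.0");
conf.setString("taskmanager.memory.process.size", "48g");
// JobManager: 3 CPUs, 32 GB total process memory
conf.setString("kubernetes.jobmanager.cpu", "3.0");
conf.setString("jobmanager.memory.process.size", "32g");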

Caused by: java.io.IOException: Cannot write record to fresh sort buffer. Record too large.
    at org.apache.flink.runtime.operators.chaining.SynchronousChainedCombineDriver.collect(SynchronousChainedCombineDriver.java:190)
    at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
    at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
    at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
    at org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:58)
    at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:113)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:516)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:359)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
    at java.base/java.lang.Thread.run(Unknown Source)
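
From the stack trace, the failing chain appears to be a filter
(PlanFilterOperator$FlatMapFilter), a chained map (ChainedMapDriver) and a
chained sort-based combiner (SynchronousChainedCombineDriver), i.e. a DataSet
step roughly of the shape sketched below. The types, names and logic are
placeholders I am using only to illustrate that shape, not my actual code;
as far as I understand, it is the combiner's in-memory sort buffer that
rejects a single record too large to fit.

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class SortBufferShapeSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; the real step reads well over 1 TB.
        DataSet<Tuple2<String, String>> input = env.fromElements(
                Tuple2.of("k1", "v1"), Tuple2.of("k1", "v2"), Tuple2.of("k2", "v3"));

        input
            // filter -> shows up as PlanFilterOperator$FlatMapFilter / FlatMapDriver
            .filter(new FilterFunction<Tuple2<String, String>>() {
                @Override
                public boolean filter(Tuple2<String, String> t) {
                    return !t.f1.isEmpty();
                }
            })
            // map -> shows up as ChainedMapDriver
            .map(new MapFunction<Tuple2<String, String>, Tuple2<String, String>>() {
                @Override
                public Tuple2<String, String> map(Tuple2<String, String> t) {
                    return Tuple2.of(t.f0, t.f1);
                }
            })
            // group-reduce whose function also implements GroupCombineFunction,
            // so a sort-based combiner is chained before the shuffle; concatenating
            // all values of a key is the kind of thing that can grow one record
            // beyond what the combiner's sort buffer can hold
            .groupBy(0)
            .reduceGroup(new ConcatPerKey())
            .print();
    }

    public static class ConcatPerKey
            implements GroupReduceFunction<Tuple2<String, String>, Tuple2<String, String>>,
                       GroupCombineFunction<Tuple2<String, String>, Tuple2<String, String>> {

        @Override
        public void reduce(Iterable<Tuple2<String, String>> values,
                           Collector<Tuple2<String, String>> out) {
            combine(values, out);
        }

        @Override
        public void combine(Iterable<Tuple2<String, String>> values,
                            Collector<Tuple2<String, String>> out) {
            String key = null;
            StringBuilder sb = new StringBuilder();
            for (Tuple2<String, String> t : values) {
                key = t.f0;
                sb.append(t.f1);
            }
            out.collect(Tuple2.of(key, sb.toString()));
        }
    }
}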
I have two questions here:
1. How can I handle this "Record too large" sort buffer issue?
2. What is the recommended way to handle large records in Flink batch jobs?

Please help me out with this.

Regards,
Suraj
