Hello all,

We are running Flink 1.5.3 on Kubernetes with RocksDB as the state backend. During load testing we got an /OutOfMemoryError: native memory exhausted/, which caused the job to fail and be restarted.
After the TaskManager is restarted, the job is recovered from a checkpoint, but there seems to be a problem when accessing the state. The error is thrown in the *onTimer* function, invoked via *onProcessingTime*. Is it possible that the OOM error caused a corrupted state to be checkpointed?

We get exceptions like:

TimerException{java.lang.RuntimeException: Error while retrieving data from RocksDB.}
	at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:288)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
	at java.util.concurrent.FutureTask.run(FutureTask.java:277)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:191)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:811)
Caused by: java.lang.RuntimeException: Error while retrieving data from RocksDB.
	at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:89)
	at com.xxx.ProcessFunction.onTimer(ProcessFunction.java:279)
	at org.apache.flink.streaming.api.operators.KeyedProcessOperator.invokeUserFunction(KeyedProcessOperator.java:94)
	at org.apache.flink.streaming.api.operators.KeyedProcessOperator.onProcessingTime(KeyedProcessOperator.java:78)
	at org.apache.flink.streaming.api.operators.HeapInternalTimerService.onProcessingTime(HeapInternalTimerService.java:266)
	at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:285)
	... 7 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:208)
	at java.io.DataInputStream.readUTF(DataInputStream.java:618)
	at java.io.DataInputStream.readUTF(DataInputStream.java:573)
	at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:381)
	at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:87)
	... 12 more

Thanks in advance for any help

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/