---------- Forwarded message ----------
From: "Yassine MARZOUGUI" <y.marzou...@mindlytix.com>
Date: Apr 23, 2017 20:53
Subject: Checkpoints very slow with high backpressure
To: <user@flink.apache.org>
Hi all,

I have a streaming pipeline as follows:

1 - read a folder continuously from HDFS
2 - filter duplicates (using keyBy(x -> x) and keeping a per-key state indicating whether the key has been seen)
3 - schedule some future actions on the stream using a ProcessFunction and processing-time timers (elements are kept in a MapState)
4 - write the results back to HDFS using a BucketingSink.

I am using the RocksDB state backend, and Flink 1.3-SNAPSHOT (commit: 9fb074c).

Currently the source contains just a single 1 GB file, so that is the maximum state the job might hold. I noticed that the backpressure on operators #1 and #2 is High, and the split reader has read only 60 MB of the 1 GB source file. I suspect this is because the ProcessFunction is slow (on purpose). However, it looks like this also affected the checkpoints, which keep failing after the timeout (set to 2 hours); see the attached screenshot.

In the job manager logs I keep getting warnings:

2017-04-23 19:32:38,827 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 8 from 210769a077c67841d980776d8caece0a of job 6c7e44d205d738fc8a6cb4da181d2d86.

Is the high backpressure the cause of the checkpoints being so slow? If so, is there a way to disable the backpressure mechanism, since the records will be buffered in the RocksDB state anyway, which is backed by disk?

Thank you.

Best,
Yassine
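For reference, a minimal sketch of the checkpointing setup described above (RocksDB state backend, 2-hour checkpoint timeout). The checkpoint interval and the HDFS checkpoint path are placeholders for illustration, not values taken from the actual job:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical checkpoint interval (1 min); the email only states the timeout.
        env.enableCheckpointing(60_000L);

        // The 2-hour checkpoint timeout mentioned above, in milliseconds.
        env.getCheckpointConfig().setCheckpointTimeout(2L * 60 * 60 * 1000);

        // RocksDB state backend; the checkpoint path is a placeholder.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
    }
}
```

With this configuration, a checkpoint attempt that cannot complete within two hours (for example, because barriers are held up behind backpressured records) is declared expired by the CheckpointCoordinator, which matches the "expired checkpoint attempt" warnings in the log.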