I'm sorry guys if you received multiple instances of this mail; I kept trying to send it yesterday, but it looks like the mailing list was stuck and didn't dispatch it until now. Sorry for the disturbance.
On Apr 23, 2017 20:53, "Yassine MARZOUGUI" <y.marzou...@mindlytix.com> wrote:
> Hi all,
>
> I have a streaming pipeline as follows:
> 1 - read a folder continuously from HDFS
> 2 - filter duplicates (using keyBy(x -> x) and keeping a state per key
> indicating whether it has been seen)
> 3 - schedule some future actions on the stream using ProcessFunction and
> processing-time timers (elements are kept in a MapState)
> 4 - write results back to HDFS using a BucketingSink.
>
> I am using the RocksDBStateBackend, and Flink 1.3-SNAPSHOT (Commit: 9fb074c).
>
> Currently the source contains just one file of 1 GB, so that's the maximum
> state the job might hold. I noticed that the backpressure on operators #1
> and #2 is High, and the split reader has read only 60 MB out of the 1 GB
> source file. I suspect this is because the ProcessFunction is slow (on
> purpose). However, it looks like this affected the checkpoints, which are
> failing after the timeout (which is set to 2 hours); see attached
> screenshot.
>
> In the job manager logs I keep getting warnings:
>
> 2017-04-23 19:32:38,827 WARN
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late
> message for now expired checkpoint attempt 8 from
> 210769a077c67841d980776d8caece0a of job 6c7e44d205d738fc8a6cb4da181d2d86.
>
> Is the high backpressure the cause of the checkpoints being too slow? If
> yes, is there a way to disable the backpressure mechanism, since the
> records will be buffered in the RocksDB state after all, which is backed
> by the disk?
>
> Thank you.
>
> Best,
> Yassine
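For anyone following along, step 2 of the pipeline above (keyBy(x -> x) plus a per-key "seen" flag) boils down to a first-occurrence filter. Below is a minimal plain-Java model of that logic, just to make the idea concrete. This is NOT the Flink API: the `DedupSketch` class and its `firstTime` method are illustrative names I made up, and the HashMap merely stands in for the per-key RocksDB state the real job would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Models step 2 of the pipeline: keyBy(x -> x) with a boolean
// "seen" state per key. In the actual job that flag would live in
// Flink's keyed state (backed by RocksDB); here a HashMap stands in
// for it, purely to illustrate the dedup logic.
public class DedupSketch {
    private final Map<String, Boolean> seen = new HashMap<>();

    // Returns true only the first time this key is observed,
    // i.e. the record passes the filter exactly once.
    public boolean firstTime(String record) {
        if (Boolean.TRUE.equals(seen.get(record))) {
            return false; // duplicate key: drop the record
        }
        seen.put(record, true);
        return true;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        List<String> out = new ArrayList<>();
        for (String r : Arrays.asList("a", "b", "a", "c", "b")) {
            if (dedup.firstTime(r)) {
                out.add(r);
            }
        }
        System.out.println(out); // [a, b, c]
    }
}
```

One consequence of this pattern, visible in the question above: the "seen" map only ever grows (one entry per distinct key), which is why the full 1 GB input bounds the state size.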