Hi Vinay! True, operator state (like that of the Kafka source) is currently not checkpointed asynchronously.
While this state is rather small, we have seen before that it can cause trouble on S3, because S3 frequently stalls uploads of even a few kilobytes of data due to its throttling policies.

That would be a super important fix to add!

Best,
Stephan

On Fri, Feb 24, 2017 at 2:58 PM, vinay patil <vinay18.pa...@gmail.com> wrote:

> Hi,
>
> I have attached a snapshot for reference:
> As you can see, all 3 checkpoints failed. Checkpoint IDs 2 and 3 are
> stuck at the Kafka source after 50%.
> (The data sent so far by Kafka source 1 is 65 GB, and by source 2 is 15 GB.)
>
> Within 10 minutes, 15M records were processed, and for the next 16 minutes
> the pipeline has been stuck. I don't see any progress beyond 15M because
> the checkpoints keep failing.
>
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11882/Checkpointing_Failed.png>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11882.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.