---------- Forwarded message ----------
From: "Yassine MARZOUGUI" <y.marzou...@mindlytix.com>
Date: Apr 23, 2017 20:53
Subject: Checkpoints very slow with high backpressure
To: <user@flink.apache.org>
Hi all,

I have a streaming pipeline as follows:

1 - read a folder continuously from HDFS
2 - filter duplicates (using keyBy(x -> x) and keeping a per-key state indicating whether the key has been seen)
3 - schedule some future actions on the stream using a ProcessFunction and processing-time timers (elements are kept in a MapState)
4 - write the results back to HDFS using a BucketingSink.

I am using the RocksDB state backend, and Flink 1.3-SNAPSHOT (commit: 9fb074c).

Currently the source contains just a single 1 GB file, so that is the maximum state the job might hold. I noticed that the backpressure on operators #1 and #2 is High, and the split reader has read only 60 MB of the 1 GB source file. I suspect this is because the ProcessFunction is slow (on purpose). However, it looks like this also affected the checkpoints, which keep failing after the timeout (set to 2 hours); see the attached screenshot.

In the job manager logs I keep getting warnings:

2017-04-23 19:32:38,827 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 8 from 210769a077c67841d980776d8caece0a of job 6c7e44d205d738fc8a6cb4da181d2d86.

Is the high backpressure the cause of the checkpoints being so slow? If so, is there a way to disable the backpressure mechanism, since the records will be buffered in the RocksDB state anyway, which is backed by disk?

Thank you.

Best,
Yassine
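For reference, a minimal sketch of the checkpointing setup described above (RocksDB state backend, 2-hour checkpoint timeout). The checkpoint interval and the HDFS checkpoint path are placeholders for illustration, not values taken from the actual job:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical checkpoint interval (1 min); the email only states the timeout.
        env.enableCheckpointing(60_000L);

        // The 2-hour checkpoint timeout mentioned above, in milliseconds.
        env.getCheckpointConfig().setCheckpointTimeout(2L * 60 * 60 * 1000);

        // RocksDB state backend; the checkpoint path is a placeholder.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
    }
}
```

With this configuration, a checkpoint attempt that cannot complete within two hours (for example, because barriers are held up behind backpressured records) is declared expired by the CheckpointCoordinator, which matches the "expired checkpoint attempt" warnings in the log.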