Hi Chris,

could you also try what happens when you turn incremental checkpoints off? Incremental checkpoints may create many small files, which are a bad fit for HDFS. You could also evaluate other storage options (network drive, S3) if you find that incremental checkpoints work better for you.
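In case it is useful, a minimal sketch of how turning them off could look if the state backend is set up in code rather than in flink-conf.yaml (the HDFS path is only a placeholder based on the path in your log, and the class name is made up for the example):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FullCheckpointJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // The second constructor argument toggles incremental checkpointing;
        // with false, every checkpoint is a full, self-contained snapshot.
        env.setStateBackend(
                new RocksDBStateBackend("hdfs:///flink/flink-checkpoints", false));

        // ... sources, windows, and sinks stay unchanged ...

        env.execute("rocksdb-full-checkpoints");
    }
}

If the job is configured purely through flink-conf.yaml instead, setting state.backend.incremental: false should have the same effect.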
On Tue, Jun 2, 2020 at 2:36 AM Slotterback, Chris <chris_slotterb...@comcast.com> wrote:

> Congxian,
>
> 1. The checkpoints were failing with this exception scattered through the logs:
>
> 2020-06-01 21:04:37,930 WARN org.apache.hadoop.hdfs.DataStreamer - DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flink/flink-checkpoints/ade55daec06ee72aaf7ceade86c6e7a9/chk-1/2093792d-7ebb-4008-8e20-4daf1849c2d4 could only be replicated to 0 nodes instead of minReplication (=1).
>
> 2. Yes, we are using incremental checkpointing.
>
> 3. Currently our windows are configured to use the process function (we were doing aggregates before), which, as I understand it, should make our state update/insert ratio lower, since we are building the ListStates of each window over time and processing them on trigger.
>
> 4. We set the max concurrent checkpoints back to 1. It was originally configured that way, and the checkpoints were taking too long to complete before the next checkpoint interval began.
>
> Our TMs normally run with 3 slots (3_slots.png). We wanted to try running with 1 slot (1_slot.png) and noticed the checkpoint times fell drastically, but with 1 slot per TM our parallelism had to be dropped and our consumer lag was growing.
>
> From: Congxian Qiu <qcx978132...@gmail.com>
> Date: Friday, May 29, 2020 at 10:59 PM
> To: "Slotterback, Chris" <chris_slotterb...@comcast.com>
> Cc: "user@flink.apache.org" <user@flink.apache.org>
> Subject: [EXTERNAL] Re: Inconsistent checkpoint durations vs state size
>
> Hi
>
> From the given pictures:
>
> 1. There were some failed checkpoints (but not because of timeout); could you please check why these checkpoints failed?
>
> 2. The checkpoint data size is the delta size for the current checkpoint [1], assuming you are using incremental checkpoints.
>
> 3. In fig 1 the checkpoint size is ~3 GB, but in fig 2 the delta size can grow to ~15 GB. My gut feeling is that the state update/insert ratio of your program is very high, so that in one checkpoint you generate too many SST files.
>
> 4. From fig 2 it seems you configured execution-checkpointing-max-concurrent-checkpoints [2] to be bigger than 1; could you please set it to 1 and try again?
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/checkpoint_monitoring.html#history-tab
> [2] https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#execution-checkpointing-max-concurrent-checkpoints
>
> Best,
> Congxian
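For reference on point 4 of the exchange above, this is roughly how keeping checkpoints serialized could look when driven from code; the interval, pause, and timeout values below are purely illustrative, not a recommendation:

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConcurrencySketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Illustrative: trigger a checkpoint every 5 minutes.
        env.enableCheckpointing(5 * 60 * 1000);

        CheckpointConfig config = env.getCheckpointConfig();
        // Allow only one checkpoint in flight at a time ...
        config.setMaxConcurrentCheckpoints(1);
        // ... and force a pause before the next one starts, so a slow
        // checkpoint can finish instead of new ones piling up behind it.
        config.setMinPauseBetweenCheckpoints(60 * 1000);
        config.setCheckpointTimeout(10 * 60 * 1000);
    }
}

The equivalent flink-conf.yaml entry is execution.checkpointing.max-concurrent-checkpoints: 1, as linked under [2] above.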
> Slotterback, Chris <chris_slotterb...@comcast.com> wrote on Sat, May 30, 2020 at 7:43 AM:
>
> Hi there,
>
> We are trying to upgrade a Flink app from using FsStateBackend to RocksDBStateBackend to reduce overhead memory requirements. When enabling RocksDB, we are seeing a drop in used heap memory as state moves to disk, but checkpoint durations have become inconsistent. Our data source has a stable rate of reports arriving in parallel across partitions. The state size doesn't seem to correlate with the checkpoint duration from what I can see in the metrics. We have tried tmpfs and swap on SSDs with high IOPS, but can't get a good handle on what's causing smaller state to take longer to checkpoint. Our checkpoint location is HDFS, and it works well in our non-RocksDB cluster.
>
> Is a ~100x checkpoint duration expected when going from the fs to the rocks state backend, and is checkpoint duration supposed to vary this much with a consistent data source?
>
> Chris

--
Arvid Heise | Senior Java Developer

https://www.ververica.com/

Follow us @VervericaData

--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time

--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng