Ok, thanks for trying to reproduce this. If possible, could you also activate trace-level logging for the class org.apache.flink.runtime.state.SharedStateRegistry? If the problem occurs again, this would greatly help us understand what was going on.
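
In case it helps, here is a minimal sketch of the logging change. It assumes you are using the default log4j setup that ships with Flink 1.3 (conf/log4j.properties); if you use a custom logging configuration, the file and syntax will differ.

```properties
# Assumption: default log4j 1.x configuration in conf/log4j.properties.
# Enables TRACE output for the shared state registry only.
log4j.logger.org.apache.flink.runtime.state.SharedStateRegistry=TRACE
```

The line has to be present in the configuration used by the JobManager, and the change typically only takes effect after a restart.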

> On 21.11.2017 at 15:16, gerardg <ger...@talaia.io> wrote:
>
>> where exactly did you read many times that incremental checkpoints cannot
>> reference files from previous checkpoints, because we would have to correct
>> that information. In fact, this is how incremental checkpoints work.
>
> My fault, I read it in some other posts in the mailing list but now that I
> read it carefully it meant savepoints not checkpoints.
>
>> Now for this case, I would consider it extremely unlikely that a
>> checkpoint 1620 would still reference a checkpoint 1,
>> in particular if the files for that checkpoint are already deleted, which
>> should only happen if it is no longer
>> referenced. Which version of Flink are you using and what is your
>> distributed filesystem? Is there any way to
>> reproduce the problem?
>
> We are using Flink version 1.3.2 and GlusterFS. There are usually a few
> checkpoints around at the same time, for example right now:
>
> chk-1 chk-26 chk-27 chk-28 chk-29 chk-30 chk-31
>
> I'm not sure how to reproduce the problem but I'll monitor the folder to see
> when chk-1 gets deleted and try to make the task fail when that happens.
>
> Gerard