Hey,

I have encountered a weird issue in a checkpointing test I am trying to
write. The logic is the same as with the previous checkpointing tests,
there is a OnceFailingReducer.

My problem is that before the reducer fails, my job cannot take any
snapshots. The Runnables executing the checkpointing logic in the sources
keep waiting on some lock.

After the failure and the restart, everything is fine and the checkpointing
can succeed properly.

Also if I remove the failure from the reducer, the job doesnt take any
snapshots (waiting on lock) and the job will finish.

Here is the code:
https://github.com/gyfora/flink/blob/d1f12c2474413c9af357b6da33f1fac30549fbc3/flink-contrib/flink-streaming-contrib/src/test/java/org/apache/flink/contrib/streaming/state/TfileStateCheckpointingTest.java#L83

I assume there is no problem with the source as the Thread.sleep(..) is
outside of the synchronized block. (and as I said after the failure it
works fine).

Any ideas?

Thanks,
Gyula

Reply via email to