Could this be FLIP-15 related as well then?
> On May 4, 2020, at 9:41 PM, Ashish Pokharel <ashish...@yahoo.com> wrote:
>
> Hi all,
>
> Hope everyone is doing well!
>
> I am running into what seems like a deadlock (application stalled) situation
> with a Flink streaming job upon restore from savepoint. Job has a slowly
> moving stream (S1) that needs to be “stateful” and a continuous stream (S2)
> which is “joined” with slow moving stream (S1). Some level of loss/repetition
> is acceptable in continuous stream (S2) and hence can rely on something like
> Kafka consumer states upon restarts etc. Continuous stream (S2) however needs
> to be iterated through states from slowly moving streams (S1) a few times
> (mostly 2). States are fairly large (about 15GB on HDFS). When the job is
> restarted with no continuous data (S2) on the topic, it starts up, restores
> states and does its initial checkpoint within 3 minutes. However, when app
> is started from savepoint and continuous stream (S2) is actually present in
> Kafka it seems like application comes to a halt. Looking at progress of
> checkpoints, it seems like every attempt is stuck until a timeout occurs
> at around 10 minutes. If iteration on stream is removed app can
> successfully start and checkpoint even when continuous stream (S2) is flowing
> in as well. Unfortunately we are working on a hosted environment for both
> data and platform, hence debugging with thread dumps etc will be challenging.
>
> I couldn’t find a known issue on this but was wondering if anyone has seen
> such behavior or knows of any issues in the past. It does look like checkpointing
> has to be set to forced to get an iterative job to checkpoint in the first
> place (an option that is already marked deprecated; we are on version 1.8.2
> as of now). I do understand the challenges around consistent checkpointing of
> iterative stream. As I mentioned earlier, what I really want to maintain for
> the most part are states of slowly moving dimensions. Iterations do solve
> the problem at hand (multiple loops of logic) pretty gracefully but not being
> able to restore from savepoint will be a show stopper.
>
> I would appreciate any pointers or suggestions.
>
> Thanks in advance,
>
> Ashish