Could this be FLIP-15 related as well then?

> On May 4, 2020, at 9:41 PM, Ashish Pokharel <ashish...@yahoo.com> wrote:
> 
> Hi all,
> 
> Hope everyone is doing well!
> 
> I am running into what seems like a deadlock (application stalled) situation 
> with a Flink streaming job upon restore from savepoint. Job has a slowly 
> moving stream (S1) that needs to be “stateful” and a continuous stream (S2) 
> which is “joined” with slow moving stream (S1). Some level of loss/repetition 
> is acceptable in continuous stream (S2) and hence can rely on something like 
> Kafka consumer states upon restarts etc. Continuous stream (S2) however needs 
> to be iterated through states from slowly moving streams (S1) a few times 
> (mostly 2). States are fair sized (ends up being 15GB on HDFS). When job is 
> restarted with no continuous data (S2) on the topic, the job starts up, restores 
> states and does its initial checkpoint within 3 minutes. However, when app 
> is started from savepoint and continuous stream (S2) is actually present in 
> Kafka it seems like application comes to a halt. Looking at progress of 
> checkpoints, it seems like every attempt is stuck until some timeouts 
> happen at around 10 mins. If iteration on stream is removed app can 
> successfully start and checkpoint even when continuous stream (S2) is flowing 
> in as well. Unfortunately we are working on a hosted environment for both 
> data and platform, hence debugging with thread dumps etc will be challenging. 
> 
> I couldn’t find a known issue on this but was wondering if anyone has seen 
> such behavior or knows of any issues in the past. It does look like checkpointing 
> has to be set to forced to get an iterative job to checkpoint in the first 
> place (an option that is marked deprecated already - working on 1.8.2 version 
> as of now). I do understand challenges around consistent checkpointing of 
> iterative stream. As I mentioned earlier, what I really want to maintain for 
> the most part are states of slowly moving dimensions. Iterations do solve 
> the problem at hand (multiple loops of logic) pretty gracefully but not being 
> able to restore from savepoint will be a show stopper. 
> 
> Will appreciate any pointer / suggestions.
> 
> Thanks in advance, 
> 
> Ashish
