Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-24 Thread Alexey Trenikhun
From: Yang Wang Sent: Tuesday, March 23, 2021 11:17:18 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint Hi Alexey, From your attached logs, I do not think the newly started JobManager will recover

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-23 Thread Yang Wang
restarted on cancel, did not grab log at that time, but chances are good that I > will be able to reproduce. > Thanks, > Alexey > > ------ > *From:* Yang Wang > *Sent:* Sunday, March 14, 2021 7:50:21 PM > *To:* Alexey Trenikhun > *Cc:* Flink User M

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-15 Thread Yang Wang
2021 7:50:21 PM > *To:* Alexey Trenikhun > *Cc:* Flink User Mail List > *Subject:* Re: Kubernetes HA - attempting to restore from wrong > (non-existing) savepoint > > If the HA related ConfigMaps still exist, then I am afraid the data > located on the distributed stora

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-14 Thread Alexey Trenikhun
From: Yang Wang Sent: Sunday, March 14, 2021 7:50:21 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint If the HA related ConfigMaps still exist, then I am afraid the data located on the distributed storage

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-14 Thread Yang Wang
> > Thanks, > Alexey > -- > *From:* Yang Wang > *Sent:* Thursday, March 11, 2021 2:59 AM > *To:* Alexey Trenikhun > *Cc:* Flink User Mail List > *Subject:* Re: Kubernetes HA - attempting to restore from wrong > (non-existing) savepoint > > Hi Alexey, > > F

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-11 Thread Alexey Trenikhun
ary 28, 2021 10:04 PM To: Alexey Trenikhun <yen...@msn.com> Cc: Flink User Mail List <user@flink.apache.org> Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint Hi Alexey, It seems that the KubernetesHAService works well since all the

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-11 Thread Yang Wang
List > *Subject:* Re: Kubernetes HA - attempting to restore from wrong > (non-existing) savepoint > > Hi Alexey, > > It seems that the KubernetesHAService works well since all the checkpoints > have been cleaned up when the job is canceled. > And we could find relat

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-09 Thread Alexey Trenikhun
Hi Yang, The problem has recurred; the full JM log is attached. Thanks, Alexey From: Yang Wang Sent: Sunday, February 28, 2021 10:04 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-01 Thread Alexey Trenikhun
From: Yang Wang Sent: Sunday, February 28, 2021 10:04 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint Hi Alexey, It seems that the KubernetesHAService works well since all the checkpoints have been cleaned u

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-02-28 Thread Yang Wang
Hi Alexey, It seems that the KubernetesHAService works well since all the checkpoints have been cleaned up when the job is canceled. And we could find related logs "Found 0 checkpoints in KubernetesStateHandleStore{configMapName='gsp--jobmanager-leader'}.". However

Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-02-26 Thread Alexey Trenikhun
Hello, We have a Flink job running in Kubernetes with Kubernetes HA enabled (JM is deployed as a Job, single TM as a StatefulSet). We took a savepoint with cancel=true. Now we are trying to start the job using --fromSavepoint A, where A is the path we got from taking the savepoint (ClusterEntrypoint reports