I guess using a session cluster rather than a job cluster will decouple the
job from the container and may be the only option as of today?
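For reference, a rough sketch of what a session-cluster JobManager Deployment
on k8s could look like, its lifecycle independent of any one job, unlike a job
cluster where the job is baked into the container ( image, labels and ports
below are illustrative, not our actual setup ):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-jobmanager
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: flink
          component: jobmanager
      template:
        metadata:
          labels:
            app: flink
            component: jobmanager
        spec:
          containers:
            - name: jobmanager
              image: flink:1.8        # illustrative image tag
              args: ["jobmanager"]    # session mode: the JM runs independently of
                                      # any job; jobs are submitted to it afterwards
              ports:
                - containerPort: 6123   # RPC
                - containerPort: 8081   # web UI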
On Sat, Jun 29, 2019, 9:34 AM Vishal Santoshi wrote:
> So there are 2 scenarios
>
> 1. If the JM goes down ( exits ) and k8s re-launches the Job Cluster ( the
> JM ) [...]
So there are 2 scenarios
1. If the JM goes down ( exits ) and k8s re-launches the Job Cluster ( the JM
), it is using the exact command the JM was brought up with in the first place.
2. If the pipe is restarted for any other reason by the JM ( the JM has not
exited but *Job Kafka-to-HDFS (* ...
Another point: the JM had terminated. The restart policy on K8s for the Job
Cluster is
spec:
  restartPolicy: OnFailure
*2019-06-29 00:33:14,308 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Terminating cluster
entrypoint process StandaloneJobClusterEntryPoint with exit code 1443.*
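For context, the Job Cluster entry point is deployed along the lines of the
sketch below, which is why a k8s relaunch always comes back with the exact same
command line ( name, image and class name are illustrative placeholders, and
the entrypoint args depend on the image, not a definitive manifest ):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: kafka-to-hdfs-jobmanager            # illustrative name
    spec:
      template:
        spec:
          restartPolicy: OnFailure              # k8s re-creates the pod when the entrypoint exits non-zero
          containers:
            - name: jobmanager
              image: my-registry/kafka-to-hdfs:1.8   # illustrative: job jar baked into the image
              args:
                - "job-cluster"                      # entrypoint arg; exact form depends on the image
                - "--job-classname"
                - "com.example.KafkaToHdfsJob"       # illustrative class name
              # Every relaunch re-runs exactly this command line, so the JM always
              # comes back up the same way it was started the first time.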
I have not tried on bare metal. We have no option but k8s.
And this is a job cluster.
On Sat, Jun 29, 2019 at 9:01 AM Timothy Victor wrote:
> Hi Vishal, can this be reproduced on a bare metal instance as well? Also
> is this a job or a session cluster?
>
> Thanks
>
> Tim
>
Hi Vishal, can this be reproduced on a bare metal instance as well? Also
is this a job or a session cluster?
Thanks
Tim
On Sat, Jun 29, 2019, 7:50 AM Vishal Santoshi wrote:
> OK this happened again and it is bizarre ( and is definitely not what I
> think should happen )
>
> The job failed [...]
OK this happened again and it is bizarre ( and is definitely not what I
think should happen )
The job failed and I see these logs ( In essence it is keeping the last 5
externalized checkpoints ) but deleting the zk checkpoints directory
*06.28.2019 20:33:13.738  2019-06-29 00:33:13,73...
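For context, the ZooKeeper side of this is driven by settings along these lines
( quorum, paths and cluster id below are illustrative placeholders, not our
actual values ):

    # flink-conf.yaml ( sketch )
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181   # illustrative quorum
    high-availability.zookeeper.path.root: /flink        # ZK subtree where the JM keeps pointers to
                                                         # completed checkpoints ( the "zk checkpoints
                                                         # directory" above )
    high-availability.cluster-id: /kafka-to-hdfs         # illustrative cluster id
    high-availability.storageDir: hdfs:///flink/ha       # illustrative HDFS path for HA metadata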
Ok, I will do that.
On Wed, Jun 5, 2019, 8:25 AM Chesnay Schepler wrote:
> Can you provide us the jobmanager logs?
>
> After the first restart the JM should have started deleting older
> checkpoints as new ones were created.
> After the second restart the JM should have recovered all 10 checkpoints, [...]
Can you provide us the jobmanager logs?
After the first restart the JM should have started deleting older
checkpoints as new ones were created.
After the second restart the JM should have recovered all 10
checkpoints, started from the latest, and started pruning old ones as new
ones were created.
Anyone?
On Tue, Jun 4, 2019, 2:41 PM Vishal Santoshi wrote:
> The above is flink 1.8
>
> On Tue, Jun 4, 2019 at 12:32 PM Vishal Santoshi wrote:
>
>> I had a sequence of events that created this issue.
>>
>> * I started a job and I had the state.checkpoints.num-retained: 5
>>
>> * As expected [...]
The above is flink 1.8
On Tue, Jun 4, 2019 at 12:32 PM Vishal Santoshi wrote:
> I had a sequence of events that created this issue.
>
> * I started a job and I had the state.checkpoints.num-retained: 5
>
> * As expected I have 5 latest checkpoints retained in my hdfs backend.
>
> * JM dies ( K8s limit etc ) [...]
I had a sequence of events that created this issue.
* I started a job and I had the state.checkpoints.num-retained: 5
* As expected I have 5 latest checkpoints retained in my hdfs backend.
* JM dies ( K8s limit etc ) without cleaning the hdfs directory. The k8s
job restores from the latest checkpoint [...]
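For reference, the retention side of this is just the settings below, plus a
sketch of how the hdfs checkpoint directory can end up holding both the stale
and the new retained checkpoints after such a restart ( paths and checkpoint
ids are illustrative ):

    # flink-conf.yaml ( sketch )
    state.checkpoints.dir: hdfs:///flink/checkpoints   # illustrative; externalized checkpoint data
                                                       # lands under <dir>/<job-id>/chk-<n>
    state.checkpoints.num-retained: 5                  # JM keeps only the 5 most recent completed checkpoints
    #
    # Possible layout after the JM died without cleaning up and the relaunched job
    # has completed 5 new checkpoints ( illustrative ids ):
    #   hdfs:///flink/checkpoints/<job-id>/chk-96  ... chk-100   <- pre-restart, never pruned
    #   hdfs:///flink/checkpoints/<job-id>/chk-101 ... chk-105   <- post-restart, pruned normally going forward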