Re: Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Vishal Santoshi
even though the configured restart strategy reads "Max. number of execution retries: Restart with fixed delay (24 ms). #20 restart attempts." On Sat, Jun 29, 2019 at 10:44 AM Vishal Santoshi wrote: > This is strange, the retry strategy was 20 times with a 4 minute delay. > This job tried once ( we had a Hadoop NameNode hiccup ) but

Re: Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Vishal Santoshi
This is strange, the retry strategy was 20 times with a 4 minute delay. This job tried once ( we had a Hadoop NameNode hiccup ) but I think it could not even get to the NN and gave up ( as in it did not retry the remaining 19 times ) 2019-06-29 00:33:13,680 INFO org.apache.flink.runtime.executiongraph.E
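
[Editor's note: for reference, a fixed-delay restart strategy like the one described here (20 attempts, 4 minute delay) is typically set cluster-wide in flink-conf.yaml. A minimal sketch, assuming cluster-level configuration rather than a per-job setting via RestartStrategies:

    # flink-conf.yaml
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 20
    restart-strategy.fixed-delay.delay: 4 min

The same values can also be set programmatically on the execution environment, which takes precedence over the file-based configuration.]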

Re: Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Vishal Santoshi
We are investigating that. But is the above theory plausible ( Flink gurus ), even though this, as in forcing restartPolicy: Never, pretty much nullifies HA on the JM if it is a Job cluster ( at least on k8s )? As for the reason, we are investigating that. One thing we are looking at is the QoS ( https://kub
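
[Editor's note: the two knobs mentioned here both live in the JM pod spec: restartPolicy controls whether Kubernetes restarts the container itself, and the QoS class is derived from the resource requests/limits (Guaranteed requires requests to equal limits for every container). A minimal sketch with hypothetical names and values, e.g. as the pod template of a Kubernetes Job running the job-cluster JM:

    apiVersion: v1
    kind: Pod
    metadata:
      name: flink-jobmanager          # hypothetical name
    spec:
      restartPolicy: Never            # the setting discussed above; OnFailure would let k8s restart the JM container
      containers:
        - name: jobmanager
          image: flink:1.8            # assumed image/version
          resources:
            requests:                 # requests == limits for cpu and memory => Guaranteed QoS class,
              cpu: "1"                # which makes the pod less likely to be evicted under node pressure
              memory: 2Gi
            limits:
              cpu: "1"
              memory: 2Gi
]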

Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Timothy Victor
This is slightly off topic, so I'm changing the subject so as not to conflate it with the original issue you brought up. But do we know why the JM crashed in the first place? We are also thinking of moving to K8s, but to be honest we had tons of stability issues in our first rodeo. That could just be our lack of