even though
Max. number of execution retries: Restart with fixed delay (240000 ms). #20 restart attempts.
On Sat, Jun 29, 2019 at 10:44 AM Vishal Santoshi wrote:
> This is strange; the retry strategy was 20 retries with a 4-minute delay.
> This job tried once (we had a Hadoop NameNode hiccup) but
This is strange; the retry strategy was 20 retries with a 4-minute delay. This
job tried once (we had a Hadoop NameNode hiccup) but I think it could
not even get to the NameNode and gave up (as in, it did not retry the remaining 19 times).

2019-06-29 00:33:13,680 INFO org.apache.flink.runtime.executiongraph.E
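For reference, the fixed-delay strategy described above (20 retries, 4-minute delay) would normally be set in flink-conf.yaml with Flink's fixed-delay restart-strategy keys. A sketch (key names from the Flink configuration docs; the values are the ones from this thread):

```yaml
# Fixed-delay restart strategy: retry up to 20 times, waiting 4 minutes
# between attempts, then fail the job permanently.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 20
restart-strategy.fixed-delay.delay: 4 min
```

The same strategy can also be set programmatically on the execution environment via RestartStrategies.fixedDelayRestart(20, Time.minutes(4)).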
We are investigating that. But is the above theory plausible (Flink
gurus), even though this, as in forcing restartPolicy: Never, pretty much
nullifies HA on the JM if it is a Job cluster (at least on k8s)?
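For context, a hypothetical manifest fragment (not our actual one): in a k8s Job cluster the restartPolicy sits on the JM pod template, e.g.:

```yaml
# Hypothetical Flink job-cluster JM spec (names and image are made up).
# With restartPolicy: Never, k8s will not restart a crashed JM container,
# so recovery depends entirely on Flink's own HA setup.
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-jobmanager        # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never      # the setting discussed above
      containers:
        - name: jobmanager
          image: flink:1.8      # assumption; the thread does not state a version
          args: ["job-cluster"] # assumption; standalone job-cluster entrypoint
```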
As for the reason, we are investigating that.
One thing we are looking at is the QoS (
https://kub
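On the QoS angle: Kubernetes gives a pod the Guaranteed class only when every container's cpu and memory limits equal its requests; otherwise it is Burstable (or BestEffort with no requests/limits at all), and lower classes are evicted first under node memory pressure. A sketch of a Guaranteed-class container spec (names and values hypothetical):

```yaml
# Guaranteed QoS: requests == limits for both cpu and memory.
containers:
  - name: taskmanager    # hypothetical name
    image: flink:1.8     # assumption
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
```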
This is slightly off topic, so I'm changing the subject so as not to conflate it
with the original issue you brought up. But do we know why the JM crashed in the
first place?
We are also thinking of moving to K8s, but to be honest we had tons of
stability issues in our first rodeo. That could just be our lack of