If the JobManager or TaskManager hits a fatal error that it cannot handle
correctly, the process exits directly with a non-zero code. In such a case,
the pod will be restarted.
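
For example, with the default restart policy, the kubelet restarts the
container automatically after such a non-zero exit. A minimal sketch of
the relevant part of the pod spec (the names and image tag here are just
illustrative, not Flink defaults):

    apiVersion: v1
    kind: Pod
    metadata:
      name: flink-jobmanager
    spec:
      # Always (the default) restarts the container whenever it
      # exits, including non-zero exits from fatal Flink errors.
      restartPolicy: Always
      containers:
        - name: jobmanager
          image: flink:1.12    # illustrative tag
          args: ["jobmanager"]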

One possible scenario I could imagine where the liveness and readiness
probes could help is a long GC pause: during the GC, the rpc port cannot
be reached successfully. Network issues could also benefit from the
liveness check.
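
If you do set a liveness probe for this, you may want to loosen its
thresholds so that a normal GC pause does not trigger a restart while a
truly stuck process still does. A rough sketch, assuming the default
JobManager rpc port 6123 (the timing values are guesses you would need
to tune for your workload):

        livenessProbe:
          tcpSocket:
            port: 6123           # JobManager rpc port (default)
          initialDelaySeconds: 30
          periodSeconds: 60
          timeoutSeconds: 10     # tolerate slow responses during GC
          failureThreshold: 3    # restart only after ~3 minutes of failures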


Best,
Yang

narasimha <swamy.haj...@gmail.com> wrote on Fri, Feb 5, 2021 at 10:26 AM:

> I have been asked at the org to set these up as per org-level standards,
> so I am trying to do so.
> These are health checks with k8s, so that k8s can report if there are
> any intermittent issues.
>
> Do the JobManager and TaskManager handle failures diligently?
>
>
>
>
> On Fri, Feb 5, 2021 at 7:53 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Do you mean that setting the liveness check like the following does not
>> take effect?
>>
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6123
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>
>> AFAIK, setting the liveness and readiness probes is not really necessary
>> for a Flink job, since in most cases the JobManager and TaskManager will
>> exit on their own before the rpc port ever becomes inaccessible.
>>
>> Best,
>> Yang
>>
>>
>> narasimha <swamy.haj...@gmail.com> wrote on Fri, Feb 5, 2021 at 2:08 AM:
>>
>>>
>>> Hi, I'm using the Ververica Platform to host Flink jobs.
>>>
>>> I need help setting up readiness and liveness probes on the TaskManager
>>> and JobManager pods.
>>> I tried it locally by adding the probe details to the deployment.yml
>>> file, but it didn't work.
>>>
>>> Can someone help me with setting up the probes? Also, is this possible
>>> in the first place?
>>> --
>>> A.Narasimha Swamy
>>>
>>
>
> --
> A.Narasimha Swamy
>
