Is the Kubernetes server you are using particularly busy? Maybe these issues occur because the server is overloaded?

"Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job 00000000000000000000000000000000."
"Completed checkpoint 2193 for job 00000000000000000000000000000000 (474 bytes in 195 ms)."
"Triggering checkpoint 2194 (type=CHECKPOINT) @ 1630681492667 for job 00000000000000000000000000000000."
"Completed checkpoint 2194 for job 00000000000000000000000000000000 (474 bytes in 161 ms)."
"Renew deadline reached after 60 seconds while renewing lock ConfigMapLock: myNs - myJob-dispatcher-leader (1bcda6b0-8a5a-4969-b9e4-2257c4478572)"
"Stopping SessionDispatcherLeaderProcess."

At some point, the leader election mechanism in fabric8 seems to give up.

On Tue, Sep 7, 2021 at 10:05 AM mejri houssem <mejrihousse...@gmail.com> wrote:

> hello,
>
> Here's other logs of the latest jm crash.
>
>
> On Mon, Sep 6, 2021 at 2:18 PM, houssem <mejrihousse...@gmail.com> wrote:
>
>> hello,
>>
>> I have three jobs running on my kubernetes cluster and each job has his
>> own cluster id.
>>
>> On 2021/09/06 03:28:10, Yangze Guo <karma...@gmail.com> wrote:
>> > Hi,
>> >
>> > The root cause is not "java.lang.NoClassDefFound". The job has been
>> > running but could not edit the config map
>> > "myJob-00000000000000000000000000000000-jobmanager-leader" and it
>> > seems finally disconnected with the API server. Is there another job
>> > with the same cluster id (myJob) ?
>> >
>> > I would also pull Yang Wang.
>> >
>> > Best,
>> > Yangze Guo
>> >
>> > On Mon, Sep 6, 2021 at 10:10 AM Caizhi Weng <tsreape...@gmail.com> wrote:
>> > >
>> > > Hi!
>> > >
>> > > There is a message saying "java.lang.NoClassDefFoundError:
>> > > org/apache/hadoop/hdfs/HdfsConfiguration" in your log file. Are you
>> > > visiting HDFS in your job? If yes it seems that your Flink distribution or
>> > > your cluster is lacking hadoop classes. Please make sure that there are
>> > > hadoop jars in the lib directory of Flink, or your cluster has set the
>> > > HADOOP_CLASSPATH environment variable.
>> > >
>> > > On Sat, Sep 4, 2021 at 12:15 AM mejri houssem <mejrihousse...@gmail.com> wrote:
>> > >>
>> > >>
>> > >> Hello ,
>> > >>
>> > >> I am facing a JM crash lately. I am deploying a flink application
>> > >> cluster on kubernetes.
>> > >>
>> > >> When i install my chart using helm everything works fine but after
>> > >> some time ,the Jm starts to crash
>> > >>
>> > >> and then it gets deleted eventually after 5 restarts.
>> > >>
>> > >> flink version: 1.12.5 (upgraded recently from 1.12.2)
>> > >> HA mode : k8s
>> > >>
>> > >> Here's the full log of the JM attached file.
>> > >>
>