Re: Job manager crash

Robert Metzger Thu, 09 Sep 2021 03:52:06 -0700

Is the Kubernetes server you are using particularly busy? These issues may
be occurring because the server is overloaded.


"Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job
00000000000000000000000000000000."
"Completed checkpoint 2193 for job 00000000000000000000000000000000 (474
bytes in 195 ms)."
"Triggering checkpoint 2194 (type=CHECKPOINT) @ 1630681492667 for job
00000000000000000000000000000000."
"Completed checkpoint 2194 for job 00000000000000000000000000000000 (474
bytes in 161 ms)."
"Renew deadline reached after 60 seconds while renewing lock ConfigMapLock:
myNs - myJob-dispatcher-leader (1bcda6b0-8a5a-4969-b9e4-2257c4478572)"
"Stopping SessionDispatcherLeaderProcess."

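At some point, the leader election mechanism in fabric8 seems to give up: it
cannot renew the ConfigMap lock within the 60-second renew deadline, and the
dispatcher leader process is then stopped.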
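
To make that reading of the log easier to check, here is a minimal Python
sketch that parses the lines quoted above. It only assumes the message formats
visible in this excerpt; the names LOG_LINES and summarize are made up for
illustration and are not part of Flink or fabric8.

```python
import re

# Log lines copied from the excerpt above, with the mail client's
# line wrapping undone. Nothing here is taken from the full log file.
LOG_LINES = [
    "Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job 00000000000000000000000000000000.",
    "Completed checkpoint 2193 for job 00000000000000000000000000000000 (474 bytes in 195 ms).",
    "Triggering checkpoint 2194 (type=CHECKPOINT) @ 1630681492667 for job 00000000000000000000000000000000.",
    "Completed checkpoint 2194 for job 00000000000000000000000000000000 (474 bytes in 161 ms).",
    "Renew deadline reached after 60 seconds while renewing lock ConfigMapLock: myNs - myJob-dispatcher-leader (1bcda6b0-8a5a-4969-b9e4-2257c4478572)",
    "Stopping SessionDispatcherLeaderProcess.",
]

# Patterns are an assumption based only on the lines shown in this mail.
TRIGGER = re.compile(r"Triggering checkpoint (\d+) \(type=\w+\) @ (\d+)")
COMPLETED = re.compile(r"Completed checkpoint (\d+) .*\((\d+) bytes in (\d+) ms\)")
RENEW = re.compile(r"Renew deadline reached after (\d+) seconds while renewing lock \w+: (.+?) \(")


def summarize(lines):
    """Collect checkpoint timing and lock-renewal failures from log lines."""
    trigger_ms = []   # epoch milliseconds of each checkpoint trigger, in log order
    durations = {}    # checkpoint id -> completion time in ms
    lost_locks = []   # (renew deadline in seconds, lock name)
    for line in lines:
        m = TRIGGER.search(line)
        if m:
            trigger_ms.append(int(m.group(2)))
            continue
        m = COMPLETED.search(line)
        if m:
            durations[int(m.group(1))] = int(m.group(3))
            continue
        m = RENEW.search(line)
        if m:
            lost_locks.append((int(m.group(1)), m.group(2)))
    # Gaps between consecutive checkpoint triggers, in milliseconds.
    intervals = [b - a for a, b in zip(trigger_ms, trigger_ms[1:])]
    return {"intervals_ms": intervals, "durations_ms": durations, "lost_locks": lost_locks}


if __name__ == "__main__":
    print(summarize(LOG_LINES))
```

On these lines, the checkpoints are triggered 10 seconds apart and each
completes in under 200 ms, so the job itself looks healthy right up to the
point where the lock renewal fails.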


On Tue, Sep 7, 2021 at 10:05 AM mejri houssem <mejrihousse...@gmail.com>
wrote:

> hello,
>
> Here are more logs from the latest JM crash.
>
>
> On Mon, Sep 6, 2021 at 2:18 PM houssem <mejrihousse...@gmail.com> wrote:
>
>> hello,
>>
>> I have three jobs running on my Kubernetes cluster, and each job has its
>> own cluster id.
>>
>> On 2021/09/06 03:28:10, Yangze Guo <karma...@gmail.com> wrote:
>> > Hi,
>> >
>> > The root cause is not the "java.lang.NoClassDefFoundError". The job was
>> > running but could not update the config map
>> > "myJob-00000000000000000000000000000000-jobmanager-leader", and it
>> > seems it was eventually disconnected from the API server. Is there
>> > another job with the same cluster id (myJob)?
>> >
>> > I am also pulling in Yang Wang.
>> >
>> > Best,
>> > Yangze Guo
>> >
>> > On Mon, Sep 6, 2021 at 10:10 AM Caizhi Weng <tsreape...@gmail.com> wrote:
>> > >
>> > > Hi!
>> > >
>> > > There is a message saying "java.lang.NoClassDefFoundError:
>> > > org/apache/hadoop/hdfs/HdfsConfiguration" in your log file. Are you
>> > > accessing HDFS in your job? If so, it seems that your Flink
>> > > distribution or your cluster is missing the Hadoop classes. Please
>> > > make sure that the Hadoop jars are in Flink's lib directory, or that
>> > > your cluster sets the HADOOP_CLASSPATH environment variable.
>> > >
>> > > On Sat, Sep 4, 2021 at 12:15 AM mejri houssem <mejrihousse...@gmail.com> wrote:
>> > >>
>> > >>
>> > >> Hello,
>> > >>
>> > >> I have been facing JM crashes lately. I am deploying a Flink
>> > >> application cluster on Kubernetes.
>> > >>
>> > >> When I install my chart using Helm, everything works fine, but after
>> > >> some time the JM starts to crash, and it is eventually deleted after
>> > >> 5 restarts.
>> > >>
>> > >> flink version: 1.12.5 (upgraded recently from 1.12.2)
>> > >> HA mode : k8s
>> > >>
>> > >> The full JM log is in the attached file.
>> >
>>
>
