The GC log looks quite normal. Maybe the K8s APIServer is overloaded.
Best,
Yang
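A quick, read-only way to sanity-check the API server from inside your namespace is the standard Kubernetes readiness endpoint; this is only an illustrative check (not something from this thread) and may require permission to read non-resource URLs:

  kubectl get --raw='/readyz?verbose'

If that call is slow or reports failed checks, the API server itself is likely under pressure.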
houssem wrote on Mon, Sep 13, 2021 at 5:11 PM:
hello,
here's some of the full GC log:
OpenJDK 64-Bit Server VM (25.232-b09) for linux-amd64 JRE (1.8.0_232-b09), built on Oct 18 2019 15:04:46 by "jenkins" with gcc 4.8.2 20140120 (Red Hat 4.8.2-15)
Memory: 4k page, physical 976560k(946672k free), swap 0k(0k free)
CommandLine flags: -XX:Compresse
Thanks for the response.
With respect to the api-server, I don't think I can do much about it because I am just using a specific namespace in the Kubernetes cluster; it's not me who administers the cluster.
Otherwise, I will try the GC log option to see if I can find something useful in order to debug
On 2021/09/09 11:37:49, Yang Wang wrote:
I think @Robert Metzger is right. You need to check whether your Kubernetes APIServer is working properly or not (e.g. overloaded).
Another hint is about the full GC. Please use the following config option to enable the GC logs and check the full GC time:
env.java.opts.jobmanager: -verbose:gc -XX:+
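The suggested option is cut off above. As a sketch only, a typical set of JDK 8 GC-logging flags for the JobManager could look like this in flink-conf.yaml; the log path and the exact flag selection are assumptions, not the flags from the original mail:

  env.java.opts.jobmanager: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/flink/log/jobmanager-gc.log

Long full-GC pauses in that log would point to memory pressure on the JobManager rather than an API server problem.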
Is the Kubernetes server you are using particularly busy? Maybe these issues occur because the server is overloaded?
"Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job ."
"Completed checkpoint 2193 for job (474 bytes…"
hello,
I have three jobs running on my Kubernetes cluster and each job has its own cluster id.
On 2021/09/06 03:28:10, Yangze Guo wrote:
Hi,
The root cause is not the "java.lang.NoClassDefFoundError". The job has been running but could not edit the ConfigMap "myJob--jobmanager-leader", and it seems it finally disconnected from the API server. Is there another job with the same cluster id (myJob)?
I would also
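The message is cut off here. As a sketch, one way to check whether two jobs are fighting over the same leader ConfigMap is to inspect it directly; the namespace placeholder below is an assumption, and the ConfigMap name is the one from the log above:

  kubectl -n <your-namespace> get configmaps | grep jobmanager-leader
  kubectl -n <your-namespace> get configmap myJob--jobmanager-leader -o yaml

If more than one running cluster uses the cluster id "myJob", their HA ConfigMaps will collide.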
Hi!
There is a message saying "java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration" in your log file. Are you accessing HDFS in your job? If yes, it seems that your Flink distribution or your cluster is lacking Hadoop classes. Please make sure that there are Hadoop jars in the
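The mail is cut off here, but the usual ways to make Hadoop classes visible to Flink are worth sketching; the paths and versions below are assumptions that depend on your setup:

  export HADOOP_CLASSPATH=$(hadoop classpath)   # on a host with a Hadoop client installed
  # or, for older Flink distributions, place a matching flink-shaded-hadoop-2-uber-*.jar into Flink's lib/ directory

Either way, the HDFS classes must be on the classpath of both the JobManager and the TaskManagers.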
Thank you Till, that's perfect.
I increased the max retry attempts a bit, and now it works like a charm (no restarts).
Hi Barisa,
Could you give us the full logs of the run? It looks a bit like you exceeded the maximum retry attempts while you upgraded your ZooKeeper cluster. You can increase them via recovery.zookeeper.client.retry-wait and recovery.zookeeper.client.max-retry-attempts.
From Flink's perspective it
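For reference, a sketch of how those two settings could look in flink-conf.yaml; the values are purely illustrative, not recommendations from this thread:

  recovery.zookeeper.client.retry-wait: 5000        # pause between retries, in ms
  recovery.zookeeper.client.max-retry-attempts: 10  # retries before giving up

Raising the product of the two gives the JobManager a longer window to ride out a rolling ZooKeeper upgrade before it gives up.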
Great, thank you for the help, Matthias
…r server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:
    at sun.nio.ch.IOUtil.rea
FYI: I've asked the same question on Stack Overflow:
https://stackoverflow.com/questions/66120905/should-flink-job-manager-crash-during-zookeeper-upgrade
> …states with a save point directory? e.g.
>
> ./bin/flink run myJob.jar -s savepointDirectory
>
> Regards,
> Min
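On the quoted command above: with the Flink CLI, options such as -s are only picked up when they appear before the job jar (anything after the jar is passed to the user program as arguments). A sketch with a placeholder savepoint path:

  ./bin/flink run -s hdfs:///savepoints/savepoint-123abc myJob.jar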
From: Zili Chen [mailto:wander4...@gmail.com]
Sent: Tuesday, 20 August 2019 04:16
To: Biao Liu
Cc: Tan, Min; user
Subject: [External] Re: Recovery from job manager crash using check points
Hi Min,
I guess you use standalone high-availability, and when a TM fails, the JM can recover the job from an in-memory checkpoint store.
However, when the JM fails, since you don't persist state on an HA backend such as ZooKeeper, even if the JM is relaunched by the YARN RM or superseded by a standby, the new one knows nothing about the previously completed checkpoints.
Hi Min,
> Do I need to set up zookeepers to keep the states when a job manager
crashes?
I guess you need to set up HA [1] properly. Besides that, I would also suggest checking the state backend.
1.
https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availabilit
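To make that concrete, a minimal sketch of ZooKeeper HA plus a persisted state backend in flink-conf.yaml; the quorum address, paths, and backend choice are placeholders, not values from this thread:

  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
  high-availability.storageDir: hdfs:///flink/ha
  state.backend: filesystem
  state.checkpoints.dir: hdfs:///flink/checkpoints

With this in place, a restarted JobManager can find the latest completed checkpoint via ZooKeeper and resume from it.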
Which kind of deployment system are you using: standalone, YARN, or something else?
On Mon, Aug 19, 2019, 18:28 wrote:
Hi,
I can use checkpoints to recover Flink state when a task manager crashes.
I cannot use checkpoints to recover Flink state when a job manager crashes.
Do I need to set up ZooKeeper to keep the state when a job manager crashes?
Regards
Min