subject:"Job manager crash"

Re: Job manager crash

2021-09-18 Thread Yang Wang

The GC log looks quite normal. Maybe the K8s APIServer is overloaded. Best, Yang On Mon, Sep 13, 2021 at 5:11 PM, houssem wrote: > hello, > > here's some of full GC log: > > OpenJDK 64-Bit Server VM (25.232-b09) for linux-amd64 JRE (1.8.0_232-b09), > built on Oct 18 2019 15:04:46 by "jenkins" with gcc 4.8.2 20

Re: Job manager crash

2021-09-13 Thread houssem

Hello, here's some of the full GC log: OpenJDK 64-Bit Server VM (25.232-b09) for linux-amd64 JRE (1.8.0_232-b09), built on Oct 18 2019 15:04:46 by "jenkins" with gcc 4.8.2 20140120 (Red Hat 4.8.2-15) Memory: 4k page, physical 976560k(946672k free), swap 0k(0k free) CommandLine flags: -XX:Compresse

Re: Job manager crash

2021-09-09 Thread mejri houssem

Thanks for the response. With respect to the API server, I don't think I can do much about it because I am only using a specific namespace in the Kubernetes cluster; I am not the one who administers the cluster. Otherwise, I will try the GC log option to see if I can find something useful in order to debu

Re: Job manager crash

2021-09-09 Thread houssem

Hello, with respect to the API server I don't re On 2021/09/09 11:37:49, Yang Wang wrote: > I think @Robert Metzger is right. You need to check > whether your Kubernetes APIServer is working properly or not (e.g. > overloaded). > > Another hint is about the fullGC. Please use the following con

Re: Job manager crash

2021-09-09 Thread Yang Wang

I think @Robert Metzger is right. You need to check whether your Kubernetes APIServer is working properly or not (e.g. overloaded). Another hint concerns full GC. Please use the following config option to enable the GC logs and check the full GC time. env.java.opts.jobmanager: -verbose:gc -XX:+

Re: Job manager crash

2021-09-09 Thread Robert Metzger

Is the Kubernetes server you are using particularly busy? Maybe these issues occur because the server is overloaded? "Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job ." "Completed checkpoint 2193 for job (474 byt

Re: Job manager crash

2021-09-06 Thread houssem

Hello, I have three jobs running on my Kubernetes cluster, and each job has its own cluster ID. On 2021/09/06 03:28:10, Yangze Guo wrote: > Hi, > > The root cause is not "java.lang.NoClassDefFound". The job has been > running but could not edit the config map > "myJob-0

Re: Job manager crash

2021-09-05 Thread Yangze Guo

Hi, The root cause is not "java.lang.NoClassDefFound". The job has been running but could not edit the config map "myJob--jobmanager-leader", and it seems it was eventually disconnected from the API server. Is there another job with the same cluster ID (myJob)? I would also

Re: Job manager crash

2021-09-05 Thread Caizhi Weng

Hi! There is a message saying "java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration" in your log file. Are you accessing HDFS in your job? If yes, it seems that your Flink distribution or your cluster is lacking Hadoop classes. Please make sure that there are Hadoop jars in the

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-11 Thread Barisa Obradovic

Thank you Till, that's perfect. I increased the max retry attempts a bit, and now it works like a charm (no restarts). -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-11 Thread Till Rohrmann

Hi Barisa, Could you give us the full logs of the run? It looks a bit like you exceeded the maximum retry attempts while you upgraded your ZooKeeper cluster. You can increase it via recovery.zookeeper.client.retry-wait and recovery.zookeeper.client.max-retry-attempts. From Flink's perspective it
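
As a rough illustration of the two settings named above, a flink-conf.yaml fragment might look like the following. The values are placeholders chosen for illustration, not recommendations from the thread. The keys are written as they appear in the message; newer Flink releases document them under the high-availability.zookeeper.client.* prefix, so check the configuration reference for the version in use.

```yaml
# Sketch only: illustrative values, not taken from the thread.
# Pause between consecutive ZooKeeper client retries, in milliseconds.
recovery.zookeeper.client.retry-wait: 5000
# Number of connection retries before the client gives up; raising it
# lets the client ride out a longer outage such as a rolling upgrade.
recovery.zookeeper.client.max-retry-attempts: 10
```

The intent is that the total retry window comfortably exceeds the time a ZooKeeper node is unavailable during the upgrade.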

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Barisa Obradovic

Great, thank you for the help, Matthias.

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Matthias Pohl

r server zdzk.servicexxx/192.168.190.92:2181, > unexpected error, closing socket connection and attempting reconnect > java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192] > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:

Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Barisa Obradovic

sun.nio.ch.IOUtil.rea FYI: I've asked the same question on Stack Overflow: https://stackoverflow.com/questions/66120905/should-flink-job-manager-crash-during-zookeeper-upgrade

Re: Recovery from job manager crash using check points

2019-08-21 Thread Zili Chen

tates with a save point directory? e.g. > ./bin/flink run myJob.jar -s savepointDirectory > > Regards, > Min > > *From:* Zili Chen [mailto:wander4...@gmail.com] > *Sent:* Tuesday, 20 August 2019 04:16 > *To:* Biao Liu > *Cc:* Ta

RE: Recovery from job manager crash using check points

2019-08-21 Thread min.tan

From: Zili Chen [mailto:wander4...@gmail.com] Sent: Tuesday, 20 August 2019 04:16 To: Biao Liu Cc: Tan, Min; user Subject: [External] Re: Recovery from job manager crash using check points Hi Min, I guess you use standalone high-availability and when TM fails, JM can recover the job from an

Re: Recovery from job manager crash using check points

2019-08-19 Thread Zili Chen

Hi Min, I guess you use standalone high-availability, and when the TM fails, the JM can recover the job from an in-memory checkpoint store. However, when the JM fails, since you don't persist state on an HA backend such as ZooKeeper, even if the JM is relaunched by the YARN RM or superseded by a standby, the new one knows not

Re: Recovery from job manager crash using check points

2019-08-19 Thread Biao Liu

Hi Min, > Do I need to set up zookeepers to keep the states when a job manager crashes? I guess you need to set up HA [1] properly. Besides that, I suggest you also check the state backend. 1. https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availabilit
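
A minimal sketch of what such an HA setup could look like in flink-conf.yaml, assuming ZooKeeper-based high availability and a filesystem state backend. The host names, paths, and cluster ID are hypothetical placeholders, not values from the thread; the linked HA documentation is the authority on the exact keys for a given Flink version.

```yaml
# Sketch only: host names, paths, and IDs below are hypothetical.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
# Durable storage for JobManager metadata; ZooKeeper keeps only pointers to it.
high-availability.storageDir: hdfs:///flink/ha/
high-availability.cluster-id: /my-flink-cluster

# Checkpoints must also live on storage that a restarted JobManager can reach.
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints
```

With settings along these lines, a restarted or standby JobManager can look up the latest completed checkpoint instead of starting without state.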

Re: Recovery from job manager crash using check points

2019-08-19 Thread miki haiat

Which kind of deployment system are you using: standalone, YARN, or other? On Mon, Aug 19, 2019, 18:28 wrote: > Hi, > > I can use check points to recover Flink states when a task manager crashes. > > I can not use check points to recover Flink states when a job manager > crashes. > > D

Recovery from job manager crash using check points

2019-08-19 Thread min.tan

Hi, I can use checkpoints to recover Flink state when a task manager crashes. I cannot use checkpoints to recover Flink state when a job manager crashes. Do I need to set up ZooKeeper to keep the state when a job manager crashes? Regards, Min

20 matches
