Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-22 Thread tison
> > > ~ Abhinav Bajaj > > > > *From: *Yang Wang > *Date: *Wednesday, March 18, 2020 at 12:14 AM > *To: *tison > *Cc: *Xintong Song , "Bajaj, Abhinav" < > abhinav.ba...@here.com>, "user@flink.apache.org" > *Subject: *Re: JobMaster does not r

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-18 Thread Yang Wang
ce of logs in this case look quite similar to the one we have >>>> been discussing. >>>> >>>> >>>> >>>> If the code hasn’t changed in this area till 1.10 then maybe the latest >>>> version also has the potential issue. >>>>

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-17 Thread tison
>> >>> >>> If the code hasn’t changed in this area till 1.10 then maybe the latest >>> version also has the potential issue. >>> >>> >>> >>> It’s not straightforward to bump up the Flink version in the >>> infrastructure av

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-17 Thread tison
deployed with a >> parallelism of 20 but it has 22 TMs, keeping 2 of them as spares to assist in quick >> failover. >> >> I did check the logs and all 22 task executors from those TMs were >> registered by 2020-02-27 06:35:47.050. >> >> >> >>

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-17 Thread Xintong Song
r.NoResourceAvailableException: > Could not allocate all requires slots within timeout of 300000 ms. Slots > required: 201, slots allocated: 0” at 2020-02-27 06:40:36.778. > > > > Thanks a ton for your help. > > > > ~ Abhinav Bajaj > > > > *From: *Xinto
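
The snippet above quotes a slot allocation timeout of 300000 ms that was reported at 2020-02-27 06:40:36.778. As a rough aid for lining up the timeline, the sketch below subtracts the timeout from the failure time to estimate when the slot request started. It assumes the exception was logged at the moment the timeout expired, which the thread does not confirm, and the function name `request_start` is hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp format used for the failure time quoted in the thread.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def request_start(failure_time: str, timeout_ms: int) -> datetime:
    """Estimate when the slot request began.

    Assumption: the failure was logged exactly when the timeout expired.
    """
    failed_at = datetime.strptime(failure_time, FMT)
    return failed_at - timedelta(milliseconds=timeout_ms)


if __name__ == "__main__":
    # Values taken from the quoted exception message.
    print(request_start("2020-02-27 06:40:36.778", 300000))
```

The estimate can then be compared with other timestamps quoted in the thread, such as the time by which the task executors were registered.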

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-16 Thread Xintong Song
>slot request before the timeout.* > > >- *AB*: To help check that, maybe you can use this log time > > >- 2020-02-27 06:29:53,732 [myid:1] - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - FOLLO

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-05 Thread Xintong Song
above highlighted logs on jobmanager side. > > · 2020-02-27 06:29:53,732 [myid:1] - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - FOLLOWING - LEADER > ELECTION TOOK - 25069 > > · 2020-02-27 06:29:53,766 [myid:1] - INFO > [QuorumPeer[myid=1]/0.0.0.0:218
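
To compare ZooKeeper events against JobManager log times, it can help to pull the timestamp out of lines like the one quoted above. The following is a minimal sketch, assuming the log layout shown in the snippet (with the email quote markers removed); the function name `parse_zk_line` and the returned field names are hypothetical, and the election duration is assumed to be in milliseconds.

```python
import re
from datetime import datetime

# Layout assumed from the quoted line:
# "<date> <time>,<millis> [myid:N] - LEVEL [source] - message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[myid:(?P<myid>\d+)\] - (?P<level>\w+)\s+"
    r"\[(?P<source>.+?)\] - (?P<message>.*)$"
)
ELECTION_RE = re.compile(r"LEADER ELECTION TOOK - (\d+)")


def parse_zk_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    election = ELECTION_RE.search(m.group("message"))
    return {
        "timestamp": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"),
        "myid": int(m.group("myid")),
        "level": m.group("level"),
        "source": m.group("source"),
        "message": m.group("message"),
        # Duration reported by the follower (assumed milliseconds), if present.
        "election_took": int(election.group(1)) if election else None,
    }


if __name__ == "__main__":
    sample = (
        "2020-02-27 06:29:53,732 [myid:1] - INFO "
        "[QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - "
        "FOLLOWING - LEADER ELECTION TOOK - 25069"
    )
    print(parse_zk_line(sample))
```

Sorting the parsed timestamps together with JobManager log times gives a single timeline for checking whether the ZooKeeper node rejoined the quorum before or after a given JobManager event.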

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Xintong Song
back up and joined the quorum as follower > before the above highlighted logs on jobmanager side. > >- 2020-02-27 06:29:53,732 [myid:1] - INFO >[QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - FOLLOWING - LEADER >ELECTION TOOK - 25069 >- 2020-02-27 06:29:53,766 [myid

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Bajaj, Abhinav
logs from JobMaster complaining about not being able to connect to ZooKeeper after that. ~ Abhinav Bajaj From: "Bajaj, Abhinav" Date: Wednesday, March 4, 2020 at 12:01 PM To: Xintong Song Cc: "user@flink.apache.org" Subject: Re: JobMaster does not register with ResourceMan

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Bajaj, Abhinav
Thanks Xintong for pointing that out. I will dig deeper and get back with my findings. ~ Abhinav Bajaj From: Xintong Song Date: Tuesday, March 3, 2020 at 7:36 PM To: "Bajaj, Abhinav" Cc: "user@flink.apache.org" Subject: Re: JobMaster does not register with Res

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-03 Thread Xintong Song
Hi Abhinav, The JobMaster log "Connecting to ResourceManager ..." is printed after the JobMaster retrieves the ResourceManager address from ZooKeeper. In your case, I assume there's some ZK problem that prevents the JM from resolving the RM address. Have you confirmed whether the ZK pods are recovered after the second di

JobMaster does not register with ResourceManager in high availability setup

2020-03-03 Thread Bajaj, Abhinav
Hi, We recently came across an issue where JobMaster does not register with ResourceManager in a Flink high availability setup. Let me share the details below. Setup * Flink 1.7.1 * K8s * High availability mode with a single JobManager and 3 ZooKeeper nodes in quorum. Scenario *