>
>
> ~ Abhinav Bajaj
>
>
>
> *From: *Yang Wang
> *Date: *Wednesday, March 18, 2020 at 12:14 AM
> *To: *tison
> *Cc: *Xintong Song , "Bajaj, Abhinav" <
> abhinav.ba...@here.com>, "user@flink.apache.org"
> *Subject: *Re: JobMaster does not r
ce of logs in this case look quite similar to one we have
>>>> been discussing.
>>>>
>>>>
>>>>
>>>> If the code hasn’t changed in this area till 1.10 then maybe the latest
>>>> version also has the potential issue.
>>>>
>>>
>>>
>>> If the code hasn’t changed in this area till 1.10 then maybe the latest
>>> version also has the potential issue.
>>>
>>>
>>>
>>> It's not straightforward to bump up the Flink version in the
>>> infrastructure av
deployed with 20
>> parallelism but it has 22 TMs to have 2 of them as spare to assist in quick
>> failover.
>>
>> I checked the logs, and the task executors from all 22 of those TMs were
>> registered by 2020-02-27 06:35:47.050.
>>
>>
>>
>>
r.NoResourceAvailableException:
> Could not allocate all requires slots within timeout of 300000 ms. Slots
> required: 201, slots allocated: 0” at 2020-02-27 06:40:36.778.
>
>
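The timestamps quoted in this thread can be lined up against the 300000 ms timeout with a little arithmetic. The sketch below is only an illustration, not part of the original messages: it assumes the exception is raised exactly one timeout after the slot request was issued, that both timestamps come from the same clock, and that 06:35:47.050 is the moment the last task executor registered. All variable names are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values quoted in the thread.
SLOT_REQUEST_TIMEOUT = timedelta(milliseconds=300000)
tm_registered = datetime.strptime("2020-02-27 06:35:47.050", FMT)
timeout_raised = datetime.strptime("2020-02-27 06:40:36.778", FMT)

# Assumption: the exception fires exactly one timeout after the request was issued.
request_issued = timeout_raised - SLOT_REQUEST_TIMEOUT

one_ms = timedelta(milliseconds=1)
# How long after the (inferred) slot request the task executors were registered.
registered_after_request_ms = (tm_registered - request_issued) // one_ms
# How much of the timeout window was still left at that point.
remaining_ms = (timeout_raised - tm_registered) // one_ms

print("slot request issued (inferred):", request_issued.strftime(FMT)[:-3])
print("TMs registered after request (ms):", registered_after_request_ms)
print("timeout window remaining (ms):", remaining_ms)
```

Under those assumptions, the task executors were registered roughly 10 seconds into the timeout window, which left about 290 seconds during which no slot was allocated.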
>
> Thanks a ton for your help.
>
>
>
> ~ Abhinav Bajaj
>
>
>
> *From: *Xinto
>slot request before the timeout.*
>
>
>- *AB*: To help check that, maybe you can use this log time
>
>
>- 2020-02-27 06:29:53,732 [myid:1] - INFO
> [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - FOLLO
above highlighted logs on jobmanager side.
>
> · 2020-02-27 06:29:53,732 [myid:1] - INFO
> [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - FOLLOWING - LEADER
> ELECTION TOOK - 25069
>
> · 2020-02-27 06:29:53,766 [myid:1] - INFO
> [QuorumPeer[myid=1]/0.0.0.0:218
back up and joined the quorum as follower
> before the above highlighted logs on jobmanager side.
>
>- 2020-02-27 06:29:53,732 [myid:1] - INFO
>[QuorumPeer[myid=1]/0.0.0.0:2181:Follower@64] - FOLLOWING - LEADER
>ELECTION TOOK - 25069
>- 2020-02-27 06:29:53,766 [myid
logs from JobMaster complaining about not being able to connect
to ZooKeeper after that.
~ Abhinav Bajaj
From: "Bajaj, Abhinav"
Date: Wednesday, March 4, 2020 at 12:01 PM
To: Xintong Song
Cc: "user@flink.apache.org"
Subject: Re: JobMaster does not register with ResourceMan
Thanks Xintong for pointing that out.
I will dig deeper and get back with my findings.
~ Abhinav Bajaj
From: Xintong Song
Date: Tuesday, March 3, 2020 at 7:36 PM
To: "Bajaj, Abhinav"
Cc: "user@flink.apache.org"
Subject: Re: JobMaster does not register with Res
Hi Abhinav,
The JobMaster log "Connecting to ResourceManager ..." is printed after the
JobMaster retrieves the ResourceManager address from ZooKeeper. In your case, I
assume there's some ZK problem that prevents the JM from resolving the RM address.
Have you confirmed whether the ZK pods are recovered after the second
di
Hi,
We recently came across an issue where the JobMaster does not register with
the ResourceManager in a Flink high availability setup.
Let me share the details below.
Setup
* Flink 1.7.1
* K8s
* High availability mode with a single JobManager and 3 ZooKeeper nodes in
quorum.
Scenario
*