Re: TaskManager not connecting to ResourceManager in HA mode

Zili Chen Thu, 22 Aug 2019 15:59:05 -0700

Nice to hear :-)

Best,
tison.



On Fri, Aug 23, 2019, at 2:22 AM, Aleksandar Mastilovic <amastilo...@sightmachine.com> wrote:

> Thanks for all the help, people - you made me go through my code once
> again and discover that I had swapped the argument positions of the job
> manager and resource manager addresses :-)
>
> The docker ensemble now starts fine; I’m now working on ironing out the
> bugs.
>
> I’ll participate in the survey too!
>
> On Aug 21, 2019, at 7:18 PM, Zili Chen <wander4...@gmail.com> wrote:
>
> Besides, would you like to participate in our survey thread [1] on the
> user list about "How do you use high-availability services in Flink?"
>
> It would help Flink improve its high-availability services.
>
> Best,
> tison.
>
> [1]
> https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E
>
>
> On Thu, Aug 22, 2019, at 10:16 AM, Zili Chen <wander4...@gmail.com> wrote:
>
>> Hi Aleksandar,
>>
>> Based on your log:
>>
>> taskmanager_1   | 2019-08-22 00:05:03,713 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000).
>> taskmanager_1   | 2019-08-22 00:05:04,137 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@jobmanager:6123/user/jobmanager..
>>
>> It looks like the leader retrieval service for the resource manager is
>> returning a jobmanager address. Please check the implementation carefully,
>> or share it on the mailing list so that others can help investigate.
>>
>> Best,
>> tison.
>>
>>
>> On Thu, Aug 22, 2019, at 10:11 AM, Zhu Zhu <reed...@gmail.com> wrote:
>>
>>> Hi Aleksandar,
>>>
>>> The resource manager address is retrieved from the HA services.
>>> Would you check whether your customized HA services are returning the
>>> right LeaderRetrievalService, and whether that LeaderRetrievalService is
>>> really retrieving the right leader's address?
>>> Or is it possible that the resource manager address stored in HA is
>>> being replaced by the jobmanager address somewhere?
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> On Thu, Aug 22, 2019, at 8:16 AM, Aleksandar Mastilovic <amastilo...@sightmachine.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I’m experimenting with my own implementation of HA services that
>>>> would persist JobManager information on a Kubernetes volume instead of
>>>> in ZooKeeper.
>>>>
>>>> I’ve set the high-availability option in flink-conf.yaml to the FQN of
>>>> my factory class, and started the docker ensemble as I usually do (i.e.
>>>> with no special “cluster” arguments or scripts.)
>>>>
>>>> What’s happening now is that TaskManager is unable to connect to
>>>> ResourceManager, because it seems it’s trying to use the /user/jobmanager
>>>> path instead of /user/resourcemanager.
>>>>
>>>> Here’s what I found in the logs:
>>>>
>>>>
>>>> jobmanager_1    | 2019-08-22 00:05:00,963 INFO  akka.remote.Remoting                                        - Remoting started; listening on addresses :[akka.tcp://flink@jobmanager:6123]
>>>> jobmanager_1    | 2019-08-22 00:05:00,975 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@jobmanager:6123
>>>>
>>>> jobmanager_1    | 2019-08-22 00:05:02,380 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
>>>>
>>>> jobmanager_1    | 2019-08-22 00:05:03,138 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
>>>>
>>>> jobmanager_1    | 2019-08-22 00:05:03,211 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
>>>>
>>>> jobmanager_1    | 2019-08-22 00:05:03,292 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@jobmanager:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
>>>>
>>>> taskmanager_1   | 2019-08-22 00:05:03,713 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000).
>>>> taskmanager_1   | 2019-08-22 00:05:04,137 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@jobmanager:6123/user/jobmanager..
>>>>
>>>> Is this a known bug? I’d appreciate any help I can get.
>>>>
>>>> Thanks,
>>>> Aleksandar Mastilovic
>>>>
>>>
>
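
For readers who hit the same symptom: the cause reported at the top of this thread was two address arguments passed in the wrong order, so the resource manager lookup returned the /user/jobmanager address. Below is a minimal sketch of how that mistake can slip in and how a cheap sanity check could catch it early. HaAddresses and HaAddressCheck are hypothetical names made up for illustration; they are not Flink classes and not the poster's actual code. The "/user/resourcemanager" suffix is taken from the log above and may differ in other Flink versions or setups.

```java
/** Hypothetical holder for the addresses a custom HA service hands out (not a Flink class). */
final class HaAddresses {
    final String resourceManagerAddress;
    final String jobManagerAddress;

    // Two adjacent String parameters are easy to swap at the call site,
    // and the compiler cannot tell the difference.
    HaAddresses(String resourceManagerAddress, String jobManagerAddress) {
        this.resourceManagerAddress = resourceManagerAddress;
        this.jobManagerAddress = jobManagerAddress;
    }
}

/** Hypothetical sanity check; the suffix is an assumption based on the log in this thread. */
final class HaAddressCheck {
    static final String RESOURCE_MANAGER_SUFFIX = "/user/resourcemanager";

    /** Returns true if the address ends with the resource manager endpoint path. */
    static boolean looksLikeResourceManager(String address) {
        return address != null && address.endsWith(RESOURCE_MANAGER_SUFFIX);
    }

    /** Fails fast instead of letting the TaskManager retry against the wrong endpoint. */
    static String requireResourceManager(String address) {
        if (!looksLikeResourceManager(address)) {
            throw new IllegalArgumentException(
                "Expected a resource manager address but got: " + address);
        }
        return address;
    }

    public static void main(String[] args) {
        String rm = "akka.tcp://flink@jobmanager:6123/user/resourcemanager";
        String jm = "akka.tcp://flink@jobmanager:6123/user/jobmanager";

        HaAddresses correct = new HaAddresses(rm, jm);
        HaAddresses swapped = new HaAddresses(jm, rm); // the mistake described in this thread

        System.out.println(looksLikeResourceManager(correct.resourceManagerAddress));
        System.out.println(looksLikeResourceManager(swapped.resourceManagerAddress));
    }
}
```

Calling requireResourceManager at the point where the HA service builds the resource manager's retrieval result would turn the endless "retrying in 10000 ms" loop into an immediate, descriptive failure. Wrapping each address in its own small type would be a stricter alternative, at the cost of more boilerplate.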
