Nice to hear :-) Best, tison.
Aleksandar Mastilovic <amastilo...@sightmachine.com> 于2019年8月23日周五 上午2:22写道: > Thanks for all the help, people - you made me go through my code once > again and discover that I switched argument positions for job manager and > resource manager addresses :-) > > The docker ensemble now starts fine, I’m working on ironing out the bugs > now. > > I’ll participate in the survey too! > > On Aug 21, 2019, at 7:18 PM, Zili Chen <wander4...@gmail.com> wrote: > > Besides, would you like to participant our survey thread[1] on > user list about "How do you use high-availability services in Flink?" > > It would help Flink improve its high-availability serving. > > Best, > tison. > > [1] > https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E > > > Zili Chen <wander4...@gmail.com> 于2019年8月22日周四 上午10:16写道: > >> Hi Aleksandar, >> >> base on your log: >> >> taskmanager_1 | 2019-08-22 00:05:03,713 INFO >> org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting >> to ResourceManager >> akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000) >> . >> taskmanager_1 | 2019-08-22 00:05:04,137 INFO >> org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not >> resolve ResourceManager address >> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms: >> Could not connect to rpc endpoint under address >> akka.tcp://flink@jobmanager:6123/user/jobmanager.. >> >> it looks like you return a jobmanager address on retrieval service of >> resource manager. Please check the implementation carefully or share it on >> mailing list that others can help for investigation. >> >> Best, >> tison. >> >> >> Zhu Zhu <reed...@gmail.com> 于2019年8月22日周四 上午10:11写道: >> >>> Hi Aleksandar, >>> >>> The resource manager address is retrieved from the HA services. >>> Would you check whether your customized HA services is returning the >>> right LeaderRetrievalService and whether the LeaderRetrievalService is >>> really retrieving the right leader's address? >>> Or is it possible that the stored resource manager address in HA is >>> replaced by jobmanager address in any case? >>> >>> Thanks, >>> Zhu Zhu >>> >>> Aleksandar Mastilovic <amastilo...@sightmachine.com> 于2019年8月22日周四 >>> 上午8:16写道: >>> >>>> Hi all, >>>> >>>> I’m experimenting with using my own implementation of HA services >>>> instead of ZooKeeper that would persist JobManager information on a >>>> Kubernetes volume instead of in ZooKeeper. >>>> >>>> I’ve set the high-availability option in flink-conf.yaml to the FQN of >>>> my factory class, and started the docker ensemble as I usually do (i.e. >>>> with no special “cluster” arguments or scripts.) >>>> >>>> What’s happening now is that TaskManager is unable to connect to >>>> ResourceManager, because it seems it’s trying to use the /user/jobmanager >>>> path instead of /user/resourcemanager. >>>> >>>> Here’s what I found in the logs: >>>> >>>> >>>> jobmanager_1 | 2019-08-22 00:05:00,963 INFO akka.remote.Remoting >>>> - Remoting started; listening on >>>> addresses :[akka.tcp://flink@jobmanager:6123] >>>> jobmanager_1 | 2019-08-22 00:05:00,975 INFO >>>> org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor >>>> system started at akka.tcp://flink@jobmanager:6123 >>>> >>>> jobmanager_1 | 2019-08-22 00:05:02,380 INFO >>>> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting >>>> RPC endpoint for >>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at >>>> akka://flink/user/resourcemanager . >>>> >>>> jobmanager_1 | 2019-08-22 00:05:03,138 INFO >>>> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting >>>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher >>>> at akka://flink/user/dispatcher . >>>> >>>> jobmanager_1 | 2019-08-22 00:05:03,211 INFO >>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - >>>> ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager >>>> was granted leadership with fencing token 00000000000000000000000000000000 >>>> >>>> jobmanager_1 | 2019-08-22 00:05:03,292 INFO >>>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher >>>> akka.tcp://flink@jobmanager:6123/user/dispatcher was granted >>>> leadership with fencing token 00000000-0000-0000-0000-000000000000 >>>> >>>> taskmanager_1 | 2019-08-22 00:05:03,713 INFO >>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting >>>> to ResourceManager >>>> akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000) >>>> . >>>> taskmanager_1 | 2019-08-22 00:05:04,137 INFO >>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not >>>> resolve ResourceManager address >>>> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 >>>> ms: Could not connect to rpc endpoint under address >>>> akka.tcp://flink@jobmanager:6123/user/jobmanager.. >>>> >>>> Is this a known bug? I’d appreciate any help I can get. >>>> >>>> Thanks, >>>> Aleksandar Mastilovic >>>> >>> >