Hi Ani,

the problem is that jobmanager.rpc.address in flink-conf.yaml has to be set to a hostname that the other Mesos agents can reach (jobmanager.rpc.address: [reachable hostname]). I assume that you are using the default value, which is localhost. You can see this in the fetcher info, where the URLs for all the files point to localhost:38985, so a TaskManager launched on a different host tries to fetch them from itself and fails with "Couldn't connect to server". Setting this value to the external hostname of the machine on which the JobManager is running should solve the problem.
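As a minimal sketch, the entry in flink-conf.yaml could look like the following. The hostname is a made-up placeholder and not taken from your setup; use whatever name or IP of the appmaster host the other agents can resolve and connect to.

```yaml
# flink-conf.yaml
# Hypothetical hostname (assumption for illustration only): replace it with the
# externally reachable name or IP of the host running the JobManager / mesos-appmaster.
jobmanager.rpc.address: flink-appmaster.example.com
```

After restarting the appmaster with this setting, the URIs in the fetcher info should point to that hostname instead of localhost.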
Cheers,
Till

On Thu, Jun 8, 2017 at 11:21 PM, ani.desh1512 <ani.desh1...@gmail.com> wrote:

> I am trying to configure Flink to work on top of Mesos. I am using Flink release-1.3. I am using DCOS 1.9's underlying mesos which is version 1.2. I am able to start Flink without any issues when the taskmanager starts on the same host as that of appmaster. But when the taskmanager is launched on a different host, the container fails to launch. The flink mesos-appmaster log is something as follows:
>
> /2017-06-08 19:19:01,537 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00003 on host 10.101.2.117.
> 2017-06-08 19:19:01,550 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00002 on host 10.101.2.117.
> 2017-06-08 19:19:01,607 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00001 on host 10.101.2.117.
> 2017-06-08 19:19:01,623 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00004 on host 10.101.2.117.
> 2017-06-08 19:19:01,645 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00006 on host 10.101.2.91.
> 2017-06-08 19:19:01,660 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00005 on host 10.101.2.91.
> 2017-06-08 19:19:01,674 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00007 on host 10.101.2.91.
> 2017-06-08 19:19:02,234 WARN org.apache.flink.mesos.scheduler.TaskMonitor - Mesos task taskmanager-00003 failed unexpectedly.
> 2017-06-08 19:19:02,234 WARN org.apache.flink.mesos.scheduler.TaskMonitor - Mesos task taskmanager-00002 failed unexpectedly.
> 2017-06-08 19:19:02,245 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Mesos task taskmanager-00002 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container '125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256)
> 2017-06-08 19:19:02,246 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Diagnostics for task taskmanager-00002 in state TASK_FAILED : reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container: Failed to fetch all URIs for container '125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256
> 2017-06-08 19:19:02,247 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Total number of failed tasks so far: 1
> 2017-06-08 19:19:02,252 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Mesos task taskmanager-00003 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container '69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256)
> 2017-06-08 19:19:02,252 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Diagnostics for task taskmanager-00003 in state TASK_FAILED : reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container: Failed to fetch all URIs for container '69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256
> 2017-06-08 19:19:02,252 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Total number of failed tasks so far: 2
> 2017-06-08 19:19:02,313 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Scheduling Mesos task taskmanager-00008 with (2048.0 MB, 1.0 cpus).
> 2017-06-08 19:19:02,330 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Scheduling Mesos task taskmanager-00009 with (2048.0 MB, 1.0 cpus).
> 2017-06-08 19:19:02,331 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Now gathering offers for at least 2 task(s).
> 2017-06-08 19:19:02,332 WARN org.apache.flink.mesos.scheduler.TaskMonitor - Mesos task taskmanager-00004 failed unexpectedly.
> 2017-06-08 19:19:02,332 WARN org.apache.flink.mesos.scheduler.TaskMonitor - Mesos task taskmanager-00001 failed unexpectedly.
> 2017-06-08 19:19:02,412 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Mesos task taskmanager-00004 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container 'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256)
> 2017-06-08 19:19:02,412 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Diagnostics for task taskmanager-00004 in state TASK_FAILED : reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container: Failed to fetch all URIs for container 'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256
> 2017-06-08 19:19:02,412 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Total number of failed tasks so far: 3
> 2017-06-08 19:19:02,432 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Mesos task taskmanager-00001 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container '325e14fe-8840-4996-96dc-5c7ffc159d12' with exit status: 256)/
>
> I checked the stderr in Mesos sandbox and it is as follows:
>
> /I0608 19:20:06.184386 30480 fetcher.cpp:531] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6\/flink","items":[{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/mesos-taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/mesos-taskmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/yarn-session.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/yarn-session.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-console.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-console.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/log4j-1.2.17.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/log4j-1.2.17.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/mesos-appmaster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/mesos-appmaster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-zookeeper-quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-zookeeper-quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-local.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/taskmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-local.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-cluster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-cluster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-cluster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-cluster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-scala-shell.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-scala-shell.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/pyflink.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/pyflink.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-yarn-session.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-yarn-session.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback-yarn.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-yarn.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink-daemon.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-daemon.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/zookeeper.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/zookeeper.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback-console.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-console.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/masters","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/masters"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/conf\/flink-conf.yaml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/flink-conf.yaml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/zoo.cfg","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/zoo.cfg"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/slaves","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/slaves"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-dist_2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-dist_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/slf4j-log4j12-1.7.7.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/slf4j-log4j12-1.7.7.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-cli.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-cli.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/historyserver.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/historyserver.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/pyflink.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/pyflink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-local.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-zookeeper-quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-zookeeper-quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/jobmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/jobmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink-console.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-console.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/config.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/config.sh"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6\/frameworks\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-0030\/executors\/taskmanager-00009\/runs\/d8d1756d-f977-43f6-a53f-55c19b6c6294","user":"flink"}
> I0608 19:20:06.189909 30480 fetcher.cpp:442] Fetching URI 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
> I0608 19:20:06.189932 30480 fetcher.cpp:283] Fetching directly into the sandbox directory
> I0608 19:20:06.190213 30480 fetcher.cpp:220] Fetching URI 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
> I0608 19:20:06.190251 30480 fetcher.cpp:163] Downloading resource from 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh' to '/var/lib/mesos/slave/slaves/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6/frameworks/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-0030/executors/taskmanager-00009/runs/d8d1756d-f977-43f6-a53f-55c19b6c6294/flink/bin/mesos-taskmanager.sh'
> Failed to fetch 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh': Error downloading resource: Couldn't connect to server
> Failed to synchronize with agent (it's probably exited)/
>
> So, my question is what am I missing?
> Will I need to mention some special URI in marathon for flink? I am setting mesos.master as /zk://leader.mesos:2181/mesos/.
> Is this the one that is creating problem?
> Or, have I missed some mesos or marathon setting?
> Also, I am launching this via Marathon and I have the same flink dist at same path in all the slaves
>
> Thanks,
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-with-Mesos-Fetcher-error-tp13603.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.