Hi Ani,

The problem is that jobmanager.rpc.address in flink-conf.yaml must be set
to a hostname that the other hosts can reach (jobmanager.rpc.address:
[reachable hostname]). I assume that you are using the default value, which
is localhost. You can see this in the fetcher info, where the URLs for the
different files point to localhost:38985. A Mesos agent on a different host
resolves localhost to itself, so its fetcher cannot connect to the server
that provides the files. Setting this value to the external hostname of the
machine on which the JobManager is running should solve the problem.
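
As a minimal sketch, the relevant line in flink-conf.yaml could look like
the following. The hostname flink-jm.example.com is only a placeholder (an
assumption for illustration); replace it with a hostname or IP address of
the JobManager machine that the Mesos agents can resolve and reach:

```yaml
# flink-conf.yaml
# Placeholder hostname (hypothetical): use the externally reachable
# hostname or IP address of the machine running the JobManager.
jobmanager.rpc.address: flink-jm.example.com
```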

Cheers,
Till

On Thu, Jun 8, 2017 at 11:21 PM, ani.desh1512 <ani.desh1...@gmail.com>
wrote:

> I am trying to configure Flink to run on top of Mesos. I am using Flink
> release-1.3 on DCOS 1.9, whose underlying Mesos is version 1.2. Flink
> starts without any issues when the TaskManager starts on the same host as
> the AppMaster. But when the TaskManager is launched on a different host,
> the container fails to launch. The Flink mesos-appmaster log looks like
> this:
>
> 2017-06-08 19:19:01,537 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00003 on host 10.101.2.117.
> 2017-06-08 19:19:01,550 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00002 on host 10.101.2.117.
> 2017-06-08 19:19:01,607 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00001 on host 10.101.2.117.
> 2017-06-08 19:19:01,623 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00004 on host 10.101.2.117.
> 2017-06-08 19:19:01,645 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00006 on host 10.101.2.91.
> 2017-06-08 19:19:01,660 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00005 on host 10.101.2.91.
> 2017-06-08 19:19:01,674 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Launching Mesos task taskmanager-00007 on host 10.101.2.91.
> 2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor
> - Mesos task taskmanager-00003 failed unexpectedly.
> 2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor
> - Mesos task taskmanager-00002 failed unexpectedly.
> 2017-06-08 19:19:02,245 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Mesos task taskmanager-00002 failed, with a TaskManager in launch or
> registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
> (Failed to launch container: Failed to fetch all URIs for container
> '125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256)
> 2017-06-08 19:19:02,246 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Diagnostics for task taskmanager-00002 in state TASK_FAILED :
> reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container:
> Failed to fetch all URIs for container
> '125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256
> 2017-06-08 19:19:02,247 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Total number of failed tasks so far: 1
> 2017-06-08 19:19:02,252 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Mesos task taskmanager-00003 failed, with a TaskManager in launch or
> registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
> (Failed to launch container: Failed to fetch all URIs for container
> '69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256)
> 2017-06-08 19:19:02,252 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Diagnostics for task taskmanager-00003 in state TASK_FAILED :
> reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container:
> Failed to fetch all URIs for container
> '69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256
> 2017-06-08 19:19:02,252 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Total number of failed tasks so far: 2
> 2017-06-08 19:19:02,313 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Scheduling Mesos task taskmanager-00008 with (2048.0 MB, 1.0 cpus).
> 2017-06-08 19:19:02,330 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Scheduling Mesos task taskmanager-00009 with (2048.0 MB, 1.0 cpus).
> 2017-06-08 19:19:02,331 INFO
> org.apache.flink.mesos.scheduler.LaunchCoordinator            - Now
> gathering offers for at least 2 task(s).
> 2017-06-08 19:19:02,332 WARN  org.apache.flink.mesos.scheduler.TaskMonitor
> - Mesos task taskmanager-00004 failed unexpectedly.
> 2017-06-08 19:19:02,332 WARN  org.apache.flink.mesos.scheduler.TaskMonitor
> - Mesos task taskmanager-00001 failed unexpectedly.
> 2017-06-08 19:19:02,412 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Mesos task taskmanager-00004 failed, with a TaskManager in launch or
> registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
> (Failed to launch container: Failed to fetch all URIs for container
> 'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256)
> 2017-06-08 19:19:02,412 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Diagnostics for task taskmanager-00004 in state TASK_FAILED :
> reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container:
> Failed to fetch all URIs for container
> 'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256
> 2017-06-08 19:19:02,412 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Total number of failed tasks so far: 3
> 2017-06-08 19:19:02,432 INFO
> org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager
> -
> Mesos task taskmanager-00001 failed, with a TaskManager in launch or
> registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
> (Failed to launch container: Failed to fetch all URIs for container
> '325e14fe-8840-4996-96dc-5c7ffc159d12' with exit status: 256)
>
> I checked the stderr in Mesos sandbox and it is as follows:
>
> I0608 19:20:06.184386 30480 fetcher.cpp:531] Fetcher Info:
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/6b7667c0-1b1a-43a4-ba1f-
> 27cb0660608f-S6\/flink","items":[{"action":"BYPASS_
> CACHE","uri":{"cache":true,"executable":true,"extract":
> false,"output_file":"flink\/bin\/mesos-taskmanager.sh","
> value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-
> 8518-53c1b3e7ef78\/flink\/bin\/mesos-taskmanager.sh"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> true,"extract":false,"output_file":"flink\/bin\/yarn-
> session.sh","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/yarn-session.sh"}}
> ,{"action":"BYPASS_CACHE","uri":{"cache":true,"
> executable":false,"extract":false,"output_file":"flink\/
> conf\/log4j-console.properties","value":"http:\/\/
> localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/
> flink\/conf\/log4j-console.properties"}},{"action":"
> BYPASS_CACHE","uri":{"cache":true,"executable":false,"
> extract":false,"output_file":"flink\/conf\/log4j.properties"
> ,"value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-
> 8518-53c1b3e7ef78\/flink\/conf\/log4j.properties"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> false,"extract":false,"output_file":"flink\/lib\/log4j-1.2.
> 17.jar","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/log4j-1.2.17.jar"}
> },{"action":"BYPASS_CACHE","uri":{"cache":true,"
> executable":true,"extract":false,"output_file":"flink\/
> bin\/mesos-appmaster.sh","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\
> /mesos-appmaster.sh"}},{"action":"BYPASS_CACHE","uri":{
> "cache":true,"executable":true,"extract":false,"output_
> file":"flink\/bin\/stop-zookeeper-quorum.sh","value":"
> http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78\/flink\/bin\/stop-zookeeper-quorum.sh"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> true,"extract":false,"output_file":"flink\/bin\/stop-local.
> sh","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-local.sh"}},{
> "action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> true,"extract":false,"output_file":"flink\/bin\/
> taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/taskmanager.sh"}},
> {"action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> true,"extract":false,"output_file":"flink\/bin\/start-
> local.bat","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.bat"}}
> ,{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":
> false,"output_file":"flink\/bin\/start-cluster.sh","value"
> :"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78\/flink\/bin\/start-cluster.sh"}},{"action":
> "BYPASS_CACHE","uri":{"cache":true,"executable":true,"
> extract":false,"output_file":"flink\/bin\/stop-cluster.sh","
> value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-
> 8518-53c1b3e7ef78\/flink\/bin\/stop-cluster.sh"}},{"action":
> "BYPASS_CACHE","uri":{"cache":true,"executable":true,"
> extract":false,"output_file":"flink\/bin\/start-scala-shell.
> sh","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-scala-shell.
> sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"
> executable":true,"extract":false,"output_file":"flink\/
> bin\/flink","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink"}},{"action"
> :"BYPASS_CACHE","uri":{"cache":true,"executable":true,"
> extract":false,"output_file":"flink\/bin\/pyflink.sh","
> value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-
> 8518-53c1b3e7ef78\/flink\/bin\/pyflink.sh"}},{"action":"
> BYPASS_CACHE","uri":{"cache":true,"executable":false,"
> extract":false,"output_file":"flink\/conf\/log4j-yarn-
> session.properties","value":"http:\/\/localhost:38985\/
> 567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/
> log4j-yarn-session.properties"}},{"action":"BYPASS_CACHE","
> uri":{"cache":true,"executable":false,"extract":
> false,"output_file":"flink\/conf\/logback-yarn.xml","
> value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-
> 8518-53c1b3e7ef78\/flink\/conf\/logback-yarn.xml"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> true,"extract":false,"output_file":"flink\/bin\/flink-
> daemon.sh","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-daemon.sh"}}
> ,{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":
> false,"output_file":"flink\/bin\/zookeeper.sh","value":"
> http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78\/flink\/bin\/zookeeper.sh"}},{"action":"
> BYPASS_CACHE","uri":{"cache":true,"executable":false,"
> extract":false,"output_file":"flink\/conf\/logback-console.
> xml","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-console.
> xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"
> executable":false,"extract":false,"output_file":"flink\/
> conf\/masters","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/masters"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> true,"extract":false,"output_file":"flink\/conf\/flink-
> conf.yaml","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/flink-conf.yaml"}
> },{"action":"BYPASS_CACHE","uri":{"cache":true,"
> executable":false,"extract":false,"output_file":"flink\/
> conf\/zoo.cfg","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/zoo.cfg"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> false,"extract":false,"output_file":"flink\/lib\/flink-
> shaded-hadoop2-uber-1.3-SNAPSHOT.jar","value":"http:\/
> \/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/
> flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar"
> }},{"action":"BYPASS_CACHE","uri":{"cache":true,"
> executable":false,"extract":false,"output_file":"flink\/
> conf\/slaves","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/slaves"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> false,"extract":false,"output_file":"flink\/lib\/flink-dist_
> 2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/
> 567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/
> flink-dist_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_
> CACHE","uri":{"cache":true,"executable":false,"extract":
> false,"output_file":"flink\/lib\/slf4j-log4j12-1.7.7.jar",
> "value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-
> 8518-53c1b3e7ef78\/flink\/lib\/slf4j-log4j12-1.7.7.jar"}},{"
> action":"BYPASS_CACHE","uri":{"cache":true,"executable":
> false,"extract":false,"output_file":"flink\/conf\/log4j-cli.
> properties","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-cli.
> properties"}},{"action":"BYPASS_CACHE","uri":{"cache":
> true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/historyserver.sh","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\
> /historyserver.sh"}},{"action":"BYPASS_CACHE","uri":{"cache"
> :true,"executable":false,"extract":false,"output_file":"
> flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar","value":"
> http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78\/flink\/lib\/flink-python_2.10-1.3-
> SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":
> true,"executable":false,"extract":false,"output_file":"
> flink\/conf\/logback.xml","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/
> conf\/logback.xml"}},{"action":"BYPASS_CACHE","uri":{"cache"
> :true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/pyflink.bat","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\
> /pyflink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":
> true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/start-local.sh","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\
> /start-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":
> true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/flink.bat","value":"http:\/\/localhost:38985\/
> 567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink.bat"}},{"action":"
> BYPASS_CACHE","uri":{"cache":true,"executable":true,"
> extract":false,"output_file":"flink\/bin\/start-zookeeper-
> quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-
> f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-zookeeper-
> quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":
> true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/jobmanager.sh","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\
> /jobmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":
> true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/flink-console.sh","value":"http:\/\/localhost:
> 38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\
> /flink-console.sh"}},{"action":"BYPASS_CACHE","uri":{"cache"
> :true,"executable":true,"extract":false,"output_file":"
> flink\/bin\/config.sh","value":"http:\/\/localhost:38985\/
> 567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/config.sh"}}],"sandbox_
> directory":"\/var\/lib\/mesos\/slave\/slaves\/6b7667c0-1b1a-
> 43a4-ba1f-27cb0660608f-S6\/frameworks\/6b7667c0-1b1a-
> 43a4-ba1f-27cb0660608f-0030\/executors\/taskmanager-00009\/
> runs\/d8d1756d-f977-43f6-a53f-55c19b6c6294","user":"flink"}
> I0608 19:20:06.189909 30480 fetcher.cpp:442] Fetching URI
> 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
> I0608 19:20:06.189932 30480 fetcher.cpp:283] Fetching directly into the
> sandbox directory
> I0608 19:20:06.190213 30480 fetcher.cpp:220] Fetching URI
> 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
> I0608 19:20:06.190251 30480 fetcher.cpp:163] Downloading resource from
> 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
> to
> '/var/lib/mesos/slave/slaves/6b7667c0-1b1a-43a4-ba1f-
> 27cb0660608f-S6/frameworks/6b7667c0-1b1a-43a4-ba1f-
> 27cb0660608f-0030/executors/taskmanager-00009/runs/
> d8d1756d-f977-43f6-a53f-55c19b6c6294/flink/bin/mesos-taskmanager.sh'
> Failed to fetch
> 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-
> 53c1b3e7ef78/flink/bin/mesos-taskmanager.sh':
> Error downloading resource: Couldn't connect to server
> Failed to synchronize with agent (it's probably exited)
>
> So my question is: what am I missing?
> Do I need to specify some special URI in Marathon for Flink? I am setting
> mesos.master to zk://leader.mesos:2181/mesos. Is this what is causing the
> problem?
> Or have I missed some Mesos or Marathon setting?
> Also, I am launching this via Marathon, and I have the same Flink dist at
> the same path on all the slaves.
>
> Thanks,
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-with-Mesos-Fetcher-error-tp13603.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>
