GitHub user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/948#issuecomment-148403603

I tried running the code from this pull request again, this time using the `mesos-playa` Vagrant image, and it does not work for me. I was following your instructions. When did you last test the changes?

My motivation to test this pull request goes down every time I test it. I've spun up a Mesos cluster on GCE twice, and now the VM as well. Maybe I'm doing it wrong; please let me know what I can do to get it to run.

CLI output:

```
vagrant@mesos:~/flink/build-target$ java -Dlog4j.configuration=file://`pwd`/conf/log4j.properties -Dlog.file=logs.log -cp lib/flink-dist-0.10-SNAPSHOT.jar org.apache.flink.mesos.scheduler.FlinkScheduler --confDir conf/
I1015 14:05:01.591161 9992 sched.cpp:157] Version: 0.22.1
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@716: Client environment:host.name=mesos
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@724: Client environment:os.arch=3.16.0-30-generic
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@725: Client environment:os.version=#40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@733: Client environment:user.name=vagrant
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@741: Client environment:user.home=/home/vagrant
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vagrant/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f67dac33a60 sessionId=0 sessionPasswd=<null> context=0x7f67f0004470 flags=0
2015-10-15 14:05:01,592:9991(0x7f67c6ffd700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:2181]
Embedded server listening at
  http://127.0.0.1:40815
Press any key to stop.
2015-10-15 14:05:04,959:9991(0x7f67c6ffd700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:2181], sessionId=0x1506b6312fa000b, negotiated timeout=10000
I1015 14:05:04.959841 10024 group.cpp:313] Group process (group(1)@127.0.1.1:57437) connected to ZooKeeper
I1015 14:05:04.959899 10024 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1015 14:05:04.959928 10024 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I1015 14:05:05.204282 10024 detector.cpp:138] Detected a new leader: (id='2')
I1015 14:05:05.204489 10024 group.cpp:659] Trying to get '/mesos/info_0000000002' in ZooKeeper
I1015 14:05:05.303072 10024 detector.cpp:452] A new leading master (UPID=master@127.0.1.1:5050) is detected
I1015 14:05:05.303467 10024 sched.cpp:254] New master detected at master@127.0.1.1:5050
I1015 14:05:05.303890 10024 sched.cpp:264] No credentials provided. Attempting to register without authentication
I1015 14:05:05.851562 10024 sched.cpp:448] Framework registered with 20151015-120419-16842879-5050-1244-0000
```

Log file content:

```
14:04:54,564 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - --------------------------------------------------------------------------------
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Starting JobManager (Version: 0.10-SNAPSHOT, Rev:d905af0, Date:06.10.2015 @ 19:37:22 UTC)
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Current user: vagrant
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.79-b02
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Maximum heap size: 592 MiBytes
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - JAVA_HOME: (not set)
14:04:55,823 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Hadoop version: 2.3.0
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - JVM Options:
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - -Dlog4j.configuration=file:///home/vagrant/flink/build-target/conf/log4j.properties
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - -Dlog.file=logs.log
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Program Arguments:
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - --confDir
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - conf/
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - --------------------------------------------------------------------------------
14:04:55,875 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Maximum number of open file descriptors is 4096
14:04:55,875 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Loading configuration from /home/vagrant/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/conf
14:04:58,375 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager
14:04:58,377 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system at localhost:6123.
14:04:59,700 INFO org.eclipse.jetty.util.log - jetty-0.10-SNAPSHOT
14:05:01,985 INFO org.eclipse.jetty.util.log - Started SocketConnector@127.0.0.1:40815
14:05:07,698 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
14:05:07,750 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Accepting
14:05:07,960 INFO Remoting - Starting remoting
14:05:09,241 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:6123]
14:05:09,248 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
14:05:09,597 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-9b7614f7-7d0d-4c5e-b4c6-911f0ab845ef
14:05:09,597 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:40000 - max concurrent requests: 50 - max backlog: 1000
14:05:10,470 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka.tcp://flink@127.0.0.1:6123/user/jobmanager.
14:05:10,471 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive
14:05:10,563 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@127.0.0.1:6123/user/jobmanager was granted leadership with leader session ID None.
14:05:10,593 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManger web frontend
14:05:10,735 INFO org.apache.flink.runtime.jobmanager.web.WebInfoServer - Setting up web info server, using web-root directory jar:file:/home/vagrant/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/lib/flink-dist-0.10-SNAPSHOT.jar!/web-docs-infoserver.
14:05:11,162 INFO org.eclipse.jetty.util.log - jetty-0.10-SNAPSHOT
14:05:11,165 INFO org.eclipse.jetty.util.log - Started SelectChannelConnector@0.0.0.0:8081
14:05:11,166 INFO org.apache.flink.runtime.jobmanager.web.WebInfoServer - Started web info server for JobManager on 0.0.0.0:8081
14:05:14,936 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Declining offer(s) from slave 20151015-120419-16842879-5050-1244-S0 offered [cpus: 1.5 | mem : 488.0 | disk: 33044.0] required [cpus: 0.5 | mem: 512.0 | disk: 1024.0]
14:05:15,948 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - statusUpdate received from taskId: TaskManager_1 slaveId: 20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:15,948 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Lost taskManager with TaskId: TaskManager_1 on slave: 20151015-120419-16842879-5050-1244-S0
14:05:16,939 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Accepting
14:05:17,092 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - statusUpdate received from taskId: TaskManager_2 slaveId: 20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:17,092 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Lost taskManager with TaskId: TaskManager_2 on slave: 20151015-120419-16842879-5050-1244-S0
14:05:17,939 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Accepting
14:05:18,096 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - statusUpdate received from taskId: TaskManager_3 slaveId: 20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:18,096 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Lost taskManager with TaskId: TaskManager_3 on slave: 20151015-120419-16842879-5050-1244-S0
14:05:18,940 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Accepting
14:05:19,112 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - statusUpdate received from taskId: TaskManager_4 slaveId: 20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:19,113 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$ - Lost taskManager with TaskId: TaskManager_4 on slave: 20151015-120419-16842879-5050-1244-S0
.... this goes on forever? ...
```

Mesos file `mesos-slave.WARNING`:

```
Log file created at: 2015/10/15 12:04:40
Running on machine: mesos
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W1015 12:04:40.464870 1310 slave.cpp:1934] Ignoring updating pid for framework 20151007-005549-16842879-5050-1191-0001 because it does not exist
W1015 12:05:08.030145 1313 slave.cpp:1934] Ignoring updating pid for framework 20151007-005549-16842879-5050-1191-0000 because it does not exist
E1015 14:05:14.378486 1312 slave.cpp:3112] Container '74dc3694-16ec-470f-88c6-b06b7f295682' for executor 'executor_1' of framework '20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs for container '74dc3694-16ec-470f-88c6-b06b7f295682'with exit status: 256
E1015 14:05:15.768391 1315 slave.cpp:3461] Failed to unmonitor container for executor executor_1 of framework 20151015-120419-16842879-5050-1244-0000: Not monitored
W1015 14:05:15.851459 1312 containerizer.cpp:814] Ignoring update for unknown container: 74dc3694-16ec-470f-88c6-b06b7f295682
E1015 14:05:16.989680 1307 slave.cpp:3112] Container '2af2d3c0-e30c-4405-9ff1-7f4389bb62e9' for executor 'executor_2' of framework '20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs for container '2af2d3c0-e30c-4405-9ff1-7f4389bb62e9'with exit status: 256
E1015 14:05:17.090631 1312 slave.cpp:3461] Failed to unmonitor container for executor executor_2 of framework 20151015-120419-16842879-5050-1244-0000: Not monitored
W1015 14:05:17.091418 1305 containerizer.cpp:814] Ignoring update for unknown container: 2af2d3c0-e30c-4405-9ff1-7f4389bb62e9
E1015 14:05:17.993669 1310 slave.cpp:3112] Container '8cbc46f8-3200-4f9b-9134-099a0f6f3541' for executor 'executor_3' of framework '20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs for container '8cbc46f8-3200-4f9b-9134-099a0f6f3541'with exit status: 256
E1015 14:05:18.095177 1310 slave.cpp:3461] Failed to unmonitor container for executor executor_3 of framework 20151015-120419-16842879-5050-1244-0000: Not monitored
W1015 14:05:18.095211 1310 containerizer.cpp:814] Ignoring update for unknown container: 8cbc46f8-3200-4f9b-9134-099a0f6f3541
E1015 14:05:19.006584 1305 slave.cpp:3112] Container 'aca9e80a-5a34-4c29-a123-f025dc4946fe' for executor 'executor_4' of framework '20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs for container 'aca9e80a-5a34-4c29-a123-f025dc4946fe'with exit status: 256
```

I cannot find any log files for the TaskManager.
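
As an aid for reading the `Declining offer(s)` line in the scheduler log above, here is a minimal Python sketch that compares the offered and required values on such a line. The helper names (`parse_resources`, `shortfalls`) are made up for illustration and are not part of Flink or Mesos; the line format is assumed to be exactly the one shown in the log.

```python
import re

# Hypothetical helpers, not part of Flink or Mesos. They assume the
# "offered [...] required [...]" layout seen in the scheduler log above.
PAIR = re.compile(r"(\w+)\s*:\s*([0-9.]+)")

def parse_resources(bracketed: str) -> dict:
    """Turn 'cpus: 1.5 | mem : 488.0 | disk: 33044.0' into a dict of floats."""
    return {name: float(value) for name, value in PAIR.findall(bracketed)}

def shortfalls(log_line: str) -> dict:
    """Return {resource: (offered, required)} for each resource that falls short."""
    match = re.search(r"offered \[(.*?)\] required \[(.*?)\]", log_line)
    if match is None:
        return {}  # not a "Declining offer(s)" style line
    offered = parse_resources(match.group(1))
    required = parse_resources(match.group(2))
    return {
        name: (offered.get(name, 0.0), need)
        for name, need in required.items()
        if offered.get(name, 0.0) < need
    }

line = ("Declining offer(s) from slave 20151015-120419-16842879-5050-1244-S0 "
        "offered [cpus: 1.5 | mem : 488.0 | disk: 33044.0] "
        "required [cpus: 0.5 | mem: 512.0 | disk: 1024.0]")
print(shortfalls(line))
```

On the line from the log, only `mem` is reported as short (488.0 offered against 512.0 required). This covers the declined offer only; it says nothing about the `TASK_LOST` updates or the `Failed to fetch URIs` errors in the slave log.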