Hi,

I've launched JobManager and TaskManager on DC/OS successfully.
Now I have two new issues:

1) All TaskManagers are scheduled on a single node. 
- Is this intended, in order to maximize data locality and minimize network 
communication cost?
- Is there an option in Flink to adjust the behavior of JobManager when it 
considers multiple resource offers from different Mesos agents?
- I want to schedule TaskManager processes on different GPU servers so that 
each TaskManager process can use its own GPU cards exclusively.
- Below is the part of the JobManager log written while the JobManager is 
negotiating resources with the Mesos master:
2018-01-09 07:34:54,872 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosJobManager  - JobManager 
akka.tcp://flink@dnn-g08-233:18026/user/jobmanager was granted leadership with 
leader session ID Some(00000000-0000-0000-0000-000000000000).
2018-01-09 07:34:55,889 INFO  
org.apache.flink.mesos.scheduler.ConnectionMonitor            - Connecting to 
Mesos...
2018-01-09 07:34:55,962 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Trying to associate with JobManager leader 
akka.tcp://flink@dnn-g08-233:18026/user/jobmanager
2018-01-09 07:34:55,977 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Resource Manager associating with leading JobManager 
Actor[akka://flink/user/jobmanager#-1481183359] - leader session 
00000000-0000-0000-0000-000000000000
2018-01-09 07:34:56,479 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Scheduling Mesos task taskmanager-00001 with (10240.0 MB, 8.0 cpus).
2018-01-09 07:34:56,481 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Scheduling Mesos task taskmanager-00002 with (10240.0 MB, 8.0 cpus).
2018-01-09 07:34:56,481 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Scheduling Mesos task taskmanager-00003 with (10240.0 MB, 8.0 cpus).
2018-01-09 07:34:56,481 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Scheduling Mesos task taskmanager-00004 with (10240.0 MB, 8.0 cpus).
2018-01-09 07:34:56,481 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Scheduling Mesos task taskmanager-00005 with (10240.0 MB, 8.0 cpus).
2018-01-09 07:34:56,483 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Now gathering 
offers for at least 5 task(s).
2018-01-09 07:34:56,484 INFO  
org.apache.flink.mesos.scheduler.ConnectionMonitor            - Connected to 
Mesos as framework ID 59b85b42-a4a2-4632-9578-9e480585ecdc-0004.
2018-01-09 07:34:56,690 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Received 
offer(s) of 606170.0 MB, 234.2 cpus:
2018-01-09 07:34:56,692 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   
59b85b42-a4a2-4632-9578-9e480585ecdc-O2174 from 50.1.100.233 of 111186.0 MB, 
45.9 cpus for [*]
2018-01-09 07:34:56,692 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   
59b85b42-a4a2-4632-9578-9e480585ecdc-O2175 from 50.1.100.235 of 123506.0 MB, 
47.3 cpus for [*]
2018-01-09 07:34:56,692 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   
59b85b42-a4a2-4632-9578-9e480585ecdc-O2176 from 50.1.100.234 of 124530.0 MB, 
46.6 cpus for [*]
2018-01-09 07:34:56,692 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   
59b85b42-a4a2-4632-9578-9e480585ecdc-O2177 from 50.1.100.231 of 123474.0 MB, 
47.2 cpus for [*]
2018-01-09 07:34:56,693 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   
59b85b42-a4a2-4632-9578-9e480585ecdc-O2178 from 50.1.100.232 of 123474.0 MB, 
47.2 cpus for [*]
2018-01-09 07:34:57,711 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Processing 5 
task(s) against 5 new offer(s) plus outstanding offers.
2018-01-09 07:34:57,726 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Resources 
considered: (note: expired offers not deducted from below)
2018-01-09 07:34:57,727 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   50.1.100.234 
has 124530.0 MB, 46.6 cpus
2018-01-09 07:34:57,728 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   50.1.100.235 
has 123506.0 MB, 47.3 cpus
2018-01-09 07:34:57,728 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   50.1.100.232 
has 123474.0 MB, 47.2 cpus
2018-01-09 07:34:57,728 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   50.1.100.233 
has 111186.0 MB, 45.9 cpus
2018-01-09 07:34:57,728 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   50.1.100.231 
has 123474.0 MB, 47.2 cpus
2018-01-09 07:34:58,069 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Launching Mesos task taskmanager-00005 on host 50.1.100.231.
2018-01-09 07:34:58,069 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Launched 5 
task(s) on 50.1.100.231 using 1 offer(s):
2018-01-09 07:34:58,070 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Launching Mesos task taskmanager-00002 on host 50.1.100.231.
2018-01-09 07:34:58,070 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Launching Mesos task taskmanager-00003 on host 50.1.100.231.
2018-01-09 07:34:58,070 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Launching Mesos task taskmanager-00004 on host 50.1.100.231.
2018-01-09 07:34:58,070 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
Launching Mesos task taskmanager-00001 on host 50.1.100.231.
2018-01-09 07:34:58,070 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            -   
59b85b42-a4a2-4632-9578-9e480585ecdc-O2177
2018-01-09 07:34:58,071 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - No longer 
gathering offers; all requests fulfilled.
2018-01-09 07:34:58,072 INFO  com.netflix.fenzo.TaskScheduler                   
            - Expiring all leases
2018-01-09 07:34:58,072 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Declined offer 
59b85b42-a4a2-4632-9578-9e480585ecdc-O2176 from 50.1.100.234 of 124530.0 MB, 
46.6 cpus.
2018-01-09 07:34:58,073 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Declined offer 
59b85b42-a4a2-4632-9578-9e480585ecdc-O2175 from 50.1.100.235 of 123506.0 MB, 
47.3 cpus.
2018-01-09 07:34:58,073 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Declined offer 
59b85b42-a4a2-4632-9578-9e480585ecdc-O2178 from 50.1.100.232 of 123474.0 MB, 
47.2 cpus.
2018-01-09 07:34:58,074 INFO  
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Declined offer 
59b85b42-a4a2-4632-9578-9e480585ecdc-O2174 from 50.1.100.233 of 111186.0 MB, 
45.9 cpus.
2018-01-09 07:35:05,868 INFO  org.apache.flink.mesos.scheduler.TaskMonitor      
            - Mesos task taskmanager-00005 is running.
2018-01-09 07:35:06,103 INFO  org.apache.flink.mesos.scheduler.TaskMonitor      
            - Mesos task taskmanager-00001 is running.
2018-01-09 07:35:06,111 INFO  org.apache.flink.mesos.scheduler.TaskMonitor      
            - Mesos task taskmanager-00004 is running.
2018-01-09 07:35:06,116 INFO  org.apache.flink.mesos.scheduler.TaskMonitor      
            - Mesos task taskmanager-00002 is running.
2018-01-09 07:35:06,119 INFO  org.apache.flink.mesos.scheduler.TaskMonitor      
            - Mesos task taskmanager-00003 is running.
2018-01-09 07:35:14,377 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
TaskManager taskmanager-00003 has started.
2018-01-09 07:35:14,380 INFO  org.apache.flink.runtime.instance.InstanceManager 
            - Registered TaskManager at DNN-G08-231 
(akka.tcp://flink@dnn-g08-231:1027/user/taskmanager) as 
b94277c8ad550eeef5364947e4330c00. Current number of registered hosts is 1. 
Current number of alive task slots is 8.
2018-01-09 07:35:14,389 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
TaskManager taskmanager-00004 has started.
2018-01-09 07:35:14,389 INFO  org.apache.flink.runtime.instance.InstanceManager 
            - Registered TaskManager at DNN-G08-231 
(akka.tcp://flink@dnn-g08-231:1033/user/taskmanager) as 
e0183a5317b331b90496049b1893c922. Current number of registered hosts is 2. 
Current number of alive task slots is 16.
2018-01-09 07:35:14,462 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
TaskManager taskmanager-00001 has started.
2018-01-09 07:35:14,462 INFO  org.apache.flink.runtime.instance.InstanceManager 
            - Registered TaskManager at DNN-G08-231 
(akka.tcp://flink@dnn-g08-231:1029/user/taskmanager) as 
8d85b49d4118514552fcad3b98fef3e2. Current number of registered hosts is 3. 
Current number of alive task slots is 24.
2018-01-09 07:35:14,465 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
TaskManager taskmanager-00005 has started.
2018-01-09 07:35:14,465 INFO  org.apache.flink.runtime.instance.InstanceManager 
            - Registered TaskManager at DNN-G08-231 
(akka.tcp://flink@dnn-g08-231:1031/user/taskmanager) as 
b740607fb2e88bcfc275498bb54ed9fd. Current number of registered hosts is 4. 
Current number of alive task slots is 32.
2018-01-09 07:35:14,560 INFO  
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - 
TaskManager taskmanager-00002 has started.
2018-01-09 07:35:14,560 INFO  org.apache.flink.runtime.instance.InstanceManager 
            - Registered TaskManager at DNN-G08-231 
(akka.tcp://flink@dnn-g08-231:1025/user/taskmanager) as 
95433440f37ea1790e7ef9309f110fe4. Current number of registered hosts is 5. 
Current number of alive task slots is 40.
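
To make the placement in the log above easier to check, here is a minimal sketch that counts the "Launching Mesos task ... on host ..." lines per host. It is not part of Flink or Mesos; the function name tasks_per_host and the shortened sample text are assumptions for illustration only.

```python
import re
from collections import Counter

# Matches log lines such as:
#   Launching Mesos task taskmanager-00005 on host 50.1.100.231.
LAUNCH = re.compile(r"Launching Mesos task (\S+) on host (\S+)")


def tasks_per_host(log_text):
    """Count launched Mesos tasks per host (hypothetical helper)."""
    counts = Counter()
    for match in LAUNCH.finditer(log_text):
        # The log line ends with a period, so strip it from the host.
        host = match.group(2).rstrip(".")
        counts[host] += 1
    return counts


# Shortened sample taken from the log lines above.
sample = """\
Launching Mesos task taskmanager-00005 on host 50.1.100.231.
Launching Mesos task taskmanager-00002 on host 50.1.100.231.
Launching Mesos task taskmanager-00003 on host 50.1.100.231.
"""

for host, count in tasks_per_host(sample).items():
    print(host, count)
```

Run against the full log, a single host in the output means every TaskManager was placed on the same agent, which is the behavior described in 1).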


2) After the TaskManagers are started, the following lines are repeated in the 
JobManager log every second:
2018-01-09 07:36:51,080 ERROR 
org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler  - 
Caught exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:748)
2018-01-09 07:37:43,600 ERROR 
org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler  - 
Caught exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:748)
2018-01-09 07:38:43,619 ERROR 
org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler  - 
Caught exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:748)
2018-01-09 07:39:43,630 ERROR 
org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler  - 
Caught exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
        at 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:748)
- Can I ignore this exception, or is there something I should fix?

Best,

- Dongwon
