Yes, they're both set to the same value. These are the JVM Options as reported by the TM:
17:36:12,137 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options: 17:36:12,137 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms51355M 17:36:12,137 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx51355M 17:36:12,137 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=51355M Same goes for JM: 7:36:10,837 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options: 17:36:10,837 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms25677m 17:36:10,837 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx25677m On Fri, Oct 2, 2015 at 5:35 PM, Stephan Ewen <se...@apache.org> wrote: > The delay you see happens when the TaskManager allocates the memory for > its memory manager. Allocating that much in a JVM can take a bit, although > 40 seconds looks a lot to me... > > How do you start the JVM? Are Xmx and Xms set to the same value? If not, > the JVM incrementally grows through multiple garbage collections, which > makes it quite slow. > > If the JVM starts with a large heap, it should actually not take as long > as in your case... > > On Fri, Oct 2, 2015 at 5:26 PM, Robert Schmidtke <ro.schmid...@gmail.com> > wrote: > >> Hi everyone, >> >> I'm wondering about the startup times of the TMs: >> >> ... >> 17:03:33,255 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Starting TaskManager actor >> 17:03:33,262 INFO org.apache.flink.runtime.io.network.netty.NettyConfig >> - NettyConfig [server address: cumu02-05/130.73.144.64, server >> port: 45731, memory segment size (bytes): 32768, transport type: NIO, >> number of server threads: 0 (use Netty's default), number of client >> threads: 0 (use Netty's default), server connect backlog: 0 (use Netty's >> default), client connect timeout (sec): 120, send/receive buffer size >> (bytes): 0 (use Netty's default)] >> 17:03:33,266 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Messages between TaskManager and JobManager have a max timeout of >> 100000 milliseconds >> 17:03:33,268 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Temporary file directory '/tmp': total 44 GB, usable 37 GB (84.09% >> usable) >> 17:03:33,295 INFO >> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated >> 64 MB for network buffer pool (number of memory segments: 2048, bytes per >> segment: 32768). >> 17:03:33,554 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Using 0.7 of the currently free heap space for Flink managed heap >> memory (34395 MB). >> >> // almost 40 seconds // >> >> 17:04:12,445 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager >> - I/O manager uses directory >> /tmp/flink-io-922d9bf4-254e-41e7-b151-525157cd5bfe for spill files. >> 17:04:12,455 INFO org.apache.flink.runtime.filecache.FileCache >> - User file cache uses directory >> /tmp/flink-dist-cache-792cf7f2-e2be-4950-a39f-d7a21326f054 >> 17:04:12,617 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Starting TaskManager actor at >> akka://flink/user/taskmanager#1341641688. >> 17:04:12,617 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - TaskManager data connection information: cumu02-05.zib.de >> (dataPort=45731) >> 17:04:12,618 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - TaskManager has 16 task slot(s). >> 17:04:12,618 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Memory usage stats: [HEAP: 35502/49216/49216 MB, NON HEAP: >> 25/52/214 MB (used/committed/max)] >> 17:04:12,623 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Trying to register at JobManager akka.tcp:// >> flink@130.73.144.59:6123/user/jobmanager (attempt 1, timeout: 500 >> milliseconds) >> 17:04:12,773 INFO org.apache.flink.runtime.taskmanager.TaskManager >> - Successful registration at JobManager (akka.tcp:// >> flink@130.73.144.59:6123/user/jobmanager), starting network stack and >> library cache. >> ... >> >> >> The same goes for the JM (obviously). >> >> ... >> 17:03:31,632 INFO org.apache.flink.runtime.jobmanager.JobManager >> - Starting JobManger web frontend >> 17:03:31,636 INFO org.apache.flink.runtime.jobmanager.web.WebInfoServer >> - Setting up web info server, using web-root directory >> jar:file:/nfs/csr/bzcschmi/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/lib/flink-dist-0.10-SNAPSHOT.jar!/web-docs-infoserver. >> 17:03:31,753 INFO org.eclipse.jetty.util.log >> - jetty-0.10-SNAPSHOT >> 17:03:31,806 INFO org.eclipse.jetty.util.log >> - Started SelectChannelConnector@0.0.0.0:8081 >> 17:03:31,806 INFO org.apache.flink.runtime.jobmanager.web.WebInfoServer >> - Started web info server for JobManager on 0.0.0.0:8081 >> >> // almost 35 seconds // >> >> 17:04:05,091 INFO org.apache.flink.runtime.instance.InstanceManager >> - Registered TaskManager at cumu02-02 (akka.tcp:// >> flink@130.73.144.61:53549/user/taskmanager) as >> e5ae92397a912c7360524524cf2d172a. Current number of registered hosts is 1. >> Current number of alive task slots is 16. >> ... >> >> >> Is this to be expected? Any ideas what's happening in the meantime? I'm >> asking because I'm running into errors when submitting my job too early >> (and not enough TMs have connected). >> >> Cheers >> Robert >> >> -- >> My GPG Key ID: 336E2680 >> > > -- My GPG Key ID: 336E2680