The delay you see happens when the TaskManager allocates the memory for its
memory manager. Allocating that much in a JVM can take a bit, although 40
seconds looks a lot to me...

How do you start the JVM? Are Xmx and Xms set to the same value? If not,
the JVM incrementally grows through multiple garbage collections, which
makes it quite slow.

If the JVM starts with a large heap, it should actually not take as long as
in your case...

On Fri, Oct 2, 2015 at 5:26 PM, Robert Schmidtke <ro.schmid...@gmail.com>
wrote:

> Hi everyone,
>
> I'm wondering about the startup times of the TMs:
>
> ...
> 17:03:33,255 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Starting TaskManager actor
> 17:03:33,262 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig
>       - NettyConfig [server address: cumu02-05/130.73.144.64, server
> port: 45731, memory segment size (bytes): 32768, transport type: NIO,
> number of server threads: 0 (use Netty's default), number of client
> threads: 0 (use Netty's default), server connect backlog: 0 (use Netty's
> default), client connect timeout (sec): 120, send/receive buffer size
> (bytes): 0 (use Netty's default)]
> 17:03:33,266 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Messages between TaskManager and JobManager have a max timeout of
> 100000 milliseconds
> 17:03:33,268 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Temporary file directory '/tmp': total 44 GB, usable 37 GB (84.09%
> usable)
> 17:03:33,295 INFO
>  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated
> 64 MB for network buffer pool (number of memory segments: 2048, bytes per
> segment: 32768).
> 17:03:33,554 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Using 0.7 of the currently free heap space for Flink managed heap
> memory (34395 MB).
>
> // almost 40 seconds //
>
> 17:04:12,445 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager
>        - I/O manager uses directory
> /tmp/flink-io-922d9bf4-254e-41e7-b151-525157cd5bfe for spill files.
> 17:04:12,455 INFO  org.apache.flink.runtime.filecache.FileCache
>        - User file cache uses directory
> /tmp/flink-dist-cache-792cf7f2-e2be-4950-a39f-d7a21326f054
> 17:04:12,617 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Starting TaskManager actor at
> akka://flink/user/taskmanager#1341641688.
> 17:04:12,617 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - TaskManager data connection information: cumu02-05.zib.de
> (dataPort=45731)
> 17:04:12,618 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - TaskManager has 16 task slot(s).
> 17:04:12,618 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Memory usage stats: [HEAP: 35502/49216/49216 MB, NON HEAP:
> 25/52/214 MB (used/committed/max)]
> 17:04:12,623 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Trying to register at JobManager akka.tcp://
> flink@130.73.144.59:6123/user/jobmanager (attempt 1, timeout: 500
> milliseconds)
> 17:04:12,773 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Successful registration at JobManager (akka.tcp://
> flink@130.73.144.59:6123/user/jobmanager), starting network stack and
> library cache.
> ...
>
>
> The same goes for the JM (obviously).
>
> ...
> 17:03:31,632 INFO  org.apache.flink.runtime.jobmanager.JobManager
>        - Starting JobManger web frontend
> 17:03:31,636 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer
>       - Setting up web info server, using web-root directory
> jar:file:/nfs/csr/bzcschmi/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/lib/flink-dist-0.10-SNAPSHOT.jar!/web-docs-infoserver.
> 17:03:31,753 INFO  org.eclipse.jetty.util.log
>        - jetty-0.10-SNAPSHOT
> 17:03:31,806 INFO  org.eclipse.jetty.util.log
>        - Started SelectChannelConnector@0.0.0.0:8081
> 17:03:31,806 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer
>       - Started web info server for JobManager on 0.0.0.0:8081
>
> // almost 35 seconds //
>
> 17:04:05,091 INFO  org.apache.flink.runtime.instance.InstanceManager
>       - Registered TaskManager at cumu02-02 (akka.tcp://
> flink@130.73.144.61:53549/user/taskmanager) as
> e5ae92397a912c7360524524cf2d172a. Current number of registered hosts is 1.
> Current number of alive task slots is 16.
> ...
>
>
> Is this to be expected? Any ideas what's happening in the meantime? I'm
> asking because I'm running into errors when submitting my job too early
> (and not enough TMs have connected).
>
> Cheers
> Robert
>
> --
> My GPG Key ID: 336E2680
>

Reply via email to