Re: JobManager not reachable

2015-10-15 Thread Matthias J. Sax
Please find the logs here http://www2.informatik.hu-berlin.de/~saxmatti/flink-mjsax-jobmanager-0-dbis21.log http://www2.informatik.hu-berlin.de/~saxmatti/flink-mjsax-taskmanager-0-dbis34.log -Matthias On 10/15/2015 12:16 PM, Stephan Ewen wrote: > Blocking actor calls should not be an issue (eve

Re: JobManager not reachable

2015-10-15 Thread Stephan Ewen
Blocking actor calls should not be an issue (even if they are there), because the heartbeats go between the actor systems, rather than the actors... On Thu, Oct 15, 2015 at 12:14 PM, Till Rohrmann wrote: > And please set akka.log.lifecycle.events: true to let Akka log also its > lifecycle events

Re: JobManager not reachable

2015-10-15 Thread Till Rohrmann
And please set akka.log.lifecycle.events: true so that Akka also logs its lifecycle events. On Thu, Oct 15, 2015 at 12:12 PM, Robert Metzger wrote: > Can you start flink with logging level DEBUG ? > Then we can see from the TaskManager logs when the TM became inactive. > Maybe an Akka message is
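A minimal sketch of the setting Till mentions, assuming it goes into flink-conf.yaml in the conf/ directory of the Flink installation (the file location is an assumption, the thread does not name it) and that the cluster is restarted afterwards:

```yaml
# flink-conf.yaml (assumed location: conf/ under the Flink installation)
# Let Akka log its lifecycle events, as suggested in the message above.
akka.log.lifecycle.events: true
```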

Re: JobManager not reachable

2015-10-15 Thread Robert Metzger
Can you start Flink with logging level DEBUG? Then we can see from the TaskManager logs when the TM became inactive. Maybe an Akka message is causing the actor to block? You can also monitor the GC from the TaskManager view in the web interface (for example by looking at the total time spent for

Re: JobManager not reachable

2015-10-15 Thread Stephan Ewen
Does not quite sound like GC is an issue. Hmmm, what else can make the failure detector kick in unexpectedly? On Thu, Oct 15, 2015 at 12:05 PM, Till Rohrmann wrote: > To verify whether GC is a problem you can enable logging of memory usage of > the JVM via taskmanager.debug.memory.startLogThread

Re: JobManager not reachable

2015-10-15 Thread Till Rohrmann
To verify whether GC is a problem you can enable logging of the JVM's memory usage via taskmanager.debug.memory.startLogThread: true. The logging interval is configured via taskmanager.debug.memory.logIntervalMs. On Thu, Oct 15, 2015 at 12:00 PM, Matthias J. Sax wrote: > The problem is
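The two options named above could be set roughly as follows. This is a sketch only: it assumes the keys live in flink-conf.yaml on each TaskManager host, and the interval value is an illustrative choice, not one given in the thread.

```yaml
# flink-conf.yaml (assumed location: conf/ under the Flink installation)
# Start the thread that periodically logs JVM memory usage.
taskmanager.debug.memory.startLogThread: true
# Logging interval in milliseconds; 5000 is an illustrative value only.
taskmanager.debug.memory.logIntervalMs: 5000
```

With this enabled, memory usage should show up periodically in the TaskManager log, which can then be compared against the time at which the TaskManager stopped responding.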

Re: JobManager not reachable

2015-10-15 Thread Matthias J. Sax
The problem is reproducible (it happens on each run). I doubt that GC is an issue here (at least from a UDF point of view), because I read the file once and keep a String object for each line. These objects are kept to the very end; the UDF does not release them until it returns from "run()" metho

Re: JobManager not reachable

2015-10-15 Thread Stephan Ewen
From what the logs show, the TaskManager does not send pings any more for a long time and is then considered failed and the tasks running on that TaskManager are considered failed as well. So far, nothing unusual... Question is, why is it considered failed? Is this a reproducible problem? Or a on

Re: JobManager not reachable

2015-10-14 Thread Matthias J. Sax
One thing I forgot to add. I also have a Storm-WordCount job (built via FlinkTopologyBuilder) that uses the same "buffer-file-and-emit-over-and-over-again-pattern" in a spout. This job runs just fine and stops regularly after 5 minutes. -Matthias On 10/14/2015 10:42 PM, Matthias J. Sax wrote: >

Re: JobManager not reachable

2015-10-14 Thread Matthias J. Sax
No. See log below. Btw: the job is not cleaned up properly. Some tasks remain in state "Canceling". The program I execute is the "Streaming WordCount" example with my own source function. This custom source (see below) reads a local (small) file, buffers each line in an internal buffer, and emits th

Re: JobManager not reachable

2015-10-14 Thread Ufuk Celebi
> On 11 Oct 2015, at 23:54, Stephan Ewen wrote: > > Can you see if there is anything unusual in the JobManager logs? Ping. :)

Re: JobManager not reachable

2015-10-11 Thread Stephan Ewen
Can you see if there is anything unusual in the JobManager logs? On 11.10.2015 18:56, "Matthias J. Sax" wrote: > Hi, > > I was just playing around with Flink. After submitting my job, it runs > for multiple minutes, until I get the following Exception in one of the > TaskManager logs and the jo

JobManager not reachable

2015-10-11 Thread Matthias J. Sax
Hi, I was just playing around with Flink. After submitting my job, it runs for multiple minutes, until I get the following Exception in one of the TaskManager logs and the job fails. I have no clue what's going on... -Matthias > 18:43:23,567 WARN akka.remote.RemoteWatcher