On 23 Jun 2015, at 13:53, Stephan Ewen <se...@apache.org> wrote:

> Currently, Flink does not cache anything across runs, except JAR files on the 
> workers.
> 
> The reason the first run is slower may be:
>  - Because in the first run, code is distributed in the cluster. In 
> subsequent runs, the JAR files need not be redistributed.
>  - Because the JIT takes a bit to kick in and compile code in the first run. 
> In subsequent runs, the code is already JIT-ted.
> 
> 
> The system should not freeze after 100 runs. Can you tell us a bit more of 
> what you see? Can you identify which process hangs and send us a stack-trace 
> of that one? Then we could look into this...

If you have access to the machines running the task managers, you can run `jps` 
to get the PID of the task manager and then `jstack PID` to print its stack trace:

$ jps
16242 Jps
89107 TaskManager
$ jstack 89107
[stack trace]

It would be great if you could capture this once the task managers have frozen 
and share the output.
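
If several task managers run on the same machine, a small script along these 
lines might save some typing. It is only a sketch: it assumes `jps` and `jstack` 
are on the PATH, that you run it as the same user as the task manager JVMs, and 
that the process shows up in `jps` as `TaskManager` (as in the output above). The 
function names and the `jstack-<pid>.txt` file names are made up for this example.

```shell
#!/bin/sh

# Read `jps` output ("<pid> <name>" per line) from stdin and print the
# PIDs of all JVMs whose name is exactly "TaskManager".
taskmanager_pids() {
    awk '$2 == "TaskManager" { print $1 }'
}

# Write the stack trace of every local TaskManager to jstack-<pid>.txt
# in the current directory (hypothetical file naming).
dump_taskmanager_stacks() {
    jps | taskmanager_pids | while read -r pid; do
        jstack "$pid" > "jstack-$pid.txt"
    done
}
```

After sourcing the file, calling `dump_taskmanager_stacks` on a frozen worker 
should leave one text file per task manager that you can attach to your reply.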

Can you also provide some information on your setup (which job, how many task 
managers, etc.) so that I can try to reproduce this?
