Hi guys,

After investigating this topic a bit more, we found a solution that requires a small change in the Flink 1.3.2 source code.
We found that the issue occurs when several threads try to build the Tuple2<JobGraph, ClassLoader> object at the same time, because they all go through the static ExecutionEnvironment variable mentioned earlier in this thread. So we simply used a semaphore to serialize the threads at that point. We have run some performance suites on our side and this seems to be working fine. You can take a look at the code in the attached file, and if you decide to include it in the Flink source code we can create a pull request.

JarRunHandler.java
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1051/JarRunHandler.java>

However, even with this fix in place we do not see much of a performance improvement in Flink. We know that the semaphore we added introduces some latency, but we have also noticed that some task managers are not using their full capacity. We increased the CPU and memory available to the Flink instance and the result is the same: no noticeable improvement.

Do you have any clue or hint on how to make Flink use its resources better?

Thanks in advance
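
P.S. For readers who cannot open the attachment, here is a minimal sketch of the locking idea. It is not the actual JarRunHandler change. It assumes a single-permit semaphore around the build step, and GraphBuildLockSketch, buildExclusively and runDemo are hypothetical names. The supplier stands in for the code that builds the Tuple2<JobGraph, ClassLoader>.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class GraphBuildLockSketch {

    // One permit: at most one thread may be inside the guarded section.
    private static final Semaphore BUILD_LOCK = new Semaphore(1);

    // Runs a non-thread-safe build step while holding the permit.
    // In the real handler the step would build the job graph (assumption).
    static <T> T buildExclusively(Supplier<T> buildStep) throws InterruptedException {
        BUILD_LOCK.acquire();
        try {
            return buildStep.get();
        } finally {
            // Always release, even if the build step throws.
            BUILD_LOCK.release();
        }
    }

    // Submits `requests` concurrent builds and returns the highest number
    // of threads that were ever inside the guarded section at once.
    static int runDemo(int requests) throws InterruptedException {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        for (int i = 0; i < requests; i++) {
            final int id = i;
            Callable<String> task = () -> buildExclusively(() -> {
                int now = inside.incrementAndGet();
                maxInside.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // simulate the work of building a graph
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inside.decrementAndGet();
                return "graph-" + id; // placeholder result
            });
            pool.submit(task);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return maxInside.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max concurrent builders: " + runDemo(8));
    }
}
```

With a single permit, concurrent submissions are built one at a time, which is where the extra latency mentioned above comes from.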