Hi,

have you checked the task managers' logs?
Piotrek

> On 8 Dec 2018, at 12:23, Alieh <sae...@informatik.uni-leipzig.de> wrote:
>
> Hello Piotrek,
>
> thank you for your answer. I installed Flink on a local cluster and used
> the GUI to monitor the task managers. It seems the program does not
> start at all. The whole time, only the job manager is struggling... For very
> small toy examples, after a long time (during which I see the job manager
> logs I mentioned before), the job starts and is executed in 2 seconds.
>
> Best,
>
> Alieh
>
>
> On 12/07/2018 10:43 AM, Piotr Nowojski wrote:
>> Hi,
>>
>> Please investigate the logs/standard output/error from the task manager that
>> failed (the logs you showed are from the job manager). There is probably
>> some obvious error/exception explaining why it failed. The most common
>> reasons are:
>> - out of memory
>> - long GC pause
>> - segfault or other error from some native library
>> - task manager killed, for example via SIGKILL
>>
>> Piotrek
>>
>>> On 6 Dec 2018, at 17:34, Alieh <sae...@informatik.uni-leipzig.de> wrote:
>>>
>>> Hello all,
>>>
>>> I have an algorithm x() which contains several joins and uses Gelly's
>>> ConnectedComponents three times. The problem is that if I call x() inside a
>>> script more than three times, I receive the messages listed below in the
>>> log and the program somehow stops. It happens even if I run it with a
>>> toy example of a graph with fewer than 10 vertices. Do you have any clue
>>> what the problem is?
>>>
>>> Cheers,
>>>
>>> Alieh
>>>
>>>
>>> 129149 [flink-akka.actor.default-dispatcher-20] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request.
>>> 129149 [flink-akka.actor.default-dispatcher-20] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request.
>>> 129150 [flink-akka.actor.default-dispatcher-20] DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat request from e80ec35f3d0a04a68000ecbdc555f98b.
>>> 129150 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received heartbeat from 78cdd7a4-0c00-4912-992f-a2990a5d46db.
>>> 129151 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received new slot report from TaskManager 78cdd7a4-0c00-4912-992f-a2990a5d46db.
>>> 129151 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received slot report from instance 4c3e3654c11b09fbbf8e993a08a4c2da.
>>> 129200 [flink-akka.actor.default-dispatcher-15] DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Release TaskExecutor 4c3e3654c11b09fbbf8e993a08a4c2da because it exceeded the idle timeout.
>>> 129200 [flink-akka.actor.default-dispatcher-15] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Worker 78cdd7a4-0c00-4912-992f-a2990a5d46db could not be stopped.
>>>
>>
>
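The advice in this thread is to look at the task manager's logs rather than the job manager's. A long task manager log can be triaged quickly by searching it for signs of the common failure causes Piotr lists. Below is a minimal sketch in Python that does this. The regex patterns, the script name `scan_tm_log.py`, and the idea of running it on a single log file are all assumptions for illustration; Flink does not define them. A long GC pause only shows up if GC logging is enabled for the task manager JVM. A SIGKILL cannot be caught, so it leaves no line in the log at all.

```python
import re
import sys
from collections import defaultdict

# Hypothetical signatures for the failure causes discussed above. These
# patterns are illustrative assumptions; adjust them to what your JVM and
# Flink version actually print.
SIGNATURES = {
    "out of memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    # Only present if GC logging is enabled for the task manager JVM.
    "long GC pause": re.compile(r"Full GC"),
    # HotSpot crash banner or signal name written on a native crash.
    "native crash": re.compile(
        r"SIGSEGV|A fatal error has been detected by the Java Runtime Environment"
    ),
    # Catchable signals such as SIGTERM may be logged; SIGKILL leaves no trace.
    "killed by signal": re.compile(r"RECEIVED SIGNAL \d+"),
}


def scan_log(lines):
    """Return {cause: [(line_number, line), ...]} for lines matching a signature."""
    hits = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for cause, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[cause].append((number, line.rstrip("\n")))
    return dict(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python scan_tm_log.py path/to/taskmanager.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for cause, matches in scan_log(handle).items():
            print(f"{cause}: {len(matches)} match(es)")
            # Show only the first few hits per cause to keep output short.
            for number, line in matches[:5]:
                print(f"  {number}: {line}")
```

If the script finds nothing and the task manager process simply disappeared, an external kill (for example by the OS out-of-memory killer) is worth checking in the system logs. This is because that case never reaches the Flink log.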