Hello Piotrek,
thank you for your answer. I installed Flink on a local cluster and
used the GUI to monitor the task managers. It seems the program
*does not start at all*. The whole time, only the job manager is
struggling... Even for very small toy examples, the job starts only after
a long time (during which I see the job manager logs I mentioned before),
and then it executes in 2 seconds.
Best,
Alieh
On 12/07/2018 10:43 AM, Piotr Nowojski wrote:
Hi,
Please check the logs and standard output/error of the task manager that
failed (the logs you showed are from the job manager). There is probably an
obvious error or exception explaining why it failed. The most common reasons are:
- out of memory
- long GC pause
- segfault or other error from a native library
- task manager killed by a signal, for example SIGKILL
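
As a rough aid for checking the reasons listed above, here is a minimal sketch that scans task manager log and stdout files for typical signatures. The patterns, the `*taskmanager*` file-name glob, and the function names are assumptions for illustration, not an official Flink tool; adjust them to what your files actually contain. Note that a process killed with SIGKILL cannot log anything itself, so a log that simply ends without a shutdown message is the hint in that case.

```python
import re
from pathlib import Path

# Hypothetical patterns, one per failure reason; tune them to your own logs.
FAILURE_PATTERNS = {
    "out of memory": re.compile(r"OutOfMemoryError"),
    # Assumption: a long GC pause usually surfaces as a heartbeat timeout.
    "long GC pause": re.compile(r"heartbeat.*timed out", re.IGNORECASE),
    # JVM crash reports (hs_err files, stdout) start with this banner.
    "native crash": re.compile(r"SIGSEGV|A fatal error has been detected"),
    "killed by signal": re.compile(r"SIGKILL|SIGTERM|RECEIVED SIGNAL"),
}


def scan_lines(lines):
    """Return (line_number, reason, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for reason, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, reason, line.rstrip()))
    return hits


def scan_log_dir(log_dir):
    """Scan every file whose name contains 'taskmanager' under log_dir."""
    results = {}
    for path in sorted(Path(log_dir).glob("*taskmanager*")):
        with open(path, errors="replace") as handle:
            hits = scan_lines(handle)
        if hits:
            results[path.name] = hits
    return results
```

Calling `scan_log_dir("log")` from the Flink installation directory (path assumed) returns a dict mapping each matching file name to its hits; an empty dict means none of the patterns matched, not that the task manager is healthy.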
Piotrek
On 6 Dec 2018, at 17:34, Alieh <sae...@informatik.uni-leipzig.de> wrote:
Hello all,
I have an algorithm x() which contains several joins and uses Gelly
ConnectedComponents three times. The problem is that if I call x() inside a script
more than three times, I receive the messages listed below in the log and the
program somehow stops. It happens even if I run it with a toy example of a
graph with fewer than 10 vertices. Do you have any clue what the problem is?
Cheers,
Alieh
129149 [flink-akka.actor.default-dispatcher-20] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger
heartbeat request.
129149 [flink-akka.actor.default-dispatcher-20] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger
heartbeat request.
129150 [flink-akka.actor.default-dispatcher-20] DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat
request from e80ec35f3d0a04a68000ecbdc555f98b.
129150 [flink-akka.actor.default-dispatcher-22] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received
heartbeat from 78cdd7a4-0c00-4912-992f-a2990a5d46db.
129151 [flink-akka.actor.default-dispatcher-22] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received
new slot report from TaskManager 78cdd7a4-0c00-4912-992f-a2990a5d46db.
129151 [flink-akka.actor.default-dispatcher-22] DEBUG
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received
slot report from instance 4c3e3654c11b09fbbf8e993a08a4c2da.
129200 [flink-akka.actor.default-dispatcher-15] DEBUG
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Release
TaskExecutor 4c3e3654c11b09fbbf8e993a08a4c2da because it exceeded the idle
timeout.
129200 [flink-akka.actor.default-dispatcher-15] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Worker
78cdd7a4-0c00-4912-992f-a2990a5d46db could not be stopped.
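
Because the mail client wrapped each log entry across several lines, a small sketch like the following can rejoin them and split out the fields, which makes it easier to filter for, say, only the SlotManager messages. It assumes the layout seen above (`<millis> [<thread>] <LEVEL> <logger> - <message>`); the function names are hypothetical.

```python
import re

# A new entry starts with the elapsed-milliseconds counter and a thread name.
ENTRY_START = re.compile(r"^\d+ \[")
ENTRY = re.compile(
    r"^(?P<millis>\d+) \[(?P<thread>[^\]]+)\] (?P<level>\w+) "
    r"(?P<logger>\S+) - (?P<message>.*)$"
)


def join_wrapped(lines):
    """Merge continuation lines into the entry they belong to."""
    entries = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if ENTRY_START.match(text) or not entries:
            entries.append(text)
        else:
            entries[-1] += " " + text
    return entries


def parse_entries(lines):
    """Return one dict per entry that matches the assumed layout."""
    parsed = []
    for entry in join_wrapped(lines):
        match = ENTRY.match(entry)
        if match:
            parsed.append(match.groupdict())
    return parsed


def by_logger(entries, suffix):
    """Keep only entries whose logger name ends with the given suffix."""
    return [e for e in entries if e["logger"].endswith(suffix)]
```

For example, `by_logger(parse_entries(open("jobmanager.log")), "SlotManager")` (file name assumed) keeps only the slot report and idle-timeout release entries.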