Hi,

I don't think the problem shown in the attached image is the root cause of your
job failure. There must be other task or TaskManager failures, after which the
JobManager cancels all related tasks; the exception in the attached image is
just a side effect of that cancellation.

You can review the JobManager log to check whether any failure caused the
whole job to fail.
FYI, a TaskManager may be killed by YARN for exceeding its memory limit. You
mentioned that the job fails about half an hour after it starts, so I guess it
is possible that a TaskManager was killed by YARN.
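
As a rough sketch of that check (not an official tool), the snippet below scans
a log file for lines that usually indicate a real failure rather than a
cancellation. The patterns are assumptions about typical Flink and YARN log
wording, and find_failure_lines and scan_file are hypothetical helper names, so
adjust them to what your logs actually contain.

```python
import re

# Assumed patterns; the exact wording may differ between Flink/YARN versions.
FAILURE_PATTERNS = [
    re.compile(r"switched from \w+ to FAILED"),                    # task state change
    re.compile(r"running beyond (physical|virtual) memory limits"),  # YARN memory kill
    re.compile(r"Killing container"),
]


def find_failure_lines(lines):
    """Return (line_number, text) pairs for lines matching any failure pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Print every suspicious line of the log file at `path` with its line number."""
    with open(path, encoding="utf-8", errors="replace") as log_file:
        for number, text in find_failure_lines(log_file):
            print(f"{number}: {text}")
```

Call scan_file with the path to the JobManager log (or to the aggregated YARN
container logs). The earliest hit is usually the one worth reading in full.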

Best,
Zhijiang
------------------------------------------------------------------
From: 杨力 <bill.le...@gmail.com>
Sent: Friday, September 7, 2018 13:09
To: user <user@flink.apache.org>
Subject: Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed.

Hi all,
I am encountering a weird problem when running Flink 1.6 in YARN per-job
clusters.
The job fails about half an hour after it starts. The related log is attached
as an image.

This piece of log comes from one of the TaskManagers. There are no other
related log lines and no ERROR-level logs. The job just runs for tens of
minutes without printing any logs and then suddenly throws this exception.

It is reproducible in my production environment, but not in my test environment.
The 'Buffer pool is destroyed' exception is always thrown while emitting a
latency marker.
