Hi Bill,

Can you provide more information? For example: is checkpointing enabled, is exactly-once mode configured, and does the Flink web UI show any backpressure? Here is a JIRA ticket that reports the same problem [1]. The same question has also been asked on Stack Overflow, but I am not sure whether the answer there is valid [2].
[1]: https://issues.apache.org/jira/browse/FLINK-9054
[2]: https://stackoverflow.com/questions/48276484/flink-throwing-java-lang-runtimeexception-buffer-pool-is-destroyed

Thanks,
vino.

杨力 <bill.le...@gmail.com> wrote on Fri, Sep 7, 2018 at 1:09 PM:

> Hi all,
> I am encountering a weird problem when running Flink 1.6 in YARN per-job
> clusters.
> The job fails about half an hour after it starts. The related log is
> attached as an image.
>
> This piece of log comes from one of the taskmanagers. There are no
> other related log lines and no ERROR-level logs. The job just runs for
> tens of minutes without printing any logs and then suddenly throws
> this exception.
>
> It is reproducible in my production environment, but not in my test
> environment.
> The 'Buffer pool is destroyed' exception is always thrown while emitting
> latency markers.
>