Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

Yingjie Cao Wed, 16 Jun 2021 01:34:43 -0700

Hi yidan,

1. Is the network stable?
2. Is there any GC problem?
3. Is it a batch job? If so, please try sort-shuffle; see [1] for more
information.
4. You may try tuning these two options: taskmanager.network.retries and
taskmanager.network.netty.client.connectTimeoutSec. More relevant options
can be found in the 'Data Transport Network Stack' section of [2].
5. If none of the above applies, it may be related to [3]; you may need to
check the number of TCP connections per TM and per node.
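
As a rough sketch of points 3 and 4, the options could be set in
flink-conf.yaml along these lines. The values are only illustrative
assumptions, not tuned recommendations, and the sort-shuffle key is taken
from the 1.13 docs in [1]; please check the defaults for your version
before copying anything.

```yaml
# Point 4: retry failed network connections and allow more time to connect.
# The values below are illustrative assumptions, not recommendations.
taskmanager.network.retries: 3
taskmanager.network.netty.client.connectTimeoutSec: 300

# Point 3 (batch jobs only): use sort-shuffle for blocking shuffles.
# Assumption: in 1.13, setting the minimum parallelism to 1 enables it
# for all blocking shuffles, as described in [1].
taskmanager.network.sort-shuffle.min-parallelism: 1
```

The TaskManagers need to be restarted for these settings to take effect.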


Hope this helps.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/batch/blocking_shuffle/
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/
[3] https://issues.apache.org/jira/browse/FLINK-22643

Best,
Yingjie

yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 3:36 PM:

> The attachment is the exception stack from Flink's web UI. Has anyone
> else met this problem?
>
> Flink 1.12 - Flink 1.13.1. Standalone cluster with 30 containers,
> each with 28G of memory.
>
