Ok, I will try.
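
For reference, a minimal sketch of how the options named in the quoted messages below might be set in flink-conf.yaml. The option names come from the thread; the values are illustrative assumptions, not tuned recommendations, and the defaults noted for the backlog and buffer size are from memory of the Flink 1.13 configuration page and worth double-checking there.

```yaml
# flink-conf.yaml (illustrative values only; assumptions, not recommendations)

# Connection attempts retried before failing. Default is 0 (per the thread).
taskmanager.network.retries: 10

# Netty client connect timeout in seconds. 120 is already the default (per the thread).
taskmanager.network.netty.client.connectTimeoutSec: 120

# Netty server connection backlog. Assumed default 0, meaning Netty's own default.
taskmanager.network.netty.server.backlog: 1024

# Netty send/receive buffer size in bytes. Assumed default 0, meaning the system buffer size.
taskmanager.network.netty.sendReceiveBufferSize: 4194304
```

On a standalone cluster the TaskManagers would need a restart to pick up these settings.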

Yingjie Cao <kevin.ying...@gmail.com> wrote on Wed, Jun 16, 2021 at 8:00 PM:
>
> Maybe you can try to increase taskmanager.network.retries,
> taskmanager.network.netty.server.backlog and
> taskmanager.network.netty.sendReceiveBufferSize. These options are useful for
> our jobs.
>
> yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 7:10 PM:
>>
>> Hi, yingjie.
>> If the network is not stable, which config parameter should I adjust?
>>
>> yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 6:56 PM:
>> >
>> > 2: I use G1, and no full GC occurred; young GC count: 422, time:
>> > 142892, so it is not bad.
>> > 3: Stream job.
>> > 4: I will try to configure taskmanager.network.retries, which defaults
>> > to 0, and taskmanager.network.netty.client.connectTimeoutSec, which
>> > defaults to 120s.
>> > 5: I checked the number of network fds of the taskmanager; it is about
>> > 1000+, so I think it is a reasonable value.
>> >
>> > 1: Cannot be sure.
>> >
>> > Yingjie Cao <kevin.ying...@gmail.com> wrote on Wed, Jun 16, 2021 at 4:34 PM:
>> > >
>> > > Hi yidan,
>> > >
>> > > 1. Is the network stable?
>> > > 2. Is there any GC problem?
>> > > 3. Is it a batch job? If so, please use sort-shuffle, see [1] for more
>> > > information.
>> > > 4. You may try to configure these two options: taskmanager.network.retries,
>> > > taskmanager.network.netty.client.connectTimeoutSec. More relevant
>> > > options can be found in the 'Data Transport Network Stack' section of [2].
>> > > 5. If it is none of the above, it may be related to [3]; you may need
>> > > to check the number of TCP connections per TM and node.
>> > >
>> > > Hope this helps.
>> > >
>> > > [1]
>> > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/batch/blocking_shuffle/
>> > > [2]
>> > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/
>> > > [3] https://issues.apache.org/jira/browse/FLINK-22643
>> > >
>> > > Best,
>> > > Yingjie
>> > >
>> > > yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 3:36 PM:
>> > >>
>> > >> Attachment is the exception stack from Flink's web UI. Has anyone
>> > >> else met this problem?
>> > >>
>> > >> Flink1.12 - Flink1.13.1. Standalone cluster, including 30 containers,
>> > >> each 28G mem.