Ok, I will try.
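
As a minimal sketch, the options suggested in the quoted reply below could be set in flink-conf.yaml roughly like this. The option names come from the thread; every value is an illustrative assumption, not a recommendation from anyone in this thread, and should be tuned per cluster.

```yaml
# flink-conf.yaml fragment (all values are assumed examples, not tested settings)

# Number of connection attempt retries; the thread notes the default is 0.
taskmanager.network.retries: 3

# Netty server connection backlog (assumed example value).
taskmanager.network.netty.server.backlog: 1024

# Netty send/receive buffer size, assumed here to be in bytes (4 MiB).
taskmanager.network.netty.sendReceiveBufferSize: 4194304

# Client connect timeout in seconds; the thread notes the default is 120.
taskmanager.network.netty.client.connectTimeoutSec: 300
```

The TaskManagers most likely need a restart for these settings to take effect.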

Yingjie Cao <kevin.ying...@gmail.com> wrote on Wed, Jun 16, 2021 at 8:00 PM:
>
> Maybe you can try increasing taskmanager.network.retries,
> taskmanager.network.netty.server.backlog and
> taskmanager.network.netty.sendReceiveBufferSize. These options have been
> useful for our jobs.
>
> yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 7:10 PM:
>>
>> Hi Yingjie,
>> If the network is not stable, which config parameters should I adjust?
>>
>> yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 6:56 PM:
>> >
>> > 2: I use G1, and no full GC has occurred. Young GC count: 422, time:
>> > 142892, so it is not bad.
>> > 3: It is a streaming job.
>> > 4: I will try configuring taskmanager.network.retries, whose default is
>> > 0, and taskmanager.network.netty.client.connectTimeoutSec, whose
>> > default is 120s.
>> > 5: I checked the number of network fds of the taskmanager; it is 1000+,
>> > so I think it is a reasonable value.
>> >
>> > 1: I cannot be sure.
>> >
>> > Yingjie Cao <kevin.ying...@gmail.com> wrote on Wed, Jun 16, 2021 at 4:34 PM:
>> > >
>> > > Hi yidan,
>> > >
>> > > 1. Is the network stable?
>> > > 2. Is there any GC problem?
>> > > 3. Is it a batch job? If so, please use sort-shuffle, see [1] for more 
>> > > information.
>> > > 4. You may try configuring these two options: taskmanager.network.retries,
>> > > taskmanager.network.netty.client.connectTimeoutSec. More relevant
>> > > options can be found in the 'Data Transport Network Stack' section of [2].
>> > > 5. If it is none of the above cases, it may be related to [3]; you may
>> > > need to check the number of TCP connections per TM and per node.
>> > >
>> > > Hope this helps.
>> > >
>> > > [1] 
>> > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/batch/blocking_shuffle/
>> > > [2] 
>> > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/
>> > > [3] https://issues.apache.org/jira/browse/FLINK-22643
>> > >
>> > > Best,
>> > > Yingjie
>> > >
>> > > yidan zhao <hinobl...@gmail.com> wrote on Wed, Jun 16, 2021 at 3:36 PM:
>> > >>
>> > >> Attached is the exception stack from Flink's web UI. Has anyone
>> > >> else run into this problem?
>> > >>
>> > >> Flink 1.12 - Flink 1.13.1. Standalone cluster with 30 containers,
>> > >> each with 28G of memory.
