Hi
If you really do need to upload that many files to HDFS, then there is
currently no way to limit the number of open files. There is an issue[1]
that aims to fix this problem, with a PR attached to it; maybe you can try
that PR and see whether it solves your problem.

[1] https://issues.apache.org/jira/browse/FLINK-11937
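
In the meantime, a possible partial workaround (just a rough sketch under
my own assumptions, not the fix from the PR) is to turn off incremental
checkpoints for the RocksDB backend, so each checkpoint writes fewer,
larger files to HDFS at the cost of re-uploading more state every time.
The checkpoint path below is a placeholder:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public final class CheckpointConfigSketch {

    /** Configures RocksDB checkpointing on the given environment. */
    public static void configure(StreamExecutionEnvironment env) throws Exception {
        // Second constructor argument toggles incremental checkpoints:
        //   true  -> RocksDB SST files are uploaded as separate files on HDFS
        //            (many small files open during checkpointing)
        //   false -> full snapshots: fewer, larger files per checkpoint
        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints", false); // placeholder path

        env.setStateBackend(backend);
        env.enableCheckpointing(60_000L); // checkpoint every 60 seconds
    }
}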
Best,
Congxian


ysnakie <ysna...@hotmail.com> wrote on Fri, Apr 24, 2020 at 11:30 PM:

> Hi everyone
> We have a Flink job that writes files to different directories on HDFS. It
> opens many files because of its high parallelism. I also found that when
> using the RocksDB state backend, even more files are open during
> checkpointing. We use YARN to schedule the Flink job, but YARN keeps
> scheduling the TaskManagers onto the same machine and I cannot control it,
> so the datanode comes under very high pressure and keeps throwing a "bad
> link" error. We have already increased the xcievers limit of HDFS to 16384.
>
> Any ideas on how to solve this problem? Either reduce the number of open
> files, or control the YARN scheduling to put TaskManagers on different
> machines!
>
> Thank you very much!
> regards
>
> Shengnan
>
>
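
As a side note for anyone hitting the same problem: the "xcievers" limit
mentioned above corresponds to dfs.datanode.max.transfer.threads (formerly
dfs.datanode.max.xcievers) in hdfs-site.xml. On the Flink side, one knob
that can keep the number of simultaneously open part files somewhat lower
is the sink's rolling policy, in particular the inactivity interval, which
closes part files for buckets that stop receiving data. Below is a minimal
sketch for a row-format StreamingFileSink; the base path, thresholds and
encoder are illustrative assumptions, not taken from the job described
above:

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public final class HdfsSinkSketch {

    /** Builds a sink whose part files are rolled and closed relatively aggressively. */
    public static StreamingFileSink<String> buildSink() {
        return StreamingFileSink
                .forRowFormat(new Path("hdfs:///data/output"),            // placeholder base path
                              new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(
                        DefaultRollingPolicy.builder()
                                // roll (and close) a part file at least every 10 minutes
                                .withRolloverInterval(TimeUnit.MINUTES.toMillis(10))
                                // close part files that received no records for 2 minutes
                                .withInactivityInterval(TimeUnit.MINUTES.toMillis(2))
                                // or once a part file reaches 128 MB
                                .withMaxPartSize(128L * 1024L * 1024L)
                                .build())
                .build();
    }
}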
