Hi everyone
We have a Flink job that writes files to different directories on HDFS. Because of its high parallelism it keeps a large number of files open, and I also found that with the RocksDB state backend even more files are open during checkpointing. We use YARN to schedule the Flink job, but YARN keeps scheduling the TaskManagers onto the same machine and I cannot control it, so that datanode comes under very heavy pressure and constantly throws a "bad link" error. We have already increased the xcievers limit of HDFS (dfs.datanode.max.transfer.threads) to 16384.
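In case it helps to see the shape of the job, here is a minimal sketch of the kind of file sink I mean (the StreamingFileSink, the socket source, the paths, the bucket assigner and the parallelism below are only illustrative, not our actual code):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

public class HdfsSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // With high parallelism, every subtask keeps its own in-progress part file
        // open in each active bucket (directory) at the same time.
        env.setParallelism(64);

        DataStream<String> records = env.socketTextStream("localhost", 9999);

        // Records are routed into different HDFS directories by the bucket assigner,
        // so the number of simultaneously open files is roughly
        // parallelism * number of active buckets.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs:///data/output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .withBucketAssigner(new DateTimeBucketAssigner<>())
                .build();

        records.addSink(sink);
        env.execute("hdfs-sink-sketch");
    }
}
```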
Does anyone have an idea how to solve this problem, either by reducing the number of open files or by controlling the YARN scheduling so that the TaskManagers are placed on different machines?
Thank you very much!
Regards,
Shengnan