Hi Gary,
Our Hadoop version is quite old and does not support the truncate operation. Some people are working on upgrading it. I just think the ".valid-length" file is not friendly enough for users. At the very least, maybe it should be configurable whether the ".valid-length" file is used.

Regards,
Xinyu Zhang

------------------ Original Message ------------------
From: "Gary Yao" <g...@data-artisans.com>
Sent: Tuesday, May 15, 2018, 3:31 PM
To: "dev" <dev@flink.apache.org>
Cc: "Xinyu Zhang" <342689...@qq.com>
Subject: Re: Rewriting a new file instead of writing a ".valid-length" file in BucketingSink when restoring

Hi,

The BucketingSink truncates the file if the Hadoop FileSystem supports this
operation (Hadoop 2.7 and above) [1]. What version of Hadoop are you using?

Best,
Gary

[1] https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301

On Mon, May 14, 2018 at 1:37 PM, Xinyu Zhang <342689...@qq.com> wrote:

> Hi
>
> I'm trying to copy data from Kafka to HDFS. The data in HDFS is then used
> by other people for further computations in MapReduce.
> If some tasks fail, a ".valid-length" file is created on older Hadoop
> versions. The problem is that other people must know how to deal with the
> ".valid-length" file; otherwise, the data may not be exactly-once.
> Hence, why not rewrite a new file when restoring instead of writing a
> ".valid-length" file? In this way, others who use the data in HDFS don't
> need to know how to deal with the ".valid-length" file.
>
> Thanks!
>
> Zhang Xinyu
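
For reference, the sink decides between the two behaviors at runtime. Below is a minimal sketch, in the spirit of the reflective lookup in the BucketingSink code linked above, of how one can check whether the Hadoop FileSystem on the classpath exposes truncate(). TruncateProbe is a hypothetical helper name, not Flink API:

import java.lang.reflect.Method;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class TruncateProbe {

    /** True if the concrete FileSystem exposes truncate(Path, long), added in Hadoop 2.7. */
    public static boolean supportsTruncate(FileSystem fs) {
        try {
            fs.getClass().getMethod("truncate", Path.class, long.class);
            return true;
        } catch (NoSuchMethodException e) {
            // Hadoop < 2.7: the sink cannot truncate on restore and writes
            // ".valid-length" files instead.
            return false;
        }
    }
}

If this returns false for your cluster's FileSystem, upgrading to Hadoop 2.7+ is what makes the ".valid-length" files go away.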
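
Until such an upgrade lands, downstream jobs can honor the sidecar file themselves. A minimal sketch, assuming the sink's default naming ("_" prefix, ".valid-length" suffix) and that the length is written with writeUTF(Long.toString(...)) as in the 1.x BucketingSink; verify both against the Flink version you run. ValidLengthReader is a hypothetical helper, not Flink API:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch: read a BucketingSink part file while honoring an accompanying
 * ".valid-length" sidecar file. Bytes past the recorded length stem from
 * an uncheckpointed write and must be ignored.
 */
public final class ValidLengthReader {

    // Assumed BucketingSink defaults; adjust if the job configures
    // setValidLengthPrefix/Suffix differently.
    private static final String PREFIX = "_";
    private static final String SUFFIX = ".valid-length";

    /** Number of valid bytes in partFile, or the full length if no sidecar exists. */
    public static long validLength(FileSystem fs, Path partFile) throws IOException {
        Path sidecar = new Path(partFile.getParent(), PREFIX + partFile.getName() + SUFFIX);
        if (!fs.exists(sidecar)) {
            return fs.getFileStatus(partFile).getLen();
        }
        try (DataInputStream in = new DataInputStream(fs.open(sidecar))) {
            // Assumption: the sink wrote the length via writeUTF(Long.toString(len)).
            return Long.parseLong(in.readUTF());
        }
    }

    /** Copy only the valid bytes of partFile to out. */
    public static void copyValidBytes(FileSystem fs, Path partFile, OutputStream out)
            throws IOException {
        long remaining = validLength(fs, partFile);
        byte[] buf = new byte[8192];
        try (InputStream in = fs.open(partFile)) {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // part file shorter than recorded length
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }
}

A MapReduce input format could call validLength() per split to clamp how far it reads, which is exactly the knowledge burden on consumers that rewriting the file on restore would remove.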