Hi Till,
Thanks for your suggestion. A small tool could run lightly and asynchronously. However, I don't know when others will use the data, so the tool would have to check for and truncate each finished file as soon as its valid-length file appears. I think that would be hard to maintain, and users shouldn't have to maintain it (just as the current BucketingSink implementation takes care of truncation itself).

Regards,
Zhang Xinyu

------------------ Original Message ------------------
From: "Till Rohrmann"<trohrm...@apache.org>;
Date: Tuesday, May 15, 2018, 11:27 PM
To: "dev"<dev@flink.apache.org>;
Cc: "kkloudas"<kklou...@apache.org>;
Subject: Re: Re: Rewriting a new file instead of writing a ".valid-length" file in BucketSink when restoring

Hi Xinyu,

would it help to have a small tool which can truncate the finished files
which have a valid-length file associated? That way, one could use this
tool before others use the data farther downstream.

Cheers,
Till

On Tue, May 15, 2018 at 3:05 PM, Xinyu Zhang <342689...@qq.com> wrote:

> Yes, I'm glad to do it, but I'm not sure writing a new file is a good
> solution. So I want to discuss it here.
> Do you have any ideas? @Kostas
>
> ------------------ Original Message ------------------
> From: "twalthr"<twal...@apache.org>;
> Date: Tuesday, May 15, 2018, 8:21 PM
> To: "Xinyu Zhang"<342689...@qq.com>;
> Cc: "dev"<dev@flink.apache.org>; "kkloudas"<kklou...@apache.org>;
> Subject: Re: Re: Rewriting a new file instead of writing a ".valid-length"
> file in BucketSink when restoring
>
> As far as I know, the bucketing sink is currently also limited by
> relying on Hadoop's file system abstraction. It is planned to switch to
> Flink's file system abstraction, which might also improve this situation.
> Kostas (in CC) might know more about it.
>
> But I think we can discuss whether another behavior should be
> configurable as well. Would you be willing to contribute?
>
> Regards,
> Timo
>
> Am 15.05.18 um 14:01 schrieb Xinyu Zhang:
> > Thanks for your reply.
> > Indeed, if a file is very large, it will take a long time. However,
> > the ".valid-length" file is not convenient for others who use the
> > data in HDFS.
> > Maybe we should provide a configuration for users to choose which
> > strategy they prefer.
> > Do you have any ideas?
> >
> > ------------------ Original Message ------------------
> > *From:* "Timo Walther"<twal...@apache.org>;
> > *Date:* Tuesday, May 15, 2018, 7:30 PM
> > *To:* "dev"<dev@flink.apache.org>;
> > *Subject:* Re: Rewriting a new file instead of writing a ".valid-length"
> > file in BucketSink when restoring
> >
> > I guess writing a new file would take much longer than just using the
> > .valid-length file, especially if the files are very large. The
> > restoring time should be as short as possible to ensure little
> > downtime on restarts.
> >
> > Regards,
> > Timo
> >
> > Am 15.05.18 um 09:31 schrieb Gary Yao:
> > > Hi,
> > >
> > > The BucketingSink truncates the file if the Hadoop FileSystem
> > > supports this operation (Hadoop 2.7 and above) [1]. What version
> > > of Hadoop are you using?
> > >
> > > Best,
> > > Gary
> > >
> > > [1]
> > > https://github.com/apache/flink/blob/bcd028d75b0e5c5c691e24640a2196b2fdaf85e0/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L301
> > >
> > > On Mon, May 14, 2018 at 1:37 PM, Xinyu Zhang <342689...@qq.com> wrote:
> > >
> > >> Hi
> > >>
> > >> I'm trying to copy data from Kafka to HDFS. The data in HDFS is
> > >> used by others to do further computations in map/reduce.
> > >> If some tasks fail, a ".valid-length" file is created on older
> > >> Hadoop versions. The problem is that other people must know how to
> > >> deal with the ".valid-length" file; otherwise, the data may not be
> > >> exactly-once.
> > >> Hence, why not rewrite a new file when restoring instead of writing
> > >> a ".valid-length" file? In this way, others who use the data in
> > >> HDFS don't need to know how to deal with the ".valid-length" file.
> > >>
> > >> Thanks!
> > >>
> > >> Zhang Xinyu
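
P.S. To make the trade-off concrete, here is a rough sketch of the kind of truncation tool discussed at the top of this thread. It is only an illustration under several assumptions, not what BucketingSink or any existing tool does: it works on local files through java.nio instead of Hadoop's FileSystem, it assumes the marker is named "_" + <file name> + ".valid-length" and sits next to the data file, and it assumes the marker stores the valid length as plain decimal text (the real sink may encode it differently). The class name ValidLengthTruncator and its methods are made up for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; names and marker format are assumptions, see the note above.
final class ValidLengthTruncator {

    // Assumed naming scheme: "_" + <data file name> + ".valid-length", same directory.
    static final String PREFIX = "_";
    static final String SUFFIX = ".valid-length";

    /** Returns the (assumed) marker path that belongs to a data file. */
    static Path markerFor(Path dataFile) {
        return dataFile.resolveSibling(PREFIX + dataFile.getFileName() + SUFFIX);
    }

    /**
     * Truncates dataFile to the length recorded in its marker and then deletes the marker.
     * Returns true if a marker was found and processed, false if there was none.
     */
    static boolean truncateIfMarked(Path dataFile) throws IOException {
        Path marker = markerFor(dataFile);
        if (!Files.exists(marker)) {
            return false;
        }
        // Assumption: the marker holds the valid length as plain decimal text.
        String text = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
        long validLength = Long.parseLong(text);
        try (FileChannel channel = FileChannel.open(dataFile, StandardOpenOption.WRITE)) {
            if (validLength < 0 || validLength > channel.size()) {
                throw new IOException("Implausible valid length " + validLength + " for " + dataFile);
            }
            channel.truncate(validLength);
        }
        // Delete the marker only after truncation succeeded, so a crash in between
        // leaves the marker in place and the next run simply repeats the truncation.
        Files.delete(marker);
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path dataFile = Paths.get(arg);
            System.out.println(dataFile + (truncateIfMarked(dataFile) ? ": truncated" : ": no marker"));
        }
    }
}
```

Even in this simple form, the tool has to run before any reader touches the file, which is exactly the coordination problem I described above.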