Thanks for the reply, Till! By the way, if Flink is going to support compatibility with Hadoop 2.6, I don't see another way to achieve it. As I mentioned before, one of the popular distributions, Cloudera, is still based on Hadoop 2.6, and it would be very sad if Flink dropped support for it. I really want to help the Flink community support this legacy version. But currently I see only one way to achieve it: emulate the 'truncate' logic by recreating a new file with the needed length and replacing the old one.
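To make the idea concrete, the copy-and-replace truncate emulation could look roughly like the sketch below. This is only an illustration using `java.nio` on a local file system; the class and method names are made up, and a real fix inside `HadoopRecoverableFsDataOutputStream` would go through Hadoop's `FileSystem` API instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class TruncateEmulation {

    // Emulate truncate(file, length) on a file system without a native
    // truncate: copy the first `length` bytes to a temporary sibling file
    // and rename it over the original. Note this reads the whole file into
    // memory, which is fine for a sketch but not for large HDFS files.
    public static void truncateByCopy(Path file, long length) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (length > all.length) {
            throw new IOException("Cannot truncate beyond current length");
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".truncating");
        Files.write(tmp, Arrays.copyOf(all, (int) length));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Small self-contained demo: truncate "hello world" down to 5 bytes.
    public static String demo() {
        try {
            Path f = Files.createTempFile("truncate-demo", ".bin");
            Files.write(f, "hello world".getBytes());
            truncateByCopy(f, 5);
            String result = new String(Files.readAllBytes(f));
            Files.delete(f);
            return result;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello"
    }
}
```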
Cheers,
Artsem

On Tue, 21 Aug 2018 at 14:41, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Artsem,
>
> If I recall correctly, we explicitly decided not to support the
> valid-length files with the new StreamingFileSink because they are really
> hard for the user to handle. I've pulled Klou into this conversation; he
> is more knowledgeable and can give you a bit more advice.
>
> Cheers,
> Till
>
> On Mon, Aug 20, 2018 at 2:53 PM Artsem Semianenka <artfulonl...@gmail.com>
> wrote:
>
> > I have an idea to create a new version of the
> > HadoopRecoverableFsDataOutputStream class (for example, named
> > LegacyHadoopRecoverableFsDataOutputStream :) ) which would work with
> > valid-length files without invoking truncate, and to modify the check in
> > HadoopRecoverableWriter to use LegacyHadoopRecoverableFsDataOutputStream
> > when the Hadoop version is lower than 2.7. I will try to provide a PR
> > soon if there are no objections. I hope I am on the right track.
> >
> > On Mon, 20 Aug 2018 at 14:40, Artsem Semianenka <artfulonl...@gmail.com>
> > wrote:
> >
> > > Hi guys!
> > > I have a question regarding the new StreamingFileSink (introduced in
> > > version 1.6). We use this sink to write data in Parquet format, but I
> > > ran into an issue when trying to run the job on a YARN cluster and
> > > save the result to HDFS. In our case we use the latest Cloudera
> > > distribution (CDH 5.15), and it contains HDFS 2.6.0. This version does
> > > not support the truncate method. I would like to create a pull
> > > request, but I want to ask your advice on how best to design this fix
> > > and which ideas are behind this decision. I saw a similar PR for
> > > BucketingSink: https://github.com/apache/flink/pull/6108. Maybe I
> > > could also add support for valid-length files for older Hadoop
> > > versions?
> > >
> > > P.S. Unfortunately, CDH 5.15 (with Hadoop 2.6) is the latest version
> > > of the Cloudera distribution and we can't upgrade to Hadoop 2.7.
> > >
> > > Best regards,
> > > Artsem
> > >
> > > --
> > > Best regards,
> > > Артем Семененко
>
> --
> Best regards,
> Артем Семененко

--
Best regards,
Артем Семененко
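The version gate discussed above (falling back to a legacy writer when the cluster is older than Hadoop 2.7, where `truncate` first appeared via HDFS-3107) could be sketched as below. This is not Flink's actual check in HadoopRecoverableWriter; the class name and parsing are illustrative only, assuming version strings such as "2.6.0-cdh5.15.0".

```java
public class HadoopVersionCheck {

    // Extract the major and minor components from a Hadoop version string
    // such as "2.6.0-cdh5.15.0" or "2.7.3".
    static int[] majorMinor(String version) {
        String[] parts = version.split("[.-]");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // Native HDFS truncate() exists from Hadoop 2.7 onward; older clusters
    // would need a valid-length / copy-based fallback.
    public static boolean supportsTruncate(String hadoopVersion) {
        int[] v = majorMinor(hadoopVersion);
        return v[0] > 2 || (v[0] == 2 && v[1] >= 7);
    }

    public static void main(String[] args) {
        System.out.println(supportsTruncate("2.6.0-cdh5.15.0")); // prints "false"
        System.out.println(supportsTruncate("2.7.3"));           // prints "true"
    }
}
```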