Thank you for your email. Would it then be correct to assume that this situation (the part file's reported length, i.e. the valid-length, being greater than the part file size reported by FileStatus on the NN) is attributable only to this edge case? Or do you see a case where, even though the above is true, the part file would still need truncation as and when the FileStatus on the NN catches up?
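For reference, here is a rough diagnostic sketch of the comparison I have in mind, assuming a plain Hadoop FileSystem client (class and method names are only illustrative, not from the Flink code): it compares the length recorded on the NameNode against the number of bytes actually readable from the open stream, which can differ when the last block's length was never synced back to the NN.

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LengthCheck {

    // Compare the NN-reported length with the bytes actually readable from the stream.
    public static void compare(FileSystem fs, Path part) throws Exception {
        long nnLength = fs.getFileStatus(part).getLen(); // length as recorded on the NameNode

        long readable = 0;
        byte[] buf = new byte[64 * 1024];
        try (FSDataInputStream in = fs.open(part)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                readable += n; // bytes actually served when reading the file end to end
            }
        }

        System.out.printf("NN length=%d, readable length=%d%n", nnLength, readable);
    }
}
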
On Tue, Mar 26, 2019 at 9:10 AM Paul Lam <paullin3...@gmail.com> wrote:

> Hi Vishal,
>
> I’ve come across the same problem. The problem is that by default the file
> length is not updated when the output stream is not closed properly.
> I modified the writer to update file lengths on each flush, but it comes
> with some overhead, so this approach should be used when strong consistency
> is required.
>
> I’ve just filed a ticket [1], please take a look.
>
> [1] https://issues.apache.org/jira/browse/FLINK-12022
>
> Best,
> Paul Lam
>
> On Mar 12, 2019, at 09:24, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
> This seems strange. When I pull (copyToLocal) the part file to the local
> FS, it has the same length as reported by the length file. The FileStatus
> from Hadoop seems to have a wrong length.
> This seems to be true for all discrepancies of this type. It might be
> that the block information did not get updated?
>
> Either way, I am wondering whether the recovery (the one that does a
> truncate) needs to account for the length in the length file or the length
> reported by the FileStatus?
>
>
> On Thu, Mar 7, 2019 at 5:00 PM Vishal Santoshi <vishal.santo...@gmail.com>
> wrote:
>
>> Hello folks,
>> I have Flink 1.7.2 working with Hadoop 2.6, and because there is no
>> built-in truncate (in Hadoop 2.6) I am writing a method to clean up
>> (truncate) part files based on the length in the valid-length files
>> dropped by Flink during restore. I see something very strange:
>>
>> hadoop fs -cat
>> hdfs://n*********/*******/dt=2019-03-07/_part-9-0.valid-length
>>
>> *1765887805*
>>
>>
>> hadoop fs -ls
>> hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-03-07/part-9-0
>>
>> -rw-r--r-- 3 root hadoop *1280845815* 2019-03-07 16:00
>> hdfs://**********/dt=2019-03-07/part-9-0
>>
>> I see the valid-length file reporting a larger length than the part file
>> itself.
>>
>> Any clue why that would be the case?
>>
>> Regards.
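In case it helps the discussion, below is a minimal sketch of the cleanup method I described, assuming Hadoop 2.6 (no FileSystem.truncate()): it reads the target length from the valid-length file, copies that many bytes of the part file into a temporary file, and renames it over the original. Class, method, and path names are illustrative, not the exact code.

import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValidLengthTruncate {

    // Emulates truncate on Hadoop 2.6 by copying the first validLength bytes
    // of the part file into a temp file and swapping it in.
    public static void truncateToValidLength(FileSystem fs, Path part, Path validLengthFile)
            throws IOException {

        // The valid-length file contains a single number: the target length.
        long validLength;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(validLengthFile), StandardCharsets.UTF_8))) {
            validLength = Long.parseLong(reader.readLine().trim());
        }

        Path tmp = new Path(part.getParent(), part.getName() + ".truncating");
        byte[] buf = new byte[64 * 1024];
        long remaining = validLength;
        try (FSDataInputStream in = fs.open(part);
             FSDataOutputStream out = fs.create(tmp, true)) {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    // Fewer readable bytes than the valid-length file claims.
                    throw new EOFException("part file shorter than valid-length: " + part);
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }

        // Replace the original part file with the truncated copy.
        if (!fs.delete(part, false) || !fs.rename(tmp, part)) {
            throw new IOException("failed to swap truncated copy for " + part);
        }
    }
}

The sketch takes the number in the valid-length file as the target length, i.e. it trusts the length file over FileStatus, which is precisely the choice being discussed above.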