Hi Vishal, I've come across the same problem. The root cause is that, by default, HDFS does not update the file length visible at the NameNode until the output stream is properly closed. I modified the writer to update the file length on each flush, but that comes with some overhead, so it should only be used when strong consistency is required.
I've just filed a ticket [1], please take a look.

[1] https://issues.apache.org/jira/browse/FLINK-12022

Best,
Paul Lam

> On Mar 12, 2019, at 09:24, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
> This seems strange. When I pull ( copyToLocal ) the part file to the local
> FS, it has the same length as reported by the length file. The FileStatus
> from Hadoop seems to have a wrong length.
> This seems to be true for all these types of discrepancies. Could it be that
> the block information did not get updated?
>
> Either way, I am wondering whether the recovery ( the one that does a truncate )
> needs to account for the length in the length file or the length reported by
> the FileStatus?
>
>
> On Thu, Mar 7, 2019 at 5:00 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
> Hello folks,
> I have Flink 1.7.2 working with Hadoop 2.6, and because there
> is no built-in truncate ( in Hadoop 2.6 ) I am writing a method to clean up (
> truncate ) part files based on the length in the valid-length files dropped
> by Flink during restore. I see something very strange:
>
> hadoop fs -cat hdfs://n*********/*******/dt=2019-03-07/_part-9-0.valid-length
> 1765887805
>
> hadoop fs -ls hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-03-07/part-9-0
> -rw-r--r-- 3 root hadoop 1280845815 2019-03-07 16:00 hdfs://**********/dt=2019-03-07/part-9-0
>
> I see the valid-length file reporting a larger length than the part file
> itself.
>
> Any clue why that would be the case?
>
> Regards.
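For what it's worth, the truncate-to-valid-length cleanup described above can be sketched roughly as below. This is only a minimal local-filesystem illustration of the logic (assuming the valid-length file holds a single decimal byte count), not the actual HDFS job: on Hadoop 2.6 there is no FileSystem.truncate() (it arrived in 2.7), so a real cleanup would have to copy the first valid_len bytes to a new file and rename it over the original. The function name and signature here are hypothetical.

```python
import os

def truncate_to_valid_length(part_path: str, length_path: str) -> int:
    """Truncate a part file down to the byte count recorded in its
    .valid-length companion file, and return the resulting size.

    Note: a stale directory listing can report a length *smaller* than
    the valid length (the NameNode only learns the true length on
    close/hsync), so only truncate when the file on disk is actually
    longer than the recorded valid length.
    """
    with open(length_path, "r") as f:
        valid_len = int(f.read().strip())
    actual = os.path.getsize(part_path)
    if actual > valid_len:
        with open(part_path, "r+b") as f:
            f.truncate(valid_len)
    return os.path.getsize(part_path)
```

If the file on disk is already at or below the valid length (the situation in the listing above), the sketch deliberately leaves it untouched, since the missing bytes may simply not be reflected in the metadata yet.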