Hi Vishal,

I’ve come across the same problem. By default, HDFS does not update a file’s 
length on the NameNode until the output stream is properly closed, so the 
length reported for a file whose stream was not closed cleanly can lag behind 
the bytes actually written. I modified the writer to update the file length 
on each flush, but that comes with some overhead, so this approach should 
only be used when strong consistency is required.
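
For reference, a minimal sketch of a flush-time length update, assuming the 
underlying stream is an HDFS stream (the wrapper method is hypothetical and 
this is not necessarily the exact change in the ticket; hsync with 
UPDATE_LENGTH is the Hadoop client API that asks the NameNode to refresh the 
visible length):

import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

// Hypothetical helper: flush the stream and, when it is an HDFS stream,
// also update the file length visible on the NameNode.
static void flushWithLengthUpdate(FSDataOutputStream out) throws IOException {
    if (out instanceof HdfsDataOutputStream) {
        // Persists the data and updates the length seen by getFileStatus().
        // The extra NameNode round trip is where the overhead comes from.
        ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
    } else {
        // Non-HDFS streams: plain hsync; the reported length may still lag
        // until the stream is closed.
        out.hsync();
    }
}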

I’ve just filed a ticket [1], please take a look.

[1] https://issues.apache.org/jira/browse/FLINK-12022

Best,
Paul Lam

> On Mar 12, 2019, at 09:24, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
> 
> This seems strange.  When I pull ( copyToLocal ) the part file to the local 
> FS, it has the same length as reported by the length file, but the FileStatus 
> from Hadoop reports a different, wrong length. 
> This seems to be true for all discrepancies of this type. Could it be that 
> the block information did not get updated ? 
> 
> Either way, I am wondering whether the recovery ( the one that does a truncate ) 
> should use the length in the length file or the length reported by 
> the FileStatus ? 
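
On that point, HDFS can report two different lengths for a file that is open 
or was not closed cleanly: getFileStatus().getLen() may not include bytes in 
the last, still-under-construction block, while 
HdfsDataInputStream.getVisibleLength() does. A minimal sketch of comparing 
the two, assuming the path is on HDFS (the helper name is mine):

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

// Hypothetical helper: compare the NameNode-reported length with the
// visible length that includes the in-progress last block.
static long visibleLength(FileSystem fs, Path path) throws IOException {
    long statusLen = fs.getFileStatus(path).getLen(); // may be stale
    try (HdfsDataInputStream in = (HdfsDataInputStream) fs.open(path)) {
        // Includes bytes readable from the under-construction last block.
        return Math.max(statusLen, in.getVisibleLength());
    }
}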
> 
> 
> On Thu, Mar 7, 2019 at 5:00 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
> Hello folks,
>                  I have flink 1.7.2 working with hadoop 2.6 and b'coz there 
> is no in build truncate ( in hadoop 2.6 )  I am writing a method to cleanup ( 
> truncate ) part files based on the length in the valid-length files dropped 
> by flink during restore. I see some thing very strange 
> 
> hadoop fs -cat  hdfs://n*********/*******/dt=2019-03-07/_part-9-0.valid-length
> 
> 1765887805
> 
>  hadoop fs -ls  
> hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-03-07/part-9-0
> -rw-r--r--   3 root hadoop 1280845815 2019-03-07 16:00 
> hdfs://**********/dt=2019-03-07/part-9-0
> 
>  I see the valid-length file reporting a larger length than the part file 
> itself. 
> 
> Any clue why that would be the case ? 
> 
> Regards.
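
As a follow-up on the cleanup method mentioned above: since Hadoop 2.6 has no 
FileSystem.truncate(), one workaround is to copy the first valid-length bytes 
to a temporary file and rename it over the part file. A minimal sketch (the 
helper name and temp-file naming are mine, and note that delete + rename is 
not atomic):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical helper: "truncate" a part file to the length recorded in
// the corresponding .valid-length file by rewriting it.
static void truncateToValidLength(FileSystem fs, Path part, long validLength)
        throws IOException {
    Path tmp = new Path(part.getParent(), "." + part.getName() + ".truncating");
    try (FSDataInputStream in = fs.open(part);
         FSDataOutputStream out = fs.create(tmp, true)) {
        // Copy only the first validLength bytes; the rest is discarded.
        IOUtils.copyBytes(in, out, validLength, false);
    }
    // Replace the oversized part file with the truncated copy.
    // delete + rename is not atomic; a failure in between leaves only the
    // temp file, so a recovery pass should check for leftover temp files.
    if (!fs.delete(part, false) || !fs.rename(tmp, part)) {
        throw new IOException("Failed to replace " + part + " with truncated copy");
    }
}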