StreamingFileSink with hdfs less than 2.7

Rinat Mon, 17 Jun 2019 02:30:45 -0700

Hi mates, I decided to enable state persistence for our Flink jobs that write 
data into HDFS, but ran into some trouble with it.


I’m trying to use StreamingFileSink with Cloudera Hadoop, version 2.6.5, which 
doesn’t provide the truncate method.

So the job fails immediately on startup, while trying to initialize 
HadoopRecoverableWriter, because it only works with Hadoop file systems of 
version 2.7 or later.

Do you have any plans to support recovery for Hadoop file systems that don’t 
provide the truncate method, or is there a way to work around this limitation?

If no workaround exists, then the following behaviour would be good enough:

1. get the path of the file that should be restored
2. get the valid-length from the state
3. create a temporary directory and copy the restoring file into tmp until 
   valid-length bytes have been written
4. replace the restoring file with the file from the tmp directory
5. move the file to its final state
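
To make the steps above concrete, here is a minimal sketch of the "truncate by copy" idea. It is not Flink or Hadoop code: it uses java.nio.file on a local file system as a stand-in, and the class name TruncateByCopy and the method truncateByCopy are hypothetical names chosen for illustration. On HDFS the same steps would presumably map to FileSystem.open, FileSystem.create and FileSystem.rename, but that mapping is an assumption and is untested here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, local-file-system stand-in for the proposed HDFS behaviour.
final class TruncateByCopy {
    private TruncateByCopy() {}

    /**
     * Emulates truncate(file, validLength) without a truncate call:
     * copies the first validLength bytes into a temporary file and then
     * replaces the original file with that copy.
     */
    static void truncateByCopy(Path file, long validLength) throws IOException {
        if (validLength < 0) {
            throw new IllegalArgumentException("validLength must be >= 0");
        }
        // Temporary directory next to the file being restored.
        Path tmpDir = Files.createTempDirectory(
                file.toAbsolutePath().getParent(), "restore-");
        Path tmpFile = tmpDir.resolve(file.getFileName());

        // Copy bytes until validLength has been written.
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmpFile)) {
            byte[] buf = new byte[8192];
            long remaining = validLength;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException(
                            "file is shorter than the valid length: " + file);
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }

        // Replace the restoring file with the truncated copy, then clean up.
        // On failure the temporary directory is left behind in this sketch.
        Files.move(tmpFile, file, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(tmpDir);
    }
}
```

The last step of the list (moving the file to its final state) is left out, since it would stay with the sink's normal commit logic. Note also that the replace step is not atomic in this sketch, so a real implementation would need to handle a crash between the copy and the rename.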

What do you think about it?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.shari...@cleverdata.ru <mailto:a.totma...@cleverdata.ru>
mobile: +7 (925) 416-37-26

CleverDATA
make your data clever
