Does 'DataStream.writeAsCsv' suppose to work like this?

Rex Ge Sun, 25 Oct 2015 23:57:07 -0700

Hi, flinkers!

I'm new to this whole thing,
and it seems to me that
' org.apache.flink.streaming.api.datastream.DataStream.writeAsCsv(String,
WriteMode, long)' does not work properly.
To be specific, data were not flushed by update frequency when write to
HDFS.


what make it more disturbing is that, if I check the content with 'hdfs dfs
-cat xxx', sometimes I got partial records.


I did a little digging in flink-0.9.1.
And it turns out all
'org.apache.flink.streaming.api.functions.sink.FileSinkFunction.invoke(IN)'
does
is pushing data to 'org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream'
which is a delegate of  'org.apache.hadoop.fs.FSDataOutputStream'.

In this scenario, 'org.apache.hadoop.fs.FSDataOutputStream' is never
flushed.
Which result in data being held in local buffer, and 'hdfs dfs -cat xxx'
might return partial records.


Does 'DataStream.writeAsCsv' suppose to work like this? Or I messed up
somewhere?


Best regards and thanks for your time!

Rex

Does 'DataStream.writeAsCsv' suppose to work like this?

Reply via email to