Hi Dan,

InputFormats are the connectors of the DataSet API. Yes, you can use readFile, readCsvFile, readFileOfPrimitives, etc. However, I would also recommend giving the Table API a try. The unified TableEnvironment can perform batch processing and integrates with a number of connectors, such as the filesystem connector [1] and the Hive abstractions [2].
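For the custom InputFormat route, here is a minimal sketch. It assumes your Encoder wrote one serialized protobuf per record followed by a newline, and it uses MyProto and the file path only as placeholders; if the serialized bytes can themselves contain the delimiter byte, a length-prefixed layout with a custom FileInputFormat would be safer.

import java.io.IOException;
import java.util.Arrays;

import org.apache.flink.api.common.io.DelimitedInputFormat;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.Path;

// Reads back one protobuf message per delimited record written by the row-format sink.
public class ProtoLogInputFormat extends DelimitedInputFormat<MyProto> {

    public ProtoLogInputFormat(Path filePath) {
        super(filePath, null);
        setDelimiter("\n"); // must match the separator your Encoder appended
    }

    @Override
    public MyProto readRecord(MyProto reuse, byte[] bytes, int offset, int numBytes)
            throws IOException {
        // Parse exactly the bytes that belong to this record.
        return MyProto.parseFrom(Arrays.copyOfRange(bytes, offset, offset + numBytes));
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<MyProto> records = env.readFile(
                new ProtoLogInputFormat(new Path("file:///path/to/logs")),
                "file:///path/to/logs");
        records.print();
    }
}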

I hope this helps.

Regards,
Timo

[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/filesystem.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_read_write.html

On 11.08.20 00:13, Dan Hill wrote:
Hi. I have a streaming job that writes to StreamingFileSink.forRowFormat(...) with an encoder that converts protocol buffers to byte arrays.

How do I read this data back in during a batch pipeline (using DataSet)? Do I use env.readFile with a custom DelimitedInputFormat?  The StreamingFileSink documentation <https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html> is a bit vague.

These files are used as raw logs.  They're processed offline, and each record is read and used in its entirety.

Thanks!
- Dan
