Thanks! I'll take a look.

On Tue, Aug 11, 2020 at 1:33 AM Timo Walther <twal...@apache.org> wrote:
> Hi Dan,
>
> InputFormats are the connectors of the DataSet API. Yes, you can use
> either readFile, readCsvFile, readFileOfPrimitives, etc. However, I would
> recommend also giving the Table API a try. The unified TableEnvironment is
> able to perform batch processing and is integrated with a bunch of
> connectors, such as for filesystems [1] and through Hive abstractions [2].
>
> I hope this helps.
>
> Regards,
> Timo
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/filesystem.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_read_write.html
>
> On 11.08.20 00:13, Dan Hill wrote:
> > Hi. I have a streaming job that writes to
> > StreamingFileSink.forRowFormat(...) with an encoder that converts
> > protocol buffers to byte arrays.
> >
> > How do I read this data back in during a batch pipeline (using DataSet)?
> > Do I use env.readFile with a custom DelimitedInputFormat? The
> > StreamingFileSink documentation
> > <https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html>
> > is a bit vague.
> >
> > These files are used as raw logs. They're processed offline, and the
> > whole record is read and used at the same time.
> >
> > Thanks!
> > - Dan
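For reference, a minimal sketch of the custom DelimitedInputFormat approach
Dan asks about, assuming each record was written by the sink's Encoder as
serialized proto bytes followed by a '\n' delimiter (the class name
ProtoRecordInputFormat and the path are hypothetical placeholders):

import org.apache.flink.api.common.io.DelimitedInputFormat;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

// Hypothetical reader for newline-delimited records. If the proto bytes can
// themselves contain '\n', this framing breaks and a length-prefixed custom
// FileInputFormat would be needed instead.
public class ProtoRecordInputFormat extends DelimitedInputFormat<byte[]> {

    @Override
    public byte[] readRecord(byte[] reuse, byte[] bytes, int offset, int numBytes) {
        // Copy out this record's bytes; parse them into a protobuf message downstream.
        byte[] record = new byte[numBytes];
        System.arraycopy(bytes, offset, record, 0, numBytes);
        return record;
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // env.readFile sets the path on the input format; the default
        // delimiter of DelimitedInputFormat is '\n'.
        DataSet<byte[]> records =
                env.readFile(new ProtoRecordInputFormat(), "/path/to/sink/output");
        records.first(10).print();
    }
}

The same format could also be passed to env.createInput(...) if the path is
set on the format directly.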
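And a sketch of the Table API route Timo recommends, assuming a Flink
version whose filesystem connector supports the 'raw' format (one BYTES
column per record); the table name, schema, and path are placeholders, and
whether the raw format's record framing matches the sink's output should be
verified against the actual files first:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class ReadRawLogsBatch {
    public static void main(String[] args) {
        // Unified TableEnvironment running in batch mode.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .inBatchMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // Filesystem connector over the sink's output directory; the 'raw'
        // format exposes each record as a single BYTES column.
        tEnv.executeSql(
                "CREATE TABLE raw_logs (" +
                "  record BYTES" +
                ") WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = '/path/to/sink/output'," +
                "  'format' = 'raw'" +
                ")");

        // The bytes can then be deserialized in a UDF or consumed downstream.
        Table result = tEnv.sqlQuery("SELECT record FROM raw_logs");
        result.execute().print();
    }
}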