Re: Question about 'Structured Streaming'

2017-08-08 Thread Michael Armbrust
> > 1) Parsing data/schema creation: The Bro IDS logs have an 8-line header that contains the 'schema' for the data; each log (http, dns, etc.) will have different columns with different data types. So would I create a specific CSV reader inherited from the general one? Also I'm assuming this woul

Re: Question about 'Structured Streaming'

2017-08-08 Thread Brian Wylie
I can see your point that you don't really want an external process being used for the streaming data source. Okay, so on the CSV/TSV front, I have two follow-up questions: 1) Parsing data/schema creation: The Bro IDS logs have an 8-line header that contains the 'schema' for the data; each log htt
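For the schema question, a separate CSV reader subclass may not be needed: the header can be turned into a schema up front and handed to the built-in reader. Below is a minimal sketch, assuming Bro's default ASCII log format, where the `#fields` and `#types` header lines are tab-separated. The function name `header_to_ddl` and the Bro-to-Spark type mapping are hypothetical choices for illustration, not part of Bro or Spark. Newer PySpark releases accept a DDL-formatted string in `readStream.schema(...)`; on older versions you would build a `StructType` from the same pairs instead (check the docs for your version).

```python
# Minimal sketch, not an official Bro or Spark API: turn the "#fields" and
# "#types" lines of a Bro log header into a Spark SQL DDL schema string.

# Hypothetical Bro-to-Spark type mapping; anything not listed falls back to
# STRING (this includes addr, enum, and containers such as set[string]).
BRO_TO_SPARK = {
    "time": "DOUBLE",      # epoch seconds with a fractional part
    "interval": "DOUBLE",  # durations in seconds
    "double": "DOUBLE",
    "count": "BIGINT",
    "int": "BIGINT",
    "port": "INT",
    "bool": "BOOLEAN",
}


def header_to_ddl(header_lines):
    """Build a DDL string such as "`ts` DOUBLE, `uid` STRING" from header lines."""
    fields, types = None, None
    for line in header_lines:
        parts = line.rstrip("\n").split("\t")
        if parts[0] == "#fields":
            fields = parts[1:]
        elif parts[0] == "#types":
            types = parts[1:]
    if fields is None or types is None or len(fields) != len(types):
        raise ValueError("header needs matching #fields and #types lines")
    # Backticks are needed because Bro names such as id.orig_h contain dots.
    return ", ".join(
        f"`{name}` {BRO_TO_SPARK.get(bro_type, 'STRING')}"
        for name, bro_type in zip(fields, types)
    )


if __name__ == "__main__":
    # Abbreviated http.log header (a real one has more lines and columns).
    header = [
        "#separator \\x09",
        "#fields\tts\tuid\tid.orig_h\tid.orig_p\ttrans_depth",
        "#types\ttime\tstring\taddr\tport\tcount",
    ]
    print(header_to_ddl(header))
```

Because each log type (http, dns, ...) carries its own header, the same function would yield one schema per log type, so each type would get its own stream.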

Re: Question about 'Structured Streaming'

2017-08-08 Thread Michael Armbrust
Cool stuff! A pattern I have seen is to use our CSV/TSV or JSON support to read Bro logs, rather than a Python library. This is likely to have much better performance, since we can do all of the parsing on the JVM without having to flow it through an external Python process. On Tue, Aug 8, 2017 at
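As a rough illustration of reading Bro logs with the built-in CSV/TSV support, the reader options can also be derived from the header. This sketch assumes Bro's default ASCII format: a first line `#separator \x09` (escaped, space-delimited), an `#unset_field` line giving the placeholder for missing values, and `#`-prefixed header and footer lines. The function name `bro_csv_options` is hypothetical; `sep`, `comment`, and `nullValue` are standard Spark CSV reader options.

```python
# Minimal sketch: map Bro header lines onto Spark CSV reader options.
# The function name is hypothetical; the option keys are Spark CSV options.


def bro_csv_options(header_lines):
    """Return a dict of Spark CSV options derived from a Bro log header."""
    # Header and footer lines (#fields, #types, #close, ...) start with '#',
    # so the reader can skip them as comments.
    opts = {"comment": "#"}
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("#separator "):
            # The separator is written escaped, e.g. "\x09" for a tab.
            escaped = line[len("#separator "):]
            opts["sep"] = escaped.encode("ascii").decode("unicode_escape")
        elif line.startswith("#unset_field\t"):
            # Bro writes this placeholder (usually "-") for unset values.
            opts["nullValue"] = line.split("\t", 1)[1]
    return opts


if __name__ == "__main__":
    header = ["#separator \\x09", "#set_separator\t,", "#unset_field\t-"]
    print(bro_csv_options(header))
```

With PySpark, the resulting dict could be passed as `spark.readStream.options(**opts)` together with an explicit schema, since streaming file sources expect the schema to be supplied rather than inferred by default.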