Re: dataset flatmap with multiple output types

2016-05-30 Thread Robert Metzger
Hi,

Alexis is right. The original data set is only read once and the two flatMaps run in parallel on multiple machines in the cluster.

Regards,
Robert

On Fri, May 27, 2016 at 11:10 PM, Alexis Gendronneau <a.gendronn...@gmail.com> wrote:
> Hi Jon,
>
> I'm pretty sure your input will be processed

Re: dataset flatmap with multiple output types

2016-05-27 Thread Alexis Gendronneau
Hi Jon,

I'm pretty sure your input will be processed only once. I may be wrong (correction needed if so), but your pipeline should be compiled as:

source --> flatmap(words) -> result /sink
       |---> flatmap(chars) -> result /sink

As your input becomes streamed, each "line" goes through pi
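
To make the "read once, fan out to two flatMaps" shape from this thread concrete, here is a minimal sketch in plain Python. It is not Flink code and uses no Flink API: `CountingSource`, `split_words`, `split_chars` and `run_pipeline` are hypothetical names made up for illustration, and the two flatMaps are applied one after the other in a single process, whereas Flink would run them in parallel across the cluster. The only point is that each input line is pulled from the source a single time and handed to both functions.

```python
from typing import Callable, Iterable, Iterator, List


class CountingSource:
    """Hypothetical input source that counts how many lines were read from it."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)
        self.reads = 0

    def __iter__(self) -> Iterator[str]:
        for line in self._lines:
            self.reads += 1  # one increment per line actually read
            yield line


def split_words(line: str) -> List[str]:
    """flatMap #1: one line in, zero or more words out."""
    return line.split()


def split_chars(line: str) -> List[str]:
    """flatMap #2: one line in, zero or more non-whitespace characters out."""
    return [c for c in line if not c.isspace()]


def run_pipeline(
    source: Iterable[str],
    flat_maps: List[Callable[[str], List[str]]],
) -> List[List[str]]:
    """Iterate over the source once and feed every line to each flatMap.

    Returns one result list ("sink") per flatMap, in the same order.
    """
    sinks: List[List[str]] = [[] for _ in flat_maps]
    for line in source:  # the single pass over the input
        for fn, sink in zip(flat_maps, sinks):
            sink.extend(fn(line))
    return sinks


source = CountingSource(["to be", "or not"])
words, chars = run_pipeline(source, [split_words, split_chars])
print(words)         # the word-level output
print(chars)         # the character-level output
print(source.reads)  # number of lines read from the source
```

The two outputs have different shapes (words versus characters), yet `source.reads` ends up equal to the number of input lines rather than twice that, which is the behaviour Alexis and Robert describe for the real pipeline.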