
Re: dataset flatmap with multiple output types

2016-05-30 Thread Robert Metzger

Hi,

Alexis is right. The original data set is only read once and the two flatMaps run in parallel on multiple machines in the cluster.

Regards,
Robert

On Fri, May 27, 2016 at 11:10 PM, Alexis Gendronneau <a.gendronn...@gmail.com> wrote:
> Hi Jon,
>
> I'm pretty sure your input will be processed

Re: dataset flatmap with multiple output types

2016-05-27 Thread Alexis Gendronneau

Hi Jon,

I'm pretty sure your input will be processed only once. I may be wrong (please correct me if so), but your pipeline should be compiled as:

source --> flatmap(words) -> result /sink
      |---> flatmap(chars) -> result /sink

As your input becomes streamed, each "line" goes through pi
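
For illustration, here is a minimal plain-Java model of the plan sketched above: one source, read once, fanning out to two flatMap-style functions with different output types. It deliberately does not use Flink classes. The names FanOutSketch, Results, splitWords and splitChars are made up for this sketch, and the two branches run one after the other here, whereas in the cluster they would run in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of the fan-out plan; hypothetical names, no Flink API involved.
public class FanOutSketch {

    /** Holds the two differently typed outputs (stand-ins for the two sinks). */
    public static final class Results {
        public final List<String> words = new ArrayList<>();
        public final List<Character> chars = new ArrayList<>();
        public int linesRead = 0; // number of lines pulled from the source
    }

    // Stand-in for flatmap(words): one line in, zero or more words out.
    static void splitWords(String line, Consumer<String> out) {
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) {
                out.accept(w);
            }
        }
    }

    // Stand-in for flatmap(chars): one output per non-whitespace character.
    static void splitChars(String line, Consumer<Character> out) {
        for (char c : line.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                out.accept(c);
            }
        }
    }

    // Single pass over the source: each line is read once and handed to both branches.
    public static Results run(Iterable<String> source) {
        Results r = new Results();
        for (String line : source) {
            r.linesRead++;
            splitWords(line, r.words::add);
            splitChars(line, r.chars::add);
        }
        return r;
    }

    public static void main(String[] args) {
        Results r = run(Arrays.asList("to be", "or not"));
        System.out.println(r.words);
        System.out.println(r.chars);
        System.out.println("lines read: " + r.linesRead);
    }
}
```

The linesRead counter is only there to make the "read once" property visible: it counts each input line a single time even though two outputs are produced from it.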