Hi,
Alexis is right. The original data set is only read once and the two
flatMaps run in parallel on multiple machines in the cluster.
Regards,
Robert
On Fri, May 27, 2016 at 11:10 PM, Alexis Gendronneau <
a.gendronn...@gmail.com> wrote:
Hi Jon,
I'm pretty sure your input will be processed only once. I may be wrong
(correction needed if so), but your pipeline should be compiled as:
source --> flatmap(words) --> result/sink
   |-----> flatmap(chars) --> result/sink
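To make the plan above concrete, here is a rough sketch in plain Python. It is not Flink code and does not use the Flink API; it only mimics the shape of the plan, where one source feeds two flatMaps. All names in it (read_source, words, chars, run_pipeline) are made up for illustration, and it runs sequentially in a single process, whereas Flink would run the two flatMaps in parallel across the cluster.

```python
from typing import Iterable, Iterator, List, Tuple


def read_source(lines: List[str], reads: List[int]) -> Iterator[str]:
    """Hypothetical source: yields each line once and counts how many were read."""
    for line in lines:
        reads[0] += 1
        yield line


def words(line: str) -> Iterable[str]:
    """Stand-in for flatmap(words): one line in, zero or more words out."""
    return line.split()


def chars(line: str) -> Iterable[str]:
    """Stand-in for flatmap(chars): one line in, its non-space characters out."""
    return [c for c in line if not c.isspace()]


def run_pipeline(lines: List[str]) -> Tuple[List[str], List[str], int]:
    """Read the source a single time and hand every line to both branches."""
    reads = [0]
    word_sink: List[str] = []
    char_sink: List[str] = []
    for line in read_source(lines, reads):
        word_sink.extend(words(line))  # first branch of the plan
        char_sink.extend(chars(line))  # second branch of the plan
    return word_sink, char_sink, reads[0]
```

Calling run_pipeline(["to be", "or not"]) fills both sinks while the read counter ends at 2, i.e. each input line is read once even though two flatMaps consume it.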
As your input becomes streamed, each "line" goes through pi