Hi,
Alexis is right. The original data set is only read once and the two
flatMaps run in parallel on multiple machines in the cluster.
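
To make that concrete, here is a minimal sketch of the same program with
explicit sinks, assuming the Scala DataSet API. The object name, the sample
input, and the output paths are made up for illustration; they are not from
Jon's actual job.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

// Hypothetical object name, used only for this sketch.
object SplitLinesSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Stand-in for the real (heavier) input; any DataSet source would do.
    val lines: DataSet[String] = env.fromElements("to be or", "not to be")

    // Two independent flatMaps that consume the same DataSet.
    val words: DataSet[String] = lines.flatMap { _.split(" ").toSeq }
    val chars: DataSet[Char] = lines.flatMap { _.toSeq }

    // Hypothetical output paths. Both sinks are registered on the same
    // environment, so they end up in the same job.
    words.writeAsText("/tmp/flink-sketch/words", WriteMode.OVERWRITE)
    chars.writeAsText("/tmp/flink-sketch/chars", WriteMode.OVERWRITE)

    // One execute() call submits a single job that contains the source,
    // both flatMaps, and both sinks.
    env.execute("split lines into words and chars")
  }
}
```

One caveat, as far as I know: print() and collect() each trigger their own
job, so calling them on both "words" and "chars" would read the source once
per call. Registering sinks and calling execute() once avoids that.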

Regards,
Robert

On Fri, May 27, 2016 at 11:10 PM, Alexis Gendronneau <
a.gendronn...@gmail.com> wrote:

> Hi Jon,
>
> I'm pretty sure your input will be read only once. I may be wrong
> (corrections welcome if so), but your pipeline should be compiled as:
> source --+--> flatMap(words) --> result/sink
>          |
>          +--> flatMap(chars) --> result/sink
>
> As your input is streamed, each "line" goes through the pipeline and is
> processed by each flatMap. Each flatMap produces its own result and sends
> it to its sink. So each line is processed twice (once per flatMap), but in
> the same pass over the input.
>
> Hope that helps.
>
> 2016-05-27 22:33 GMT+02:00 Jon Stewart <jstew...@strozfriedberg.com>:
>
>> Hello,
>>
>> I am new to Apache Flink, so my apologies if this is a common question. I
>> have a rather complex operation I'd like to apply to an item in a data set.
>> Conceptually, the operation could produce many types of data, each of
>> which I'd like to flow into a different result set.
>>
>> In Flink, it looks like the output of a flatMap operation must be of the
>> same type, so I would need to split my processing up from a complex map
>> operation to several to express the flow. For example, I might want to
>> split a data set of text lines into words as well as individual characters:
>>
>> val lines: DataSet[String] = // lines of text
>> val words = lines.flatMap { _.split(" ") }
>> val chars = lines.flatMap { _.toCharArray() }
>>
>> Since "words" and "chars" in the example above have the same input
>> DataSet and both have a flatMap operation applied to them, will "lines"
>> only be iterated once and have both operations computed simultaneously? The
>> big problem I have is that my objects are considerably heavier-weight than
>> lines of text, so I really only want to iterate them once while performing
>> multiple operations on them.
>>
>> Thanks in advance,
>>
>> Jon
>>
>>
>
>
> --
> Alexis Gendronneau
>
> alexis.gendronn...@corp.ovh.com
> a.gendronn...@gmail.com
>
