Hello,

I am new to Apache Flink, so my apologies if this is a common question. I have 
a rather complex operation I'd like to apply to each item in a data set. 
Conceptually, the operation could produce several types of output, each of 
which I'd like to flow into a different result set.

In Flink, it looks like everything a flatMap operation emits must be of a 
single type, so I would need to split my processing from one complex map 
operation into several simpler ones to express the flow. For example, I might 
want to split a data set of text lines into words as well as individual 
characters:

import org.apache.flink.api.scala._

val lines: DataSet[String] = ??? // lines of text
val words = lines.flatMap { _.split(" ") }
val chars = lines.flatMap { _.toCharArray() }

Since "words" and "chars" in the example above have the same input DataSet and 
both have a flatMap operation applied to them, will "lines" only be iterated 
once and have both operations computed simultaneously? The big problem I have 
is that my objects are considerably heavier-weight than lines of text, so I 
really only want to iterate them once while performing multiple operations on 
them.
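
To make the single-pass idea concrete, here is a minimal sketch of the 
workaround I have in mind, written against plain Scala collections rather than 
Flink so that it runs on its own. The names Token, WordToken, CharToken, 
SinglePassSplit and tokenize are hypothetical, made up for this example. Each 
line is visited once and emits a tagged value for every result, and the tagged 
values are separated afterwards. I assume the same shape would carry over to a 
DataSet flatMap followed by one filter per result type, but I have not 
verified how Flink would actually execute that.

```scala
// Hypothetical tagged output type so that one flatMap can emit both kinds of result.
sealed trait Token
final case class WordToken(value: String) extends Token
final case class CharToken(value: Char) extends Token

object SinglePassSplit {
  // The expensive per-item work would happen once here;
  // both kinds of result are emitted from the same call.
  def tokenize(line: String): List[Token] = {
    val words: List[Token] = line.split(" ").toList.map(w => WordToken(w))
    val chars: List[Token] = line.toList.map(c => CharToken(c))
    words ++ chars
  }

  def main(args: Array[String]): Unit = {
    val lines = List("to be", "or not") // stand-in for the real data set

    // Single pass over the input.
    val tokens = lines.flatMap(tokenize)

    // Cheap second step: separate the tagged values into typed result sets.
    val words = tokens.collect { case WordToken(w) => w }
    val chars = tokens.collect { case CharToken(c) => c }

    println(words)
    println(chars)
  }
}
```

The downside is that the split step still walks the combined output once per 
result type, although that should be cheap compared to re-reading the heavy 
input objects.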

Thanks in advance,

Jon
