Hello all,

I have a question about parallelism and partitioning in the
DataStreams API.  In Flink, a user can the parallelism of a data
source as well as operators.  So when I set the parallelism of a data
source e.g.

DataStream<String> text =
env.readTextFile(params.get("input")).setParallelism(5)

does this mean that the resulting "text" DataStream in going to be
partitioned into 5 partitions or does it mean that there are going to
be 5 parallel tasks that are going to run for this stage?

If the next operator is:

DataStream<Tuple2<String, Integer>> counts = text.flatMap(new
Tokenizer()).setParallelism(10)

and the parallelism is set to 10.  Are there 10 parallel tasks
consuming from the 5 partitions? and how is the resulting "counts"
DataStream partitioned? into 10 partitions?

Thanks in advance!

Best,

Jerry

Reply via email to