Hello all, I have a question about parallelism and partitioning in the DataStreams API. In Flink, a user can the parallelism of a data source as well as operators. So when I set the parallelism of a data source e.g.
DataStream<String> text = env.readTextFile(params.get("input")).setParallelism(5) does this mean that the resulting "text" DataStream in going to be partitioned into 5 partitions or does it mean that there are going to be 5 parallel tasks that are going to run for this stage? If the next operator is: DataStream<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).setParallelism(10) and the parallelism is set to 10. Are there 10 parallel tasks consuming from the 5 partitions? and how is the resulting "counts" DataStream partitioned? into 10 partitions? Thanks in advance! Best, Jerry