Hey Jerry, On Wed, Oct 21, 2015 at 11:11 PM, Jerry Peng <jerry.boyang.p...@gmail.com> wrote: > > When I submit the job, the number of task slots that gets used > (displayed on the UI) is only 20. Why is that? The total number of > tasks listed on the ui is 55.
Do you mean the number of task slots is 55 (you just wrote tasks)? Each task slot runs a pipeline of parallel sub tasks. In your case the number of used task slots corresponds to the maximum parallelism of the job, which is 20. You can have a look at [1]. There is a figure giving an example. > And also why does the > filter->project->flatmap get compress into one operator with a > parallelism of 20? Can I not set the individual operators (i.e. > filter and project) to have an individual parallelism of 20? > This is an optimisation, which drastically reduces the overhead for the data exchange between operators. It skips serialisation and results in a simple chain of local method calls. This is possible, because all operators just forward their data. You can disable it via env.disableOperatorChaining(). Does this help? – Ufuk [1] https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/job_scheduling.html