Hey Jerry,

On Wed, Oct 21, 2015 at 11:11 PM, Jerry Peng <jerry.boyang.p...@gmail.com>
wrote:
>
> When I submit the job, the number of task slots that gets used
> (displayed on the UI) is only 20.  Why is that? The total number of
> tasks listed on the ui is 55.


Do you mean the number of task slots is 55 (you just wrote tasks)?

Each task slot runs a pipeline of parallel sub tasks. In your case the
number of used task slots corresponds to the maximum parallelism of the
job, which is 20. You can have a look at [1]. There is a figure giving an
example.


> And also why does the
> filter->project->flatmap get compress into one operator with a
> parallelism of 20?  Can I not set the individual operators (i.e.
> filter and project) to have an individual parallelism of 20?
>

This is an optimisation, which drastically reduces the overhead for the
data exchange between operators. It skips serialisation and results in a
simple chain of local method calls. This is possible, because all operators
just forward their data. You can disable it via
env.disableOperatorChaining().


Does this help?

– Ufuk

[1]
https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/job_scheduling.html

Reply via email to