Hi Till,

thanks for the clarification. It all makes sense now.

So the keyBy call is more a partitioning scheme and less of an operator, 
similar to Storm's field grouping, and Flink's other schemes such as forward 
and broadcast. The difference is that it produces KeyedStreams, which are a 
prerequisite for certain types of transformations.

Regards
Leon


8. Jun 2016 14:05 by trohrm...@apache.org:


> Hi Leon,
> yes, you're right. The plan visualization shows the actual tasks. Each task 
> can contain one or more (if chained) operators. A task is split into 
> sub-tasks which are executed on the TaskManagers. A TaskManager slot can 
> accommodate one subtask of each task (if the task has not been assigned to 
> a different slot sharing group). Thus (per default) the number of required 
> slots is defined by the operator with the highest parallelism.
> If you click on the different tasks, then you see in the list view at the 
> bottom of the page, where the individual sub-tasks have been deployed to.
> The keyBy API call is actually not realized as a distinct Flink operator at 
> runtime. Instead, the keyBy transformation influences how a downstream 
> operator is connected with the upstream operator (pointwise or all to all). 
> Furthermore, the keyBy transformation sets a special StreamPartitioner 
> (HashPartitioner) which is used by the StreamRecordWriters to select the 
> output channels (receiving sub tasks) for the current stream record. That 
> is the reason why you don't see the keyBy transformation in the plan 
> visualization. Consequently, the illustration on our website is not totally 
> consistent with the actual implementation.
>
> Cheers,> Till
> On Wed, Jun 8, 2016 at 11:01 AM,  <> leon_mcl...@tutanota.com> > wrote:
>
>>           >> I am using the dashboard to inspect my multi stage pipeline. 
>> I cannot seem to find a manual or other description for the dashboard 
>> aside from the quickstart section (>> 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html>>
>>  
>> ).
>>
>> I would like to know how approximately my physical layout (in terms of 
>> task slots, tasks and task chaining) looks like.  I am assuming that each 
>> block in the attached topology plan is an operation that will execute as 
>> multiple parallel tasks based on the assigned parallelism. Multilevel 
>> operator chaining has been applied in most cases. The plan then occupies 3 
>> task slots, as dictated by the highest DOP.  Is this correct?
>>
>> What i am also unsure about is that i have keyBy transformations in my 
>> code, yet they are not shown as transformations in the plan. The 
>> corresponding section on workers and slots (>> 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#workers-slots-resources>>
>>  
>> ) however explicitly shows keyBy in the layout.
>>
>> Regards
>> Leon
>>
>
> 

Reply via email to