Hi,

There is no way of doing this in the Flink UI, but you could try to do it
manually: in your job, instead of doing the actual computation, just count how
many elements you have per key (in your GroupReduce). Then put a MapPartition
right after the GroupReduce (which should preserve the same partitioning) and
inside that emit the keys you saw and how many elements each key had. With
this you know which partition, i.e. which parallel instance, had which keys
and how many elements they contributed.
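
For illustration, here is a minimal sketch of that approach with the DataSet
API (Java). It assumes elements of type Tuple2<String, Long> keyed on the
first field; the element type, key, and source in your real job will of
course differ:

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class KeyDistributionProbe {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // same parallelism as the real job

        // Hypothetical sample input; replace with the job's actual source.
        DataSet<Tuple2<String, Long>> input = env.fromElements(
                Tuple2.of("a", 1L), Tuple2.of("a", 2L),
                Tuple2.of("b", 3L), Tuple2.of("c", 4L));

        input
            // Same grouping as the real job, but the reduce only counts
            // the elements per key instead of doing the actual computation.
            .groupBy(0)
            .reduceGroup(new GroupReduceFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                @Override
                public void reduce(Iterable<Tuple2<String, Long>> values,
                                   Collector<Tuple2<String, Long>> out) {
                    String key = null;
                    long count = 0;
                    for (Tuple2<String, Long> v : values) {
                        key = v.f0;
                        count++;
                    }
                    out.collect(Tuple2.of(key, count));
                }
            })
            // The MapPartition right after the GroupReduce should see the
            // same partitioning, so the subtask index here identifies the
            // parallel instance that processed these keys.
            .mapPartition(new RichMapPartitionFunction<Tuple2<String, Long>, String>() {
                @Override
                public void mapPartition(Iterable<Tuple2<String, Long>> keyCounts,
                                         Collector<String> out) {
                    int subtask = getRuntimeContext().getIndexOfThisSubtask();
                    for (Tuple2<String, Long> kc : keyCounts) {
                        out.collect("subtask " + subtask + ": key=" + kc.f0
                                + " -> " + kc.f1 + " elements");
                    }
                }
            })
            .print();
    }
}

Running this with the same parallelism as the real job should show which
subtask index received which keys; a heavily skewed count on one subtask
would explain the slow sub-task you are seeing.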

Best,
Aljoscha

> On 5. Jun 2017, at 12:01, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> 
> Hi everybody,
> in my job I have a groupReduce operator with parallelism 4, and one of the 
> sub-tasks takes a huge amount of time relative to the others.
> My guess is that the objects assigned to that slot have much more data to 
> reduce (and are thus computationally heavy within the groupReduce 
> operator). 
> What I'm trying to understand is which keys are assigned to that slot: is 
> there any way (from the JobManager UI or from the logs) to investigate the 
> key distribution (which, from the plan visualization, is the result of a 
> hash partition)?
> 
> Best,
> Flavio
