You can count the number of elements per key. The counts show how the
records are distributed, and the keys with the largest counts are the ones
causing the skew.
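
As a rough illustration of the idea, here is a minimal standalone sketch in
plain Python (not Flink API). The helper names `key_distribution` and
`skewed_keys`, the sample records, and the "more than twice the mean"
threshold are all assumptions made for the example. In an actual job the same
count would be a grouping on the key followed by a count, run on a sample of
the input if the full data set is too large.

```python
from collections import Counter


def key_distribution(records, key_of):
    """Count records per key and return (key, count) pairs, largest first."""
    counts = Counter(key_of(r) for r in records)
    return counts.most_common()


def skewed_keys(records, key_of, factor=2.0):
    """Return keys whose count exceeds `factor` times the mean count per key.

    The threshold is an arbitrary choice for this sketch; tune it to your data.
    """
    dist = key_distribution(records, key_of)
    if not dist:
        return []
    mean = sum(count for _, count in dist) / len(dist)
    return [key for key, count in dist if count > factor * mean]


# Hypothetical sample: (key, value) records where key "a" dominates.
sample = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("a", 5), ("a", 6),
          ("b", 7), ("c", 8)]

print(key_distribution(sample, key_of=lambda r: r[0]))
print(skewed_keys(sample, key_of=lambda r: r[0]))
```

If one or two keys dominate the counts, rebalance() alone is unlikely to help
with a keyed operation, since all records with the same key still end up in
the same sub task.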

On Sat, Feb 6, 2016 at 1:23 PM, Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> And what if I detect some skewness in some task? Do I have to try to call
> rebalance()? Is there a way to identify the keys causing the skewness?
> On 5 Feb 2016 21:33, "Ufuk Celebi" <u...@apache.org> wrote:
>
>>
>> > On 05 Feb 2016, at 16:38, Flavio Pompermaier <pomperma...@okkam.it>
>> wrote:
>> >
>> > Is there an easy way to understand if and when my data get skewed in
>> the pipeline?
>>
>> Yes, the web frontend shows how many bytes and records the sub tasks send
>> and receive respectively. Skew would show as some tasks having higher
>> numbers than the others.
>>
>> – Ufuk
>>
>>
