You can count the number of elements per key to see how they are distributed. Keys with disproportionately high counts are the ones causing the skew.
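
A minimal sketch of the per-key counting idea, written in plain Python rather than against the Flink API. The helper names `count_per_key` and `skew_report` and the sample records are hypothetical, made up for illustration. In an actual job the same counting would presumably be expressed as a group-by on the key followed by a count, run on the cluster.

```python
from collections import Counter


def count_per_key(records, key_fn):
    """Return a Counter mapping each key to its number of records."""
    return Counter(key_fn(r) for r in records)


def skew_report(counts, top_n=3):
    """Return the top_n heaviest keys and the ratio of the heaviest
    key's count to the mean count per key."""
    if not counts:
        return [], 0.0
    mean = sum(counts.values()) / len(counts)
    heaviest = counts.most_common(top_n)
    return heaviest, heaviest[0][1] / mean


# Hypothetical sample data: (key, value) pairs where key "a" dominates.
records = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 5), ("c", 6)]

counts = count_per_key(records, key_fn=lambda r: r[0])
heaviest, ratio = skew_report(counts, top_n=2)
print(heaviest, ratio)
```

A ratio close to 1 suggests the keys are evenly distributed. A ratio well above 1 points at the keys listed in `heaviest` as candidates for the skew.
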
On Sat, Feb 6, 2016 at 1:23 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> And what if I detect some skewness in some task? Do I have to try to call
> rebalance()? Is there a way to identify the keys causing the skewness?
>
> On 5 Feb 2016 21:33, "Ufuk Celebi" <u...@apache.org> wrote:
>
>> > On 05 Feb 2016, at 16:38, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>> >
>> > Is there an easy way to understand if and when my data get skewed in
>> > the pipeline?
>>
>> Yes, the web frontend shows how many bytes and records the sub tasks send
>> and receive respectively. Skew would show as some tasks having higher
>> numbers than the others.
>>
>> – Ufuk