I have now given mapToPair 3000 tasks, and it is using a lot of memory
and wasting time on shuffling. Here are the stats from a run with very
small data; it seems to shuffle almost all of the data. I am not sure
what is happening here. Any ideas?


   - Total task time across all tasks: 11.0 h
   - Shuffle read: 153.8 MB
   - Shuffle write: 288.0 MB


On 17 April 2015 at 14:32, Jeetendra Gangele <gangele...@gmail.com> wrote:

> mapToPair is running with 32 tasks but is very slow because of a lot
> of shuffle reads. Attaching a screenshot.
> Each task has been running for 10 minutes, even though I am not doing
> anything costly inside the function.
>
