I have now given mapToPair 3000 tasks, and it is using a lot of memory and spending most of its time shuffling. Below are the stats from a run with very small data; it shuffles almost all of the data. I am not sure what is happening here. Any ideas?
- Total task time across all tasks: 11.0 h
- Shuffle read: 153.8 MB
- Shuffle write: 288.0 MB

On 17 April 2015 at 14:32, Jeetendra Gangele <gangele...@gmail.com> wrote:
> mapToPair is running with 32 tasks but very slow because lot of shuffles
> read. attaching screen shot
> each task is running from 10 mins. even Though Inside function i m not
> doing anything costly.
>