Thanks Nicolae. So in my case all executors are sending results back to the driver, and "*shuffle is just sending out the textFile to distribute the partitions*". Could you please elaborate on this? What exactly is in this file?

On Wed, Sep 30, 2015 at 9:57 PM, Nicolae Marasoiu <nicolae.maras...@adswizz.com> wrote:

> Hi,
>
> 2- the end results are sent back to the driver; the shuffles are
> transmission of intermediate results between nodes such as the -> which are
> all intermediate transformations.
>
> More precisely, since flatMap and map are narrow dependencies, meaning
> they can usually happen on the local node, I bet shuffle is just sending
> out the textFile to a few nodes to distribute the partitions.
>
> ------------------------------
> *From:* Kartik Mathur <kar...@bluedata.com>
> *Sent:* Thursday, October 1, 2015 12:42 AM
> *To:* user
> *Subject:* Problem understanding spark word count execution
>
> Hi All,
>
> I tried running spark word count and I have couple of questions -
>
> I am analyzing stage 0 , i.e
> *sc.textFile -> flatMap -> Map (Word count example)*
>
> 1) In the *Stage logs* under Application UI details for every task I am
> seeing Shuffle write as 2.7 KB, *question - how can I know where all did
> this task write ? like how many bytes to which executer ?*
>
> 2) In the executer's log when I look for same task it says 2000 bytes of
> result is sent to driver , my question is , *if the results were directly
> sent to driver what is this shuffle write ? *
>
> Thanks,
> Kartik
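
The difference between the "shuffle write" and the "result sent to driver" asked about in this thread can be sketched in plain Python. This is not Spark code and makes no claim about Spark's internals beyond the general model: a stage-0 task runs flatMap and map locally on its partition (narrow dependencies), writes the resulting (word, 1) pairs into local buckets, one per downstream reduce partition, and hands only a small status summary back to the driver. All names here (`NUM_REDUCERS`, `bucket_for`, `run_map_task`, `run_reduce_task`) are hypothetical, and `zlib.crc32` is only a deterministic stand-in for a hash partitioner.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical number of reduce-side partitions


def bucket_for(word, num_reducers=NUM_REDUCERS):
    # Deterministic stand-in for a hash partitioner: same word, same bucket.
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def run_map_task(lines, num_reducers=NUM_REDUCERS):
    """Simulate one stage-0 task over a single input partition."""
    buckets = defaultdict(list)  # the "shuffle write", kept local to the task
    for line in lines:
        for word in line.split():  # flatMap: line -> words
            pair = (word, 1)       # map: word -> (word, 1)
            buckets[bucket_for(word, num_reducers)].append(pair)
    # Small summary reported back; the pairs themselves stay in the buckets.
    status = {b: len(pairs) for b, pairs in buckets.items()}
    return dict(buckets), status


def run_reduce_task(bucket_id, all_map_outputs):
    """Simulate one reduce task fetching its bucket from every map task."""
    counts = defaultdict(int)
    for buckets in all_map_outputs:
        for word, n in buckets.get(bucket_id, []):
            counts[word] += n
    return dict(counts)


# Two hypothetical input partitions of a text file.
partitions = [["a b a", "c"], ["b b", "a c"]]
map_results = [run_map_task(p) for p in partitions]
map_outputs = [buckets for buckets, _ in map_results]
statuses = [status for _, status in map_results]

totals = {}
for b in range(NUM_REDUCERS):
    totals.update(run_reduce_task(b, map_outputs))
print(totals)
```

In this sketch, `statuses` plays the role of the small result each task reports to the driver, while `map_outputs` plays the role of the shuffle write that the next stage's tasks fetch. Which bucket a given word lands in depends on the hash, so only the final counts are stable across partitioner choices.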