That will depend on what your transformation is; your code snippet might
help.
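
For reference, here is a minimal sketch of the kind of snippet that would
help, assuming PySpark and pairs shaped like the data described below; the
input path and partition count are placeholders, not taken from your mail:

    from pyspark import SparkContext

    sc = SparkContext(appName="shuffle-demo")

    # ~3.5 GB of (key, value) pairs: [(1, data), (2, data), ..., (n, data)]
    pairs = sc.pickleFile("/path/to/pairs")  # placeholder input

    # partitionBy with the default HashPartitioner forces a full shuffle:
    # every record is serialized into map-side shuffle files on local disk,
    # then fetched over the network by the executor that owns its target
    # partition.
    partitioned = pairs.partitionBy(200)  # 200 is an arbitrary example

    # The first action after partitionBy pays the shuffle read.
    print(partitioned.count())

If the partitioning is only a prelude to an aggregation, reduceByKey (e.g.
pairs.reduceByKey(lambda a, b: a + b)) combines values on the map side
before the shuffle and typically moves far less data.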



On Tue, Oct 20, 2015 at 1:53 AM, shahid ashraf <sha...@trialx.com> wrote:

> Hi
>
> Any idea why there is a 50 GB shuffle read and write for 3.3 GB of data?
>
> On Mon, Oct 19, 2015 at 11:58 PM, Kartik Mathur <kar...@bluedata.com>
> wrote:
>
>> That sounds like correct shuffle output. In Spark, the map and reduce
>> phases are separated by the shuffle: in the map phase each executor
>> writes to local disk, and in the reduce phase the reducers read data
>> from each executor over the network, so the shuffle definitely hurts
>> performance. For more details on Spark's shuffle phase, please read this:
>>
>> http://0x0fff.com/spark-architecture-shuffle/
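>>
>> As a rough PySpark illustration of the two sides (the numbers here are
>> arbitrary, not from your job):
>>
>>   pairs = sc.parallelize([(i % 100, i) for i in range(1000000)])
>>   grouped = pairs.groupByKey()  # lazy; the shuffle runs with the action
>>   grouped.count()  # map stage: each executor writes shuffle files to
>>                    # local disk ("shuffle write"); reduce stage: each
>>                    # task fetches its blocks from every executor over
>>                    # the network ("shuffle read")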
>>
>> Thanks
>> Kartik
>>
>> On Mon, Oct 19, 2015 at 6:54 AM, shahid <sha...@trialx.com> wrote:
>>
>>> @all I did partitionBy using the default hash partitioner on data of
>>> the form [(1, data), (2, data), ..., (n, data)].
>>> The total data was approx 3.5 GB, yet it showed a shuffle write of 50 GB,
>>> and on the next action, e.g. count, it is showing a shuffle read of
>>> 50 GB. I don't understand this behaviour, and I think performance is
>>> getting slow with so much shuffle read on subsequent transformation
>>> operations.
>>>
>>>
>>>
>>
>
>
> --
> with Regards
> Shahid Ashraf
>
