@all I called partitionBy with the default hash partitioner on data of the form [(1, data), (2, data), ..., (n, data)]. The total data was approx 3.5, but the stage showed a shuffle write of 50G, and the next action (e.g. count) shows a shuffle read of 50G. I don't understand this behaviour, and I think performance is suffering from so much shuffle read on the subsequent transformations.
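For reference, here is a minimal sketch of what a hash-partitioned shuffle does with the records. It is plain Python, not Spark code: the names `partition_for`, `shuffle_write` and `shuffle_read`, the two map tasks, and the 4 partitions are all made up for illustration. It only assumes that the partition is chosen as the key's hash modulo the number of partitions, which is how Spark's HashPartitioner behaves.

```python
def partition_for(key, num_partitions):
    # Hash partitioning: key hash modulo the partition count.
    # Python's % is non-negative for a positive modulus.
    return hash(key) % num_partitions


def shuffle_write(map_inputs, num_partitions):
    # Each (hypothetical) map task writes one bucket per target partition.
    outputs = []
    for records in map_inputs:
        buckets = [[] for _ in range(num_partitions)]
        for key, value in records:
            buckets[partition_for(key, num_partitions)].append((key, value))
        outputs.append(buckets)
    return outputs


def shuffle_read(map_outputs, partition_id):
    # A reduce task fetches its bucket from every map task's output.
    return [rec for buckets in map_outputs for rec in buckets[partition_id]]


# Stand-in for [(1, data), (2, data), ..., (n, data)], split over two map tasks.
map_inputs = [
    [(k, "data") for k in range(1, 5)],
    [(k, "data") for k in range(5, 9)],
]
num_partitions = 4

map_outputs = shuffle_write(map_inputs, num_partitions)
records_written = sum(len(b) for buckets in map_outputs for b in buckets)

partitions = [shuffle_read(map_outputs, p) for p in range(num_partitions)]
records_read = sum(len(p) for p in partitions)

print(records_written, records_read)
```

In this model every record is written once on the map side and read once on the reduce side, so the read total matching the write total is expected. The sketch counts records only, so it says nothing about why the shuffled bytes (50G) are so much larger than the input (approx 3.5).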
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-tp584p25119.html Sent from the Apache Spark User List mailing list archive at Nabble.com.