Thanks a lot for clarifying this. It explains why less serialization happens with lower parallelism: there would be less network communication, and hence less serialization, right?
But then, if we compare 100 cores in local mode against 10 nodes of 10 cores each in standalone mode, why am I seeing a huge improvement in standalone mode compared to local mode? Isn't the amount of network communication the same in both cases?

Thanks,
Lokesh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spilling-in-memory-messages-in-log-even-with-MEMORY-ONLY-tp10723p10739.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.