Even in local mode, Spark serializes data that would be sent across the network, e.g. in a reduce operation, so that you can catch errors that would happen in distributed mode. You can make serialization much faster by using the Kryo serializer (see http://spark.apache.org/docs/latest/tuning.html), but the serialization step itself won't go away. Basically, the code is not optimized for the very best performance on a single node; it's designed to make it easy to build your program locally and then run it on a cluster without surprises.
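For reference, a minimal sketch of switching to Kryo. It assumes you configure Spark through a spark-defaults.conf file; setting the same `spark.serializer` key on your SparkConf before creating the context should have the same effect.

```
# spark-defaults.conf: use Kryo instead of default Java serialization
spark.serializer  org.apache.spark.serializer.KryoSerializer
```

The tuning guide linked above also describes registering your own classes with Kryo. Without registration, Kryo stores the full class name with each object, which wastes some of the speed and space it would otherwise save.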
Matei

On Jul 26, 2014, at 3:08 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> Thanks for the reply. I understand this now.
>
> But in another situation, when I use large heap size to avoid any spilling
> (I confirm, there are no spilling messages in log), I still see a lot of
> time being spent in writeObject0() function. Can you please tell me why
> would there be any serialization done?
>
> Thanks
> Lokesh