Re: "Spilling in-memory..." messages in log even with MEMORY_ONLY

Matei Zaharia Sat, 26 Jul 2014 17:30:19 -0700

Even in local mode, Spark serializes data that would be sent across the
network (e.g. during the shuffle in a reduce operation), so that you can catch
errors that would happen in distributed mode. You can make serialization much
faster by using the Kryo serializer; see
http://spark.apache.org/docs/latest/tuning.html. But the serialization step
itself won't go away. Basically, the code is not optimized for the best possible
performance on a single node; it's designed to make it easy to build your
program locally and then run it on a cluster without surprises.
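To illustrate why writeObject0() can show up in a profile even when nothing
spills: Spark's default serializer (org.apache.spark.serializer.JavaSerializer)
uses standard Java serialization, and java.io.ObjectOutputStream.writeObject()
delegates to its private writeObject0() method. The following is a minimal
standalone sketch in plain Java, not Spark code; the class names
ShuffleSerializationDemo, Pair, and Unshippable are hypothetical. It round-trips
a record through bytes the way a Java-serialization-based shuffle would, and
shows that a non-serializable record fails right away, which is the kind of
error local-mode serialization surfaces before you ever reach a cluster.
Switching to Kryo is done by setting spark.serializer to
org.apache.spark.serializer.KryoSerializer, as the tuning guide describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ShuffleSerializationDemo {

    // Hypothetical record type that can cross a (simulated) shuffle boundary.
    static final class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical record type that does NOT implement Serializable.
    static final class Unshippable {
        final String key;

        Unshippable(String key) {
            this.key = key;
        }
    }

    // Serialize an object to bytes, as a Java-serialization-based shuffle
    // would before handing records to another task. In OpenJDK,
    // ObjectOutputStream.writeObject() delegates to the private writeObject0(),
    // which is the frame that appears in profiles.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Deserialize bytes back into an object, as the receiving side would.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A serializable record survives the round trip, but pays the CPU cost.
        Pair roundTripped = (Pair) fromBytes(toBytes(new Pair("spark", 42)));
        System.out.println(roundTripped.key + " -> " + roundTripped.value);

        // A non-serializable record fails immediately, even on one machine.
        try {
            toBytes(new Unshippable("oops"));
        } catch (NotSerializableException e) {
            System.out.println("caught locally: " + e.getMessage());
        }
    }
}
```

The point of the design is the same one made above: paying this cost locally
means a record type that cannot be shipped fails on your laptop rather than
halfway through a cluster job.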


Matei

On Jul 26, 2014, at 3:08 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> Thanks for the reply. I understand this now.
> 
> But in another situation, when I use a large heap size to avoid any spilling
> (I confirmed there are no spilling messages in the log), I still see a lot of
> time being spent in the writeObject0() function. Can you please tell me why
> any serialization would be done at all?
> 
> 
> Thanks
> Lokesh
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spilling-in-memory-messages-in-log-even-with-MEMORY-ONLY-tp10723p10727.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.