Hi All,

We are currently trying to benchmark the various cache options on RDDs with
respect to speed and efficiency.
The data we are using consists mostly of floating-point numbers.

We have noticed that the memory consumption of the RDD is almost the same
for MEMORY_ONLY (519.1 MB) and MEMORY_ONLY_SER (511.5 MB), where the latter
uses Kryo serialization.

Is this behavior expected?
We were under the impression that Kryo serialization is efficient, and we
were expecting it to compress the data further.
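
To make that expectation concrete, here is a minimal sketch in plain Python
(not Spark or Kryo; the helper name packed_size is hypothetical). It assumes
the records are held as primitive doubles and that the serializer writes each
double at a fixed width of 8 bytes. Under those assumptions the serialized
form has no room to be smaller than the raw values.

```python
import random
import struct


def packed_size(values):
    """Byte length of `values` packed as little-endian IEEE 754 doubles."""
    return len(struct.pack("<%dd" % len(values), *values))


random.seed(0)
values = [random.random() for _ in range(1000)]

raw_bytes = 8 * len(values)             # 8 bytes per primitive double
serialized_bytes = packed_size(values)  # fixed-width binary encoding

print(raw_bytes, serialized_bytes)      # prints "8000 8000"
```

If the in-memory representation were boxed objects rather than primitives,
the comparison could look different; this sketch does not cover that case.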

Also, we have noticed that when we enable compression (LZ4) on RDDs, the
memory consumption for MEMORY_ONLY with compression is the same as without
compression, i.e. 519.1 MB.
MEMORY_ONLY_SER (Kryo serialization) with compression, however, consumes only
386.5 MB.
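
For easier comparison, the four measurements above can be put side by side.
This is only arithmetic on the reported numbers in plain Python; the
dictionary labels and the helper saving_pct are our own naming for
illustration.

```python
# Reported RDD sizes in MB for each cache option.
sizes_mb = {
    "MEMORY_ONLY": 519.1,
    "MEMORY_ONLY_SER": 511.5,
    "MEMORY_ONLY + LZ4": 519.1,
    "MEMORY_ONLY_SER + LZ4": 386.5,
}


def saving_pct(size, base):
    """Percentage by which `size` is smaller than `base`, to one decimal."""
    return round(100 * (base - size) / base, 1)


baseline = sizes_mb["MEMORY_ONLY"]
for name, size in sizes_mb.items():
    print(f"{name}: {size} MB, {saving_pct(size, baseline)}% below MEMORY_ONLY")
```

So serialization alone saves about 1.5%, while serialization plus compression
saves about 25.5% relative to plain MEMORY_ONLY.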

Why does enabling compression have no effect for MEMORY_ONLY, i.e. without
serialization?
Is there anything else we need to do for MEMORY_ONLY to get it compressed?

Thanks,
Pradeep


