Kryo serialization does not compress

2014-02-25 Thread pradeeps8
Hi All, We are currently benchmarking the various cache options on RDDs with respect to speed and efficiency. The data we are using consists mostly of floating-point numbers. We have noticed that the memory consumption of the RDD for MEMORY_ONLY (519.1 MB) and MEMORY_ONLY_SER (51…
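
A minimal sketch of the kind of comparison being described, assuming a SparkContext named sc and synthetic floating-point data (the data, sizes, and names here are illustrative, not from the thread):

    import org.apache.spark.storage.StorageLevel

    // Synthetic floating-point data; `sc` is an assumed SparkContext.
    def makeData() = sc.parallelize(1 to 1000000).map(_ * 0.5)

    // Persist two copies under different storage levels; an action forces
    // materialization, and the in-memory size then shows up on the web
    // UI's Storage tab.
    val plain = makeData().persist(StorageLevel.MEMORY_ONLY)      // deserialized objects
    val ser   = makeData().persist(StorageLevel.MEMORY_ONLY_SER)  // serialized byte buffers
    plain.count()
    ser.count()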

Re: Kryo serialization does not compress

2014-03-06 Thread pradeeps8
We are trying to use Kryo serialization, but with Kryo serialization ON the memory consumption does not change. We have tried this on multiple sets of data. We have also checked the logs and confirmed that Kryo is being used. Can somebody please help us with this? The s…
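
For reference, a minimal sketch of enabling Kryo in the Spark 0.9-era configuration; "mypackage.MyRegistrator" is a hypothetical registrator class supplied by the application:

    import org.apache.spark.{SparkConf, SparkContext}

    // Spark 0.9-era settings for switching the serializer to Kryo.
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("kryo-bench")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "mypackage.MyRegistrator")
    val sc = new SparkContext(conf)

Note that the serializer only affects data kept in serialized form (e.g. the MEMORY_ONLY_SER storage level and shuffle data); a MEMORY_ONLY cache stores deserialized Java objects, so its size will not change no matter which serializer is configured.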

Re: Kryo serialization does not compress

2014-03-07 Thread pradeeps8
Hi Patrick, Thanks for your reply. I am guessing even an array type will be registered automatically. Is this correct? Thanks, Pradeep
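
On the array question, a common pattern is to register array classes explicitly, since Kryo treats Array[T] as a class distinct from T. A sketch of such a registrator, where MyRecord is a hypothetical application class:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical application class used for illustration.
    class MyRecord(val values: Array[Double]) extends Serializable

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyRecord])
        // Array types are separate classes to Kryo, so registering MyRecord
        // alone does not give Array[MyRecord] a compact registration ID;
        // register the array classes explicitly as well.
        kryo.register(classOf[Array[MyRecord]])
        kryo.register(classOf[Array[Double]])
      }
    }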

java.lang.ClassNotFoundException in spark 0.9.0, shark 0.9.0 (pre-release) and hadoop 2.2.0

2014-03-07 Thread pradeeps8
Hi, We are currently trying to migrate to Hadoop 2.2.0, and hence we have installed Spark 0.9.0 and the pre-release version of Shark 0.9.0. When we execute the script ( script.txt ) we get the following error: org.apache.…

Re: java.lang.ClassNotFoundException in spark 0.9.0, shark 0.9.0 (pre-release) and hadoop 2.2.0

2014-03-13 Thread pradeeps8
Hi All, We have found the actual problem. The problem was with the getList method in the Row class. Earlier, the Row class returned a java.util.List from getList, but the new source code (Shark 0.9.0) returns a String. This is the commit log: https://github.com/amplab/shark/commit/…
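
For anyone adapting code to this change, one workaround is to parse the string form back into a collection. A sketch, assuming the list is rendered as "[a, b, c]" — that format is an assumption, not confirmed by the thread:

    // Re-parse a list that getList now returns as a String.
    // Adjust the bracket/comma handling to the actual output format.
    def parseListString(s: String): Seq[String] =
      s.stripPrefix("[").stripSuffix("]")
       .split(",").map(_.trim)
       .filter(_.nonEmpty).toSeq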

Re: SequenceFileRDDFunctions cannot be used output of spark package

2014-03-28 Thread pradeeps8
Hi Aureliano, I followed this thread to create a custom saveAsObjectFile. The following is the code: new org.apache.spark.rdd.SequenceFileRDDFunctions[NullWritable, BytesWritable](saveRDD.mapPartitions(iter => iter.grouped(10).map(_.toArray)).map(x => (NullWritable.get(), new BytesWritable(seria…
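
The snippet above is cut off; a self-contained sketch of the same approach under Spark 0.9-era APIs follows. It mirrors the batching-and-SequenceFile pattern of RDD.saveAsObjectFile; the serialize helper is a stand-in written here because Spark's own Utils.serialize is private to the spark package:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    import org.apache.hadoop.io.{BytesWritable, NullWritable}
    import org.apache.spark.rdd.{RDD, SequenceFileRDDFunctions}
    import scala.reflect.ClassTag

    // Stand-in for Spark's private Utils.serialize: Java-serialize a
    // value to a byte array (elements must be java.io.Serializable).
    def serialize[T](value: T): Array[Byte] = {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(value)
      out.close()
      bytes.toByteArray
    }

    // Batch elements 10 at a time, Java-serialize each batch, and write
    // (NullWritable, BytesWritable) pairs to a SequenceFile.
    def saveAsObjectFile[T: ClassTag](rdd: RDD[T], path: String) {
      val pairs = rdd.mapPartitions(_.grouped(10).map(_.toArray))
                     .map(x => (NullWritable.get(), new BytesWritable(serialize(x))))
      new SequenceFileRDDFunctions(pairs).saveAsSequenceFile(path)
    }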

Re: SequenceFileRDDFunctions cannot be used output of spark package

2014-03-31 Thread pradeeps8
Hi Sonal, There are no custom objects in saveRDD; it is of type RDD[(String, String)]. Thanks, Pradeep
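
Given that saveRDD is an RDD[(String, String)], a hypothetical call to the helper sketched above would look like this; `sc` is an assumed SparkContext and the output path is made up:

    val saveRDD = sc.parallelize(Seq(("key1", "value1"), ("key2", "value2")))
    saveAsObjectFile(saveRDD, "/tmp/save-rdd-objects")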