Here is a toy |spark-shell| session snippet that shows the difference in
memory consumption:
|import org.apache.spark.sql.SQLContext
import sc._
val sqlContext = new SQLContext(sc)
import sqlContext._
setConf("spark.sql.shuffle.partitions", "1")
case class KV(key: Int, value: String)|
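The lines above are only the setup; a minimal sketch of how such a session could
continue, assuming Spark 1.1-style SchemaRDD APIs (the table name "kv" and the
row count are made up), caches the same data both ways so the two sizes can be
compared on the web UI's Storage tab:

|// Turn the case class RDD into a table (the implicit createSchemaRDD
// conversion comes from the sqlContext._ import above).
val data = parallelize(1 to 1000000).map(i => KV(i, i.toString))
data.registerTempTable("kv")

// (1) Cache a SchemaRDD directly: rows are stored as deserialized objects.
val rows = sql("SELECT key, value FROM kv")
rows.cache()
rows.count()                               // materialize the cache

// (2) In-memory columnar storage: values are stored and compressed per column.
cacheTable("kv")
sql("SELECT COUNT(*) FROM kv").collect()   // materialize the columnar cache

// The two entries now show different "Size in Memory" values on the Storage tab.|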
Hi,
I want to benchmark the memory savings of using the in-memory columnar
storage for SchemaRDDs (using cacheTable) vs. caching the SchemaRDD directly.
It would be really helpful to be able to query this from the spark-shell or
from jobs directly. Could a dev point me to a way to do this? From what I
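One way to read cached-RDD sizes from the shell (or from inside a job), rather
than through the web UI, might be the developer API sc.getRDDStorageInfo; a
small sketch, assuming a 1.x spark-shell where sc is already defined:

|// Print the in-memory/on-disk size of every cached RDD, mirroring the Storage tab.
sc.getRDDStorageInfo
  .filter(_.numCachedPartitions > 0)
  .foreach { info =>
    println(f"${info.name}%-40s ${info.memSize / 1024.0 / 1024}%8.1f MB in memory, " +
            f"${info.diskSize / 1024.0 / 1024}%8.1f MB on disk")
  }|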
+1 (non-binding)
Ran spark-shell and Scala jobs on top of YARN (using the hadoop-2.4 tarball).
There's a very slight behavioral change in the API. This code now throws an NPE:
new SparkConf().setIfMissing("foo", null)
It worked before. It's probably fine, though, since `SparkConf.set`
would throw an NPE for a null value anyway.
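For callers that were relying on the old, more lenient behavior, a possible
workaround (the nullable value below is just a stand-in) is to skip the call
when the value is null:

import org.apache.spark.SparkConf

val conf = new SparkConf()
// maybeValue stands in for any value that might legitimately be null.
val maybeValue: String = System.getProperty("some.optional.property")
Option(maybeValue).foreach(v => conf.setIfMissing("foo", v))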