Here is a toy |spark-shell| session snippet that shows the difference in
memory consumption:
|import org.apache.spark.sql.SQLContext
import sc._
val sqlContext = new SQLContext(sc)
import sqlContext._
setConf("spark.sql.shuffle.partitions", "1")
case class KV(key: Int, value: String)|
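The lines above are only the setup; a minimal sketch of how such a session could
continue, assuming Spark 1.1-style SchemaRDD APIs (the table name "kv" and the
row count are made up), caches the same data both ways so the two sizes can be
compared on the web UI's Storage tab:

|// Turn the case class RDD into a table (the implicit createSchemaRDD
// conversion comes from the sqlContext._ import above).
val data = parallelize(1 to 1000000).map(i => KV(i, i.toString))
data.registerTempTable("kv")

// (1) Cache a SchemaRDD directly: rows are stored as deserialized objects.
val rows = sql("SELECT key, value FROM kv")
rows.cache()
rows.count()                               // materialize the cache

// (2) In-memory columnar storage: values are stored and compressed per column.
cacheTable("kv")
sql("SELECT COUNT(*) FROM kv").collect()   // materialize the columnar cache

// The two entries now show different "Size in Memory" values on the Storage tab.|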
Hi,
I want to benchmark the memory savings of using the in-memory columnar
storage for SchemaRDDs (using cacheTable) vs. caching the SchemaRDD directly.
It would be really helpful to be able to query this from the spark-shell or
from jobs directly. Could a dev point me to a way to do this? From what I
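One way to read cached-RDD sizes from the shell (or from inside a job), rather
than through the web UI, might be the developer API sc.getRDDStorageInfo; a
small sketch, assuming a 1.x spark-shell where sc is already defined:

|// Print the in-memory/on-disk size of every cached RDD, mirroring the Storage tab.
sc.getRDDStorageInfo
  .filter(_.numCachedPartitions > 0)
  .foreach { info =>
    println(f"${info.name}%-40s ${info.memSize / 1024.0 / 1024}%8.1f MB in memory, " +
            f"${info.diskSize / 1024.0 / 1024}%8.1f MB on disk")
  }|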
+1 (non-binding)
Ran spark-shell and Scala jobs on top of YARN (using the hadoop-2.4 tarball).
There's a very slight behavioral change in the API. This code now throws an NPE:
new SparkConf().setIfMissing("foo", null)
It worked before. It's probably fine, though, since `SparkConf.set`
would throw an NPE for a null value anyway.
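For callers that were relying on the old, more lenient behavior, a possible
workaround (the nullable value below is just a stand-in) is to skip the call
when the value is null:

import org.apache.spark.SparkConf

val conf = new SparkConf()
// maybeValue stands in for any value that might legitimately be null.
val maybeValue: String = System.getProperty("some.optional.property")
Option(maybeValue).foreach(v => conf.setIfMissing("foo", v))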