It's already fixed in the master branch. Sorry that we forgot to update
this before releasing 1.2.0 and caused you trouble...
Cheng
On 2/2/15 2:03 PM, ankits wrote:
Great, thank you very much. I was confused because this is in the docs:
https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, and on the
"branch-1.2" branch,
https://github.com/apache/spark/blob/branch-1.2/docs/sql-programming-guide.md
"Note that if you call schemaRDD.cache() rather tha
Actually |SchemaRDD.cache()| behaves exactly the same as |cacheTable|
since Spark 1.2.0. The reason why your web UI didn’t show you the cached
table is that both |cacheTable| and |sql("SELECT ...")| are lazy :-)
Simply add a |.collect()| after the |sql(...)| call.
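For instance, a minimal sketch of what that looks like in a 1.2
|spark-shell| session (the table name "my_table" and the query are made
up for illustration):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc is provided by spark-shell

// Both calls below are lazy: nothing is computed or cached yet
sqlContext.cacheTable("my_table")  // hypothetical registered table
val result = sqlContext.sql("SELECT * FROM my_table")  // just builds a SchemaRDD

// An action forces evaluation; only now does the cached table
// show up under the Storage tab of the web UI
result.collect()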
Cheng
On 2/2/15 12:23 PM, ankits wrote:
Thanks for your response. So AFAICT calling
parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()
will allow me to see the size of the SchemaRDD in memory,
and parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count() will
show me the size of a regular RDD.
But
Here is a toy |spark-shell| session snippet that can show the memory
consumption difference:
import org.apache.spark.sql.SQLContext
import sc._
val sqlContext = new SQLContext(sc)
import sqlContext._
// Limit SQL shuffles to a single partition (keeps the toy example small)
setConf("spark.sql.shuffle.partitions", "1")
case class KV(key: Int, value: String)
// Cached via the SchemaRDD path (in-memory columnar format since 1.2.0)
parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()
// Cached as a plain RDD of KV objects, for comparison
parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count()
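After both counts run, the Storage tab of the web UI should show two
cached entries; the SchemaRDD one is typically much smaller, since the
columnar format stores (and can compress) columns of primitives instead
of whole KV objects.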