It's already fixed in the master branch. Sorry that we forgot to update
this before releasing 1.2.0 and caused you trouble...
Cheng
On 2/2/15 2:03 PM, ankits wrote:
Great, thank you very much. I was confused because this is in the docs:
https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, and on the
"branch-1.2" branch,
https://github.com/apache/spark/blob/branch-1.2/docs/sql-programming-guide.md
"Note that if you call schemaRDD.cache() rather tha
Actually |SchemaRDD.cache()| behaves exactly the same as |cacheTable|
since Spark 1.2.0. The reason why your web UI didn’t show you the cached
table is that both |cacheTable| and |sql("SELECT ...")| are lazy :-)
Simply add a |.collect()| after the |sql(...)| call.
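For instance, a minimal sketch of what that looks like in a 1.2
|spark-shell| session (the table name "my_table" and the query are made
up for illustration):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc is provided by spark-shell

// Both calls below are lazy: nothing is computed or cached yet
sqlContext.cacheTable("my_table")  // hypothetical registered table
val result = sqlContext.sql("SELECT * FROM my_table")  // just builds a SchemaRDD

// An action forces evaluation; only now does the cached table
// show up under the Storage tab of the web UI
result.collect()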
Cheng
On 2/2/15 12:23 PM, ankits wrote:
Thanks for your response. So AFAICT calling
parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()
will allow me to see the size of the SchemaRDD in memory,
and parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count() will
show me the size of a regular RDD.
But
Here is a toy |spark-shell| session snippet that can show the memory
consumption difference:
import org.apache.spark.sql.SQLContext
import sc._
val sqlContext = new SQLContext(sc)
import sqlContext._
// Limit SQL shuffles to a single partition (keeps the toy example small)
setConf("spark.sql.shuffle.partitions", "1")
case class KV(key: Int, value: String)
// Cached via the SchemaRDD path (in-memory columnar format since 1.2.0)
parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()
// Cached as a plain RDD of KV objects, for comparison
parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count()
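After both counts run, the Storage tab of the web UI should show two
cached entries; the SchemaRDD one is typically much smaller, since the
columnar format stores (and can compress) columns of primitives instead
of whole KV objects.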