Internally, within each partition of the resulting RDD[InternalRow], SparkPlan.execute may hand you the same UnsafeRow object over and over while iterating: the row is mutable and is reused for each record. A plain RDD.cache therefore doesn't work on it; the cached partition ends up holding references to one mutable row, so reading it back gives you the same row repeated. I'm not sure why you get an empty output, though.
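As a minimal sketch of the row-copying workaround (assuming a Spark 2.x-style SparkPlan; the helper name here is illustrative, not an existing API):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.SparkPlan

    // Hypothetical helper: cache the output of a physical plan safely.
    def cacheExecuteResult(plan: SparkPlan): RDD[InternalRow] = {
      // execute() may reuse one mutable UnsafeRow per partition, so
      // copy each row into its own buffer before materializing it.
      val copied = plan.execute().map(_.copy())
      copied.cache()
      copied
    }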
Dataset.cache() is the intended way to cache SQL query results. Even if you do manage to cache the RDD[InternalRow] via RDD.cache with the row-copying trick above (at a significant performance penalty), a new query (plan) will not automatically reuse the cached RDD, because executing the new plan creates new RDDs.

summerDG wrote
> We are optimizing Spark SQL for adaptive execution, so the SparkPlan
> may be reused for strategy choice. But we find that once the result of
> SparkPlan.execute, RDD[InternalRow], is cached using RDD.cache, the query
> output is empty.
> 1. How to cache the result of SparkPlan.execute?
> 2. Why is RDD.cache invalid for RDD[InternalRow]?

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/