Internally, within each partition of the resulting RDD[InternalRow], SparkPlan.execute may hand you the same UnsafeRow object over and over while iterating: the row is mutable and is reused for each record. A plain RDD.cache therefore doesn't work on it; the cached partition ends up holding references to one mutable row, so reading it back gives you the same row repeated. I'm not sure why you get an empty output, though.
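As a minimal sketch of the row-copying workaround (assuming a Spark 2.x-style SparkPlan; the helper name here is illustrative, not an existing API):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.SparkPlan

    // Hypothetical helper: cache the output of a physical plan safely.
    def cacheExecuteResult(plan: SparkPlan): RDD[InternalRow] = {
      // execute() may reuse one mutable UnsafeRow per partition, so
      // copy each row into its own buffer before materializing it.
      val copied = plan.execute().map(_.copy())
      copied.cache()
      copied
    }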
Dataset.cache() is the intended way to cache SQL query results. Even if you do manage to cache the RDD[InternalRow] via RDD.cache with the row-copying trick above (at a significant performance penalty), a new query (plan) will not automatically reuse the cached RDD, because executing the new plan creates new RDDs.

summerDG wrote
> We are optimizing Spark SQL for adaptive execution, so the SparkPlan
> may be reused for strategy choice. But we find that once the result of
> SparkPlan.execute, RDD[InternalRow], is cached using RDD.cache, the query
> output is empty.
> 1. How to cache the result of SparkPlan.execute?
> 2. Why is RDD.cache invalid for RDD[InternalRow]?

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/