Hi everyone,

We'd like to discuss our proposal for a Spark relational cache in this thread. 
Spark has a native command for RDD caching, but the use of the CACHE command in 
Spark SQL is limited: the cache cannot be shared across sessions, and users have 
to rewrite their queries by hand to take advantage of an existing cache.
To address this, we have done some initial work toward the following:


 1. Allow users to persist a cache on HDFS in Parquet format.
 2. Rewrite user queries in Catalyst to utilize any existing cache (on HDFS or 
defined in memory in the current session) where possible.
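To illustrate the pain point, here is a rough sketch of the manual workaround that users must do today, and which the proposal aims to automate. The table name, query, and HDFS path below are hypothetical, used only for illustration; the Spark APIs (`write.parquet`, `read.parquet`, `createOrReplaceTempView`) are standard.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("relational-cache-sketch").getOrCreate()

// 1. Manually "persist the cache": materialize a derived result as Parquet on
//    HDFS so it survives the current session. (hypothetical table and path)
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
  .write.mode("overwrite").parquet("hdfs:///cache/sales_by_region")

// 2. In a later (or different) session, manually rewrite the query to read
//    the materialized data instead of recomputing the aggregation.
val cached = spark.read.parquet("hdfs:///cache/sales_by_region")
cached.createOrReplaceTempView("sales_by_region")
spark.sql("SELECT region, total FROM sales_by_region WHERE total > 1000").show()
```

With the proposed Catalyst rewrite, step 2 would happen automatically: a query over the original `sales` table would be rewritten to read the persisted cache when the planner detects a match.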


I have created a JIRA ticket (https://issues.apache.org/jira/browse/SPARK-26764) 
for this and attached an official SPIP document.


Thanks for taking a look at the proposal.


Best Regards,
Daoyuan
