[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247735#comment-14247735 ]
Xuefu Zhang commented on HIVE-8843:
-----------------------------------

[~jxiang], thanks for working on this. The change made here seems a little more complicated and pervasive than I thought. A SparkPlan object holds references to all the RDDs, including those being cached. Thus, once the plan is executed, these cached RDDs can be released by accessing the SparkPlan object. The changes will therefore most likely be made in RemoteHiveSparkClient and LocalHiveSparkClient.

> Release RDD cache when Hive query is done [Spark Branch]
> --------------------------------------------------------
>
> Key: HIVE-8843
> URL: https://issues.apache.org/jira/browse/HIVE-8843
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Jimmy Xiang
> Attachments: HIVE-8843.1-spark.patch
>
>
> In some multi-insert cases, RDD.cache() is called to improve performance. RDD
> is SparkContext specific, but the caching is useful only for the query. Thus,
> once the query is executed, we need to release the cache used by calling
> RDD.uncache().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
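
The approach suggested in the comment (let the plan object remember which RDDs it cached, and have the client release them once execution finishes) might look roughly like the sketch below. This is not the HIVE-8843 patch and not Hive's actual code: `CacheableRdd`, `FakeRdd`, `PlanSketch`, and `ClientSketch` are hypothetical stand-ins invented for illustration, and execution is assumed to be synchronous. In Spark itself, the method that releases a cached RDD is `unpersist()`, which is the name the stand-in interface borrows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an RDD handle; not Spark's or Hive's real type.
interface CacheableRdd {
    void cache();
    void unpersist();
    boolean isCached();
}

// In-memory fake used only so the sketch can run without Spark.
class FakeRdd implements CacheableRdd {
    private boolean cached;

    public void cache() { cached = true; }
    public void unpersist() { cached = false; }
    public boolean isCached() { return cached; }
}

// Hypothetical plan object that records every RDD it asked to cache.
class PlanSketch {
    private final List<CacheableRdd> cachedRdds = new ArrayList<>();

    // Cache the RDD and remember it so it can be released later.
    CacheableRdd cacheAndTrack(CacheableRdd rdd) {
        rdd.cache();
        cachedRdds.add(rdd);
        return rdd;
    }

    // Release every tracked RDD and return how many were released.
    int releaseCachedRdds() {
        int released = 0;
        for (CacheableRdd rdd : cachedRdds) {
            rdd.unpersist();
            released++;
        }
        cachedRdds.clear();
        return released;
    }
}

// Hypothetical client: runs the job, then releases the plan's caches
// whether or not the job threw.
class ClientSketch {
    static void execute(PlanSketch plan, Runnable job) {
        try {
            job.run();
        } finally {
            plan.releaseCachedRdds();
        }
    }
}
```

Keeping the bookkeeping inside the plan means the cleanup touches only the code that executes the plan, which is the point the comment makes about limiting the change to the two client classes.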