[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

Xuefu Zhang (JIRA) Mon, 15 Dec 2014 20:33:23 -0800

    [ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247735#comment-14247735 ]


Xuefu Zhang commented on HIVE-8843:
-----------------------------------

[~jxiang], thanks for working on this. The change made here seems a little more 
complicated and pervasive than I expected. A SparkPlan object holds references 
to all of its RDDs, including those being cached. Once the plan has been 
executed, the cached RDDs can therefore be released through the SparkPlan 
object, so the changes will most likely be made in RemoteHiveSparkClient and 
LocalHiveSparkClient.
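
To make the suggestion concrete, here is a rough, self-contained sketch of the 
idea, not the actual patch. All of the types below (CacheableRdd, FakeRdd, Plan, 
runQuery) are hypothetical stand-ins so that it runs without Spark on the 
classpath; they are not the real SparkPlan or HiveSparkClient APIs. In Spark 
itself, the call that releases a cached RDD is unpersist().

```java
import java.util.ArrayList;
import java.util.List;

class ReleaseCacheSketch {

    /** Hypothetical stand-in for a Spark RDD handle (not a Hive or Spark type). */
    interface CacheableRdd {
        void cache();
        void unpersist();
        boolean isCached();
    }

    /** In-memory fake so the sketch runs without a Spark dependency. */
    static final class FakeRdd implements CacheableRdd {
        private boolean cached;
        @Override public void cache() { cached = true; }
        @Override public void unpersist() { cached = false; }
        @Override public boolean isCached() { return cached; }
    }

    /** Hypothetical plan that remembers which of its RDDs were cached. */
    static final class Plan {
        private final List<CacheableRdd> cachedRdds = new ArrayList<>();

        /** Cache the RDD and keep a reference so it can be released later. */
        void cacheRdd(CacheableRdd rdd) {
            rdd.cache();
            cachedRdds.add(rdd);
        }

        void execute() {
            // Placeholder: a real plan would submit its Spark jobs here.
        }

        /** Release every RDD this plan cached, then forget the references. */
        void releaseCachedRdds() {
            for (CacheableRdd rdd : cachedRdds) {
                rdd.unpersist();
            }
            cachedRdds.clear();
        }
    }

    /** What a client's execute path might do: run the plan, then always release. */
    static void runQuery(Plan plan) {
        try {
            plan.execute();
        } finally {
            plan.releaseCachedRdds();
        }
    }

    public static void main(String[] args) {
        FakeRdd shared = new FakeRdd();
        Plan plan = new Plan();
        plan.cacheRdd(shared);
        runQuery(plan);
        System.out.println("still cached after query: " + shared.isCached());
    }
}
```

The point of the try/finally is that the release lives in one place, next to 
plan execution, and also runs when the query fails, so no other code needs to 
know which RDDs were cached.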

> Release RDD cache when Hive query is done [Spark Branch]
> --------------------------------------------------------
>
>                 Key: HIVE-8843
>                 URL: https://issues.apache.org/jira/browse/HIVE-8843
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Jimmy Xiang
>         Attachments: HIVE-8843.1-spark.patch
>
>
> In some multi-insert cases, RDD.cache() is called to improve performance. An RDD 
> is SparkContext specific, but the caching is useful only for the query. Thus, 
> once the query is executed, we need to release the cache used by calling 
> RDD.uncache().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
