[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402448#comment-13402448 ]

Ashutosh Chauhan commented on HIVE-3098:
----------------------------------------

I think we are mixing up two things here: performance vs. memory leak.

The patch was originally intended to plug the memory leak, not to improve 
performance. It fixes the leak for the case where a limited number of users 
contact the metastore (as in Mithun's original test, where the same 60 clients 
keep hitting it), but it introduces a new one: if ever-different users hit the 
metastore, the leak is still there, and in fact more acute with the patch, 
since there are now two caches (UGI and FS) which will both grow, instead of 
just one (FS).

The problem stems from the fact that there is no expiration policy in either 
the FS cache or the UGI cache. We need to design an eviction policy for the 
UGI cache. Then, when expiring a stale UGI from the UGI cache, we can call 
{{closeAllForUGI}} on the evicted UGI to also evict its cached FileSystem 
objects from the FS cache.
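
A minimal sketch of that idea, assuming Guava's {{CacheBuilder}} is available 
(that choice, the user-name key, and the 30-minute window are all illustrative, 
not implied by this ticket): a time-expiring UGI cache whose removal listener 
calls {{FileSystem.closeAllForUGI}} for the evicted entry.

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class ExpiringUgiCache {
  private final Cache<String, UserGroupInformation> ugiCache =
      CacheBuilder.newBuilder()
          .expireAfterAccess(30, TimeUnit.MINUTES)  // illustrative window
          .removalListener(new RemovalListener<String, UserGroupInformation>() {
            @Override
            public void onRemoval(
                RemovalNotification<String, UserGroupInformation> evicted) {
              try {
                // Drop the evicted UGI's FileSystem objects from
                // FileSystem.CACHE, so neither cache grows without bound.
                FileSystem.closeAllForUGI(evicted.getValue());
              } catch (IOException e) {
                // A real implementation would log this; ignored in the sketch.
              }
            }
          })
          .build();

  // Returns the cached UGI for this user, creating it on first access.
  public UserGroupInformation getOrCreate(final String userName) throws Exception {
    return ugiCache.get(userName, new Callable<UserGroupInformation>() {
      @Override
      public UserGroupInformation call() {
        return UserGroupInformation.createRemoteUser(userName);
      }
    });
  }
}
{code}

Whether expiry should be access-based or write-based, and how wide the window 
should be, would have to be tuned against real metastore traffic.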

> Memory leak from large number of FileSystem instances in FileSystem.CACHE. 
> (Must cache UGIs.)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security 
> turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-3098.patch
>
>
> The problem manifested during stress-testing of HCatalog 0.4.1 (as part of 
> testing the Oracle backend).
> The HCatalog server ran out of memory (-Xmx2048m) in under 24 hours when 
> pounded by 60 threads. The heap dump indicates that hadoop::FileSystem.CACHE 
> held 1000000 instances of FileSystem, whose combined retained memory 
> consumed the entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented 
> such that the "Subject" member is compared for identity ("=="), not 
> equivalence (".equals()"). This causes equivalent UGI instances to compare 
> as unequal, and so a new FileSystem instance is created and cached for each 
> one.
> UGI.equals() is implemented that way, incidentally, as a fix for yet another 
> problem (HADOOP-6670), so it is unlikely that that implementation can be 
> modified.
> The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive 
> metastore), using a cache for UGI instances in the shims (see the sketches 
> following this description).
> I have a patch to fix this. I'll upload it shortly. I just ran an overnight 
> test to confirm that the memory leak has been arrested.
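
To make the failure mode above concrete, here is an illustrative demo (not 
from the ticket; the user name is made up) of why equivalent UGIs defeat 
{{FileSystem.CACHE}}:

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiCacheMissDemo {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();

    // Two UGIs for the same (made-up) user.
    UserGroupInformation a = UserGroupInformation.createRemoteUser("hcat-client");
    UserGroupInformation b = UserGroupInformation.createRemoteUser("hcat-client");

    // Prints false: UGI.equals() compares the wrapped Subject with "==",
    // so two UGIs for the same user are never equal.
    System.out.println("UGIs equal? " + a.equals(b));

    // FileSystem.CACHE keys include the UGI, so each UGI yields a distinct
    // FileSystem instance; prints false. Under load, these accumulate.
    FileSystem fsA = fileSystemFor(a, conf);
    FileSystem fsB = fileSystemFor(b, conf);
    System.out.println("Same FileSystem? " + (fsA == fsB));
  }

  private static FileSystem fileSystemFor(UserGroupInformation ugi,
      final Configuration conf) throws Exception {
    return ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      @Override
      public FileSystem run() throws Exception {
        return FileSystem.get(conf);
      }
    });
  }
}
{code}

And a minimal sketch of the shims-side fix described above: one canonical UGI 
per user name, so repeated requests from the same user resolve to the same 
Subject and hence the same {{FileSystem.CACHE}} key. This is the general shape 
only, not the actual HIVE-3098 patch; the class and method names are invented.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.security.UserGroupInformation;

public class ShimUgiCache {
  private final ConcurrentHashMap<String, UserGroupInformation> cache =
      new ConcurrentHashMap<String, UserGroupInformation>();

  // Hands back one canonical UGI per user name.
  public UserGroupInformation getUgi(String userName) {
    UserGroupInformation ugi = cache.get(userName);
    if (ugi == null) {
      UserGroupInformation fresh = UserGroupInformation.createRemoteUser(userName);
      UserGroupInformation existing = cache.putIfAbsent(userName, fresh);
      ugi = (existing != null) ? existing : fresh;
    }
    return ugi;
  }
}
{code}

Note that a cache like this never evicts, which is exactly the concern raised 
in the comment above: without an expiration policy, the UGI cache itself grows 
without bound.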
