[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403563#comment-13403563 ]

Alejandro Abdelnur commented on HIVE-3098:
------------------------------------------

@Daryn,

Your closeAllForUGI solution means you have to keep the original UGI around; 
if you keep recreating them, you are back to square one.

Thanks for the UGI mutability explanation. Still, I'd argue that we could 
achieve UGI immutability by creating a new UGI every time credentials are 
added, composing the old UGI with the new credentials. Even so, this would 
not solve the caching problem if we want to key the cache by user ID.
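
To make the composition idea concrete, here is a minimal sketch against the 
modern UGI API (createRemoteUser / getCredentials / addCredentials); the 
helper is hypothetical, and UGI does not actually behave this way today:

{code:java}
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public final class ImmutableUgis {
  private ImmutableUgis() {}

  // Returns a fresh UGI carrying the base identity plus the extra
  // credentials; the base UGI itself is never mutated.
  public static UserGroupInformation withCredentials(
      UserGroupInformation base, Credentials extra) {
    UserGroupInformation composed =
        UserGroupInformation.createRemoteUser(base.getUserName());
    composed.addCredentials(base.getCredentials()); // copy the old creds
    composed.addCredentials(extra);                 // compose in the new
    return composed;
  }
}
{code}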

Leaving UGI alone, it seems one thing that would help is hadoop-common 
providing a (<KEY>, FileSystem) ExpirationCache implementation for others to 
use. This cache would return a FileSystemProxy wrapping the original 
FileSystem instance, which in turn wraps the IO streams returned by 
open()/create(), so that streams still in use can be detected and the 
eviction timer held off until they are closed.
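
A rough sketch of what such a cache could look like (the names 
ExpiringFsCache and TrackingFileSystem, the TTL policy, and the lookup 
signature are all hypothetical, not an existing hadoop-common API):

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// Proxy that counts open streams so the cache can tell when an entry
// is actually idle and safe to evict.
class TrackingFileSystem extends FilterFileSystem {
  final AtomicInteger openStreams = new AtomicInteger();
  volatile long lastUsed = System.currentTimeMillis();

  TrackingFileSystem(FileSystem fs) { super(fs); }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    openStreams.incrementAndGet();
    final FSDataInputStream in = super.open(f, bufferSize);
    // Wrap close() so the in-use count and the idle clock stay accurate;
    // create() would be wrapped the same way (omitted for brevity).
    return new FSDataInputStream(in) {
      @Override
      public void close() throws IOException {
        try {
          in.close();
        } finally {
          openStreams.decrementAndGet();
          lastUsed = System.currentTimeMillis();
        }
      }
    };
  }

  boolean evictable(long ttlMs) {
    return openStreams.get() == 0
        && System.currentTimeMillis() - lastUsed > ttlMs;
  }
}

class ExpiringFsCache<K> {
  private final ConcurrentHashMap<K, TrackingFileSystem> cache =
      new ConcurrentHashMap<K, TrackingFileSystem>();
  private final long ttlMs;

  ExpiringFsCache(long ttlMs) { this.ttlMs = ttlMs; }

  FileSystem get(K key, FileSystem fsIfAbsent) {
    TrackingFileSystem created = new TrackingFileSystem(fsIfAbsent);
    TrackingFileSystem prev = cache.putIfAbsent(key, created);
    return prev != null ? prev : created;
  }

  // Called from a periodic timer; entries with live streams survive.
  void evictExpired() throws IOException {
    for (Map.Entry<K, TrackingFileSystem> e : cache.entrySet()) {
      if (e.getValue().evictable(ttlMs)
          && cache.remove(e.getKey(), e.getValue())) {
        e.getValue().close();
      }
    }
  }
}
{code}

A real implementation would also wrap create(), take a FileSystem factory 
rather than a ready-made instance, and run evictExpired() off a timer thread.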

thx
                
> Memory leak from large number of FileSystem instances in FileSystem.CACHE. 
> (Must cache UGIs.)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security 
> turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-3098.patch
>
>
> The problem surfaced during stress-testing of HCatalog 0.4.1 (as part of 
> testing the Oracle backend).
> The HCatalog server ran out of memory (-Xmx2048m) in under 24 hours when 
> pounded by 60 threads. The heap dump indicates that hadoop::FileSystem.CACHE 
> held 1,000,000 instances of FileSystem, whose combined retained memory 
> consumed the entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented 
> such that the "Subject" member is compared for reference identity ("==") 
> rather than equivalence (".equals()"). This causes equivalent UGI instances 
> to compare as unequal, so a new FileSystem instance is created and cached 
> for each of them.
> UGI.equals() is implemented this way deliberately, as a fix for another 
> problem (HADOOP-6670), so it is unlikely that implementation can be 
> modified.
> The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive 
> metastore), using a cache for UGI instances in the shims (sketched below).
> I have a patch to fix this. I'll upload it shortly. I just ran an overnight 
> test to confirm that the memory leak has been arrested.
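
For illustration, a minimal sketch of the shim-level UGI cache described 
above (names are hypothetical; this is not the attached HIVE-3098.patch):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.UserGroupInformation;

// Hand back one UGI instance per user name. Because the same instance
// (hence the same Subject) is reused, UGI.equals() matches and
// FileSystem.CACHE stops accumulating duplicate entries.
final class UgiCache {
  private static final ConcurrentHashMap<String, UserGroupInformation>
      CACHE = new ConcurrentHashMap<String, UserGroupInformation>();

  private UgiCache() {}

  static UserGroupInformation forUser(String user) {
    UserGroupInformation ugi = CACHE.get(user);
    if (ugi == null) {
      UserGroupInformation created =
          UserGroupInformation.createRemoteUser(user);
      UserGroupInformation prev = CACHE.putIfAbsent(user, created);
      ugi = (prev != null) ? prev : created;
    }
    return ugi;
  }
}
{code}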

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
