[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402246#comment-13402246 ]
Daryn Sharp commented on HIVE-3098:
-----------------------------------

I believe neither disabling the fs cache nor switching to {{FileContext}} will help, since Rohini stated earlier the desire to take advantage of the cache for performance, albeit w/o leaking.

{{FileContext}} isn't going to be a panacea for this issue. I think it only implements hdfs, view, and ftp (not hftp). It provides no {{close()}} method, so there's no way to clean up or shut down clients until jvm shutdown, i.e. aborting streams, deleting tmp files, closing the dfs client, etc. The latter will lead to leaks such as the dfs socket cache leaks, dfs lease-renewer threads, etc.

Even with the fs cache disabled, leaks such as the aforementioned dfs leaks will still occur unless _all_ fs instances are explicitly closed.

I'd suggest either {{closeAllForUGI}}, which keeps the cache benefit within each request but gives up reuse (and hence performance) across requests, or the Oozie-style UGI caching with periodic cache purging.

> Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-3098.patch
>
>
> The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend).
> The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 1000000 instances of FileSystem, whose combined retained memory consumed the entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the "Subject" member is compared for equality ("=="), and not equivalence (".equals()"). This causes equivalent UGI instances to compare as unequal, and causes a new FileSystem instance to be created and cached.
> UGI.equals() is so implemented, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified.
> The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims.
> I have a patch to fix this. I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested.
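To make the failure mode in the quoted issue concrete, here is a minimal, hypothetical Java repro. The class name {{UgiCacheGrowthDemo}} and the user name "hcat_client" are illustrative only; the Hadoop calls ({{UserGroupInformation.createRemoteUser}}, {{doAs}}, {{FileSystem.get}}) are real APIs, and the behavior shown assumes UGI.equals() compares the Subject by identity as described above.

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiCacheGrowthDemo {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();

    // Two UGIs built for the same remote user wrap distinct Subjects, and
    // UGI.equals() compares the Subject by identity (per HADOOP-6670), so the
    // two instances are not equal even though they represent the same user.
    UserGroupInformation u1 = UserGroupInformation.createRemoteUser("hcat_client");
    UserGroupInformation u2 = UserGroupInformation.createRemoteUser("hcat_client");
    System.out.println("equivalent UGIs equal? " + u1.equals(u2));   // false

    // FileSystem.CACHE keys on (scheme, authority, ugi), so each unequal UGI
    // gets its own FileSystem instance -- one new entry per request under load.
    FileSystem fs1 = u1.doAs((PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
    FileSystem fs2 = u2.doAs((PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
    System.out.println("same cached FileSystem? " + (fs1 == fs2));   // false
  }
}
```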
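And a minimal sketch of the Oozie-style UGI caching with periodic purging that the comment suggests. The class {{UgiCache}}, its constructor parameter, and the choice of proxy UGIs keyed by user name are assumptions for illustration, not the HIVE-3098 patch; {{FileSystem.closeAllForUGI}}, {{UserGroupInformation.createProxyUser}}, and {{UserGroupInformation.getLoginUser}} are existing Hadoop APIs.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiCache {
  private final Map<String, UserGroupInformation> cache = new ConcurrentHashMap<>();
  private final ScheduledExecutorService purger = Executors.newSingleThreadScheduledExecutor();

  public UgiCache(long purgeIntervalMinutes) {
    // Periodically drop cached UGIs and close the FileSystems cached against
    // them, so dfs clients, socket caches, and lease-renewer threads are released.
    purger.scheduleAtFixedRate(this::purge, purgeIntervalMinutes, purgeIntervalMinutes, TimeUnit.MINUTES);
  }

  /** Return one canonical UGI per user name, so repeated requests for the
   *  same user hit the same FileSystem.CACHE entry instead of adding a new one. */
  public synchronized UserGroupInformation get(String userName) throws IOException {
    UserGroupInformation ugi = cache.get(userName);
    if (ugi == null) {
      ugi = UserGroupInformation.createProxyUser(userName, UserGroupInformation.getLoginUser());
      cache.put(userName, ugi);
    }
    return ugi;
  }

  private synchronized void purge() {
    for (Iterator<Map.Entry<String, UserGroupInformation>> it = cache.entrySet().iterator(); it.hasNext();) {
      Map.Entry<String, UserGroupInformation> e = it.next();
      it.remove();
      try {
        // Per-UGI cleanup: closes every FileSystem cached for this UGI.
        FileSystem.closeAllForUGI(e.getValue());
      } catch (IOException ignored) {
        // best effort during purge
      }
    }
  }
}
```

The alternative mentioned in the comment, calling {{FileSystem.closeAllForUGI(ugi)}} at the end of each request, needs no cache at all but forfeits fs reuse between requests.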