[ https://issues.apache.org/jira/browse/HIVE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline resolved HIVE-10600. --------------------------------- Resolution: Duplicate HIVE-12369 > optimize group by for GC > ------------------------ > > Key: HIVE-10600 > URL: https://issues.apache.org/jira/browse/HIVE-10600 > Project: Hive > Issue Type: Bug > Reporter: Sergey Shelukhin > Assignee: Matt McCline > > Quoting [~gopalv]: > {noformat} > So, something like a sum() GROUP BY will create a few hundred thousand > AbstractAggregationBuffer objects all of which will suddenly go out of > scope when the map.aggr flushes it down to the sort buffer. > That particular GC collection takes forever because the tiny buffers take > a lot of time to walk over and then they leave the memory space > fragmented, which requires a compaction pass (which btw, writes to a > page-interleaved NUMA zone). > And to make things worse, the pre-allocated sort buffers with absolutely > zero data in them take up most of the tenured regions causing these chunks > of memory to be visited more and more often as they are part of the Eden > space. > {noformat} > We need flat data structures to be GC friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)