[ https://issues.apache.org/jira/browse/HIVE-28428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryu Kobayashi reassigned HIVE-28428:
------------------------------------

    Assignee: Ryu Kobayashi

> Map hash aggregation performance degradation
> ---------------------------------------------
>
>                 Key: HIVE-28428
>                 URL: https://issues.apache.org/jira/browse/HIVE-28428
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ryu Kobayashi
>            Assignee: Ryu Kobayashi
>            Priority: Major
>         Attachments: 2024-08-02 14.35.46.png, image-2024-08-02-14-37-01-824.png, image-2024-08-02-14-38-45-459.png
>
>
> The following ticket was fixed to enable map hash aggregation, but performance is now worse than when it is disabled.
> https://issues.apache.org/jira/browse/HIVE-23356
> I found a few reasons for this. If there are a large number of keys, the following log messages are output in large volume, which affects performance. This can also cause an OOM.
> {code:java}
> 2024-08-02 05:21:53,675 [INFO] [TezChild] |exec.GroupByOperator|: Hash Tbl flush: #hash table = 171000
> 2024-08-02 05:21:53,713 [INFO] [TezChild] |exec.GroupByOperator|: Hash Table flushed: new size = 153900
> {code}
> By fixing this, we can improve performance as follows.
> Before:
> !image-2024-08-02-14-37-01-824.png!
> After:
> !2024-08-02 14.35.46.png!
> Also, the flush size is currently fixed, but performance can be improved by changing it depending on the data:
> !image-2024-08-02-14-38-45-459.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
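
A rough way to see why both points in the description matter is a toy model of map-side hash aggregation. The sketch below is not Hive's GroupByOperator code: the class, its fields, and the log throttle are hypothetical names introduced for illustration, and the ticket does not say how the actual fix is implemented. The only figure taken from the ticket is the flush ratio: the two quoted log lines show the table going from 171000 to 153900 entries, which is a 10% eviction in that flush.

```python
from dataclasses import dataclass, field


@dataclass
class HashAggregator:
    """Toy map-side hash aggregation: counts per key, flushing when the table is full.

    All names here are hypothetical; this is not Hive's implementation.
    """
    max_entries: int          # assumed cap on the number of keys held in memory
    flush_fraction: float     # share of entries evicted per flush (0.1 matches the quoted log)
    log_every: int = 1        # assumed throttle: log only every N-th flush
    table: dict = field(default_factory=dict)
    flushes: int = 0
    emitted: int = 0
    log_lines: list = field(default_factory=list)

    def add(self, key):
        # A new key arriving at a full table forces a partial flush first.
        if key not in self.table and len(self.table) >= self.max_entries:
            self._flush()
        self.table[key] = self.table.get(key, 0) + 1

    def _flush(self):
        before = len(self.table)
        n_evict = max(1, int(before * self.flush_fraction))
        # dicts preserve insertion order, so this evicts the oldest keys.
        for key in list(self.table)[:n_evict]:
            del self.table[key]
        self.flushes += 1
        self.emitted += n_evict
        if self.flushes % self.log_every == 0:
            self.log_lines.append(f"flushed: {before} -> {len(self.table)}")


def run(n_keys, flush_fraction, log_every=1, max_entries=1000):
    """Feed n_keys distinct keys, the worst case for hash aggregation."""
    agg = HashAggregator(max_entries, flush_fraction, log_every)
    for k in range(n_keys):
        agg.add(k)
    return agg


if __name__ == "__main__":
    for fraction, every in [(0.1, 1), (0.1, 10), (0.5, 1)]:
        agg = run(10_000, fraction, every)
        print(fraction, every, agg.flushes, len(agg.log_lines))
```

In this model, with many distinct keys and a fixed 10% flush, the table refills and flushes again after every 100 new keys, so one log line per flush grows linearly with the key count. Throttling the log or evicting a larger share per flush both reduce that volume; which flush size is best depends on how often keys repeat, which is the data dependence the description refers to.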