[ https://issues.apache.org/jira/browse/HIVE-17783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225678#comment-16225678 ]
Wei Zheng commented on HIVE-17783:
----------------------------------

[~Ferd] Sorry for the late reply. Yes, the spilling part is the bottleneck, and there's no easy way to get around it. In your case, with n-way joins, the optimizer's stats estimation may not be accurate, which makes the situation worse.

Anyway, the ultimate way to solve this problem is to have a reliable memory manager that can provide memory usage/quota at any moment. Right now we're following a conservative approach, which is to use a soft (possibly inaccurate) memory limit. That way we can avoid unnecessary spilling if there is enough memory for loading the hashtable.

> Hybrid Grace Hash Join has performance degradation for N-way join using Hive on Tez
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-17783
>                 URL: https://issues.apache.org/jira/browse/HIVE-17783
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: 8*Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 1 master + 7 workers
> TPC-DS at 3TB data scales
> Hive version : 2.2.0
>            Reporter: Ferdinand Xu
>         Attachments: Hybrid_Grace_Hash_Join.xlsx, screenshot-1.png
>
>
> Most configurations are using default value. And the benchmark is to test
> enabling against disabling hybrid grace hash join using TPC-DS queries at 3TB
> data scales. Many queries related to N-way join has performance degradation
> over three times test. Detailed result is attached.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)