[ https://issues.apache.org/jira/browse/HIVE-17783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226472#comment-16226472 ]
Ferdinand Xu commented on HIVE-17783:
-------------------------------------

Thanks [~wei.zheng] for your reply.

> In your case for the n-way joins, the optimizer stats estimation may not be
> accurate which makes the situation worse.

AFAIK, the estimated row size should be the same as in the non-hybrid grace hash join case. It's strange that spilling happens in the hybrid grace hash join case. Another observation is that 0 rows of data for one partition occupy about 65636 bytes of memory.

> Anyway, the ultimate way to solve this problem is to have a reliable memory
> manager which can provide memory usage/quota at any moment. Right now we're
> following a conservative approach, which is to use a soft (possibly
> inaccurate) memory limit. That way we can avoid unnecessary spilling if there
> is enough memory for loading the hashtable.

Interesting. Is there a ticket addressing this part of the work?

> Hybrid Grace Hash Join has performance degradation for N-way join using Hive
> on Tez
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-17783
>                 URL: https://issues.apache.org/jira/browse/HIVE-17783
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 2.2.0
>         Environment: 8*Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 1 master + 7 workers
> TPC-DS at 3TB data scales
> Hive version : 2.2.0
>            Reporter: Ferdinand Xu
>         Attachments: Hybrid_Grace_Hash_Join.xlsx, screenshot-1.png
>
>
> Most configurations are using default value. And the benchmark is to test
> enabling against disabling hybrid grace hash join using TPC-DS queries at 3TB
> data scales. Many queries related to N-way join has performance degradation
> over three times test. Detailed result is attached.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)