[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087996#comment-14087996
 ] 

Mostafa Mokhtar commented on HIVE-7616:
---------------------------------------

This will work for most of the TPC-DS queries, since joins with the dimension 
tables are always on key columns and there is a PK/FK relationship between the 
dimension tables and the fact tables. Hence, in most cases the number of rows 
in the broadcast table will be equal to the number of keys (one-to-many joins).

In MapJoins where the tables don't naturally have a PK/FK relationship 
(many-to-many joins), the number of rows can be significantly higher than the 
number of keys.
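
To make the rows-versus-keys distinction concrete, here is a minimal sketch 
that counts both for the join-key column of the small (broadcast) table. The 
class and method names are hypothetical and are not part of Hive; the point is 
only that a row-count estimate matches the key count when the column is a 
primary key and overshoots it otherwise.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not Hive code: compares row count with distinct key count. */
final class RowsVersusKeys {
    /** Returns {rowCount, distinctKeyCount} for the join-key column of the small table. */
    static long[] count(long[] joinKeys) {
        Set<Long> distinct = new HashSet<>();
        for (long k : joinKeys) {
            distinct.add(k);
        }
        return new long[] { joinKeys.length, distinct.size() };
    }
}
```

For a primary-key column such as {1, 2, 3, 4} both counts are 4. For a 
non-unique column such as {1, 1, 1, 2, 2, 3} there are 6 rows but only 3 keys, 
so a hash table sized by row count would be twice as large as needed.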

Can you add the following perflogging to track such potential issues:
1) Number of keys in the hash table after load vs. number of keys at init
2) Number of times expandAndRehash was called and the total amount of time 
spent there

Using these metrics we can track the performance and behavior of the hash table.
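
As a rough illustration of the counters requested above, the following is a 
sketch on a toy open-addressing key table, not the actual Hive hash table. 
Everything except the name expandAndRehash is an assumption made for the 
example: the class name, the 0.5 load factor, the sentinel value, and the 
summary format.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a map-join hash table; not Hive's implementation. */
final class InstrumentedKeyTable {
    // Sentinel for an empty slot; this sketch assumes no real key has this value.
    private static final long EMPTY = Long.MIN_VALUE;

    private long[] slots;
    private final int keysAtInit;  // estimate used to pre-size the table
    private int keyCount;          // distinct keys actually loaded
    private long rowCount;         // rows offered to the table, duplicates included
    private int expandCalls;       // number of expandAndRehash invocations
    private long expandNanos;      // total time spent in expandAndRehash

    InstrumentedKeyTable(int keysAtInit) {
        this.keysAtInit = keysAtInit;
        int capacity = 4;
        // Power-of-two capacity that keeps the load factor at or below 0.5.
        while (capacity < 2L * keysAtInit) {
            capacity <<= 1;
        }
        slots = new long[capacity];
        Arrays.fill(slots, EMPTY);
    }

    void put(long key) {
        rowCount++;
        int i = findSlot(slots, key);
        if (slots[i] == key) {
            return; // duplicate key: another row, but no new key
        }
        if ((keyCount + 1) * 2L > slots.length) {
            expandAndRehash();
            i = findSlot(slots, key);
        }
        slots[i] = key;
        keyCount++;
    }

    /** Linear probing: returns the slot holding key, or the first empty slot. */
    private static int findSlot(long[] table, long key) {
        int mask = table.length - 1;
        int i = Long.hashCode(key) & mask;
        while (table[i] != EMPTY && table[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    private void expandAndRehash() {
        long start = System.nanoTime();
        long[] bigger = new long[slots.length * 2];
        Arrays.fill(bigger, EMPTY);
        for (long k : slots) {
            if (k != EMPTY) {
                bigger[findSlot(bigger, k)] = k;
            }
        }
        slots = bigger;
        expandCalls++;
        expandNanos += System.nanoTime() - start;
    }

    int keyCount() { return keyCount; }
    long rowCount() { return rowCount; }
    int expandCalls() { return expandCalls; }
    long expandNanos() { return expandNanos; }

    /** One line suitable for a perf log after the table is loaded. */
    String summary() {
        return String.format(
            "rows=%d keysAfterLoad=%d keysAtInit=%d expandAndRehashCalls=%d expandAndRehashNanos=%d",
            rowCount, keyCount, keysAtInit, expandCalls, expandNanos);
    }
}
```

With an accurate estimate (for example, 4 keys at init and 4 distinct keys 
loaded) the expand counter stays at zero. With an estimate of 4 and 100 
distinct keys loaded, this toy table doubles five times, which is the kind of 
signal the log line is meant to surface.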


> pre-size mapjoin hashtable based on statistics
> ----------------------------------------------
>
>                 Key: HIVE-7616
>                 URL: https://issues.apache.org/jira/browse/HIVE-7616
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-7616.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
