[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696671#comment-14696671
 ] 

Gopal V commented on HIVE-11306:
--------------------------------

Patch .3 does not give the performance boost observed in patch .2.

The crucial difference is that patch .3 does not really consider the bloom 
filter to be valid for spilled partitions.

{code}
+      if (!bloom1.testLong(keyHash) && !isOnDisk(partitionId)) {
{code}

The isOnDisk check negates the performance benefit of checking the bloom 
filter to avoid spilling: rows headed for a spilled partition are never 
filtered out, so they are still written to disk.
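
For illustration, here is a minimal, self-contained sketch of the ordering this 
comment argues for: consult the bloom filter first, and look at the partition's 
on-disk state only for keys that might match. This is not the code from any of 
the attached patches. The names Bloom1Sketch, Bloom1, Action and route are 
hypothetical, the hash mixing constant is an arbitrary choice, and only 
testLong and the isOnDisk idea come from the snippet above.

```java
public class Bloom1Sketch {

    /** Hypothetical single-hash ("bloom-1") filter over 64-bit key hashes. */
    static final class Bloom1 {
        private final long[] bits;
        private final int mask;

        /** @param log2Bits log2 of the filter size in bits; must be in [6, 30]. */
        Bloom1(int log2Bits) {
            if (log2Bits < 6 || log2Bits > 30) {
                throw new IllegalArgumentException("log2Bits must be in [6, 30]");
            }
            int nBits = 1 << log2Bits;
            this.bits = new long[nBits >>> 6]; // 64 bits per long word
            this.mask = nBits - 1;
        }

        /** Maps a key hash to one bit position (assumed mixing step). */
        private int index(long keyHash) {
            long mixed = keyHash * 0x9E3779B97F4A7C15L;
            return ((int) (mixed >>> 32)) & mask;
        }

        /** Records a small-table key hash while the hash table is built. */
        void addLong(long keyHash) {
            int i = index(keyHash);
            bits[i >>> 6] |= 1L << (i & 63);
        }

        /** Returns false only if the key hash was definitely never added. */
        boolean testLong(long keyHash) {
            int i = index(keyHash);
            return (bits[i >>> 6] & (1L << (i & 63))) != 0;
        }
    }

    /** What to do with one big-table row during the probe phase. */
    enum Action { DROP, SPILL, PROBE }

    /**
     * The bloom filter is consulted before the on-disk state, so a row whose
     * key cannot match is dropped even when its partition has been spilled.
     */
    static Action route(Bloom1 bloom, long keyHash, boolean partitionOnDisk) {
        if (!bloom.testLong(keyHash)) {
            return Action.DROP;  // no possible match: nothing to spill
        }
        return partitionOnDisk ? Action.SPILL : Action.PROBE;
    }

    public static void main(String[] args) {
        Bloom1 bloom = new Bloom1(10);
        bloom.addLong(42L); // a key hash present in the small table
        System.out.println(route(bloom, 42L, true));
        System.out.println(route(bloom, 42L, false));
    }
}
```

With the extra `&& !isOnDisk(partitionId)` condition, the DROP branch would be 
reachable only for in-memory partitions, which is the case where no spill IO 
would have been incurred anyway.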

> Add a bloom-1 filter for Hybrid MapJoin spills
> ----------------------------------------------
>
>                 Key: HIVE-11306
>                 URL: https://issues.apache.org/jira/browse/HIVE-11306
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch
>
>
> HIVE-9277 implemented spillable joins for Tez, which suffer from a 
> corner-case performance issue when joining a wide small table against a 
> narrow big table (like a user info table joined against an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would give a large performance gain for such queries 
> by cutting down on the spill IO costs for the big-table spills.



