[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-24205:
------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Mustafa!

> Optimise CuckooSetBytes
> -----------------------
>
>                 Key: HIVE-24205
>                 URL: https://issues.apache.org/jira/browse/HIVE-24205
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Mustafa Iman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png,
> vectorized.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FilterStringColumnInList, StringColumnInList}} etc. use CuckooSetBytes for
> lookup.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize would be to add boundary checks on "length", using
> the min/max length of the entries stored in the hash set (ref:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85]).
> A lookup key whose length falls outside that range cannot be in the set, so
> it can be rejected without hashing. This would significantly reduce the
> number of hash computations that need to happen. E.g.
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20]
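
The length-bound idea from the description can be sketched as follows. This is a minimal illustration, not the code in vectorized.patch or in Hive's CuckooSetBytes: the class name LengthBoundedByteSet, its methods, and the hashedLookups counter are hypothetical, and a plain java.util.HashSet stands in for the cuckoo hash tables. It only shows how tracking the min/max entry length at insert time lets a lookup return early without computing a hash.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; NOT Hive's CuckooSetBytes.
final class LengthBoundedByteSet {
    // Assumption: a HashSet stands in for the cuckoo hash tables.
    private final Set<ByteBuffer> entries = new HashSet<>();

    // Length bounds of all inserted values. With these sentinels an
    // empty set rejects every lookup in the range check below.
    private int minLen = Integer.MAX_VALUE;
    private int maxLen = -1;

    // Counts lookups that got past the length check and were hashed.
    private long hashedLookups = 0;

    void insert(byte[] value) {
        entries.add(ByteBuffer.wrap(value.clone()));
        minLen = Math.min(minLen, value.length);
        maxLen = Math.max(maxLen, value.length);
    }

    // Looks up bytes[start, start + len), mirroring the slice-style
    // access used for column vectors of strings.
    boolean lookup(byte[] bytes, int start, int len) {
        // Boundary check: a key shorter or longer than every stored
        // entry cannot match, so skip the hash computation entirely.
        if (len < minLen || len > maxLen) {
            return false;
        }
        hashedLookups++;
        return entries.contains(ByteBuffer.wrap(bytes, start, len));
    }

    long hashedLookups() {
        return hashedLookups;
    }
}
```

As an illustrative case, if the IN list holds only the 4-byte values MAIL and SHIP, then probing with TRUCK or AIR returns false from the length check alone, while RAIL still has to be hashed because its length is within the bounds. The check costs two integer comparisons per lookup, so it pays off mainly when many probed values differ in length from the IN-list entries.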