[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: Screen Shot 2020-10-02 at 4.15.32 PM.png

> Optimise CuckooSetBytes
> -----------------------
>
>                 Key: HIVE-24205
>                 URL: https://issues.apache.org/jira/browse/HIVE-24205
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png,
> vectorized.patch
>
>
> {{FilterStringColumnInList, StringColumnInList}} etc. use CuckooSetBytes for
> lookup.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize this would be to add a boundary check on the input
> "length" against the min/max length of the entries stored in the hash set (ref:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85]).
> A lookup whose length falls outside that range can return without hashing,
> which would significantly reduce the number of hash computations that need
> to happen. E.g.
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
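
The length boundary check proposed in the description could look roughly like the sketch below. This is not the attached patch and not Hive's actual CuckooSetBytes code: the class name LengthBoundedByteSet, its methods, and the hashedLookups counter are all hypothetical, and a plain java.util.HashSet stands in for the cuckoo hash table. It only illustrates tracking the min/max length at insert time and rejecting out-of-range probes before any hash is computed.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for a byte-string set (NOT Hive's CuckooSetBytes).
 * A HashSet replaces the cuckoo table; only the length pre-check matters here.
 */
final class LengthBoundedByteSet {
  private final Set<ByteBuffer> members = new HashSet<>();

  // Smallest and largest length of any inserted value.
  // With no entries, minLen > maxLen, so every probe is rejected early.
  private int minLen = Integer.MAX_VALUE;
  private int maxLen = 0;

  // Counts probes that actually reached the hashed lookup (illustration only).
  private long hashedLookups = 0;

  void insert(byte[] value) {
    // Copy the bytes so later changes to the caller's array cannot corrupt the set.
    members.add(ByteBuffer.wrap(value.clone()));
    minLen = Math.min(minLen, value.length);
    maxLen = Math.max(maxLen, value.length);
  }

  /** Returns true if bytes[start, start + len) equals one of the inserted values. */
  boolean lookup(byte[] bytes, int start, int len) {
    // Cheap rejection: no stored value has this length, so skip hashing entirely.
    if (len < minLen || len > maxLen) {
      return false;
    }
    hashedLookups++;
    // ByteBuffer.hashCode/equals consider only the wrapped [start, start + len) range.
    return members.contains(ByteBuffer.wrap(bytes, start, len));
  }

  long hashedLookups() {
    return hashedLookups;
  }
}
```

With an IN list whose values all have the same length (say, two 4-byte strings, chosen here purely for illustration), any probe of a different length returns false without touching the hash function; only probes of length 4 pay for hashing.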