[ https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis updated HIVE-22731:
------------------------------------------
    Attachment: HIVE-22731.WIP.patch

> Probe MapJoin hashtables for row level filtering
> ------------------------------------------------
>
>                 Key: HIVE-22731
>                 URL: https://issues.apache.org/jira/browse/HIVE-22731
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, llap
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>         Attachments: HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>
> Currently, RecordReaders such as ORC support filtering only at
> coarse-grained levels, namely: File, Stripe (64 to 256 MB), and Row group
> (10k rows). They skip a set of rows only if they can guarantee that none of
> the rows in it can pass a filter (usually given as a searchable argument).
> However, a significant amount of time can be spent decoding rows with
> multiple columns that are not even used in the final result. See the figure,
> where "original" is what happens today and "LazyDecode" skips decoding rows
> that do not match the key.
> To enable more fine-grained filtering in the particular case of a MapJoin,
> we could use the key HashTable built from the smaller table to skip
> deserializing the columns of rows in the larger table that do not match any
> key, and thus save CPU time.
> This Jira investigates this direction.
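
The idea in the description can be illustrated with a minimal, standalone Java sketch. It is not taken from HIVE-22731.WIP.patch and does not use Hive or ORC APIs: the class LazyDecodeSketch, its methods, and the rowDecoder callback are hypothetical names chosen for illustration. It assumes the join key column of the larger table has already been decoded, and that decoding the remaining columns of a row is the expensive step to avoid.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

// Hypothetical illustration only; not Hive's MapJoin or ORC reader code.
class LazyDecodeSketch {
    // Number of rows whose remaining columns were decoded (for illustration).
    static int decodedRows = 0;

    // Build side: collect the join keys of the smaller table into a hash set.
    static Set<Long> buildKeySet(long[] smallTableKeys) {
        Set<Long> keys = new HashSet<>();
        for (long k : smallTableKeys) {
            keys.add(k);
        }
        return keys;
    }

    // Probe side: look up each big-table key in the set and call the
    // (assumed expensive) rowDecoder only for rows that can join.
    static List<String> probeAndDecode(long[] bigTableKeyColumn,
                                       Set<Long> keys,
                                       IntFunction<String> rowDecoder) {
        List<String> out = new ArrayList<>();
        for (int row = 0; row < bigTableKeyColumn.length; row++) {
            if (keys.contains(bigTableKeyColumn[row])) {
                decodedRows++;
                out.add(rowDecoder.apply(row));
            }
            // Rows with no matching key are skipped without being decoded.
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Long> keys = buildKeySet(new long[] {2L, 5L});
        long[] bigKeys = {1L, 2L, 3L, 5L, 5L, 9L};
        List<String> rows = probeAndDecode(bigKeys, keys, r -> "row" + r);
        System.out.println(rows + " decoded=" + decodedRows);
    }
}
```

In this sketch only three of the six big-table rows reach the decoder. A real reader would apply the same probe per row batch, before materializing the non-key columns.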