[ https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Panagiotis Garefalakis updated HIVE-22731:
------------------------------------------
    Environment:     (was: [^decode_time_bars.pdf])

> Use MapJoin hashtables for row level filtering
> ----------------------------------------------
>
>                 Key: HIVE-22731
>                 URL: https://issues.apache.org/jira/browse/HIVE-22731
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, llap
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>         Attachments: decode_time_bars.pdf
>
>
> Currently, RecordReaders such as ORC support filtering only at coarse-grained
> levels, namely: File, Stripe (64 to 256 MB), and Row group (10k rows).
> They skip a set of rows only if they can guarantee that none of the rows in
> it can pass a filter (usually given as a search argument).
> However, a significant amount of time can be spent decoding rows with
> multiple columns that are not even used in the final result. See the attached
> figure, where "original" is what happens today and in "LazyDecode" we skip
> decoding rows that do not match the key.
> To enable more fine-grained filtering in the particular case of a MapJoin,
> we could use the key HashTable built from the smaller table to skip
> deserializing the columns of rows in the larger table that do not match any
> key, and thus save CPU time.
> This Jira investigates this direction.
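
For illustration only, the idea described in the issue can be sketched roughly as follows. This is a minimal sketch and not Hive or ORC code: the class LazyDecodeSketch, the EncodedRow type, and the methods buildKeySet, decodePayload, and probe are all hypothetical names. It assumes a single long join key that can be read cheaply, and a payload that is expensive to decode. A real MapJoin hashtable and a real columnar reader are considerably more involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of key-based lazy decoding; none of these names are Hive or ORC APIs. */
final class LazyDecodeSketch {

    /** Stand-in for a row of the larger table: a cheap-to-read key plus a costly-to-decode payload. */
    static final class EncodedRow {
        final long key;
        final byte[] payload;

        EncodedRow(long key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /** Counts payload decodes so the saved work is visible. */
    static int decodeCalls = 0;

    /** Stand-in for the expensive step of deserializing the non-key columns. */
    static String decodePayload(EncodedRow row) {
        decodeCalls++;
        return new String(row.payload, StandardCharsets.UTF_8);
    }

    /** Build side: collect the join keys of the smaller table (assumed to fit in memory). */
    static Set<Long> buildKeySet(long[] smallTableKeys) {
        Set<Long> keys = new HashSet<>();
        for (long k : smallTableKeys) {
            keys.add(k);
        }
        return keys;
    }

    /** Probe side: look at the key first and decode the payload only when the key matches. */
    static List<String> probe(List<EncodedRow> largeTableRows, Set<Long> keys) {
        List<String> matches = new ArrayList<>();
        for (EncodedRow row : largeTableRows) {
            if (!keys.contains(row.key)) {
                continue; // no join partner: skip decoding the remaining columns
            }
            matches.add(row.key + ":" + decodePayload(row));
        }
        return matches;
    }
}
```

In this sketch, probing four rows with keys 1, 2, 3, 4 against a key set of {1, 3} decodes only two payloads. The saving grows with the number of non-key columns and with the fraction of rows that have no join partner.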