[ https://issues.apache.org/jira/browse/HIVE-22959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071127#comment-17071127 ]
Panagiotis Garefalakis edited comment on HIVE-22959 at 3/30/20, 4:41 PM:
-------------------------------------------------------------------------

Hey [~omalley], the idea here is to abstract the information that data-format consumers need in order to enable more fine-grained filtering (e.g., ORC-611).

You are right that VRB (VectorizedRowBatch) does contain similar information, but the problem is that not all consumers use VRB. For example, in Hive we currently use batches of [ColumnVectors|https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java] instead.

The proposed MutableFilterContext also provides some optimizations, such as the borrowSelected method, which reuses the allocated selected array across filters. It also exposes an immutable context by default, making it harder for API users to modify the context values when they shouldn't.
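To make the split between the read-only and mutable contexts concrete, here is a minimal sketch, assuming a context that holds the three values listed in the issue description (a flag, a selected-row-id array, and a selected count). The names FilterContext, MutableFilterContext, borrowSelected and the idea of an immutable view come from the discussion; every signature below (borrowSelected(int), setFilterContext(boolean, int), immutable(), reset()) and the FilterDemo.keepMultiplesOf helper are hypothetical and are not the actual storage-api code.

```java
import java.util.Arrays;

// Read-only view of which rows in a batch passed a filter.
// Hypothetical sketch: not the actual storage-api interface.
interface FilterContext {
  boolean isSelectedInUse();   // true once a filter has populated the selection
  int[] getSelected();         // row ids that passed; only the first getSelectedSize() entries are valid
  int getSelectedSize();       // number of valid entries in getSelected()
}

// Mutable side of the context, held by the code that runs the filters.
// borrowSelected/immutable echo the ideas in the comment; their signatures are assumptions.
final class MutableFilterContext implements FilterContext {
  private boolean selectedInUse;
  private int[] selected = new int[0];
  private int selectedSize;

  @Override public boolean isSelectedInUse() { return selectedInUse; }
  @Override public int[] getSelected() { return selected; }
  @Override public int getSelectedSize() { return selectedSize; }

  // Hands out the internal array so successive filters reuse one allocation.
  // It grows only when too small, and existing entries are preserved so a
  // chained filter can still read the previous selection.
  int[] borrowSelected(int minCapacity) {
    if (selected.length < minCapacity) {
      selected = Arrays.copyOf(selected, minCapacity);
    }
    return selected;
  }

  // Publishes the result of a filter that wrote into the borrowed array.
  void setFilterContext(boolean inUse, int size) {
    if (size < 0 || size > selected.length) {
      throw new IllegalArgumentException("selected size out of range: " + size);
    }
    selectedInUse = inUse;
    selectedSize = size;
  }

  void reset() {
    selectedInUse = false;
    selectedSize = 0;
  }

  // Read-only wrapper: holders of this view cannot reach the setters and cannot
  // cast it back to MutableFilterContext. The selected array is shared, not
  // copied, so this makes misuse harder rather than impossible.
  FilterContext immutable() {
    return new FilterContext() {
      @Override public boolean isSelectedInUse() { return MutableFilterContext.this.selectedInUse; }
      @Override public int[] getSelected() { return MutableFilterContext.this.selected; }
      @Override public int getSelectedSize() { return MutableFilterContext.this.selectedSize; }
    };
  }
}

// Hypothetical filter used only for illustration: keeps rows whose value is a
// multiple of divisor, narrowing any selection left by an earlier filter.
final class FilterDemo {
  static void keepMultiplesOf(long[] values, int batchSize, long divisor, MutableFilterContext ctx) {
    boolean chained = ctx.isSelectedInUse();
    int candidates = chained ? ctx.getSelectedSize() : batchSize;
    int[] sel = ctx.borrowSelected(batchSize);
    int out = 0;
    for (int j = 0; j < candidates; j++) {
      int row = chained ? sel[j] : j;
      if (values[row] % divisor == 0) {
        sel[out++] = row; // out <= j, so in-place compaction never overwrites an unread entry
      }
    }
    ctx.setFilterContext(true, out);
  }
}
```

In this sketch, calling keepMultiplesOf with divisor 2 and then divisor 3 on the values {1, 2, 3, 4, 6, 9, 12} narrows the selection from rows 1, 3, 4, 6 to rows 4, 6, and both filters write into the same borrowed array. Consumers that only need to read the result would be handed ctx.immutable().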
> Extend storage-api to expose FilterContext
> ------------------------------------------
>
>                 Key: HIVE-22959
>                 URL: https://issues.apache.org/jira/browse/HIVE-22959
>             Project: Hive
>          Issue Type: Sub-task
>          Components: storage-api
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, storage-2.7.2
>
>         Attachments: HIVE-22959.1.patch, HIVE-22959.2.patch, HIVE-22959.3.patch, HIVE-22959.4.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To enable row-level filtering at the ORC level (ORC-577), or as an extension
> of the ProbeDecode MapJoin (HIVE-22731), we need a common context class that
> holds all the information the filter needs.
> I propose making this class part of the storage-api, similar to the
> VectorizedRowBatch class, and having it hold the following information:
> * A boolean flag indicating whether the filter is enabled
> * An int array storing the IDs of the rows that are actually selected (i.e.,
> that pass the filter)
> * An int storing the number of rows that passed the filter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)