[ 
https://issues.apache.org/jira/browse/HIVE-22959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071127#comment-17071127
 ] 

Panagiotis Garefalakis edited comment on HIVE-22959 at 3/30/20, 4:44 PM:
-------------------------------------------------------------------------

Hey [~omalley], the idea here is to abstract the information that data-format 
consumers need in order to enable more fine-grained filtering (e.g., ORC-577).

You are right, VRB does contain similar information, but the problem is that not 
all consumers make use of VRB. For example, in Hive we are currently using 
batches of 
[ColumnVectors](https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java) 
instead.

The proposed MutableFilterContext also provides some optimizations, such as the 
borrowSelected method, which reuses the allocated selected array across filters. 
It exposes an immutable context by default to make it harder for API users to 
modify the context values when they shouldn't.
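
To make the last paragraph concrete, here is a rough sketch of the idea. It is not the code from the patch: apart from borrowSelected, every name below (FilterContextSketch, ReadOnlyContext, MutableContext, setSelection, keepGreaterThan) is made up for illustration, and the real signatures may differ.

```java
import java.util.Arrays;

// Hypothetical sketch: apart from borrowSelected, all names are illustrative
// and are not taken from the patch.
final class FilterContextSketch {

  /** Read-only view that consumers would receive by default. */
  interface ReadOnlyContext {
    boolean isSelectedInUse();
    int[] getSelected();
    int getSelectedSize();
  }

  /** Mutable variant, meant for the code that evaluates filters. */
  static final class MutableContext implements ReadOnlyContext {
    private boolean selectedInUse;
    private int[] selected = new int[0];
    private int selectedSize;

    @Override public boolean isSelectedInUse() { return selectedInUse; }
    @Override public int[] getSelected() { return selected; }
    @Override public int getSelectedSize() { return selectedSize; }

    /** Hands out the existing array, growing it only if it is too small. */
    int[] borrowSelected(int minCapacity) {
      if (selected.length < minCapacity) {
        selected = Arrays.copyOf(selected, minCapacity);
      }
      return selected;
    }

    void setSelection(int[] rows, int size) {
      this.selectedInUse = true;
      this.selected = rows;
      this.selectedSize = size;
    }

    /** Wraps this context so that callers only see the getters.
     *  The returned array itself is still shared, so this makes
     *  modification harder, not impossible. */
    ReadOnlyContext immutable() {
      final MutableContext self = this;
      return new ReadOnlyContext() {
        @Override public boolean isSelectedInUse() { return self.isSelectedInUse(); }
        @Override public int[] getSelected() { return self.getSelected(); }
        @Override public int getSelectedSize() { return self.getSelectedSize(); }
      };
    }
  }

  /** Example filter: keeps rows whose value exceeds the threshold,
   *  honoring any selection left behind by an earlier filter. */
  static void keepGreaterThan(long[] column, int rowCount, long threshold,
                              MutableContext ctx) {
    boolean wasFiltered = ctx.isSelectedInUse();
    int inputSize = wasFiltered ? ctx.getSelectedSize() : rowCount;
    int[] sel = ctx.borrowSelected(rowCount);
    int out = 0;
    for (int i = 0; i < inputSize; i++) {
      int row = wasFiltered ? sel[i] : i;   // read before the slot is overwritten
      if (column[row] > threshold) {
        sel[out++] = row;                   // out <= i, so compaction is in place
      }
    }
    ctx.setSelection(sel, out);
  }
}
```

In this sketch, chaining two keepGreaterThan calls on the same context allocates the selected array once and compacts it in place on the second call, while downstream consumers would only be handed the result of immutable().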


was (Author: pgaref):
Hey [~omalley], the idea here is to abstract the information that data-format 
consumers need in order to enable more fine-grained filtering (e.g., ORC-611).

You are right, VRB does contain similar information, but the problem is that not 
all consumers make use of VRB. For example, in Hive we are currently using 
batches of 
[ColumnVectors](https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java) 
instead.

The proposed MutableFilterContext also provides some optimizations, such as the 
borrowSelected method, which reuses the allocated selected array across filters. 
It exposes an immutable context by default to make it harder for API users to 
modify the context values when they shouldn't.

> Extend storage-api to expose FilterContext
> ------------------------------------------
>
>                 Key: HIVE-22959
>                 URL: https://issues.apache.org/jira/browse/HIVE-22959
>             Project: Hive
>          Issue Type: Sub-task
>          Components: storage-api
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, storage-2.7.2
>
>         Attachments: HIVE-22959.1.patch, HIVE-22959.2.patch, 
> HIVE-22959.3.patch, HIVE-22959.4.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To enable row-level filtering at the ORC level (ORC-577) or, as an extension, 
> ProDecode MapJoin (HIVE-22731), we need a common context class that will hold 
> all the information the filter needs.
> I propose that this class be part of the storage-api, similar to the 
> VectorizedRowBatch class, and hold the information below:
>  * A boolean variable showing whether the filter is enabled
>  * An int array storing the row IDs that are actually selected (passing the 
> filter)
>  * An int variable storing the number of rows that passed the filter
>  
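
As a rough illustration of how a consumer might read the three values listed above, here is a minimal sketch. The class and method names (SelectedRowsSketch, sumSelected) and the parameter names are hypothetical and are not part of the storage-api; the values are passed as plain parameters so the example does not assume any particular class layout.

```java
// Hypothetical sketch: names are illustrative, not part of the storage-api.
final class SelectedRowsSketch {

  /** Sums a column, visiting only the rows that passed the filter
   *  when the filter is enabled, and every row otherwise. */
  static long sumSelected(long[] column, int rowCount,
                          boolean filterEnabled, int[] selected, int selectedSize) {
    long sum = 0;
    if (filterEnabled) {
      // Only the first selectedSize entries of selected are meaningful.
      for (int i = 0; i < selectedSize; i++) {
        sum += column[selected[i]];
      }
    } else {
      for (int row = 0; row < rowCount; row++) {
        sum += column[row];
      }
    }
    return sum;
  }
}
```

The count is kept separately from the array length so that the array can be larger than the current selection and be reused between batches.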



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
