[ 
https://issues.apache.org/jira/browse/HIVE-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11705:
------------------------------------
    Description: 
For footer cache PPD in the metastore, we need a method that does the PPD. 
This is a tiny item: create that method on OrcInputFormat.
For the metastore path, these methods will be called from the expression 
proxy, similar to the current ObjectStore expr filtering. The serialized SARG 
and column list will come from the request instead of conf; includedCols etc. 
will also come from the request instead of assorted Java objects. 
The types and stripe stats will need to be extracted from HBase. This is a 
bit of a problem, since ideally we want to run inside an HBase 
filter/coprocessor/etc. I need to check whether this is possible, since that 
filter would either need to deserialize ORC, or we would need to store the 
types and stats information in some other, non-ORC form on write. The latter 
is probably the better idea, although it is risky because nothing keeps this 
code in sync with ORC itself.

Meanwhile, minimize the dependencies for stripe picking to the essentials 
(plus conf, which is easy to remove).
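
To illustrate what "stripe picking with minimal dependencies" could look 
like, here is a rough sketch. It is not the Hive or ORC API: StripePicker, 
ColumnRange, RangePredicate and pickStripes are hypothetical names, the 
predicate is a single range check standing in for a deserialized SARG, and 
the stats are plain min/max pairs standing in for ORC stripe statistics. The 
only point is that the method takes just the predicate and the per-stripe 
stats, with no conf and no reader objects.

```java
import java.util.List;

/** Hypothetical, simplified stand-ins; these are not Hive or ORC classes. */
final class StripePicker {

  /** Min/max statistics for one long column within one stripe. */
  static final class ColumnRange {
    final long min;
    final long max;

    ColumnRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /** "column BETWEEN low AND high", standing in for a deserialized SARG. */
  static final class RangePredicate {
    final int column;
    final long low;
    final long high;

    RangePredicate(int column, long low, long high) {
      this.column = column;
      this.low = low;
      this.high = high;
    }
  }

  /**
   * Returns one flag per stripe: true if the stripe may contain matching rows.
   * stripeStats.get(i)[c] is the range of column c in stripe i, or null if
   * no statistics are available for that column.
   */
  static boolean[] pickStripes(RangePredicate predicate,
                               List<ColumnRange[]> stripeStats) {
    boolean[] include = new boolean[stripeStats.size()];
    for (int i = 0; i < include.length; i++) {
      ColumnRange[] columns = stripeStats.get(i);
      ColumnRange range =
          predicate.column < columns.length ? columns[predicate.column] : null;
      // Missing statistics: keep the stripe, since it cannot be ruled out.
      include[i] = range == null
          || (range.max >= predicate.low && range.min <= predicate.high);
    }
    return include;
  }
}
```

Because the inputs are plain values, the same method could in principle be 
called from the split-generation path or from the metastore side, provided 
the stats are available there in some form.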


  was:
For footer cache PPD in the metastore, we need a method that does the PPD. 
This is a tiny item: create that method on OrcInputFormat.
For the metastore path, these methods will be called from the expression 
proxy, similar to the current ObjectStore expr filtering. The serialized SARG 
and column list will come from the request instead of conf; includedCols etc. 
will also come from the request instead of assorted Java objects. 
The types and stripe stats will need to be extracted from HBase. This is a 
bit of a problem, since ideally we want to run inside an HBase 
filter/coprocessor/etc. I need to check whether this is possible, since that 
filter would either need to deserialize ORC, or we would need to store the 
types and stats information in some other, non-ORC form on write. The latter 
is probably the better idea, although it is risky because nothing keeps this 
code in sync with ORC itself.



> refactor SARG stripe filtering for ORC into a method
> ----------------------------------------------------
>
>                 Key: HIVE-11705
>                 URL: https://issues.apache.org/jira/browse/HIVE-11705
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> For footer cache PPD in the metastore, we need a method that does the PPD. 
> This is a tiny item: create that method on OrcInputFormat.
> For the metastore path, these methods will be called from the expression 
> proxy, similar to the current ObjectStore expr filtering. The serialized 
> SARG and column list will come from the request instead of conf; 
> includedCols etc. will also come from the request instead of assorted Java 
> objects. 
> The types and stripe stats will need to be extracted from HBase. This is a 
> bit of a problem, since ideally we want to run inside an HBase 
> filter/coprocessor/etc. I need to check whether this is possible, since 
> that filter would either need to deserialize ORC, or we would need to store 
> the types and stats information in some other, non-ORC form on write. The 
> latter is probably the better idea, although it is risky because nothing 
> keeps this code in sync with ORC itself.
> Meanwhile, minimize the dependencies for stripe picking to the essentials 
> (plus conf, which is easy to remove).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
