[ https://issues.apache.org/jira/browse/HIVE-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-11705:
------------------------------------
Description:
For footer cache PPD to the metastore, we'd need a method to do the PPD. This is a tiny item to create it on OrcInputFormat.
For the metastore path, these methods will be called from the expression proxy, similar to the current ObjectStore expression filtering. It will change so that the serialized SARG and column list come from the request instead of the conf; includedCols etc. will also come from the request instead of assorted Java objects.
The types and stripe stats will need to be extracted from HBase. This is a bit of a problem, since ideally we want to be inside an HBase filter/coprocessor/etc. I'd need to take a look to see if this is possible, since that filter would need to either deserialize ORC, or we would need to store the types and stats information in some other, non-ORC format on write. The latter is probably a better idea, although it's dangerous because there's no sync between this code and ORC itself.
Meanwhile, minimize the dependencies for stripe picking to the essentials (plus conf, which is easy to remove).

was:
For footer cache PPD to the metastore, we'd need a method to do the PPD. This is a tiny item to create it on OrcInputFormat.
For the metastore path, these methods will be called from the expression proxy, similar to the current ObjectStore expression filtering. It will change so that the serialized SARG and column list come from the request instead of the conf; includedCols etc. will also come from the request instead of assorted Java objects.
The types and stripe stats will need to be extracted from HBase. This is a bit of a problem, since ideally we want to be inside an HBase filter/coprocessor/etc. I'd need to take a look to see if this is possible, since that filter would need to either deserialize ORC, or we would need to store the types and stats information in some other, non-ORC format on write. The latter is probably a better idea, although it's dangerous because there's no sync between this code and ORC itself.
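The stripe-picking method asked for above only needs the column types, the per-stripe statistics, and the predicate. The following is a minimal sketch under simplifying assumptions, not Hive's implementation: `StripePicker`, `ColumnStats`, `BetweenLeaf`, `Truth`, `evaluate`, and `pickStripes` are hypothetical names, statistics are reduced to long min/max values per column, and the SARG is reduced to a single inclusive BETWEEN leaf. It does not use Hive's `SearchArgument` or ORC reader classes and takes no conf, which is the dependency-minimal shape the description is aiming for.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: pick ORC stripes using only per-stripe min/max statistics
 * and a single predicate leaf, with no Configuration or reader objects.
 * None of these class or method names are Hive's actual API.
 */
class StripePicker {

  /** Three-valued result, loosely modeled on SARG truth values (simplified). */
  enum Truth { YES, NO, MAYBE }

  /** Hypothetical min/max statistics for one integer column within one stripe. */
  record ColumnStats(long min, long max) {}

  /** Hypothetical predicate leaf: column[columnIndex] BETWEEN lo AND hi (inclusive). */
  record BetweenLeaf(int columnIndex, long lo, long hi) {}

  /** Evaluates the leaf against one stripe's stats for the referenced column. */
  static Truth evaluate(BetweenLeaf leaf, ColumnStats stats) {
    if (stats == null) {
      return Truth.MAYBE; // no stats: the stripe cannot be ruled out
    }
    if (stats.max() < leaf.lo() || stats.min() > leaf.hi()) {
      return Truth.NO; // the stripe's value range does not overlap the predicate range
    }
    if (stats.min() >= leaf.lo() && stats.max() <= leaf.hi()) {
      return Truth.YES; // every value in the stripe satisfies the predicate
    }
    return Truth.MAYBE; // partial overlap
  }

  /**
   * Returns one flag per stripe: true if the stripe may contain matching rows.
   * stripeStats.get(s)[c] holds the stats of column c in stripe s.
   */
  static boolean[] pickStripes(List<ColumnStats[]> stripeStats, BetweenLeaf leaf) {
    boolean[] include = new boolean[stripeStats.size()];
    for (int s = 0; s < include.length; s++) {
      ColumnStats[] cols = stripeStats.get(s);
      // A column missing from the stats is treated as "unknown", i.e. MAYBE.
      ColumnStats stats = leaf.columnIndex() < cols.length ? cols[leaf.columnIndex()] : null;
      include[s] = evaluate(leaf, stats) != Truth.NO;
    }
    return include;
  }

  public static void main(String[] args) {
    List<ColumnStats[]> stats = List.of(
        new ColumnStats[] {new ColumnStats(0, 99)},
        new ColumnStats[] {new ColumnStats(100, 199)},
        new ColumnStats[] {new ColumnStats(200, 299)});
    // Predicate: col0 BETWEEN 150 AND 250.
    System.out.println(Arrays.toString(pickStripes(stats, new BetweenLeaf(0, 150, 250))));
  }
}
```

Running `main` prints `[false, true, true]`: stripe 0 (values 0..99) cannot overlap 150..250 and is skipped, while stripes 1 and 2 may contain matches. Because the inputs are plain values rather than ORC objects, the same logic could in principle run inside an HBase filter if the types and stats were stored in a non-ORC form on write; whether that is feasible is the open question raised in the description.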
> refactor SARG stripe filtering for ORC into a method
> ----------------------------------------------------
>
>                 Key: HIVE-11705
>                 URL: https://issues.apache.org/jira/browse/HIVE-11705
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin