[ https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374184#comment-15374184 ]
Xuefu Zhang commented on HIVE-13873:
------------------------------------

[~Ferd], thanks for working on this. I went through the patch, and it looks good for an initial cut. Here are a few preliminary thoughts to share with you:

1. Nested column pruning should go beyond just the select op or groupby op. For instance,
{code}
select msg.a from t where msg.b = 'x';
{code}
In this case, the Parquet reader should read only a and b from the msg field. Thus, I think we need to consider expressions from more operators.

2. There may need to be a consolidation/merging step in determining the final read schema. For instance,
{code}
select msg from t where msg.a='x';
{code}
In this case, the projected column should be just msg rather than msg + msg.a.

3. While it's fine to support just struct at first, we may need to find a more extensible way to pass the projected fields to the reader so that other types (array and map) can be supported later. I have no idea on this yet, so I'd love to hear your thoughts.

> Column pruning for nested fields
> --------------------------------
>
>                 Key: HIVE-13873
>                 URL: https://issues.apache.org/jira/browse/HIVE-13873
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer
>            Reporter: Xuefu Zhang
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-13873.wip.patch
>
>
> Some columnar file formats such as Parquet also store the fields of a struct
> type column by column, using the encoding described in the Google Dremel
> paper. It is very common in big data for data to be stored in structs while
> queries need only a subset of the fields in those structs. However, Hive
> presently still needs to read the whole struct regardless of whether all
> fields are selected. Therefore, pruning unwanted sub-fields in struct or
> nested fields at file reading time would be a big performance boost for such
> scenarios.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
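
Points 1 and 2 of the comment can be illustrated with a small sketch. It is not taken from the patch and does not reflect Hive's actual classes. It assumes, purely for illustration, that each nested field reference is a dotted string such as `msg.a`, and the function names `consolidate` and `projected_paths` are hypothetical. Paths are gathered from both the select list and the filter, and any path whose ancestor is already projected is dropped.

```python
def consolidate(paths):
    """Drop every path that already has an ancestor in the projection.

    Paths are dotted strings such as "msg.a" (an assumption made for this
    sketch; it is not how Hive represents nested columns internally).
    """
    kept = []
    # Sorting by depth guarantees ancestors are visited before descendants.
    for path in sorted(set(paths), key=lambda p: (p.count("."), p)):
        parts = path.split(".")
        ancestors = {".".join(parts[:i]) for i in range(1, len(parts))}
        if not ancestors & set(kept):
            kept.append(path)
    return sorted(kept)


def projected_paths(select_paths, filter_paths):
    """Collect paths from more than one operator, then merge them."""
    return consolidate(list(select_paths) + list(filter_paths))


# select msg.a from t where msg.b = 'x';
print(projected_paths(["msg.a"], ["msg.b"]))
# select msg from t where msg.a = 'x';
print(projected_paths(["msg"], ["msg.a"]))
```

For the first query the sketch keeps both `msg.a` and `msg.b`. For the second it keeps only `msg`, since reading the whole struct already covers `msg.a`. Array and map types (point 3) are deliberately out of scope here.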