[ https://issues.apache.org/jira/browse/HIVE-17696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226071#comment-16226071 ]
Vihang Karajgaonkar commented on HIVE-17696:
--------------------------------------------

Thanks [~Ferd]. Can you also please merge this patch to branch-2?

> Vectorized reader does not seem to be pushing down projection columns in
> certain code paths
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17696
>                 URL: https://issues.apache.org/jira/browse/HIVE-17696
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Assignee: Ferdinand Xu
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17696.2.patch, HIVE-17696.patch
>
>
> This is the code snippet from {{VectorizedParquetRecordReader.java}}
> {noformat}
>     MessageType tableSchema;
>     if (indexAccess) {
>       List<Integer> indexSequence = new ArrayList<>();
>       // Generates a sequence list of indexes
>       for(int i = 0; i < columnNamesList.size(); i++) {
>         indexSequence.add(i);
>       }
>       tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
>         indexSequence);
>     } else {
>       tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
>         columnTypesList);
>     }
>     indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
>     if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
>       requestedSchema =
>         DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
>     } else {
>       requestedSchema = fileSchema;
>     }
>     this.reader = new ParquetFileReader(
>       configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
> {noformat}
> A couple of things to notice here:
> Most of this code is duplicated from the {{DataWritableReadSupport.init()}}
> method.
> The else branch passes in fileSchema instead of using tableSchema like we
> do in the {{DataWritableReadSupport.init()}} method. Does this cause projection
> columns to be missed when we read Parquet files? We should probably just
> reuse the ReadContext returned from the {{DataWritableReadSupport.init()}} method
> here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)