[ https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdinand Xu reopened HIVE-11611:
---------------------------------

Hi [~spena], I think that if we bump up to the latest version of Parquet, we still need to change the code to the original one. I'd like to reopen this JIRA.

> A bad performance regression issue with Parquet happens if Hive does not select any columns
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11611
>                 URL: https://issues.apache.org/jira/browse/HIVE-11611
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Sergio Peña
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-11611.patch
>
>
> A possible performance issue may happen with the code below when using a query like this {{SELECT count(1) FROM parquetTable}}.
> {code}
> if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
>     MessageType requestedSchemaByUser =
>         getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
>     return new ReadContext(requestedSchemaByUser, contextMetadata);
> } else {
>     return new ReadContext(tableSchema, contextMetadata);
> }
> {code}
> If neither columns nor indexes are selected, the above code requests the full schema from Parquet even though Hive does not use any of those values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
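
To make the branch quoted in the issue easier to follow, here is a minimal, self-contained sketch of the same decision. It is not Hive or Parquet code: `SchemaSelectionSketch` and `requestedColumns` are hypothetical names, and plain lists of column names stand in for `MessageType`, `ReadContext` and the Hive configuration. It only illustrates why an empty list of wanted indexes falls through to the full-schema branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the branch quoted in the issue. Plain lists of
// column names replace Parquet's MessageType and Hive's configuration.
final class SchemaSelectionSketch {

    // Mirrors the quoted condition: project only when Hive is not reading all
    // columns AND at least one column index was requested.
    static List<String> requestedColumns(boolean readAllColumns,
                                         List<String> tableColumns,
                                         List<Integer> indexColumnsWanted) {
        if (!readAllColumns && !indexColumnsWanted.isEmpty()) {
            List<String> projected = new ArrayList<>();
            for (int index : indexColumnsWanted) {
                projected.add(tableColumns.get(index));
            }
            return projected;
        }
        // Fallback branch: the full table schema is requested.
        return tableColumns;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("id", "name", "payload");

        // A query that selects one column: only that column is requested.
        System.out.println(requestedColumns(false, table, Arrays.asList(1)));

        // A count(1)-style query: no column is wanted, yet the empty index
        // list sends the call to the fallback branch and all columns are requested.
        System.out.println(requestedColumns(false, table, Collections.<Integer>emptyList()));
    }
}
```

In this sketch the second call returns every column even though none was asked for, which is the situation the issue describes for {{SELECT count(1) FROM parquetTable}}.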