[ https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709468#comment-14709468 ]
Sergio Peña commented on HIVE-11611:
------------------------------------

Thanks [~rdblue] [~Ferd]
As you told me offline, we should close this ticket as 'not fix'. We will wait until Parquet releases a new version, and then switch to it.

> A bad performance regression issue with Parquet happens if Hive does not select any columns
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11611
>                 URL: https://issues.apache.org/jira/browse/HIVE-11611
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Sergio Peña
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-11611.patch
>
>
> A performance issue may occur with the code below when running a query such as {{SELECT count(1) FROM parquetTable}}.
> {code}
> // Project only the wanted columns when Hive asks for a non-empty subset.
> if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
>   MessageType requestedSchemaByUser =
>       getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
>   return new ReadContext(requestedSchemaByUser, contextMetadata);
> } else {
>   // Also reached when no columns are wanted at all: the full table schema is requested.
>   return new ReadContext(tableSchema, contextMetadata);
> }
> {code}
> If neither columns nor indexes are selected, the above code will read the full schema from Parquet even though Hive does not use any of those values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
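
For readers who want to see the branch from the issue description in isolation, the following is a minimal sketch of the same decision. It is not Hive or Parquet code: the schema is modelled as a plain list of column names instead of a Parquet MessageType, and the class name ProjectionSketch, the method requestedColumns, and its parameters are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the schema selection described in the issue.
// Column names replace Parquet's MessageType to keep the sketch self-contained.
public class ProjectionSketch {

    // Returns the columns that would be requested from the file reader.
    static List<String> requestedColumns(boolean readAllColumns,
                                         List<String> tableColumns,
                                         List<Integer> indexColumnsWanted) {
        if (!readAllColumns && !indexColumnsWanted.isEmpty()) {
            // A non-empty subset was asked for: keep only those columns.
            List<String> projected = new ArrayList<>();
            for (int index : indexColumnsWanted) {
                projected.add(tableColumns.get(index));
            }
            return projected;
        }
        // Reached for "read all" and also for an empty wanted list,
        // which is the case the issue describes for SELECT count(1).
        return tableColumns;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("id", "name", "price");

        // One column wanted: only that column is requested.
        System.out.println(requestedColumns(false, table, Arrays.asList(1)));

        // No columns wanted: the whole schema is requested anyway.
        System.out.println(requestedColumns(false, table, Collections.<Integer>emptyList()));
    }
}
```

Running main prints [name] for the first call and [id, name, price] for the second, which shows how an empty column list ends up on the same path as a request to read every column.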