[ https://issues.apache.org/jira/browse/HIVE-22495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Xu updated HIVE-22495:
----------------------------
    Description: 
Running a Hive query on a Parquet table:
select count(*) from t
The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data.

||engine||HDFS data read||
|Hive 2.3.4|452.9 MB|
|Hive 0.13|22.5 KB|
|Spark|41.6 KB|

The cause seems to be that the Parquet read support falls back to the file schema if indexColumnsWanted is empty; this logic still exists in the master branch.
I don't know why this empty-list check was added; please advise if there is any other impact.

  was:
Running a Hive query on a Parquet table:
select count(*) from t
The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data.

||engine||HDFS data read||
|Hive 2.3.4|452.9 MB|
|Hive 0.13|22.5 KB|
|Spark|41.6 KB|

The cause seems to be that the Parquet read support falls back to the file schema if indexColumnsWanted is empty; this logic still exists in the master branch.


> Parquet count(*) read in all data
> ---------------------------------
>
>                 Key: HIVE-22495
>                 URL: https://issues.apache.org/jira/browse/HIVE-22495
>             Project: Hive
>          Issue Type: Bug
>          Components: Reader
>            Reporter: Jason Xu
>            Assignee: Jason Xu
>            Priority: Major
>         Attachments: HIVE-22495.patch
>
>
> Running a Hive query on a Parquet table:
> select count(*) from t
> The query reads all data (all columns) instead of just the metadata.
> For comparison, Hive 0.13 and Spark read much less data.
>
> ||engine||HDFS data read||
> |Hive 2.3.4|452.9 MB|
> |Hive 0.13|22.5 KB|
> |Spark|41.6 KB|
>
> The cause seems to be that the Parquet read support falls back to the file
> schema if indexColumnsWanted is empty; this logic still exists in the master
> branch.
> I don't know why this empty-list check was added; please advise if there is
> any other impact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
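
For readers unfamiliar with the fallback described in the report, the following is a minimal, self-contained sketch of the behavior. It is not the Hive source and not the attached patch: the class `ProjectionSketch`, its methods, and the column names are hypothetical, and only the name `indexColumnsWanted` comes from the report. It models a schema as a list of column names and shows how an empty wanted-index list (as a count(*) query would produce) ends up requesting every column when the fallback is present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not Hive's Parquet read support.
class ProjectionSketch {

    // Models the reported behavior: an empty index list falls back to the
    // whole file schema, so every column is requested from the reader.
    static List<String> withFallback(List<String> fileColumns,
                                     List<Integer> indexColumnsWanted) {
        if (indexColumnsWanted.isEmpty()) {
            return new ArrayList<>(fileColumns);
        }
        return project(fileColumns, indexColumnsWanted);
    }

    // Models the alternative: an empty index list yields an empty projection,
    // so no column data needs to be read.
    static List<String> withoutFallback(List<String> fileColumns,
                                        List<Integer> indexColumnsWanted) {
        return project(fileColumns, indexColumnsWanted);
    }

    // Keeps only the columns at the requested positions, skipping indexes
    // that are out of range for this file.
    static List<String> project(List<String> fileColumns, List<Integer> wanted) {
        List<String> out = new ArrayList<>();
        for (int i : wanted) {
            if (i >= 0 && i < fileColumns.size()) {
                out.add(fileColumns.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "name", "payload");
        List<Integer> none = Collections.emptyList(); // count(*) needs no columns
        System.out.println(withFallback(cols, none));    // [id, name, payload]
        System.out.println(withoutFallback(cols, none)); // []
    }
}
```

Whether dropping the fallback is safe in Hive itself is exactly the open question raised in the report; the sketch only illustrates why the empty-list case matters for the amount of data read.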