[ https://issues.apache.org/jira/browse/HIVE-22495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Xu updated HIVE-22495:
----------------------------
    Description:
Running a Hive query on a Parquet table:

select count(*) from t

The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data.

||engine||HDFS data read||
|Hive| 452.9 MB|
|Hive 0.13| 22.5 KB|
|Spark| 41.6 KB|

The cause seems to be that the Parquet read support falls back to the file schema if indexColumnsWanted is empty. This logic still exists in the master branch.

  was:
Running a Hive query on a Parquet table:

select count(*) from t

The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data.

||engine||HDFS data read||
|Hive 2.3.4| 452.9 MB|
|Hive 0.13| 22.5 KB|
|Spark| 41.6 KB|

The cause seems to be that the Parquet read support falls back to the file schema if indexColumnsWanted is empty.

> Parquet count(*) read in all data
> ---------------------------------
>
>                 Key: HIVE-22495
>                 URL: https://issues.apache.org/jira/browse/HIVE-22495
>             Project: Hive
>          Issue Type: Bug
>          Components: Reader
>    Affects Versions: 2.3.4
>            Reporter: Jason Xu
>            Assignee: Jason Xu
>            Priority: Major
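
The suspected cause in the description (falling back to the file schema when indexColumnsWanted is empty) can be illustrated with a toy model. This is only a sketch of the reported behaviour, not Hive's actual code: the function `projected_columns`, its `fall_back_to_file_schema` flag, and the sample column names are hypothetical; only the name indexColumnsWanted comes from the report.

```python
from typing import List, Sequence


def projected_columns(
    file_schema: Sequence[str],
    index_columns_wanted: Sequence[int],
    fall_back_to_file_schema: bool,
) -> List[str]:
    """Return the column names a reader would request from the file.

    Hypothetical model: `index_columns_wanted` holds the positions of the
    columns the query needs. For `select count(*)` it is empty, because no
    column values are needed.
    """
    if not index_columns_wanted:
        if fall_back_to_file_schema:
            # Reported behaviour: an empty projection is treated as
            # "no projection given", so every column is requested.
            return list(file_schema)
        # Alternative: an empty projection means "no columns needed".
        return []
    # Normal case: request only the wanted columns that exist in the file.
    return [file_schema[i] for i in index_columns_wanted if i < len(file_schema)]


schema = ["id", "name", "payload"]  # made-up columns for illustration
with_fallback = projected_columns(schema, [], fall_back_to_file_schema=True)
without_fallback = projected_columns(schema, [], fall_back_to_file_schema=False)
print(len(with_fallback), len(without_fallback))
```

In this model, the fallback turns a query that needs no columns into one that requests all of them, which would be consistent with the gap between 452.9 MB and 22.5 KB in the table. Whether this branch is the actual cause in Hive 2.3.4 is the reporter's hypothesis, not something the sketch confirms.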