[ https://issues.apache.org/jira/browse/HIVE-22495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423250#comment-17423250 ]
katty he commented on HIVE-22495:
---------------------------------

Hi, I am running into the same situation. How did you solve it?

> Parquet count(*) read in all data
> ---------------------------------
>
>                 Key: HIVE-22495
>                 URL: https://issues.apache.org/jira/browse/HIVE-22495
>             Project: Hive
>          Issue Type: Bug
>          Components: Reader
>            Reporter: Jason Xu
>            Assignee: Jason Xu
>            Priority: Major
>         Attachments: HIVE-22495.patch, HIVE-22495.patch
>
>
> Running a Hive query on a Parquet table:
> select count(*) from test_table
> The query reads all data (all columns) instead of just the metadata.
> For comparison, Hive 0.13 and Spark read much less data from my test table.
>
> ||engine||HDFS data read||
> |Hive 2.3.4| 452.9 MB|
> |Hive 0.13| 22.5 KB|
> |Spark| 41.6 KB|
>
> The cause seems to be that the Parquet read support falls back to the file
> schema if indexColumnsWanted is empty; this logic still exists in the master
> branch.
> I don't know why this empty-list check was added. Please advise if there is
> any other impact.
>
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
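
The fall-back described in the quoted report can be illustrated with a minimal, self-contained sketch. This is not Hive's actual read-support code: the class ProjectionSketch, its methods, and the use of plain column-name lists in place of a Parquet schema are all assumptions made for illustration. Only the name indexColumnsWanted comes from the report. The sketch shows how treating an empty list of wanted column indexes as "no projection" ends up selecting every column, which would be consistent with count(*) reading all data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; names and types are assumptions,
// not Hive's or Parquet's real API.
final class ProjectionSketch {

    // Behaviour as described in the report: an empty indexColumnsWanted
    // list falls back to the full file schema, so all columns are read.
    static List<String> withFallback(List<String> fileSchema,
                                     List<Integer> indexColumnsWanted) {
        if (indexColumnsWanted.isEmpty()) {
            return fileSchema;
        }
        return select(fileSchema, indexColumnsWanted);
    }

    // Variant without the empty-list check: an empty list yields an
    // empty projection, so no column data is requested.
    static List<String> withoutFallback(List<String> fileSchema,
                                        List<Integer> indexColumnsWanted) {
        return select(fileSchema, indexColumnsWanted);
    }

    // Picks the columns at the requested indexes, ignoring out-of-range ones.
    private static List<String> select(List<String> fileSchema,
                                       List<Integer> indexes) {
        List<String> projected = new ArrayList<>();
        for (int i : indexes) {
            if (i >= 0 && i < fileSchema.size()) {
                projected.add(fileSchema.get(i));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "payload");
        List<Integer> none = Collections.emptyList(); // count(*) needs no columns

        System.out.println(withFallback(schema, none));    // [id, name, payload]
        System.out.println(withoutFallback(schema, none)); // []
    }
}
```

Whether dropping the empty-list check is safe for other query shapes is exactly the open question raised in the report, so the second variant is a point of comparison rather than a proposed fix.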