[ https://issues.apache.org/jira/browse/HIVE-21632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marta Kuczora resolved HIVE-21632.
----------------------------------
    Resolution: Duplicate

> Hive should not push partition columns to the Parquet predicate, even if the
> data file contains the partition column
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-21632
>                 URL: https://issues.apache.org/jira/browse/HIVE-21632
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Priority: Minor
>
> If a partitioned Parquet table in Hive has a data file in one of its
> partitions that (incorrectly) also contains the partition column, filtering
> on the partition column returns no rows when Parquet predicate pushdown
> (PPD) is enabled. If PPD is disabled, the rows are returned correctly.
> The reason is that, with PPD switched on, Hive sends the predicate
> 'partition_column= ...' to Parquet together with a requested schema that
> does not contain the partition column. When the data is read from Parquet,
> this column is skipped because the requested schema does not contain it, but
> the reader still tries to apply the filter predicate, so it returns an empty
> result set.
> I think if the rows are returned correctly without PPD, they should be
> returned with PPD as well. Hive should omit the partition column from the
> Parquet predicate.
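
The failure mode described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's or Parquet's actual code: the `scan` and `drop_partition_conjuncts` functions, the `part_col` column name, and the representation of a predicate as a list of AND-ed equality conditions are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
Conjunct = Tuple[str, object]  # (column, expected value); conjuncts are AND-ed


def scan(rows: List[Row], requested_schema: List[str],
         predicate: List[Conjunct]) -> List[Row]:
    """Toy reader (hypothetical): project each row, then filter the projection."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in requested_schema if c in row}
        # A column missing from the projection reads as None, so an equality
        # filter on a non-null value for that column can never match.
        if all(projected.get(col) == val for col, val in predicate):
            result.append(projected)
    return result


def drop_partition_conjuncts(predicate: List[Conjunct],
                             partition_columns: Set[str]) -> List[Conjunct]:
    """Proposed behaviour (sketch): never push partition-column filters down."""
    return [(c, v) for c, v in predicate if c not in partition_columns]


# The data file (incorrectly) contains the partition column 'part_col'.
file_rows = [{"id": 1, "part_col": "a"}, {"id": 2, "part_col": "a"}]
requested_schema = ["id"]          # partition column is not requested
predicate = [("part_col", "a")]    # filter on the partition column

# With the partition-column filter pushed down, every row is rejected.
with_pushdown = scan(file_rows, requested_schema, predicate)

# With the partition-column filter omitted, the rows come back.
safe_predicate = drop_partition_conjuncts(predicate, {"part_col"})
without_partition_filter = scan(file_rows, requested_schema, safe_predicate)
```

In this model the partition filter is assumed to have been satisfied already by partition pruning, so dropping it from the pushed-down predicate does not change which rows qualify.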