[ https://issues.apache.org/jira/browse/HIVE-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aniket Adnaik reassigned HIVE-25244:
------------------------------------

    Assignee: Aniket Adnaik

> Hive predicate pushdown with Parquet format for `date` as partitioned column name produce empty resultset
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25244
>                 URL: https://issues.apache.org/jira/browse/HIVE-25244
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Parquet
>    Affects Versions: 3.1.0, 3.1.1, 3.1.2
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>            Priority: Major
>             Fix For: 3.1.0, 3.1.1, 3.1.2, 3.2.0
>
>         Attachments: test_table3_data.tar.gz
>
>
> Hive predicate pushdown with the Parquet format produces an empty result set when the partition column is named with the keyword `date`.
> If any of the following configs is set to false, the SELECT query returns results:
> hive.optimize.ppd.storage, hive.optimize.ppd, hive.optimize.index.filter.
>
> Repro steps:
> --------------
> 1) Create an external partitioned table in Hive:
> CREATE EXTERNAL TABLE `test_table3`(`id` string) PARTITIONED BY (`date` string) STORED AS parquet;
>
> 2) In spark-shell, create a DataFrame and write the data as a Parquet file:
> import java.sql.Timestamp
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> import spark.implicits._  // provides toDF() on a local Seq
> // Note: the `date` values are also written as a regular column inside the Parquet file.
> val someDF = Seq(("1", "05172021"), ("2", "05172021"), ("3", "06182021"), ("4", "07192021")).toDF("id", "date")
> someDF.write.mode("overwrite").parquet("<prefix path>/hive/warehouse/external/test_table3/date=05172021")
>
> 3) Change the permissions on the table directory, then add the partition in Hive:
> $> hdfs dfs -chmod -R 777 <prefix path>/hive/warehouse/external/test_table3
> In Hive Beeline:
> ALTER TABLE test_table3 ADD PARTITION(`date`='05172021') LOCATION '<prefix path>/hive/warehouse/external/test_table3/date=05172021';
>
> 4) SELECT * FROM test_table3; <--- returns all rows
> SELECT * FROM test_table3 WHERE `date`='05172021'; <--- returns no rows
> SET hive.optimize.ppd.storage=false; <--- turn off predicate pushdown to storage
> SELECT * FROM test_table3 WHERE `date`='05172021'; <--- returns rows once the config above is set to false
>
> Attaching Parquet data files for reference (see Attachments: test_table3_data.tar.gz).
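Based only on the report's observation that disabling any one of hive.optimize.ppd.storage, hive.optimize.ppd, or hive.optimize.index.filter makes the filtered query return rows, a session-level workaround might look like the sketch below. It assumes the `test_table3` table and partition from the repro steps and uses hive.optimize.ppd.storage because that is the setting the repro demonstrates; it gives up predicate pushdown for the session and is not a substitute for the fix.

```sql
-- Workaround sketch (assumption: run per session in Beeline against test_table3).
-- Print the current value first so it can be restored later if needed.
SET hive.optimize.ppd.storage;

-- Disable predicate pushdown to storage for this session only.
-- Per the report, setting hive.optimize.ppd or hive.optimize.index.filter
-- to false instead also makes the query return rows.
SET hive.optimize.ppd.storage=false;

SELECT * FROM test_table3 WHERE `date`='05172021';
```

Because SET only affects the current session, other sessions keep the original pushdown behavior.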