[jira] [Assigned] (HIVE-25244) Hive predicate pushdown with Parquet format for `date` as partitioned column name produce empty resultset

Aniket Adnaik (Jira) Tue, 15 Jun 2021 09:29:28 -0700


     [ https://issues.apache.org/jira/browse/HIVE-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Aniket Adnaik reassigned HIVE-25244:
------------------------------------

    Assignee: Aniket Adnaik

> Hive predicate pushdown with Parquet format for `date` as partitioned column 
> name produce empty resultset
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25244
>                 URL: https://issues.apache.org/jira/browse/HIVE-25244
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Parquet
>    Affects Versions: 3.1.0, 3.1.1, 3.1.2
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>            Priority: Major
>             Fix For: 3.1.0, 3.1.1, 3.1.2, 3.2.0
>
>         Attachments: test_table3_data.tar.gz
>
>
> Hive predicate pushdown with the Parquet format produces an empty result set
> when the partition column is named with the keyword `date`.
> If any one of the following configs is set to false, the SELECT query
> returns results:
> hive.optimize.ppd.storage, hive.optimize.ppd, hive.optimize.index.filter.
> Repro steps:
> --------------
> 1) Create an external partitioned table in Hive:
> CREATE EXTERNAL TABLE `test_table3`(`id` string) PARTITIONED BY (`date` string) STORED AS parquet;
> 2) In spark-shell, create a DataFrame and write it out as Parquet:
> import java.sql.Timestamp
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> import spark.implicits._
> val someDF = Seq(("1", "05172021"), ("2", "05172021"), ("3", "06182021"), ("4", "07192021")).toDF("id", "date")
> someDF.write.mode("overwrite").parquet("<prefix path>/hive/warehouse/external/test_table3/date=05172021")
> 3) Change the HDFS permissions, then add the partition to the table in Hive:
> $> hdfs dfs -chmod -R 777 <prefix path>/hive/warehouse/external/test_table3
> In Hive Beeline:
> ALTER TABLE test_table3 ADD PARTITION(`date`='05172021') LOCATION '<prefix path>/hive/warehouse/external/test_table3/date=05172021';
> 4) Query the table with and without storage-level predicate pushdown:
> SELECT * FROM test_table3;   -- returns all rows
> SELECT * FROM test_table3 WHERE `date`='05172021';   -- returns no rows
> SET hive.optimize.ppd.storage=false;   -- turn off predicate pushdown to the storage layer
> SELECT * FROM test_table3 WHERE `date`='05172021';   -- returns rows once the setting above is false
> Parquet data files are attached for reference (test_table3_data.tar.gz).
>  
>  
>  
>   
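
The description names three settings that each restore results when set to false, while the repro steps toggle only hive.optimize.ppd.storage. A minimal HiveQL sketch of the other two session-level workarounds, assuming the `test_table3` table from the repro steps and a fresh Beeline session for each variant so that one setting does not mask the other:

```sql
-- Variant A: disable predicate pushdown in the optimizer for this session.
SET hive.optimize.ppd=false;
SELECT * FROM test_table3 WHERE `date`='05172021';

-- Variant B (run in a fresh session): disable hive.optimize.index.filter instead.
SET hive.optimize.index.filter=false;
SELECT * FROM test_table3 WHERE `date`='05172021';
```

Per the report, each variant should return the rows of the `05172021` partition. These settings apply only to the current session and work around the symptom; they do not address the underlying bug.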



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
