[jira] [Work logged] (HIVE-26320) Incorrect case evaluation for Parquet based table

ASF GitHub Bot (Jira) Thu, 29 Sep 2022 07:38:04 -0700


     [ https://issues.apache.org/jira/browse/HIVE-26320?focusedWorklogId=813359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-813359 ]


ASF GitHub Bot logged work on HIVE-26320:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Sep/22 14:37
            Start Date: 29/Sep/22 14:37
    Worklog Time Spent: 10m 
      Work Description: jfsii commented on code in PR #3628:
URL: https://github.com/apache/hive/pull/3628#discussion_r983634554


##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -91,11 +93,61 @@ public class ParquetHiveSerDe extends AbstractSerDe implements SchemaInference {
 
   private ObjectInspector objInspector;
   private ParquetHiveRecord parquetRow;
+  private ObjectInspectorConverters.Converter converter;
 
   public ParquetHiveSerDe() {
     parquetRow = new ParquetHiveRecord();
   }
 
+  // Recursively check if CHAR or VARCHAR types are used
+  private boolean needsConversion(TypeInfo type) {

Review Comment:
   Yes, I agree this could be a performance hit. That said, it wouldn't surprise 
me if the difference is small, since these paths already do many allocations per 
row.
   
   However, I have changed my approach to what I think is a much more appropriate 
solution, one that hopefully avoids the extra allocation.
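
   For readers without the PR open, here is a minimal sketch of what a recursive 
check like the `needsConversion` method in the hunk above might look like. It is 
not the code from the PR: the `Type` class and `Category` enum are hypothetical 
stand-ins for Hive's `TypeInfo` hierarchy, so that the example compiles on its own.

```java
import java.util.List;

public class NeedsConversionSketch {
    // Hypothetical stand-in for Hive's type categories; not the real Hive API.
    enum Category { PRIMITIVE, LIST, MAP, STRUCT }

    // Hypothetical, simplified type node: a primitive name or a list of child types.
    static final class Type {
        final Category category;
        final String primitiveName; // only meaningful for PRIMITIVE, e.g. "varchar"
        final List<Type> children;  // list element, map key/value, or struct fields

        private Type(Category category, String primitiveName, List<Type> children) {
            this.category = category;
            this.primitiveName = primitiveName;
            this.children = children;
        }

        static Type primitive(String name) {
            return new Type(Category.PRIMITIVE, name, List.of());
        }

        static Type list(Type element) {
            return new Type(Category.LIST, null, List.of(element));
        }

        static Type map(Type key, Type value) {
            return new Type(Category.MAP, null, List.of(key, value));
        }

        static Type struct(Type... fields) {
            return new Type(Category.STRUCT, null, List.of(fields));
        }
    }

    // Returns true if the type, or any type nested inside it, is CHAR or VARCHAR.
    static boolean needsConversion(Type type) {
        if (type.category == Category.PRIMITIVE) {
            return type.primitiveName.equals("char") || type.primitiveName.equals("varchar");
        }
        for (Type child : type.children) {
            if (needsConversion(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A row shaped like struct<int, array<varchar>> contains a nested VARCHAR.
        Type row = Type.struct(Type.primitive("int"), Type.list(Type.primitive("varchar")));
        System.out.println(needsConversion(row));
    }
}
```

   Running a check like this once per schema rather than once per row would keep 
it off the per-row path; whether the PR does so is not shown in this hunk.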





Issue Time Tracking
-------------------

    Worklog Id:     (was: 813359)
    Time Spent: 2h 10m  (was: 2h)

> Incorrect case evaluation for Parquet based table
> -------------------------------------------------
>
>                 Key: HIVE-26320
>                 URL: https://issues.apache.org/jira/browse/HIVE-26320
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Query Planning
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Chiran Ravani
>            Assignee: John Sherman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A query involving a CASE statement with two or more conditions returns 
> incorrect results for tables stored as Parquet. The problem is not observed 
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2),enhanced_type_code int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>                    (kob='BB' and enhanced_type_code='18')
>                    or (kob='BC' and enhanced_type_code='18')
>                  )
>             then 1
>             else 0
>         end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
