[ https://issues.apache.org/jira/browse/HIVE-26320?focusedWorklogId=813740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-813740 ]
ASF GitHub Bot logged work on HIVE-26320:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Sep/22 15:31
            Start Date: 30/Sep/22 15:31
    Worklog Time Spent: 10m
      Work Description: jfsii commented on code in PR #3628:
URL: https://github.com/apache/hive/pull/3628#discussion_r984714606


##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java:
##########

@@ -481,12 +485,36 @@ protected BytesWritable convert(Binary binary) {
     },
     ESTRING_CONVERTER(String.class) {
       @Override
-      PrimitiveConverter getConverter(final PrimitiveType type, final int index, final ConverterParent parent, TypeInfo hiveTypeInfo) {
+      PrimitiveConverter getConverter(final PrimitiveType type, final int index, final ConverterParent parent,
+          TypeInfo hiveTypeInfo) {
+        // If we have type information, we should return properly typed strings. However, there are a
+        // variety of code paths that do not provide the typeInfo; in those cases we default to Text.
+        // This idiom is also followed by, for example, the BigDecimal converter, which defaults to the
+        // widest representation when there is no type information.
+        if (hiveTypeInfo != null) {
+          String typeName = hiveTypeInfo.getTypeName().toLowerCase();
+          if (typeName.startsWith(serdeConstants.CHAR_TYPE_NAME)) {
+            return new BinaryConverter<HiveCharWritable>(type, parent, index) {
+              @Override
+              protected HiveCharWritable convert(Binary binary) {
+                return new HiveCharWritable(binary.getBytes(), ((CharTypeInfo) hiveTypeInfo).getLength());
+              }
+            };
+          } else if (typeName.startsWith(serdeConstants.VARCHAR_TYPE_NAME)) {
+            return new BinaryConverter<HiveVarcharWritable>(type, parent, index) {
+              @Override
+              protected HiveVarcharWritable convert(Binary binary) {
+                return new HiveVarcharWritable(binary.getBytes(), ((VarcharTypeInfo) hiveTypeInfo).getLength());
+              }
+            };
+          }
+        }

Review Comment:
   Yeah, I did follow the file's existing convention. However, I changed it to your suggestion because it looks cleaner, and it may as well encourage the cleaner style in case someone copies this code later.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 813740)
    Time Spent:     3.5h  (was: 3h 20m)

> Incorrect case evaluation for Parquet based table
> -------------------------------------------------
>
>                 Key: HIVE-26320
>                 URL: https://issues.apache.org/jira/browse/HIVE-26320
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Query Planning
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Chiran Ravani
>            Assignee: John Sherman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> A query involving a case statement with two or more conditions produces
> incorrect results for tables stored as Parquet. The problem is not observed
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2), enhanced_type_code int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>     (kob='BB' and enhanced_type_code='18')
>     or (kob='BC' and enhanced_type_code='18')
>   )
>   then 1
>   else 0
> end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
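For context, a minimal sketch of why the typed converters in the diff above matter. The VarcharVsTextDemo class below is hypothetical (not part of the patch), assumes hive-serde and hadoop-common are on the classpath, and shows one plausible way to picture the mismatch: the repro's kob column is varchar(2), so the rewritten IN-list constants are varchar-typed, while the old converter handed scanned values back as Text.

{code:java}
import org.apache.hadoop.hive.serde2.io.HiveVarcharWritable;
import org.apache.hadoop.io.Text;

// Hypothetical demo, not part of the patch: if the Parquet reader returns a
// varchar(2) value as Text while the point-lookup rewrite builds its IN-list
// constants as HiveVarcharWritable, the two Writables never compare equal even
// for identical characters, so the case expression takes the else branch for
// every row.
public class VarcharVsTextDemo {
  public static void main(String[] args) {
    Text fromOldConverter = new Text("BB"); // what ESTRING_CONVERTER used to produce
    HiveVarcharWritable fromTypedConverter = new HiveVarcharWritable();
    fromTypedConverter.set("BB", 2); // what the patched converter produces for varchar(2)

    // Different Writable classes: equals() is false even though both hold "BB".
    System.out.println(fromOldConverter.equals(fromTypedConverter)); // false
    // Viewed as plain strings, the character data is identical.
    System.out.println(fromTypedConverter.getHiveVarchar().getValue()
        .equals(fromOldConverter.toString())); // true
  }
}
{code}

This picture is also consistent with the workaround noted in the issue: hive.optimize.point.lookup=false disables the rewrite of the OR chain into an IN lookup, so the typed constant set that the Text values fail to match is never built.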