[ 
https://issues.apache.org/jira/browse/HIVE-24817?focusedWorklogId=562704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-562704
 ]

ASF GitHub Bot logged work on HIVE-24817:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Mar/21 23:21
            Start Date: 08/Mar/21 23:21
    Worklog Time Spent: 10m 
      Work Description: scarlin-cloudera commented on a change in pull request 
#2027:
URL: https://github.com/apache/hive/pull/2027#discussion_r589822395



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##########
@@ -1007,17 +1001,12 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
             T columnDesc = children.get(0);
             T valueDesc = interpretNode(columnDesc, children.get(i));
             if (valueDesc == null) {
-              if (hasNullValue) {
-                // Skip if null value has already been added
-                continue;
-              }
-              TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+              // Keep original
+              TypeInfo targetType = exprFactory.getTypeInfo(children.get(i));
               if (!expressions.containsKey(targetType)) {
                 expressions.put(targetType, columnDesc);
               }
-              T nullConst = exprFactory.createConstantExpr(targetType, null);
-              expressions.put(targetType, nullConst);
-              hasNullValue = true;
+              expressions.put(targetType, children.get(i));
             } else {

Review comment:
       I'm not sure how to address your comments separately, so I'm gonna 
address all of them here.  Lemme take a step back and tell you what I've 
already discussed with Jesus and the direction we were going.
   
   So we know we're losing some optimizations as you've noted.  Jesus felt that 
they weren't that big of a deal.  For instance, we'd lose the optimization of 
"tinyint_col in (2500000)".  Previously, we saw this changing to false since a 
tinyint col can never be that value, but that check won't be optimized out now. 
 I think that's the main one I saw with the tests.
   
   So for now, we'd like to avoid a bigger rewrite.  He also noted that this 
optimization should perhaps be more in the Calcite framework, which makes sense 
to me.
   
   You did have one other comment about an "if" statement not being hit.  I'm 
not sure I understand the bug that you're referring to.  It's a bit complicated 
to understand, but it seems ok to me?  Can you explain this further?
   Thanks again!
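
For reference, the lost "tinyint_col in (2500000)" optimization mentioned above can be sketched roughly as follows. This is only an illustration of the idea, not the code in TypeCheckProcFactory or Calcite; the class and method names are hypothetical, and it only covers a single integer literal used in a WHERE filter.

```java
import java.util.Optional;

// Hypothetical sketch: fold "tinyint_col IN (literal)" when the literal
// cannot be represented as a tinyint. Not Hive's actual implementation.
class TinyintInFoldSketch {

    // A Hive tinyint is a signed 8-bit integer, the same range as a Java byte.
    static boolean fitsInTinyint(long literal) {
        return literal >= Byte.MIN_VALUE && literal <= Byte.MAX_VALUE;
    }

    // Optional.of(false): used as a WHERE filter, the predicate keeps no rows,
    //                     so it can be replaced by the constant false.
    // Optional.empty():   the literal is in range, so nothing can be folded
    //                     and the predicate must be evaluated per row.
    static Optional<Boolean> foldInFilter(long literal) {
        return fitsInTinyint(literal) ? Optional.empty() : Optional.of(Boolean.FALSE);
    }

    public static void main(String[] args) {
        System.out.println(foldInFilter(2500000L)); // out of range: foldable to false
        System.out.println(foldInFilter(100L));     // in range: not foldable
    }
}
```

Folding to false is only safe for the filter case sketched here: for a null column value the predicate is unknown rather than false, which a WHERE clause treats the same way (the row is dropped).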
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 562704)
    Time Spent: 40m  (was: 0.5h)

> "not in" clause returns incorrect data when there is coercion
> -------------------------------------------------------------
>
>                 Key: HIVE-24817
>                 URL: https://issues.apache.org/jira/browse/HIVE-24817
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the query has a where clause that checks whether an integer column is 
> "not in" a list containing a decimal value, the decimal value is being 
> changed to null, causing incorrect results.
> This is a sample query of a failure:
> select count(*) from my_tbl where int_col not in (355.8);
> Since the int_col can never be 355.8, one would expect all the rows to be 
> returned, but it is changing the 355.8 into a null value causing no rows to 
> be returned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
