[ 
https://issues.apache.org/jira/browse/HIVE-19493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471354#comment-16471354
 ] 

Vihang Karajgaonkar commented on HIVE-19493:
--------------------------------------------

Thanks [~mmccline] for the review. Updated the diff with the suggested change. 
Actually I am having hard time to understand the complete logic of when noNulls 
needs to be set to {{true}}. For example,

in the below code snippet from {{copySelected}} method
{code:java}
// Handle repeating case
    if (input.isRepeating) {
      if (input.noNulls || !input.isNull[0]) {
        String string = new String(input.vector[0], input.start[0], 
input.length[0]);
        try {
          date.setTime(formatter.parse(string).getTime());
          output.vector[0] = DateWritable.dateToDays(date);
          output.isNull[0] = false;
        } catch (ParseException e) {
          output.isNull[0] = true;
          output.noNulls = false;
        }
      } else {
        output.isNull[0] = true;
        output.noNulls = false;
      }
      output.isRepeating = true;
      return;
    }
{code}
Can you please help me understand the following?
 * Why do we *not* set output.notNulls = true in the {{if (input.noNulls || 
!input.isNull[0])}} code block when we know that {{input.isRepeating == true}}? 
Shouldn't we be setting it to true and reset output.isNull[] array to all false 
values? My guess is that we are not doing this for performance reasons since it 
doesn't make sense to reset the whole {{output.isNull[]}} when we know that we 
only need to look for the first element. Just want to confirm if this 
understanding is correct (may be even add a comment so that its easy to 
remember next time)
 * As a corollary of the above statement it means both the following conditions 
are valid but they represent the same state of the columnVector.
 {{vector.isRepeating == true && vector.noNulls == true}} --> vector has a 
non-null repeating value
 {{vector.isRepeating == true && vector.noNulls == false && vector.isNull[0] == 
false}} --> vector has non-null repeating value. Is this understanding correct?
 * In case of {{input.isRepeating == false && selected == true}} case when 
{{input.noNulls == true && output.noNull == false}} why don't we set 
{{output.noNull = true}}. Is it because we think that when {{selected == true}} 
there may be less number of rows to be updated and hence its unnecessary work 
to reset {{output.isNull[]}} which we have to do everytime when set 
{{output.noNulls = true}}?

I think the bug which was fixed in HIVE-18622 manifests when the 
{{output.noNulls}} is flipped from {{false to true}} and when the {{isNull}} 
array has some entries which are {{true}}. So may be we create an expression 
such that the columnVector is reused and this flag gets flipped. Just adding a 
row of nulls may not exercise the code which has the bug. Let me see if I can 
create an expression to test this particular issue.

> VectorUDFDateDiffColCol copySelected does not handle nulls correctly
> --------------------------------------------------------------------
>
>                 Key: HIVE-19493
>                 URL: https://issues.apache.org/jira/browse/HIVE-19493
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-19493.01.patch, HIVE-19493.02.patch
>
>
> The {{copySelected}} method in {{VectorUDFDateDiffColCol}} class was missed 
> during HIVE-18622



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to