[ https://issues.apache.org/jira/browse/HIVE-19493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471354#comment-16471354 ]
Vihang Karajgaonkar commented on HIVE-19493: -------------------------------------------- Thanks [~mmccline] for the review. Updated the diff with the suggested change. Actually I am having hard time to understand the complete logic of when noNulls needs to be set to {{true}}. For example, in the below code snippet from {{copySelected}} method {code:java} // Handle repeating case if (input.isRepeating) { if (input.noNulls || !input.isNull[0]) { String string = new String(input.vector[0], input.start[0], input.length[0]); try { date.setTime(formatter.parse(string).getTime()); output.vector[0] = DateWritable.dateToDays(date); output.isNull[0] = false; } catch (ParseException e) { output.isNull[0] = true; output.noNulls = false; } } else { output.isNull[0] = true; output.noNulls = false; } output.isRepeating = true; return; } {code} Can you please help me understand the following? * Why do we *not* set output.notNulls = true in the {{if (input.noNulls || !input.isNull[0])}} code block when we know that {{input.isRepeating == true}}? Shouldn't we be setting it to true and reset output.isNull[] array to all false values? My guess is that we are not doing this for performance reasons since it doesn't make sense to reset the whole {{output.isNull[]}} when we know that we only need to look for the first element. Just want to confirm if this understanding is correct (may be even add a comment so that its easy to remember next time) * As a corollary of the above statement it means both the following conditions are valid but they represent the same state of the columnVector. {{vector.isRepeating == true && vector.noNulls == true}} --> vector has a non-null repeating value {{vector.isRepeating == true && vector.noNulls == false && vector.isNull[0] == false}} --> vector has non-null repeating value. Is this understanding correct? * In case of {{input.isRepeating == false && selected == true}} case when {{input.noNulls == true && output.noNull == false}} why don't we set {{output.noNull = true}}. Is it because we think that when {{selected == true}} there may be less number of rows to be updated and hence its unnecessary work to reset {{output.isNull[]}} which we have to do everytime when set {{output.noNulls = true}}? I think the bug which was fixed in HIVE-18622 manifests when the {{output.noNulls}} is flipped from {{false to true}} and when the {{isNull}} array has some entries which are {{true}}. So may be we create an expression such that the columnVector is reused and this flag gets flipped. Just adding a row of nulls may not exercise the code which has the bug. Let me see if I can create an expression to test this particular issue. > VectorUDFDateDiffColCol copySelected does not handle nulls correctly > -------------------------------------------------------------------- > > Key: HIVE-19493 > URL: https://issues.apache.org/jira/browse/HIVE-19493 > Project: Hive > Issue Type: Bug > Components: Vectorization > Reporter: Vihang Karajgaonkar > Assignee: Vihang Karajgaonkar > Priority: Major > Attachments: HIVE-19493.01.patch, HIVE-19493.02.patch > > > The {{copySelected}} method in {{VectorUDFDateDiffColCol}} class was missed > during HIVE-18622 -- This message was sent by Atlassian JIRA (v7.6.3#76005)