[ 
https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26408:
--------------------------------
    Description: 
This is similar to HIVE-15588. With a customer query, I reproduced a vectorized 
expression tree like the one below (I'll attach a simple repro query when 
possible):
{code}
selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
61:string)(children: StringColumnInList(col 13, values TermDeposit, 
RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
[61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
_col1 AS DATE)), 'MM-dd-yyyy'))(children: VectorUDFUnixTimeStampDate(col 
68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
ConstantVectorExpression(val ) -> 61:string) -> 62:string
{code}

The relevant part of the query was:
{code}
  CASE WHEN DLY_BAL.PDELP_VALUE in (
    'TermDeposit', 'RecurringDeposit',
    'CertificateOfDeposit'
  ) THEN NVL(
    (
      from_unixtime(
        unix_timestamp(
          cast(DLY_BAL.APATD_MTRTY_DATE as date)
        ),
        'MM-dd-yyyy'
      )
    ),
    ' '
  ) ELSE '' END AS MAT_DTE
{code}

The problem, step by step:
1. IfExprCondExprColumn has 62:string as its 
[outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
 which is a reused scratch column (see step 5).
2. At evaluation time, 
[isRepeating is reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
3. Evaluating IfExprCondExprColumn requires the conditional evaluation of its 
children, hence 
[conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
4. One of the children is ConstantVectorExpression(val  ) -> 62:string, which 
belongs to the second branch of VectorCoalesce, i.e. the ' ' (single space) 
literal in NVL's second argument.
5. In step 4, the 62:string column is set to an isRepeating column. It is also 
released by 
[freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459],
 so it is marked as a reusable scratch column.
6. After the conditional evaluation in step 3, the final output of 
IfExprCondExprColumn is set 
[here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
 which throws an exception 
[here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
{code}
2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
        at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
        at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
{code}
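
The sequence of steps 1 to 6 can be replayed on a toy model. This is only a 
sketch under assumptions: ToyStringVector and ScratchReuseDemo are hypothetical 
stand-ins, not Hive classes, and the guard is written as an explicit throw 
(rather than a Java assert) so it fires without -ea. It shows why a parent that 
shares its output column with a constant child fails on the first non-zero row.

```java
import java.nio.charset.StandardCharsets;

// Toy stand-in for a string column vector; NOT Hive's BytesColumnVector.
final class ToyStringVector {
    final byte[][] values;
    boolean isRepeating;

    ToyStringVector(int size) {
        values = new byte[size][];
    }

    // Models a constant expression: one value in slot 0, vector flagged as repeating.
    void fillConstant(String s) {
        values[0] = s.getBytes(StandardCharsets.UTF_8);
        isRepeating = true;
    }

    // Models the guard seen in the stack trace: a repeating vector only accepts row 0.
    void setElement(int row, String s) {
        if (isRepeating && row != 0) {
            throw new AssertionError("Output column number expected to be 0 when isRepeating");
        }
        values[row] = s.getBytes(StandardCharsets.UTF_8);
    }
}

final class ScratchReuseDemo {
    // Replays steps 1-6 on one shared vector; returns the error message, or null if none.
    static String replay() {
        ToyStringVector col62 = new ToyStringVector(4); // step 1: parent output == child scratch column
        col62.isRepeating = false;                      // step 2: parent resets isRepeating on its output
        col62.fillConstant(" ");                        // steps 3-5: constant child writes into the same column
        try {
            for (int row = 0; row < 4; row++) {         // step 6: parent writes its per-row results
                col62.setElement(row, "x");
            }
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

In this model the reset in step 2 is undone by the child in steps 3 to 5, which 
is why resetting again inside the parent would only mask the shared column.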

This is clearly an incorrect scratch column reuse, which must not be fixed by 
resetting vectors in IfExprCondExprColumn, as that would just hide the original 
issue.

I realized that the problem can be fixed easily by not releasing the scratch 
columns of ConstantVectorExpressions; that is what I'm testing now.
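
A rough sketch of that idea, assuming a simple free-list allocator. ToyScratchPool, 
its release flag, and the column number 61 are all hypothetical and chosen for 
illustration; this is not the code of VectorizationContext or of the eventual 
patch. It only shows that skipping the release of a constant's column keeps the 
parent from being handed the same column.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy scratch column pool; NOT Hive's VectorizationContext.
final class ToyScratchPool {
    private final Deque<Integer> free = new ArrayDeque<>();
    private int next;

    ToyScratchPool(int firstScratchColumn) {
        next = firstScratchColumn;
    }

    // Hands out a previously released column if one exists, otherwise a new one.
    int allocate() {
        return free.isEmpty() ? next++ : free.pop();
    }

    // Hypothetical release hook: columns owned by constant expressions are never recycled.
    void release(int column, boolean ownedByConstant) {
        if (ownedByConstant) {
            return;
        }
        free.push(column);
    }
}

final class ScratchPoolDemo {
    // Returns {column of the constant child, output column of the parent}.
    static int[] run(boolean protectConstants) {
        ToyScratchPool pool = new ToyScratchPool(61);
        int constantCol = pool.allocate();
        pool.release(constantCol, protectConstants); // child is "freed" after planning
        int parentOutput = pool.allocate();          // parent asks for its output column
        return new int[] {constantCol, parentOutput};
    }

    public static void main(String[] args) {
        int[] unprotected = run(false);
        int[] guarded = run(true);
        System.out.println("without guard: " + unprotected[0] + " / " + unprotected[1]);
        System.out.println("with guard:    " + guarded[0] + " / " + guarded[1]);
    }
}
```

The cost of this design is one extra scratch column per constant child, in 
exchange for never sharing a repeating vector with a parent's output.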


> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26408
>                 URL: https://issues.apache.org/jira/browse/HIVE-26408
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
