[ https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-26408:
--------------------------------
    Description:
This is similar to HIVE-15588. With a customer query, I reproduced a vectorized expression tree like the one below (I'll attach a simple repro query when possible):
{code}
selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 61:string)(children:
  StringColumnInList(col 13, values TermDeposit, RecurringDeposit, CertificateOfDeposit) -> 67:boolean,
  VectorCoalesce(columns [61, 62])(children:
    VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( _col1 AS DATE)), 'MM-dd-yyyy'))(children:
      VectorUDFUnixTimeStampDate(col 68)(children:
        CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 61:string,
    ConstantVectorExpression(val ) -> 62:string) -> 63:string,
  ConstantVectorExpression(val ) -> 61:string) -> 62:string
{code}
The relevant part of the query was:
{code}
CASE WHEN DLY_BAL.PDELP_VALUE in (
    'TermDeposit', 'RecurringDeposit',
    'CertificateOfDeposit'
  ) THEN NVL(
    (
      from_unixtime(
        unix_timestamp(
          cast(DLY_BAL.APATD_MTRTY_DATE as date)
        ),
        'MM-dd-yyyy'
      )
    ),
    ' '
  ) ELSE '' END AS MAT_DTE
{code}
Here is the problem described:
1. IfExprCondExprColumn has 62:string as its [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64], which is a reused scratch column (see step 5).
2. At evaluation time, [isRepeating is reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
3. To evaluate IfExprCondExprColumn, its children have to be evaluated conditionally, hence the call to [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
4. One of the children is ConstantVectorExpression(val ) -> 62:string, which belongs to the second branch of VectorCoalesce, i.e. to the string constant passed as NVL's second argument.
5. In step 4, the 62:string column is set to an isRepeating column (and it is released by [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]), so it is marked as a reusable scratch column.
6. After the conditional evaluation in step 3, the final output of IfExprCondExprColumn is set [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99], and that call fails with an exception raised [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
{code}
2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
	at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
{code}
This is clearly an incorrect scratch column reuse. It must not be fixed by resetting vectors in IfExprCondExprColumn, as that would just hide the original issue.

I realized that the problem can be fixed simply by not releasing ConstantVectorExpressions; that is what I'm testing now.

> Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26408
>                 URL: https://issues.apache.org/jira/browse/HIVE-26408
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
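
The reuse pattern in steps 1 to 6 can be imitated with a small standalone model. This is only a sketch, not Hive code: `ToyColumn`, `ScratchPool`, and `parentCanWrite` are hypothetical names standing in for the string column vector, the scratch column bookkeeping, and the parent expression's evaluate call. It assumes just the two behaviors the description relies on: a released column number can be handed out again, and writing a non-zero row is rejected while isRepeating is set. The sketch throws an ordinary exception instead of using a Java assertion so that it does not depend on `-ea`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the scratch column reuse; none of these types are Hive classes. */
final class ScratchReuseSketch {

    /** Hypothetical stand-in for a string column vector. */
    static final class ToyColumn {
        boolean isRepeating;
        final String[] values = new String[8];

        /** What a constant expression does to its output column. */
        void fillConstant(String value) {
            values[0] = value;
            isRepeating = true;
        }

        /** Rejects a non-zero row while isRepeating is set, like the check in the stack trace. */
        void setElement(int row, String value) {
            if (isRepeating && row != 0) {
                throw new IllegalStateException(
                    "Output column number expected to be 0 when isRepeating");
            }
            values[row] = value;
        }
    }

    /** Hypothetical scratch allocator: a released column number is handed out again. */
    static final class ScratchPool {
        private final Deque<Integer> free = new ArrayDeque<>();
        private int next;

        ScratchPool(int firstColumn) { next = firstColumn; }

        int allocate() { return free.isEmpty() ? next++ : free.pop(); }

        void release(int column) { free.push(column); }
    }

    /** Returns true if the parent expression managed to write row 1 of its output. */
    static boolean parentCanWrite(boolean releaseConstantColumn) {
        ScratchPool pool = new ScratchPool(62);
        int constantCol = pool.allocate();        // child constant gets column 62
        if (releaseConstantColumn) {
            pool.release(constantCol);            // child's column is marked reusable
        }
        int outputCol = pool.allocate();          // parent output: 62 again, or 63

        ToyColumn[] batch = new ToyColumn[64];
        for (int i = 0; i < batch.length; i++) {
            batch[i] = new ToyColumn();
        }

        ToyColumn output = batch[outputCol];
        output.isRepeating = false;               // step 2: parent resets its output
        batch[constantCol].fillConstant("");      // steps 3-5: child evaluates afterwards
        try {
            output.setElement(1, "07-01-2022");   // step 6: parent writes a row
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("constant's column released: " + parentCanWrite(true));
        System.out.println("constant's column kept:     " + parentCanWrite(false));
    }
}
```

In this model the write only fails when the constant's column has been released and then picked up as the parent's output, which is why keeping the constant's column out of the free list is enough to avoid the clash without touching the parent expression.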