Mahesh Raju Somalaraju created HIVE-28503: ---------------------------------------------
Summary: Wrong results(NULL) when string concat operation with || operator for ORC file format when vectorization enabled Key: HIVE-28503 URL: https://issues.apache.org/jira/browse/HIVE-28503 Project: Hive Issue Type: Bug Components: Hive Reporter: Mahesh Raju Somalaraju Assignee: Mahesh Raju Somalaraju Wrong results(NULL) when string concat operation with || operator for ORC file format when vectorization enabled. set hive.query.results.cache.enabled=false; set hive.fetch.task.conversion=none; set hive.vectorized.execution.enabled=true; Result is NULL when we do concat operation with || operator. Locally it is not able to reproduce. It is able to reproduce in cluster with more records.Input data should be mix of NULL and NOT NULL values something like below. create a table with orc file format and has 3 string columns and insert data such way that it should have mix of NULL values and NOT NULL values. |column1|column2|column3|count| |NULL |NULL |NULL |18000 | |G |L |A1 |123932 | with above configuration, perform concat() operation with || operator and insert new row with the concat() results. select * from (select t1.column1, t1.column2, t1.column3, *t1.column1 || t1.column2 || t1.column3 as VEH_MODEL_ID* from test_table t1 )t where VEH_MODEL_ID is NULL and if(column1 is null,0,1)=1 AND if(column2 is null,0,1)=1 AND if(column3 is null,0,1)=1 limit 1; in above query, *t1.column1 || t1.column2 || t1.column3 as VEH_MODEL_ID* operation is returning the NULL result eventhough the input string values are not null. |t.VEH_MODEL_ID|t.column1|t.column2|t.column3| |NULL|G|L|A2| +Proposed solution as per code review:+ +*Root cause:*+ While doing concat() operation, In *StringGroupConcatColCol* class, if input batch vector has mixed of NULL and NOT NULL values of inputs then we are not setting output vector batch flags related to NULL and NOT NULLS correctly . Each value in the vector has the flag whether it is NULL or NOT NULL. But here we are not setting correctly the whole output vector flag (outV.noNulls). Without this flag it is working for parquet, some how they may be referring each value instead of checking whole output vector flag whether it is NULL or NOT NULL. +*code snippet:*+ *StringGroupConcatColCol->evaluate() method:* if (inV1.noNulls && !inV2.noNulls) { *>> if any one input has NULL, then output should be NULL.* outV.noNulls = false; *--> setting this flag false as all values in this are NULLs* ---- } else if (!inV1.noNulls && inV2.noNulls) { *>> if any one input has NULL, then output should be NULL.* outV.noNulls = false; --> *setting this flag false as all values in this are NULLs* --- } else if (!inV1.noNulls && !inV2.noNulls) { *>> if two inputs are NULL, then output should be NULL.* outV.noNulls = false; *--> setting this flag false as all values in this are NULLs** --- } else { *--> there are no nulls in either input vector* {color:#4c9aff}*outV.noNulls = true; --> this has to be set true, as there are no NULL values, this check is missed currently.*{color} // perform data operation --- } -- This message was sent by Atlassian Jira (v8.20.10#820010)