Mihailo Timotic created SPARK-51732:
---------------------------------------
Summary: Apply `rpad` on attributes with same `ExprId` if they need to be deduplicated
Key: SPARK-51732
URL: https://issues.apache.org/jira/browse/SPARK-51732
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0, 4.1.0
Reporter: Mihailo Timotic

We need to apply `rpad` on attributes that have the same `ExprId` if those attributes should be deduplicated. For example:

{code:java}
CREATE OR REPLACE TABLE t(a CHAR(50));
{code}

{code:java}
SELECT t1.a
FROM t t1
WHERE (SELECT count(*) AS item_cnt FROM t t2 WHERE (t1.a = t2.a)) > 0
{code}

In the above case, `ApplyCharTypePadding` runs for the subquery while `t1.a` and `t2.a` still reference the same `ExprId`, so `rpad` is not applied. After `DeduplicateRelations` runs for the outer query, `t1.a` and `t2.a` get different `ExprIds` and would therefore need `rpad`. This never happens, because `ApplyCharTypePadding` for the outer query does not recurse into the subquery.

On the other hand, for a query:

{code:java}
SELECT t1.a FROM t t1, t t2 WHERE t1.a = t2.a
{code}

`ApplyCharTypePadding` correctly adds `rpad` to both `t1.a` and `t2.a` because the attributes are deduplicated first.
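The ordering problem described in this issue can be illustrated with a small toy model. This is only a sketch and not Spark code: `Attr`, `EqualTo`, `apply_char_padding`, and `deduplicate` are hypothetical stand-ins for attribute references, the comparison predicate, `ApplyCharTypePadding`, and `DeduplicateRelations`. The assumption, taken from the description, is that padding is skipped when both sides of the comparison carry the same `ExprId`.

```python
from dataclasses import dataclass, replace
from itertools import count


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute reference."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class EqualTo:
    """Hypothetical stand-in for the predicate t1.a = t2.a."""
    left: Attr
    right: Attr
    padded: bool = False  # True once the toy padding rule has fired


_fresh_ids = count(100)  # source of new ids for the toy deduplication


def apply_char_padding(cond: EqualTo) -> EqualTo:
    # Toy rule: both sides look like the same attribute, so skip padding.
    if cond.left.expr_id == cond.right.expr_id:
        return cond
    return replace(cond, padded=True)


def deduplicate(cond: EqualTo) -> EqualTo:
    # Toy rule: give the right side a fresh id when it collides with the left.
    if cond.left.expr_id != cond.right.expr_id:
        return cond
    return replace(cond, right=replace(cond.right, expr_id=next(_fresh_ids)))


a = Attr("a", 1)
cond = EqualTo(a, a)  # t1.a and t2.a initially share one id

# Subquery case: padding runs first, deduplication later, padding never reruns.
subquery_order = deduplicate(apply_char_padding(cond))

# Join case: attributes are deduplicated before padding runs.
join_order = apply_char_padding(deduplicate(cond))

print(subquery_order.padded, join_order.padded)
```

In this model the subquery ordering ends with two distinct ids but no padding, which mirrors the reported behaviour, while the join ordering ends with the comparison padded.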