[ https://issues.apache.org/jira/browse/SPARK-51732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-51732:
-----------------------------------

    Assignee: Mihailo Timotic

> Apply `rpad` on attributes with same `ExprId` if they need to be deduplicated
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-51732
>                 URL: https://issues.apache.org/jira/browse/SPARK-51732
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Mihailo Timotic
>            Assignee: Mihailo Timotic
>            Priority: Major
>              Labels: pull-request-available
>
> We need to apply `rpad` to attributes that have the same `ExprId` if those
> attributes should be deduplicated.
> For example:
> {code:java}
> CREATE OR REPLACE TABLE t(a CHAR(50)); {code}
> {code:java}
> SELECT t1.a FROM t t1
> WHERE (SELECT count(*) AS item_cnt FROM t t2 WHERE (t1.a = t2.a)) > 0
> {code}
> In the above case, `ApplyCharTypePadding` runs on the subquery while `t1.a`
> and `t2.a` still reference the same `ExprId`, so `rpad` is not applied.
> After `DeduplicateRelations` runs on the outer query, `t1.a` and `t2.a`
> get different `ExprId`s and therefore need `rpad`. The padding is still
> not added, because `ApplyCharTypePadding` for the outer query does not
> recurse into the subquery.
> On the other hand, for the query:
> {code:java}
> SELECT t1.a
> FROM t t1, t t2
> WHERE t1.a = t2.a {code}
> `ApplyCharTypePadding` correctly adds `rpad` to both `t1.a` and `t2.a`,
> because the attributes are deduplicated first.
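
The ordering problem described in the issue can be illustrated with a small toy model. This is only a sketch and not Spark code: `Attr`, `pad_comparison`, and `deduplicate` are hypothetical names, the integer `expr_id` stands in for Spark's `ExprId`, and the `rpad(...)` string is an illustrative rendering of the padded comparison, not the exact expression the analyzer produces.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Attr:
    name: str      # qualified column name, e.g. "t1.a"
    expr_id: int   # toy stand-in for Spark's ExprId
    char_len: int  # declared CHAR(n) length


def pad_comparison(left: Attr, right: Attr) -> str:
    """Toy padding rule: skip rpad when both sides share an expr_id."""
    if left.expr_id == right.expr_id:
        return f"{left.name} = {right.name}"
    return (
        f"rpad({left.name}, {left.char_len}, ' ') = "
        f"rpad({right.name}, {right.char_len}, ' ')"
    )


def deduplicate(attr: Attr, fresh_ids) -> Attr:
    """Toy deduplication: give the attribute a fresh expr_id."""
    return Attr(attr.name, next(fresh_ids), attr.char_len)


fresh_ids = count(100)

# Both aliases read the same table t, so they start with the same expr_id.
t1_a = Attr("t1.a", 1, 50)
t2_a = Attr("t2.a", 1, 50)

# Subquery case: the padding decision is made before deduplication.
padded_before_dedup = pad_comparison(t1_a, t2_a)

# Join case: deduplication happens first, then the padding decision.
padded_after_dedup = pad_comparison(t1_a, deduplicate(t2_a, fresh_ids))

print(padded_before_dedup)  # t1.a = t2.a
print(padded_after_dedup)   # rpad(t1.a, 50, ' ') = rpad(t2.a, 50, ' ')
```

In this model the first result mirrors the subquery case, where the comparison is left unpadded because the decision is made too early, and the second mirrors the join case, where deduplication runs first and both sides are padded.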