[ 
https://issues.apache.org/jira/browse/SPARK-51732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-51732:
-----------------------------------

    Assignee: Mihailo Timotic

> Apply `rpad` on attributes with same `ExprId` if they need to be deduplicated 
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-51732
>                 URL: https://issues.apache.org/jira/browse/SPARK-51732
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Mihailo Timotic
>            Assignee: Mihailo Timotic
>            Priority: Major
>              Labels: pull-request-available
>
> We need to apply `rpad` on attributes that have the same `ExprId` if those 
> attributes should be deduplicated.
> For example:
> {code:java}
> CREATE OR REPLACE TABLE t(a CHAR(50)); {code}
> {code:java}
> SELECT t1.a
> FROM t t1
> WHERE (SELECT count(*) AS item_cnt FROM t t2 WHERE (t1.a = t2.a)) > 0
> {code}
> In the above case, `ApplyCharTypePadding` runs on the subquery while `t1.a` 
> and `t2.a` still reference the same `ExprId`, so it does not apply `rpad`. 
> After `DeduplicateRelations` runs on the outer query, `t1.a` and `t2.a` 
> get different `ExprId`s and therefore need `rpad`. The padding is never 
> added, though, because `ApplyCharTypePadding` for the outer query does not 
> recurse into the subquery.
> On the other hand, for a query:
> {code:java}
> SELECT t1.a
> FROM t t1, t t2
> WHERE t1.a = t2.a {code}
> `ApplyCharTypePadding` correctly adds `rpad` to both `t1.a` and `t2.a` 
> because the attributes are deduplicated before the rule runs.
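
The ordering problem described above can be sketched with a toy model. This is not Catalyst code: `Attr`, `Rpad`, `EqualTo`, `apply_char_padding`, and `deduplicate` are hypothetical stand-ins for the real expressions and rules, and the sketch only assumes what the description states, namely that padding is skipped when both sides of the comparison share an `ExprId`.

```python
from dataclasses import dataclass, replace
from itertools import count

CHAR_LEN = 50  # matches CHAR(50) in the example table


@dataclass(frozen=True)
class Attr:
    name: str     # qualified column name, e.g. "t1.a"
    expr_id: int  # stand-in for Catalyst's ExprId


@dataclass(frozen=True)
class Rpad:
    child: Attr
    length: int


@dataclass(frozen=True)
class EqualTo:
    left: object
    right: object


def apply_char_padding(cond):
    """Toy padding rule: pad both sides unless they share an expr_id."""
    l, r = cond.left, cond.right
    if isinstance(l, Attr) and isinstance(r, Attr) and l.expr_id != r.expr_id:
        return EqualTo(Rpad(l, CHAR_LEN), Rpad(r, CHAR_LEN))
    return cond


def deduplicate(cond, fresh_ids):
    """Toy deduplication: give the right side a fresh id on conflict."""
    l, r = cond.left, cond.right
    if isinstance(l, Attr) and isinstance(r, Attr) and l.expr_id == r.expr_id:
        return EqualTo(l, replace(r, expr_id=next(fresh_ids)))
    return cond


fresh = count(100)
same_id = EqualTo(Attr("t1.a", 1), Attr("t2.a", 1))

# Subquery case: padding runs first, sees equal ids, and skips rpad.
# Deduplication then separates the ids, but nothing pads them afterwards.
subquery_result = deduplicate(apply_char_padding(same_id), fresh)

# Join case: deduplication runs first, so padding sees distinct ids.
join_result = apply_char_padding(deduplicate(same_id, fresh))
```

In this model `subquery_result` ends up comparing two bare attributes with different ids, while `join_result` compares two `Rpad` expressions, which mirrors the difference between the two queries in the description.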



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
