Mihailo Timotic created SPARK-51732:
---------------------------------------

             Summary: Apply `rpad` on attributes with same `ExprId` if they 
need to be deduplicated 
                 Key: SPARK-51732
                 URL: https://issues.apache.org/jira/browse/SPARK-51732
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0, 4.1.0
            Reporter: Mihailo Timotic


We need to apply `rpad` on attributes that have the same `ExprId` if those 
attributes should be deduplicated.

For example:
{code:java}
CREATE OR REPLACE TABLE t(a CHAR(50)); {code}
{code:java}
SELECT t1.a
FROM t t1 
WHERE (SELECT count(*) AS item_cnt FROM t t2 WHERE (t1.a = t2.a)) > 0
{code}
In the above case, `ApplyCharTypePadding` first runs on the subquery, where 
`t1.a` and `t2.a` still reference the same `ExprId`, so it does not apply 
`rpad`. After `DeduplicateRelations` runs on the outer query, `t1.a` and `t2.a` 
get different `ExprId`s and would therefore need `rpad`. The padding is never 
added, though, because `ApplyCharTypePadding` on the outer query does not 
recurse into the subquery.

On the other hand, for a query:
{code:java}
SELECT t1.a
FROM t t1, t t2
WHERE t1.a = t2.a {code}
`ApplyCharTypePadding` will correctly add `rpad` to both `t1.a` and `t2.a` 
because the attributes are deduplicated before the rule runs.
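
The difference between the two queries comes down to the order in which the two rules see the comparison. The following is a minimal toy model of that ordering, not Spark's Catalyst code: the classes `Attr`, `Eq`, `RPad` and the functions `apply_char_padding` and `deduplicate` are hypothetical stand-ins, and the "skip padding when both sides share an id" condition is taken from the description above rather than from the rule's source.

```python
from dataclasses import dataclass
from itertools import count

# Toy model only: hypothetical classes, not Spark's Catalyst API.

@dataclass(frozen=True)
class Attr:
    name: str      # qualified name, e.g. "t1.a"
    expr_id: int   # stands in for Catalyst's ExprId
    char_len: int  # declared CHAR(n) length

@dataclass(frozen=True)
class RPad:
    child: Attr
    length: int

@dataclass(frozen=True)
class Eq:
    left: object
    right: object

def apply_char_padding(cond: Eq) -> Eq:
    """Pad both sides only when they are distinct attributes (different ids)."""
    l, r = cond.left, cond.right
    if isinstance(l, Attr) and isinstance(r, Attr) and l.expr_id != r.expr_id:
        n = max(l.char_len, r.char_len)
        return Eq(RPad(l, n), RPad(r, n))
    return cond  # same id (or already padded): left untouched

_fresh_ids = count(100)

def deduplicate(cond: Eq) -> Eq:
    """Give the right-hand attribute a fresh id if it collides with the left."""
    l, r = cond.left, cond.right
    if isinstance(l, Attr) and isinstance(r, Attr) and l.expr_id == r.expr_id:
        return Eq(l, Attr(r.name, next(_fresh_ids), r.char_len))
    return cond

# Both aliases of t start out sharing one id, as in the examples above.
cond = Eq(Attr("t1.a", 1, 50), Attr("t2.a", 1, 50))

# Subquery case: padding runs first, deduplication later, padding is not re-run.
subquery_order = deduplicate(apply_char_padding(cond))

# Join case: deduplication runs first, then padding sees distinct ids.
join_order = apply_char_padding(deduplicate(cond))

print(isinstance(subquery_order.left, RPad))  # False
print(isinstance(join_order.left, RPad))      # True
```

In this sketch the subquery ordering ends with two distinct, unpadded attributes, which is the state described above as needing `rpad`; the join ordering ends with both sides wrapped in `RPad`.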



