[ 
https://issues.apache.org/jira/browse/SPARK-50091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-50091:
-----------------------------------

    Assignee: Bruce Robbins

> Query fails when aggregate expression is in left-hand operand of IN-subquery
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-50091
>                 URL: https://issues.apache.org/jira/browse/SPARK-50091
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.2
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>              Labels: pull-request-available
>
> Consider this query:
> {noformat}
> create or replace temp view v1(c1, c2) as values
> (1, 2), (1, 3), (2, 2), (3, 7), (3, 1);
> create or replace temp view v2(col1, col2) as values
> (1, 2), (1, 3), (2, 2), (3, 7), (3, 1);
> select col1, sum(col2) in (select c2 from v1)
> from v2 group by col1;
> {noformat}
> It fails with the following error:
> {noformat}
> [INTERNAL_ERROR] Cannot generate code for expression: sum(input[1, int, false]) SQLSTATE: XX000
> {noformat}
> With SPARK_TESTING=1, it fails with
> {noformat}
> [PLAN_VALIDATION_FAILED_RULE_IN_BATCH] Rule org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery in batch RewriteSubquery generated an invalid plan: Special expressions are placed in the wrong plan: Aggregate [col1#11], [col1#11, first(exists#20, false) AS (sum(col2) IN (listquery()))#19]
> +- Join ExistenceJoin(exists#20), (sum(col2#12) = c2#18L)
>    :- LocalRelation [col1#11, col2#12]
>    +- LocalRelation [c2#18L]
> {noformat}
> The {{Join}} is invalid because it contains an aggregate expression in the 
> join condition.
> The bug is in the handler for {{UnaryNode}} in 
> {{RewritePredicateSubquery#apply}}, which adds a {{Join}} below the 
> {{Aggregate}} and assumes that the left-hand operand of the IN-subquery can 
> be used in the join condition. This works in most cases, but not when the 
> left-hand operand is an aggregate expression, because the aggregate is only 
> computed by the {{Aggregate}} node above the {{Join}}.
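
The validation failure quoted above can be illustrated with a small toy model. This is only a sketch in Python, not Spark's Catalyst code: the `Expr` and `Plan` classes and the `misplaced_aggregates` helper are hypothetical names invented for this example. It rebuilds the shape of the rejected plan (an Aggregate over a Join whose condition is `sum(col2) = c2`) and flags any non-Aggregate node that holds an aggregate expression.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not a Catalyst class)."""
    name: str
    children: Tuple["Expr", ...] = ()
    is_aggregate: bool = False


@dataclass(frozen=True)
class Plan:
    """Toy logical plan node (hypothetical, not a Catalyst class)."""
    kind: str
    expressions: Tuple[Expr, ...] = ()
    children: Tuple["Plan", ...] = ()


def contains_aggregate(expr: Expr) -> bool:
    """True if the expression tree contains an aggregate function."""
    return expr.is_aggregate or any(contains_aggregate(c) for c in expr.children)


def misplaced_aggregates(plan: Plan) -> List[str]:
    """Kinds of non-Aggregate nodes that hold an aggregate expression."""
    bad = []
    if plan.kind != "Aggregate" and any(
        contains_aggregate(e) for e in plan.expressions
    ):
        bad.append(plan.kind)
    for child in plan.children:
        bad.extend(misplaced_aggregates(child))
    return bad


# Shape of the rejected plan: the join condition is sum(col2) = c2.
sum_col2 = Expr("sum", (Expr("col2"),), is_aggregate=True)
bad_join = Plan(
    "Join",
    (Expr("=", (sum_col2, Expr("c2"))),),
    (Plan("LocalRelation"), Plan("LocalRelation")),
)
invalid_plan = Plan(
    "Aggregate",
    (Expr("col1"), Expr("first", (Expr("exists"),), is_aggregate=True)),
    (bad_join,),
)

# Same shape, but the left-hand operand is a plain column: col2 = c2.
ok_join = Plan(
    "Join",
    (Expr("=", (Expr("col2"), Expr("c2"))),),
    (Plan("LocalRelation"), Plan("LocalRelation")),
)
valid_plan = Plan("Aggregate", (Expr("col1"),), (ok_join,))

print(misplaced_aggregates(invalid_plan))
print(misplaced_aggregates(valid_plan))
```

In this model only the plan whose join condition contains `sum(col2)` is reported, which mirrors why the rewrite is safe when the left-hand operand is an ordinary column but not when it is an aggregate.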


