[ https://issues.apache.org/jira/browse/SPARK-50091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-50091. --------------------------------- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 48627 [https://github.com/apache/spark/pull/48627] > Query fails when aggregate expression is in left-hand operand of IN-subquery > ---------------------------------------------------------------------------- > > Key: SPARK-50091 > URL: https://issues.apache.org/jira/browse/SPARK-50091 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.5.2 > Reporter: Bruce Robbins > Assignee: Bruce Robbins > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Consider this query: > {noformat} > create or replace temp view v1(c1, c2) as values > (1, 2), (1, 3), (2, 2), (3, 7), (3, 1); > create or replace temp view v2(col1, col2) as values > (1, 2), (1, 3), (2, 2), (3, 7), (3, 1); > select col1, sum(col2) in (select c2 from v1) > from v2 group by col1; > {noformat} > It fails with the following error: > {noformat} > [INTERNAL_ERROR] Cannot generate code for expression: sum(input[1, int, > false]) SQLSTATE: XX000 > {noformat} > With SPARK_TESTING=1, it fails with > {noformat} > [PLAN_VALIDATION_FAILED_RULE_IN_BATCH] Rule > org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery in batch > RewriteSubquery generated an invalid plan: Special expressions are placed in > the wrong plan: Aggregate [col1#11], [col1#11, first(exists#20, false) AS > (sum(col2) IN (listquery()))#19] > +- Join ExistenceJoin(exists#20), (sum(col2#12) = c2#18L) > :- LocalRelation [col1#11, col2#12] > +- LocalRelation [c2#18L] > {noformat} > The {{Join}} is invalid because it contains an aggregate expression in the > join condition. > The bug is in the handler for UnaryNode in > {{RewritePredicateSubquery#apply,}} which adds a {{Join}} below the > {{Aggregate}} and assumes that the left-hand operand of IN-subquery can be > used in the join condition. This works fine for most cases, but not when the > left-hand operand is an aggregate expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org