pepijnve opened a new issue, #19684:
URL: https://github.com/apache/datafusion/issues/19684
### Describe the bug
When common subexpression elimination deduplicates aggregate expressions, it can generate aliases of the form `__common_expr_<n>` for the shared expression. In the logical plan explain output this is printed as `<original expr> AS __common_expr_<n>`. In the physical plan explain output, however, only `__common_expr_<n>` is printed, so the expression behind the alias is no longer visible. This makes the explain output hard to interpret.
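To make the aliasing concrete, here is a simplified Rust model of what happens to the aggregate list in the reproduction below. It is a hypothetical sketch, not DataFusion's CSE implementation: `AggOutput`, `eliminate_common_aggs`, and representing expressions as plain strings are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical model of one aggregate output: the aggregate expression
/// (as a string) plus the alias the user gave it.
#[derive(Debug, Clone)]
struct AggOutput {
    expr: String,
    alias: String,
}

/// Toy CSE over aggregates (not DataFusion's code). Every expression that
/// occurs more than once is computed a single time under a generated
/// `__common_expr_<n>` alias. Returns the aggregates actually computed and a
/// projection mapping each original alias to the column it reads from.
fn eliminate_common_aggs(aggs: &[AggOutput]) -> (Vec<AggOutput>, Vec<(String, String)>) {
    // Count how often each expression appears.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for a in aggs {
        *counts.entry(a.expr.as_str()).or_insert(0) += 1;
    }

    let mut computed = Vec::new();
    let mut assigned: HashMap<&str, String> = HashMap::new();
    let mut projection = Vec::new();
    let mut next_id = 1;

    for a in aggs {
        let column = if counts[a.expr.as_str()] > 1 {
            // Shared expression: compute it once under a generated alias.
            assigned
                .entry(a.expr.as_str())
                .or_insert_with(|| {
                    let name = format!("__common_expr_{next_id}");
                    next_id += 1;
                    computed.push(AggOutput { expr: a.expr.clone(), alias: name.clone() });
                    name
                })
                .clone()
        } else {
            // Unique expression: keep it under its original alias.
            computed.push(a.clone());
            a.alias.clone()
        };
        projection.push((a.alias.clone(), column));
    }
    (computed, projection)
}
```

Fed `sum(column1) AS agg` and `sum(column1) AS ord`, this model yields a single `sum(column1)` aliased `__common_expr_1` and a projection reading that column as both `agg` and `ord`, which mirrors the optimized logical plan in the reproduction. Only the alias survives into the aggregate's output name, which is why printing the name alone hides the expression.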
### To Reproduce
Here's an example logical plan constructed using the DataFrame API. The problematic line is:
```
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
```
```
Logical plan
============
Projection: idx, agg, ord
  Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS agg, sum(column1) AS ord]]
    Projection: column1, column2, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) ELSE Int64(4) END AS idx
      Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), Int64(314))

Optimized logical plan
======================
Projection: idx, __common_expr_1 AS agg, __common_expr_1 AS ord
  Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS __common_expr_1]]
    Projection: column1, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) ELSE Int64(4) END AS idx
      Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), Int64(314))

Physical plan
=============
ProjectionExec: expr=[idx@0 as idx, __common_expr_1@1 as agg, __common_expr_1@1 as ord]
  AggregateExec: mode=FinalPartitioned, gby=[idx@0 as idx], aggr=[__common_expr_1]
    RepartitionExec: partitioning=Hash([idx@0], 10), input_partitions=1
      AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
        ProjectionExec: expr=[column1@0 as column1, CASE WHEN column2@1 <= 0 THEN 0 WHEN column2@1 <= 200 THEN 1 WHEN column2@1 <= 314 THEN 3 ELSE 4 END as idx]
          DataSourceExec: partitions=1, partition_sizes=[1]
```
### Expected behavior
Rather than
```
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
```
the explain output should show
```
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[sum(column1@0) as __common_expr_1]
```
similar to how the group-by expressions are printed.
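One possible way to produce this output is sketched below in Rust. `AggDisplay` and `format_aggr` are hypothetical names, not DataFusion's `AggregateExec` display code; the sketch assumes each aggregate carries a rendered physical expression and an output name, and prints `expr as name` only when the two differ.

```rust
use std::fmt;

/// Hypothetical stand-in for a physical aggregate: its rendered expression
/// (e.g. `sum(column1@0)`) and its output name (e.g. `__common_expr_1`).
struct AggDisplay<'a> {
    expr: &'a str,
    name: &'a str,
}

impl fmt::Display for AggDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.expr == self.name {
            // Name already shows the expression: avoid `x as x` noise.
            write!(f, "{}", self.name)
        } else {
            // Alias hides the expression: show both, like the gby list does.
            write!(f, "{} as {}", self.expr, self.name)
        }
    }
}

/// Renders the `aggr=[...]` fragment of an explain line.
fn format_aggr(aggs: &[AggDisplay<'_>]) -> String {
    let parts: Vec<String> = aggs.iter().map(|a| a.to_string()).collect();
    format!("aggr=[{}]", parts.join(", "))
}
```

For the aggregate in this issue, `format_aggr(&[AggDisplay { expr: "sum(column1@0)", name: "__common_expr_1" }])` renders `aggr=[sum(column1@0) as __common_expr_1]`. Skipping the `as` when name and expression coincide is a design choice to keep ordinary plans compact; always printing the alias, as the group-by list does, would be equally valid.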
### Additional context
_No response_