pepijnve opened a new issue, #19684:
URL: https://github.com/apache/datafusion/issues/19684

   ### Describe the bug
   
   When common subexpression elimination deduplicates aggregations, it can 
generate aliases for the common expression of the form `__common_expr_<n>`. In 
the logical plan explain output this is printed as `<original expr> AS 
__common_expr_<n>`. In the physical plan explain output, however, only 
`__common_expr_<n>` is printed. The expression behind the alias is no longer 
visible, which makes the explain output hard to interpret.
   
   ### To Reproduce
   
   Here's an example logical plan constructed using the data frame API, along 
with its optimized logical plan and physical plan. The problematic line is
   ```
   AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
   ```
   
   ```
   Logical plan
   ============
   Projection: idx, agg, ord
     Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS agg, sum(column1) AS 
ord]]
       Projection: column1, column2, CASE WHEN column2 <= Int64(0) THEN 
Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) 
THEN Int64(3) ELSE Int64(4) END AS idx
         Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), 
Int64(314))
   
   Optimized logical plan
   ======================
   Projection: idx, __common_expr_1 AS agg, __common_expr_1 AS ord
     Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS __common_expr_1]]
       Projection: column1, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN 
column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) 
ELSE Int64(4) END AS idx
         Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), 
Int64(314))
   
   Physical plan
   =============
   ProjectionExec: expr=[idx@0 as idx, __common_expr_1@1 as agg, 
__common_expr_1@1 as ord]
     AggregateExec: mode=FinalPartitioned, gby=[idx@0 as idx], 
aggr=[__common_expr_1]
       RepartitionExec: partitioning=Hash([idx@0], 10), input_partitions=1
         AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
           ProjectionExec: expr=[column1@0 as column1, CASE WHEN column2@1 <= 0 
THEN 0 WHEN column2@1 <= 200 THEN 1 WHEN column2@1 <= 314 THEN 3 ELSE 4 END as 
idx]
             DataSourceExec: partitions=1, partition_sizes=[1]
   ```
   
   ### Expected behavior
   
   Rather than
   ```
   AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
   ```
   the explain output should show
   ```
   AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[sum(column1@0) as 
__common_expr_1]
   ```
   similarly to how the group by expressions are printed.
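
To make the difference concrete, here is a minimal standalone sketch of the two display styles. It does not use DataFusion's actual types or display code; `AggEntry`, `format_alias_only`, and `format_expr_with_alias` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical model of one aggregate entry (not a DataFusion type).
struct AggEntry {
    /// The underlying physical expression, e.g. "sum(column1@0)".
    expr: String,
    /// The output name, e.g. "__common_expr_1".
    alias: String,
}

/// Mirrors the reported output: only the alias of each aggregate is printed.
fn format_alias_only(mode: &str, gby: &str, aggs: &[AggEntry]) -> String {
    let names: Vec<&str> = aggs.iter().map(|a| a.alias.as_str()).collect();
    format!(
        "AggregateExec: mode={}, gby=[{}], aggr=[{}]",
        mode,
        gby,
        names.join(", ")
    )
}

/// Mirrors the requested output: each aggregate is printed as `<expr> as <alias>`,
/// the same shape already used for the group by expressions.
fn format_expr_with_alias(mode: &str, gby: &str, aggs: &[AggEntry]) -> String {
    let items: Vec<String> = aggs
        .iter()
        .map(|a| format!("{} as {}", a.expr, a.alias))
        .collect();
    format!(
        "AggregateExec: mode={}, gby=[{}], aggr=[{}]",
        mode,
        gby,
        items.join(", ")
    )
}
```

Called with a single entry whose `expr` is `sum(column1@0)` and whose `alias` is `__common_expr_1`, the first function reproduces the line from the physical plan above and the second produces the line proposed here.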
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

