Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

via GitHub Mon, 17 Mar 2025 11:42:52 -0700


alamb commented on code in PR #15253:
URL: https://github.com/apache/datafusion/pull/15253#discussion_r1999392980



##########
datafusion/expr/src/expr.rs:
##########
@@ -2607,11 +2793,23 @@ pub(crate) fn 
schema_name_from_exprs_comma_separated_without_space(
     schema_name_from_exprs_inner(exprs, ",")
 }
 
+/// Get `sql_name` for Vector of expressions.

Review Comment:
   Instead of adding so many new so specific functions, what do you think about 
adding something like the following. I think the API would be easier to 
understand and it would also let us save a `String` copy when formatting as sql
   
   ```rust
   /// Formats a list of `&Expr` with a custom separator
   struct ExprListDisplay<'a> {
     exprs: &'a [Expr],
     sep: &'a str,
   }
   ...
   impl <'a> Display for  ExprListDisplay<'a> {
   ...
   }
   ```
   



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -801,6 +803,16 @@ impl DisplayAs for AggregateExec {
                 }
             }
             DisplayFormatType::TreeRender => {
+                let format_expr_with_alias =
+                    |(e, alias): &(Arc<dyn PhysicalExpr>, String)| -> String {
+                        let expr_sql = fmt_sql(e.as_ref()).to_string();

Review Comment:
   I think you can avoid a string copy here like:
   ```suggestion
                           let expr_sql = fmt_sql(e.as_ref());
   ```
   
   And then directly wrote them to `f` rather than creating strings
   
   However I don't think this is performance critical code so it is also ok 
this way too (and it follows the previous pattern)
   
   Maybe as a follow on PR we can avoid as many copies in this code



##########
datafusion/expr/src/expr.rs:
##########
@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> {
     }
 }
 
+struct SqlDisplay<'a>(&'a Expr);
+impl Display for SqlDisplay<'_> {

Review Comment:
   We already have code to convert from `Expr` to SQL, in 
https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html#method.expr_to_sql
   
   However, datafusion-expr doesn't currently depend on the datafusion-sql 
crate which is probably a good thing
   
   So I think we should add a note in `Expr::sql_name` that if the user's want 
syntactically correct SQL to feed to some other system, they should use the 
`Unparser` and leave a link to 
https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html



##########
datafusion/physical-expr/src/aggregate.rs:
##########
@@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr {
     /// Output / return type of this aggregate
     data_type: DataType,
     name: String,

Review Comment:
   ```suggestion
       /// Output column name that this expression creates
       name: String,
   ```



##########
datafusion/physical-expr/src/aggregate.rs:
##########
@@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr {
     /// Output / return type of this aggregate
     data_type: DataType,
     name: String,
+    sql_name: String,

Review Comment:
   I am worried about adding this new String for every single 
AggregateFunctionExpr even when we are not printing an explain plan -- we are 
trying try to keep DataFusion's planning time from getting out of hand so if we 
can avoid forcing a new String  that would be better



##########
datafusion/sqllogictest/test_files/explain_tree.slt:
##########
@@ -204,53 +204,50 @@ physical_plan
 04)│           aggr:           │
 05)│   sum(table1.bigint_col)  │
 06)│                           │
-07)│         group_by:         │
-08)│ string_col@0 as string_col│
-09)│                           │
-10)│           mode:           │
-11)│      FinalPartitioned     │
-12)└─────────────┬─────────────┘
-13)┌─────────────┴─────────────┐
-14)│    CoalesceBatchesExec    │
-15)│    --------------------   │
-16)│     target_batch_size:    │
-17)│            8192           │
-18)└─────────────┬─────────────┘
-19)┌─────────────┴─────────────┐
-20)│      RepartitionExec      │
-21)│    --------------------   │
-22)│  output_partition_count:  │
-23)│             4             │
-24)│                           │
-25)│    partitioning_scheme:   │
-26)│  Hash([string_col@0], 4)  │
-27)└─────────────┬─────────────┘
-28)┌─────────────┴─────────────┐
-29)│       AggregateExec       │
-30)│    --------------------   │
-31)│           aggr:           │
-32)│   sum(table1.bigint_col)  │
-33)│                           │
-34)│         group_by:         │
-35)│ string_col@0 as string_col│
-36)│                           │
-37)│       mode: Partial       │
-38)└─────────────┬─────────────┘
-39)┌─────────────┴─────────────┐
-40)│      RepartitionExec      │
-41)│    --------------------   │
-42)│  output_partition_count:  │
-43)│             1             │
-44)│                           │
-45)│    partitioning_scheme:   │
-46)│     RoundRobinBatch(4)    │
-47)└─────────────┬─────────────┘
-48)┌─────────────┴─────────────┐
-49)│       DataSourceExec      │
-50)│    --------------------   │
-51)│          files: 1         │
-52)│        format: csv        │
-53)└───────────────────────────┘
+07)│    group_by: string_col   │

Review Comment:
   💯 



##########
datafusion/physical-expr/src/aggregate.rs:
##########
@@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr {
     /// Output / return type of this aggregate
     data_type: DataType,
     name: String,
+    sql_name: String,

Review Comment:
   I played around with it and I couldn't find any way to avoid this. 
   
   ```suggestion
       /// Simplified name for `tree` explain.
           sql_name: String,
   ```



##########
datafusion/sqllogictest/test_files/explain_tree.slt:
##########
@@ -1477,69 +1474,60 @@ physical_plan
 07)┌─────────────┴─────────────┐
 08)│       AggregateExec       │
 09)│    --------------------   │
-10)│   aggr: count(Int64(1))   │
-11)│                           │
-12)│         group_by:         │
-13)│       name@0 as name      │
-14)│                           │
-15)│           mode:           │
-16)│     SinglePartitioned     │
-17)└─────────────┬─────────────┘
-18)┌─────────────┴─────────────┐
-19)│       InterleaveExec      ├──────────────┐
-20)└─────────────┬─────────────┘              │
-21)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-22)│       AggregateExec       ││       AggregateExec       │
-23)│    --------------------   ││    --------------------   │
-24)│            aggr           ││            aggr           │
-25)│                           ││                           │
-26)│         group_by:         ││         group_by:         │
-27)│       name@0 as name      ││       name@0 as name      │
-28)│                           ││                           │
-29)│           mode:           ││           mode:           │
-30)│      FinalPartitioned     ││      FinalPartitioned     │
-31)└─────────────┬─────────────┘└─────────────┬─────────────┘
-32)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-33)│    CoalesceBatchesExec    ││    CoalesceBatchesExec    │
-34)│    --------------------   ││    --------------------   │
-35)│     target_batch_size:    ││     target_batch_size:    │
-36)│            8192           ││            8192           │
-37)└─────────────┬─────────────┘└─────────────┬─────────────┘
-38)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-39)│      RepartitionExec      ││      RepartitionExec      │
-40)│    --------------------   ││    --------------------   │
-41)│  output_partition_count:  ││  output_partition_count:  │
-42)│             4             ││             4             │
-43)│                           ││                           │
-44)│    partitioning_scheme:   ││    partitioning_scheme:   │
-45)│     Hash([name@0], 4)     ││     Hash([name@0], 4)     │
-46)└─────────────┬─────────────┘└─────────────┬─────────────┘
-47)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-48)│      RepartitionExec      ││      RepartitionExec      │
-49)│    --------------------   ││    --------------------   │
-50)│  output_partition_count:  ││  output_partition_count:  │
-51)│             1             ││             1             │
-52)│                           ││                           │
-53)│    partitioning_scheme:   ││    partitioning_scheme:   │
-54)│     RoundRobinBatch(4)    ││     RoundRobinBatch(4)    │
-55)└─────────────┬─────────────┘└─────────────┬─────────────┘
-56)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-57)│       AggregateExec       ││       AggregateExec       │
-58)│    --------------------   ││    --------------------   │
-59)│            aggr           ││            aggr           │
-60)│                           ││                           │
-61)│         group_by:         ││         group_by:         │
-62)│       name@0 as name      ││       name@0 as name      │
-63)│                           ││                           │
-64)│       mode: Partial       ││       mode: Partial       │
-65)└─────────────┬─────────────┘└─────────────┬─────────────┘
-66)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
-67)│       DataSourceExec      ││       DataSourceExec      │
-68)│    --------------------   ││    --------------------   │
-69)│        bytes: 1320        ││        bytes: 1312        │
-70)│       format: memory      ││       format: memory      │
-71)│          rows: 1          ││          rows: 1          │
-72)└───────────────────────────┘└───────────────────────────┘
+10)│       aggr: count(1)      │

Review Comment:
   this is wonderful



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

Reply via email to