alamb commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r1999392980
########## datafusion/expr/src/expr.rs: ########## @@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space( schema_name_from_exprs_inner(exprs, ",") } +/// Get `sql_name` for Vector of expressions. Review Comment: Instead of adding so many new so specific functions, what do you think about adding something like the following. I think the API would be easier to understand and it would also let us save a `String` copy when formatting as sql ```rust /// Formats a list of `&Expr` with a custom separator struct ExprListDisplay<'a> { exprs: &'a [Expr], sep: &'a str, } ... impl <'a> Display for ExprListDisplay<'a> { ... } ``` ########## datafusion/physical-plan/src/aggregates/mod.rs: ########## @@ -801,6 +803,16 @@ impl DisplayAs for AggregateExec { } } DisplayFormatType::TreeRender => { + let format_expr_with_alias = + |(e, alias): &(Arc<dyn PhysicalExpr>, String)| -> String { + let expr_sql = fmt_sql(e.as_ref()).to_string(); Review Comment: I think you can avoid a string copy here like: ```suggestion let expr_sql = fmt_sql(e.as_ref()); ``` And then directly wrote them to `f` rather than creating strings However I don't think this is performance critical code so it is also ok this way too (and it follows the previous pattern) Maybe as a follow on PR we can avoid as many copies in this code ########## datafusion/expr/src/expr.rs: ########## @@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> { } } +struct SqlDisplay<'a>(&'a Expr); +impl Display for SqlDisplay<'_> { Review Comment: We already have code to convert from `Expr` to SQL, in https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html#method.expr_to_sql However, datafusion-expr doesn't currently depend on the datafusion-sql crate which is probably a good thing So I think we should add a note in `Expr::sql_name` that if the user's want syntactically correct SQL to feed to some other system, they should use the `Unparser` and leave a link to https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html ########## datafusion/physical-expr/src/aggregate.rs: ########## @@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr { /// Output / return type of this aggregate data_type: DataType, name: String, Review Comment: ```suggestion /// Output column name that this expression creates name: String, ``` ########## datafusion/physical-expr/src/aggregate.rs: ########## @@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr { /// Output / return type of this aggregate data_type: DataType, name: String, + sql_name: String, Review Comment: I am worried about adding this new String for every single AggregateFunctionExpr even when we are not printing an explain plan -- we are trying try to keep DataFusion's planning time from getting out of hand so if we can avoid forcing a new String that would be better ########## datafusion/sqllogictest/test_files/explain_tree.slt: ########## @@ -204,53 +204,50 @@ physical_plan 04)│ aggr: │ 05)│ sum(table1.bigint_col) │ 06)│ │ -07)│ group_by: │ -08)│ string_col@0 as string_col│ -09)│ │ -10)│ mode: │ -11)│ FinalPartitioned │ -12)└─────────────┬─────────────┘ -13)┌─────────────┴─────────────┐ -14)│ CoalesceBatchesExec │ -15)│ -------------------- │ -16)│ target_batch_size: │ -17)│ 8192 │ -18)└─────────────┬─────────────┘ -19)┌─────────────┴─────────────┐ -20)│ RepartitionExec │ -21)│ -------------------- │ -22)│ output_partition_count: │ -23)│ 4 │ -24)│ │ -25)│ partitioning_scheme: │ -26)│ Hash([string_col@0], 4) │ -27)└─────────────┬─────────────┘ -28)┌─────────────┴─────────────┐ -29)│ AggregateExec │ -30)│ -------------------- │ -31)│ aggr: │ -32)│ sum(table1.bigint_col) │ -33)│ │ -34)│ group_by: │ -35)│ string_col@0 as string_col│ -36)│ │ -37)│ mode: Partial │ -38)└─────────────┬─────────────┘ -39)┌─────────────┴─────────────┐ -40)│ RepartitionExec │ -41)│ -------------------- │ -42)│ output_partition_count: │ -43)│ 1 │ -44)│ │ -45)│ partitioning_scheme: │ -46)│ RoundRobinBatch(4) │ -47)└─────────────┬─────────────┘ -48)┌─────────────┴─────────────┐ -49)│ DataSourceExec │ -50)│ -------------------- │ -51)│ files: 1 │ -52)│ format: csv │ -53)└───────────────────────────┘ +07)│ group_by: string_col │ Review Comment: 💯 ########## datafusion/physical-expr/src/aggregate.rs: ########## @@ -215,6 +224,7 @@ pub struct AggregateFunctionExpr { /// Output / return type of this aggregate data_type: DataType, name: String, + sql_name: String, Review Comment: I played around with it and I couldn't find any way to avoid this. ```suggestion /// Simplified name for `tree` explain. sql_name: String, ``` ########## datafusion/sqllogictest/test_files/explain_tree.slt: ########## @@ -1477,69 +1474,60 @@ physical_plan 07)┌─────────────┴─────────────┐ 08)│ AggregateExec │ 09)│ -------------------- │ -10)│ aggr: count(Int64(1)) │ -11)│ │ -12)│ group_by: │ -13)│ name@0 as name │ -14)│ │ -15)│ mode: │ -16)│ SinglePartitioned │ -17)└─────────────┬─────────────┘ -18)┌─────────────┴─────────────┐ -19)│ InterleaveExec ├──────────────┐ -20)└─────────────┬─────────────┘ │ -21)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ -22)│ AggregateExec ││ AggregateExec │ -23)│ -------------------- ││ -------------------- │ -24)│ aggr ││ aggr │ -25)│ ││ │ -26)│ group_by: ││ group_by: │ -27)│ name@0 as name ││ name@0 as name │ -28)│ ││ │ -29)│ mode: ││ mode: │ -30)│ FinalPartitioned ││ FinalPartitioned │ -31)└─────────────┬─────────────┘└─────────────┬─────────────┘ -32)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ -33)│ CoalesceBatchesExec ││ CoalesceBatchesExec │ -34)│ -------------------- ││ -------------------- │ -35)│ target_batch_size: ││ target_batch_size: │ -36)│ 8192 ││ 8192 │ -37)└─────────────┬─────────────┘└─────────────┬─────────────┘ -38)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ -39)│ RepartitionExec ││ RepartitionExec │ -40)│ -------------------- ││ -------------------- │ -41)│ output_partition_count: ││ output_partition_count: │ -42)│ 4 ││ 4 │ -43)│ ││ │ -44)│ partitioning_scheme: ││ partitioning_scheme: │ -45)│ Hash([name@0], 4) ││ Hash([name@0], 4) │ -46)└─────────────┬─────────────┘└─────────────┬─────────────┘ -47)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ -48)│ RepartitionExec ││ RepartitionExec │ -49)│ -------------------- ││ -------------------- │ -50)│ output_partition_count: ││ output_partition_count: │ -51)│ 1 ││ 1 │ -52)│ ││ │ -53)│ partitioning_scheme: ││ partitioning_scheme: │ -54)│ RoundRobinBatch(4) ││ RoundRobinBatch(4) │ -55)└─────────────┬─────────────┘└─────────────┬─────────────┘ -56)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ -57)│ AggregateExec ││ AggregateExec │ -58)│ -------------------- ││ -------------------- │ -59)│ aggr ││ aggr │ -60)│ ││ │ -61)│ group_by: ││ group_by: │ -62)│ name@0 as name ││ name@0 as name │ -63)│ ││ │ -64)│ mode: Partial ││ mode: Partial │ -65)└─────────────┬─────────────┘└─────────────┬─────────────┘ -66)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ -67)│ DataSourceExec ││ DataSourceExec │ -68)│ -------------------- ││ -------------------- │ -69)│ bytes: 1320 ││ bytes: 1312 │ -70)│ format: memory ││ format: memory │ -71)│ rows: 1 ││ rows: 1 │ -72)└───────────────────────────┘└───────────────────────────┘ +10)│ aggr: count(1) │ Review Comment: this is wonderful -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org