Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

via GitHub Wed, 12 Feb 2025 15:15:24 -0800


anlinc commented on code in PR #14553:
URL: https://github.com/apache/datafusion/pull/14553#discussion_r1953535318



##########
datafusion/substrait/tests/cases/roundtrip_logical_plan.rs:
##########
@@ -300,6 +300,17 @@ async fn aggregate_grouping_rollup() -> Result<()> {
     ).await
 }
 
+#[tokio::test]
+async fn multilayer_aggregate() -> Result<()> {

Review Comment:
   This test actually fails without my changes with:
   ```
   Error: Substrait("Named schema must contain names for all fields")
   ```
   1. SQL -> Substrait (we don't add the additional grouping expression here).
   2. Substrait -> LogicalPlan (this would add the additional grouping 
expression, throwing off the projection relations above).
   
   I didn't use `roundtrip` here because there were some unrelated renaming 
differences that made the test fail.
   ```
   Aggregate: groupBy=[[data.a]], aggr=[[sum(count(data.b)) AS 
sum(partial_count_b)]]
     Aggregate: groupBy=[[data.a]], aggr=[[count(data.b)]]
       TableScan: data projection=[a, b]
   ```
   vs.
   ```
   Aggregate: groupBy=[[data.a]], aggr=[[sum(partial_count_b)]]
     Projection: data.a, count(data.b) AS partial_count_b
       Aggregate: groupBy=[[data.a]], aggr=[[count(data.b)]]
         TableScan: data projection=[a, b]
   ```



##########
datafusion/substrait/tests/cases/roundtrip_logical_plan.rs:
##########
@@ -300,6 +300,17 @@ async fn aggregate_grouping_rollup() -> Result<()> {
     ).await
 }
 
+#[tokio::test]
+async fn multilayer_aggregate() -> Result<()> {

Review Comment:
   This test actually fails without my changes with:
   ```
   Error: Substrait("Named schema must contain names for all fields")
   ```
   1. SQL -> Substrait (we don't add the additional grouping expression here).
   2. Substrait -> LogicalPlan (this would add the additional grouping 
expression, throwing off the projection relations above).
   
   I didn't use `roundtrip` here because there were some unrelated plan 
differences that made the test fail.
   ```
   Aggregate: groupBy=[[data.a]], aggr=[[sum(count(data.b)) AS 
sum(partial_count_b)]]
     Aggregate: groupBy=[[data.a]], aggr=[[count(data.b)]]
       TableScan: data projection=[a, b]
   ```
   vs.
   ```
   Aggregate: groupBy=[[data.a]], aggr=[[sum(partial_count_b)]]
     Projection: data.a, count(data.b) AS partial_count_b
       Aggregate: groupBy=[[data.a]], aggr=[[count(data.b)]]
         TableScan: data projection=[a, b]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

Reply via email to