Blizzara opened a new issue, #16140:
URL: https://github.com/apache/datafusion/issues/16140

   ### Describe the bug
   
   A Substrait plan with an aggregation that has duplicate entries (for example, the same grouping expression listed twice) doesn't provide output columns for all of the duplicates. This causes issues downstream, since the expected columns aren't found. Per the [Substrait spec](https://substrait.io/relations/logical_relations/#aggregate-operation), we'd expect the output to be `The list of grouping expressions in declaration order followed by the list of measures in declaration order ...`.
   
   One can wonder whether those duplicates are useful (probably not), but the plan should still be valid (and substrait-spark does seem to produce such plans in some cases).
   
   A possible solution might be to automatically wrap the Aggregate in a 
Project which duplicates the missing columns.
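
The wrapping idea can be sketched without any DataFusion types. This is a minimal, std-only illustration, not the actual consumer code: `DedupPlan`, `plan_duplicate_groupings`, and `projected_columns` are hypothetical names, and expressions are modelled as plain strings. The aggregate would be built from `unique_exprs` only, and a projection on top would read each original position from the matching deduplicated column.

```rust
use std::collections::HashMap;

/// Hypothetical helper type: how to rebuild the declared output columns
/// from an aggregate that only keeps unique grouping expressions.
#[derive(Debug, PartialEq)]
struct DedupPlan {
    /// Grouping expressions with duplicates removed, in first-seen order.
    unique_exprs: Vec<String>,
    /// For each original grouping expression (in declaration order), the
    /// index into `unique_exprs` that the wrapping projection reads from.
    projection: Vec<usize>,
}

/// Deduplicate grouping expressions and record, for every original
/// position, which deduplicated column it maps to.
fn plan_duplicate_groupings(grouping_exprs: &[&str]) -> DedupPlan {
    let mut first_seen: HashMap<&str, usize> = HashMap::new();
    let mut unique_exprs: Vec<String> = Vec::new();
    let mut projection = Vec::with_capacity(grouping_exprs.len());

    for &expr in grouping_exprs {
        // Reuse the index of an earlier identical expression, or append
        // the expression as a new unique column.
        let idx = *first_seen.entry(expr).or_insert_with(|| {
            unique_exprs.push(expr.to_string());
            unique_exprs.len() - 1
        });
        projection.push(idx);
    }

    DedupPlan { unique_exprs, projection }
}

/// Render the projection list that would sit on top of the aggregate,
/// pairing each original position with its declared output name.
fn projected_columns(plan: &DedupPlan, names: &[&str]) -> Vec<String> {
    plan.projection
        .iter()
        .zip(names)
        .map(|(&i, name)| format!("{} AS {}", plan.unique_exprs[i], name))
        .collect()
}
```

For the plan in this issue, calling `plan_duplicate_groupings(&["Int32(1)", "Int32(1)"])` keeps a single unique expression and maps both declared positions to it, so the projection can still emit both `grouping_col_1` and `grouping_col_2`. When the expressions already differ, the mapping is the identity and the extra projection could be skipped.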
   
   
   ### To Reproduce
   
   The Substrait consumer fails to read the following plan:
   ```
   {
     "extensionUris": [],
     "extensions": [],
     "relations": [
       {
         "root": {
           "input": {
             "aggregate": {
               "input": {
                 "read": {
                   "common": {
                     "direct": {}
                   },
                   "baseSchema": {
                     "names": [],
                     "struct": {
                       "types": [],
                       "nullability": "NULLABILITY_NULLABLE"
                     }
                   },
                   "namedTable": {
                     "names": [
                       "data"
                     ]
                   }
                 }
               },
               "groupings": [
                 {
                   "groupingExpressions": [
                     {
                       "literal": {
                         "i32": 1
                       }
                     },
                     {
                       "literal": {
                         "i32": 1
                       }
                     }
                   ]
                 }
               ],
               "measures": []
             }
           },
           "names": [
             "grouping_col_1",
             "grouping_col_2"
           ]
         }
       }
     ],
     "version": {
       "minorNumber": 54,
       "producer": "manual"
     }
   }
   ```
   
   Changing one of the literals to another value makes the plan pass.
   
   ### Expected behavior
   
   The expected logical plan would be:
   ```
   Aggregate: groupBy=[[Int32(1) AS grouping_col_1, Int32(1) AS grouping_col_2]], aggr=[[]]
     TableScan: data
   ```
   
   ### Additional context
   
   _No response_

