nuno-faria commented on PR #17058:
URL: https://github.com/apache/datafusion/pull/17058#issuecomment-3171931826

   Thanks @berkaysynnada for the pointers. I did a little more digging and 
found that the `get_finer_aggregate_exprs_requirement` function returns a 
different value for `string_agg` and `array_agg`. The source of the difference 
is that `AggregateUDFImpl::reverse_expr` is not defined for `string_agg`. This 
means the `SortExec` will not be created for `string_agg`, and that is still 
true in the main branch of datafusion.
   
   ```
   # string_agg
   | physical_plan after CombinePartialFinalAggregate           | 
OutputRequirementExec: order_by=[], dist_by=Unspecified
   |                                                            |   
AggregateExec: mode=Single, gby=[], aggr=[string_agg(t.k,Utf8(",")) ORDER BY 
[t.v ASC NULLS LAST]]
   |                                                            |     
DataSourceExec: partitions=1, partition_sizes=[1]
   |                                                            |
   | physical_plan after EnforceSorting                         | SAME TEXT AS 
ABOVE
   
   
   # array_agg
   | physical_plan after CombinePartialFinalAggregate           | 
OutputRequirementExec: order_by=[], dist_by=Unspecified
   |                                                            |   
AggregateExec: mode=Single, gby=[], aggr=[array_agg(t.k) ORDER BY [t.v ASC 
NULLS LAST]]
   |                                                            |     
DataSourceExec: partitions=1, partition_sizes=[1]
   |                                                            |
   | physical_plan after EnforceSorting                         | 
OutputRequirementExec: order_by=[], dist_by=Unspecified
   |                                                            |   
AggregateExec: mode=Single, gby=[], aggr=[array_agg(t.k) ORDER BY [t.v ASC 
NULLS LAST]]
   |                                                            |     SortExec: 
expr=[v@1 ASC NULLS LAST], preserve_partitioning=[false]
   |                                                            |       
DataSourceExec: partitions=1, partition_sizes=[1]
   ```
   
   The fix is to simply implement `AggregateUDFImpl::reverse_expr` for 
`StringAgg`. @alamb @findepi what do you think of this approach? With it there 
is no need cherry pick a new feature into `49.0.1`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to