jayzhan211 commented on issue #15818:
URL: https://github.com/apache/datafusion/issues/15818#issuecomment-2823740436

   > There's no `SortExec` in the query plan. Maybe it was removed by the optimizer? In this query plan, the `agg.child` is not sorted by the GROUP BY key.
   > 
   > ```
   >  query TT
   > explain select sum(a) from (select a from t order by a) group by a;
   > ----
   > logical_plan
   > 01)Projection: sum(t.a)
   > 02)--Aggregate: groupBy=[[t.a]], aggr=[[sum(CAST(t.a AS Int64))]]
   > 03)----TableScan: t projection=[a]
   > physical_plan
   > 01)ProjectionExec: expr=[sum(t.a)@1 as sum(t.a)]
   > 02)--AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[sum(t.a)]
   > 03)----CoalesceBatchesExec: target_batch_size=8192
   > 04)------RepartitionExec: partitioning=Hash([a@0], 4), input_partitions=4
   > 05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   > 06)----------AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[sum(t.a)]
   > 07)------------DataSourceExec: partitions=1, partition_sizes=[1]
   > ```
   > 
   > After adding `limit` to the inner subquery, we get ordered group aggregation (`ordering_mode=Sorted`):
   > 
   > ```
   > > explain format indent select sum(value) from (select value from generate_series(10000) order by value limit 10) group by value;
   > +---------------+-----------------------------------------------------------------------------------------------------------------------+
   > | plan_type     | plan                                                                                                                  |
   > +---------------+-----------------------------------------------------------------------------------------------------------------------+
   > | logical_plan  | Projection: sum(tmp_table.value)                                                                                      |
   > |               |   Aggregate: groupBy=[[tmp_table.value]], aggr=[[sum(tmp_table.value)]]                                               |
   > |               |     Sort: tmp_table.value ASC NULLS LAST, fetch=10                                                                    |
   > |               |       TableScan: tmp_table projection=[value]                                                                         |
   > | physical_plan | ProjectionExec: expr=[sum(tmp_table.value)@1 as sum(tmp_table.value)]                                                 |
   > |               |   AggregateExec: mode=FinalPartitioned, gby=[value@0 as value], aggr=[sum(tmp_table.value)], ordering_mode=Sorted     |
   > |               |     SortExec: expr=[value@0 ASC NULLS LAST], preserve_partitioning=[true]                                             |
   > |               |       CoalesceBatchesExec: target_batch_size=8192                                                                     |
   > |               |         RepartitionExec: partitioning=Hash([value@0], 24), input_partitions=24                                        |
   > |               |           AggregateExec: mode=Partial, gby=[value@0 as value], aggr=[sum(tmp_table.value)], ordering_mode=Sorted      |
   > |               |             RepartitionExec: partitioning=RoundRobinBatch(24), input_partitions=1                                     |
   > |               |               SortExec: TopK(fetch=10), expr=[value@0 ASC NULLS LAST], preserve_partitioning=[false]                  |
   > |               |                 LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=10000, batch_size=8192] |
   > |               |                                                                                                                       |
   > +---------------+-----------------------------------------------------------------------------------------------------------------------+
   > 2 row(s) fetched. 
   > Elapsed 0.002 seconds.
   > ```
   
   Yes, I think adding `sort` is one of the issues to be fixed.
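
For readers unfamiliar with why input ordering matters here, the following is a minimal Python sketch of the general idea behind a sorted (streaming) group aggregation compared with a hash aggregation. It is not DataFusion's implementation; the function names `sorted_group_sum` and `hash_group_sum` are hypothetical and exist only for illustration.

```python
def sorted_group_sum(rows):
    """Streaming sum per key. Assumes `rows` is already sorted by key.

    Hypothetical helper for illustration only. Because equal keys are
    adjacent, a group can be emitted as soon as the key changes, so only
    one running total is kept in memory.
    """
    current_key, total, started = None, 0, False
    for key, value in rows:
        if started and key != current_key:
            yield current_key, total  # previous group is complete
            total = 0
        current_key, started = key, True
        total += value
    if started:
        yield current_key, total  # flush the last group


def hash_group_sum(rows):
    """Hash-based sum per key. Works for any input order, but must hold
    every group until the input is exhausted."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


# Mirrors `select sum(a) ... group by a`: key and value are the same column.
sorted_rows = [(1, 1), (1, 1), (2, 2), (3, 3), (3, 3)]
print(list(sorted_group_sum(sorted_rows)))  # [(1, 2), (2, 2), (3, 6)]
print(hash_group_sum(sorted_rows))          # [(1, 2), (2, 2), (3, 6)]

# On unsorted input the streaming version splits a group, which is why
# the aggregate's input has to be sorted on the GROUP BY key.
print(list(sorted_group_sum([(1, 1), (2, 2), (1, 1)])))  # [(1, 1), (2, 2), (1, 1)]
```

The last call shows the failure mode: without a guaranteed ordering on the group key, the streaming variant would report the same key twice.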


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
