asubiotto commented on issue #14991: URL: https://github.com/apache/datafusion/issues/14991#issuecomment-2720197770
Nowhere. What I mean is that the `AggregateExec` **should** have
`ordering_mode=Sorted` as you share in your previous comment but it **does
not** (and I would expect it to). Apologies if this is off-topic for the issue
but since you were looking for examples where the sorted properties of the
group columns are somehow lost I thought it would be good to share a practical
example (I was searching for ordered aggregation support and came across this
issue).
Specifically `generated_id` is in sorted order and since it is the only
group column for the aggregation I would expect `ordering_mode=Sorted` in the
`AggregateExec`. For example, if I change the values/agg function and remove
the unnest, this ordering is preserved:
```
EXPLAIN WITH unnested AS (SELECT
ROW_NUMBER() OVER () AS generated_id,
value as ar
FROM range(1,5)) SELECT generated_id, sum(ar) FROM unnested group by
generated_id;
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Aggregate: groupBy=[[unnested.generated_id]],
aggr=[[sum(unnested.ar)]]
|
| | SubqueryAlias: unnested
|
| | Projection: row_number() ROWS BETWEEN UNBOUNDED
PRECEDING AND UNBOUNDED FOLLOWING AS generated_id, tmp_table.value AS ar
|
| | WindowAggr: windowExpr=[[row_number() ROWS BETWEEN
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
|
| | TableScan: tmp_table projection=[value]
|
| physical_plan | AggregateExec: mode=FinalPartitioned, gby=[generated_id@0
as generated_id], aggr=[sum(unnested.ar)], ordering_mode=Sorted
|
| | SortExec: expr=[generated_id@0 ASC NULLS LAST],
preserve_partitioning=[true]
|
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec: partitioning=Hash([generated_id@0],
12), input_partitions=12
|
| | AggregateExec: mode=Partial, gby=[generated_id@0
as generated_id], aggr=[sum(unnested.ar)], ordering_mode=Sorted
|
| | RepartitionExec:
partitioning=RoundRobinBatch(12), input_partitions=1
|
| | ProjectionExec: expr=[row_number() ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@1 as generated_id, value@0
as ar]
|
| | BoundedWindowAggExec: wdw=[row_number() ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Ok(Field { name:
"row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING",
data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }), frame: WindowFrame { units: Rows, start_bound:
Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false
}], mode=[Sorted] |
| | LazyMemoryExec: partitions=1,
batch_generators=[range: start=1, end=5, batch_size=8192]
|
| |
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.009 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
