vegarsti commented on issue #15231: URL: https://github.com/apache/datafusion/issues/15231#issuecomment-3136047369
Plan output has changed a bit since these were posted. I get this when running the query above with `cargo run --bin datafusion-cli`. To confirm, is what we want for both the inner and the outer `AggregateExec` to have `ordering_mode=Sorted`? ``` > EXPLAIN WITH unnested AS (SELECT ROW_NUMBER() OVER () AS generated_id, unnest(array[value]) as ar FROM range(1,5)) SELECT array_agg(ar) FROM unnested group by generated_id; +---------------+-------------------------------+ | plan_type | plan | +---------------+-------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ ProjectionExec │ | | | │ -------------------- │ | | | │ array_agg(unnested.ar): │ | | | │ array_agg(unnested.ar) │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ AggregateExec │ | | | │ -------------------- │ | | | │ aggr: │ | | | │ array_agg(unnested.ar) │ | | | │ │ | | | │ group_by: │ | | | │ generated_id │ | | | │ │ | | | │ mode: │ | | | │ FinalPartitioned │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ CoalesceBatchesExec │ | | | │ -------------------- │ | | | │ target_batch_size: │ | | | │ 8192 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ RepartitionExec │ | | | │ -------------------- │ | | | │ partition_count(in->out): │ | | | │ 11 -> 11 │ | | | │ │ | | | │ partitioning_scheme: │ | | | │ Hash([generated_id@0], 11)│ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ AggregateExec │ | | | │ -------------------- │ | | | │ aggr: │ | | | │ array_agg(unnested.ar) │ | | | │ │ | | | │ group_by: │ | | | │ generated_id │ | | | │ │ | | | │ mode: Partial │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ ProjectionExec │ | | | │ -------------------- │ | | | │ ar: │ | | | │ __unnest_placeholder │ | | | │ (make_array(range() │ | | | │ .value),depth=1) │ | | | │ │ | | | │ generated_id: │ | | | │ generated_id │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ UnnestExec │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ ProjectionExec │ | | | │ -------------------- │ | | | │ __unnest_placeholder │ | | | │ (make_array(range() │ | | | │ .value)): │ | | | │ make_array(value) │ | | | │ │ | | | │ generated_id: │ | | | │ row_number() ROWS BETWEEN │ | | | │ UNBOUNDED PRECEDING AND │ | | | │ UNBOUNDED FOLLOWING │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ RepartitionExec │ | | | │ -------------------- │ | | | │ partition_count(in->out): │ | | | │ 1 -> 11 │ | | | │ │ | | | │ partitioning_scheme: │ | | | │ RoundRobinBatch(11) │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ BoundedWindowAggExec │ | | | │ -------------------- │ | | | │ mode: Sorted │ | | | │ │ | | | │ select_list: │ | | | │ row_number() ROWS BETWEEN │ | | | │ UNBOUNDED PRECEDING AND │ | | | │ UNBOUNDED FOLLOWING │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ LazyMemoryExec │ | | | │ -------------------- │ | | | │ batch_generators: │ | | | │ range: start=1, end=5, │ | | | │ batch_size=8192 │ | | | └───────────────────────────┘ | | | | +---------------+-------------------------------+ 1 row(s) fetched. Elapsed 0.025 seconds. ``` FWIW the other example from @asubiotto which had `ordering_mode=Sorted` in both `AggregateExec`s at that time, no longer do: ``` > EXPLAIN WITH unnested AS (SELECT ROW_NUMBER() OVER () AS generated_id, value as ar FROM range(1,5)) SELECT generated_id, sum(ar) FROM unnested group by generated_id; +---------------+-------------------------------+ | plan_type | plan | +---------------+-------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ AggregateExec │ | | | │ -------------------- │ | | | │ aggr: │ | | | │ sum(unnested.ar) │ | | | │ │ | | | │ group_by: │ | | | │ generated_id │ | | | │ │ | | | │ mode: │ | | | │ FinalPartitioned │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ SortExec │ | | | │ -------------------- │ | | | │ generated_id@0 ASC NULLS │ | | | │ LAST │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ CoalesceBatchesExec │ | | | │ -------------------- │ | | | │ target_batch_size: │ | | | │ 8192 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ RepartitionExec │ | | | │ -------------------- │ | | | │ partition_count(in->out): │ | | | │ 11 -> 11 │ | | | │ │ | | | │ partitioning_scheme: │ | | | │ Hash([generated_id@0], 11)│ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ AggregateExec │ | | | │ -------------------- │ | | | │ aggr: │ | | | │ sum(unnested.ar) │ | | | │ │ | | | │ group_by: │ | | | │ generated_id │ | | | │ │ | | | │ mode: Partial │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ RepartitionExec │ | | | │ -------------------- │ | | | │ partition_count(in->out): │ | | | │ 1 -> 11 │ | | | │ │ | | | │ partitioning_scheme: │ | | | │ RoundRobinBatch(11) │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ ProjectionExec │ | | | │ -------------------- │ | | | │ ar: value │ | | | │ │ | | | │ generated_id: │ | | | │ row_number() ROWS BETWEEN │ | | | │ UNBOUNDED PRECEDING AND │ | | | │ UNBOUNDED FOLLOWING │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ BoundedWindowAggExec │ | | | │ -------------------- │ | | | │ mode: Sorted │ | | | │ │ | | | │ select_list: │ | | | │ row_number() ROWS BETWEEN │ | | | │ UNBOUNDED PRECEDING AND │ | | | │ UNBOUNDED FOLLOWING │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ LazyMemoryExec │ | | | │ -------------------- │ | | | │ batch_generators: │ | | | │ range: start=1, end=5, │ | | | │ batch_size=8192 │ | | | └───────────────────────────┘ | | | | +---------------+-------------------------------+ 1 row(s) fetched. Elapsed 0.022 seconds. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org