Re: [PATCH] Incremental sort (was: PoC: Partial sort)

James Coleman Sat, 20 Jul 2019 09:13:17 -0700

On Sat, Jul 20, 2019 at 11:25 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
...
> >My current line of investigation is whether we need to do anything in
> >the parallel portion of create_ordered_paths(). I noticed that the
> >first-pass patch adding generate_useful_gather_paths() modified that
> >section but wasn't actually adding any new gather-merge paths (just
> >bare incremental sort paths). That seems pretty clearly just a
> >prototype miss, so I modified the prototype to build gather-merge
> >paths instead (as a side note that change seems to fix an oddity I was
> >seeing where plans would include a parallel index scan node even
> >though they weren't parallel plans). While the resulting plan for
> >something like:
> >
>
> Yes, that seems to be a bug. The block above it clealy has a gather
> merge nodes, so this one should too.
>
> >explain analyze select * from t where t.a in (1,2,3,4,5,6) order by
> >t.a, t.b limit 50;
> >
> >changes cost (to be cheaper) ever so slightly with the gather-merge
> >addition to create_ordered_paths(), the plan itself is otherwise
> >identical (including row estimates):
> >
> >Limit
> >  -> Gather Merge
> >       -> Incremental Sort
> >          -> Parallel Index Scan
> >
> >(Note: I'm forcing parallel plans here with: set
> >max_parallel_workers_per_gather=4; set min_parallel_table_scan_size=0;
> >set parallel_tuple_cost=0; set parallel_setup_cost=0; set
> >min_parallel_index_scan_size=0;)
> >
> >I can't seem to come up with a case where adding these gather-merge
> >paths in create_ordered_paths() isn't entirely duplicative of paths
> >already created by generate_useful_gather_paths() as called from
> >apply_scanjoin_target_to_paths() -- which I _think_ makes sense given
> >that both apply_scanjoin_target_to_paths() and create_ordered_paths()
> >are called by grouping_planner().
> >
> >Can you think of a case I'm missing here that would make it valuable
> >to generate new parallel plans in create_ordered_paths()?
> >
>
> Good question. Not sure. I think such path would need to do something on
> a relation that is neither a join nor a scan - in which case the path
> should not be created by apply_scanjoin_target_to_paths().
>
> So for example a query like this:
>
>   SELECT
>       a, b, sum(expensive_function(c))
>   FROM
>       t
>   GROUP BY a,b
>   ORDER BY a,sum(...) LIMIT 10;
>
> should be able to produce a plan like this:
>
>   -> Limit
>       -> Gather Merge
>         -> Incremental Sort (pathkeys: a, sum)
>           -> Group Aggregate
>              a, b, sum(expensive_function(c))
>             -> Index Scan (pathkeys: a, b)
>
> or something like that, maybe. I haven't tried this, though. The other
> question is whether such queries are useful in practice ...


Hmm, when I step through on that example input_rel->partial_pathlist
!= NIL is false, so we don't even attempt to consider any extra
parallel paths in create_ordered_paths(). Nonetheless we get a
parallel plan, but with a different shape:

 Limit
   ->  Incremental Sort
         Sort Key: a, (sum(expensive_function(c)))
         Presorted Key: a
         ->  Finalize GroupAggregate
               Group Key: a, b
               ->  Gather Merge
                     Workers Planned: 4
                     ->  Partial GroupAggregate
                           Group Key: a, b
                           ->  Sort
                                 Sort Key: a, b
                                 ->  Parallel Seq Scan on t

(or if I disable seqscan then the sort+seq scan above becomes inc sort
+ index scan)

To be honest, I don't think I understand how you would get a plan like
that anyway since the index here only has "a" and so can't provide
(pathkeys: a, b).

Perhaps there's something I'm still missing though.

James Coleman

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Reply via email to