Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra Sat, 28 Mar 2020 20:15:00 -0700

On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote:

On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra
<[email protected]> wrote:


Hi,

Attached is my take on simplification of the useful pathkeyes thing. It
keeps the function, but it truncates query_pathkeys to only members with
EC members in the relation. I think that's essentially the optimization
you've proposed.


Thanks. I've included that in the patch series in this email (as a
separate patch) with a few additional comments.


Thanks.

I've also noticed that the enabled_incrementalsort check in
generate_useful_gather_paths seemed broken, because it returned us out
of the function before creating either a plain gather merge (if
already sorted) or an explicit sort path. I've included a patch that
moves it to the if block that actually builds the incremental sort
path.


Hmmm, that's probably right.

The reason why the GUC check was right after generate_gather_paths is
that the intent to disable all the useful-pathkeys business, essentially
reducing it back to plain generate_gather_paths.

But I think you're right that's wrong, because it might lead to strange
behavior when the GUC switches between plans without any incremental
sort nodes - setting it to 'true' might end up picking GM on top of
plain Sort, for example.


...

1) Missing worker identification (Worker #).


Fixed.

2) Missing method for workers (we have it for the leader, though).


Fixed. Since we can't have pointers in the parallel shared memory
space, we can't store the sort methods used in a list. To accomplish
the same goal, I've assigned the TuplesortMethod enum entries uique
bit positions, and store the methods used in a bitmask.


OK, makes sense.

3) I'm not sure why the lable is "Methods" instead of "Sort Method", and
why it's in parenthesis.


I've removed the parentheses. It's labeled "Methods" since there can
be more than one (different batches could use different methods). I've
updated this to properly use singular/plural depending on the number
of methods used.


I'm a bit confused. How do I know which batch used which method? Is it
actually worth printing in explain analyze? Maybe only print it in the
verbose mode?

4) Not sure having two lines for each worker is a great idea.


I've left these in for now because the lines are already very long
(much, much longer than the worker lines in a standard sort node).
This is largely because we're trying to summarize many sort batches,
while standard sort nodes only have to give the exact stats from a
single batch.

See the example output later in the email.

OK

5) I'd probably prefer having multiple labels for avg/max memory values,
instead of (avg) and (max) notes. Also, I think we use "peak" in this
context instead of "max".


Updated.

OK

Here's the current output:

Limit  (cost=1887419.20..1889547.68 rows=10000 width=8) (actual
time=13218.403..13222.519 rows=10000 loo
ps=1)
  ->  Gather Merge  (cost=1887419.20..19624748.03 rows=83333360
width=8) (actual time=13218.401..13229.7
50 rows=10000 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Incremental Sort  (cost=1886419.17..10005010.55
rows=41666680 width=8) (actual time=13208.00
4..13208.586 rows=4425 loops=3)
              Sort Key: a, b
              Presorted Key: a
              Full-sort Groups: 1 Sort Method: quicksort Memory:
avg=28kB peak=28kB
              Presorted Groups: 1 Sort Method: top-N heapsort Memory:
avg=1681kB peak=1681kB
              Worker 0:  Full-sort Groups: 1 Sort Method: quicksort
Memory: avg=28kB peak=28kB
                Presorted Groups: 1 Sort Method: top-N heapsort
Memory: avg=1680kB peak=1680kB
              Worker 1:  Full-sort Groups: 1 Sort Method: quicksort
Memory: avg=28kB peak=28kB
                Presorted Groups: 1 Sort Method: top-N heapsort
Memory: avg=1682kB peak=1682kB
              ->  Parallel Index Scan using index_s_a on s
(cost=0.57..4967182.06 rows=41666680 width=8
) (actual time=0.455..11730.878 rows=6666668 loops=3)

James


Looks reasonable. Did you try it in other output formats - json/yaml?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Reply via email to