Hi,

When I was writing an extension which need to get the median of an array, I
tried to find if postgres provide some api that can do that. I found all the
places in postgres invoke qsort() and then get the median. I was thinking can
we do better by using "quick select" and is it worth it.

Currently, there are some places[1] in the code that need the median and can
use "quick select" instead. And some of them(spg_box_quad_picksplit /
spg_range_quad_picksplit) are invoked frequently when INSERT or CREATE INDEX.
So, Peronally, It's acceptable to introduce a quick select api to improve these
places.

Since most of the logic of "quick select" is similar to quick sort, I think
we can reuse the code in sort_template.h. We only need to let the sort stop
when find the target top Kth element.

Attach a POC patch about this idea. I did some simple performance tests, I can
see about 10% performance gain in this test[2].

Thoughts ?

[1]
1.
entry_dealloc
        ...
        /* Record the (approximate) median usage */
        if (i > 0)
                pgss->cur_median_usage = entries[i / 2]->counters.usage;
2.
spg_box_quad_picksplit
        qsort(lowXs, in->nTuples, sizeof(float8), compareDoubles);
        ...
        centroid->low.x = lowXs[median];

3.
spg_range_quad_picksplit
        qsort(lowerBounds, nonEmptyCount, sizeof(RangeBound),
        ...
        centroid = range_serialize(typcache, &lowerBounds[median],

4.
spg_quad_picksplit
        qsort(sorted, in->nTuples, sizeof(*sorted), x_cmp);
        ...
        centroid->x = sorted[median]->x;



[2]
drop table quad_box_tbl;
CREATE unlogged TABLE quad_box_tbl (id int, b box);
truncate quad_box_tbl ;
explain (verbose, analyze)INSERT INTO quad_box_tbl
  SELECT (x - 1) * 10 + x, box(point(x * 10, x * 20), point(x * 10, x * 20 + 5))
  FROM generate_series(1, 1000000) x order by random();

-----test create index
drop index quad_box_tbl_idx;
CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);

-------test results
PATCH:
Time: 2609.664 ms (00:02.610)

HEAD:
Time: 2903.765 ms (00:02.944)

Best regards,
Houzj

Attachment: 0001-test-use-quick-select-to-get-median.patch
Description: 0001-test-use-quick-select-to-get-median.patch

Reply via email to