Hi, When I was writing an extension which need to get the median of an array, I tried to find if postgres provide some api that can do that. I found all the places in postgres invoke qsort() and then get the median. I was thinking can we do better by using "quick select" and is it worth it.
Currently, there are some places[1] in the code that need the median and can use "quick select" instead. And some of them(spg_box_quad_picksplit / spg_range_quad_picksplit) are invoked frequently when INSERT or CREATE INDEX. So, Peronally, It's acceptable to introduce a quick select api to improve these places. Since most of the logic of "quick select" is similar to quick sort, I think we can reuse the code in sort_template.h. We only need to let the sort stop when find the target top Kth element. Attach a POC patch about this idea. I did some simple performance tests, I can see about 10% performance gain in this test[2]. Thoughts ? [1] 1. entry_dealloc ... /* Record the (approximate) median usage */ if (i > 0) pgss->cur_median_usage = entries[i / 2]->counters.usage; 2. spg_box_quad_picksplit qsort(lowXs, in->nTuples, sizeof(float8), compareDoubles); ... centroid->low.x = lowXs[median]; 3. spg_range_quad_picksplit qsort(lowerBounds, nonEmptyCount, sizeof(RangeBound), ... centroid = range_serialize(typcache, &lowerBounds[median], 4. spg_quad_picksplit qsort(sorted, in->nTuples, sizeof(*sorted), x_cmp); ... centroid->x = sorted[median]->x; [2] drop table quad_box_tbl; CREATE unlogged TABLE quad_box_tbl (id int, b box); truncate quad_box_tbl ; explain (verbose, analyze)INSERT INTO quad_box_tbl SELECT (x - 1) * 10 + x, box(point(x * 10, x * 20), point(x * 10, x * 20 + 5)) FROM generate_series(1, 1000000) x order by random(); -----test create index drop index quad_box_tbl_idx; CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b); -------test results PATCH: Time: 2609.664 ms (00:02.610) HEAD: Time: 2903.765 ms (00:02.944) Best regards, Houzj
0001-test-use-quick-select-to-get-median.patch
Description: 0001-test-use-quick-select-to-get-median.patch