After further testing, here is v2 where the issue that rightlink can be set when an index page is already flushed is fixed.
On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4...@yandex-team.ru> wrote: > > Hi Aliaksandr! > > Thanks for working on this! > > > Benchmark summary: > > > > create index roads_rdr_idx on roads_rdr using gist (geom); > > > > with sort support before patch / CREATE INDEX 76.709 ms > > > > with sort support after patch / CREATE INDEX 225.238 ms > > > > without sort support / CREATE INDEX 446.071 ms > > > > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom; > > > > with sort support before patch / SELECT 5766.526 ms > > > > with sort support after patch / SELECT 2646.554 ms > > > > without sort support / SELECT 2721.718 ms > > > > index size > > > > with sort support before patch / IDXSIZE 2940928 bytes > > > > with sort support after patch / IDXSIZE 4956160 bytes > > > > without sort support / IDXSIZE 5447680 bytes > > The numbers are impressive, newly build index is actually performing > better! > I've conducted some tests over points, not PostGIS geometry. For points > build is 2x slower now, but this is the cost of faster scans. > > Some strong points of this index building technology. > The proposed algorithm is not randomly chosen as anything that performs > better than the original sorting build. We actually researched every idea > we knew from literature and intuition. Although we consciously limited the > search area by existing GiST API. > Stuff like penalty-based choose-subtree and split in equal halves > seriously limit possible solutions. If anyone knows an any better way to > build GiST faster or with better scan performance - please let us know. > The proposed algorithm contains the current algorithm as a special case: > there is a parameter - the number of buffers accumulated before calling > Split. If this parameter is 1 proposed algorithm will produce exactly the > same output. > > At this stage, we would like to hear some feedback from Postgres and > PostGIS community. What other performance aspects should we test? > > Current patch implementation has some known deficiencies: > 1. We need a GUC instead of the hard-coded buffer of 8 pages. > 2. Is GiST sorting build still deterministic? If not - we should add a > fixed random seed into pageinspect tests. > 3. It would make sense to check the resulting indexes with amcheck [0], > although it's not committed. > 4. We cannot make an exact fillfactor due to the behavior of picksplit. > But can we improve anything here? I think if not - it's still OK. > 5. GistSortedBuildPageState is no more page state. It's Level state or > something like that. > 6. The patch desperately needs comments. > > > Thanks! > > Best regards, Andrey Borodin. > > [0] > https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru >
02_reduce_page_overlap_of_gist_indexes_built_using_sorted_method.patch
Description: Binary data