After further testing, here is v2 where the issue that rightlink can be set
when an index page is already flushed is fixed.

On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4...@yandex-team.ru> wrote:

>
> Hi Aliaksandr!
>
> Thanks for working on this!
>
> > Benchmark summary:
> >
> > create index roads_rdr_idx on roads_rdr using gist (geom);
> >
> > with sort support before patch / CREATE INDEX 76.709 ms
> >
> > with sort support after patch / CREATE INDEX 225.238 ms
> >
> > without sort support / CREATE INDEX 446.071 ms
> >
> > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
> >
> > with sort support before patch / SELECT 5766.526 ms
> >
> > with sort support after patch / SELECT 2646.554 ms
> >
> > without sort support / SELECT 2721.718 ms
> >
> > index size
> >
> > with sort support before patch / IDXSIZE 2940928 bytes
> >
> > with sort support after patch / IDXSIZE 4956160 bytes
> >
> > without sort support / IDXSIZE 5447680 bytes
>
> The numbers are impressive, newly build index is actually performing
> better!
> I've conducted some tests over points, not PostGIS geometry. For points
> build is 2x slower now, but this is the cost of faster scans.
>
> Some strong points of this index building technology.
> The proposed algorithm is not randomly chosen as anything that performs
> better than the original sorting build. We actually researched every idea
> we knew from literature and intuition. Although we consciously limited the
> search area by existing GiST API.
> Stuff like penalty-based choose-subtree and split in equal halves
> seriously limit possible solutions. If anyone knows an any better way to
> build GiST faster or with better scan performance - please let us know.
> The proposed algorithm contains the current algorithm as a special case:
> there is a parameter - the number of buffers accumulated before calling
> Split. If this parameter is 1 proposed algorithm will produce exactly the
> same output.
>
> At this stage, we would like to hear some feedback from Postgres and
> PostGIS community. What other performance aspects should we test?
>
> Current patch implementation has some known deficiencies:
> 1. We need a GUC instead of the hard-coded buffer of 8 pages.
> 2. Is GiST sorting build still deterministic? If not - we should add a
> fixed random seed into pageinspect tests.
> 3. It would make sense to check the resulting indexes with amcheck [0],
> although it's not committed.
> 4. We cannot make an exact fillfactor due to the behavior of picksplit.
> But can we improve anything here? I think if not - it's still OK.
> 5. GistSortedBuildPageState is no more page state. It's Level state or
> something like that.
> 6. The patch desperately needs comments.
>
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
> [0]
> https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru
>

Attachment: 02_reduce_page_overlap_of_gist_indexes_built_using_sorted_method.patch
Description: Binary data

Reply via email to