Hi,

here are some benchmark results for GiST patch: https://gist.github.com/mngr777/88ae200c9c30ba5656583d92e8d2cf9e

Code used for benchmarking: https://github.com/mngr777/pg_index_bm2,
see README for the list of test queries.

Results show query performance close to indexes built with no sortsupport function, with index creation time reduced approx. by half.

On 1/8/22 10:20 PM, Aliaksandr Kalenik wrote:
After further testing, here is v2 where the issue that rightlink can be set when an index page is already flushed is fixed.

On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4...@yandex-team.ru <mailto:x4...@yandex-team.ru>> wrote:


    Hi Aliaksandr!

    Thanks for working on this!

     > Benchmark summary:
     >
     > create index roads_rdr_idx on roads_rdr using gist (geom);
     >
     > with sort support before patch / CREATE INDEX 76.709 ms
     >
     > with sort support after patch / CREATE INDEX 225.238 ms
     >
     > without sort support / CREATE INDEX 446.071 ms
     >
     > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
     >
     > with sort support before patch / SELECT 5766.526 ms
     >
     > with sort support after patch / SELECT 2646.554 ms
     >
     > without sort support / SELECT 2721.718 ms
     >
     > index size
     >
     > with sort support before patch / IDXSIZE 2940928 bytes
     >
     > with sort support after patch / IDXSIZE 4956160 bytes
     >
     > without sort support / IDXSIZE 5447680 bytes

    The numbers are impressive, newly build index is actually performing
    better!
    I've conducted some tests over points, not PostGIS geometry. For
    points build is 2x slower now, but this is the cost of faster scans.

    Some strong points of this index building technology.
    The proposed algorithm is not randomly chosen as anything that
    performs better than the original sorting build. We actually
    researched every idea we knew from literature and intuition.
    Although we consciously limited the search area by existing GiST API.
    Stuff like penalty-based choose-subtree and split in equal halves
    seriously limit possible solutions. If anyone knows an any better
    way to build GiST faster or with better scan performance - please
    let us know.
    The proposed algorithm contains the current algorithm as a special
    case: there is a parameter - the number of buffers accumulated
    before calling Split. If this parameter is 1 proposed algorithm will
    produce exactly the same output.

    At this stage, we would like to hear some feedback from Postgres and
    PostGIS community. What other performance aspects should we test?

    Current patch implementation has some known deficiencies:
    1. We need a GUC instead of the hard-coded buffer of 8 pages.
    2. Is GiST sorting build still deterministic? If not - we should add
    a fixed random seed into pageinspect tests.
    3. It would make sense to check the resulting indexes with amcheck
    [0], although it's not committed.
    4. We cannot make an exact fillfactor due to the behavior of
    picksplit. But can we improve anything here? I think if not - it's
    still OK.
    5. GistSortedBuildPageState is no more page state. It's Level state
    or something like that.
    6. The patch desperately needs comments.


    Thanks!

    Best regards, Andrey Borodin.

    [0]
    
https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru
    
<https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru>



Reply via email to