Re: [HACKERS] WIP: Fast GiST index build

Heikki Linnakangas Thu, 01 Sep 2011 02:00:22 -0700

On 30.08.2011 13:38, Alexander Korotkov wrote:

On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:


Thanks. Meanwhile, I hacked together my own set of test scripts, and let
them run over the weekend. I'm still running tests with ordered data, but
here are some preliminary results:

           testname           |   nrows   |    duration     | accesses
-----------------------------+**-----------+-----------------+**----------
  points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
  points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
  points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered builds
take a lot *longer* than unbuffered ones. I was expecting the buffering to
be very helpful at least in these unordered tests. On the positive side, the
buffering made index quality somewhat better (accesses column, smaller is
better), but that's not what we're aiming at.

What's going on here? This data set was large enough to not fit in RAM, the
table was about 8.5 GB in size (and I think the index is even larger than
that), and the box has 4GB of RAM. Does the buffering only help with even
larger indexes that exceed the cache size even more?

This seems pretty strange for me. Time of unbuffered index build shows that
there is not bottleneck at IO. That radically differs from my
experiments. I'm going to try your test script on my test setup.
While I have only express assumption that random function appears to be
somewhat bad. Thereby unordered dataset behave like the ordered one. Can you
rerun tests on your test setup with dataset generation on the backend like
this?
CREATE TABLE points AS (SELECT point(random(), random() FROM
generate_series(1,10000000));


So I changed the test script to generate the table as:

CREATE TABLE points AS SELECT random() as x, random() as y FROMgenerate_series(1, $NROWS);


The unordered results are in:

          testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
 points unordered buffered   | 250000000 | 05:56:58.575789 |  2241050
 points unordered auto       | 250000000 | 05:34:12.187479 |  2246420
 points unordered unbuffered | 250000000 | 04:38:48.663952 |  2244228

Although the buffered build doesn't lose as badly as it did with moreoverlap, it still doesn't look good :-(. Any ideas?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Fast GiST index build

Reply via email to