On Mon, Sep 29, 2014 at 11:48 AM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote:
> On 09/15/2014 06:28 PM, Alexander Korotkov wrote: > >> Hackers, >> >> some GIN opclasses uses collation-aware comparisons while they don't need >> to do especially collation-aware comparison. Examples are text[] and >> hstore >> opclasses. >> > > Hmm. It would be nice to use the index for inequality searches, at least > on text[]. We don't support that currently, but it would require > collation-awareness. > > Depending on collation this may make them a much slower. >> >> See example. >> >> # show lc_collate ; >> lc_collate >> ───────────── >> ru_RU.UTF-8 >> (1 row) >> >> # create table test as (select array_agg(i::text) from >> generate_series(1,1000000) i group by (i-1)/10); >> SELECT 100000 >> >> # create index test_idx on test using gin(array_agg); >> CREATE INDEX >> Time: *26930,423 ms* >> >> # create index test_idx2 on test using gin(array_agg collate "C"); >> CREATE INDEX >> Time: *5143,682 ms* >> >> Index creation with collation "ru_RU.UTF-8" is 5 times slower while >> collation has absolutely no effect on index functionality. >> > > It occurs to me that practically all of those comparisons happen when we > populate the red-black Tree, during the index build. The purpose of the > red-black tree is to collect identical keys together, but there is actually > no requirement that the order of the red-black tree matches the order of > the index. It also isn't strictly required that it recognizes equal keys as > equal. The only requirement is that it doesn't incorrectly put two keys > that are equal according to the compare-function, into two different nodes. > > Good point, Heikki. I experienced several times this problem, fixed it with C-locale and forgot again. Now, it's time to fix ! > We could therefore use plain memcmp() to compare the Datums while building > the red-black tree. Keys that are bit-wise equal are surely considered as > equal by the compare-function. That makes the index build a lot faster. > With the attached quick patch: > > postgres=# create index test_idx on test using gin(array_agg ); > CREATE INDEX > Time: 880.620 ms > > This is on my laptop. Without the patch, that takes about 4.7 seconds with > the C locale, so this is much faster than even using the C locale. > Hmm, on my MBA I got 17277.734 (patch) vs 39151.562 for ru_RU.UTF-8 and 6131.929 (patch) vs 6131.929 for C Not much :( > > - Heikki > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >