The following bug has been logged online:

Bug reference:      3875
Logged by:          Sokolov Yura
Email address:      [EMAIL PROTECTED]
PostgreSQL version: 8.3RC1 8.2.x
Operating system:   Debian 4.0
Description:        Building GIN index on full table could be slower than
filling table with existing index.
Details: 

Here is pathological case:

    \timing
    drop table if exists test_gin;
    
    create table test_gin (
      id serial primary key,
      ar int4[]
    );
    
    create index test_gin_ix_ar on test_gin
    using gin
    ( ar );
    -- And here is pathological pattern
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 20000) as i;
    
    drop index test_gin_ix_ar;
    
    create index test_gin_ix_ar on test_gin
    using gin
    ( ar );

Timing for this case:

    Timing is on.
    DROP TABLE
    Time: 19,421 ms
    CREATE TABLE
    Time: 13,531 ms
    CREATE INDEX
    Time: 3,352 ms
    INSERT 0 20000     !
    Time: 1064,444 ms  !!
    DROP INDEX
    Time: 30,925 ms
    CREATE INDEX       !
    Time: 28180,484 ms !!!!

So that, creating index on full table is 26.5x slower than filling table
with existing index.

After applying a patch below timings are:

    .....
    INSERT 0 20000
    Time: 1036,746 ms
    .....
    CREATE INDEX
    Time: 220,073 ms

For other tested pattern patched timing usually 
slightly better (5-10%) and in rare case slightly 
worse(5-10%). Other tested patterns are:

    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 100) as i,
         generate_series(1, 200) as j;
    
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 200) as j,
         generate_series(1, 100) as i;
    
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 200) as j,
         generate_series(1, 1000) as i;
    
    insert into test_gin (ar)
    select (select int_array_aggregate(k) from generate_series(i, i+200) as
k)
    from generate_series(1, 200) as j,
         generate_series(1, 100) as i;
    
Patch:

diff -pr postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c
postgresql-8.3RC1/src/backend/access/gin/ginbulk.c
*** postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c       2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/ginbulk.c  2008-01-14
16:00:48.000000000 +0300
*************** ginInitBA(BuildAccumulator *accum)
*** 28,33 ****
--- 28,34 ----
        accum->maxdepth = 1;
        accum->stackpos = 0;
        accum->entries = NULL;
+       accum->nentries = 0;
        accum->stack = NULL;
        accum->allocatedMemory = 0;
        accum->entryallocator = NULL;
*************** EAAllocate(BuildAccumulator *accum)
*** 44,49 ****
--- 45,51 ----
        }
  
        accum->length++;
+       accum->nentries++;
        return accum->entryallocator + accum->length - 1;
  }
  
diff -pr postgresql-8.3RC1old/src/backend/access/gin/gininsert.c
postgresql-8.3RC1/src/backend/access/gin/gininsert.c
*** postgresql-8.3RC1old/src/backend/access/gin/gininsert.c     2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/gininsert.c        2008-01-14
16:07:21.000000000 +0300
*************** ginBuildCallback(Relation index, HeapTup
*** 238,244 ****
        buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);
  
        /* If we've maxed out our available memory, dump everything to the index
*/
!       if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
        {
                ItemPointerData *list;
                Datum           entry;
--- 238,246 ----
        buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);
  
        /* If we've maxed out our available memory, dump everything to the index
*/
!       if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L
!           || (buildstate->accum.maxdepth > 16 && 
!               buildstate->accum.nentries < (1 << (buildstate->accum.maxdepth /
2))))
        {
                ItemPointerData *list;
                Datum           entry;
diff -pr postgresql-8.3RC1old/src/include/access/gin.h
postgresql-8.3RC1/src/include/access/gin.h
*** postgresql-8.3RC1old/src/include/access/gin.h       2008-01-01
22:45:56.000000000 +0300
--- postgresql-8.3RC1/src/include/access/gin.h  2008-01-14 15:59:51.000000000
+0300
*************** typedef struct
*** 450,455 ****
--- 450,456 ----
  {
        GinState   *ginstate;
        EntryAccumulator *entries;
+       uint32          nentries;
        uint32          maxdepth;
        EntryAccumulator **stack;
        uint32          stackpos;

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Reply via email to