On 01/22/2014 03:39 AM, Tomas Vondra wrote:
What annoys me a bit is the huge size difference between the index
updated incrementally (by a sequence of INSERT commands), and the index
rebuilt from scratch using VACUUM FULL. It's a bit better with the patch
(2288 vs. 2035 MB), but is there a chance to improve this?
Hmm. What seems to be happening is that pending item list pages that the
fast update mechanism uses are not getting recycled. When enough list
pages are filled up, they are flushed into the main index and the list
pages are marked as deleted. But they are not recorded in the FSM, so
they won't be recycled until the index is vacuumed. Almost all of the
difference can be attributed to deleted pages left behind like that.
So this isn't actually related to the packed postinglists patch at all.
It just makes the bloat more obvious, because it makes the actual size
of the index size, excluding deleted pages, smaller. But it can be
observed on git master as well:
I created a simple test table and index like this:
create table foo (intarr int[]);
create index i_foo on foo using gin(intarr) with (fastupdate=on);
I filled the table like this:
insert into foo select array[-1] from generate_series(1, 10000000) g;
postgres=# \d+i
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+--------+--------+-------------
public | foo | table | heikki | 575 MB |
(1 row)
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------+-------+--------+-------+--------+-------------
public | i_foo | index | heikki | foo | 251 MB |
(1 row)
I wrote a little utility that scans all pages in a gin index, and prints
out the flags indicating what kind of a page it is. The distribution
looks like this:
19 DATA
7420 DATA LEAF
24701 DELETED
1 LEAF
1 META
I think we need to add the deleted pages to the FSM more aggressively.
I tried simply adding calls to RecordFreeIndexPage, after the list pages
have been marked as deleted, but unfortunately that didn't help. The
problem is that the FSM is organized into a three-level tree, and
RecordFreeIndexPage only updates the bottom level. The upper levels are
not updated until the FSM is vacuumed, so the pages are still not
visible to GetFreeIndexPage calls until next vacuum. The simplest fix
would be to add a call to IndexFreeSpaceMapVacuum after flushing the
pending list, per attached patch. I'm slightly worried about the
performance impact of the IndexFreeSpaceMapVacuum() call. It scans the
whole FSM of the index, which isn't exactly free. So perhaps we should
teach RecordFreeIndexPage to update the upper levels of the FSM in a
retail-fashion instead.
- Heikki
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 21,26 ****
--- 21,27 ----
#include "access/gin_private.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
+ #include "storage/indexfsm.h"
#include "utils/memutils.h"
#include "utils/rel.h"
***************
*** 434,440 **** ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
--- 435,453 ----
END_CRIT_SECTION();
if (needCleanup)
+ {
ginInsertCleanup(ginstate, false, NULL);
+
+ /*
+ * Vacuum the FSM to make the deleted list pages available for re-use.
+ *
+ * gininsertCleanup marked them as free in the FSM by calling
+ * RecordFreeIndexPage, but that only updates the bottom FSM level.
+ * Since GetFreeIndexPage scans from the top, they won't be visible
+ * for reuse until the upper levels have been updated.
+ */
+ IndexFreeSpaceMapVacuum(index);
+ }
}
/*
***************
*** 599,608 **** shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
}
}
for (i = 0; i < data.ndeleted; i++)
UnlockReleaseBuffer(buffers[i]);
!
! END_CRIT_SECTION();
} while (blknoToDelete != newHead);
return false;
--- 612,625 ----
}
}
+ END_CRIT_SECTION();
+
for (i = 0; i < data.ndeleted; i++)
+ {
+ BlockNumber blkno = BufferGetBlockNumber(buffers[i]);
UnlockReleaseBuffer(buffers[i]);
! RecordFreeIndexPage(index, blkno);
! }
} while (blknoToDelete != newHead);
return false;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers