Around v.161 of src/storage/buffer/bufmgr.c, during the development of 8.0, a change was introduced to prevent VACUUM from changing the state of the Adaptive Replacement Cache buffer management strategy. At the time that change made a lot of sense. Since then we have changed the buffer management strategy, and this behaviour of VACUUM may no longer make as much sense as it did then.
VACUUM's current behaviour is to take blocks it has touched and place them at the head of the freelist, allowing them to be reused. This is a good strategy for clean blocks, but a poor one for dirty blocks. Once a dirty block has been placed on the freelist, the very next request for a free buffer will need to write the block to disk, *and* this will typically require a WAL flush as well. The WAL flushing behaviour has been described in detail on this thread:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00674.php
though this proposal has nothing to do with FREEZEing rows.

The effects of this behaviour are as follows. When VACUUM is running alone, it has to make more WAL flushes than it really needs to, so it is slightly slower. That could be improved, but it isn't my priority in this post.

When VACUUM operates alongside a concurrent workload, the other non-VACUUM backends become involved in cleaning the VACUUM's dirty blocks. This slows the non-VACUUM backends down and effectively favours the VACUUM rather than masking its effects, which is what we were trying to achieve. This behaviour noticeably increases normal transaction response time for extended periods, with noticeable WAL spikes as the WAL drive repeatedly fsyncs, much more than without the VACUUM workload.

The proposal is to stop VACUUM from putting its blocks onto the freelist if they are dirty. This allows the bgwriter to write the VACUUM's dirty blocks, which avoids the increased response times due to WAL flushing. It also incidentally improves a lone VACUUM, since the bgwriter is able to help write out the dirty blocks. VACUUM pays the cost of testing whether they are dirty, but it's minor anyway.

The clock cycle buffer management strategy is less prone to cache spoiling behaviour than were the earlier LRU methods, fixed or adaptive. A simple solution does effectively smooth out the poor response times seen while a VACUUM is in progress.
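To make the difference between the current rule and the proposed rule concrete, here is a minimal standalone sketch. It is not PostgreSQL code: ToyBufferDesc, TOY_BM_DIRTY and all toy_* functions are hypothetical names invented for illustration, and the flag value is an assumption. It only models the decision "should VACUUM hand this buffer straight to the freelist?" and counts how many dirty buffers each rule would leave for the next backend to write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; not the real BM_DIRTY value. */
#define TOY_BM_DIRTY (1u << 0)

/* Toy stand-in for a buffer descriptor; only the fields the rule reads. */
typedef struct ToyBufferDesc
{
    unsigned flags;        /* TOY_BM_DIRTY set if the block needs writing */
    unsigned refcount;     /* number of pins currently held */
    unsigned usage_count;  /* clock-sweep usage counter */
} ToyBufferDesc;

/* Current rule: free any unpinned, unused buffer, dirty or not. */
static bool
toy_free_immediately_old(const ToyBufferDesc *buf)
{
    return buf->refcount == 0 && buf->usage_count == 0;
}

/* Proposed rule: as above, but never put a dirty buffer on the freelist. */
static bool
toy_free_immediately_new(const ToyBufferDesc *buf)
{
    return buf->refcount == 0 && buf->usage_count == 0 &&
           !(buf->flags & TOY_BM_DIRTY);
}

/*
 * Count the dirty buffers a rule would place on the freelist. In the
 * scenario described above, each of these forces the next backend that
 * takes it to write the block, typically with a WAL flush.
 */
static size_t
toy_count_dirty_freed(const ToyBufferDesc *bufs, size_t n,
                      bool (*rule)(const ToyBufferDesc *))
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (rule(&bufs[i]) && (bufs[i].flags & TOY_BM_DIRTY))
            count++;
    }
    return count;
}

/* Small self-check over a hand-built set of buffers. */
static void
toy_self_check(void)
{
    const ToyBufferDesc bufs[] = {
        {0, 0, 0},             /* clean, unpinned, unused */
        {TOY_BM_DIRTY, 0, 0},  /* dirty, unpinned, unused */
        {TOY_BM_DIRTY, 0, 2},  /* dirty but recently used */
    };

    assert(toy_count_dirty_freed(bufs, 3, toy_free_immediately_old) == 1);
    assert(toy_count_dirty_freed(bufs, 3, toy_free_immediately_new) == 0);
}
```

Under the proposed rule the dirty buffers simply stay in the pool, where (as argued above) the bgwriter has a chance to write them before a foreground backend needs the slot. The sketch says nothing about timing or throughput; that still needs the measurements requested in this post.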
The in-line patch is a one-line change to the buffer manager code, and is one of a few versions experimented with. The additional line is a simple test to see whether the VACUUM'd block is dirty before deciding what to do with it.
[A separate patch is available, if requested, identified as vacstrategy.v2.patch]

Independent verification of test results is requested.

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.215
diff -c -r1.215 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	1 Feb 2007 19:10:27 -0000	1.215
--- src/backend/storage/buffer/bufmgr.c	26 Feb 2007 13:09:35 -0000
***************
*** 907,913 ****
  			else
  			{
  				/* VACUUM accesses don't bump usage count, instead... */
! 				if (buf->refcount == 0 && buf->usage_count == 0)
  					immed_free_buffer = true;
  			}
  		}
--- 907,914 ----
  			else
  			{
  				/* VACUUM accesses don't bump usage count, instead... */
! 				if (buf->refcount == 0 && buf->usage_count == 0 &&
! 					!(buf->flags & BM_DIRTY))
  					immed_free_buffer = true;
  			}
  		}

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com