On Thu, 2008-10-23 at 09:09 +0300, Heikki Linnakangas wrote:

> However, we require that in b-tree vacuum, you take a cleanup lock on
> *every* leaf page of the index, not only those that you modify. That's a
> problem, because there's no trace of such pages in the WAL.
OK, good. Thanks for the second opinion. I'm glad you said that, because I
felt sure anybody reading the patch would say "what the hell does this bit
do?". Now I can add it.

My solution is fairly simple:

As we pass through the table we keep track of which blocks need visiting,
then append that information onto the next WAL record. If the last block
doesn't contain removed rows, then we send a no-op message saying which
blocks to visit.

I'd already invented the XLOG_BTREE_VACUUM record, so now we just need to
augment it further with two fields: an ordered array of blocks to visit,
and a doit flag.

Say we have a 10 block table, with rows to be removed on blocks 3, 4 and 8.
As we visit all 10 in sequence we would issue these WAL records:

  XLOG_BTREE_VACUUM block 3   visitFirst {1, 2}     doit = true
  XLOG_BTREE_VACUUM block 4   visitFirst {}         doit = true
  XLOG_BTREE_VACUUM block 8   visitFirst {5, 6, 7}  doit = true
  XLOG_BTREE_VACUUM block 10  visitFirst {9}        doit = false

So that allows us to issue the same number of WAL messages yet include all
the required information to repeat the process correctly.

(The blocks can be visited out of sequence in some cases, hence the ordered
array of blocks to visit rather than just a first block value.)

It would also be possible to introduce a special tweak there: if the block
is not in cache, don't read it in at all. If it's not in cache we know that
nobody has a pin on it, so we don't need to read it in just to say "got the
lock". That's icing for later.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
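
To make the bookkeeping above concrete, here is a rough sketch in Python rather than backend C. It is only a model of the idea, not code from the patch: BtreeVacuumRecord, build_vacuum_records and replay_lock_order are hypothetical names, and the record is reduced to the three fields discussed above (block, visitFirst, doit).

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class BtreeVacuumRecord:
    """Hypothetical stand-in for the augmented XLOG_BTREE_VACUUM record."""
    block: int              # block this record is attached to
    visit_first: List[int]  # blocks to cleanup-lock first, in visit order
    doit: bool              # False means a no-op record: lock only


def build_vacuum_records(scan_order: Iterable[int],
                         blocks_with_removals: Set[int]) -> List[BtreeVacuumRecord]:
    """Attach each untouched block to the next record that gets issued."""
    records: List[BtreeVacuumRecord] = []
    pending: List[int] = []  # blocks visited since the last record
    for block in scan_order:
        if block in blocks_with_removals:
            records.append(BtreeVacuumRecord(block, pending, True))
            pending = []
        else:
            pending.append(block)
    if pending:
        # The scan ended on blocks with nothing to remove: issue one
        # no-op record on the last block so the others are still listed.
        last = pending.pop()
        records.append(BtreeVacuumRecord(last, pending, False))
    return records


def replay_lock_order(records: Iterable[BtreeVacuumRecord]) -> List[int]:
    """Order in which replay would take cleanup locks (assumed model)."""
    order: List[int] = []
    for rec in records:
        order.extend(rec.visit_first)
        order.append(rec.block)
    return order


if __name__ == "__main__":
    # The 10 block example: rows removed on blocks 3, 4 and 8.
    for rec in build_vacuum_records(range(1, 11), {3, 4, 8}):
        print(rec)
```

Run on the 10 block example, this yields the same four records as listed above, and replay_lock_order touches every block from 1 to 10 exactly once, which is the property the cleanup-lock rule needs.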