On Fri, Aug 19, 2011 at 2:51 PM, Gokulakannan Somasundaram <gokul...@gmail.com> wrote: > On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas > <heikki.linnakan...@enterprisedb.com> wrote: >> >> On 19.08.2011 21:06, Gokulakannan Somasundaram wrote: >>> >>> If you are following the same design that Heikki put forward, then there >>> is >>> a problem with it in maintaining the bits in page and the bits in >>> visibility >>> map in sync, which we have already discussed. >> >> Are you referring to this: >> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php ? I >> believe Robert's changes to make the visibility map crash-safe covers that. >> Clearing the bit in the visibility map now happens within the same critical >> section as clearing the flag on the heap page and writing th WAL record. >> > In that case, say a 100 sessions are trying to update records which fall > under the 8000*4 heap pages( i assume 2 bits per visibility map - 8 * 1024 * > 4 exact) covered by one page of visibility map,
There are about 8000 visibility map bytes per page, so about 64000 bits, each covering one page. So a visibility map page covers about 512MB of heap. > won't it make the 99 > sessions wait for that visibility map while holding the exclusive lock on > the 99 heap pages? Hmm, you have a point. If 100 backends simultaneously write to 100 different pages, and all of those pages are all-visible, then it's possible that they could end up fighting over the buffer content lock on the visibility map page. But why would you expect that to matter? In a heavily updated table, the proportion of visibility map bits that are set figures to be quite low, since they're only set during VACUUM. To have 100 backends simultaneously pick different pages to write each of which is all-visible seems really unlucky. Even if it does happen from time to time, I suspect the effects would be largely masked by WALInsertLock contention. The visibility map content lock is only taken very briefly, whereas the operations protected by WALInsertLock are much more complex. This does, however, remind me of two other points: 1. Heikki's idea of trying to set visibility map bits more aggressively is probably a good one, but it would be possible to overdo it, because setting visibility map bits is not free. It has an immediate cost - in that we have to write xlog - and a deferred cost - in that it will impose overhead when those pages are re-dirtied. At the moment, I think we're probably too far in the opposite direction - i.e. we leave the visibility map bits unset for too long, leading to a massive amount of deferred work that gets done all at once when VACUUM finally runs. But we shouldn't overcorrect. 2. While we're tinkering with the visibility map, we should think about whether it makes sense to carve out some more bits for such purposes as we may in the future require. Even if we allowed each heap page a byte in the visibility map instead of a single bit, the visibility map would still be roughly 1000 times smaller than the heap; and if there are any situations where the page-level locks become choke points, this would mitigate that effect. There might also be some advantage in that bytes can be atomically set, while bits can't, although I can't immediately think how we'd leverage that. Alternatively, we could widen the field to something less than a full byte, like 2 or 4 bits, if that seems better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers