On Thu, Mar 24, 2016 at 7:17 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>> Yet another possibility could be to call it as
>> GetPageWithFreeSpaceExtended and call it from GetPageWithFreeSpace with
>> value of oldPage as InvalidBlockNumber.
>
> Yes I like this.. Changed the same.
After thinking about this some more, I don't think this is the right approach. I finally understand what's going on here: RecordPageWithFreeSpace updates the FSM lazily, only adjusting the leaves and not the upper levels. It relies on VACUUM to update the upper levels. This seems like it might be a bad policy in general, because VACUUM on a very large relation may be quite infrequent, and you could lose track of a lot of space for a long time, leading to a lot of extra bloat. However, it's a particularly bad policy for bulk relation extension, because you're stuffing a large number of totally free pages in there in a way that doesn't make them particularly easy for anybody else to discover.

There are two ways we can fail here:

1. Callers who use GetPageWithFreeSpace() rather than GetPageWithFreeSpaceExtended() will fail to find the new pages if the upper map levels haven't been updated by VACUUM.

2. Even callers who use GetPageWithFreeSpaceExtended() may fail to find the new pages. This can happen in two separate ways, namely (a) the lastValidBlock saved by RelationGetBufferForTuple() can be in the middle of the relation someplace rather than near the end, or (b) the bulk-extension performed by some other backend can have overflowed onto some new FSM page that won't be searched even though a relatively plausible lastValidBlock was passed.

It seems to me that since we're adding a whole bunch of empty pages at once, it's worth the effort to update the upper levels of the FSM. This isn't a case of discovering a single page with an extra few bytes of storage available due to a HOT prune or something - this is a case of putting at least 20 and plausibly hundreds of extra pages into the FSM. The extra effort to update the upper FSM pages is trivial by comparison with the cost of extending the relation by many blocks.
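To make the failure mode concrete, here is a toy model of a max-tree that is searched from the root. It is only a sketch: one small array stands in for the whole multi-level map, and ToyFsm, toy_set_leaf_only, toy_bulk_set and toy_search are hypothetical names, not functions in the PostgreSQL tree or in the patch. The point it illustrates is that a leaf-only update leaves the root stale, so a search from the top never sees the new space, whereas setting a range of leaves and then recomputing the parents makes it visible right away.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the FSM: a complete binary max-tree stored in an array.
 * This is an illustration only, not PostgreSQL's actual FSM layout. */
#define TOY_LEAVES 8                     /* must be a power of two */
#define TOY_NODES  (2 * TOY_LEAVES - 1)

typedef struct ToyFsm
{
    /* node[0] is the root; leaves start at node[TOY_LEAVES - 1] */
    uint8_t node[TOY_NODES];
} ToyFsm;

static void
toy_init(ToyFsm *f)
{
    memset(f->node, 0, sizeof(f->node));
}

/* Lazy update: touch only the leaf and leave every parent stale. */
static void
toy_set_leaf_only(ToyFsm *f, int slot, uint8_t cat)
{
    assert(slot >= 0 && slot < TOY_LEAVES);
    f->node[TOY_LEAVES - 1 + slot] = cat;
}

/* Bulk update: set leaves first..last, then recompute all parents
 * bottom-up so the root reflects the new maximum. */
static void
toy_bulk_set(ToyFsm *f, int first, int last, uint8_t cat)
{
    assert(first >= 0 && first <= last && last < TOY_LEAVES);

    for (int slot = first; slot <= last; slot++)
        f->node[TOY_LEAVES - 1 + slot] = cat;

    for (int i = TOY_LEAVES - 2; i >= 0; i--)
    {
        uint8_t l = f->node[2 * i + 1];
        uint8_t r = f->node[2 * i + 2];

        f->node[i] = (l > r) ? l : r;
    }
}

/* Search from the root for a leaf with at least 'need'.
 * Returns the leaf slot, or -1 if the tree claims nothing is available. */
static int
toy_search(const ToyFsm *f, uint8_t need)
{
    int i = 0;

    if (f->node[0] < need)
        return -1;              /* root says no: leaves are never examined */

    while (i < TOY_LEAVES - 1)
    {
        int left = 2 * i + 1;

        i = (f->node[left] >= need) ? left : left + 1;
    }

    /* Guard against a stale parent having sent us to a too-small leaf. */
    return (f->node[i] >= need) ? i - (TOY_LEAVES - 1) : -1;
}
```

In this toy, calling toy_set_leaf_only() on an empty tree and then toy_search() reports no space, because the root was never touched; after toy_bulk_set() over a range, the same search lands on the first slot of that range. Recomputing every parent is the simplest thing that works for a sketch; a real implementation would presumably only touch the ancestors of the modified range.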
So, I suggest adding a new function FreeSpaceMapBulkExtend(BlockNumber first_block, BlockNumber last_block) which sets all the FSM entries for pages between first_block and last_block to 255 and then bubbles that up to the higher levels of the tree and all the way to the root. Have the bulk extend code use that instead of repeatedly calling RecordPageWithFreeSpace. That should actually be much more efficient, because it can call fsm_readbuf(), LockBuffer(), and UnlockReleaseBuffer() just once per FSM page instead of once per FSM page *per byte modified*. Maybe that makes no difference in practice, but it can't hurt.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company