I wrote:
> In particular, now that there's a distinction between smgr flush
> and relcache flush, maybe we could associate targblock reset with
> smgr flush (only) and arrange to not flush the smgr level during
> ANALYZE --- basically, smgr flush would only be needed when truncating
> or reassigning the relfilenode.  I think this might work out nicely but
> haven't chased the details.
I looked into that a bit more and decided that it'd be a ticklish change: the coupling between relcache and smgr cache is pretty tight, and there just isn't any provision for having an smgr cache entry live longer than its owning relcache entry.  Even if we could fix it to work reliably, this approach does nothing for the case where a backend actually exits after filling just part of a new page, as noted by Takahiro-san.

The next most promising fix is to have RelationGetBufferForTuple tell the FSM about the new page immediately on creation.  I made a draft patch for that (attached).  It fixes Michael's scenario nicely --- all pages get filled completely --- and a simple test with pgbench didn't reveal any obvious change in performance.  However there is clear *potential* for performance loss, due to both the extra FSM access and the potential for increased contention because of multiple backends piling into the same new page.  So it would be good to do some real performance testing on insert-heavy scenarios before we consider applying this.  Any volunteers?

Note: patch is against HEAD but should work in 8.4, if you reverse out the use of the rd_targblock access macros.

			regards, tom lane
Index: src/backend/access/heap/hio.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/heap/hio.c,v
retrieving revision 1.78
diff -c -r1.78 hio.c
*** src/backend/access/heap/hio.c	9 Feb 2010 21:43:29 -0000	1.78
--- src/backend/access/heap/hio.c	31 May 2010 20:44:29 -0000
***************
*** 354,384 ****
  	 * is empty (this should never happen, but if it does we don't want to
  	 * risk wiping out valid data).
  	 */
  	page = BufferGetPage(buffer);
  	if (!PageIsNew(page))
  		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
! 			 BufferGetBlockNumber(buffer),
! 			 RelationGetRelationName(relation));
  	PageInit(page, BufferGetPageSize(buffer), 0);

! 	if (len > PageGetHeapFreeSpace(page))
  	{
  		/* We should not get here given the test at the top */
  		elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
  	}

  	/*
  	 * Remember the new page as our target for future insertions.
- 	 *
- 	 * XXX should we enter the new page into the free space map immediately,
- 	 * or just keep it for this backend's exclusive use in the short run
- 	 * (until VACUUM sees it)?  Seems to depend on whether you expect the
- 	 * current backend to make more insertions or not, which is probably a
- 	 * good bet most of the time.  So for now, don't add it to FSM yet.
  	 */
! 	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));

  	return buffer;
  }
--- 354,388 ----
  	 * is empty (this should never happen, but if it does we don't want to
  	 * risk wiping out valid data).
  	 */
+ 	targetBlock = BufferGetBlockNumber(buffer);
  	page = BufferGetPage(buffer);
  	if (!PageIsNew(page))
  		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
! 			 targetBlock, RelationGetRelationName(relation));
  	PageInit(page, BufferGetPageSize(buffer), 0);

! 	pageFreeSpace = PageGetHeapFreeSpace(page);
! 	if (len > pageFreeSpace)
  	{
  		/* We should not get here given the test at the top */
  		elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
  	}

  	/*
+ 	 * If using FSM, mark the page in FSM as having whatever amount of
+ 	 * free space will be left after our insertion.  This is needed so that
+ 	 * the free space won't be forgotten about if this backend doesn't use
+ 	 * it up before exiting or flushing the rel's relcache entry.
+ 	 */
+ 	if (use_fsm)
+ 		RecordPageWithFreeSpace(relation, targetBlock, pageFreeSpace - len);
+
+ 	/*
  	 * Remember the new page as our target for future insertions.
  	 */
! 	RelationSetTargetBlock(relation, targetBlock);

  	return buffer;
  }
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers