On Tue, Apr 7, 2015 at 11:22 AM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
> On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>> On 4/6/15 5:18 PM, Greg Stark wrote:
>>>
>>> Only I would suggest thinking of it in terms of two orthogonal boolean
>>> flags rather than three states. It's easier to reason about whether a
>>> table has a specific property than trying to control a state machine in
>>> a predefined pathway.
>>>
>>> So I would say the two flags are:
>>> READONLY: guarantees nothing can be dirtied
>>> ALLFROZEN: guarantees no unfrozen tuples are present
>>>
>>> In practice you can't have the latter without the former since vacuum
>>> can't know everything is frozen unless it knows nobody is inserting. But
>>> perhaps there will be cases in the future where that's not true.
>>
>> I'm not so sure about that. There's a logical state progression here (see
>> below). ISTM it's easier to just enforce that in one place instead of a
>> bunch of places having to check multiple conditions. But, I'm not wed to
>> a single field.
>>
>>> Incidentally there are a number of other optimisations that I've had in
>>> mind that are only possible on frozen read-only tables:
>>>
>>> 1) Compression: compress the pages and pack them one after the other.
>>> Build a new fork with offsets for each page.
>>>
>>> 2) Automatic partition elimination where the statistics track the
>>> minimum and maximum value per partition (and number of tuples) and treat
>>> them as implicit constraints. In particular it would magically make
>>> read-only empty parent partitions be excluded regardless of the where
>>> clause.
>>
>> AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
>> uncompact and re-compact for #1 when you actually freeze (which maybe
>> isn't worth it), but freezing isn't absolutely required. #2 would only
>> require that everything in the relation is visible; not frozen.
>>
>> I think there's value here to having an ALLVISIBLE state as well as
>> ALLFROZEN.
>>
>
> Based on many suggestions, I'm going to deal with the FM first, as one
> patch. It will be a simple mechanism, similar to the VM, in the first
> patch.
> - Each bit of the FM represents a single page
> - The bit is set only by vacuum
> - The bit is cleared by inserting, updating and deleting
>
> Second, I'll deal with a simple read-only table with two states,
> Read/Write (default) and ReadOnly, as one patch. ISTM that having the
> Frozen state needs more discussion. A read-only table just allows us to
> disable any updates to the table, and it's controlled by a read-only
> flag in pg_class. The DDL command which changes this status would be
> something like ALTER TABLE SET READ ONLY, or READ WRITE.
> Also, as Alvaro suggested, a read-only table affects not only table
> freezing but also performance optimization. I'll consider including
> those when I deal with the read-only table.
>
The attached WIP patch adds a Frozen Map, which enables us to avoid
vacuuming the whole table even when a full scan is required to prevent
XID wraparound failures. The Frozen Map is a bitmap with one bit per
heap page, quite similar to the Visibility Map. A set bit means that
all tuples on the corresponding heap page are completely frozen, so we
don't need to vacuum-freeze that page. A bit is set when vacuum (or
autovacuum) finds that all tuples on the corresponding heap page are
completely frozen, and a bit is cleared when an INSERT or UPDATE (for
the new heap page only) is executed.

The current patch adds a new source file,
src/backend/access/heap/frozenmap.c, which is quite similar to
visibilitymap.c. They contain similar code but are kept separate for
now; I can refactor the shared code, for example into a common
bitmap.c, if needed. Also, when skipping vacuum via the visibility map
we only skip runs of at least SKIP_PAGES_THRESHOLD consecutive pages,
but no such mechanism exists yet for the frozen map.

Please give me feedback.

Regards,

-------
Sawada Masahiko
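To illustrate how the map is meant to be consulted, here is a minimal
sketch (not part of the attached patch; the helper name
can_skip_freezing is hypothetical) of how lazy vacuum could use the
frozenmap_test() API added below to skip pages that are already fully
frozen during an anti-wraparound (scan_all) vacuum:

#include "postgres.h"

#include "access/frozenmap.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch only: decide whether lazy vacuum can skip freezing a heap page.
 * frozenmap_test() pins the relevant frozen map page into *fmbuffer (the
 * caller releases it later) and returns true if every tuple on blkno is
 * already frozen, in which case an anti-wraparound scan has nothing left
 * to do on that page.
 */
static bool
can_skip_freezing(Relation onerel, BlockNumber blkno, Buffer *fmbuffer,
				  bool scan_all)
{
	if (scan_all && frozenmap_test(onerel, blkno, fmbuffer))
		return true;

	return false;
}

The attached patch does essentially this inside lazy_scan_heap(),
counting each skipped block in vacrelstats->fmskipped_pages.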
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile index b83d496..53f07fd 100644 --- a/src/backend/access/heap/Makefile +++ b/src/backend/access/heap/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/access/heap top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o +OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o frozenmap.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/access/heap/frozenmap.c b/src/backend/access/heap/frozenmap.c new file mode 100644 index 0000000..6e64cb8 --- /dev/null +++ b/src/backend/access/heap/frozenmap.c @@ -0,0 +1,567 @@ +/*------------------------------------------------------------------------- + * + * frozenmap.c + * bitmap for tracking frozen heap tuples + * + * Portions Copyright (c) 2015, PostgreSQL Global Development Group + * + * + * IDENTIFICATION + * src/backend/access/heap/frozenmap.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "access/frozenmap.h" +#include "access/heapam_xlog.h" +#include "access/xlog.h" +#include "miscadmin.h" +#include "storage/bufmgr.h" +#include "storage/lmgr.h" +#include "storage/smgr.h" +#include "utils/inval.h" + + +//#define TRACE_FROZENMAP + +/* + * Size of the bitmap on each frozen map page, in bytes. There's no + * extra headers, so the whole page minus the standard page header is + * used for the bitmap. + */ +#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData)) + +/* Number of bits allocated for each heap block. */ +#define BITS_PER_HEAPBLOCK 1 + +/* Number of heap blocks we can represent in one byte. */ +#define HEAPBLOCKS_PER_BYTE 8 + +/* Number of heap blocks we can represent in one frozen map page. */ +#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE) + +/* Mapping from heap block number to the right bit in the frozen map */ +#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE) +#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE) +#define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE) + +/* table for fast counting of set bits */ +static const uint8 number_of_ones[256] = { + 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, + 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, + 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, + 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, + 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, + 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, + 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, + 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, + 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, + 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, + 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, + 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, + 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, + 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, + 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, + 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8 +}; + +/* prototypes for internal routines */ +static Buffer fm_readbuf(Relation rel, BlockNumber blkno, bool extend); +static void fm_extend(Relation rel, BlockNumber nfmblocks); + + +/* + * frozenmap_clear - clear a bit in frozen map + * + * This function is same logic as visibilitymap_clear. + * You must pass a buffer containing the correct map page to this function. + * Call frozenmap_pin first to pin the right one. 
This function doesn't do + * any I/O. + */ +void +frozenmap_clear(Relation rel, BlockNumber heapBlk, Buffer buf) +{ + BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); + int mapByte = HEAPBLK_TO_MAPBYTE(heapBlk); + int mapBit = HEAPBLK_TO_MAPBIT(heapBlk); + uint8 mask = 1 << mapBit; + char *map; + +#ifdef TRACE_FROZENMAP + elog(DEBUG1, "fm_clear %s %d", RelationGetRelationName(rel), heapBlk); +#endif + + if (!BufferIsValid(buf) || BufferGetBlockNumber(buf) != mapBlock) + elog(ERROR, "wrong buffer passed to frozenmap_clear"); + + LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE); + map = PageGetContents(BufferGetPage(buf)); + + if (map[mapByte] & mask) + { + map[mapByte] &= ~mask; + + MarkBufferDirty(buf); + } + + LockBuffer(buf, BUFFER_LOCK_UNLOCK); +} + +/* + * frozenmap_pin - pin a map page for setting a bit + * + * This function is same logic as visibilitymap_pin. + * Setting a bit in the frozen map is a two-phase operation. First, call + * frozenmap_pin, to pin the frozen map page containing the bit for + * the heap page. Because that can require I/O to read the map page, you + * shouldn't hold a lock on the heap page while doing that. Then, call + * frozenmap_set to actually set the bit. + * + * On entry, *buf should be InvalidBuffer or a valid buffer returned by + * an earlier call to frozenmap_pin or frozenmap_test on the same + * relation. On return, *buf is a valid buffer with the map page containing + * the bit for heapBlk. + * + * If the page doesn't exist in the map file yet, it is extended. + */ +void +frozenmap_pin(Relation rel, BlockNumber heapBlk, Buffer *buf) +{ + BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); + + /* Reuse the old pinned buffer if possible */ + if (BufferIsValid(*buf)) + { + if (BufferGetBlockNumber(*buf) == mapBlock) + return; + + ReleaseBuffer(*buf); + } + *buf = fm_readbuf(rel, mapBlock, true); +} + +/* + * frozenmap_pin_ok - do we already have the correct page pinned? + * + * On entry, buf should be InvalidBuffer or a valid buffer returned by + * an earlier call to frozenmap_pin or frozenmap_test on the same + * relation. The return value indicates whether the buffer covers the + * given heapBlk. + */ +bool +frozenmap_pin_ok(BlockNumber heapBlk, Buffer buf) +{ + BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); + + return BufferIsValid(buf) && BufferGetBlockNumber(buf) == mapBlock; +} + +/* + * frozenmap_set - set a bit on a previously pinned page + * + * recptr is the LSN of the XLOG record we're replaying, if we're in recovery, + * or InvalidXLogRecPtr in normal running. The page LSN is advanced to the + * one provided; in normal running, we generate a new XLOG record and set the + * page LSN to that value. cutoff_xid is the largest xmin on the page being + * marked all-frozen; it is needed for Hot Standby, and can be + * InvalidTransactionId if the page contains no tuples. + * + * Caller is expected to set the heap page's PD_ALL_FROZEN bit before calling + * this function. Except in recovery, caller should also pass the heap + * buffer. When checksums are enabled and we're not in recovery, we must add + * the heap buffer to the WAL chain to protect it from being torn. + * + * You must pass a buffer containing the correct map page to this function. + * Call frozenmap_pin first to pin the right one. This function doesn't do + * any I/O. 
+ */ +void +frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, + XLogRecPtr recptr, Buffer fmBuf) +{ + BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); + uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk); + uint8 mapBit = HEAPBLK_TO_MAPBIT(heapBlk); + Page page; + char *map; + +#ifdef TRACE_FROZENMAP + elog(DEBUG1, "fm_set %s %d", RelationGetRelationName(rel), heapBlk); +#endif + + Assert(InRecovery || XLogRecPtrIsInvalid(recptr)); + Assert(InRecovery || BufferIsValid(heapBuf)); + + /* Check that we have the right heap page pinned, if present */ + if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk) + elog(ERROR, "wrong heap buffer passed to frozenmap_set"); + + /* Check that we have the right VM page pinned */ + if (!BufferIsValid(fmBuf) || BufferGetBlockNumber(fmBuf) != mapBlock) + elog(ERROR, "wrong FM buffer passed to frozenmap_set"); + + page = BufferGetPage(fmBuf); + map = PageGetContents(page); + LockBuffer(fmBuf, BUFFER_LOCK_EXCLUSIVE); + + if (!(map[mapByte] & (1 << mapBit))) + { + START_CRIT_SECTION(); + + map[mapByte] |= (1 << mapBit); + MarkBufferDirty(fmBuf); + + if (RelationNeedsWAL(rel)) + { + if (XLogRecPtrIsInvalid(recptr)) + { + Assert(!InRecovery); + recptr = log_heap_frozenmap(rel->rd_node, heapBuf, fmBuf); + + /* + * If data checksums are enabled (or wal_log_hints=on), we + * need to protect the heap page from being torn. + */ + if (XLogHintBitIsNeeded()) + { + Page heapPage = BufferGetPage(heapBuf); + + /* caller is expected to set PD_ALL_FROZEN first */ + Assert(PageIsAllFrozen(heapPage)); + PageSetLSN(heapPage, recptr); + } + } + PageSetLSN(page, recptr); + } + + END_CRIT_SECTION(); + } + + LockBuffer(fmBuf, BUFFER_LOCK_UNLOCK); +} + +/* + * frozenmap_test - test if a bit is set + * + * Are all tuples on heapBlk frozen to all, according to the frozen map? + * + * On entry, *buf should be InvalidBuffer or a valid buffer returned by an + * earlier call to frozenmap_pin or frozenmap_test on the same + * relation. On return, *buf is a valid buffer with the map page containing + * the bit for heapBlk, or InvalidBuffer. The caller is responsible for + * releasing *buf after it's done testing and setting bits. + * + * NOTE: This function is typically called without a lock on the heap page, + * so somebody else could change the bit just after we look at it. In fact, + * since we don't lock the frozen map page either, it's even possible that + * someone else could have changed the bit just before we look at it, but yet + * we might see the old value. It is the caller's responsibility to deal with + * all concurrency issues! + */ +bool +frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *buf) +{ + BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); + uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk); + uint8 mapBit = HEAPBLK_TO_MAPBIT(heapBlk); + bool result; + char *map; + +#ifdef TRACE_FROZENMAP + elog(DEBUG1, "fm_test %s %d", RelationGetRelationName(rel), heapBlk); +#endif + + /* Reuse the old pinned buffer if possible */ + if (BufferIsValid(*buf)) + { + if (BufferGetBlockNumber(*buf) != mapBlock) + { + ReleaseBuffer(*buf); + *buf = InvalidBuffer; + } + } + + if (!BufferIsValid(*buf)) + { + *buf = fm_readbuf(rel, mapBlock, false); + if (!BufferIsValid(*buf)) + return false; + } + + map = PageGetContents(BufferGetPage(*buf)); + + /* + * A single-bit read is atomic. There could be memory-ordering effects + * here, but for performance reasons we make it the caller's job to worry + * about that. + */ + result = (map[mapByte] & (1 << mapBit)) ? 
true : false; + + return result; +} + +/* + * frozenmap_count - count number of bits set in frozen map + * + * Note: we ignore the possibility of race conditions when the table is being + * extended concurrently with the call. New pages added to the table aren't + * going to be marked all-frozen, so they won't affect the result. + */ +BlockNumber +frozenmap_count(Relation rel) +{ + BlockNumber result = 0; + BlockNumber mapBlock; + + for (mapBlock = 0;; mapBlock++) + { + Buffer mapBuffer; + unsigned char *map; + int i; + + /* + * Read till we fall off the end of the map. We assume that any extra + * bytes in the last page are zeroed, so we don't bother excluding + * them from the count. + */ + mapBuffer = fm_readbuf(rel, mapBlock, false); + if (!BufferIsValid(mapBuffer)) + break; + + /* + * We choose not to lock the page, since the result is going to be + * immediately stale anyway if anyone is concurrently setting or + * clearing bits, and we only really need an approximate value. + */ + map = (unsigned char *) PageGetContents(BufferGetPage(mapBuffer)); + + for (i = 0; i < MAPSIZE; i++) + { + result += number_of_ones[map[i]]; + } + + ReleaseBuffer(mapBuffer); + } + + return result; +} + +/* + * frozenmap_truncate - truncate the frozen map + * + * The caller must hold AccessExclusiveLock on the relation, to ensure that + * other backends receive the smgr invalidation event that this function sends + * before they access the VM again. + * + * nheapblocks is the new size of the heap. + */ +void +frozenmap_truncate(Relation rel, BlockNumber nheapblocks) +{ + BlockNumber newnblocks; + + /* last remaining block, byte, and bit */ + BlockNumber truncBlock = HEAPBLK_TO_MAPBLOCK(nheapblocks); + uint32 truncByte = HEAPBLK_TO_MAPBYTE(nheapblocks); + uint8 truncBit = HEAPBLK_TO_MAPBIT(nheapblocks); + +#ifdef TRACE_FROZENMAP + elog(DEBUG1, "fm_truncate %s %d", RelationGetRelationName(rel), nheapblocks); +#endif + + RelationOpenSmgr(rel); + + /* + * If no frozen map has been created yet for this relation, there's + * nothing to truncate. + */ + if (!smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM)) + return; + + /* + * Unless the new size is exactly at a frozen map page boundary, the + * tail bits in the last remaining map page, representing truncated heap + * blocks, need to be cleared. This is not only tidy, but also necessary + * because we don't get a chance to clear the bits if the heap is extended + * again. + */ + if (truncByte != 0 || truncBit != 0) + { + Buffer mapBuffer; + Page page; + char *map; + + newnblocks = truncBlock + 1; + + mapBuffer = fm_readbuf(rel, truncBlock, false); + if (!BufferIsValid(mapBuffer)) + { + /* nothing to do, the file was already smaller */ + return; + } + + page = BufferGetPage(mapBuffer); + map = PageGetContents(page); + + LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE); + + /* Clear out the unwanted bytes. */ + MemSet(&map[truncByte + 1], 0, MAPSIZE - (truncByte + 1)); + + /*---- + * Mask out the unwanted bits of the last remaining byte. + * + * ((1 << 0) - 1) = 00000000 + * ((1 << 1) - 1) = 00000001 + * ... 
+ * ((1 << 6) - 1) = 00111111 + * ((1 << 7) - 1) = 01111111 + *---- + */ + map[truncByte] &= (1 << truncBit) - 1; + + MarkBufferDirty(mapBuffer); + UnlockReleaseBuffer(mapBuffer); + } + else + newnblocks = truncBlock; + + if (smgrnblocks(rel->rd_smgr, FROZENMAP_FORKNUM) <= newnblocks) + { + /* nothing to do, the file was already smaller than requested size */ + return; + } + + /* Truncate the unused VM pages, and send smgr inval message */ + smgrtruncate(rel->rd_smgr, FROZENMAP_FORKNUM, newnblocks); + + /* + * We might as well update the local smgr_vm_nblocks setting. smgrtruncate + * sent an smgr cache inval message, which will cause other backends to + * invalidate their copy of smgr_vm_nblocks, and this one too at the next + * command boundary. But this ensures it isn't outright wrong until then. + */ + if (rel->rd_smgr) + rel->rd_smgr->smgr_fm_nblocks = newnblocks; +} + +/* + * Read a frozen map page. + * + * If the page doesn't exist, InvalidBuffer is returned, or if 'extend' is + * true, the frozen map file is extended. + */ +static Buffer +fm_readbuf(Relation rel, BlockNumber blkno, bool extend) +{ + Buffer buf; + + /* + * We might not have opened the relation at the smgr level yet, or we + * might have been forced to close it by a sinval message. The code below + * won't necessarily notice relation extension immediately when extend = + * false, so we rely on sinval messages to ensure that our ideas about the + * size of the map aren't too far out of date. + */ + RelationOpenSmgr(rel); + + /* + * If we haven't cached the size of the frozen map fork yet, check it + * first. + */ + if (rel->rd_smgr->smgr_fm_nblocks == InvalidBlockNumber) + { + if (smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM)) + rel->rd_smgr->smgr_fm_nblocks = smgrnblocks(rel->rd_smgr, + FROZENMAP_FORKNUM); + else + rel->rd_smgr->smgr_fm_nblocks = 0; + } + + /* Handle requests beyond EOF */ + if (blkno >= rel->rd_smgr->smgr_fm_nblocks) + { + if (extend) + fm_extend(rel, blkno + 1); + else + return InvalidBuffer; + } + + /* + * Use ZERO_ON_ERROR mode, and initialize the page if necessary. It's + * always safe to clear bits, so it's better to clear corrupt pages than + * error out. + */ + buf = ReadBufferExtended(rel, FROZENMAP_FORKNUM, blkno, + RBM_ZERO_ON_ERROR, NULL); + if (PageIsNew(BufferGetPage(buf))) + PageInit(BufferGetPage(buf), BLCKSZ, 0); + return buf; +} + +/* + * Ensure that the frozen map fork is at least vm_nblocks long, extending + * it if necessary with zeroed pages. + */ +static void +fm_extend(Relation rel, BlockNumber fm_nblocks) +{ + BlockNumber fm_nblocks_now; + Page pg; + + pg = (Page) palloc(BLCKSZ); + PageInit(pg, BLCKSZ, 0); + + /* + * We use the relation extension lock to lock out other backends trying to + * extend the frozen map at the same time. It also locks out extension + * of the main fork, unnecessarily, but extending the frozen map + * happens seldom enough that it doesn't seem worthwhile to have a + * separate lock tag type for it. + * + * Note that another backend might have extended or created the relation + * by the time we get the lock. + */ + LockRelationForExtension(rel, ExclusiveLock); + + /* Might have to re-open if a cache flush happened */ + RelationOpenSmgr(rel); + + /* + * Create the file first if it doesn't exist. If smgr_vm_nblocks is + * positive then it must exist, no need for an smgrexists call. 
+ */ + if ((rel->rd_smgr->smgr_fm_nblocks == 0 || + rel->rd_smgr->smgr_fm_nblocks == InvalidBlockNumber) && + !smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM)) + smgrcreate(rel->rd_smgr, FROZENMAP_FORKNUM, false); + + fm_nblocks_now = smgrnblocks(rel->rd_smgr, FROZENMAP_FORKNUM); + + /* Now extend the file */ + while (fm_nblocks_now < fm_nblocks) + { + PageSetChecksumInplace(pg, fm_nblocks_now); + + smgrextend(rel->rd_smgr, FROZENMAP_FORKNUM, fm_nblocks_now, + (char *) pg, false); + fm_nblocks_now++; + } + + /* + * Send a shared-inval message to force other backends to close any smgr + * references they may have for this rel, which we are about to change. + * This is a useful optimization because it means that backends don't have + * to keep checking for creation or extension of the file, which happens + * infrequently. + */ + CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode); + + /* Update local cache with the up-to-date size */ + rel->rd_smgr->smgr_fm_nblocks = fm_nblocks_now; + + UnlockRelationForExtension(rel, ExclusiveLock); + + pfree(pg); +} diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index cb6f8a3..7f7c147 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -38,6 +38,7 @@ */ #include "postgres.h" +#include "access/frozenmap.h" #include "access/heapam.h" #include "access/heapam_xlog.h" #include "access/hio.h" @@ -86,7 +87,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup, static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf, Buffer newbuf, HeapTuple oldtup, HeapTuple newtup, HeapTuple old_key_tup, - bool all_visible_cleared, bool new_all_visible_cleared); + bool all_visible_cleared, bool new_all_visible_cleared, + bool all_frozen_cleared, bool new_all_frozen_cleared); static void HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs, Bitmapset *key_attrs, Bitmapset *id_attrs, @@ -2067,8 +2069,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, TransactionId xid = GetCurrentTransactionId(); HeapTuple heaptup; Buffer buffer; - Buffer vmbuffer = InvalidBuffer; + Buffer vmbuffer = InvalidBuffer, + fmbuffer = InvalidBuffer; bool all_visible_cleared = false; + bool all_frozen_cleared; /* * Fill in tuple header fields, assign an OID, and toast the tuple if @@ -2092,12 +2096,14 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, CheckForSerializableConflictIn(relation, NULL, InvalidBuffer); /* - * Find buffer to insert this tuple into. If the page is all visible, - * this will also pin the requisite visibility map page. + * Find buffer to insert this tuple into. If the page is all visible + * of all frozen, this will also pin the requisite visibility map and + * frozen map page. */ buffer = RelationGetBufferForTuple(relation, heaptup->t_len, InvalidBuffer, options, bistate, - &vmbuffer, NULL); + &vmbuffer, NULL, + &fmbuffer, NULL); /* NO EREPORT(ERROR) from here till changes are logged */ START_CRIT_SECTION(); @@ -2113,6 +2119,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, vmbuffer); } + if (PageIsAllFrozen(BufferGetPage(buffer))) + { + all_frozen_cleared = true; + PageClearAllFrozen(BufferGetPage(buffer)); + frozenmap_clear(relation, + ItemPointerGetBlockNumber(&(heaptup->t_self)), + fmbuffer); + } + /* * XXX Should we set PageSetPrunable on this page ? * @@ -2157,6 +2172,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, xlrec.offnum = ItemPointerGetOffsetNumber(&heaptup->t_self); xlrec.flags = all_visible_cleared ? 
XLOG_HEAP_ALL_VISIBLE_CLEARED : 0; + if (all_frozen_cleared) + xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED; Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer)); /* @@ -2199,6 +2216,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, UnlockReleaseBuffer(buffer); if (vmbuffer != InvalidBuffer) ReleaseBuffer(vmbuffer); + if (fmbuffer != InvalidBuffer) + ReleaseBuffer(fmbuffer); /* * If tuple is cachable, mark it for invalidation from the caches in case @@ -2346,8 +2365,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples, while (ndone < ntuples) { Buffer buffer; - Buffer vmbuffer = InvalidBuffer; + Buffer vmbuffer = InvalidBuffer, + fmbuffer = InvalidBuffer; bool all_visible_cleared = false; + bool all_frozen_cleared = false; int nthispage; CHECK_FOR_INTERRUPTS(); @@ -2358,7 +2379,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples, */ buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len, InvalidBuffer, options, bistate, - &vmbuffer, NULL); + &vmbuffer, NULL, + &fmbuffer, NULL); page = BufferGetPage(buffer); /* NO EREPORT(ERROR) from here till changes are logged */ @@ -2395,6 +2417,15 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples, vmbuffer); } + if (PageIsAllFrozen(page)) + { + all_frozen_cleared = true; + PageClearAllFrozen(page); + frozenmap_clear(relation, + BufferGetBlockNumber(buffer), + fmbuffer); + } + /* * XXX Should we set PageSetPrunable on this page ? See heap_insert() */ @@ -2437,6 +2468,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples, tupledata = scratchptr; xlrec->flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0; + if (all_frozen_cleared) + xlrec->flags |= XLOG_HEAP_ALL_FROZEN_CLEARED; xlrec->ntuples = nthispage; /* @@ -2509,6 +2542,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples, UnlockReleaseBuffer(buffer); if (vmbuffer != InvalidBuffer) ReleaseBuffer(vmbuffer); + if (fmbuffer != InvalidBuffer) + ReleaseBuffer(fmbuffer); ndone += nthispage; } @@ -3053,7 +3088,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, Buffer buffer, newbuf, vmbuffer = InvalidBuffer, - vmbuffer_new = InvalidBuffer; + vmbuffer_new = InvalidBuffer, + fmbuffer = InvalidBuffer, + fmbuffer_new = InvalidBuffer; bool need_toast, already_marked; Size newtupsize, @@ -3067,6 +3104,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, bool key_intact; bool all_visible_cleared = false; bool all_visible_cleared_new = false; + bool all_frozen_cleared = false; + bool all_frozen_cleared_new = false; bool checked_lockers; bool locker_remains; TransactionId xmax_new_tuple, @@ -3100,14 +3139,17 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, page = BufferGetPage(buffer); /* - * Before locking the buffer, pin the visibility map page if it appears to - * be necessary. Since we haven't got the lock yet, someone else might be - * in the middle of changing this, so we'll need to recheck after we have - * the lock. + * Before locking the buffer, pin the visibility map and frozen map page + * if it appears to be necessary. Since we haven't got the lock yet, + * someone else might be in the middle of changing this, so we'll need to + * recheck after we have the lock. 
*/ if (PageIsAllVisible(page)) visibilitymap_pin(relation, block, &vmbuffer); + if (PageIsAllFrozen(page)) + frozenmap_pin(relation, block, &fmbuffer); + LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid)); @@ -3390,19 +3432,21 @@ l2: UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode); if (vmbuffer != InvalidBuffer) ReleaseBuffer(vmbuffer); + if (fmbuffer_new != InvalidBuffer) + ReleaseBuffer(fmbuffer); bms_free(hot_attrs); bms_free(key_attrs); return result; } /* - * If we didn't pin the visibility map page and the page has become all - * visible while we were busy locking the buffer, or during some - * subsequent window during which we had it unlocked, we'll have to unlock - * and re-lock, to avoid holding the buffer lock across an I/O. That's a - * bit unfortunate, especially since we'll now have to recheck whether the - * tuple has been locked or updated under us, but hopefully it won't - * happen very often. + * If we didn't pin the visibility(and frozen) map page and the page has + * become all visible(and frozen) while we were busy locking the buffer, + * or during some subsequent window during which we had it unlocked, + * we'll have to unlock and re-lock, to avoid holding the buffer lock + * across an I/O. That's a bit unfortunate, especially since we'll now + * have to recheck whether the tuple has been locked or updated under us, + * but hopefully it won't happen very often. */ if (vmbuffer == InvalidBuffer && PageIsAllVisible(page)) { @@ -3412,6 +3456,15 @@ l2: goto l2; } + if (fmbuffer == InvalidBuffer && PageIsAllFrozen(page)) + { + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + frozenmap_pin(relation, block, &fmbuffer); + LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); + goto l2; + + } + /* * We're about to do the actual update -- check for conflict first, to * avoid possibly having to roll back work we've just done. @@ -3570,7 +3623,8 @@ l2: /* Assume there's no chance to put heaptup on same page. */ newbuf = RelationGetBufferForTuple(relation, heaptup->t_len, buffer, 0, NULL, - &vmbuffer_new, &vmbuffer); + &vmbuffer_new, &vmbuffer, + &fmbuffer_new, &fmbuffer); } else { @@ -3588,7 +3642,8 @@ l2: LockBuffer(buffer, BUFFER_LOCK_UNLOCK); newbuf = RelationGetBufferForTuple(relation, heaptup->t_len, buffer, 0, NULL, - &vmbuffer_new, &vmbuffer); + &vmbuffer_new, &vmbuffer, + &fmbuffer_new, &fmbuffer); } else { @@ -3713,6 +3768,22 @@ l2: vmbuffer_new); } + /* clear PD_ALL_FROZEN flags */ + if (newbuf == buffer && PageIsAllFrozen(BufferGetPage(buffer))) + { + all_frozen_cleared = true; + PageClearAllFrozen(BufferGetPage(buffer)); + frozenmap_clear(relation, BufferGetBlockNumber(buffer), + fmbuffer); + } + else if (newbuf != buffer && PageIsAllFrozen(BufferGetPage(newbuf))) + { + all_frozen_cleared_new = true; + PageClearAllFrozen(BufferGetPage(newbuf)); + frozenmap_clear(relation, BufferGetBlockNumber(newbuf), + fmbuffer_new); + } + if (newbuf != buffer) MarkBufferDirty(newbuf); MarkBufferDirty(buffer); @@ -3736,7 +3807,9 @@ l2: newbuf, &oldtup, heaptup, old_key_tuple, all_visible_cleared, - all_visible_cleared_new); + all_visible_cleared_new, + all_frozen_cleared, + all_frozen_cleared_new); if (newbuf != buffer) { PageSetLSN(BufferGetPage(newbuf), recptr); @@ -3768,6 +3841,10 @@ l2: ReleaseBuffer(vmbuffer_new); if (BufferIsValid(vmbuffer)) ReleaseBuffer(vmbuffer); + if (BufferIsValid(fmbuffer_new)) + ReleaseBuffer(fmbuffer_new); + if (BufferIsValid(fmbuffer)) + ReleaseBuffer(fmbuffer); /* * Release the lmgr tuple lock, if we had it. 
@@ -6534,6 +6611,34 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid, } /* + * Perform XLogInsert for a heap-all-frozen operation. heap_buffer is the block + * being marked all-frozen, and fm_buffer is the buffer containing the + * corresponding frozen map block. Both should have already been modified and dirty. + */ +XLogRecPtr +log_heap_frozenmap(RelFileNode rnode, Buffer heap_buffer, Buffer fm_buffer) +{ + XLogRecPtr recptr; + uint8 flags; + + Assert(BufferIsValid(heap_buffer)); + Assert(BufferIsValid(fm_buffer)); + + XLogBeginInsert(); + + XLogRegisterBuffer(0, fm_buffer, 0); + + flags = REGBUF_STANDARD; + if (!XLogHintBitIsNeeded()) + flags |= REGBUF_NO_IMAGE; + XLogRegisterBuffer(1, heap_buffer, flags); + + recptr = XLogInsert(RM_HEAP3_ID, XLOG_HEAP3_FROZENMAP); + + return recptr; +} + +/* * Perform XLogInsert for a heap-visible operation. 'block' is the block * being marked all-visible, and vm_buffer is the buffer containing the * corresponding visibility map block. Both should have already been modified @@ -6577,7 +6682,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf, Buffer newbuf, HeapTuple oldtup, HeapTuple newtup, HeapTuple old_key_tuple, - bool all_visible_cleared, bool new_all_visible_cleared) + bool all_visible_cleared, bool new_all_visible_cleared, + bool all_frozen_cleared, bool new_all_frozen_cleared) { xl_heap_update xlrec; xl_heap_header xlhdr; @@ -6660,6 +6766,10 @@ log_heap_update(Relation reln, Buffer oldbuf, xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED; if (new_all_visible_cleared) xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED; + if (all_frozen_cleared) + xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED; + if (new_all_frozen_cleared) + xlrec.flags |= XLOG_HEAP_NEW_ALL_FROZEN_CLEARED; if (prefixlen > 0) xlrec.flags |= XLOG_HEAP_PREFIX_FROM_OLD; if (suffixlen > 0) @@ -7198,6 +7308,75 @@ heap_xlog_visible(XLogReaderState *record) UnlockReleaseBuffer(vmbuffer); } + +/* + * Reply XLOG_HEAP3_FROZENMAP record. + */ +static void +heap_xlog_frozenmap(XLogReaderState *record) +{ + XLogRecPtr lsn = record->EndRecPtr; + Buffer fmbuffer = InvalidBuffer; + Buffer buffer; + Page page; + RelFileNode rnode; + BlockNumber blkno; + XLogRedoAction action; + + XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno); + + /* + * Read the heap page, if it still exists. If the heap file has dropped or + * truncated later in recovery, we don't need to update the page, but we'd + * better still update the frozen map. + */ + action = XLogReadBufferForRedo(record, 1, &buffer); + if (action == BLK_NEEDS_REDO) + { + page = BufferGetPage(buffer); + PageSetAllFrozen(page); + MarkBufferDirty(buffer); + } + else if (action == BLK_RESTORED) + { + /* + * If heap block was backed up, restore it. This can only happen with + * checksums enabled. + */ + Assert(DataChecksumsEnabled()); + } + if (BufferIsValid(buffer)) + UnlockReleaseBuffer(buffer); + + if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false, + &fmbuffer) == BLK_NEEDS_REDO) + { + Page fmpage = BufferGetPage(fmbuffer); + Relation reln; + + /* initialize the page if it was read as zeros */ + if (PageIsNew(fmpage)) + PageInit(fmpage, BLCKSZ, 0); + + /* + * XLogReplayBufferExtended locked the buffer. But frozenmap_set + * will handle locking itself. 
+ */ + LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK); + + reln = CreateFakeRelcacheEntry(rnode); + frozenmap_pin(reln, blkno, &fmbuffer); + + if (lsn > PageGetLSN(fmpage)) + frozenmap_set(reln, blkno, InvalidBuffer, lsn, fmbuffer); + + ReleaseBuffer(fmbuffer); + FreeFakeRelcacheEntry(reln); + } + else if (BufferIsValid(fmbuffer)) + UnlockReleaseBuffer(fmbuffer); +} + /* * Replay XLOG_HEAP2_FREEZE_PAGE records */ @@ -7384,6 +7563,20 @@ heap_xlog_insert(XLogReaderState *record) FreeFakeRelcacheEntry(reln); } + /* The frozen map may need to be fixed even if the heap page is + * already up-to-date. + */ + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + { + Relation reln = CreateFakeRelcacheEntry(target_node); + Buffer fmbuffer = InvalidBuffer; + + frozenmap_pin(reln, blkno, &fmbuffer); + frozenmap_clear(reln, blkno, fmbuffer); + ReleaseBuffer(fmbuffer); + FreeFakeRelcacheEntry(reln); + } + /* * If we inserted the first and only tuple on the page, re-initialize the * page from scratch. @@ -7439,6 +7632,9 @@ heap_xlog_insert(XLogReaderState *record) if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED) PageClearAllVisible(page); + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + PageClearAllFrozen(page); + MarkBufferDirty(buffer); } if (BufferIsValid(buffer)) @@ -7504,6 +7700,21 @@ heap_xlog_multi_insert(XLogReaderState *record) FreeFakeRelcacheEntry(reln); } + /* + * The frozen map may need to be fixed even if the heap page is + * already up-to-date. + */ + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + { + Relation reln = CreateFakeRelcacheEntry(rnode); + Buffer fmbuffer = InvalidBuffer; + + visibilitymap_pin(reln, blkno, &fmbuffer); + visibilitymap_clear(reln, blkno, fmbuffer); + ReleaseBuffer(fmbuffer); + FreeFakeRelcacheEntry(reln); + } + if (isinit) { buffer = XLogInitBufferForRedo(record, 0); @@ -7577,6 +7788,8 @@ heap_xlog_multi_insert(XLogReaderState *record) if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED) PageClearAllVisible(page); + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + PageClearAllFrozen(page); MarkBufferDirty(buffer); } @@ -7660,6 +7873,22 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) } /* + * The frozen map may need to be fixed even if the heap page is + * already up-to-date. + */ + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + { + Relation reln = CreateFakeRelcacheEntry(rnode); + Buffer fmbuffer = InvalidBuffer; + + frozenmap_pin(reln, oldblk, &fmbuffer); + frozenmap_clear(reln, oldblk, fmbuffer); + ReleaseBuffer(fmbuffer); + FreeFakeRelcacheEntry(reln); + } + + + /* * In normal operation, it is important to lock the two pages in * page-number order, to avoid possible deadlocks against other update * operations going the other way. However, during WAL replay there can @@ -7705,6 +7934,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) if (xlrec->flags & XLOG_HEAP_ALL_VISIBLE_CLEARED) PageClearAllVisible(page); + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + PageClearAllFrozen(page); PageSetLSN(page, lsn); MarkBufferDirty(obuffer); @@ -7743,6 +7974,21 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) FreeFakeRelcacheEntry(reln); } + /* + * The frozen map may need to be fixed even if the heap page is + * already up-to-date. 
+ */ + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + { + Relation reln = CreateFakeRelcacheEntry(rnode); + Buffer fmbuffer = InvalidBuffer; + + visibilitymap_pin(reln, oldblk, &fmbuffer); + visibilitymap_clear(reln, oldblk, fmbuffer); + ReleaseBuffer(fmbuffer); + FreeFakeRelcacheEntry(reln); + } + /* Deal with new tuple */ if (newaction == BLK_NEEDS_REDO) { @@ -7840,6 +8086,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) if (xlrec->flags & XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED) PageClearAllVisible(page); + if (xlrec->flags & XLOG_HEAP_ALL_FROZEN_CLEARED) + PageClearAllFrozen(page); freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */ @@ -8072,6 +8320,21 @@ heap2_redo(XLogReaderState *record) } } +void +heap3_redo(XLogReaderState *record) +{ + uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; + + switch (info & XLOG_HEAP_OPMASK) + { + case XLOG_HEAP3_FROZENMAP: + heap_xlog_frozenmap(record); + break; + default: + elog(PANIC, "heap3_redo: unknown op code %u", info); + } +} + /* * heap_sync - sync a heap, for use when no WAL has been written * diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c index 6d091f6..5460d4f 100644 --- a/src/backend/access/heap/hio.c +++ b/src/backend/access/heap/hio.c @@ -15,6 +15,7 @@ #include "postgres.h" +#include "access/frozenmap.h" #include "access/heapam.h" #include "access/hio.h" #include "access/htup_details.h" @@ -156,6 +157,62 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, } /* + * For each heap page which is all-frozen, acquire a pin on the appropriate + * frozen map page, if we haven't already got one. + * + * This function is same logic as GetVisibilityMapPins function. + */ +static void +GetFrozenMapPins(Relation relation, Buffer buffer1, Buffer buffer2, + BlockNumber block1, BlockNumber block2, + Buffer *fmbuffer1, Buffer *fmbuffer2) +{ + bool need_to_pin_buffer1; + bool need_to_pin_buffer2; + + Assert(BufferIsValid(buffer1)); + Assert(buffer2 == InvalidBuffer || buffer1 <= buffer2); + + while (1) + { + /* Figure out which pins we need but don't have. */ + need_to_pin_buffer1 = PageIsAllFrozen(BufferGetPage(buffer1)) + && !frozenmap_pin_ok(block1, *fmbuffer1); + need_to_pin_buffer2 = buffer2 != InvalidBuffer + && PageIsAllFrozen(BufferGetPage(buffer2)) + && !frozenmap_pin_ok(block2, *fmbuffer2); + if (!need_to_pin_buffer1 && !need_to_pin_buffer2) + return; + + /* We must unlock both buffers before doing any I/O. */ + LockBuffer(buffer1, BUFFER_LOCK_UNLOCK); + if (buffer2 != InvalidBuffer && buffer2 != buffer1) + LockBuffer(buffer2, BUFFER_LOCK_UNLOCK); + + /* Get pins. */ + if (need_to_pin_buffer1) + frozenmap_pin(relation, block1, fmbuffer1); + if (need_to_pin_buffer2) + frozenmap_pin(relation, block2, fmbuffer2); + + /* Relock buffers. */ + LockBuffer(buffer1, BUFFER_LOCK_EXCLUSIVE); + if (buffer2 != InvalidBuffer && buffer2 != buffer1) + LockBuffer(buffer2, BUFFER_LOCK_EXCLUSIVE); + + /* + * If there are two buffers involved and we pinned just one of them, + * it's possible that the second one became all-frozen while we were + * busy pinning the first one. If it looks like that's a possible + * scenario, we'll need to make a second pass through this loop. 
+ */ + if (buffer2 == InvalidBuffer || buffer1 == buffer2 + || (need_to_pin_buffer1 && need_to_pin_buffer2)) + break; + } +} + +/* * RelationGetBufferForTuple * * Returns pinned and exclusive-locked buffer of a page in given relation @@ -215,7 +272,8 @@ Buffer RelationGetBufferForTuple(Relation relation, Size len, Buffer otherBuffer, int options, BulkInsertState bistate, - Buffer *vmbuffer, Buffer *vmbuffer_other) + Buffer *vmbuffer, Buffer *vmbuffer_other, + Buffer *fmbuffer, Buffer *fmbuffer_other) { bool use_fsm = !(options & HEAP_INSERT_SKIP_FSM); Buffer buffer = InvalidBuffer; @@ -316,6 +374,8 @@ RelationGetBufferForTuple(Relation relation, Size len, buffer = ReadBufferBI(relation, targetBlock, bistate); if (PageIsAllVisible(BufferGetPage(buffer))) visibilitymap_pin(relation, targetBlock, vmbuffer); + if (PageIsAllFrozen(BufferGetPage(buffer))) + frozenmap_pin(relation, targetBlock, fmbuffer); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); } else if (otherBlock == targetBlock) @@ -324,6 +384,8 @@ RelationGetBufferForTuple(Relation relation, Size len, buffer = otherBuffer; if (PageIsAllVisible(BufferGetPage(buffer))) visibilitymap_pin(relation, targetBlock, vmbuffer); + if (PageIsAllFrozen(BufferGetPage(buffer))) + frozenmap_pin(relation, targetBlock, fmbuffer); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); } else if (otherBlock < targetBlock) @@ -332,6 +394,8 @@ RelationGetBufferForTuple(Relation relation, Size len, buffer = ReadBuffer(relation, targetBlock); if (PageIsAllVisible(BufferGetPage(buffer))) visibilitymap_pin(relation, targetBlock, vmbuffer); + if (PageIsAllFrozen(BufferGetPage(buffer))) + frozenmap_pin(relation, targetBlock, fmbuffer); LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); } @@ -341,6 +405,8 @@ RelationGetBufferForTuple(Relation relation, Size len, buffer = ReadBuffer(relation, targetBlock); if (PageIsAllVisible(BufferGetPage(buffer))) visibilitymap_pin(relation, targetBlock, vmbuffer); + if (PageIsAllFrozen(BufferGetPage(buffer))) + frozenmap_pin(relation, targetBlock, fmbuffer); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE); } @@ -367,13 +433,23 @@ RelationGetBufferForTuple(Relation relation, Size len, * done. */ if (otherBuffer == InvalidBuffer || buffer <= otherBuffer) + { GetVisibilityMapPins(relation, buffer, otherBuffer, targetBlock, otherBlock, vmbuffer, vmbuffer_other); + GetFrozenMapPins(relation, buffer, otherBuffer, + targetBlock, otherBlock, fmbuffer, + fmbuffer_other); + } else + { GetVisibilityMapPins(relation, otherBuffer, buffer, otherBlock, targetBlock, vmbuffer_other, vmbuffer); + GetFrozenMapPins(relation, otherBuffer, buffer, + otherBlock, targetBlock, fmbuffer_other, + fmbuffer); + } /* * Now we can check to see if there's enough free space here. 
If so, diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c index 4f06a26..9a67733 100644 --- a/src/backend/access/rmgrdesc/heapdesc.c +++ b/src/backend/access/rmgrdesc/heapdesc.c @@ -149,6 +149,20 @@ heap2_desc(StringInfo buf, XLogReaderState *record) } } +void +heap3_desc(StringInfo buf, XLogReaderState *record) +{ + char *rec = XLogRecGetData(record); + uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; + + if (info == XLOG_HEAP3_FROZENMAP) + { + xl_heap_clean *xlrec = (xl_heap_clean *) rec; + + appendStringInfo(buf, "remxid %u", xlrec->latestRemovedXid); + } +} + const char * heap_identify(uint8 info) { @@ -226,3 +240,18 @@ heap2_identify(uint8 info) return id; } + +const char * +heap3_identify(uint8 info) +{ + const char *id = NULL; + + switch (info & ~XLR_INFO_MASK) + { + case XLOG_HEAP3_FROZENMAP: + id = "FROZENMAP"; + break; + } + + return id; +} diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index ce398fc..961775e 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -19,6 +19,7 @@ #include "postgres.h" +#include "access/frozenmap.h" #include "access/visibilitymap.h" #include "access/xact.h" #include "access/xlog.h" @@ -228,6 +229,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks) { bool fsm; bool vm; + bool fm; /* Open it at the smgr level if not already done */ RelationOpenSmgr(rel); @@ -238,6 +240,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks) rel->rd_smgr->smgr_targblock = InvalidBlockNumber; rel->rd_smgr->smgr_fsm_nblocks = InvalidBlockNumber; rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber; + rel->rd_smgr->smgr_fm_nblocks = InvalidBlockNumber; /* Truncate the FSM first if it exists */ fsm = smgrexists(rel->rd_smgr, FSM_FORKNUM); @@ -249,6 +252,11 @@ RelationTruncate(Relation rel, BlockNumber nblocks) if (vm) visibilitymap_truncate(rel, nblocks); + /* Truncate the frozen map too if it exists. */ + fm = smgrexists(rel->rd_smgr, FROZENMAP_FORKNUM); + if (fm) + frozenmap_truncate(rel, nblocks); + /* * We WAL-log the truncation before actually truncating, which means * trouble if the truncation fails. If we then crash, the WAL replay @@ -282,7 +290,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks) * with a truncated heap, but the FSM or visibility map would still * contain entries for the non-existent heap pages. */ - if (fsm || vm) + if (fsm || vm || fm) XLogFlush(lsn); } diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 3febdd5..80a9f96 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -17,6 +17,7 @@ */ #include "postgres.h" +#include "access/frozenmap.h" #include "access/multixact.h" #include "access/relscan.h" #include "access/rewriteheap.h" @@ -1484,6 +1485,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap, Oid mapped_tables[4]; int reindex_flags; int i; + Buffer fmbuffer = InvalidBuffer, + buf = InvalidBuffer; + Relation rel; + BlockNumber nblocks, blkno; /* Zero out possible results from swapped_relation_files */ memset(mapped_tables, 0, sizeof(mapped_tables)); @@ -1591,6 +1596,26 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap, RelationMapRemoveMapping(mapped_tables[i]); /* + * We can ensure that the all tuple of new relation has been completely + * frozen at this point since we aquired AccessExclusiveLock already. + * We set a bit on frozen map and flag to page header to each page. 
+ */ + rel = relation_open(OIDOldHeap, NoLock); + nblocks = RelationGetNumberOfBlocks(rel); + for (blkno = 0; blkno < nblocks; blkno++) + { + buf = ReadBuffer(rel, blkno); + PageSetAllFrozen(BufferGetPage(buf)); + frozenmap_pin(rel, blkno, &fmbuffer); + frozenmap_set(rel, blkno, buf, InvalidXLogRecPtr, fmbuffer); + ReleaseBuffer(buf); + } + + if (fmbuffer != InvalidBuffer) + ReleaseBuffer(fmbuffer); + relation_close(rel, NoLock); + + /* * At this point, everything is kosher except that, if we did toast swap * by links, the toast table's name corresponds to the transient table. * The name is irrelevant to the backend because it's referenced by OID, diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index c3d6e59..8e9940b 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -37,6 +37,7 @@ #include <math.h> +#include "access/frozenmap.h" #include "access/genam.h" #include "access/heapam.h" #include "access/heapam_xlog.h" @@ -106,6 +107,7 @@ typedef struct LVRelStats BlockNumber rel_pages; /* total number of pages */ BlockNumber scanned_pages; /* number of pages we examined */ BlockNumber pinskipped_pages; /* # of pages we skipped due to a pin */ + BlockNumber fmskipped_pages; /* # of pages we skipped by frozen map */ double scanned_tuples; /* counts only tuples on scanned pages */ double old_rel_tuples; /* previous value of pg_class.reltuples */ double new_rel_tuples; /* new estimated total # of tuples */ @@ -222,6 +224,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, * than or equal to the requested Xid full-table scan limit; or if the * table's minimum MultiXactId is older than or equal to the requested * mxid full-table scan limit. + * Even if scan_all is set so far, we could skip to scan some pages + * according by frozen map. */ scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid, xidFullScanLimit); @@ -247,20 +251,22 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, vac_close_indexes(nindexes, Irel, NoLock); /* - * Compute whether we actually scanned the whole relation. If we did, we - * can adjust relfrozenxid and relminmxid. + * Compute whether we actually scanned the whole relation. If we did, + * we can adjust relfrozenxid and relminmxid. * * NB: We need to check this before truncating the relation, because that * will change ->rel_pages. */ - if (vacrelstats->scanned_pages < vacrelstats->rel_pages) + if ((vacrelstats->scanned_pages + vacrelstats->fmskipped_pages) + < vacrelstats->rel_pages) { - Assert(!scan_all); scanned_all = false; } else scanned_all = true; + scanned_all |= scan_all; + /* * Optionally truncate the relation. 
* @@ -450,7 +456,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, IndexBulkDeleteResult **indstats; int i; PGRUsage ru0; - Buffer vmbuffer = InvalidBuffer; + Buffer vmbuffer = InvalidBuffer, + fmbuffer = InvalidBuffer; BlockNumber next_not_all_visible_block; bool skipping_all_visible_blocks; xl_heap_freeze_tuple *frozen; @@ -533,6 +540,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, hastup; int prev_dead_count; int nfrozen; + int already_nfrozen; /* # of tuples already frozen */ + int ntup_blk; /* # of tuples in single page */ Size freespace; bool all_visible_according_to_vm; bool all_visible; @@ -562,12 +571,33 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, else skipping_all_visible_blocks = false; all_visible_according_to_vm = false; + + /* Even if current block is not all-visible, we scan skip vacuum + * this block only when corresponding frozen map bit is set, and + * whole table scanning is required. + */ + if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all) + { + vacrelstats->fmskipped_pages++; + continue; + } } else { - /* Current block is all-visible */ + /* + * Current block is all-visible. + * If frozen map represents that it's all frozen and this + * function is called for freezing tuples, we can skip to + * vacuum block. + */ + if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all) + { + vacrelstats->fmskipped_pages++; + continue; + } if (skipping_all_visible_blocks && !scan_all) continue; + all_visible_according_to_vm = true; } @@ -592,6 +622,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, vmbuffer = InvalidBuffer; } + if (BufferIsValid(fmbuffer)) + { + ReleaseBuffer(fmbuffer); + fmbuffer = InvalidBuffer; + } + /* Log cleanup info before we touch indexes */ vacuum_log_cleanup_info(onerel, vacrelstats); @@ -621,6 +657,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, * and did a cycle of index vacuuming. */ visibilitymap_pin(onerel, blkno, &vmbuffer); + frozenmap_pin(onerel, blkno, &fmbuffer); buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno, RBM_NORMAL, vac_strategy); @@ -763,6 +800,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, all_visible = true; has_dead_tuples = false; nfrozen = 0; + already_nfrozen = 0; + ntup_blk = 0; hastup = false; prev_dead_count = vacrelstats->num_dead_tuples; maxoff = PageGetMaxOffsetNumber(page); @@ -917,8 +956,13 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, else { num_tuples += 1; + ntup_blk += 1; hastup = true; + /* If current tuple is already frozen, count it up */ + if (HeapTupleHeaderXminFrozen(tuple.t_data)) + already_nfrozen += 1; + /* * Each non-removable tuple must be checked to see if it needs * freezing. Note we already have exclusive buffer lock. @@ -952,6 +996,27 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, heap_execute_freeze_tuple(htup, &frozen[i]); } + /* + * If the un-frozen tuple is remaining in current page and + * current page is marked as ALL_FROZEN, we should clear it. + */ + if (ntup_blk != (nfrozen + already_nfrozen) + && PageIsAllFrozen(page)) + { + PageClearAllFrozen(page); + frozenmap_clear(onerel, blkno, fmbuffer); + } + /* + * As a result of scanning a page, we ensure that all tuples + * are completely frozen. Set bit on frozen map and PD_ALL_FROZEN + * flag on page. 
+ */ + else if (ntup_blk == (nfrozen + already_nfrozen)) + { + PageSetAllFrozen(page); + frozenmap_set(onerel, blkno, buf, InvalidXLogRecPtr, fmbuffer); + } + /* Now WAL-log freezing if neccessary */ if (RelationNeedsWAL(onerel)) { @@ -1077,13 +1142,18 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, num_tuples); /* - * Release any remaining pin on visibility map page. + * Release any remaining pin on visibility map and frozen map page. */ if (BufferIsValid(vmbuffer)) { ReleaseBuffer(vmbuffer); vmbuffer = InvalidBuffer; } + if (BufferIsValid(fmbuffer)) + { + ReleaseBuffer(fmbuffer); + fmbuffer = InvalidBuffer; + } /* If any tuples need to be deleted, perform final vacuum cycle */ /* XXX put a threshold on min number of tuples here? */ diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index f96fb24..67898df 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -92,7 +92,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList) if (exprType((Node *) tle->expr) != attr->atttypid) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("table row type and query-specified row type do not match"), + errmsg("table row type and query-specified row type do not match"), errdetail("Table has type %s at ordinal position %d, but query expects %s.", format_type_be(attr->atttypid), attno, @@ -117,7 +117,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList) if (attno != resultDesc->natts) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("table row type and query-specified row type do not match"), + errmsg("table row type and query-specified row type do not match"), errdetail("Query has too few columns."))); } diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index eb7293f..d66660d 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -55,6 +55,7 @@ typedef struct XLogRecordBuffer static void DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf); static void DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf); static void DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf); +static void DecodeHeap3Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf); static void DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf); static void DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf); @@ -104,6 +105,10 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor DecodeStandbyOp(ctx, &buf); break; + case RM_HEAP3_ID: + DecodeHeap3Op(ctx, &buf); + break; + case RM_HEAP2_ID: DecodeHeap2Op(ctx, &buf); break; @@ -300,6 +305,29 @@ DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) } /* + * Handle rmgr HEAP3_ID records for DecodeRecordIntoReorderBuffer(). + */ +static void +DecodeHeap3Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) +{ + uint8 info = XLogRecGetInfo(buf->record) & XLOG_HEAP_OPMASK; + SnapBuild *builder = ctx->snapshot_builder; + + /* no point in doing anything yet */ + if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) + return; + + switch (info) + { + case XLOG_HEAP3_FROZENMAP: + break; + default: + elog(ERROR, "unexpected RM_HEAP3_ID record type: %u", info); + } + +} + +/* * Handle rmgr HEAP2_ID records for DecodeRecordIntoReorderBuffer(). 
*/ static void diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c index 244b4ea..666e682 100644 --- a/src/backend/storage/smgr/smgr.c +++ b/src/backend/storage/smgr/smgr.c @@ -168,6 +168,7 @@ smgropen(RelFileNode rnode, BackendId backend) reln->smgr_targblock = InvalidBlockNumber; reln->smgr_fsm_nblocks = InvalidBlockNumber; reln->smgr_vm_nblocks = InvalidBlockNumber; + reln->smgr_fm_nblocks = InvalidBlockNumber; reln->smgr_which = 0; /* we only have md.c at present */ /* mark it not open */ diff --git a/src/common/relpath.c b/src/common/relpath.c index 66dfef1..7eba9ee 100644 --- a/src/common/relpath.c +++ b/src/common/relpath.c @@ -35,6 +35,7 @@ const char *const forkNames[] = { "main", /* MAIN_FORKNUM */ "fsm", /* FSM_FORKNUM */ "vm", /* VISIBILITYMAP_FORKNUM */ + "fm", /* FROZENMAP_FORKNUM */ "init" /* INIT_FORKNUM */ }; @@ -58,7 +59,7 @@ forkname_to_number(const char *forkName) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid fork name"), errhint("Valid fork names are \"main\", \"fsm\", " - "\"vm\", and \"init\"."))); + "\"vm\", \"fm\" and \"init\"."))); #endif return InvalidForkNumber; diff --git a/src/include/access/frozenmap.h b/src/include/access/frozenmap.h new file mode 100644 index 0000000..0f2e54e --- /dev/null +++ b/src/include/access/frozenmap.h @@ -0,0 +1,33 @@ +/*------------------------------------------------------------------------- + * + * frozenmap.h + * frozen map interface + * + * + * Portions Copyright (c) 2007-2015, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/access/frozenmap.h + * + *------------------------------------------------------------------------- + */ +#ifndef FROZENMAP_H +#define FROZENMAP_H + +#include "access/xlogdefs.h" +#include "storage/block.h" +#include "storage/buf.h" +#include "utils/relcache.h" + +extern void frozenmap_clear(Relation rel, BlockNumber heapBlk, + Buffer fmbuf); +extern void frozenmap_pin(Relation rel, BlockNumber heapBlk, + Buffer *fmbuf); +extern bool frozenmap_pin_ok(BlockNumber heapBlk, Buffer fmbuf); +extern void frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, + XLogRecPtr recptr, Buffer fmBuf); +extern bool frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *fmbuf); +extern BlockNumber frozenmap_count(Relation rel); +extern void frozenmap_truncate(Relation rel, BlockNumber nheapblocks); + +#endif /* FROZENMAP_H */ diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h index f0f89de..087cfeb 100644 --- a/src/include/access/heapam_xlog.h +++ b/src/include/access/heapam_xlog.h @@ -60,6 +60,13 @@ #define XLOG_HEAP2_NEW_CID 0x70 /* + * heapam.c has a third RmgrId now. These opcodes are associated with + * RM_HEAP3_ID, but are not logically different fromthe ones above + * asssociated with RM_HEAP_ID. XLOG_HEAP_OPMASK applies to these, too. + */ +#define XLOG_HEAP3_FROZENMAP 0x00 + +/* * xl_heap_* ->flag values, 8 bits are available. 
  */
 /* PD_ALL_VISIBLE was cleared */
@@ -73,6 +80,10 @@
 #define XLOG_HEAP_SUFFIX_FROM_OLD		(1<<6)
 /* last xl_heap_multi_insert record for one heap_multi_insert() call */
 #define XLOG_HEAP_LAST_MULTI_INSERT		(1<<7)
+/* PD_ALL_FROZEN was cleared for INSERT and UPDATE */
+#define XLOG_HEAP_ALL_FROZEN_CLEARED		(1<<8)
+/* PD_ALL_FROZEN was cleared in the new heap page of an UPDATE */
+#define XLOG_HEAP_NEW_ALL_FROZEN_CLEARED	(1<<9)
 
 /* convenience macro for checking whether any form of old tuple was logged */
 #define XLOG_HEAP_CONTAINS_OLD \
@@ -110,12 +121,12 @@ typedef struct xl_heap_header
 typedef struct xl_heap_insert
 {
 	OffsetNumber offnum;		/* inserted tuple's offset */
-	uint8		flags;
+	uint16		flags;
 
 	/* xl_heap_header & TUPLE DATA in backup block 0 */
 } xl_heap_insert;
 
-#define SizeOfHeapInsert	(offsetof(xl_heap_insert, flags) + sizeof(uint8))
+#define SizeOfHeapInsert	(offsetof(xl_heap_insert, flags) + sizeof(uint16))
 
 /*
  * This is what we need to know about a multi-insert.
@@ -130,7 +141,7 @@ typedef struct xl_heap_insert
  */
 typedef struct xl_heap_multi_insert
 {
-	uint8		flags;
+	uint16		flags;
 	uint16		ntuples;
 	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
 } xl_heap_multi_insert;
@@ -170,7 +181,7 @@ typedef struct xl_heap_update
 	TransactionId old_xmax;		/* xmax of the old tuple */
 	OffsetNumber old_offnum;	/* old tuple's offset */
 	uint8		old_infobits_set;	/* infomask bits to set on old tuple */
-	uint8		flags;
+	uint16		flags;
 	TransactionId new_xmax;		/* xmax of the new tuple */
 	OffsetNumber new_offnum;	/* new tuple's offset */
 
@@ -342,6 +353,9 @@ extern const char *heap_identify(uint8 info);
 extern void heap2_redo(XLogReaderState *record);
 extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
+extern void heap3_redo(XLogReaderState *record);
+extern void heap3_desc(StringInfo buf, XLogReaderState *record);
+extern const char *heap3_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
 extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
@@ -354,6 +368,8 @@ extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
 extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
 					TransactionId cutoff_xid, xl_heap_freeze_tuple *tuples,
 					int ntuples);
+extern XLogRecPtr log_heap_frozenmap(RelFileNode rnode, Buffer heap_buffer,
+					Buffer fm_buffer);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId cutoff_xid,
 						  TransactionId cutoff_multi,
diff --git a/src/include/access/hio.h b/src/include/access/hio.h
index b014029..1a27ee8 100644
--- a/src/include/access/hio.h
+++ b/src/include/access/hio.h
@@ -40,6 +40,8 @@ extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
 extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
 						  Buffer otherBuffer, int options,
 						  BulkInsertState bistate,
-						  Buffer *vmbuffer, Buffer *vmbuffer_other);
+						  Buffer *vmbuffer, Buffer *vmbuffer_other,
+						  Buffer *fmbuffer, Buffer *fmbuffer_other
+						  );
 
 #endif   /* HIO_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index 48f04c6..e49c0b0 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -34,6 +34,7 @@ PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, N
 PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL)
 PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL)
 PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL)
+PG_RMGR(RM_HEAP3_ID, "Heap3", heap3_redo, heap3_desc, heap3_identify, NULL, NULL)
 PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL)
 PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL)
 PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, NULL, NULL)
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 8b4c35c..8420e47 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -47,6 +47,8 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
 	float4		reltuples;		/* # of tuples (not always up-to-date) */
 	int32		relallvisible;	/* # of all-visible blocks (not always
 								 * up-to-date) */
+	int32		relallfrozen;	/* # of all-frozen blocks (not always
+								 * up-to-date) */
 	Oid			reltoastrelid;	/* OID of toast table; 0 if none */
 	bool		relhasindex;	/* T if has (or has had) any indexes */
 	bool		relisshared;	/* T if shared across databases */
@@ -95,7 +97,7 @@ typedef FormData_pg_class *Form_pg_class;
  * ----------------
  */
 
-#define Natts_pg_class					30
+#define Natts_pg_class					31
 #define Anum_pg_class_relname			1
 #define Anum_pg_class_relnamespace		2
 #define Anum_pg_class_reltype			3
@@ -107,25 +109,26 @@ typedef FormData_pg_class *Form_pg_class;
 #define Anum_pg_class_relpages			9
 #define Anum_pg_class_reltuples			10
 #define Anum_pg_class_relallvisible		11
-#define Anum_pg_class_reltoastrelid		12
-#define Anum_pg_class_relhasindex		13
-#define Anum_pg_class_relisshared		14
-#define Anum_pg_class_relpersistence	15
-#define Anum_pg_class_relkind			16
-#define Anum_pg_class_relnatts			17
-#define Anum_pg_class_relchecks		18
-#define Anum_pg_class_relhasoids		19
-#define Anum_pg_class_relhaspkey		20
-#define Anum_pg_class_relhasrules		21
-#define Anum_pg_class_relhastriggers	22
-#define Anum_pg_class_relhassubclass	23
-#define Anum_pg_class_relrowsecurity	24
-#define Anum_pg_class_relispopulated	25
-#define Anum_pg_class_relreplident		26
-#define Anum_pg_class_relfrozenxid		27
-#define Anum_pg_class_relminmxid		28
-#define Anum_pg_class_relacl			29
-#define Anum_pg_class_reloptions		30
+#define Anum_pg_class_relallfrozen		12
+#define Anum_pg_class_reltoastrelid		13
+#define Anum_pg_class_relhasindex		14
+#define Anum_pg_class_relisshared		15
+#define Anum_pg_class_relpersistence	16
+#define Anum_pg_class_relkind			17
+#define Anum_pg_class_relnatts			18
+#define Anum_pg_class_relchecks		19
+#define Anum_pg_class_relhasoids		20
+#define Anum_pg_class_relhaspkey		21
+#define Anum_pg_class_relhasrules		22
+#define Anum_pg_class_relhastriggers	23
+#define Anum_pg_class_relhassubclass	24
+#define Anum_pg_class_relrowsecurity	25
+#define Anum_pg_class_relispopulated	26
+#define Anum_pg_class_relreplident		27
+#define Anum_pg_class_relfrozenxid		28
+#define Anum_pg_class_relminmxid		29
+#define Anum_pg_class_relacl			30
+#define Anum_pg_class_reloptions		31
 
 /* ----------------
  *		initial contents of pg_class
@@ -140,13 +143,13 @@ typedef FormData_pg_class *Form_pg_class;
  * Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
 * similarly, "1" in relminmxid stands for FirstMultiXactId
 */
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 31 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
index a263779..5d40997 100644
--- a/src/include/common/relpath.h
+++ b/src/include/common/relpath.h
@@ -27,6 +27,7 @@ typedef enum ForkNumber
 	MAIN_FORKNUM = 0,
 	FSM_FORKNUM,
 	VISIBILITYMAP_FORKNUM,
+	FROZENMAP_FORKNUM,
 	INIT_FORKNUM
 
 	/*
@@ -38,7 +39,7 @@ typedef enum ForkNumber
 
 #define MAX_FORKNUM		INIT_FORKNUM
 
-#define FORKNAMECHARS	4		/* max chars for a fork name */
+#define FORKNAMECHARS	5		/* max chars for a fork name */
 
 extern const char *const forkNames[];
 
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index c2fbffc..f46375d 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -178,8 +178,10 @@ typedef PageHeaderData *PageHeader;
 										 * tuple? */
 #define PD_ALL_VISIBLE		0x0004		/* all tuples on page are visible to
 										 * everyone */
+#define PD_ALL_FROZEN		0x0008		/* all tuples on page are completely
+										 * frozen */
 
-#define PD_VALID_FLAG_BITS	0x0007		/* OR of all valid pd_flags bits */
+#define PD_VALID_FLAG_BITS	0x000F		/* OR of all valid pd_flags bits */
 
 /*
  * Page layout version number 0 is for pre-7.3 Postgres releases.
@@ -367,6 +369,13 @@ typedef PageHeaderData *PageHeader;
 #define PageClearAllVisible(page) \
 	(((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)
 
+#define PageIsAllFrozen(page) \
+	(((PageHeader) (page))->pd_flags & PD_ALL_FROZEN)
+#define PageSetAllFrozen(page) \
+	(((PageHeader) (page))->pd_flags |= PD_ALL_FROZEN)
+#define PageClearAllFrozen(page) \
+	(((PageHeader) (page))->pd_flags &= ~PD_ALL_FROZEN)
+
 #define PageIsPrunable(page, oldestxmin) \
 ( \
 	AssertMacro(TransactionIdIsNormal(oldestxmin)), \
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 69a624f..2173c20 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -55,6 +55,7 @@ typedef struct SMgrRelationData
 	BlockNumber smgr_targblock; /* current insertion target block */
 	BlockNumber smgr_fsm_nblocks;	/* last known size of fsm fork */
 	BlockNumber smgr_vm_nblocks;	/* last known size of vm fork */
+	BlockNumber smgr_fm_nblocks;	/* last known size of fm fork */
 
 	/* additional public fields may someday exist here */
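
(Not part of the patch, just a quick way to see the effect once it is applied. This assumes lazy vacuum keeps the new relallfrozen column up to date the same way it already does for relallvisible, and the table name is only an example:

    =# VACUUM test_tbl;
    =# SELECT relname, relallvisible, relallfrozen FROM pg_class WHERE relname = 'test_tbl';
    =# SELECT pg_relation_filepath('test_tbl');

Since the frozen map is a separate relation fork, it appears on disk as the file with the "_fm" suffix next to the path returned by pg_relation_filepath(), alongside the existing "_vm" and "_fsm" files.)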