On 2015-12-09 16:50:06 -0500, Robert Haas wrote: > Now, if subsequent to this an index scan happens to sweep through and > try to fetch a block in 123456.2, it will work! This happens because > _mdfd_getseg() doesn't care about the length of the segments; it only > cares whether or not they exist.
Unless in recovery in the startup process, or when EXTENSION_CREATE is passed to it. Depending on whether it or mdnblocks were called first, and depending on which segment is missing. In that case it'd *possibly* pad the last block in a sgment with zeroes, Yes, only the last block: * * We have to maintain the invariant that segments before the last * active segment are of size RELSEG_SIZE; therefore, pad them out * with zeroes if needed. (This only matters if caller is * extending the relation discontiguously, but that can happen in * hash indexes.) */ if (behavior == EXTENSION_CREATE || InRecovery) { if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE) { char *zerobuf = palloc0(BLCKSZ); mdextend(reln, forknum, nextsegno * ((BlockNumber) RELSEG_SIZE) - 1, zerobuf, skipFsync); pfree(zerobuf); } v->mdfd_chain = _mdfd_openseg(reln, forknum, +nextsegno, O_CREAT); } > If 123456.1 were actually missing, > then we'd never test whether 123456.2 exists and we'd get an error. > But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite > happy to fetch blocks in 123456.2; it considers the empty 123456.1 > file to be a sufficient reason to look for 123456.2, and seeing that > file, and finding the block it wants therein, it happily returns that > block, blithely ignoring the fact that it passed over a completely .1 > segment before returning a block from .2. This is maybe not the > smartest thing ever. I think it might be worthwhile to add a check against this. It'd not be very hard to ensure the segment is indeed large enough iff further segments exist. Possibly not an ERROR, but a WARNING might be useful. To avoid overhead and absurd log spamming we should make that check only if the the segment isn't already open. > The comment in mdnblocks.c says this: > > * Because we pass O_CREAT, we will create the > next segment (with > * zero length) immediately, if the last > segment is of length > * RELSEG_SIZE. While perhaps not strictly > necessary, this keeps > * the logic simple. > > I don't really see how this "keeps the logic simple". What it does is > allow sequential scans and index scans to have two completely > different notions of how many accessible blocks there are in the > relation. Granted, this kind of thing really shouldn't happen, but > sometimes bad things do happen. Therefore, I propose the attached > patch. I'm in favor of changing this behaviour - it's indeed weird and somewhat error prone. Do you want to change this in master, or backpatch a change? To me the code in md.c is subtle and complicated enough that I'm rather inclined to only change this in master. This code is really hard to test effectively :(. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers