On 2015-12-15 10:53:58 -0500, Robert Haas wrote:
> On Tue, Dec 15, 2015 at 9:51 AM, Andres Freund <and...@anarazel.de> wrote:
> > Unless in recovery in the startup process, or when EXTENSION_CREATE is
> > passed to it. Depending on whether it or mdnblocks were called first,
> > and depending on which segment is missing. In that case it'd *possibly*
> > pad the last block in a segment with zeroes, Yes, only the last block:
>
> Yes, that's clearly inadequate. I think the fact that it only pads
> the last block, possibly creating a sparse file, should also be fixed,
> but as a separate patch.

I was wondering about that - but after a bit of thinking I'm disinclined
to go that way. Consider the scenario where we write to the last block
in a humongous relation, drop that relation, and then crash before a
checkpoint. Right now we'll write the last block of each segment during
recovery. After such a change we'd end up rewriting the whole file... I
think we should primarily update that comment.

> 1. Before extending a relation to RELSEG_SIZE, verify that the next
> segment of the relation doesn't already exist on disk. If we find out
> that it does, then throw an error and refuse the extension.

That bit wouldn't work directly that way IIRC - the segment is allowed
to exist as an 'inactive' segment. We only truncate segments to 0 in
mdtruncate()... But a size 0 check should do the trick.

> 2. Teach _mdfd_getseg() that a segment of a size less than RELSEG_SIZE
> is, by definition, the last segment, and don't even check whether
> further files exist on disk. mdnblocks() already behaves this way, so
> it would just be making things consistent.

> However, I don't think this is exactly what you are proposing. I'm
> skeptical of the idea that _mdfd_getseg() should probe ahead to see
> whether we're dealing with a malformed relation where the intermediate
> segments still exist but have zero length.

That's not exactly what I was thinking of. I was thinking of doing a
_mdnblocks(reln, forknum, v) == RELSEG_SIZE check in _mdfd_getseg()'s
main loop, whenever nextsegno < targetseg. That'll make that check
rather cheap. Which sounds pretty much like your 2).

Andres
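
To make the proposed check concrete, here is a rough standalone sketch
of the idea: while walking towards targetseg, any earlier segment that
is not exactly RELSEG_SIZE blocks long is treated as the last segment,
and the lookup stops there. This is not the md.c code. FakeRel and
find_segment() are hypothetical stand-ins invented for illustration,
the per-segment block counts come from a plain array instead of
_mdnblocks(), and RELSEG_SIZE is set to a tiny value so the cases are
easy to follow.

```c
#include <assert.h>
#include <stddef.h>

/* Blocks per segment; deliberately tiny here (assumption for the example). */
#define RELSEG_SIZE 4

/* Hypothetical stand-in for one relation fork: block count of each segment file. */
typedef struct
{
    int         nsegs;          /* number of segment files present */
    const int  *seg_nblocks;    /* blocks in segment 0 .. nsegs-1 */
} FakeRel;

/*
 * Walk from segment 0 towards targetseg. Every segment before the target
 * must be exactly RELSEG_SIZE blocks long; a shorter one (including a
 * zero-length one) is by definition the last segment, so the lookup fails.
 *
 * Returns targetseg on success, -1 if the target cannot be reached.
 */
static int
find_segment(const FakeRel *rel, int targetseg)
{
    assert(rel != NULL && targetseg >= 0);

    for (int nextsegno = 0; nextsegno < targetseg; nextsegno++)
    {
        /* Segment file missing entirely. */
        if (nextsegno >= rel->nsegs)
            return -1;

        /* The cheap check: an intermediate segment must be full. */
        if (rel->seg_nblocks[nextsegno] != RELSEG_SIZE)
            return -1;
    }

    /* The target itself must exist, but it may be partially filled. */
    if (targetseg >= rel->nsegs)
        return -1;

    return targetseg;
}
```

With segment sizes {4, 4, 2}, asking for segment 2 succeeds. With
{4, 0, 3}, asking for segment 2 fails, because the zero-length segment 1
ends the relation and the leftover file behind it is never consulted.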