On Fri, Feb 02, 2018 at 12:21:49AM +0000, Simon Riggs wrote: > Yes, it would be about 99% of the time.
When it comes to recovery, I don't think that 99% is a guarantee sufficient. (Wondering about the maths behind such a number as well.) > But you have it backwards - we are not assuming that case. That is the > only case that has risk - the one where an old WAL record starts at > exactly the place the latest one stops. Otherwise the rest of the WAL > record will certainly fail the CRC check, since it will effectively > have random data in it, as you say. Your patch assumes that a single WAL segment recycling is fine to escape based on the file name, but you need to think beyond that. It seems to me that your assumption is wrong if the tail of a segment gets reused after more cycles than a single one, which could happen when doing recovery from an archive, where segments used could have junk in them. So you actually *increase* the odds of problems if a segment is forcibly switched and archived, then reused in recovery after being fetched from an archive. Since 9.4, the tail of WAL segments is filled with zeros so this brings more confidence, but I think that we cannot exchange the existing reliability with performance. So like others have expressed on this thread, the approach taken does not sound good to me, because we increase the risks of junk WAL records being used during recovery, just for the sake of performance. An approach that could be better is to replace XLogRecord->xl_prev by a new field in XLogPageHeaderData which tracks XLogRecPtr for the previous page, say xlp_prevpageaddr, and use that for all sanity checks that xlogreader.c is doing. If I can count with my fingers correctly, SizeOfXLogLongPHD is 40 bytes long, and SizeOfXLogShortPHD is 24 bytes, so even with an increase of 8 bytes for the page header you gain space as one record is usually much likely less than a full xlog page if you remove xl_prev. -- Michael
signature.asc
Description: PGP signature