On 8/7/18 11:42 AM, Stephen Frost wrote:
>
>>> CRC's are per WAL record, and possibly some WAL records might not be ok
>>> to replay, or at least we need to make sure that we replay the right set
>>> of WAL in the right order even when there are partial WAL files being
>>> given to PG (that aren't named that way...). The more I think about
>>> this, I think we really need to avoid partial WAL files entirely- what
>>> are we going to do when we get to the end of one? We'd need to request
>>> the full one from the restore command anyway, so seems like we should
>>> just go ahead and get it from the archive, the question is if there's an
>>> easy/cheap way to detect partial WAL files in pg_wal.
>>
>> As explained above, I don't think this is actually a problem. The checksums
>> do cover the whole file thanks to chaining, and there are ways to detect
>> partial segments. IMHO it's fine if we replay a segment and then find out it
>> was partial and that we need to fetch it from archive anyway and re-apply it
>> - it should not be very common case, except when the user does something
>> silly.
>
> As long as we *do* go off and try to fetch that WAL file and replay it,
> and don't assume that the end of that partial WAL file means the end of
> WAL replay, then I think you may be right and that it'd be fine, but it
> does seem a bit risky to me.

This assumes that the local partial is a subset of the archived full WAL
segment, which should be true in most cases, but I don't think we can
discount the possibility that it isn't. Split-brain is certainly one way
to end up with differing partials, though in that case things are already
pretty bad.

I've seen some pretty messed up situations, and usually it is best to
treat the WAL archive as the ground truth. If the archive_command is
smart enough not to overwrite WAL segments that already exist with
different versions, then the archive should be a reliable record that all
servers can be replayed from (split-brains aside).

I think it's best to treat the local WAL with some suspicion unless it is
known to be good, i.e. just restored from archive.

I do agree that most inconsistencies could be detected and reported as an
error, but only if the WAL in the repository is examined, which means
making a round-trip there anyway.

At the very least, it seems that simply enabling "read from pg_wal first"
is not a good idea without making other changes to ensure it is done
correctly.

Regards,
--
-David
da...@pgmasters.net
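
To make the "smart enough not to overwrite" point concrete, here is a
minimal sketch of the archiving side, assuming a plain directory as the
archive. The function names and the directory layout are hypothetical and
are not taken from any existing tool. An identical segment that is already
archived counts as success, while a different file under the same name is
refused and left untouched.

```python
import os
import shutil
import tempfile


def _same_contents(path_a, path_b, chunk_size=1024 * 1024):
    """Return True only if both files are byte-for-byte identical."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                return True


def archive_segment(src_path, archive_dir):
    """Copy one WAL segment into archive_dir without ever overwriting.

    Returns 0 on success (including an identical copy already archived)
    and 1 if a different file with the same name is already present.
    """
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))

    if os.path.exists(dest_path):
        return 0 if _same_contents(src_path, dest_path) else 1

    # Copy under a temporary name first so that a crash never leaves a
    # short file sitting under the final segment name.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        try:
            # os.link raises FileExistsError if dest_path appeared in the
            # meantime, so a concurrent archiver is never overwritten.
            os.link(tmp_path, dest_path)
        except FileExistsError:
            return 0 if _same_contents(src_path, dest_path) else 1
    finally:
        os.unlink(tmp_path)
    return 0
```

A wrapper script would call archive_segment() with the path that
archive_command receives as %p and use the return value as its exit
status, so that a conflicting segment shows up as an archiving failure
instead of silently replacing what other servers may already have
replayed. Syncing the archive directory itself is left out of the sketch.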