On Wed, Mar 25, 2020 at 8:53 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> HINT: This is to be expected if this is the end of the WAL. Otherwise,
> it could indicate corruption.

First, I agree that this general issue is a problem, because it's come up for me in quite a number of customer situations. Either people get scared when they shouldn't, because the message is innocuous, or they don't get scared about other things that actually are scary, because if some scary-looking messages are actually innocuous, it can lead people to believe that the same is true in other cases.

Second, I don't really like the particular formulation you have above, because the user still doesn't know whether or not to be scared. Can we figure that out?

I think that if we're in crash recovery, we should not be scared, because we have no alternative to assuming that we've reached the end of WAL, so all crash recoveries will end like this. If we're in archive recovery, we should definitely be scared if we haven't yet reached the minimum recovery point, because more WAL than that should certainly exist. After that, it depends on how we got the WAL. If it's being streamed, the question is whether we've reached the end of what got streamed. If it's being copied from the archive, we ought to have the whole segment, but maybe not more.

Can we get the right context to the point where the error is being reported to know whether we hit the error at the end of the WAL that was streamed? If not, can we somehow rejigger things so that we only make it sound scary if we keep getting stuck at the same point when we would've expected to make progress meanwhile?

I'm just spitballing here, but it would be really good if there's a way to know definitively whether or not you should be scared. Corrupted WAL segments are definitely a thing that happens, but retries are a lot more common.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
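
As a rough illustration of the decision tree described above, here is a minimal Python sketch. It is not PostgreSQL code: every name in it (Mode, Verdict, RecoveryState, classify_wal_end, known_end_lsn, retry_limit) is hypothetical, LSNs are modeled as plain integers, and treating an error before the known end of the received WAL as suspicious is one reading of the reasoning rather than something the message states outright.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    CRASH = "crash recovery"
    ARCHIVE = "archive recovery"


class Verdict(Enum):
    EXPECTED = "expected end of WAL"
    SUSPICIOUS = "possible corruption"
    UNKNOWN = "cannot tell yet"


@dataclass
class RecoveryState:
    """Hypothetical snapshot of what the error-reporting code would know."""
    mode: Mode
    error_lsn: int                        # where the invalid record was found
    min_recovery_lsn: int = 0             # minimum recovery point (archive recovery)
    known_end_lsn: Optional[int] = None   # end of streamed WAL or of the restored
                                          # segment; None if that context is missing
    retries_at_same_lsn: int = 0          # times we got stuck at error_lsn


def classify_wal_end(state: RecoveryState, retry_limit: int = 3) -> Verdict:
    # Crash recovery: there is no alternative to assuming end of WAL.
    if state.mode is Mode.CRASH:
        return Verdict.EXPECTED

    # Archive recovery: WAL up to the minimum recovery point must exist.
    if state.error_lsn < state.min_recovery_lsn:
        return Verdict.SUSPICIOUS

    # If we know how much WAL we actually received, compare against it.
    if state.known_end_lsn is not None:
        if state.error_lsn >= state.known_end_lsn:
            return Verdict.EXPECTED
        return Verdict.SUSPICIOUS  # assumption: an error mid-stream is suspect

    # Fallback: only sound the alarm if we keep getting stuck at the same spot.
    if state.retries_at_same_lsn >= retry_limit:
        return Verdict.SUSPICIOUS
    return Verdict.UNKNOWN


if __name__ == "__main__":
    s = RecoveryState(mode=Mode.ARCHIVE, error_lsn=500, min_recovery_lsn=800)
    print(classify_wal_end(s).value)
```

The retry_limit of 3 is arbitrary; the point is only that the fallback defers the scary message until repeated lack of progress, as suggested above.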