On Tue, Aug 16, 2022 at 10:04 PM Robert Haas <robertmh...@gmail.com> wrote: > > Hi, > > I was looking at the code for pg_walinspect today and I think I may > have found a bug (or else I'm confused about how this all works, which > is also possible). ReadNextXLogRecord() takes an argument first_record > of type XLogRecPtr which is used only for error reporting purposes: if > we fail to read the next record for a reason other than end-of-WAL, we > complain that we couldn't read the WAL at the LSN specified by > first_record. > > ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(), > we're just reading a single record, and the LSN passed as first_record > is the LSN at which that record starts. Cool. But in the other two > callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple > records, and first_record is always passed as the LSN of the first > record. That's logical enough given the name of the argument, but the > effect of it seems to be that an error while reading any of the > records will be reported using the LSN of the first record, which does > not seem right.
Indeed. Thanks a lot for finding it. > By contrast, pg_rewind's extractPageMap() reports the error using > xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader > function that we're calling here is XLogReadRecord(), which in turn > calls XLogNextRecord(), which has this comment: > > /* > * state->EndRecPtr is expected to have been set by the last call to > * XLogBeginRead() or XLogNextRecord(), and is the location of the > * error. > */ > > Thoughts? Agreed. Here's a patch (for V15 as well) fixing this bug, please review. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
v1-master-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch
Description: Binary data
v1-PG15-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch
Description: Binary data