On Thu, Jul 15, 2021 at 6:14 AM Jeremy Schneider <schnj...@amazon.com> wrote:
>
> On 7/2/21 18:57, Jeremy Schneider wrote:
>
> The process of trying to understand this recent incident has given me some
> new insight about what information would be helpful up front in this error
> message for faster resolution.
>
> First off, and most importantly, the current WAL record we're processing when
> the error is encountered. I wonder if it could easily print the LSN?
>
> Secondly, the transaction ID. In the specific bug Bertrand found, the problem
> is actually not with the actual WAL record that's being processed - but
> rather because previous WAL records in the same transaction left the decoder
> process in a state where the current WAL record [a commit] generated an
> error. So it's the entire transaction that needs to be examined to reproduce
> the error. (Andres actually pointed this out on the original thread back in
> December 2019.) I realize that once you know the LSN you can easily get the
> XID with pg_waldump, but personally I'd just as soon include the XID in the
> error message since I think it will usually be a first step for debugging any
> problems with WAL decoding. Then I can go straight to filtering that XID on my
> first pg_waldump run.
>

I don't think it is a bad idea to print additional information as you
are suggesting but why only for this error? It could be useful to
investigate any other error we get during decoding. I think normally we
add such additional information via error_context. We have recently
added/enhanced it for apply-workers, see commit [1].

I think here we should just print the relation name in the error
message you pointed out and then work on adding additional information
via error context as a separate patch. What do you think?

[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=abc0910e2e0adfc5a17e035465ee31242e32c4fc

--
With Regards,
Amit Kapila.