On Thu, Jul 15, 2021 at 6:14 AM Jeremy Schneider <schnj...@amazon.com> wrote:
>
> On 7/2/21 18:57, Jeremy Schneider wrote:
>
> The process of trying to understand this recent incident has given me some 
> new insight about what information would be helpful up front in this error 
> message for faster resolution.
>
> First off, and most importantly: the current WAL record we're processing when
> the error is encountered. I wonder if it could easily print the LSN?
>
> Secondly, the transaction ID. In the specific bug Bertrand found, the problem
> is not with the WAL record that's being processed; rather, previous WAL
> records in the same transaction left the decoder process in a state where
> the current WAL record [a commit] generated an error. So it's the entire
> transaction that needs to be examined to reproduce
> the error.  (Andres actually pointed this out on the original thread back in 
> December 2019.)  I realize that once you know the LSN you can easily get the 
> XID with pg_waldump, but personally I'd just as soon include the XID in the 
> error message since I think it will usually be a first step for debugging any 
> problems with WAL decoding. Then I can go straight to filtering that XID on my
> first pg_waldump run.
>

I don't think it is a bad idea to print additional information as you
are suggesting, but why only for this error? It could be useful to
investigate any other error we get during decoding. I think normally
we add such additional information via error_context. We have recently
added/enhanced it for apply workers; see commit [1].

I think here we should just print the relation name in the error
message you pointed out and then work on adding additional information
via error context as a separate patch. What do you think?

[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=abc0910e2e0adfc5a17e035465ee31242e32c4fc


--
With Regards,
Amit Kapila.

