> On 02/13/2021 11:49 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 10:00 PM <e...@xs4all.nl> wrote:
> >
> > > On 02/12/2021 1:51 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Fri, Feb 12, 2021 at 6:04 PM Erik Rijkers <e...@xs4all.nl> wrote:
> > > >
> > > > I am seeing errors in replication in a test program that I've been
> > > > running for years with very little change (since 2017, really [1]).
> >
> > Hi,
> >
> > Here is a test program. Careful, it deletes stuff. And it will need some
> > changes:
> >
>
> Thanks for sharing the test. I think I have found the problem.
> Actually, it was an existing code problem exposed by the commit
> ce0fdbfe97. In pgoutput_begin_txn(), we were sometimes sending the
> prepare_write ('w') message but then the actual message was not being
> sent. This was the case when we didn't found the origin of a txn. This
> can happen after that commit because we have now started using origins
> for tablesync workers as well and those origins are removed once the
> tablesync workers are finished. We might want to change the behavior
> related to the origin messages as indicated in the comments but for
> now, fixing the existing code.
>
> Can you please test if the attached fixes the problem at your end as well?
> [fix_origin_message_1.patch]

I just compiled a binary from HEAD and a binary from HEAD+patch.

HEAD is still broken; your patch rescues it, so yes, fixed.

Maybe a test (check or check-world) should be added that runs a second replica? (Assuming that would have caught this bug.)

Thanks,

Erik Rijkers

>
> --
> With Regards,
> Amit Kapila.