> On 02/13/2021 11:49 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> 
> On Fri, Feb 12, 2021 at 10:00 PM <e...@xs4all.nl> wrote:
> >
> > > On 02/12/2021 1:51 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Fri, Feb 12, 2021 at 6:04 PM Erik Rijkers <e...@xs4all.nl> wrote:
> > > >
> > > > I am seeing errors in replication in a test program that I've been 
> > > > running for years with very little change (since 2017, really [1]).
> >
> > Hi,
> >
> > Here is a test program.  Careful, it deletes stuff.  And it will need some 
> > changes:
> >
> 
> Thanks for sharing the test. I think I have found the problem.
> Actually, it was an existing code problem exposed by the commit
> ce0fdbfe97. In pgoutput_begin_txn(), we were sometimes sending the
> prepare_write ('w') message but then the actual message was not being
> sent. This was the case when we didn't find the origin of a txn. This
> can happen after that commit because we have now started using origins
> for tablesync workers as well and those origins are removed once the
> tablesync workers are finished. We might want to change the behavior
> related to the origin messages as indicated in the comments, but for
> now I am only fixing the existing code.
> 
> Can you please test if the attached fixes the problem at your end as well?

> [fix_origin_message_1.patch]
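
To make the failure mode described above concrete, here is a minimal
standalone sketch in C. It is a toy model only: it is not the pgoutput
code and not the attached patch. Every name in it (ToyStream,
toy_prepare_write, toy_write, toy_lookup_origin, begin_txn_buggy,
begin_txn_fixed) is hypothetical, and the assumption that origin id 1
still exists while every other id has been dropped is made up for the
illustration. It shows how opening a frame before the origin lookup
leaves an empty frame on the wire when the lookup fails, and how doing
the lookup first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the output stream; all names are hypothetical. */
typedef struct {
    int    frames_sent;   /* number of frames emitted */
    int    empty_frames;  /* frames emitted with no payload */
    size_t pending_len;   /* payload bytes in the frame being built */
    bool   in_frame;      /* true between prepare and write */
} ToyStream;

/* Open a new frame (models sending the 'w' header). */
static void toy_prepare_write(ToyStream *s)
{
    s->in_frame = true;
    s->pending_len = 0;
}

/* Add payload to the frame currently being built. */
static void toy_append(ToyStream *s, const char *payload)
{
    assert(s->in_frame);
    s->pending_len += strlen(payload);
}

/* Close and emit the frame, recording whether it was empty. */
static void toy_write(ToyStream *s)
{
    assert(s->in_frame);
    s->frames_sent++;
    if (s->pending_len == 0)
        s->empty_frames++;
    s->in_frame = false;
}

/* Assumption for this sketch: only origin id 1 still exists. */
static const char *toy_lookup_origin(int origin_id)
{
    return origin_id == 1 ? "origin_1" : NULL;
}

/* Opens the origin frame before knowing whether there is an origin. */
static void begin_txn_buggy(ToyStream *s, int origin_id)
{
    toy_prepare_write(s);
    toy_append(s, "BEGIN");
    toy_write(s);

    if (origin_id != 0) {
        const char *origin;

        toy_prepare_write(s);            /* frame opened too early */
        origin = toy_lookup_origin(origin_id);
        if (origin != NULL)
            toy_append(s, origin);
        toy_write(s);                    /* empty frame if lookup failed */
    }
}

/* Looks the origin up first and opens a frame only if it was found. */
static void begin_txn_fixed(ToyStream *s, int origin_id)
{
    const char *origin = NULL;

    toy_prepare_write(s);
    toy_append(s, "BEGIN");
    toy_write(s);

    if (origin_id != 0)
        origin = toy_lookup_origin(origin_id);
    if (origin != NULL) {
        toy_prepare_write(s);
        toy_append(s, origin);
        toy_write(s);
    }
}
```

Calling begin_txn_buggy() on a zero-initialized ToyStream with an origin
id that no longer resolves emits two frames, one of them empty; the same
call to begin_txn_fixed() emits only the BEGIN frame.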

I just compiled a binary from HEAD and a binary from HEAD+patch.

HEAD is still broken; your patch rescues it, so yes, fixed.

Maybe a test (check or check-world) should be added to run a second replica?  
(Assuming that would have caught this bug)


Thanks,

Erik Rijkers
 
> 
> -- 
> With Regards,
> Amit Kapila.

