At Thu, 19 May 2022 16:42:31 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote in
> On Thu, May 19, 2022 at 3:16 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
> > probable theory is that ALTER SUBSCRIPTION will lead to restarting of
> > apply worker (which we can see in LOGS as well) and after the restart,

Yes.

> > the apply worker will use the existing slot and replication origin
> > corresponding to the subscription. Now, it is possible that before
> > restart the origin has not been updated and the WAL start location
> > points to a location prior to where PUBLICATION pub9 exists which can
> > lead to such an error. Once this error occurs, apply worker will never
> > be able to proceed and will always return the same error. Does this
> > make sense?

Wow. I didn't think along that line. That theory explains the silence
and makes sense, even though I don't see LSN transitions that clearly
support it. I dimly remember a similar kind of problem.

> > Unless you or others see a different theory, this seems to be the
> > existing problem in logical replication which is manifested by this
> > test. If we just want to fix these test failures, we can create a new
> > subscription instead of altering the existing publication to point to
> > the new publication.
> >
>
> If the above theory is correct then I think allowing the publisher to
> catch up with "$node_publisher->wait_for_catchup('sub1');" before
> ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
> both publisher and subscriber are in sync then the new publication
> should be visible to WALSender.

It looks right to me. That time travel seems unintuitive, but it's the
(current) way it works.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
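
The time-travel behavior discussed in this thread can be illustrated with a toy model. This is a hypothetical sketch, not PostgreSQL code. ToyPublisher, run_scenario, the integer LSNs, and the error text are all invented for illustration. The model rests on two assumptions taken from the theory above. First, the walsender resolves publication names as of the WAL position it is decoding. Second, a restarted apply worker resumes from the position recorded before the restart.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPublisher:
    """Hypothetical toy model of a publisher's catalog history and WAL.

    Every event gets an increasing integer "LSN". Not PostgreSQL code.
    """
    created_at: dict = field(default_factory=dict)  # publication -> LSN of its CREATE
    changes: list = field(default_factory=list)     # LSNs of data changes
    next_lsn: int = 1

    def create_publication(self, name):
        self.created_at[name] = self.next_lsn
        self.next_lsn += 1

    def insert_row(self):
        self.changes.append(self.next_lsn)
        self.next_lsn += 1

    def visible_at(self, name, lsn):
        # "Time travel": a lookup made while decoding the record at `lsn`
        # sees only publications that already existed at that LSN.
        return name in self.created_at and self.created_at[name] <= lsn

    def decode_from(self, start_lsn, publications):
        """Decode changes after start_lsn, checking publications historically."""
        decoded = []
        for lsn in self.changes:
            if lsn <= start_lsn:
                continue  # already confirmed before the restart
            for pub in publications:
                if not self.visible_at(pub, lsn):
                    # Toy error text, loosely modeled on the real failure.
                    raise LookupError(
                        f'publication "{pub}" does not exist (as of LSN {lsn})')
            decoded.append(lsn)
        return decoded


def run_scenario(caught_up):
    """Hypothetical replay of the test sequence around ALTER SUBSCRIPTION."""
    pub = ToyPublisher()
    pub.create_publication("pub1")  # LSN 1
    pub.insert_row()                # LSN 2: already applied by the subscriber
    pub.insert_row()                # LSN 3: not yet applied when ALTER runs
    pub.create_publication("pub9")  # LSN 4
    # Assumed resume point after the apply worker restarts. With catch-up it
    # is the publisher's current LSN; without it, it lags at LSN 2.
    resume_lsn = pub.next_lsn - 1 if caught_up else 2
    pub.insert_row()                # LSN 5: written after the ALTER
    try:
        return pub.decode_from(resume_lsn, ["pub9"])
    except LookupError as err:
        return str(err)


if __name__ == "__main__":
    print(run_scenario(caught_up=False))  # publication "pub9" does not exist (as of LSN 3)
    print(run_scenario(caught_up=True))   # [5]
```

In the lagging case, every retry resumes from the same LSN 2. Nothing advances that point, so the toy model fails in the same way each time. This matches the "will always return the same error" symptom. Waiting for catch-up before the ALTER moves the resume point past the creation of pub9, so every change decoded afterwards can see it.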