On Fri, Feb 17, 2023 at 1:24 PM Julien Rouhaud <rjuju...@gmail.com> wrote: > > I was working on testing a major upgrade scenario using a mix of physical and > logical replication when I faced some unexpected problem leading to missing > rows. Note that my motivation is to rely on physical replication / physical > backup to avoid recreating a node from scratch using logical replication, as > the initial sync with logical replication is much more costly and impacting > compared to pg_basebackup / restoring a physical backup, but the same problem > exist if you just pg_upgrade a node that has subscriptions. > > The problem is that pg_upgrade creates the subscriptions on the newly upgraded > node using "WITH (connect = false)", which seems expected as you obviously > don't want to try to connect to the publisher at that point. But then once > the > newly upgraded node is restarted and ready to replace the previous one, unless > I'm missing something there's absolutely no possibility to use the created > subscriptions without losing some data from the publisher. > > The reason is that the subscription doesn't have a local list of relation to > process until you refresh the subscription, but you can't refresh the > subscription without enabling it (and you can't enable it in a transaction), > which means that you have to let the logical worker start, consume and ignore > all changes that happened on the publisher side until the refresh happens. > > An easy workaround that I tried is to allow something like > > ALTER SUBSCRIPTION ... ENABLE WITH (refresh = true, copy_data = false) > > so that the refresh internally happens before the apply worker is started and > you just keep consuming the delta, which works on naive scenario. > > One concern I have with this approach is that the default values for both > "refresh" and "copy_data" for all other subcommands is "true, but we would > probably need a different default value in that exact scenario (as we know we > already have the data). I think that it would otherwise be safe in my very > specific scenario, assuming that you created the slot beforehand and moved the > slot's LSN at the promotion point, as even if you add non-empty tables to the > publication you will only need the delta whether those were initially empty or > not given your initial physical replica state. >
This point is not very clear. Why would one just need delta even for new tables? > Any other scenario would make > this new option dangerous, if not entirely useless, but not more than any of > the current commands that lead to refreshing a subscription and have the same > options I guess. > > All in all, currently the only way to somewhat safely resume logical > replication after a pg_upgrade is to drop all the subscriptions that were > transferred during pg_upgrade on all databases and recreate them (using the > existing slots on the publisher side obviously), allowing the initial > connection. But this approach only works in the exact scenario I mentioned > (physical to logical replication, or at least a case where *all* the tables > where logically replicated prior to the pg_ugprade), otherwise you have to > recreate the follower node from scratch using logical repication. > I think if you dropped and recreated the subscriptions by retaining old slots, the replication should resume from where it left off before the upgrade. Which scenario are you concerned about? > Is that indeed the current behavior, or did I miss something? > > Is this "resume logical replication on pg_upgraded node" something we want to > support better? I was thinking that we could add a new pg_dump mode (maybe > only usable during pg_upgrade) that also restores the pg_subscription_rel > content in each subscription or something like that. If not, should > pg_upgrade > keep preserving the subscriptions as it doesn't seem safe to use them, or at > least document the hazards (I didn't find anything about it in the > documentation)? > > There is a mention of this in pg_dump docs. See [1] (When dumping logical replication subscriptions ...) [1] - https://www.postgresql.org/docs/devel/app-pgdump.html -- With Regards, Amit Kapila.