Thanks for the suggestions. I think in the future I'll do something like this rather than try to re-use existing subscriptions.
I'm still trying to understand what went wrong though. Putting a finer point on my question: Does pg_upgrade mess up disabled subscriptions? On Fri, May 19, 2023 at 1:55 PM Elterman, Michael <melter...@enova.com> wrote: > Please, use the following runbook. > 1. Disable the subscription to pg10. > 2. Disable Application Users on Publisher. > 3. Drop all replication slots on Publisher (The upgrade can not be > executed if there are any replication slots) > 4. Run RDS's upgrade (which runs pg_upgrade). > 5. Recreate replication slots with the same names. > 6. Enable Application Users on Publisher. > 7. Re-Enable the subscriptions to the newly upgraded server. > Good luck > > On Fri, May 19, 2023 at 11:49 AM Mike Lissner < > mliss...@michaeljaylissner.com> wrote: > >> I also am realizing belatedly that my solution of dropping the subscriber >> probably won't work anyway, since I'd lose the changes on the publisher for >> the duration of the upgrade. Maybe I could drop the subscription while >> keeping the slot on the publisher, and then create a new subscription after >> the upgrade using that slot and copy_data=False? Getting wonky. >> >> On Fri, May 19, 2023 at 8:17 AM Mike Lissner < >> mliss...@michaeljaylissner.com> wrote: >> >>> Hi all, >>> >>> In AWS RDS, we are using logical replication between a postgresql 14 >>> publisher and a postgresql 10 subscriber. The subscriber is rather old, so >>> yesterday I tried to update it using AWS's built in upgrade tool (which >>> uses pg_upgrade behind the scenes). >>> >>> I did a pretty thorough test run before beginning, but the live run went >>> pretty poorly. My process was: >>> >>> 1. Disable the subscription to pg10. >>> 2. Run RDS's upgrade (which runs pg_upgrade). >>> 3. Re-Enable the subscription to the newly upgraded server. >>> >>> The idea was that the publisher could still be live and collect changes, >>> and then on step 3, those changes would flush to the newly upgraded server. >>> >>> When I hit step three, things went awry. From what I can tell, it seems >>> like pg_upgrade might have wiped out the LSN location of the subscriber, >>> because I was getting many messages in the logs saying: >>> >>> 2023-05-19 01:01:09 >>> UTC:100.20.224.120(56536):django@courtlistener:[29669]:STATEMENT: >>> CREATE_REPLICATION_SLOT "pg_18278_sync_86449755_7234675743763347169" >>> LOGICAL pgoutput USE_SNAPSHOT2023-05-19 01:01:09 >>> UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: replication >>> slot "pg_18278_sync_16561_7234675743763347169" does not exist2023-05-19 >>> 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:STATEMENT: >>> DROP_REPLICATION_SLOT pg_18278_sync_16561_7234675743763347169 >>> WAIT2023-05-19 01:01:09 >>> UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: all >>> replication slots are in use2023-05-19 01:01:09 >>> UTC:100.20.224.120(56550):django@courtlistener:[29670]:HINT: Free one or >>> increase max_replication_slots. >>> >>> I followed those instructions, and upped max_replication_slots to 200. >>> That fixed that error, but then I had errors about COPY commands failing, >>> and looking in the publisher I saw about 150 slots like: >>> >>> select * from pg_replication_slots ; >>> slot_name | plugin | slot_type | datoid >>> | database | temporary | active | active_pid | xmin | catalog_xmin | >>> restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase >>> --------------------------------------------+----------+-----------+--------+---------------+-----------+--------+------------+------+--------------+--------------+---------------------+------------+---------------+----------- >>> pg_18278_sync_86449408_7234675743763347169 | pgoutput | logical | 16428 >>> | courtlistener | f | t | 6906 | | 859962500 | >>> EA5/954A9F18 | | reserved | | f >>> pg_18278_sync_20492279_7234675743763347169 | pgoutput | logical | 16428 >>> | courtlistener | f | f | | | 859962448 | >>> EA5/9548EDF0 | EA5/9548EE28 | reserved | | f >>> pg_18278_sync_16940_7234675743763347169 | pgoutput | logical | 16428 >>> | courtlistener | f | f | | | 859962448 | >>> EA5/9548EE60 | EA5/9548EE98 | reserved | | f >>> >>> >>> So this looks like it's trying to sync all of the existing tables all >>> over again when I re-enabled the subscription. >>> >>> Does that make sense? In the future, I'll DROP the subscription and then >>> create a new one with copy_data=False, but this was a real gotcha. >>> >>> Anybody know what's going on here? >>> >>> Thanks, >>> >>> Mike >>> >>