On Thu, Aug 3, 2023 at 9:12 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
> > Just to confirm, which subversion of PG15 are you using on the subscriber > side ? > Is it 15.3 ? I am confirming because I recall we fixed similar bug before > in commit [1]. > Yes. Specifically on the target/subscriber: # select version(); version ----------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 15.3 (Ubuntu 15.3-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit (1 row) Installed on Ubuntu 22.04 via the PGDG apt repo. The source/publisher side is: # select version(); version ----------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 12.15 (Ubuntu 12.15-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit (1 row) Are there any ERRORs(including erros other than the slots number) in the > log of publisher and subscriber. > It would be helpful to see the log in case there are some other ERRORs > that could cause problem. > On the subscriber I see a series of errors similar to this: 2023-08-03 18:22:50.196 UTC [239889] LOG: logical replication table synchronization worker for subscription "sub01", table "users" has started 2023-08-03 18:22:50.209 UTC [239889] ERROR: could not create replication slot "pg_19742_sync_18602_7263122209699118815": ERROR: all replication slots are in use HINT: Free one or increase max_replication_slots. on the publisher I see this corresponding message: 2023-08-03 18:22:50.199 UTC [13884] [unknown]@[unknown] - [unknown] LOG: connection received: host=10.228.88.74 port=47338 2023-08-03 18:22:50.207 UTC [13884] migrator@foo - [unknown] LOG: replication connection authorized: user=migrator application_name=pg_19742_sync_18602_7263122209699118815 SSL enabled (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off) 2023-08-03 18:22:50.209 UTC [13884] migrator@foo - pg_19742_sync_18602_7263122209699118815 ERROR: all replication slots are in use 2023-08-03 18:22:50.209 UTC [13884] migrator@foo - pg_19742_sync_18602_7263122209699118815 HINT: Free one or increase max_replication_slots. 2023-08-03 18:22:50.209 UTC [13884] migrator@foo - pg_19742_sync_18602_7263122209699118815 STATEMENT: CREATE_REPLICATION_SLOT "pg_19742_sync_18602_7263122209699118815" LOGICAL pgoutput USE_SNAPSHOT 2023-08-03 18:22:50.209 UTC [13884] migrator@foo - pg_19742_sync_18602_7263122209699118815 LOG: disconnection: session time: 0:00:00.009 user=migrator database=foo host=10.228.88.74 port=47338 and later they include this message about trying to drop them first on the subscriber: 2023-08-03 18:47:39.485 UTC [263502] LOG: logical replication table synchronization worker for subscription "sub01", table "foo_accounts" has started 2023-08-03 18:47:39.496 UTC [263502] LOG: could not drop replication slot "pg_19742_sync_17238_7263122209699118815" on publisher: ERROR: replication slot "pg_19742_sync_17238_7263122209699118815" does not exist 2023-08-03 18:47:39.497 UTC [263502] ERROR: could not create replication slot "pg_19742_sync_17238_7263122209699118815": ERROR: all replication slots are in use HINT: Free one or increase max_replication_slots. and on the publisher side for that: 2023-08-03 18:47:39.487 UTC [3232] [unknown]@[unknown] - [unknown] LOG: connection received: host=10.228.88.74 port=17390 2023-08-03 18:47:39.494 UTC [3232] migrator@foo - [unknown] LOG: replication connection authorized: user=migrator application_name=pg_19742_sync_17238_7263122209699118815 SSL enabled (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off) 2023-08-03 18:47:39.495 UTC [3232] migrator@foo - pg_19742_sync_17238_7263122209699118815 ERROR: replication slot "pg_19742_sync_17238_7263122209699118815" does not exist 2023-08-03 18:47:39.495 UTC [3232] migrator@foo - pg_19742_sync_17238_7263122209699118815 STATEMENT: DROP_REPLICATION_SLOT pg_19742_sync_17238_7263122209699118815 WAIT 2023-08-03 18:47:39.496 UTC [3232] migrator@foo - pg_19742_sync_17238_7263122209699118815 ERROR: all replication slots are in use 2023-08-03 18:47:39.496 UTC [3232] migrator@foo - pg_19742_sync_17238_7263122209699118815 HINT: Free one or increase max_replication_slots. 2023-08-03 18:47:39.496 UTC [3232] migrator@foo - pg_19742_sync_17238_7263122209699118815 STATEMENT: CREATE_REPLICATION_SLOT "pg_19742_sync_17238_7263122209699118815" LOGICAL pgoutput USE_SNAPSHOT 2023-08-03 18:47:39.496 UTC [3232] migrator@foo - pg_19742_sync_17238_7263122209699118815 LOG: disconnection: session time: 0:00:00.009 user=migrator datab ase=foo host=10.228.88.74 port=17390 and then this from a different pid, maybe some kind of cleanup process? It was doing this for what seemed like all of the sync slots: 2023-08-03 18:47:47.975 UTC [3303] migrator@foo - sub01 ERROR: replication slot "pg_19742_sync_17238_7263122209699118815" does not exist 2023-08-03 18:47:47.975 UTC [3303] migrator@foo - sub01 STATEMENT: DROP_REPLICATION_SLOT pg_19742_sync_17238_7263122209699118815 WAIT -- Don Seiler www.seiler.us