On Mon, Aug 16, 2021 at 9:24 AM Michael Paquier <mich...@paquier.xyz> wrote: > > Hi all, > > topminnow has just failed in a weird fashion: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-08-15%2013%3A24%3A58 > # SELECT pid != FROM pg_stat_replication WHERE application_name = 'tap_sub'; > # expecting this output: > # t > # last actual query output: > # > # with stderr: > # ERROR: syntax error at or near "FROM" > # LINE 1: SELECT pid != FROM pg_stat_replication WHERE application_na... > > Looking at the logs, it seems like the problem boils down to an active > slot when using ALTER SUBSCRIPTION tap_sub CONNECTION: > 2021-08-15 18:44:38.027 CEST [16473:2] ERROR: could not start WAL > streaming: ERROR: replication slot "tap_sub" is active for PID 16336 >
The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the walsender process. Now, here the problem seems to be that the previous walsender process (16336) didn't exit and the new process started to use the slot which leads to the error shown in the log. This is evident from the below part of log where we can see that 16336 has exited after new process started to use the slot. 2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR: replication slot "tap_sub" is active for PID 16336 2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG: disconnection: session time: 0:00:00.036 user=nm database=postgres host=[local] 2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG: disconnection: session time: 0:00:06.367 user=nm database=postgres host=[local] One idea to solve this is to first disable the subscription, wait for the walsender process to exit, and then change the connection string and re-enable the subscription. -- With Regards, Amit Kapila.