On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro <thomas.mu...@gmail.com> wrote: > > > > Hi, > > > > This seems to be a new low frequency failure, I didn't see it mentioned > > already: > > > > Thanks for reporting, I'll look into it. >
By looking at the logs [1] in the buildfarm, I think I know what is going on here. After Create Subscription, the tablesync worker is launched and tries to create the slot for doing the initial copy but before it could finish creating the slot, we issued the Drop Subscription which first stops the tablesync worker and then tried to drop its slot. Now, it is quite possible that by the time Drop Subscription tries to drop the tablesync slot, it is not yet created. We treat this condition okay and just Logs the message. I don't think this is an issue because anyway generally such a slot created on the server will be dropped before we persist it but the test was checking the existence of slots on server before it gets dropped. I think we can avoid such a situation by preventing cancel/die interrupts while creating tablesync slot. This is a timing issue, so I have reproduced it via debugger and tested that the attached patch fixes it. [1]: 2021-02-23 09:57:47.593 UTC [6034d19b.291aed:7] LOG: received replication command: CREATE_REPLICATION_SLOT "pg_16396_sync_16384_6932396177428838370" LOGICAL pgoutput USE_SNAPSHOT 2021-02-23 09:57:47.593 UTC [6034d19b.291aed:8] STATEMENT: CREATE_REPLICATION_SLOT "pg_16396_sync_16384_6932396177428838370" LOGICAL pgoutput USE_SNAPSHOT 2021-02-23 09:57:47.664 UTC [6034d19b.291ae2:14] LOG: disconnection: session time: 0:00:00.130 user=andres database=postgres host=[local] 2021-02-23 09:57:47.686 UTC [6034d19b.291af3:1] LOG: connection received: host=[local] 2021-02-23 09:57:47.687 UTC [6034d19b.291af3:2] LOG: replication connection authorized: user=andres application_name=tap_sub 2021-02-23 09:57:47.688 UTC [6034d19b.291af3:3] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2021-02-23 09:57:47.688 UTC [6034d19b.291af3:4] LOG: received replication command: DROP_REPLICATION_SLOT pg_16396_sync_16384_6932396177428838370 WAIT 2021-02-23 09:57:47.688 UTC [6034d19b.291af3:5] STATEMENT: DROP_REPLICATION_SLOT pg_16396_sync_16384_6932396177428838370 WAIT 2021-02-23 09:57:47.688 UTC [6034d19b.291af3:6] ERROR: replication slot "pg_16396_sync_16384_6932396177428838370" does not exist 2021-02-23 09:57:47.688 UTC [6034d19b.291af3:7] STATEMENT: DROP_REPLICATION_SLOT pg_16396_sync_16384_6932396177428838370 WAIT -- With Regards, Amit Kapila.
v1-0001-Prevent-cancel-die-interrupts-while-creating-tabl.patch
Description: Binary data