On Tue, Mar 16, 2021 at 12:29 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > This seems to be a new low frequency failure, I didn't see it mentioned
> > > already:
> > >
> >
> > Thanks for reporting, I'll look into it.
> >
>
> By looking at the logs [1] in the buildfarm, I think I know what is
> going on here. After Create Subscription, the tablesync worker is
> launched and tries to create the slot for doing the initial copy but
> before it could finish creating the slot, we issued the Drop
> Subscription which first stops the tablesync worker and then tried to
> drop its slot. Now, it is quite possible that by the time Drop
> Subscription tries to drop the tablesync slot, it is not yet created.
> We treat this condition okay and just Logs the message. I don't think
> this is an issue because anyway generally such a slot created on the
> server will be dropped before we persist it but the test was checking
> the existence of slots on server before it gets dropped. I think we
> can avoid such a situation by preventing cancel/die interrupts while
> creating tablesync slot.
>
> This is a timing issue, so I have reproduced it via debugger and
> tested that the attached patch fixes it.
>

Thanks for the patch. I was able to reproduce the issue using a debugger
by making it wait at CreateReplicationSlot. After applying the patch,
the issue is resolved.

A few minor comments:

1) "subscrition" should be "subscription" in the below change:
+ * Prevent cancel/die interrupts while creating slot here because it is
+ * possible that before the server finishes this command a concurrent drop
+ * subscrition happens which would complete without removing this slot
+ * leading to a dangling slot on the server.
   */

2) "finishes this command a concurrent drop" should be "finishes this
command, a concurrent drop" in the below change:
+ * Prevent cancel/die interrupts while creating slot here because it is
+ * possible that before the server finishes this command a concurrent drop
+ * subscrition happens which would complete without removing this slot
+ * leading to a dangling slot on the server.
   */

Regards,
Vignesh