I wrote:
> It's been kind of hidden by other buildfarm noise, but
> 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].
> ...
> I think this is showing us a real bug, ie we sometimes fail to cancel
> the conflicting query.

After digging around in the code, I think this is almost certainly
some manifestation of the previously-complained-of problem [1] that
RecoveryConflictInterrupt is not safe to call in a signal handler,
leading the conflicting backend to sometimes decide that it's not
the problem. That squares with the observation that skink is more
prone to show this than other animals: you'd have to get the SIGUSR1
while the target backend isn't idle, so a very slow machine ought to
show it more. We don't seem to have that issue on the open items
list, but I'll go add it.

Not sure if the 'buffer pin conflict: stats show conflict on standby'
failure could trace to a similar cause.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
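
PS: for anyone wondering why doing real work in a signal handler is a problem, here is a minimal sketch of the usual safe pattern. The handler only sets a flag, and the decision is made later from normal code at a point where state is known to be consistent. This is not PostgreSQL's actual code and not a proposed patch. Every name in it (conflict_pending, handle_conflict_signal, process_pending_conflict, demo_run) is hypothetical and chosen just for illustration.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag: written by the handler, read by normal code. */
static volatile sig_atomic_t conflict_pending = 0;

/*
 * The handler does the bare minimum.  It may interrupt the process at
 * an arbitrary point, so it must not inspect or modify complex state.
 */
static void
handle_conflict_signal(int signo)
{
    (void) signo;
    conflict_pending = 1;
}

/*
 * Called from normal (non-handler) context at a known-safe point.
 * Returns true if a conflict notification was pending.
 */
static bool
process_pending_conflict(void)
{
    if (!conflict_pending)
        return false;

    conflict_pending = 0;

    /*
     * Only here would it be safe to examine shared state and decide
     * whether this process is really the one causing the conflict.
     */
    return true;
}

/* Illustration only: deliver SIGUSR1 to ourselves, then process it. */
int
demo_run(void)
{
    signal(SIGUSR1, handle_conflict_signal);
    raise(SIGUSR1);
    return process_pending_conflict() ? 1 : 0;
}
```

The point of the split is that the "am I the problem?" check runs against state that is not half-updated, which a check made inside the handler cannot guarantee.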