Re: pg_replication_origin_drop API potential race condition

Amit Kapila Wed, 03 Feb 2021 04:18:00 -0800

On Wed, Jan 27, 2021 at 4:58 AM Peter Smith <[email protected]> wrote:
>
> Hi Hackers.
>
> As discovered elsewhere [ak0125] there is a potential race condition
> in the pg_replication_origin_drop API
>
> The current code of pg_replication_origin_drop looks like:
> ====
> roident = replorigin_by_name(name, false);
> Assert(OidIsValid(roident));
>
> replorigin_drop(roident, true);
> ====
>
> Users cannot deliberately drop a non-existent origin
> (replorigin_by_name passes missing_ok = false) but there is still a
> small window where concurrent processes may be able to call
> replorigin_drop for the same valid roident.
>
> Locking within replorigin_drop guards against concurrent drops so the
> 1st execution will succeed, but then the 2nd execution would give
> internal cache error: elog(ERROR, "cache lookup failed for replication
> origin with oid %u", roident);
>
> Some ideas to fix this include:
> 1. Do nothing except write a comment about this in the code. The
> internal ERROR is not ideal for a user API there is no great harm
> done.
> 2. Change the behavior of replorigin_drop to be like
> replorigin_drop_IF_EXISTS, so the 2nd execution of this race would
> silently do nothing when it finds the roident is already gone.
> 3. Same as 2, but make the NOP behavior more explicit by introducing a
> new "missing_ok" parameter for replorigin_drop.
>


How about if we call replorigin_by_name() inside replorigin_drop after
acquiring the lock? Wouldn't that close this race condition? We are
doing something similar for pg_replication_origin_advance().

-- 
With Regards,
Amit Kapila.

Re: pg_replication_origin_drop API potential race condition

Reply via email to