pg_replication_origin_drop API potential race condition

Peter Smith Tue, 26 Jan 2021 15:28:17 -0800

Hi Hackers.

As discovered elsewhere [ak0125] there is a potential race condition
in the pg_replication_origin_drop API


The current code of pg_replication_origin_drop looks like:
====
roident = replorigin_by_name(name, false);
Assert(OidIsValid(roident));

replorigin_drop(roident, true);
====

Users cannot deliberately drop a non-existent origin
(replorigin_by_name passes missing_ok = false) but there is still a
small window where concurrent processes may be able to call
replorigin_drop for the same valid roident.

Locking within replorigin_drop guards against concurrent drops so the
1st execution will succeed, but then the 2nd execution would give
internal cache error: elog(ERROR, "cache lookup failed for replication
origin with oid %u", roident);

Some ideas to fix this include:
1. Do nothing except write a comment about this in the code. The
internal ERROR is not ideal for a user API there is no great harm
done.
2. Change the behavior of replorigin_drop to be like
replorigin_drop_IF_EXISTS, so the 2nd execution of this race would
silently do nothing when it finds the roident is already gone.
3. Same as 2, but make the NOP behavior more explicit by introducing a
new "missing_ok" parameter for replorigin_drop.

Thoughts?

----
[ak0125] 
https://www.postgresql.org/message-id/CAA4eK1%2ByeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

pg_replication_origin_drop API potential race condition

Reply via email to