Hi,
On Wed, Apr 29, 2026 at 8:41 PM Chao Li <[email protected]> wrote:
>
> > <v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt>
>
> I am afraid this is only a partial fix.
Thanks for reviewing it. Please find my responses below.
> ```
> @@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo,
> ArrayType *pubnames,
> /* Show all columns when the column list is not specified. */
> if (nulls[2])
> {
> - Relation rel = table_open(relid,
> AccessShareLock);
> + Relation rel = try_table_open(relid,
> AccessShareLock);
> int nattnums = 0;
> int16 *attnums;
> - TupleDesc desc = RelationGetDescr(rel);
> + TupleDesc desc;
> int i;
>
> + /* Skip if the relation has been concurrently
> dropped. */
> + if (rel == NULL)
> + continue;
> ```
>
> This change uses try_table_open() to detect whether a table has been dropped,
> but try_table_open() is only called when the column list is not specified. If
> a table is included in the publication with an explicit column list, then
> even if it is dropped concurrently, it may still be returned by
> pg_get_publication_tables().
Right. The try_table_open() is only needed there because that's the
only code path that actually opens the relation (to enumerate its
columns). The column list path reads from the pg_publication_rel
catalog - it never calls table_open(), so it cannot hit the ERROR.
> So this patch removes the “could not open relation with OID” error, but it
> does not fully ensure the accuracy of the returned table list.
IMO, no function returning table OIDs can guarantee they remain valid
- a drop can happen right after we return the OID, and accuracy is in
the caller's hands. All the callers of pg_get_publication_tables()
already handle this by JOINing with pg_class.
> It also introduces inconsistent behavior between tables published with and
> without column lists.
These two paths do different things - one needs the relation open, the
other doesn't. For callers, the outcome is the same: any stale OID
gets filtered out by their pg_class JOIN.
> To resolve the race condition completely, I think we should try to open the
> table regardless of whether a column list is specified.
IMO, that would add unnecessary locking overhead in a path that
doesn't need it. The bug is the ERROR, and this patch fixes it.
In short, when no column list is specified,
pg_get_publication_tables() needs to open the relation to enumerate
all publishable columns and return them as an int2vector in the attrs
output column. It currently uses table_open() for this, which errors
out with concurrent table drops. This patch fixes it by using
try_table_open() and skipping the relation if it's been dropped.
Thoughts?
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com