Hi,

On Wed, Apr 29, 2026 at 8:41 PM Chao Li <[email protected]> wrote:
>
> > <v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt>
>
> I am afraid this is only a partial fix.

Thanks for reviewing it. Please find my responses below.

> ```
> @@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, 
> ArrayType *pubnames,
>                 /* Show all columns when the column list is not specified. */
>                 if (nulls[2])
>                 {
> -                       Relation        rel = table_open(relid, 
> AccessShareLock);
> +                       Relation        rel = try_table_open(relid, 
> AccessShareLock);
>                         int                     nattnums = 0;
>                         int16      *attnums;
> -                       TupleDesc       desc = RelationGetDescr(rel);
> +                       TupleDesc       desc;
>                         int                     i;
>
> +                       /* Skip if the relation has been concurrently 
> dropped. */
> +                       if (rel == NULL)
> +                               continue;
> ```
>
> This change uses try_table_open() to detect whether a table has been dropped, 
> but try_table_open() is only called when the column list is not specified. If 
> a table is included in the publication with an explicit column list, then 
> even if it is dropped concurrently, it may still be returned by 
> pg_get_publication_tables().

Right. The try_table_open() is only needed there because that's the
only code path that actually opens the relation (to enumerate its
columns). The column list path reads from the pg_publication_rel
catalog - it never calls table_open(), so it cannot hit the ERROR.

> So this patch removes the “could not open relation with OID” error, but it 
> does not fully ensure the accuracy of the returned table list.

IMO, no function returning table OIDs can guarantee they remain valid
- a drop can happen right after we return the OID, and accuracy is in
the caller's hands. All the callers of pg_get_publication_tables()
already handle this by JOINing with pg_class.

> It also introduces inconsistent behavior between tables published with and 
> without column lists.

These two paths do different things - one needs the relation open, the
other doesn't. For callers, the outcome is the same: any stale OID
gets filtered out by their pg_class JOIN.

> To resolve the race condition completely, I think we should try to open the 
> table regardless of whether a column list is specified.

IMO, that would add unnecessary locking overhead in a path that
doesn't need it. The bug is the ERROR, and this patch fixes it.

In short, when no column list is specified,
pg_get_publication_tables() needs to open the relation to enumerate
all publishable columns and return them as an int2vector in the attrs
output column. It currently uses table_open() for this, which errors
out with concurrent table drops. This patch fixes it by using
try_table_open() and skipping the relation if it's been dropped.

Thoughts?

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com


Reply via email to