Re: row filtering for logical replication

Peter Smith Sat, 27 Nov 2021 23:18:34 -0800

On Fri, Nov 26, 2021 at 1:16 PM [email protected]
<[email protected]> wrote:
>
...
> Based on this direction, I tried to write a top up POC patch(0005) which I'd 
> like to share.
>
> The top up patch mainly did the following things.
>
> * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that
> the invalidation is executed only when actual UPDATE or DELETE executed on the
> published relation. It's consistent with the existing check about replica
> identity.
>
> * Cache the results of the validation for row filter columns in relcache to
> reduce the cost of the validation. It's safe because every operation that
> change the row filter and replica identity will invalidate the relcache.
>
> Also attach the v42 patch set to keep cfbot happy.


Hi Hou-san.

Thanks for providing your "top-up" 0005 patch!

I suppose the goal will be to later merge this top-up with the current
0002 validation patch, but in the meantime here are my review comments
for 0005.

======

1) src/include/catalog/pg_publication.h - PublicationInfo
+typedef struct PublicationInfo
+{
+ PublicationActions pubactions;
+
+ /*
+ * True if pubactions don't include UPDATE and DELETE or
+ * all the columns in the row filter expression are part
+ * of replica identity.
+ */
+ bool rfcol_valid_for_replid;
+} PublicationInfo;
+

IMO "PublicationInfo" sounded too much like it is about the
Publication only, but IIUC it is really *per* Relation publication
info, right? So I thought perhaps it should be called more like struct
"RelationPubInfo".

======

2) src/include/catalog/pg_publication.h - PublicationInfo

The member "rfcol_valid_for_replid" also seems a little bit mis-named
because in some scenario (not UPDATE/DELETE) it can be true even if
there is not replica identity columns. So I thought perhaps it should
be called more like just "rfcols_valid"

Another thing - IIUC this is a kind of a "unified" boolean that covers
*all* filters for this Relation (across multiple publications). If
that is right., then the comment for this member should say something
about this.

======

3) src/include/catalog/pg_publication.h - PublicationInfo

This new typedef should be added to src/tools/pgindent/typedefs.list

======

4) src/backend/catalog/pg_publication.c - check_rowfilter_replident
+/*
+ * Check if all the columns used in the row-filter WHERE clause are part of
+ * REPLICA IDENTITY
+ */
+bool
+check_rowfilter_replident(Node *node, Bitmapset *bms_replident)
+{

IIUC here the false means "valid" and true means "invalid" which is
counter-intuitive to me. So at least true/false meaning ought to be
clarified in the function comment, and/or perhaps also rename the
function so that the return meaning is more obvious.

======

5) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
+ pubinfo = RelationGetPublicationInfo(rel);
+

IIUC this pubinfo* is palloced *every* time by
RelationGetPublicationInfo isn't it? If that is the case shouldn't
CheckCmdReplicaIdentity be doing a pfree(pubinfo)?

======

6) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
+ pubinfo = RelationGetPublicationInfo(rel);
+
+ /*
+ * if not all columns in the publication row filter are part of the REPLICA
+ * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.
+ */
+ if (!pubinfo->rfcol_valid_for_replid)
+ {
+ if (cmd == CMD_UPDATE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot update table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));
+ else if (cmd == CMD_DELETE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot delete from table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

The comment seemed worded in a confusingly negative way.

Before:
+ * if not all columns in the publication row filter are part of the REPLICA
+ * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.

My Suggestion:
It is only safe to execute UPDATE/DELETE when all columns of the
publication row filters are part of the REPLICA IDENTITY.

~~

Also, is "publication row filter" really the correct terminology?
AFAIK it is more like *all* filters for this Relation across multiple
publications, but I have not got a good idea how to word that in a
comment. Anyway, I have a feeling this whole idea might be impacted by
other discussions in this RF thread.

======

7) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

Error messages have double negative wording? I think Greg already
commented on this same point.

+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

======

8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

But which are the bad filter columns?

Previously the Row Filter column validation gave errors for the
invalid filter column, but in this top-up patch there is no indication
which column or which filter or which publication was the bad one -
only that "something" bad was detected. IMO this might make it very
difficult for the user to know enough about the cause of the problem
to be able to fix the offending filter.

======

9) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

  /* If relation has replica identity we are always good. */
  if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
  OidIsValid(RelationGetReplicaIndex(rel)))

I was wondering if the check for REPLICA_IDENTITY_FULL should go
*before* your new call to pubinfo = RelationGetPublicationInfo(rel);
because IIUC if *every* column is a member of the replica identity
then the filter validation is not really necessary at all.

======

10) src/backend/utils/cache/relcache.c - function
GetRelationPublicationActions
@@ -5547,22 +5548,45 @@ RelationGetExclusionInfo(Relation indexRelation,
 struct PublicationActions *
 GetRelationPublicationActions(Relation relation)
 {
- List    *puboids;
- ListCell   *lc;
- MemoryContext oldcxt;
- Oid schemaid;
- PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
+ PublicationInfo    *pubinfo;
+ PublicationActions *pubactions = palloc0(sizeof(PublicationInfo));
+
+ pubinfo = RelationGetPublicationInfo(relation);

Just assign pubinfo at the declaration instead of later in the function body.

======

11) src/backend/utils/cache/relcache.c - function
GetRelationPublicationActions

+ pubactions = memcpy(pubactions, relation->rd_pubinfo,
+ sizeof(PublicationActions));

Isn't that memcpy slightly incorrect and only working because the
pubactions happens to be the first member of the PublicationInfo? I
thought it should really be copying from
"&relation->rd_pubinfo->pubactions", right?

======

12) src/backend/utils/cache/relcache.c - function
GetRelationPublicationActions

Excessive blank lines following this function.

======

13). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
+/*
+ * Get publication information for the given relation.
+ */
+struct PublicationInfo *
+RelationGetPublicationInfo(Relation relation)
+{
+ List    *puboids;
+ ListCell    *lc;
+ MemoryContext oldcxt;
+ Oid schemaid;
+ Bitmapset    *bms_replident = NULL;
+ PublicationInfo *pubinfo = palloc0(sizeof(PublicationInfo));
+
+ pubinfo->rfcol_valid_for_replid = true;

It is not entirely clear to me why this function is always pallocing
the PublicationInfo and then returning a copy of what is stored in the
relation->rd_pubinfo. This then puts a burden on the callers (like the
GetRelationPublicationActions etc) to make sure to free that memory.
Why can't we just return the relation->rd_pubinfo directly And avoid
all the extra palloc/memcpy/free?

======

14). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
+ /*
+ * Find what are the cols that are part of the REPLICA IDENTITY.
+ * Note that REPLICA IDENTIY DEFAULT means primary key or nothing.
+ */

typo "IDENTIY" -> "IDENTITY"

======

15). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo

/* Now save copy of the actions in the relcache entry. */
  oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
- relation->rd_pubactions = palloc(sizeof(PublicationActions));
- memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions));
+ relation->rd_pubinfo = palloc(sizeof(PublicationInfo));
+ memcpy(relation->rd_pubinfo, pubinfo, sizeof(PublicationInfo));
  MemoryContextSwitchTo(oldcxt);

The code comment looks a bit stale now. e.g. Perhaps now it should say
"save a copy of the info" instead of "save a copy of the actions".

======

16) Tests... CREATE PUBLICATION succeeds

I have not yet reviewed any of the 0005 tests, but there was some big
behaviour difference that I noticed.

I think now with the 0005 top-up patch the replica identify validation
is deferred to when UPDATE/DELETE is executed. I don’t know if this
will be very user friendly. It means now sometimes you can
successfully CREATE a PUBLICATION even though it will fail as soon as
you try to use it.

e.g. Below I create a publication with only pubaction "update", and
although it creates OK you cannot use it as intended.

test_pub=# create table t1(a int, b int, c int);
CREATE TABLE
test_pub=# create publication ptest for table t1 where (a > 3) with
(publish="update");
CREATE PUBLICATION
test_pub=# update t1 set a = 3;
ERROR:  cannot update table "t1"
DETAIL:  Not all row filter columns are not part of the REPLICA IDENTITY

Should we *also* be validating the replica identity at the time of
CREATE PUBLICATION so the user can be for-warned of problems?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: row filtering for logical replication

Reply via email to