On Fri, Nov 26, 2021 at 1:16 PM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote: > ... > Based on this direction, I tried to write a top up POC patch(0005) which I'd > like to share. > > The top up patch mainly did the following things. > > * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that > the invalidation is executed only when actual UPDATE or DELETE executed on the > published relation. It's consistent with the existing check about replica > identity. > > * Cache the results of the validation for row filter columns in relcache to > reduce the cost of the validation. It's safe because every operation that > change the row filter and replica identity will invalidate the relcache. > > Also attach the v42 patch set to keep cfbot happy.
Hi Hou-san. Thanks for providing your "top-up" 0005 patch! I suppose the goal will be to later merge this top-up with the current 0002 validation patch, but in the meantime here are my review comments for 0005. ====== 1) src/include/catalog/pg_publication.h - PublicationInfo +typedef struct PublicationInfo +{ + PublicationActions pubactions; + + /* + * True if pubactions don't include UPDATE and DELETE or + * all the columns in the row filter expression are part + * of replica identity. + */ + bool rfcol_valid_for_replid; +} PublicationInfo; + IMO "PublicationInfo" sounded too much like it is about the Publication only, but IIUC it is really *per* Relation publication info, right? So I thought perhaps it should be called more like struct "RelationPubInfo". ====== 2) src/include/catalog/pg_publication.h - PublicationInfo The member "rfcol_valid_for_replid" also seems a little bit mis-named because in some scenario (not UPDATE/DELETE) it can be true even if there is not replica identity columns. So I thought perhaps it should be called more like just "rfcols_valid" Another thing - IIUC this is a kind of a "unified" boolean that covers *all* filters for this Relation (across multiple publications). If that is right., then the comment for this member should say something about this. ====== 3) src/include/catalog/pg_publication.h - PublicationInfo This new typedef should be added to src/tools/pgindent/typedefs.list ====== 4) src/backend/catalog/pg_publication.c - check_rowfilter_replident +/* + * Check if all the columns used in the row-filter WHERE clause are part of + * REPLICA IDENTITY + */ +bool +check_rowfilter_replident(Node *node, Bitmapset *bms_replident) +{ IIUC here the false means "valid" and true means "invalid" which is counter-intuitive to me. So at least true/false meaning ought to be clarified in the function comment, and/or perhaps also rename the function so that the return meaning is more obvious. ====== 5) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity + pubinfo = RelationGetPublicationInfo(rel); + IIUC this pubinfo* is palloced *every* time by RelationGetPublicationInfo isn't it? If that is the case shouldn't CheckCmdReplicaIdentity be doing a pfree(pubinfo)? ====== 6) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity + pubinfo = RelationGetPublicationInfo(rel); + + /* + * if not all columns in the publication row filter are part of the REPLICA + * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE. + */ + if (!pubinfo->rfcol_valid_for_replid) + { + if (cmd == CMD_UPDATE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), + errmsg("cannot update table \"%s\"", + RelationGetRelationName(rel)), + errdetail("Not all row filter columns are not part of the REPLICA IDENTITY"))); + else if (cmd == CMD_DELETE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), + errmsg("cannot delete from table \"%s\"", + RelationGetRelationName(rel)), + errdetail("Not all row filter columns are not part of the REPLICA IDENTITY"))); The comment seemed worded in a confusingly negative way. Before: + * if not all columns in the publication row filter are part of the REPLICA + * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE. My Suggestion: It is only safe to execute UPDATE/DELETE when all columns of the publication row filters are part of the REPLICA IDENTITY. ~~ Also, is "publication row filter" really the correct terminology? AFAIK it is more like *all* filters for this Relation across multiple publications, but I have not got a good idea how to word that in a comment. Anyway, I have a feeling this whole idea might be impacted by other discussions in this RF thread. ====== 7) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity Error messages have double negative wording? I think Greg already commented on this same point. + errdetail("Not all row filter columns are not part of the REPLICA IDENTITY"))); ====== 8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity But which are the bad filter columns? Previously the Row Filter column validation gave errors for the invalid filter column, but in this top-up patch there is no indication which column or which filter or which publication was the bad one - only that "something" bad was detected. IMO this might make it very difficult for the user to know enough about the cause of the problem to be able to fix the offending filter. ====== 9) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity /* If relation has replica identity we are always good. */ if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || OidIsValid(RelationGetReplicaIndex(rel))) I was wondering if the check for REPLICA_IDENTITY_FULL should go *before* your new call to pubinfo = RelationGetPublicationInfo(rel); because IIUC if *every* column is a member of the replica identity then the filter validation is not really necessary at all. ====== 10) src/backend/utils/cache/relcache.c - function GetRelationPublicationActions @@ -5547,22 +5548,45 @@ RelationGetExclusionInfo(Relation indexRelation, struct PublicationActions * GetRelationPublicationActions(Relation relation) { - List *puboids; - ListCell *lc; - MemoryContext oldcxt; - Oid schemaid; - PublicationActions *pubactions = palloc0(sizeof(PublicationActions)); + PublicationInfo *pubinfo; + PublicationActions *pubactions = palloc0(sizeof(PublicationInfo)); + + pubinfo = RelationGetPublicationInfo(relation); Just assign pubinfo at the declaration instead of later in the function body. ====== 11) src/backend/utils/cache/relcache.c - function GetRelationPublicationActions + pubactions = memcpy(pubactions, relation->rd_pubinfo, + sizeof(PublicationActions)); Isn't that memcpy slightly incorrect and only working because the pubactions happens to be the first member of the PublicationInfo? I thought it should really be copying from "&relation->rd_pubinfo->pubactions", right? ====== 12) src/backend/utils/cache/relcache.c - function GetRelationPublicationActions Excessive blank lines following this function. ====== 13). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo +/* + * Get publication information for the given relation. + */ +struct PublicationInfo * +RelationGetPublicationInfo(Relation relation) +{ + List *puboids; + ListCell *lc; + MemoryContext oldcxt; + Oid schemaid; + Bitmapset *bms_replident = NULL; + PublicationInfo *pubinfo = palloc0(sizeof(PublicationInfo)); + + pubinfo->rfcol_valid_for_replid = true; It is not entirely clear to me why this function is always pallocing the PublicationInfo and then returning a copy of what is stored in the relation->rd_pubinfo. This then puts a burden on the callers (like the GetRelationPublicationActions etc) to make sure to free that memory. Why can't we just return the relation->rd_pubinfo directly And avoid all the extra palloc/memcpy/free? ====== 14). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo + /* + * Find what are the cols that are part of the REPLICA IDENTITY. + * Note that REPLICA IDENTIY DEFAULT means primary key or nothing. + */ typo "IDENTIY" -> "IDENTITY" ====== 15). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo /* Now save copy of the actions in the relcache entry. */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); - relation->rd_pubactions = palloc(sizeof(PublicationActions)); - memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions)); + relation->rd_pubinfo = palloc(sizeof(PublicationInfo)); + memcpy(relation->rd_pubinfo, pubinfo, sizeof(PublicationInfo)); MemoryContextSwitchTo(oldcxt); The code comment looks a bit stale now. e.g. Perhaps now it should say "save a copy of the info" instead of "save a copy of the actions". ====== 16) Tests... CREATE PUBLICATION succeeds I have not yet reviewed any of the 0005 tests, but there was some big behaviour difference that I noticed. I think now with the 0005 top-up patch the replica identify validation is deferred to when UPDATE/DELETE is executed. I don’t know if this will be very user friendly. It means now sometimes you can successfully CREATE a PUBLICATION even though it will fail as soon as you try to use it. e.g. Below I create a publication with only pubaction "update", and although it creates OK you cannot use it as intended. test_pub=# create table t1(a int, b int, c int); CREATE TABLE test_pub=# create publication ptest for table t1 where (a > 3) with (publish="update"); CREATE PUBLICATION test_pub=# update t1 set a = 3; ERROR: cannot update table "t1" DETAIL: Not all row filter columns are not part of the REPLICA IDENTITY Should we *also* be validating the replica identity at the time of CREATE PUBLICATION so the user can be for-warned of problems? ------ Kind Regards, Peter Smith. Fujitsu Australia