On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
> Hi.
>
> I have been looking at the latest patch set (v16). Below are my review
> comments and some patches.
>
Peter, thanks for your detailed review. Comments are inline.

> 1. Patch 0001 comment - typo
>
> you can optionally filter rows that does not satisfy a WHERE condition
>
> typo: does/do
Fixed.

> 2. Patch 0001 comment - typo
>
> The WHERE clause should probably contain only columns that are part of
> the primary key or that are covered by REPLICA IDENTITY. Otherwise,
> and DELETEs won't be replicated.
>
> typo: "Otherwise, and DELETEs" ??
Fixed.

> 3. Patch 0001 comment - typo and clarification
>
> If your publication contains partitioned table, the parameter
> publish_via_partition_root determines if it uses the partition row filter (if
> the parameter is false -- default) or the partitioned table row filter.
>
> Typo: "contains partitioned table" -> "contains a partitioned table"
Fixed.

> Also, perhaps the text "or the partitioned table row filter." should
> say "or the root partitioned table row filter." to disambiguate the
> case where there are more levels of partitions like A->B->C. e.g. What
> filter does C use?
I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
root partitioned table is used. We should improve that sentence too.

> 4. src/backend/catalog/pg_publication.c - misleading names
>
> -publication_add_relation(Oid pubid, Relation targetrel,
> +publication_add_relation(Oid pubid, PublicationRelationInfo *targetrel,
>   bool if_not_exists)
>
> Leaving this parameter name as "targetrel" seems a bit misleading now
> in the function code. Maybe this should be called something like "pri"
> which is consistent with other places where you have declared
> PublicationRelationInfo.
>
> Also, consider declaring some local variables so that the patch may
> have less impact on existing code. e.g.
> Oid relid = pri->relid
> Relation *targetrel = relationinfo->relation
Done.

> 5. 
src/backend/commands/publicationcmds.c - simplify code
>
> - rels = OpenTableList(stmt->tables);
> + if (stmt->tableAction == DEFELEM_DROP)
> +     rels = OpenTableList(stmt->tables, true);
> + else
> +     rels = OpenTableList(stmt->tables, false);
>
> Consider writing that code more simply as just:
>
> rels = OpenTableList(stmt->tables, stmt->tableAction == DEFELEM_DROP);
It is not a common pattern to use an expression as a function argument in
Postgres. I prefer to use a variable with a suggestive name.

> 6. src/backend/commands/publicationcmds.c - bug?
>
> - CloseTableList(rels);
> + CloseTableList(rels, false);
> }
>
> Is this a potential bug? When you called OpenTableList the 2nd param
> was maybe true/false, so is it correct to be unconditionally false
> here? I am not sure.
Good catch.

> 7. src/backend/commands/publicationcmds.c - OpenTableList function comment.
>
> * Open relations specified by a RangeVar list.
> + * AlterPublicationStmt->tables has a different list element, hence, is_drop
> + * indicates if it has a RangeVar (true) or PublicationTable (false).
> * The returned tables are locked in ShareUpdateExclusiveLock mode in order to
> * add them to a publication.
>
> I am not sure about this. Should that comment instead say "indicates
> if it has a Relation (true) or PublicationTable (false)"?
Fixed.

> 8. src/backend/commands/publicationcmds.c - OpenTableList
>8
> For some reason it feels kind of clunky to me for this function to be
> processing the list differently according to the 2nd param. e.g. the
> name "is_drop" seems quite unrelated to the function code, and more to
> do with where it was called from. Sorry, I don't have any better ideas
> for improvement atm.
My suggestion is to rename it to "pub_drop_table".

> 9. src/backend/commands/publicationcmds.c - OpenTableList bug?
>8
> I felt maybe this is a possible bug here because there seems no code
> explicitly assigning the whereClause = NULL if "is_drop" is true so
> maybe it can have a garbage value which could cause problems later.
> Maybe this is fixed by using palloc0.
Fixed.

> 10. src/backend/commands/publicationcmds.c - CloseTableList function comment
>8
> Probably the meaning of "is_drop" should be described in this function
> comment.
Done.

> 11. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> signature.
>
> -static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
> +static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation
> rel);
>
> I see that this function signature is modified but I did not see how
> this parameter refactoring is actually related to the RowFilter patch.
> Perhaps I am mistaken, but IIUC this only changes the relid =
> RelationGetRelid(rel); to be done inside this function instead of
> being done outside by the callers.
It is not critical for this patch, so I removed it.

> 12. src/backend/replication/pgoutput/pgoutput.c - missing function comments
>
> The static functions create_estate_for_relation and
> pgoutput_row_filter_prepare_expr probably should be commented.
Done.

> 13. src/backend/replication/pgoutput/pgoutput.c -
> pgoutput_row_filter_prepare_expr function name
>
> +static ExprState *pgoutput_row_filter_prepare_expr(Node *rfnode,
> EState *estate);
>
> This function has an unfortunate name with the word "prepare" in it. I
> wonder if a different name can be found for this function to avoid any
> confusion with pgoutput functions (coming soon) which are related to
> the two-phase commit "prepare".
The word "prepare" is related to the executor context. The function name
contains "row_filter", which is sufficient to distinguish it from any other
function whose context is "prepare". I replaced "prepare" with "init".

> 14. 
src/bin/psql/describe.c
>
> + if (!PQgetisnull(tabres, j, 2))
> +     appendPQExpBuffer(&buf, " WHERE (%s)",
> +                       PQgetvalue(tabres, j, 2));
>
> Because the where-clause value already has enclosing parentheses so
> using " WHERE (%s)" seems overkill here. e.g. you can see the effect
> in your src/test/regress/expected/publication.out file. I think this
> should be changed to " WHERE %s" to give better output.
Peter E suggested that extra parentheses be added. See 0005 [1].

> 15. src/include/catalog/pg_publication.h - new typedef
>
> +typedef struct PublicationRelationInfo
> +{
> +    Oid relid;
> +    Relation relation;
> +    Node *whereClause;
> +} PublicationRelationInfo;
> +
>
> The new PublicationRelationInfo should also be added
> src/tools/pgindent/typedefs.list
Patches usually don't update typedefs.list. Check src/tools/pgindent/README.

> 16. src/include/nodes/parsenodes.h - new typedef
>
> +typedef struct PublicationTable
> +{
> +    NodeTag type;
> +    RangeVar *relation; /* relation to be published */
> +    Node *whereClause; /* qualifications */
> +} PublicationTable;
>
> The new PublicationTable should also be added src/tools/pgindent/typedefs.list
Same as above.

> 17. 
sql/publication.sql - show more output
>
> +CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1,
> testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
> +RESET client_min_messages;
> +ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000
> AND e < 2000);
> +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> +-- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another
> WHERE expression)
> +ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300
> AND e < 500);
> +-- fail - functions disallowed
> +ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) < 6);
> +-- fail - WHERE not allowed in DROP
> +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27);
> +\dRp+ testpub5
>
> I felt that it would be better to have a "\dRp+ testpub5" after each
> of the valid ALTER PUBLICATION steps to show the intermediate results
> also; not just the final one at the end.
Done.

> 18. src/test/subscription/t/020_row_filter.pl - rename file
>
> I think this file should be renamed to 021_row_filter.pl as there is
> already an 020 TAP test present.
Done.

> 19. src/test/subscription/t/020_row_filter.pl - test comments
>
> AFAIK the test cases are all OK, but it was really quite hard to
> review these TAP tests to try to determine what the expected results
> should be.
I included your comments but heavily changed them.

> 20. src/test/subscription/t/020_row_filter.pl - missing test case?
>
> There are some partition tests, but I did not see any test that was
> like 3 levels deep like A->B->C, so I was not sure if there is any
> case C would ever make use of the filter of its parent B, or would it
> only use the filter of the root A?
I haven't included it yet. There is an issue with initial synchronization of
partitioned tables when publish_via_partition_root is set. I'll start another
thread for this issue.

> 21. src/test/subscription/t/020_row_filter.pl - missing test case?
>
> If the same table is in multiple publications they can each have a row
> filter. And a subscription might subscribe to some but not all of
> those publications. I think this scenario is only partly tested.
8<
> e.g.
> pub_1 has tableX with RowFilter1
> pub_2 has tableX with RowFilter2
>
> Then sub_12 subscribes to pub_1, pub_2
> This is already tested in your TAP test (I think) and it makes sure
> both filters are applied
>
> But if there was also
> pub_3 has tableX with RowFilter3
>
> Then sub_12 still should only be checking the filtered RowFilter1 AND
> RowFilter2 (but NOT row RowFilter3). I think this scenario is not
> tested.
I added a new publication tap_pub_not_used to cover this case.

> POC PATCH FOR PLAN CACHE
> ========================
>
> PSA a POC patch for a plan cache which gets used inside the
> pgoutput_row_filter function instead of calling prepare for every row.
> I think this is implementing something like Andres was suggesting a
> while back [1].
I also had a WIP patch for it (very similar to your patch), so I merged it.
The cache mechanism stores the ExprState so that
pgoutput_row_filter_init_expr() is not called for every single row. Greg N
suggested in another email that the tuple table slot should also be cached to
save a few more cycles. That is also included in this new patch.

> Measurements with/without this plan cache:
>
> Time spent processing within the pgoutput_row_filter function
> - Data was captured using the same technique as the
> 0002-Measure-row-filter-overhead.patch.
> - Inserted 1000 rows, sampled data for the first 100 times in this function.
> not cached: average ~ 28.48 us
> cached: average ~ 9.75 us
>
> Replication times:
> - Using tables and row filters same as in Onder's commands_to_test_perf.sql
> [2]
> 100K rows - not cached: ~ 42sec, 43sec, 44sec
> 100K rows - cached: ~ 41sec, 42sec, 42 sec.
>
> There does seem to be a tiny gain achieved by having the plan cache,
> but I think the gain might be a lot less than what people were
> expecting.
I took another measurement, using the previous patch (v16) as the baseline.

without cache (v16)
---------------------------

mean:           1.46 us
stddev:         2.13 us
median:         1.39 us
min-max:        [0.69 .. 1456.69] us
percentile(99): 3.15 us
mode:           0.91 us

with cache (v18)
-----------------------

mean:           0.63 us
stddev:         1.07 us
median:         0.55 us
min-max:        [0.29 .. 844.87] us
percentile(99): 1.38 us
mode:           0.41 us

The mean drops from 1.46 us to 0.63 us, a change of about -57%. It is a
really good optimization for just a few extra lines of code.

[1] https://www.postgresql.org/message-id/57373e8b-1264-cd37-404e-8edbcf7884cc%40enterprisedb.com

--
Euler Taveira
EDB   https://www.enterprisedb.com/

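To illustrate the caching idea outside the patch, here is a minimal standalone
C sketch. It is not the pgoutput code: FilterCacheEntry, CompiledFilter,
compile_filter, row_passes and count_passing_rows are hypothetical names, and
the "compiled" filter is just a pair of bounds standing in for the ExprState.
The only point it makes is that the expensive initialization runs once per
cache entry instead of once per row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled row filter (an ExprState in the patch). */
typedef struct CompiledFilter
{
	int			lower;			/* a row passes when lower < value < upper */
	int			upper;
} CompiledFilter;

/* Hypothetical per-relation cache entry holding the compiled filter. */
typedef struct FilterCacheEntry
{
	bool		valid;			/* has the filter been compiled yet? */
	CompiledFilter filter;
} FilterCacheEntry;

/* Counts how often the expensive step runs. */
static int	compile_calls = 0;

/* Expensive step; assumed to play the role of the per-row init call. */
static CompiledFilter
compile_filter(int lower, int upper)
{
	CompiledFilter f = {lower, upper};

	compile_calls++;
	return f;
}

/* Evaluate the filter for one row, compiling it only on first use. */
static bool
row_passes(FilterCacheEntry *entry, int value)
{
	if (!entry->valid)
	{
		/* bounds mirror the test filter WHERE (e > 1000 AND e < 2000) */
		entry->filter = compile_filter(1000, 2000);
		entry->valid = true;
	}
	return value > entry->filter.lower && value < entry->filter.upper;
}

/* Push n rows (values 0 .. n-1) through the filter; return how many pass. */
static int
count_passing_rows(FilterCacheEntry *entry, int n)
{
	int			passed = 0;

	for (int i = 0; i < n; i++)
	{
		if (row_passes(entry, i))
			passed++;
	}
	return passed;
}
```

With a zero-initialized entry, pushing 3000 rows through count_passing_rows()
compiles the filter a single time, and only the values 1001 to 1999 pass.
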
From 6176df1880ba690a7e5d550b7ee8c533dd33712e Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.tave...@enterprisedb.com>
Date: Mon, 18 Jan 2021 12:07:51 -0300
Subject: [PATCH v18 1/2] Row filter for logical replication

This feature adds row filter for publication tables. When a publication
is defined or modified, rows that don't satisfy a WHERE clause may be
optionally filtered out. This allows a database or set of tables to be
partially replicated. The row filter is per table, which allows
different row filters to be defined for different tables. A new row
filter can be added simply by specifying a WHERE clause after the table
name. The WHERE clause must be enclosed by parentheses.

The WHERE clause should probably contain only columns that are part of
the primary key or that are covered by REPLICA IDENTITY. Otherwise, any
DELETEs won't be replicated. DELETE uses the old row version (that is
limited to primary key or REPLICA IDENTITY) to evaluate the row filter.
INSERT and UPDATE use the new row version to evaluate the row filter,
hence, you can use any column. If the row filter evaluates to NULL, it
returns false. For simplicity, functions are not allowed; it could
possibly be addressed in a future patch.

If you choose to do the initial table synchronization, only data that
satisfies the row filters is sent. If the subscription has several
publications in which a table has been published with different WHERE
clauses, rows must satisfy all expressions to be copied. If subscriber
is a pre-15 version, data synchronization won't use row filters if they
are defined in the publisher. Previous versions cannot handle row
filters.

If your publication contains a partitioned table, the publication
parameter publish_via_partition_root determines if it uses the partition
row filter (if the parameter is false, the default) or the root
partitioned table row filter.
---
 doc/src/sgml/catalogs.sgml                  |   8 +
 doc/src/sgml/ref/alter_publication.sgml     |  11 +-
 doc/src/sgml/ref/create_publication.sgml    |  32 ++-
 doc/src/sgml/ref/create_subscription.sgml   |  11 +-
 src/backend/catalog/pg_publication.c        |  42 ++-
 src/backend/commands/publicationcmds.c      | 112 +++++---
 src/backend/parser/gram.y                   |  24 +-
 src/backend/parser/parse_agg.c              |  10 +
 src/backend/parser/parse_expr.c             |  13 +
 src/backend/parser/parse_func.c             |   3 +
 src/backend/parser/parse_oper.c             |   7 +
 src/backend/replication/logical/tablesync.c |  94 +++++-
 src/backend/replication/pgoutput/pgoutput.c | 268 +++++++++++++++++-
 src/bin/pg_dump/pg_dump.c                   |  24 +-
 src/bin/pg_dump/pg_dump.h                   |   1 +
 src/bin/psql/describe.c                     |  15 +-
 src/include/catalog/pg_publication.h        |   9 +-
 src/include/catalog/pg_publication_rel.h    |   6 +
 src/include/nodes/nodes.h                   |   1 +
 src/include/nodes/parsenodes.h              |  11 +-
 src/include/parser/parse_node.h             |   1 +
 src/test/regress/expected/publication.out   |  71 +++++
 src/test/regress/sql/publication.sql        |  32 +++
 src/test/subscription/t/021_row_filter.pl   | 298 ++++++++++++++++++++
 24 files changed, 1024 insertions(+), 80 deletions(-)
 create mode 100644 src/test/subscription/t/021_row_filter.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..dbf2f46c00 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6233,6 +6233,14 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
        Reference to relation
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>prqual</structfield> <type>pg_node_tree</type>
+      </para>
+      <para>Expression tree (in <function>nodeToString()</function>
+      representation) for the relation's qualifying condition</para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index faa114b2c6..ca091aae33 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -21,8 +21,8 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-ALTER PUBLICATION <replaceable class="parameter">name</replaceable> ADD TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ...]
-ALTER PUBLICATION <replaceable class="parameter">name</replaceable> SET TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ...]
+ALTER PUBLICATION <replaceable class="parameter">name</replaceable> ADD TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ...]
+ALTER PUBLICATION <replaceable class="parameter">name</replaceable> SET TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ...]
 ALTER PUBLICATION <replaceable class="parameter">name</replaceable> DROP TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ...]
 ALTER PUBLICATION <replaceable class="parameter">name</replaceable> SET ( <replaceable class="parameter">publication_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )
 ALTER PUBLICATION <replaceable class="parameter">name</replaceable> OWNER TO { <replaceable>new_owner</replaceable> | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
@@ -92,7 +92,12 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
       table name, only that table is affected.  If <literal>ONLY</literal> is not
       specified, the table and all its descendant tables (if any) are
       affected.  Optionally, <literal>*</literal> can be specified after the table
-      name to explicitly indicate that descendant tables are included.
+      name to explicitly indicate that descendant tables are included.  If the
+      optional <literal>WHERE</literal> clause is specified, rows that do not
+      satisfy the <replaceable class="parameter">expression</replaceable> will
+      not be published. Note that parentheses are required around the
+      expression. The <replaceable class="parameter">expression</replaceable>
+      is executed with the role used for the replication connection.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index ff82fbca55..5c2b7d0bd2 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -22,7 +22,7 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
-    [ FOR TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ...]
+    [ FOR TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ...]
       | FOR ALL TABLES ]
     [ WITH ( <replaceable class="parameter">publication_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
 </synopsis>
@@ -71,6 +71,10 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
       This does not apply to a partitioned table, however.  The partitions of
       a partitioned table are always implicitly considered part of the
       publication, so they are never explicitly added to the publication.
+      If the optional <literal>WHERE</literal> clause is specified, rows that do
+      not satisfy the <replaceable class="parameter">expression</replaceable>
+      will not be published. Note that parentheses are required around the
+      expression. It has no effect on <literal>TRUNCATE</literal> commands.
      </para>
 
      <para>
@@ -131,9 +135,9 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
           on its partitions) contained in the publication will be published
           using the identity and schema of the partitioned table rather than
           that of the individual partitions that are actually changed; the
-          latter is the default.  Enabling this allows the changes to be
-          replicated into a non-partitioned table or a partitioned table
-          consisting of a different set of partitions.
+          latter is the default (<literal>false</literal>). Enabling this
+          allows the changes to be replicated into a non-partitioned table or a
+          partitioned table consisting of a different set of partitions.
          </para>
 
          <para>
@@ -182,6 +186,14 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
    disallowed on those tables.
   </para>
 
+  <para>
+   The <literal>WHERE</literal> clause should probably contain only columns
+   that are part of the primary key or be covered by <literal>REPLICA
+   IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
+   be replicated. For <command>INSERT</command> and <command>UPDATE</command>
+   operations, any column can be used in the <literal>WHERE</literal> clause.
+  </para>
+
   <para>
    For an <command>INSERT ... ON CONFLICT</command> command, the publication will
    publish the operation that actually results from the command.  So depending
@@ -197,6 +209,11 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
   <para>
    <acronym>DDL</acronym> operations are not published.
   </para>
+
+  <para>
+   The <literal>WHERE</literal> clause expression is executed with the role used
+   for the replication connection.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -209,6 +226,13 @@ CREATE PUBLICATION mypublication FOR TABLE users, departments;
 </programlisting>
   </para>
 
+  <para>
+   Create a publication that publishes all changes from active departments:
+<programlisting>
+CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);
+</programlisting>
+  </para>
+
   <para>
    Create a publication that publishes all changes in all tables:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index e812beee37..7183700ed9 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -102,7 +102,16 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
          <para>
           Specifies whether the existing data in the publications that are
           being subscribed to should be copied once the replication starts.
-          The default is <literal>true</literal>.
+          The default is <literal>true</literal>. If any table in the
+          publications has a <literal>WHERE</literal> clause, rows that do not
+          satisfy the <replaceable class="parameter">expression</replaceable>
+          will not be copied. If the subscription has several publications in
+          which a table has been published with different
+          <literal>WHERE</literal> clauses, rows must satisfy all expressions
+          to be copied. If any table in the publications has a
+          <literal>WHERE</literal> clause, data synchronization does not use it
+          if the subscriber is a <productname>PostgreSQL</productname> version
+          before 15.
          </para>
         </listitem>
        </varlistentry>
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 86e415af89..a15f00f637 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -33,6 +33,9 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "parser/parse_clause.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_relation.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
@@ -141,21 +144,27 @@ pg_relation_is_publishable(PG_FUNCTION_ARGS)
  * Insert new publication / relation mapping.
  */
 ObjectAddress
-publication_add_relation(Oid pubid, Relation targetrel,
+publication_add_relation(Oid pubid, PublicationRelationInfo *pri,
 						 bool if_not_exists)
 {
 	Relation	rel;
 	HeapTuple	tup;
 	Datum		values[Natts_pg_publication_rel];
 	bool		nulls[Natts_pg_publication_rel];
-	Oid			relid = RelationGetRelid(targetrel);
+	Relation	targetrel = pri->relation;
+	Oid			relid;
 	Oid			prrelid;
 	Publication *pub = GetPublication(pubid);
 	ObjectAddress myself,
 				referenced;
+	ParseState *pstate;
+	ParseNamespaceItem *nsitem;
+	Node	   *whereclause;
 
 	rel = table_open(PublicationRelRelationId, RowExclusiveLock);
 
+	relid = RelationGetRelid(targetrel);
+
 	/*
 	 * Check for duplicates.  Note that this does not really prevent
 	 * duplicates, it's here just to provide nicer error message in common
@@ -177,6 +186,23 @@ publication_add_relation(Oid pubid, Relation targetrel,
 
 	check_publication_add_relation(targetrel);
 
+	/* Set up a pstate to parse with */
+	pstate = make_parsestate(NULL);
+	pstate->p_sourcetext = nodeToString(pri->whereClause);
+
+	nsitem = addRangeTableEntryForRelation(pstate, targetrel,
+										   AccessShareLock,
+										   NULL, false, false);
+	addNSItemToQuery(pstate, nsitem, false, true, true);
+
+	whereclause = transformWhereClause(pstate,
+									   copyObject(pri->whereClause),
+									   EXPR_KIND_PUBLICATION_WHERE,
+									   "PUBLICATION");
+
+	/* Fix up collation information */
+	assign_expr_collations(pstate, whereclause);
+
 	/* Form a tuple. */
 	memset(values, 0, sizeof(values));
 	memset(nulls, false, sizeof(nulls));
@@ -189,6 +215,12 @@ publication_add_relation(Oid pubid, Relation targetrel,
 	values[Anum_pg_publication_rel_prrelid - 1] =
 		ObjectIdGetDatum(relid);
 
+	/* Add qualifications, if available */
+	if (whereclause)
+		values[Anum_pg_publication_rel_prqual - 1] = CStringGetTextDatum(nodeToString(whereclause));
+	else
+		nulls[Anum_pg_publication_rel_prqual - 1] = true;
+
 	tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
 
 	/* Insert tuple into catalog. */
@@ -205,6 +237,12 @@ publication_add_relation(Oid pubid, Relation targetrel,
 	ObjectAddressSet(referenced, RelationRelationId, relid);
 	recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
 
+	/* Add dependency on the objects mentioned in the qualifications */
+	if (whereclause)
+		recordDependencyOnExpr(&myself, whereclause, pstate->p_rtable, DEPENDENCY_NORMAL);
+
+	free_parsestate(pstate);
+
 	/* Close the table. */
 	table_close(rel, RowExclusiveLock);
 
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 95c253c8e0..12adcf1f36 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -48,7 +48,7 @@
 /* Same as MAXNUMMESSAGES in sinvaladt.c */
 #define MAX_RELCACHE_INVAL_MSGS 4096
 
-static List *OpenTableList(List *tables);
+static List *OpenTableList(List *tables, bool pub_drop_table);
 static void CloseTableList(List *rels);
 static void PublicationAddTables(Oid pubid, List *rels, bool if_not_exists,
 								 AlterPublicationStmt *stmt);
@@ -232,7 +232,7 @@ CreatePublication(CreatePublicationStmt *stmt)
 
 		Assert(list_length(stmt->tables) > 0);
 
-		rels = OpenTableList(stmt->tables);
+		rels = OpenTableList(stmt->tables, false);
 		PublicationAddTables(puboid, rels, true, NULL);
 		CloseTableList(rels);
 	}
@@ -361,6 +361,7 @@ AlterPublicationTables(AlterPublicationStmt *stmt, Relation rel,
 	List	   *rels = NIL;
 	Form_pg_publication pubform = (Form_pg_publication) GETSTRUCT(tup);
 	Oid			pubid = pubform->oid;
+	bool		isdrop = (stmt->tableAction == DEFELEM_DROP);
 
 	/* Check that user is allowed to manipulate the publication tables. */
 	if (pubform->puballtables)
@@ -372,7 +373,7 @@ AlterPublicationTables(AlterPublicationStmt *stmt, Relation rel,
 
 	Assert(list_length(stmt->tables) > 0);
 
-	rels = OpenTableList(stmt->tables);
+	rels = OpenTableList(stmt->tables, isdrop);
 
 	if (stmt->tableAction == DEFELEM_ADD)
 		PublicationAddTables(pubid, rels, false, stmt);
@@ -385,31 +386,24 @@ AlterPublicationTables(AlterPublicationStmt *stmt, Relation rel,
 		List	   *delrels = NIL;
 		ListCell   *oldlc;
 
-		/* Calculate which relations to drop. */
+		/*
+		 * Remove all publication-table mappings.  We could possibly remove (i)
+		 * tables that are not found in the new table list and (ii) tables that
+		 * are being re-added with a different qual expression. For (ii),
+		 * simply updating the existing tuple is not enough, because of qual
+		 * expression dependencies.
+		 */
 		foreach(oldlc, oldrelids)
 		{
 			Oid			oldrelid = lfirst_oid(oldlc);
-			ListCell   *newlc;
-			bool		found = false;
+			PublicationRelationInfo *oldrel;
 
-			foreach(newlc, rels)
-			{
-				Relation	newrel = (Relation) lfirst(newlc);
-
-				if (RelationGetRelid(newrel) == oldrelid)
-				{
-					found = true;
-					break;
-				}
-			}
-
-			if (!found)
-			{
-				Relation	oldrel = table_open(oldrelid,
-												ShareUpdateExclusiveLock);
-
-				delrels = lappend(delrels, oldrel);
-			}
+			oldrel = palloc(sizeof(PublicationRelationInfo));
+			oldrel->relid = oldrelid;
+			oldrel->whereClause = NULL;
+			oldrel->relation = table_open(oldrel->relid,
+										  ShareUpdateExclusiveLock);
+			delrels = lappend(delrels, oldrel);
 		}
 
 		/* And drop them. */
@@ -500,26 +494,44 @@ RemovePublicationRelById(Oid proid)
 
 /*
  * Open relations specified by a RangeVar list.
+ *
+ * Publication node can have a different list element, hence, pub_drop_table
+ * indicates if it has a Relation (true) or PublicationTable (false).
+ *
  * The returned tables are locked in ShareUpdateExclusiveLock mode in order to
  * add them to a publication.
  */
 static List *
-OpenTableList(List *tables)
+OpenTableList(List *tables, bool pub_drop_table)
 {
 	List	   *relids = NIL;
 	List	   *rels = NIL;
 	ListCell   *lc;
+	PublicationRelationInfo *pri;
 
 	/*
 	 * Open, share-lock, and check all the explicitly-specified relations
 	 */
 	foreach(lc, tables)
 	{
-		RangeVar   *rv = castNode(RangeVar, lfirst(lc));
-		bool		recurse = rv->inh;
+		PublicationTable *t = NULL;
+		RangeVar   *rv;
+		bool		recurse;
 		Relation	rel;
 		Oid			myrelid;
 
+		if (pub_drop_table)
+		{
+			rv = castNode(RangeVar, lfirst(lc));
+		}
+		else
+		{
+			t = lfirst(lc);
+			rv = castNode(RangeVar, t->relation);
+		}
+
+		recurse = rv->inh;
+
 		/* Allow query cancel in case this takes a long time */
 		CHECK_FOR_INTERRUPTS();
 
@@ -538,8 +550,14 @@ OpenTableList(List *tables)
 			table_close(rel, ShareUpdateExclusiveLock);
 			continue;
 		}
-
-		rels = lappend(rels, rel);
+		pri = palloc(sizeof(PublicationRelationInfo));
+		pri->relid = myrelid;
+		pri->relation = rel;
+		if (pub_drop_table)
+			pri->whereClause = NULL;
+		else
+			pri->whereClause = t->whereClause;
+		rels = lappend(rels, pri);
 		relids = lappend_oid(relids, myrelid);
 
 		/*
@@ -572,7 +590,15 @@ OpenTableList(List *tables)
 
 				/* find_all_inheritors already got lock */
 				rel = table_open(childrelid, NoLock);
-				rels = lappend(rels, rel);
+				pri = palloc(sizeof(PublicationRelationInfo));
+				pri->relid = childrelid;
+				pri->relation = rel;
+				/* child inherits WHERE clause from parent */
+				if (pub_drop_table)
+					pri->whereClause = NULL;
+				else
+					pri->whereClause = t->whereClause;
+				rels = lappend(rels, pri);
 				relids = lappend_oid(relids, childrelid);
 			}
 		}
@@ -585,6 +611,9 @@ OpenTableList(List *tables)
 
 /*
  * Close all relations in the list.
+ *
+ * Publication node can have a different list element, hence, pub_drop_table
+ * indicates if it has a Relation (true) or PublicationTable (false).
  */
 static void
 CloseTableList(List *rels)
@@ -593,10 +622,12 @@ CloseTableList(List *rels)
 
 	foreach(lc, rels)
 	{
-		Relation	rel = (Relation) lfirst(lc);
+		PublicationRelationInfo *pri = (PublicationRelationInfo *) lfirst(lc);
 
-		table_close(rel, NoLock);
+		table_close(pri->relation, NoLock);
 	}
+
+	list_free_deep(rels);
 }
 
 /*
@@ -612,15 +643,15 @@ PublicationAddTables(Oid pubid, List *rels, bool if_not_exists,
 
 	foreach(lc, rels)
 	{
-		Relation	rel = (Relation) lfirst(lc);
+		PublicationRelationInfo *pri = (PublicationRelationInfo *) lfirst(lc);
 		ObjectAddress obj;
 
 		/* Must be owner of the table or superuser. */
-		if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
-			aclcheck_error(ACLCHECK_NOT_OWNER, get_relkind_objtype(rel->rd_rel->relkind),
-						   RelationGetRelationName(rel));
+		if (!pg_class_ownercheck(pri->relid, GetUserId()))
+			aclcheck_error(ACLCHECK_NOT_OWNER, get_relkind_objtype(pri->relation->rd_rel->relkind),
+						   RelationGetRelationName(pri->relation));
 
-		obj = publication_add_relation(pubid, rel, if_not_exists);
+		obj = publication_add_relation(pubid, pri, if_not_exists);
 		if (stmt)
 		{
 			EventTriggerCollectSimpleCommand(obj, InvalidObjectAddress,
@@ -644,11 +675,10 @@ PublicationDropTables(Oid pubid, List *rels, bool missing_ok)
 
 	foreach(lc, rels)
 	{
-		Relation	rel = (Relation) lfirst(lc);
-		Oid			relid = RelationGetRelid(rel);
+		PublicationRelationInfo *pri = (PublicationRelationInfo *) lfirst(lc);
 
 		prid = GetSysCacheOid2(PUBLICATIONRELMAP, Anum_pg_publication_rel_oid,
-							   ObjectIdGetDatum(relid),
+							   ObjectIdGetDatum(pri->relid),
 							   ObjectIdGetDatum(pubid));
 		if (!OidIsValid(prid))
 		{
@@ -658,7 +688,7 @@ PublicationDropTables(Oid pubid, List *rels, bool missing_ok)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_OBJECT),
 					 errmsg("relation \"%s\" is not part of the publication",
-							RelationGetRelationName(rel))));
+							RelationGetRelationName(pri->relation))));
 		}
 
 		ObjectAddressSet(obj, PublicationRelRelationId, prid);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index eb24195438..d82ea003db 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -426,14 +426,14 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				transform_element_list transform_type_list
 				TriggerTransitions TriggerReferencing
 				vacuum_relation_list opt_vacuum_relation_list
-				drop_option_list
+				drop_option_list publication_table_list
 
 %type <node>	opt_routine_body
 %type <groupclause> group_clause
 %type <list>	group_by_list
 %type <node>	group_by_item empty_grouping_set rollup_clause cube_clause
 %type <node>	grouping_sets_clause
-%type <node>	opt_publication_for_tables publication_for_tables
+%type <node>	opt_publication_for_tables publication_for_tables publication_table_elem
 
 %type <list>	opt_fdw_options fdw_options
 %type <defelt>	fdw_option
@@ -9612,7 +9612,7 @@ opt_publication_for_tables:
 		;
 
 publication_for_tables:
-			FOR TABLE relation_expr_list
+			FOR TABLE publication_table_list
 				{
 					$$ = (Node *) $3;
 				}
@@ -9643,7 +9643,7 @@ AlterPublicationStmt:
 					n->options = $5;
 					$$ = (Node *)n;
 				}
-			| ALTER PUBLICATION name ADD_P TABLE relation_expr_list
+			| ALTER PUBLICATION name ADD_P TABLE publication_table_list
 				{
 					AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
 					n->pubname = $3;
@@ -9651,7 +9651,7 @@ AlterPublicationStmt:
 					n->tableAction = DEFELEM_ADD;
 					$$ = (Node *)n;
 				}
-			| ALTER PUBLICATION name SET TABLE relation_expr_list
+			| ALTER PUBLICATION name SET TABLE publication_table_list
 				{
 					AlterPublicationStmt *n = makeNode(AlterPublicationStmt);
 					n->pubname = $3;
@@ -9669,6 +9669,20 @@ AlterPublicationStmt:
 				}
 		;
 
+publication_table_list:
+			publication_table_elem { $$ = list_make1($1); }
+		| publication_table_list ',' publication_table_elem { $$ = lappend($1, $3); }
+		;
+
+publication_table_elem: relation_expr OptWhereClause
+		{
+			PublicationTable *n = makeNode(PublicationTable);
+			n->relation = $1;
+			n->whereClause = $2;
+			$$ = (Node *) n;
+		}
+		;
+
 /*****************************************************************************
  *
  *
CREATE SUBSCRIPTION name ... diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index 24268eb502..8fb953b54f 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -551,6 +551,13 @@ check_agglevels_and_constraints(ParseState *pstate, Node *expr) err = _("grouping operations are not allowed in COPY FROM WHERE conditions"); break; + case EXPR_KIND_PUBLICATION_WHERE: + if (isAgg) + err = _("aggregate functions are not allowed in publication WHERE expressions"); + else + err = _("grouping operations are not allowed in publication WHERE expressions"); + + break; case EXPR_KIND_CYCLE_MARK: errkind = true; @@ -950,6 +957,9 @@ transformWindowFuncCall(ParseState *pstate, WindowFunc *wfunc, case EXPR_KIND_CYCLE_MARK: errkind = true; break; + case EXPR_KIND_PUBLICATION_WHERE: + err = _("window functions are not allowed in publication WHERE expressions"); + break; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index f928c32311..fc4170e723 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -119,6 +119,13 @@ transformExprRecurse(ParseState *pstate, Node *expr) /* Guard against stack overflow due to overly complex expressions */ check_stack_depth(); + /* Functions are not allowed in publication WHERE clauses */ + if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE && nodeTag(expr) == T_FuncCall) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("functions are not allowed in publication WHERE expressions"), + parser_errposition(pstate, exprLocation(expr)))); + switch (nodeTag(expr)) { case T_ColumnRef: @@ -509,6 +516,7 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref) case EXPR_KIND_COPY_WHERE: case EXPR_KIND_GENERATED_COLUMN: case EXPR_KIND_CYCLE_MARK: + case EXPR_KIND_PUBLICATION_WHERE: /* okay */ break; @@ -1769,6 +1777,9 @@ transformSubLink(ParseState *pstate, 
SubLink *sublink) case EXPR_KIND_GENERATED_COLUMN: err = _("cannot use subquery in column generation expression"); break; + case EXPR_KIND_PUBLICATION_WHERE: + err = _("cannot use subquery in publication WHERE expression"); + break; /* * There is intentionally no default: case here, so that the @@ -3089,6 +3100,8 @@ ParseExprKindName(ParseExprKind exprKind) return "GENERATED AS"; case EXPR_KIND_CYCLE_MARK: return "CYCLE"; + case EXPR_KIND_PUBLICATION_WHERE: + return "publication expression"; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index 3cec8de7da..e946f17c64 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -2655,6 +2655,9 @@ check_srf_call_placement(ParseState *pstate, Node *last_srf, int location) case EXPR_KIND_CYCLE_MARK: errkind = true; break; + case EXPR_KIND_PUBLICATION_WHERE: + err = _("set-returning functions are not allowed in publication WHERE expressions"); + break; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_oper.c b/src/backend/parser/parse_oper.c index bc34a23afc..29f8835ce1 100644 --- a/src/backend/parser/parse_oper.c +++ b/src/backend/parser/parse_oper.c @@ -718,6 +718,13 @@ make_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree, opform->oprright)), parser_errposition(pstate, location))); + /* Check it's not a custom operator for publication WHERE expressions */ + if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE && opform->oid >= FirstNormalObjectId) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("user-defined operators are not allowed in publication WHERE expressions"), + parser_errposition(pstate, location))); + /* Do typecasting and build the expression tree */ if (ltree == NULL) { diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 682c107e74..980826a502 
100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -691,19 +691,23 @@ copy_read_data(void *outbuf, int minread, int maxread) /* * Get information about remote relation in similar fashion the RELATION - * message provides during replication. + * message provides during replication. This function also returns the relation + * qualifications to be used in COPY command. */ static void fetch_remote_table_info(char *nspname, char *relname, - LogicalRepRelation *lrel) + LogicalRepRelation *lrel, List **qual) { WalRcvExecResult *res; StringInfoData cmd; TupleTableSlot *slot; Oid tableRow[] = {OIDOID, CHAROID, CHAROID}; Oid attrRow[] = {TEXTOID, OIDOID, BOOLOID}; + Oid qualRow[] = {TEXTOID}; bool isnull; int natt; + ListCell *lc; + bool first; lrel->nspname = nspname; lrel->relname = relname; @@ -799,6 +803,55 @@ fetch_remote_table_info(char *nspname, char *relname, lrel->natts = natt; walrcv_clear_result(res); + + /* Get relation qual */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000) + { + resetStringInfo(&cmd); + appendStringInfo(&cmd, + "SELECT pg_get_expr(prqual, prrelid) " + " FROM pg_publication p " + " INNER JOIN pg_publication_rel pr " + " ON (p.oid = pr.prpubid) " + " WHERE pr.prrelid = %u " + " AND p.pubname IN (", lrel->remoteid); + + first = true; + foreach(lc, MySubscription->publications) + { + char *pubname = strVal(lfirst(lc)); + + if (first) + first = false; + else + appendStringInfoString(&cmd, ", "); + + appendStringInfoString(&cmd, quote_literal_cstr(pubname)); + } + appendStringInfoChar(&cmd, ')'); + + res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 1, qualRow); + + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch relation qualifications for table \"%s.%s\" from publisher: %s", + nspname, relname, res->err))); + + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + while (tuplestore_gettupleslot(res->tuplestore, true, false, 
slot)) + { + Datum rf = slot_getattr(slot, 1, &isnull); + + if (!isnull) + *qual = lappend(*qual, makeString(TextDatumGetCString(rf))); + + ExecClearTuple(slot); + } + ExecDropSingleTupleTableSlot(slot); + + walrcv_clear_result(res); + } + pfree(cmd.data); } @@ -812,6 +865,7 @@ copy_table(Relation rel) { LogicalRepRelMapEntry *relmapentry; LogicalRepRelation lrel; + List *qual = NIL; WalRcvExecResult *res; StringInfoData cmd; CopyFromState cstate; @@ -820,7 +874,7 @@ copy_table(Relation rel) /* Get the publisher relation info. */ fetch_remote_table_info(get_namespace_name(RelationGetNamespace(rel)), - RelationGetRelationName(rel), &lrel); + RelationGetRelationName(rel), &lrel, &qual); /* Put the relation into relmap. */ logicalrep_relmap_update(&lrel); @@ -829,16 +883,23 @@ copy_table(Relation rel) relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock); Assert(rel == relmapentry->localrel); + /* List of columns for COPY */ + attnamelist = make_copy_attnamelist(relmapentry); + /* Start copy on the publisher. */ initStringInfo(&cmd); - if (lrel.relkind == RELKIND_RELATION) + + /* Regular table with no row filter */ + if (lrel.relkind == RELKIND_RELATION && qual == NIL) appendStringInfo(&cmd, "COPY %s TO STDOUT", quote_qualified_identifier(lrel.nspname, lrel.relname)); else { /* * For non-tables, we need to do COPY (SELECT ...), but we can't just - * do SELECT * because we need to not copy generated columns. + * do SELECT * because we need to not copy generated columns. For + * tables with any row filters, build a SELECT query with AND'ed row + * filters for COPY. 
 		 */
 		appendStringInfoString(&cmd, "COPY (SELECT ");
 		for (int i = 0; i < lrel.natts; i++)
@@ -847,8 +908,29 @@ copy_table(Relation rel)
 			if (i < lrel.natts - 1)
 				appendStringInfoString(&cmd, ", ");
 		}
-		appendStringInfo(&cmd, " FROM %s) TO STDOUT",
+		appendStringInfo(&cmd, " FROM %s",
 						 quote_qualified_identifier(lrel.nspname, lrel.relname));
+		/* list of AND'ed filters */
+		if (qual != NIL)
+		{
+			ListCell   *lc;
+			bool		first = true;
+
+			appendStringInfoString(&cmd, " WHERE ");
+			foreach(lc, qual)
+			{
+				char	   *q = strVal(lfirst(lc));
+
+				if (first)
+					first = false;
+				else
+					appendStringInfoString(&cmd, " AND ");
+				appendStringInfo(&cmd, "%s", q);
+			}
+			list_free_deep(qual);
+		}
+
+		appendStringInfoString(&cmd, ") TO STDOUT");
 	}
 	res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 0, NULL);
 	pfree(cmd.data);
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index abd5217ab1..08c018a300 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -13,18 +13,27 @@
 #include "postgres.h"
 
 #include "access/tupconvert.h"
+#include "access/xact.h"
 #include "catalog/partition.h"
 #include "catalog/pg_publication.h"
+#include "catalog/pg_publication_rel.h"
 #include "commands/defrem.h"
+#include "executor/executor.h"
 #include "fmgr.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/optimizer.h"
+#include "parser/parse_coerce.h"
 #include "replication/logical.h"
 #include "replication/logicalproto.h"
+#include "replication/logicalrelation.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
+#include "utils/builtins.h"
 #include "utils/int8.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
 
@@ -99,6 +108,9 @@ typedef struct RelationSyncEntry
 
 	bool		replicate_valid;
 	PublicationActions pubactions;
+	List	   *qual;			/* row filter */
+	List	   *exprstate;		/* ExprState for row filter */
+	TupleTableSlot *scantuple;	/* tuple table slot for row filter */
 
 	/*
 	 * OID of the relation to publish changes as. For a partition, this may
@@ -122,7 +134,7 @@ static HTAB *RelationSyncCache = NULL;
 
 static void init_rel_sync_cache(MemoryContext decoding_context);
 static void cleanup_rel_sync_cache(TransactionId xid, bool is_commit);
-static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
+static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation relation);
 static void rel_sync_cache_relation_cb(Datum arg, Oid relid);
 static void rel_sync_cache_publication_cb(Datum arg, int cacheid,
 										  uint32 hashvalue);
@@ -131,6 +143,13 @@ static void set_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
 static bool get_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
 											TransactionId xid);
 
+/* row filter routines */
+static EState *create_estate_for_relation(Relation rel);
+static ExprState *pgoutput_row_filter_init_expr(Node *rfnode);
+static bool pgoutput_row_filter_exec_expr(ExprState *state, ExprContext *econtext);
+static bool pgoutput_row_filter(Relation relation, HeapTuple oldtuple,
+								HeapTuple newtuple, RelationSyncEntry *entry);
+
 /*
  * Specify output plugin callbacks
  */
@@ -520,6 +539,154 @@ send_relation_and_attrs(Relation relation, TransactionId xid,
 	OutputPluginWrite(ctx, false);
 }
 
+/*
+ * Executor state preparation for evaluation of row filter expressions for the
+ * specified relation.
+ */
+static EState *
+create_estate_for_relation(Relation rel)
+{
+	EState	   *estate;
+	RangeTblEntry *rte;
+
+	estate = CreateExecutorState();
+
+	rte = makeNode(RangeTblEntry);
+	rte->rtekind = RTE_RELATION;
+	rte->relid = RelationGetRelid(rel);
+	rte->relkind = rel->rd_rel->relkind;
+	rte->rellockmode = AccessShareLock;
+	ExecInitRangeTable(estate, list_make1(rte));
+
+	estate->es_output_cid = GetCurrentCommandId(false);
+
+	return estate;
+}
+
+/*
+ * Initialize for row filter expression execution.
+ */
+static ExprState *
+pgoutput_row_filter_init_expr(Node *rfnode)
+{
+	ExprState  *exprstate;
+	Oid			exprtype;
+	Expr	   *expr;
+	MemoryContext oldctx;
+
+	/* Prepare expression for execution */
+	exprtype = exprType(rfnode);
+	expr = (Expr *) coerce_to_target_type(NULL, rfnode, exprtype, BOOLOID, -1, COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST, -1);
+
+	if (expr == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_CANNOT_COERCE),
+				 errmsg("row filter returns type %s that cannot be coerced to the expected type %s",
+						format_type_be(exprtype),
+						format_type_be(BOOLOID)),
+				 errhint("You will need to rewrite the row filter.")));
+
+	/*
+	 * Cache ExprState using CacheMemoryContext. This is the same code as
+	 * ExecPrepareExpr() but it is not used because it doesn't use an EState.
+	 * It should probably be another function in the executor to handle the
+	 * execution outside a normal Plan tree context.
+	 */
+	oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+	expr = expression_planner(expr);
+	exprstate = ExecInitExpr(expr, NULL);
+	MemoryContextSwitchTo(oldctx);
+
+	return exprstate;
+}
+
+/*
+ * Evaluates row filter.
+ *
+ * If the row filter evaluates to NULL, it is taken as false i.e. the change
+ * isn't replicated.
+ */
+static bool
+pgoutput_row_filter_exec_expr(ExprState *state, ExprContext *econtext)
+{
+	Datum		ret;
+	bool		isnull;
+
+	Assert(state != NULL);
+
+	ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
+
+	elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+		 DatumGetBool(ret) ? "true" : "false",
+		 isnull ? "true" : "false");
+
+	if (isnull)
+		return false;
+
+	return DatumGetBool(ret);
+}
+
+/*
+ * Change is checked against the row filter, if any.
+ *
+ * If it returns true, the change is replicated, otherwise, it is not.
+ */
+static bool
+pgoutput_row_filter(Relation relation, HeapTuple oldtuple, HeapTuple newtuple, RelationSyncEntry *entry)
+{
+	EState	   *estate;
+	ExprContext *ecxt;
+	ListCell   *lc;
+	bool		result = true;
+
+	/* Bail out if there is no row filter */
+	if (entry->qual == NIL)
+		return true;
+
+	elog(DEBUG3, "table \"%s.%s\" has row filter",
+		 get_namespace_name(get_rel_namespace(RelationGetRelid(relation))),
+		 get_rel_name(relation->rd_id));
+
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	estate = create_estate_for_relation(relation);
+
+	if (entry->scantuple == NULL)
+		elog(DEBUG1, "entry->scantuple is null");
+
+	/* Prepare context per tuple */
+	ecxt = GetPerTupleExprContext(estate);
+	ecxt->ecxt_scantuple = entry->scantuple;
+
+	ExecStoreHeapTuple(newtuple ? newtuple : oldtuple, ecxt->ecxt_scantuple, false);
+
+	/*
+	 * If the subscription has multiple publications and the same table has a
+	 * different row filter in these publications, all row filters must be
+	 * matched in order to replicate this change.
+	 */
+	foreach(lc, entry->exprstate)
+	{
+		ExprState  *exprstate = (ExprState *) lfirst(lc);
+
+		/* Evaluates row filter */
+		result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
+
+		elog(DEBUG3, "row filter %smatched", result ? "" : "not ");
+
+		/* If the tuple does not match one of the row filters, bail out */
+		if (!result)
+			break;
+	}
+
+	/* Cleanup allocated resources */
+	ResetExprContext(ecxt);
+	FreeExecutorState(estate);
+	PopActiveSnapshot();
+
+	return result;
+}
+
 /*
  * Sends the decoded DML over wire.
  *
@@ -547,7 +714,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (in_streaming)
 		xid = change->txn->xid;
 
-	relentry = get_rel_sync_entry(data, RelationGetRelid(relation));
+	relentry = get_rel_sync_entry(data, relation);
 
 	/* First check the table filter */
 	switch (change->action)
@@ -571,8 +738,6 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	/* Avoid leaking memory by using and resetting our own context */
 	old = MemoryContextSwitchTo(data->context);
 
-	maybe_send_schema(ctx, txn, change, relation, relentry);
-
 	/* Send the data */
 	switch (change->action)
 	{
@@ -580,6 +745,16 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			{
 				HeapTuple	tuple = &change->data.tp.newtuple->tuple;
 
+				/* Check row filter. */
+				if (!pgoutput_row_filter(relation, NULL, tuple, relentry))
+					break;
+
+				/*
+				 * Schema should be sent before the logic that replaces the
+				 * relation because it also sends the ancestor's relation.
+				 */
+				maybe_send_schema(ctx, txn, change, relation, relentry);
+
 				/* Switch relation if publishing via root. */
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
@@ -603,6 +778,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 					&change->data.tp.oldtuple->tuple : NULL;
 				HeapTuple	newtuple = &change->data.tp.newtuple->tuple;
 
+				/* Check row filter. */
+				if (!pgoutput_row_filter(relation, oldtuple, newtuple, relentry))
+					break;
+
+				maybe_send_schema(ctx, txn, change, relation, relentry);
+
 				/* Switch relation if publishing via root. */
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
@@ -631,6 +812,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			{
 				HeapTuple	oldtuple = &change->data.tp.oldtuple->tuple;
 
+				/* Check row filter. */
+				if (!pgoutput_row_filter(relation, oldtuple, NULL, relentry))
+					break;
+
+				maybe_send_schema(ctx, txn, change, relation, relentry);
+
 				/* Switch relation if publishing via root. */
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
@@ -694,7 +881,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 		if (!is_publishable_relation(relation))
 			continue;
 
-		relentry = get_rel_sync_entry(data, relid);
+		relentry = get_rel_sync_entry(data, relation);
 
 		if (!relentry->pubactions.pubtruncate)
 			continue;
@@ -1005,9 +1192,10 @@ set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId xid)
  * when publishing.
  */
 static RelationSyncEntry *
-get_rel_sync_entry(PGOutputData *data, Oid relid)
+get_rel_sync_entry(PGOutputData *data, Relation relation)
 {
 	RelationSyncEntry *entry;
+	Oid			relid = RelationGetRelid(relation);
 	bool		am_partition = get_rel_relispartition(relid);
 	char		relkind = get_rel_relkind(relid);
 	bool		found;
@@ -1030,6 +1218,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		entry->replicate_valid = false;
 		entry->pubactions.pubinsert = entry->pubactions.pubupdate =
 			entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
+		entry->qual = NIL;
+		entry->scantuple = NULL;
+		entry->exprstate = NIL;
 		entry->publish_as_relid = InvalidOid;
 		entry->map = NULL;		/* will be set by maybe_send_schema() if
 								 * needed */
@@ -1041,6 +1232,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		List	   *pubids = GetRelationPublications(relid);
 		ListCell   *lc;
 		Oid			publish_as_relid = relid;
+		TupleDesc	tupdesc;
 
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
@@ -1054,6 +1246,23 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 			publications_valid = true;
 		}
 
+		/* Release tuple table slot */
+		if (entry->scantuple != NULL)
+		{
+			ExecDropSingleTupleTableSlot(entry->scantuple);
+			entry->scantuple = NULL;
+
+			elog(DEBUG1, "get_rel_sync_entry: free entry->scantuple");
+		}
+
+		/* create a tuple table slot for row filter */
+		tupdesc = RelationGetDescr(relation);
+		oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+		entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
+		MemoryContextSwitchTo(oldctx);
+
+		elog(DEBUG1, "get_rel_sync_entry: allocate entry->scantuple");
+
 		/*
 		 * Build publication cache. We can't use one provided by relcache as
 		 * relcache considers all publications given relation is in, but here
@@ -1063,6 +1272,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		{
 			Publication *pub = lfirst(lc);
 			bool		publish = false;
+			HeapTuple	rftuple;
+			Datum		rfdatum;
+			bool		rfisnull;
 
 			if (pub->alltables)
 			{
@@ -1122,9 +1334,34 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 				entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
 			}
 
-			if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
-				entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
-				break;
+			/*
+			 * Cache row filter, if available. All publication-table mappings
+			 * must be checked. If it is a partition and pubviaroot is true,
+			 * use the row filter of the topmost partitioned table instead of
+			 * the row filter of its own partition.
+			 */
+			rftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(publish_as_relid), ObjectIdGetDatum(pub->oid));
+			if (HeapTupleIsValid(rftuple))
+			{
+				rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple, Anum_pg_publication_rel_prqual, &rfisnull);
+
+				if (!rfisnull)
+				{
+					Node	   *rfnode;
+					ExprState  *exprstate;
+
+					oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+					rfnode = stringToNode(TextDatumGetCString(rfdatum));
+					entry->qual = lappend(entry->qual, rfnode);
+
+					/* Prepare for expression execution */
+					exprstate = pgoutput_row_filter_init_expr(rfnode);
+					entry->exprstate = lappend(entry->exprstate, exprstate);
+					MemoryContextSwitchTo(oldctx);
+				}
+
+				ReleaseSysCache(rftuple);
+			}
 		}
 
 		list_free(pubids);
@@ -1242,6 +1479,7 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 {
 	HASH_SEQ_STATUS status;
 	RelationSyncEntry *entry;
+	MemoryContext oldctx;
 
 	/*
 	 * We can get here if the plugin was used in SQL interface as the
@@ -1251,6 +1489,8 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 	if (RelationSyncCache == NULL)
 		return;
 
+	oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+
 	/*
 	 * There is no way to find which entry in our cache the hash belongs to so
 	 * mark the whole cache as invalid.
@@ -1268,5 +1508,15 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 		entry->pubactions.pubupdate = false;
 		entry->pubactions.pubdelete = false;
 		entry->pubactions.pubtruncate = false;
+
+		if (entry->qual != NIL)
+			list_free_deep(entry->qual);
+		entry->qual = NIL;
+
+		if (entry->exprstate != NIL)
+			list_free_deep(entry->exprstate);
+		entry->exprstate = NIL;
 	}
+
+	MemoryContextSwitchTo(oldctx);
 }
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 321152151d..6f944ec60d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4172,6 +4172,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
 	int			i_oid;
 	int			i_prpubid;
 	int			i_prrelid;
+	int			i_prrelqual;
 	int			i,
 				j,
 				ntups;
@@ -4182,9 +4183,16 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
 	query = createPQExpBuffer();
 
 	/* Collect all publication membership info. */
-	appendPQExpBufferStr(query,
-						 "SELECT tableoid, oid, prpubid, prrelid "
-						 "FROM pg_catalog.pg_publication_rel");
+	if (fout->remoteVersion >= 150000)
+		appendPQExpBufferStr(query,
+							 "SELECT tableoid, oid, prpubid, prrelid, "
+							 "pg_catalog.pg_get_expr(prqual, prrelid) AS prrelqual "
+							 "FROM pg_catalog.pg_publication_rel");
+	else
+		appendPQExpBufferStr(query,
+							 "SELECT tableoid, oid, prpubid, prrelid, "
+							 "NULL AS prrelqual "
+							 "FROM pg_catalog.pg_publication_rel");
 	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
 
 	ntups = PQntuples(res);
@@ -4193,6 +4201,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
 	i_oid = PQfnumber(res, "oid");
 	i_prpubid = PQfnumber(res, "prpubid");
 	i_prrelid = PQfnumber(res, "prrelid");
+	i_prrelqual = PQfnumber(res, "prrelqual");
 
 	/* this allocation may be more than we need */
 	pubrinfo = pg_malloc(ntups * sizeof(PublicationRelInfo));
@@ -4233,6 +4242,10 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
 		pubrinfo[j].dobj.name = tbinfo->dobj.name;
 		pubrinfo[j].publication = pubinfo;
 		pubrinfo[j].pubtable = tbinfo;
+		if (PQgetisnull(res, i, i_prrelqual))
+			pubrinfo[j].pubrelqual = NULL;
+		else
+			pubrinfo[j].pubrelqual = pg_strdup(PQgetvalue(res, i, i_prrelqual));
 
 		/* Decide whether we want to dump it */
 		selectDumpablePublicationTable(&(pubrinfo[j].dobj), fout);
@@ -4265,8 +4278,11 @@ dumpPublicationTable(Archive *fout, const PublicationRelInfo *pubrinfo)
 
 	appendPQExpBuffer(query, "ALTER PUBLICATION %s ADD TABLE ONLY",
 					  fmtId(pubinfo->dobj.name));
-	appendPQExpBuffer(query, " %s;\n",
+	appendPQExpBuffer(query, " %s",
 					  fmtQualifiedDumpable(tbinfo));
+	if (pubrinfo->pubrelqual)
+		appendPQExpBuffer(query, " WHERE (%s)", pubrinfo->pubrelqual);
+	appendPQExpBufferStr(query, ";\n");
 
 	/*
 	 * There is no point in creating a drop query as the drop is done by table
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index ba9bc6ddd2..7d72d498c1 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -626,6 +626,7 @@ typedef struct _PublicationRelInfo
 	DumpableObject dobj;
 	PublicationInfo *publication;
 	TableInfo  *pubtable;
+	char	   *pubrelqual;
 } PublicationRelInfo;
 
 /*
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2abf255798..e2e64cb3bf 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6329,8 +6329,15 @@ describePublications(const char *pattern)
 		if (!puballtables)
 		{
 			printfPQExpBuffer(&buf,
-							  "SELECT n.nspname, c.relname\n"
-							  "FROM pg_catalog.pg_class c,\n"
+							  "SELECT n.nspname, c.relname");
+			if (pset.sversion >= 150000)
+				appendPQExpBuffer(&buf,
+								  ", pg_get_expr(pr.prqual, c.oid)");
+			else
+				appendPQExpBuffer(&buf,
+								  ", NULL");
+			appendPQExpBuffer(&buf,
+							  "\nFROM pg_catalog.pg_class c,\n"
 							  " pg_catalog.pg_namespace n,\n"
 							  " pg_catalog.pg_publication_rel pr\n"
 							  "WHERE c.relnamespace = n.oid\n"
@@ -6359,6 +6366,10 @@ describePublications(const char *pattern)
 								  PQgetvalue(tabres, j, 0),
 								  PQgetvalue(tabres, j, 1));
 
+				if (!PQgetisnull(tabres, j, 2))
+					appendPQExpBuffer(&buf, " WHERE (%s)",
+									  PQgetvalue(tabres, j, 2));
+
 				printTableAddFooter(&cont, buf.data);
 			}
 			PQclear(tabres);
diff --git a/src/include/catalog/pg_publication.h b/src/include/catalog/pg_publication.h
index f332bad4d4..2703b9c3fe 100644
--- a/src/include/catalog/pg_publication.h
+++ b/src/include/catalog/pg_publication.h
@@ -83,6 +83,13 @@ typedef struct Publication
 	PublicationActions pubactions;
 } Publication;
 
+typedef struct PublicationRelationInfo
+{
+	Oid			relid;
+	Relation	relation;
+	Node	   *whereClause;
+} PublicationRelationInfo;
+
 extern Publication *GetPublication(Oid pubid);
 extern Publication *GetPublicationByName(const char *pubname, bool missing_ok);
 extern List *GetRelationPublications(Oid relid);
@@ -108,7 +115,7 @@ extern List *GetAllTablesPublications(void);
 extern List *GetAllTablesPublicationRelations(bool pubviaroot);
 
 extern bool is_publishable_relation(Relation rel);
-extern ObjectAddress publication_add_relation(Oid pubid, Relation targetrel,
+extern ObjectAddress publication_add_relation(Oid pubid, PublicationRelationInfo *pri,
 											  bool if_not_exists);
 
 extern Oid	get_publication_oid(const char *pubname, bool missing_ok);
diff --git a/src/include/catalog/pg_publication_rel.h b/src/include/catalog/pg_publication_rel.h
index b5d5504cbb..154bb61777 100644
--- a/src/include/catalog/pg_publication_rel.h
+++ b/src/include/catalog/pg_publication_rel.h
@@ -31,6 +31,10 @@ CATALOG(pg_publication_rel,6106,PublicationRelRelationId)
 	Oid			oid;			/* oid */
 	Oid			prpubid BKI_LOOKUP(pg_publication);	/* Oid of the publication */
 	Oid			prrelid BKI_LOOKUP(pg_class);	/* Oid of the relation */
+
+#ifdef CATALOG_VARLEN			/* variable-length fields start here */
+	pg_node_tree prqual;		/* qualifications */
+#endif
 } FormData_pg_publication_rel;
 
 /* ----------------
@@ -40,6 +44,8 @@ CATALOG(pg_publication_rel,6106,PublicationRelRelationId)
  */
 typedef FormData_pg_publication_rel *Form_pg_publication_rel;
 
+DECLARE_TOAST(pg_publication_rel, 8287, 8288);
+
 DECLARE_UNIQUE_INDEX_PKEY(pg_publication_rel_oid_index, 6112, PublicationRelObjectIndexId, on pg_publication_rel using btree(oid oid_ops));
 DECLARE_UNIQUE_INDEX(pg_publication_rel_prrelid_prpubid_index, 6113, PublicationRelPrrelidPrpubidIndexId, on pg_publication_rel using btree(prrelid oid_ops, prpubid oid_ops));
 
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d9e417bcd7..2037705f45 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -491,6 +491,7 @@ typedef enum NodeTag
 	T_PartitionRangeDatum,
 	T_PartitionCmd,
 	T_VacuumRelation,
+	T_PublicationTable,
 
 	/*
 	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index def9651b34..cf815cc0f2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3624,12 +3624,19 @@ typedef struct AlterTSConfigurationStmt
 } AlterTSConfigurationStmt;
 
 
+typedef struct PublicationTable
+{
+	NodeTag		type;
+	RangeVar   *relation;		/* relation to be published */
+	Node	   *whereClause;	/* qualifications */
+} PublicationTable;
+
 typedef struct CreatePublicationStmt
 {
 	NodeTag		type;
 	char	   *pubname;		/* Name of the publication */
 	List	   *options;		/* List of DefElem nodes */
-	List	   *tables;			/* Optional list of tables to add */
+	List	   *tables;			/* Optional list of PublicationTable to add */
 	bool		for_all_tables; /* Special publication for all tables in db */
 } CreatePublicationStmt;
 
@@ -3642,7 +3649,7 @@ typedef struct AlterPublicationStmt
 	List	   *options;		/* List of DefElem nodes */
 
 	/* parameters used for ALTER PUBLICATION ... ADD/DROP TABLE */
-	List	   *tables;			/* List of tables to add/drop */
+	List	   *tables;			/* List of PublicationTable to add/drop */
 	bool		for_all_tables; /* Special publication for all tables in db */
 	DefElemAction tableAction;	/* What action to perform with the tables */
 } AlterPublicationStmt;
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h
index 1500de2dd0..4537543a7b 100644
--- a/src/include/parser/parse_node.h
+++ b/src/include/parser/parse_node.h
@@ -80,6 +80,7 @@ typedef enum ParseExprKind
 	EXPR_KIND_COPY_WHERE,		/* WHERE condition in COPY FROM */
 	EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
 	EXPR_KIND_CYCLE_MARK,		/* cycle mark value */
+	EXPR_KIND_PUBLICATION_WHERE /* WHERE condition for a table in PUBLICATION */
 } ParseExprKind;
 
 
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 63d6ab7a4e..444f8344bc 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -156,6 +156,77 @@ Tables:
 
 DROP TABLE testpub_parted1;
 DROP PUBLICATION testpub_forparted, testpub_forparted1;
+CREATE TABLE testpub_rf_tbl1 (a integer, b text);
+CREATE TABLE testpub_rf_tbl2 (c text, d integer);
+CREATE TABLE testpub_rf_tbl3 (e integer);
+CREATE TABLE testpub_rf_tbl4 (g text);
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1, testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
+RESET client_min_messages;
+\dRp+ testpub5
+                                    Publication testpub5
+          Owner           | All tables | Inserts | Updates | Deletes | Truncates | Via root
+--------------------------+------------+---------+---------+---------+-----------+----------
+ regress_publication_user | f          | t       | t       | t       | t         | f
+Tables:
+    "public.testpub_rf_tbl1"
+    "public.testpub_rf_tbl2" WHERE (((c <> 'test'::text) AND (d < 5)))
+
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
+\dRp+ testpub5
+                                    Publication testpub5
+          Owner           | All tables | Inserts | Updates | Deletes | Truncates | Via root
+--------------------------+------------+---------+---------+---------+-----------+----------
+ regress_publication_user | f          | t       | t       | t       | t         | f
+Tables:
+    "public.testpub_rf_tbl1"
+    "public.testpub_rf_tbl2" WHERE (((c <> 'test'::text) AND (d < 5)))
+    "public.testpub_rf_tbl3" WHERE (((e > 1000) AND (e < 2000)))
+
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
+\dRp+ testpub5
+                                    Publication testpub5
+          Owner           | All tables | Inserts | Updates | Deletes | Truncates | Via root
+--------------------------+------------+---------+---------+---------+-----------+----------
+ regress_publication_user | f          | t       | t       | t       | t         | f
+Tables:
+    "public.testpub_rf_tbl1"
+    "public.testpub_rf_tbl3" WHERE (((e > 1000) AND (e < 2000)))
+
+-- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
+ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
+\dRp+ testpub5
+                                    Publication testpub5
+          Owner           | All tables | Inserts | Updates | Deletes | Truncates | Via root
+--------------------------+------------+---------+---------+---------+-----------+----------
+ regress_publication_user | f          | t       | t       | t       | t         | f
+Tables:
+    "public.testpub_rf_tbl3" WHERE (((e > 300) AND (e < 500)))
+
+-- fail - functions disallowed
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) < 6);
+ERROR:  functions are not allowed in publication WHERE expressions
+LINE 1: ...ICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) ...
+ ^ +-- fail - user-defined operators disallowed +CREATE FUNCTION testpub_rf_func(integer, integer) RETURNS boolean AS $$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL; +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func, LEFTARG = integer, RIGHTARG = integer); +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27); +ERROR: user-defined operators are not allowed in publication WHERE expressions +LINE 1: ...ICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27); + ^ +-- fail - WHERE not allowed in DROP +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27); +ERROR: syntax error at or near "WHERE" +LINE 1: ...R PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e <... + ^ +DROP TABLE testpub_rf_tbl1; +DROP TABLE testpub_rf_tbl2; +DROP TABLE testpub_rf_tbl3; +DROP TABLE testpub_rf_tbl4; +DROP PUBLICATION testpub5; +DROP OPERATOR =#>(integer, integer); +DROP FUNCTION testpub_rf_func(integer, integer); -- fail - view CREATE PUBLICATION testpub_fortbl FOR TABLE testpub_view; ERROR: "testpub_view" is not a table diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql index d844075368..b1606cce7e 100644 --- a/src/test/regress/sql/publication.sql +++ b/src/test/regress/sql/publication.sql @@ -93,6 +93,38 @@ ALTER PUBLICATION testpub_forparted SET (publish_via_partition_root = true); DROP TABLE testpub_parted1; DROP PUBLICATION testpub_forparted, testpub_forparted1; +CREATE TABLE testpub_rf_tbl1 (a integer, b text); +CREATE TABLE testpub_rf_tbl2 (c text, d integer); +CREATE TABLE testpub_rf_tbl3 (e integer); +CREATE TABLE testpub_rf_tbl4 (g text); +SET client_min_messages = 'ERROR'; +CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1, testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5); +RESET client_min_messages; +\dRp+ testpub5 +ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000); +\dRp+ testpub5 +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2; +\dRp+ testpub5 +-- remove 
testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression) +ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500); +\dRp+ testpub5 +-- fail - functions disallowed +ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) < 6); +-- fail - user-defined operators disallowed +CREATE FUNCTION testpub_rf_func(integer, integer) RETURNS boolean AS $$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL; +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func, LEFTARG = integer, RIGHTARG = integer); +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27); +-- fail - WHERE not allowed in DROP +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27); + +DROP TABLE testpub_rf_tbl1; +DROP TABLE testpub_rf_tbl2; +DROP TABLE testpub_rf_tbl3; +DROP TABLE testpub_rf_tbl4; +DROP PUBLICATION testpub5; +DROP OPERATOR =#>(integer, integer); +DROP FUNCTION testpub_rf_func(integer, integer); + -- fail - view CREATE PUBLICATION testpub_fortbl FOR TABLE testpub_view; SET client_min_messages = 'ERROR'; diff --git a/src/test/subscription/t/021_row_filter.pl b/src/test/subscription/t/021_row_filter.pl new file mode 100644 index 0000000000..0f6d2f0128 --- /dev/null +++ b/src/test/subscription/t/021_row_filter.pl @@ -0,0 +1,298 @@ +# Test logical replication behavior with row filtering +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 7; + +# create publisher node +my $node_publisher = get_new_node('publisher'); +$node_publisher->init(allows_streaming => 'logical'); +$node_publisher->start; + +# create subscriber node +my $node_subscriber = get_new_node('subscriber'); +$node_subscriber->init(allows_streaming => 'logical'); +$node_subscriber->start; + +# setup structure on publisher +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_1 (a int primary key, b text)"); +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_2 (c int primary key)"); 
+$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_3 (a int primary key, b boolean)"); +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_partitioned (a int primary key, b integer) PARTITION BY RANGE(a)" +); +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_less_10k (LIKE tab_rowfilter_partitioned)"); +$node_publisher->safe_psql('postgres', + "ALTER TABLE tab_rowfilter_partitioned ATTACH PARTITION tab_rowfilter_less_10k FOR VALUES FROM (MINVALUE) TO (10000)" +); +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_greater_10k (LIKE tab_rowfilter_partitioned)" +); +$node_publisher->safe_psql('postgres', + "ALTER TABLE tab_rowfilter_partitioned ATTACH PARTITION tab_rowfilter_greater_10k FOR VALUES FROM (10000) TO (MAXVALUE)" +); + +# setup structure on subscriber +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_1 (a int primary key, b text)"); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_2 (c int primary key)"); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_3 (a int primary key, b boolean)"); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_partitioned (a int primary key, b integer) PARTITION BY RANGE(a)" +); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_less_10k (LIKE tab_rowfilter_partitioned)"); +$node_subscriber->safe_psql('postgres', + "ALTER TABLE tab_rowfilter_partitioned ATTACH PARTITION tab_rowfilter_less_10k FOR VALUES FROM (MINVALUE) TO (10000)" +); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_rowfilter_greater_10k (LIKE tab_rowfilter_partitioned)" +); +$node_subscriber->safe_psql('postgres', + "ALTER TABLE tab_rowfilter_partitioned ATTACH PARTITION tab_rowfilter_greater_10k FOR VALUES FROM (10000) TO (MAXVALUE)" +); + +# setup logical replication +$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 WHERE 
(a > 1000 AND b <> 'filtered')" +); + +$node_publisher->safe_psql('postgres', + "ALTER PUBLICATION tap_pub_1 ADD TABLE tab_rowfilter_2 WHERE (c % 7 = 0)" +); + +$node_publisher->safe_psql('postgres', + "ALTER PUBLICATION tap_pub_1 SET TABLE tab_rowfilter_1 WHERE (a > 1000 AND b <> 'filtered'), tab_rowfilter_2 WHERE (c % 2 = 0), tab_rowfilter_3" +); + +$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION tap_pub_2 FOR TABLE tab_rowfilter_2 WHERE (c % 3 = 0)" +); + +$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION tap_pub_3 FOR TABLE tab_rowfilter_partitioned WHERE (a < 5000)" +); +$node_publisher->safe_psql('postgres', + "ALTER PUBLICATION tap_pub_3 ADD TABLE tab_rowfilter_less_10k WHERE (a < 6000)" +); +$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION tap_pub_not_used FOR TABLE tab_rowfilter_1 WHERE (a < 0)" +); + +# +# The following INSERTs are executed before the CREATE SUBSCRIPTION, so these +# SQL commands are for testing the initial data copy using logical replication. 
+# +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (1, 'not replicated')"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (1500, 'filtered')"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (1980, 'not filtered')"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) SELECT x, 'test ' || x FROM generate_series(990,1002) x" +); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_2 (c) SELECT generate_series(1, 20)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_3 (a, b) SELECT x, (x % 3 = 0) FROM generate_series(1, 10) x"); + +# insert data into partitioned table and directly on the partition +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_partitioned (a, b) VALUES(1, 100),(7000, 101),(15000, 102),(5500, 300)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_less_10k (a, b) VALUES(2, 200),(6005, 201)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_greater_10k (a, b) VALUES(16000, 103)"); + +my $publisher_connstr = $node_publisher->connstr . 
' dbname=postgres'; +my $appname = 'tap_sub'; +$node_subscriber->safe_psql('postgres', + "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2, tap_pub_3" +); + +$node_publisher->wait_for_catchup($appname); + +# wait for initial table synchronization to finish +my $synced_query = + "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"; +$node_subscriber->poll_query_until('postgres', $synced_query) + or die "Timed out while waiting for subscriber to synchronize data"; + +# Check expected replicated rows for tab_rowfilter_1 +# tap_pub_1 filter is: (a > 1000 AND b <> 'filtered') +# - INSERT (1, 'not replicated') NO, because a is not > 1000 +# - INSERT (1500, 'filtered') NO, because b == 'filtered' +# - INSERT (1980, 'not filtered') YES +# - generate_series(990,1002) YES, only for 1001,1002 because a > 1000 +# +my $result = + $node_subscriber->safe_psql('postgres', + "SELECT a, b FROM tab_rowfilter_1 ORDER BY 1, 2"); +is( $result, qq(1001|test 1001 +1002|test 1002 +1980|not filtered), 'check initial data copy from table tab_rowfilter_1'); + +# Check expected replicated rows for tab_rowfilter_2 +# tap_pub_1 filter is: (c % 2 = 0) +# tap_pub_2 filter is: (c % 3 = 0) +# When there are multiple publications for the same table, all filter +# expressions must be satisfied. In this case, rows are replicated only if c is +# divisible by both 2 AND 3 (6, 12, 18). +# +$result = + $node_subscriber->safe_psql('postgres', + "SELECT count(c), min(c), max(c) FROM tab_rowfilter_2"); +is($result, qq(3|6|18), 'check initial data copy from table tab_rowfilter_2'); + +# Check expected replicated rows for tab_rowfilter_3 +# There is no filter. 10 rows are inserted, so 10 rows are replicated.
+$result = + $node_subscriber->safe_psql('postgres', + "SELECT count(a) FROM tab_rowfilter_3"); +is($result, qq(10), 'check initial data copy from table tab_rowfilter_3'); + +# Check expected replicated rows for partitions +# publication option publish_via_partition_root is false so use the row filter +# from a partition +# tab_rowfilter_partitioned filter: (a < 5000) +# tab_rowfilter_less_10k filter: (a < 6000) +# tab_rowfilter_greater_10k filter: no filter +# +# INSERT into tab_rowfilter_partitioned: +# - INSERT (1,100) YES, because 1 < 6000 +# - INSERT (7000, 101) NO, because 7000 is not < 6000 +# - INSERT (15000, 102) YES, because tab_rowfilter_greater_10k has no filter +# - INSERT (5500, 300) YES, because 5500 < 6000 +# +# INSERT directly into tab_rowfilter_less_10k: +# - INSERT (2, 200) YES, because 2 < 6000 +# - INSERT (6005, 201) NO, because 6005 is not < 6000 +# +# INSERT directly into tab_rowfilter_greater_10k: +# - INSERT (16000, 103) YES, because tab_rowfilter_greater_10k has no filter +# +$result = + $node_subscriber->safe_psql('postgres', + "SELECT a, b FROM tab_rowfilter_less_10k ORDER BY 1, 2"); +is($result, qq(1|100 +2|200 +5500|300), 'check initial data copy from partition tab_rowfilter_less_10k'); + +$result = + $node_subscriber->safe_psql('postgres', + "SELECT a, b FROM tab_rowfilter_greater_10k ORDER BY 1, 2"); +is($result, qq(15000|102 +16000|103), 'check initial data copy from partition tab_rowfilter_greater_10k'); + +# The following commands are executed after CREATE SUBSCRIPTION, so these SQL +# commands are for testing normal logical replication behavior. 
+# +# test row filter (INSERT, UPDATE, DELETE) +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (800, 'test 800')"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (1600, 'test 1600')"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (1601, 'test 1601')"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_1 (a, b) VALUES (1700, 'test 1700')"); +$node_publisher->safe_psql('postgres', + "UPDATE tab_rowfilter_1 SET b = NULL WHERE a = 1600"); +$node_publisher->safe_psql('postgres', + "UPDATE tab_rowfilter_1 SET b = 'test 1601 updated' WHERE a = 1601"); +$node_publisher->safe_psql('postgres', + "DELETE FROM tab_rowfilter_1 WHERE a = 1700"); + +$node_publisher->wait_for_catchup($appname); + +# Check expected replicated rows for tab_rowfilter_1 +# tap_pub_1 filter is: (a > 1000 AND b <> 'filtered') +# +# - 1001, 1002, 1980 already exist from initial data copy +# - INSERT (800, 'test 800') NO, because 800 is not > 1000 +# - INSERT (1600, 'test 1600') YES, because 1600 > 1000 and 'test 1600' <> 'filtered' +# - INSERT (1601, 'test 1601') YES, because 1601 > 1000 and 'test 1601' <> 'filtered' +# - INSERT (1700, 'test 1700') YES, because 1700 > 1000 and 'test 1700' <> 'filtered' +# - UPDATE (1600, NULL) NO, row filter evaluates to false because NULL is not <> 'filtered' +# - UPDATE (1601, 'test 1601 updated') YES, because 1601 > 1000 and 'test 1601 updated' <> 'filtered' +# - DELETE (1700) NO, row filter contains column b that is not part of +# the PK or REPLICA IDENTITY and old tuple contains b = NULL, hence, row filter +# evaluates to false +# +$result = + $node_subscriber->safe_psql('postgres', + "SELECT a, b FROM tab_rowfilter_1 ORDER BY 1, 2"); +is($result, qq(1001|test 1001 +1002|test 1002 +1600|test 1600 +1601|test 1601 updated +1700|test 1700 +1980|not filtered), 'check replicated rows to table tab_rowfilter_1'); + +# Publish using root 
partitioned table +# Use a different partitioned table layout (exercise publish_via_partition_root) +$node_publisher->safe_psql('postgres', + "ALTER PUBLICATION tap_pub_3 SET (publish_via_partition_root = true)"); +$node_subscriber->safe_psql('postgres', + "TRUNCATE TABLE tab_rowfilter_partitioned"); +$node_subscriber->safe_psql('postgres', + "ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION WITH (copy_data = true)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_partitioned (a, b) VALUES(4000, 400),(4001, 401),(4002, 402)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_less_10k (a, b) VALUES(4500, 450)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_less_10k (a, b) VALUES(5600, 123)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_rowfilter_greater_10k (a, b) VALUES(14000, 1950)"); +$node_publisher->safe_psql('postgres', + "UPDATE tab_rowfilter_less_10k SET b = 30 WHERE a = 4001"); +$node_publisher->safe_psql('postgres', + "DELETE FROM tab_rowfilter_less_10k WHERE a = 4002"); + +$node_publisher->wait_for_catchup($appname); + +# Check expected replicated rows for partitions +# publication option publish_via_partition_root is true so use the row filter +# from the root partitioned table +# tab_rowfilter_partitioned filter: (a < 5000) +# tab_rowfilter_less_10k filter: (a < 6000) +# tab_rowfilter_greater_10k filter: no filter +# +# After TRUNCATE, REFRESH PUBLICATION, the initial data copy will apply the +# partitioned table row filter. +# - INSERT (1, 100) YES, 1 < 5000 +# - INSERT (7000, 101) NO, 7000 is not < 5000 +# - INSERT (15000, 102) NO, 15000 is not < 5000 +# - INSERT (5500, 300) NO, 5500 is not < 5000 +# - INSERT (2, 200) YES, 2 < 5000 +# - INSERT (6005, 201) NO, 6005 is not < 5000 +# - INSERT (16000, 103) NO, 16000 is not < 5000 +# +# Execute SQL commands after initial data copy for testing the logical +# replication behavior. 
+# - INSERT (4000, 400) YES, 4000 < 5000 +# - INSERT (4001, 401) YES, 4001 < 5000 +# - INSERT (4002, 402) YES, 4002 < 5000 +# - INSERT (4500, 450) YES, 4500 < 5000 +# - INSERT (5600, 123) NO, 5600 is not < 5000 +# - INSERT (14000, 1950) NO, 14000 is not < 5000 +$result = + $node_subscriber->safe_psql('postgres', + "SELECT a, b FROM tab_rowfilter_partitioned ORDER BY 1, 2"); +is( $result, qq(1|100 +2|200 +4000|400 +4001|30 +4500|450), 'check publish_via_partition_root behavior'); + +$node_subscriber->stop('fast'); +$node_publisher->stop('fast'); -- 2.20.1
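For readers following the tab_rowfilter_2 check in 021_row_filter.pl: the test comment in this patch version states that when a table is in several publications, a row is replicated only if all filter expressions are satisfied. Below is a minimal standalone sketch of that expectation in plain C. It is not PostgreSQL code; `row_filter_fn`, `row_passes_all_filters`, and `count_replicated` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one publication's row filter on column c. */
typedef bool (*row_filter_fn)(int c);

bool filter_tap_pub_1(int c) { return c % 2 == 0; }  /* WHERE (c % 2 = 0) */
bool filter_tap_pub_2(int c) { return c % 3 == 0; }  /* WHERE (c % 3 = 0) */

/* A row is sent only if every filter accepts it (filters are ANDed). */
bool row_passes_all_filters(int c, const row_filter_fn *filters, size_t nfilters)
{
    for (size_t i = 0; i < nfilters; i++)
    {
        if (!filters[i](c))
            return false;
    }
    return true;
}

/*
 * Count the values in [lo, hi] that would be replicated, and report the
 * smallest and largest replicated value through min_c and max_c.
 * min_c and max_c are left untouched when nothing passes.
 */
int count_replicated(int lo, int hi, int *min_c, int *max_c)
{
    const row_filter_fn filters[] = { filter_tap_pub_1, filter_tap_pub_2 };
    int count = 0;

    for (int c = lo; c <= hi; c++)
    {
        if (!row_passes_all_filters(c, filters, 2))
            continue;
        if (count == 0)
            *min_c = c;
        *max_c = c;
        count++;
    }
    return count;
}
```

Calling `count_replicated(1, 20, &min_c, &max_c)` models the `generate_series(1, 20)` insert and corresponds to the `count(c), min(c), max(c)` query that the test compares against `3|6|18`.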
From b0d60791f06908d2dc118ff7f5dc669c91062913 Mon Sep 17 00:00:00 2001 From: Euler Taveira <euler.tave...@enterprisedb.com> Date: Sun, 31 Jan 2021 20:48:43 -0300 Subject: [PATCH v18 2/2] Measure row filter overhead --- src/backend/replication/pgoutput/pgoutput.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index 08c018a300..5700a3306b 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -638,6 +638,8 @@ pgoutput_row_filter(Relation relation, HeapTuple oldtuple, HeapTuple newtuple, R ExprContext *ecxt; ListCell *lc; bool result = true; + instr_time start_time; + instr_time end_time; /* Bail out if there is no row filter */ if (entry->qual == NIL) @@ -647,13 +649,12 @@ pgoutput_row_filter(Relation relation, HeapTuple oldtuple, HeapTuple newtuple, R get_namespace_name(get_rel_namespace(RelationGetRelid(relation))), get_rel_name(relation->rd_id)); + INSTR_TIME_SET_CURRENT(start_time); + PushActiveSnapshot(GetTransactionSnapshot()); estate = create_estate_for_relation(relation); - if (entry->scantuple == NULL) - elog(DEBUG1, "entry->scantuple is null"); - /* Prepare context per tuple */ ecxt = GetPerTupleExprContext(estate); ecxt->ecxt_scantuple = entry->scantuple; @@ -684,6 +685,11 @@ pgoutput_row_filter(Relation relation, HeapTuple oldtuple, HeapTuple newtuple, R FreeExecutorState(estate); PopActiveSnapshot(); + INSTR_TIME_SET_CURRENT(end_time); + INSTR_TIME_SUBTRACT(end_time, start_time); + + elog(DEBUG2, "row filter time: %0.3f us", INSTR_TIME_GET_DOUBLE(end_time) * 1e6); + return result; } @@ -1251,8 +1257,6 @@ get_rel_sync_entry(PGOutputData *data, Relation relation) { ExecDropSingleTupleTableSlot(entry->scantuple); entry->scantuple = NULL; - - elog(DEBUG1, "get_rel_sync_entry: free entry->scantuple"); } /* create a tuple table slot for row filter */ @@ -1261,8 +1265,6 @@ 
get_rel_sync_entry(PGOutputData *data, Relation relation) entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple); MemoryContextSwitchTo(oldctx); - elog(DEBUG1, "get_rel_sync_entry: allocate entry->scantuple"); - /* * Build publication cache. We can't use one provided by relcache as * relcache considers all publications given relation is in, but here -- 2.20.1
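The 0002 patch brackets the filter evaluation with INSTR_TIME_SET_CURRENT, subtracts the two readings, and logs the difference in microseconds at DEBUG2. The same measure-around-a-call pattern can be tried outside the server with a rough sketch like the one below. It assumes a POSIX system and uses `clock_gettime(CLOCK_MONOTONIC)` as a stand-in for the instr_time macros; `elapsed_us`, `time_call_us`, `work_fn`, and `example_work` are hypothetical names, not part of the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict C modes */
#endif
#include <time.h>

/* Elapsed time between two timespec readings, in microseconds. */
double elapsed_us(struct timespec start, struct timespec end)
{
    return (double) (end.tv_sec - start.tv_sec) * 1e6 +
           (double) (end.tv_nsec - start.tv_nsec) / 1e3;
}

/* Hypothetical stand-in for the work being measured (e.g. one filter check). */
typedef int (*work_fn)(int arg);

int example_work(int x)
{
    return x * x;
}

/*
 * Run fn(arg), store its return value in *result, and return how long the
 * call took in microseconds. Mirrors the start/end bracketing in the patch.
 */
double time_call_us(work_fn fn, int arg, int *result)
{
    struct timespec start;
    struct timespec end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    *result = fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_us(start, end);
}
```

A monotonic clock is used here so that wall-clock adjustments during the measurement cannot produce a negative interval. A single call this short will mostly measure clock overhead, so averaging over many rows is likely to give more useful numbers.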