Re: SQL JSON compliance
On 13.04.22 22:43, Andrew Dunstan wrote:
> Simon has just pointed out to me that as a result of recent commits, a
> number of things now should move from the unsupported table to the
> supported table in features.sgml. In particular, it looks to me like all
> of these should move:

This all looks correct to me. Please go ahead.

> T811  Basic SQL/JSON constructor functions
> T812  SQL/JSON: JSON_OBJECTAGG
> T813  SQL/JSON: JSON_ARRAYAGG with ORDER BY
> T814  Colon in JSON_OBJECT or JSON_OBJECTAGG
> T821  Basic SQL/JSON query operators
> T822  SQL/JSON: IS JSON WITH UNIQUE KEYS predicate
> T823  SQL/JSON: PASSING clause
> T824  JSON_TABLE: specific PLAN clause
> T825  SQL/JSON: ON EMPTY and ON ERROR clauses
> T826  General value expression in ON ERROR or ON EMPTY clauses
> T827  JSON_TABLE: sibling NESTED COLUMNS clauses
> T828  JSON_QUERY
> T829  JSON_QUERY: array wrapper options
> T830  Enforcing unique keys in SQL/JSON constructor functions
> T838  JSON_TABLE: PLAN DEFAULT clause
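For readers less familiar with the feature codes, here is a minimal, illustrative sample of the kind of syntax a few of them cover, assuming a build of the development branch that includes the recently committed SQL/JSON work (these queries are examples only, not taken from features.sgml):

-- T811/T814: constructor function with the colon key syntax
SELECT JSON_OBJECT('code': 'T811', 'note': 'constructor functions');
-- T813: JSON_ARRAYAGG with ORDER BY
SELECT JSON_ARRAYAGG(g ORDER BY g DESC) FROM generate_series(1, 3) AS g;
-- T822: IS JSON WITH UNIQUE KEYS predicate (false here, duplicate key "a")
SELECT '{"a": 1, "a": 2}' IS JSON WITH UNIQUE KEYS;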
Re: Re: fix cost subqueryscan wrong parallel cost
On Fri, Apr 29, 2022 at 12:53 AM Robert Haas wrote:
> Gather doesn't require a parallel aware subpath, just a parallel-safe
> subpath. In a case like this, the parallel seq scan will divide the
> rows from the underlying relation across the three processes executing
> it. Each process will pass the rows it receives through its own copy
> of the subquery scan. Then, the Gather node will collect all the rows
> from all the workers to produce the final result.
>
> It's an extremely important feature of parallel query that the
> parallel-aware node doesn't have to be immediately beneath the Gather.
> You need to have a parallel-aware node in there someplace, but it
> could be separated from the gather by any number of levels e.g.
>
> Gather
>   -> Nested Loop
>        -> Nested Loop
>             -> Nested Loop
>                  -> Parallel Seq Scan
>                  -> Index Scan
>             -> Index Scan
>        -> Index Scan

Thanks for the explanation. That's really helpful for understanding the parallel query mechanism.

So for the nodes between the Gather and the parallel-aware node, how should we calculate their estimated rows? Currently the subquery scan is using rel->rows (if there is no parameterization), which I believe is not correct. That's not the size the subquery scan node in each worker needs to handle, as the rows have been divided across workers by the parallel-aware node. Using subpath->rows is not correct either, as the subquery scan node may have quals. It seems to me the right way is to divide rel->rows among all the workers.

Thanks
Richard
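As a rough, illustrative sketch of the arithmetic behind that last point (the 60000-row table and two-worker setup match the test case used later in this thread):

-- Each worker's copy of the subquery scan sees roughly rel->rows divided by
-- the parallel divisor (about 2.4 for two workers with a participating leader),
-- not the full rel->rows.
SELECT round(60000 / 2.4) AS rows_per_worker;   -- 25000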
Re: Multi-Master Logical Replication
On Fri, Apr 29, 2022 at 2:16 PM Yura Sokolov wrote: > > В Чт, 28/04/2022 в 17:37 +0530, vignesh C пишет: > > On Thu, Apr 28, 2022 at 4:24 PM Yura Sokolov > > wrote: > > > В Чт, 28/04/2022 в 09:49 +1000, Peter Smith пишет: > > > > > > > 1.1 ADVANTAGES OF MMLR > > > > > > > > - Increases write scalability (e.g., all nodes can write arbitrary > > > > data). > > > > > > I've never heard how transactional-aware multimaster increases > > > write scalability. More over, usually even non-transactional > > > multimaster doesn't increase write scalability. At the best it > > > doesn't decrease. > > > > > > That is because all hosts have to write all changes anyway. But > > > side cost increases due to increased network interchange and > > > interlocking (for transaction-aware MM) and increased latency. > > > > I agree it won't increase in all cases, but it will be better in a few > > cases when the user works on different geographical regions operating > > on independent schemas in asynchronous mode. Since the write node is > > closer to the geographical zone, the performance will be better in a > > few cases. > > From EnterpriseDB BDB page [1]: > > > Adding more master nodes to a BDR Group does not result in > > significant write throughput increase when most tables are > > replicated because BDR has to replay all the writes on all nodes. > > Because BDR writes are in general more effective than writes coming > > from Postgres clients via SQL, some performance increase can be > > achieved. Read throughput generally scales linearly with the number > > of nodes. > > And I'm sure EnterpriseDB does the best. > > > > В Чт, 28/04/2022 в 08:34 +, kuroda.hay...@fujitsu.com пишет: > > > > Dear Laurenz, > > > > > > > > Thank you for your interest in our works! > > > > > > > > > I am missing a discussion how replication conflicts are handled to > > > > > prevent replication from breaking > > > > > > > > Actually we don't have plans for developing the feature that avoids > > > > conflict. > > > > We think that it should be done as core PUB/SUB feature, and > > > > this module will just use that. > > > > > > If you really want to have some proper isolation levels ( > > > Read Committed? Repeatable Read?) and/or want to have > > > same data on each "master", there is no easy way. If you > > > think it will be "easy", you are already wrong. > > > > The synchronous_commit and synchronous_standby_names configuration > > parameters will help in getting the same data across the nodes. Can > > you give an example for the scenario where it will be difficult? > > So, synchronous or asynchronous? > Synchronous commit on every master, every alive master or on quorum > of masters? > > And it is not about synchronicity. It is about determinism at > conflicts. > > If you have fully determenistic conflict resolution that works > exactly same way on each host, then it is possible to have same > data on each host. (But it will not be transactional.)And it seems EDB BDB > achieved this. > > Or if you have fully and correctly implemented one of distributed > transactions protocols. > > [1] > https://www.enterprisedb.com/docs/bdr/latest/overview/#characterising-bdr-performance > > regards > > -- > > Yura Sokolov Thanks for your feedback. This MMLR proposal was mostly just to create an interface making it easier to use PostgreSQL core logical replication CREATE PUBLICATION/SUBSCRIPTION for table sharing among a set of nodes. Otherwise, this is difficult for a user to do manually. (e.g. 
difficulties as mentioned in section 2.2 of the original post [1] - dealing with initial table data, coordinating the timing/locking to avoid concurrent updates, getting the SUBSCRIPTION options for copy_data exactly right etc) At this time we have no provision for HA, nor for transaction consistency awareness, conflict resolutions, node failure detections, DDL replication etc. Some of the features like DDL replication are currently being implemented [2], so when committed it will become available in the core, and can then be integrated into this module. Once the base feature of the current MMLR proposal is done, perhaps it can be extended in subsequent versions. Probably our calling this “Multi-Master” has been misleading/confusing, because that term implies much more to other readers. We really only intended it to mean the ability to set up logical replication across a set of nodes. Of course, we can rename the proposal (and API) to something different if there are better suggestions. -- [1] https://www.postgresql.org/message-id/CAHut%2BPuwRAoWY9pz%3DEubps3ooQCOBFiYPU9Yi%3DVB-U%2ByORU7OA%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/45d0d97c-3322-4054-b94f-3c08774bbd90%40www.fastmail.com#db6e810fc93f17b0a5585bac25fb3d4b Kind Regards, Peter Smith. Fujitsu Australia
Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
On Sun, Feb 27, 2022 at 12:50 PM Bharath Rupireddy wrote:
>
> On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy wrote:
> >
> > On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera wrote:
> > >
> > > Maybe some tunable like
> > > log_wal_traffic = { none, medium, high }
> > > where "none" is current behavior of no noise, "medium" gets (say) once
> > > every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> > > you one message per segment.
> >
> > On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby wrote:
> > >
> > > If you're talking about a new feature that uses the infrastructure from 9ce3,
> > > but is controlled by a separate GUC like log_wal_traffic, that could be okay.
> >
> > Thanks for the suggestion. I've added a new GUC log_wal_traffic as
> > suggested above. PSA v7 patch.
>
> Here's the rebased v8 patch, please review it.
>
> https://commitfest.postgresql.org/37/3443/

Here's the rebased v9 patch.

Regards,
Bharath Rupireddy.

Attachment: v9-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patch
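For readers following along, usage of the proposed GUC would presumably look like the sketch below. Note that log_wal_traffic exists only in this patch series, not in released PostgreSQL, the value names come from Alvaro's sketch quoted above, and whether the setting is reloadable this way is an assumption:

-- Hypothetical: log_wal_traffic is introduced only by the attached patch.
ALTER SYSTEM SET log_wal_traffic = 'high';   -- none | medium | high
SELECT pg_reload_conf();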
add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Sun, Feb 27, 2022 at 3:16 PM Bharath Rupireddy wrote: > > On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy > wrote: > > > > On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi > > wrote: > > > > Here's the v2 patch. Please provide your thoughts. > > > > > > Thanks! I have three comments on this version. > > > > Thanks for your review. > > > > > - I still think "acquire/release" logs on creation/dropping should be > > > silenced. Adding the third parameter to ReplicationSlotAcquire() > > > that can mute the acquiring (and as well as the corresponding > > > releasing) log will work. > > > > Done. > > > > > can piggy-back on log_replication_commands for the purpose, changing > > > its meaning slightly to "log replication commands and related > > > activity". > > > - Need to mute the logs by log_replication_commands. (We could add > > > another choice for the variable for this purpose but I think we > > > don't need it.) > > > > Done. > > > > > - The messages are not translatable as the variable parts are > > > adjectives. They need to consist of static sentences. The > > > combinations of the two properties are 6 (note that persistence is > > > tristate) but I don't think we accept that complexity for the > > > information. So I recommend to just remove the variable parts from > > > the messages. > > > > Done. > > > > Here's v3, please review it further. > > Here's the rebased v4 patch. Here's the rebased v5 patch. Regards, Bharath Rupireddy. v5-0001-Add-LOG-messages-when-replication-slots-become-ac.patch Description: Binary data
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
If I read the code right, this patch emits logs when the pg_logical_slot_get_changes and pg_replication_slot_advance SQL functions are called. Is this desirable/useful for the use case you stated at the start of the thread? I think it is most likely pointless.

If you get rid of those, then the only acquisitions that would log messages are those in StartReplication and StartLogicalReplication. So I wonder if it would be better to leave the API of ReplicationSlotAcquire() alone, and instead make StartReplication and StartLogicalReplication responsible for those messages.

I didn't look at the release-side messages you're adding, but I suppose it should be symmetrical.

--
Álvaro Herrera  PostgreSQL Developer — https://www.EnterpriseDB.com/
"Man never knows what he is capable of until he tries" (C. Dickens)
Re: Multi-Master Logical Replication
On Fri, Apr 29, 2022 at 2:35 PM Peter Smith wrote: > > On Fri, Apr 29, 2022 at 2:16 PM Yura Sokolov wrote: > > > > В Чт, 28/04/2022 в 17:37 +0530, vignesh C пишет: > > > On Thu, Apr 28, 2022 at 4:24 PM Yura Sokolov > > > wrote: > > > > В Чт, 28/04/2022 в 09:49 +1000, Peter Smith пишет: > > > > > > > > > 1.1 ADVANTAGES OF MMLR > > > > > > > > > > - Increases write scalability (e.g., all nodes can write arbitrary > > > > > data). > > > > > > > > I've never heard how transactional-aware multimaster increases > > > > write scalability. More over, usually even non-transactional > > > > multimaster doesn't increase write scalability. At the best it > > > > doesn't decrease. > > > > > > > > That is because all hosts have to write all changes anyway. But > > > > side cost increases due to increased network interchange and > > > > interlocking (for transaction-aware MM) and increased latency. > > > > > > I agree it won't increase in all cases, but it will be better in a few > > > cases when the user works on different geographical regions operating > > > on independent schemas in asynchronous mode. Since the write node is > > > closer to the geographical zone, the performance will be better in a > > > few cases. > > > > From EnterpriseDB BDB page [1]: > > > > > Adding more master nodes to a BDR Group does not result in > > > significant write throughput increase when most tables are > > > replicated because BDR has to replay all the writes on all nodes. > > > Because BDR writes are in general more effective than writes coming > > > from Postgres clients via SQL, some performance increase can be > > > achieved. Read throughput generally scales linearly with the number > > > of nodes. > > > > And I'm sure EnterpriseDB does the best. > > > > > > В Чт, 28/04/2022 в 08:34 +, kuroda.hay...@fujitsu.com пишет: > > > > > Dear Laurenz, > > > > > > > > > > Thank you for your interest in our works! > > > > > > > > > > > I am missing a discussion how replication conflicts are handled to > > > > > > prevent replication from breaking > > > > > > > > > > Actually we don't have plans for developing the feature that avoids > > > > > conflict. > > > > > We think that it should be done as core PUB/SUB feature, and > > > > > this module will just use that. > > > > > > > > If you really want to have some proper isolation levels ( > > > > Read Committed? Repeatable Read?) and/or want to have > > > > same data on each "master", there is no easy way. If you > > > > think it will be "easy", you are already wrong. > > > > > > The synchronous_commit and synchronous_standby_names configuration > > > parameters will help in getting the same data across the nodes. Can > > > you give an example for the scenario where it will be difficult? > > > > So, synchronous or asynchronous? > > Synchronous commit on every master, every alive master or on quorum > > of masters? > > > > And it is not about synchronicity. It is about determinism at > > conflicts. > > > > If you have fully determenistic conflict resolution that works > > exactly same way on each host, then it is possible to have same > > data on each host. (But it will not be transactional.)And it seems EDB BDB > > achieved this. > > > > Or if you have fully and correctly implemented one of distributed > > transactions protocols. > > > > [1] > > https://www.enterprisedb.com/docs/bdr/latest/overview/#characterising-bdr-performance > > > > regards > > > > -- > > > > Yura Sokolov > > Thanks for your feedback. 
> > This MMLR proposal was mostly just to create an interface making it > easier to use PostgreSQL core logical replication CREATE > PUBLICATION/SUBSCRIPTION for table sharing among a set of nodes. > Otherwise, this is difficult for a user to do manually. (e.g. > difficulties as mentioned in section 2.2 of the original post [1] - > dealing with initial table data, coordinating the timing/locking to > avoid concurrent updates, getting the SUBSCRIPTION options for > copy_data exactly right etc) Different problems and how to solve each scenario is mentioned detailly in [1]. It gets even more complex when there are more nodes associated, let's consider the 3 node case: Adding a new node node3 to the existing node1 and node2 when data is present in existing nodes node1 and node2, the following steps are required: Create a publication in node3: CREATE PUBLICATION pub_node3 for all tables; Create a subscription in node1 to subscribe the changes from node3: CREATE SUBSCRIPTION sub_node1_node3 CONNECTION 'dbname=foo host=node3 user=repuser' PUBLICATION pub_node3 WITH (copy_data = off, local_only = on); Create a subscription in node2 to subscribe the changes from node3: CREATE SUBSCRIPTION sub_node2_node3 CONNECTION 'dbname=foo host=node3 user=repuser' PUBLICATION pub_node3 WITH (copy_data = off, local_only = on); Lock database at node2 and wait till walsender sends WAL to node1(upto current lsn) to avoid any data loss because of node2's WAL not being sent to node1. This lock needs
Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
Did we ever consider the idea of using a new pg_stat_wal_activity_progress view or something like that, using the backend_progress.c functionality? I don't see it mentioned in the thread.

It's not an *exact* fit, since this is not about one "command" being executed by a "backend", but it seems a near enough fit, and it would be unobtrusive enough that we don't need to worry about the progress-update rate or have a GUC to determine how frequently to update (with a gauge that's heavily workload-dependent and is likely to make little sense to some users -- consider differently-sized WAL segments, for one thing).

--
Álvaro Herrera  PostgreSQL Developer — https://www.EnterpriseDB.com/
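Purely as a sketch of what that idea might look like from the user's side, using the view name floated above; the columns below are hypothetical, invented for illustration, and nothing like this exists yet:

-- Hypothetical view and columns; an illustration of the proposal only.
SELECT pid, current_wal_segment, segments_restored
FROM pg_stat_wal_activity_progress;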
Re: Handle infinite recursion in logical replication setup
On Fri, Apr 29, 2022 at 8:08 AM Peter Smith wrote: > > Here are my review comments for the v11* patches. > > == > > v11-0001 - no more comments. LGTM > > == > > V11-0002 > > 1. doc/src/sgml/logical-replication.sgml > > + > + Bidirectional replication is useful in creating multi master database > + which helps in performing read/write operations from any of the nodes. > + Setting up bidirectional logical replication between two nodes requires > + creation of the publication in all the nodes, creating subscriptions in > + each of the nodes that subscribes to data from all the nodes. The steps > + to create a two-node bidirectional replication are given below: > + > > Wording: "creating multi master database" -> "creating a multi-master > database" > Wording: "creation of the publication in all the nodes" -> "creation > of a publication in all the nodes". Modified > ~~~ > > 2. doc/src/sgml/logical-replication.sgml > > > + > > +node2=# CREATE SUBSCRIPTION sub_node1_node2 > > +node2=# CONNECTION 'dbname=foo host=node1 user=repuser' > > +node2=# PUBLICATION pub_node1 > > +node2=# WITH (copy_data = off, local_only = on); > > +CREATE SUBSCRIPTION > > + > > + > > > > IIUC the closing should be on the same line as the > > . I recall there was some recent github push about > > this sometime in the last month - maybe you can check to confirm it. > > Vignesh: I have seen the programlisting across multiple places and noticed > that there is no such guideline being followed. I have not made any > change for this comment. > > FYI – I found the push [1] by PeterE that I was referring to so. I > thought we perhaps should follow this, even if not all other code is > doing so. But you can choose. Modified > ~~~ > > 3. doc/src/sgml/logical-replication.sgml > > + > +Create a subscription in node3 to subscribe the > changes > +from node1 and node2, here > +copy_data is specified as force > when > +creating a subscription to node1 so that the existing > +table data is copied during initial sync: > + > +node3=# CREATE SUBSCRIPTION > +node3-# sub_node3_node1 CONNECTION 'dbname=foo host=node1 user=repuser' > +node3-# PUBLICATION pub_node1 > +node3-# WITH (copy_data = force, local_only = on); > +CREATE SUBSCRIPTION > +node3=# CREATE SUBSCRIPTION > +node3-# sub_node3_node2 CONNECTION 'dbname=foo host=node2 user=repuser' > +node3-# PUBLICATION pub_node2 > +node3-# WITH (copy_data = off, local_only = on); > +CREATE SUBSCRIPTION > + > > I think this part should be split into 2 separate program listings > each for the different subscriptions. Then those descriptions can be > described separately (e.g. why one is force and the other is not). > Then it will also be more consistent with how you split/explained > something similar in the previous section on this page. Modified The attached v12 patch has the changes for the same. Regards, Vignesh From 28a94a0e335241f8cd60adb4cdeb359c69cecbfe Mon Sep 17 00:00:00 2001 From: Vigneshwaran C Date: Fri, 8 Apr 2022 11:10:05 +0530 Subject: [PATCH v12 1/2] Skip replication of non local data. This patch adds a new SUBSCRIPTION boolean option "local_only". The default is false. When a SUBSCRIPTION is created with this option enabled, the publisher will only publish data that originated at the publisher node. 
Usage: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=' PUBLICATION pub1 with (local_only = true); --- doc/src/sgml/ref/alter_subscription.sgml | 5 +- doc/src/sgml/ref/create_subscription.sgml | 12 ++ src/backend/catalog/pg_subscription.c | 1 + src/backend/catalog/system_views.sql | 4 +- src/backend/commands/subscriptioncmds.c | 26 ++- .../libpqwalreceiver/libpqwalreceiver.c | 5 + src/backend/replication/logical/worker.c | 2 + src/backend/replication/pgoutput/pgoutput.c | 23 +++ src/bin/pg_dump/pg_dump.c | 17 +- src/bin/pg_dump/pg_dump.h | 1 + src/bin/psql/describe.c | 8 +- src/bin/psql/tab-complete.c | 4 +- src/include/catalog/pg_subscription.h | 3 + src/include/replication/logicalproto.h| 10 +- src/include/replication/pgoutput.h| 1 + src/include/replication/walreceiver.h | 1 + src/test/regress/expected/subscription.out| 142 --- src/test/regress/sql/subscription.sql | 10 ++ src/test/subscription/t/032_localonly.pl | 162 ++ 19 files changed, 365 insertions(+), 72 deletions(-) create mode 100644 src/test/subscription/t/032_localonly.pl diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 353ea5def2..c5ebcf5500 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -207,8 +207,9 @@ ALTER SUBSCRIPTION name RENAME TO < information. The parameters that can be a
Re: Skipping schema changes in publication
On Thu, Apr 28, 2022 at 4:50 PM osumi.takami...@fujitsu.com wrote: > > On Wednesday, April 27, 2022 9:50 PM vignesh C wrote: > > Thanks for the comments, the attached v3 patch has the changes for the same. > Hi > > Thank you for updating the patch. Several minor comments on v3. > > (1) commit message > > The new syntax allows specifying schemas. For example: > CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE t1,t2; > OR > ALTER PUBLICATION pub1 ADD EXCEPT TABLE t1,t2; > > We have above sentence, but it looks better > to make the description a bit more accurate. > > Kindly change > From : > "The new syntax allows specifying schemas" > To : > "The new syntax allows specifying excluded relations" > > Also, kindly change "OR" to "or", > because this description is not syntax. Slightly reworded and modified > (2) publication_add_relation > > @@ -396,6 +400,9 @@ publication_add_relation(Oid pubid, PublicationRelInfo > *pri, > ObjectIdGetDatum(pubid); > values[Anum_pg_publication_rel_prrelid - 1] = > ObjectIdGetDatum(relid); > + values[Anum_pg_publication_rel_prexcept - 1] = > + BoolGetDatum(pri->except); > + > > /* Add qualifications, if available */ > > It would be better to remove the blank line, > because with this change, we'll have two blank > lines in a row. Modified > (3) pg_dump.h & pg_dump_sort.c > > @@ -80,6 +80,7 @@ typedef enum > DO_REFRESH_MATVIEW, > DO_POLICY, > DO_PUBLICATION, > + DO_PUBLICATION_EXCEPT_REL, > DO_PUBLICATION_REL, > DO_PUBLICATION_TABLE_IN_SCHEMA, > DO_SUBSCRIPTION > > @@ -90,6 +90,7 @@ enum dbObjectTypePriorities > PRIO_FK_CONSTRAINT, > PRIO_POLICY, > PRIO_PUBLICATION, > + PRIO_PUBLICATION_EXCEPT_REL, > PRIO_PUBLICATION_REL, > PRIO_PUBLICATION_TABLE_IN_SCHEMA, > PRIO_SUBSCRIPTION, > @@ -144,6 +145,7 @@ static const int dbObjectTypePriority[] = > PRIO_REFRESH_MATVIEW, /* DO_REFRESH_MATVIEW */ > PRIO_POLICY,/* DO_POLICY */ > PRIO_PUBLICATION, /* DO_PUBLICATION */ > + PRIO_PUBLICATION_EXCEPT_REL,/* DO_PUBLICATION_EXCEPT_REL */ > PRIO_PUBLICATION_REL, /* DO_PUBLICATION_REL */ > PRIO_PUBLICATION_TABLE_IN_SCHEMA, /* > DO_PUBLICATION_TABLE_IN_SCHEMA */ > PRIO_SUBSCRIPTION /* DO_SUBSCRIPTION */ > > How about having similar order between > pg_dump.h and pg_dump_sort.c, like > we'll add DO_PUBLICATION_EXCEPT_REL > after DO_PUBLICATION_REL in pg_dump.h ? > Modified > (4) GetAllTablesPublicationRelations > > + /* > +* pg_publication_rel and pg_publication_namespace will only have > except > +* tables in case of all tables publication, no need to pass except > flag > +* to get the relations. > +*/ > + List *exceptpubtablelist = GetPublicationRelations(pubid, > PUBLICATION_PART_ALL); > + > > There is one unnecessary space in a comment > "...pg_publication_namespace will only have...". Kindly remove it. > > Then, how about diving the variable declaration and > the insertion of the return value of GetPublicationRelations ? > That might be aligned with other places in this file. 
Modified > (5) GetTopMostAncestorInPublication > > > @@ -302,8 +303,9 @@ GetTopMostAncestorInPublication(Oid puboid, List > *ancestors, int *ancestor_level > foreach(lc, ancestors) > { > Oid ancestor = lfirst_oid(lc); > - List *apubids = GetRelationPublications(ancestor); > + List *apubids = GetRelationPublications(ancestor, > false); > List *aschemaPubids = NIL; > + List *aexceptpubids; > > level++; > > @@ -317,7 +319,9 @@ GetTopMostAncestorInPublication(Oid puboid, List > *ancestors, int *ancestor_level > else > { > aschemaPubids = > GetSchemaPublications(get_rel_namespace(ancestor)); > - if (list_member_oid(aschemaPubids, puboid)) > + aexceptpubids = GetRelationPublications(ancestor, > true); > + if (list_member_oid(aschemaPubids, puboid) || > + (puballtables && > !list_member_oid(aexceptpubids, puboid))) > { > topmost_relid = ancestor; > > It seems we forgot to call list_free for "aexceptpubids". Modified The attached v4 patch has the changes for the same. Regards, Vignesh From b900c9468bc5c9deb3412361c7a7891352b0ffb6 Mon Sep 17 00:00:00 2001 From: Vigneshwaran C Date: Wed, 20 Apr 2022 11:19:50 +0530 Subject: [PATCH v4] Skip publishing the tables specified in EXCEPT TABLE. A new option "EXCEPT TABLE" in Create/Alter Publication allows one or more tables to be excl
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
On Tue, Apr 26, 2022 at 12:09 AM Stephen Frost wrote: > > I was thinking more specifically along the lines of "if there's > X GB > of WAL that hasn't been archived, give up on archiving anything new" > (which is how the pgbackrest option works). IMO, this option is dangerous because the users might lose the unarchived WAL files completely. Instead, fixing the archive_command either before the server runs out of disk space (if someone/some monitoring detected the archive_command failure) or after the server crashes with no space. > As archiving with this command is optional, it does present the same > risk too. Perhaps if we flipped it around to require the > archive_command be provided then it'd be a bit better, though we would > also need a way for users to ask for us to just delete the WAL without > archiving it. There again though ... the server already has a way of > both archiving and removing archived WAL and also has now grown the > archive_library option, something that this tool would be pretty hard to > replicate, I feel like, as it wouldn't be loading the library into a PG > backend anymore. As we don't have any real archive libraries yet, it's > hard to say if that's going to be an actual issue or not. Something to > consider though. Actually, why are we just thinking that the WAL files grow up only because of archive command failures (of course, it's the main cause even in a well-configured server)?. Imagine if the checkpoints weren't happening frequently for whatever reasons or the max_slot_wal_keep_size or wal_keep_size weren't set appropriately (even if they set properly, someone, could be an attacker, can reset them), high write activity generating huge WAL files (especially on smaller servers) - these too can be reasons for server going down because of WAL files growth. The main motto of this tool is to use (by DBAs or service engineers) it as a last resort option after the server goes out of disk space to quickly bring the server back online so that the regular cleaning up activity can take place or the storage scale operations can be performed if required. There's a dry run option in pg_walcleaner to see if it can free up some space at all. If required we can provide an option to archive and delete only the maximum of specified number of WAL files. I'm okay with making the archive_command mandatory and if archiving isn't required to be used with the pg_walcleaner tool they can set some dummy values such as archive_command='/bin/true'. > > The initial version of the patch doesn't check if the server crashed > > or not before running it. I was thinking of looking at the > > postmaster.pid or pg_control file (however they don't guarantee > > whether the server is up or crashed because the server can crash > > without deleting postmaster.pid or updating pg_control file). Another > > idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see > > if the server is up and running, if yes, exit, otherwise proceed with > > its work. > > All of which isn't an issue if we don't have an external tool trying to > do this and instead have the server doing it as the server knows its > internal status, that the archive command has been failing long enough > to pass the configuration threshold, and that the WAL isn't needed for > crash recovery. The ability to avoid having to crash and go through > that process is pretty valuable. Still, a crash may still happen and > it'd be useful to have a clean way to deal with it. 
I'm not really a > fan of having to essentially configure this external command as well as > have the server configured. Have we settled that there's no way to make > the server archive while there's no space available and before trying to > write out more data? The pg_walcleaner tool isn't intrusive in the sense that it doesn't delete the WAL files that are required for the server to come up (as it checks for the checkpoint redo WAL file), apart from this it has archive_command too so no loss of the WAL file(s) at all unlike the pgbackrest option. > > Also, to not cause losing of WAL permanently, we must recommend using > > archvie_command so that the WAL can be moved to an alternative > > location (could be the same archvie_location that primary uses). > > I agree we should recommend using archive_command or archive_library, of > course, but if that's been done and is working properly then this tool > isn't really needed. The use-case we're trying to address, I believe, > is something like: > > 1) archive command starts failing for some reason > 2) WAL piles up on the primary > 3) primary runs out of disk space, crash happens > 4) archive command gets 'fixed' in some fashion > 5) WAL is archived and removed from primary > 6) primary is restarted and able to go through crash recovery > 7) server is online again > > Now, I was trying to suggest an approach to addressing the issue at #2, > that is, avoid having WAL pile up without end on the primary and avoid > t
Re: Building Postgres with lz4 on Visual Studio
On 2022-04-26 Tu 16:26, Robert Haas wrote: > On Wed, Apr 13, 2022 at 10:22 AM Melih Mutlu wrote: >> What I realized is that c:\lz4\lib\liblz4.lib actually does not exist. >> The latest versions of lz4, downloaded from [2], do not contain \liblz4.lib >> anymore, as opposed to what's written here [3]. Also there isn't a lib >> folder too. >> >> After those changes on lz4 side, AFAIU there seems like this line adds >> library from wrong path in Solution.pm file [4]. >>> $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); >>> $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); >> Even if I spent some time on this problem and tried to fix some places, I'm >> not able to run a successful build yet. >> This is also the case for zstd too. Enabling zstd gives the same error. >> >> Has anyone had this issue before? Is this something that anyone is aware of >> and somehow made it work? >> I would appreciate any comment/suggestion etc. > In Solution.pm we have this: > > if ($self->{options}->{lz4}) > { > $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); > $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); > } > if ($self->{options}->{zstd}) > { > $proj->AddIncludeDir($self->{options}->{zstd} . '\include'); > $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib'); > } > > I think what you're saying is that the relative pathnames here may not > be correct, depending on which version of lz4/zstd you're using. The > solution is probably to use perl's -e to test which files actually > exists e.g. > > if ($self->{options}->{lz4}) > { > $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); > if (-e $proj->AddLibrary($self->{options}->{lz4} . > '\someplace\somelz4.lib') > { > $proj->AddLibrary($self->{options}->{lz4} . > '\someplace\somelz4.lib'); > } > else > { > $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); > } > $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); >} > > The trick, at least as it seems to me, is figuring out exactly what > the right set of conditions is, based on what kinds of different > builds exist out there and where they put stuff. I agree that we should use perl's -e to test that the files actually exists. But I don't think we should try to adjust to everything the zstd and lz4 people put in their release files. They are just horribly inconsistent. What I did was to install the packages using vcpkg[1] which is a standard framework created by Microsoft for installing package libraries. It does install the .lib files in a sane place (installdir/lib), but it doesn't use the lib suffix. Also it names the lib file for zlib differently. I got around those things by renaming the lib files, but that's a bit ugly. So I came up with this (untested) patch. cheers andrew [1] https://github.com/microsoft/vcpkg -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Building Postgres with lz4 on Visual Studio
On 2022-04-29 Fr 08:50, Andrew Dunstan wrote: > On 2022-04-26 Tu 16:26, Robert Haas wrote: >> On Wed, Apr 13, 2022 at 10:22 AM Melih Mutlu wrote: >>> What I realized is that c:\lz4\lib\liblz4.lib actually does not exist. >>> The latest versions of lz4, downloaded from [2], do not contain \liblz4.lib >>> anymore, as opposed to what's written here [3]. Also there isn't a lib >>> folder too. >>> >>> After those changes on lz4 side, AFAIU there seems like this line adds >>> library from wrong path in Solution.pm file [4]. $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); >>> Even if I spent some time on this problem and tried to fix some places, I'm >>> not able to run a successful build yet. >>> This is also the case for zstd too. Enabling zstd gives the same error. >>> >>> Has anyone had this issue before? Is this something that anyone is aware of >>> and somehow made it work? >>> I would appreciate any comment/suggestion etc. >> In Solution.pm we have this: >> >> if ($self->{options}->{lz4}) >> { >> $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); >> $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); >> } >> if ($self->{options}->{zstd}) >> { >> $proj->AddIncludeDir($self->{options}->{zstd} . '\include'); >> $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib'); >> } >> >> I think what you're saying is that the relative pathnames here may not >> be correct, depending on which version of lz4/zstd you're using. The >> solution is probably to use perl's -e to test which files actually >> exists e.g. >> >> if ($self->{options}->{lz4}) >> { >> $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); >> if (-e $proj->AddLibrary($self->{options}->{lz4} . >> '\someplace\somelz4.lib') >> { >> $proj->AddLibrary($self->{options}->{lz4} . >> '\someplace\somelz4.lib'); >> } >> else >> { >> $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); >> } >> $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); >>} >> >> The trick, at least as it seems to me, is figuring out exactly what >> the right set of conditions is, based on what kinds of different >> builds exist out there and where they put stuff. > > I agree that we should use perl's -e to test that the files actually > exists. But I don't think we should try to adjust to everything the zstd > and lz4 people put in their release files. They are just horribly > inconsistent. > > What I did was to install the packages using vcpkg[1] which is a > standard framework created by Microsoft for installing package > libraries. It does install the .lib files in a sane place > (installdir/lib), but it doesn't use the lib suffix. Also it names the > lib file for zlib differently. > > I got around those things by renaming the lib files, but that's a bit > ugly. So I came up with this (untested) patch. > > er this patch -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/tools/msvc/Project.pm b/src/tools/msvc/Project.pm index d39c502f30..ec2b860462 100644 --- a/src/tools/msvc/Project.pm +++ b/src/tools/msvc/Project.pm @@ -175,6 +175,15 @@ sub AddLibrary { my ($self, $lib, $dbgsuffix) = @_; + if ( ! -e $lib) + { + my ($libname = $lib) =~ s!.*[/\\]!!; + my $altlibname = $libname eq 'zdll.lib' ? 
'zlib.lib' : "lib$libname"; + my ($altlib = $lib) =~ s/$libname/$altlibname/; + die "cannot find $lib or $altlib" unless -e $altlib; + $lib = $altlib; + } + # quote lib name if it has spaces and isn't already quoted if ($lib =~ m/\s/ && $lib !~ m/^[&]quot;/) { diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm index cb2ad6cd29..d0a949b391 100644 --- a/src/tools/msvc/Solution.pm +++ b/src/tools/msvc/Solution.pm @@ -1089,12 +1089,12 @@ sub AddProject if ($self->{options}->{lz4}) { $proj->AddIncludeDir($self->{options}->{lz4} . '\include'); - $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib'); + $proj->AddLibrary($self->{options}->{lz4} . '\lib\lz4.lib'); } if ($self->{options}->{zstd}) { $proj->AddIncludeDir($self->{options}->{zstd} . '\include'); - $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib'); + $proj->AddLibrary($self->{options}->{zstd} . '\lib\zstd.lib'); } if ($self->{options}->{uuid}) {
Re: SQL JSON compliance
On 2022-04-29 Fr 04:13, Peter Eisentraut wrote:
> On 13.04.22 22:43, Andrew Dunstan wrote:
>> Simon has just pointed out to me that as a result of recent commits, a
>> number of things now should move from the unsupported table to the
>> supported table in features.sgml. In particular, it looks to me like all
>> of these should move:
>
> This all looks correct to me. Please go ahead.

Thanks, done.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Assorted small doc patches
Updated status of the set. On Wed, Apr 20, 2022 at 5:59 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > > v0001-database-default-name (-bugs, with a related cleanup suggestion as > well) > > https://www.postgresql.org/message-id/flat/CAKFQuwZvHH1HVSOu7EYjvshynk4pnDwC5RwkF%3DVfZJvmUskwrQ%40mail.gmail.com#0e6d799478d88aee93402bec35fa64a2 > > > v0002-doc-extension-dependent-routine-behavior (-general, reply to user > confusion) > > https://www.postgresql.org/message-id/CAKFQuwb_QtY25feLeh%3D8uNdnyo1H%3DcN4R3vENsUwQzJP4-0xZg%40mail.gmail.com > > > v0001-doc-savepoint-name-reuse (-docs, reply to user request for > improvement) > > https://www.postgresql.org/message-id/CAKFQuwYzSb9OW5qTFgc0v9RWMN8bX83wpe8okQ7x6vtcmfA2KQ%40mail.gmail.com > Pending discussion of alternate presentation of transaction sequence; if not favorable, I can just go with a simple factual fix of the mistake in v0001 (see this thread). > > v0001-on-conflict-excluded-is-name-not-table (-docs, figured out while > trying to improve the docs to reduce user confusion in this area) > > https://www.postgresql.org/message-id/flat/CAKFQuwYN20c0%2B7kKvm3PBgibu77BzxDvk9RvoXBb1%3Dj1mDODPw%40mail.gmail.com#ea79c88b55fdccecbd2c4fe549f321c9 > > > v0001-doc-make-row-estimation-example-match-prose (-docs, reply to user > pointing of an inconsistency) > > https://www.postgresql.org/message-id/CAKFQuwax7V5R_rw%3DEOWmy%3DTBON6v3sveBx_WvwsENskCL5CLQQ%40mail.gmail.com > David J.
Re: fix cost subqueryscan wrong parallel cost
Richard Guo writes: > Currently subquery scan is using rel->rows (if no parameterization), > which I believe is not correct. That's not the size the subquery scan > node in each worker needs to handle, as the rows have been divided > across workers by the parallel-aware node. Really? Maybe I misunderstand the case under consideration, but what I think will be happening is that each worker will re-execute the pushed-down subquery in full. Otherwise it can't compute the correct answer. What gets divided across the set of workers is the total *number of executions* of the subquery, which should be independent of the number of workers, so that the cost is (more or less) the same as the non-parallel case. At least that's true for a standard correlated subplan, which is normally run again for each row processed by the parent node. For hashed subplans and initplans, what would have been "execute once" semantics becomes "execute once per worker", creating a strict cost disadvantage for parallelization. I don't know whether the current costing model accounts for that. But if it does that wrong, arbitrarily altering the number of rows won't make it better. regards, tom lane
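The distinction drawn here can be illustrated with two query shapes against the same three-column table t used in the test case later in this thread (illustrative queries only, no particular plans claimed):

-- Correlated subplan: normally re-executed once per row processed by the parent node.
SELECT * FROM t AS o
WHERE o.b < (SELECT max(i.c) FROM t AS i WHERE i.a = o.a);

-- Uncorrelated scalar subquery: planned as an initplan, nominally "execute once",
-- which under parallel query becomes "execute once per worker".
SELECT * FROM t
WHERE t.c > (SELECT avg(c) FROM t);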
Re: fix cost subqueryscan wrong parallel cost
On Fri, Apr 29, 2022 at 7:02 AM Tom Lane wrote: > Richard Guo writes: > > Currently subquery scan is using rel->rows (if no parameterization), > > which I believe is not correct. That's not the size the subquery scan > > node in each worker needs to handle, as the rows have been divided > > across workers by the parallel-aware node. > > Really? Maybe I misunderstand the case under consideration, but > what I think will be happening is that each worker will re-execute > the pushed-down subquery in full. Otherwise it can't compute the > correct answer. What gets divided across the set of workers is > the total *number of executions* of the subquery, which should be > independent of the number of workers, so that the cost is (more > or less) the same as the non-parallel case. > > I've been operating under the belief that a subquery node (or, rather, all nodes) in a path get their row count estimates from self-selectivity * rows provided by the next node down in the path. In a partial path, eventually the parallel aware node at the bottom of the path divides its row count estimate by the parallel divisor (2.4 for two workers). That value then returns back up through the path until it hits a gather. Every node in between, which are almost exclusively parallel-safe (not parallel aware), can just use the row count being percolated up the path (i.e., they get the divided row count in their partial pathing and full row counts in the normal pathing). The gather node, realizing that the row count it is dealing with has been divided by the parallel divisor, undoes the division in order to provide the correct row count to the non-parallel executing nodes above it. So a subquery is only ever executed once in a path - but the number of returned rows depends on the number of planned workers done under the assumption that a query executed among 2 workers costs 1/2.4 the amount of the same query done with just the leader. It is an independent factor compared to everything else going on and so the multiplication can simply be done first to the original row count. allpaths.c@2974-2975 (generate_gather_paths) (L: 2993 as well) (L: 3167 - generate_useful_gather_paths) rows = cheapest_partial_path->rows * cheapest_partial_path->parallel_workers; Shouldn't that be: rows = cheapest_partial_path->rows * (get_parallel_divisor(cheapest_partial_path)) ? David J.
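For reference, the 2.4 figure mentioned above falls out of get_parallel_divisor(): the worker count plus the leader's estimated contribution of (1.0 - 0.3 * workers) when the leader participates. A quick sanity check of that arithmetic:

-- 2 workers + leader contribution (1.0 - 0.3 * 2) = 2.4
SELECT 2 + (1.0 - 0.3 * 2) AS parallel_divisor_for_two_workers;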
Re: fix cost subqueryscan wrong parallel cost
I wrote: > Really? Maybe I misunderstand the case under consideration, but > what I think will be happening is that each worker will re-execute > the pushed-down subquery in full. Oh ... nevermind that: what I was thinking about was the SubLink/SubPlan case, which is unrelated to SubqueryScan. -ENOCAFFEINE, sorry about that. regards, tom lane
Re: fix cost subqueryscan wrong parallel cost
I wrote:
> -ENOCAFFEINE, sorry about that.

As penance for that blunder, I spent a little time looking into this issue (responding to Robert's upthread complaint that it wasn't explained clearly). See the test case in parallel_subquery.sql, attached, which is adapted from the test in incremental_sort.sql. It produces these plans:

explain select * from t union select * from t;

 Unique  (cost=29344.85..30544.85 rows=120000 width=12)
   ->  Sort  (cost=29344.85..29644.85 rows=120000 width=12)
         Sort Key: t.a, t.b, t.c
         ->  Append  (cost=0.00..2950.00 rows=120000 width=12)
               ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
               ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

explain select * from t union all select * from t;

 Gather  (cost=0.00..1400.00 rows=120000 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12)
         ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
         ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

I take no position on whether the second plan's costs are correct; but if they are, then the Gather-atop-Append structure is clearly cheaper than the Append-atop-Gathers structure, so the planner made the wrong choice in the first case.

I then modified setrefs.c to not remove SubqueryScan nodes (just make trivial_subqueryscan() return constant false) and got this output from the UNION case:

 Unique  (cost=29344.85..30544.85 rows=120000 width=12)
   ->  Sort  (cost=29344.85..29644.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..2950.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1175.00 rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1175.00 rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

The UNION ALL plan doesn't change, because no SubqueryScans are ever created in that case (we flattened the UNION ALL to an appendrel early on).

Removing the test case's settings that allow a parallel plan, the non-parallelized plan for the UNION case looks like:

 Unique  (cost=30044.85..31244.85 rows=120000 width=12)
   ->  Sort  (cost=30044.85..30344.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..3650.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1525.00 rows=60000 width=12)
                     ->  Seq Scan on t  (cost=0.00..925.00 rows=60000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1525.00 rows=60000 width=12)
                     ->  Seq Scan on t t_1  (cost=0.00..925.00 rows=60000 width=12)

So: the row counts for SubqueryScan are correct, thus the upthread proposal to change them is not. The cost estimates, however, seem pretty bogus. What we're seeing here is that we're charging 600 cost units to pass the data through SubqueryScan, and that's consistent across the parallel and non-parallel cases, so it's correct on its own terms. But actually it's totally wrong because we're going to elide that node later.

So I think the actual problem here is that we leave the decision to elide no-op SubqueryScan nodes till setrefs.c. We should detect that earlier, and when it applies, label the SubqueryScanPath with the exact cost of its child.
(I think the current implementation might've been all right when it was devised, on the grounds that we had no real planning flexibility for UNION operations anyway. But now we do, and the bogus cost charge is causing visible problems.)

regards, tom lane

Attachment: parallel_subquery.sql

drop table if exists t;
create table t (a int, b int, c int);
insert into t select mod(i,10),mod(i,10),i from generate_series(1,60000) s(i);
create index on t (a);
analyze t;
set min_parallel_table_scan_size = '1kB';
set min_parallel_index_scan_size = '1kB';
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set max_parallel_workers_per_gather = 2;
set enable_hashagg to off;
explain select * from t union select * from t;
explain select * from t union all select * from t;
Re: Support logical replication of DDLs
Hello, > You should forbid it. Unless you can decompose the command into multiple SQL > commands to make it a safe operation for logical replication. > > Let's say you want to add a column with a volatile default. > > ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random(); > > If you replicate the DDL command as is, you will have different data > downstream. You should forbid it. However, this operation can be supported if > the DDL command is decomposed in multiple steps. > > -- add a new column without DEFAULT to avoid rewrite > ALTER TABLE foo ADD COLUMN bar double precision; > > -- future rows could use the DEFAULT expression > -- it also doesn't rewrite the table > ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random(); > > -- it effectively rewrites the table > -- all rows are built from one source node > -- data will be the same on all nodes > UPDATE foo SET bar = random(); I looked more into this. In order to support statements like "ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();", we have two potential solutions, but both of them are non-trivial to implement: 1. As Euler pointed out, we could decompose the statement on the publisher into multiple statements so that the table rewrite (using volatile function) is handled by a DML sub-command. The decomposition requires changes in parse analysis/transform. We also need new logic to assemble the decomposed DDL commands string back from the parse trees so we can log them for logical replication. 2. Force skipping table rewrite when executing the same command on the subscriber, and let DML replication replicate the table rewrite from the publisher. The problem is table rewrite is not replicated at all today, and it doesn't seem easy to just enable it for logical replication. Table rewrite is an expensive operation involving heap file swap, details can be found in ATRewriteTables(). In light of this, I propose to temporarily block replication of such DDL command on the replication worker until we figure out a better solution. This is implemented in patch 0008-Fail-replication-worker-on-DDL-command-that-rewrites.patch. Notice only DDL statements that rewrite table using a VOLATILE expression will be blocked. I don't see a problem replicating non-volatile expression. Here is the github commit of the same patch: https://github.com/zli236/postgres/commit/1e6115cb99a1286a61cb0a6a088f7476da29d0b9 > The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL > command > can be decomposed to avoid the rewrite. If you are changing the data type, in > general, you add a new column and updates all rows doing the proper > conversion. > (If you are updating in batches, you usually add a trigger to automatically > adjust the new column value for INSERTs and UPDATEs. Another case is when you > are reducing the the typmod (for example, varchar(100) to varchar(20)). In > this > case, the DDL command can de decomposed removing the typmod information (ALTER > TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK > constraint. I tested ALTER TABLE ... ALTER COLUMN ... TYPE. It seems to be working fine. Is there a particular case you're concerned about? > -- > Euler Taveira > EDB https://www.enterprisedb.com/ > Regards, Zheng Li 0008-Fail-replication-worker-on-DDL-command-that-rewrites.patch Description: Binary data
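As a concrete sketch of the blocked-vs-allowed distinction (assuming the table foo from Euler's quoted example; the blocking behaviour described is that of the proposed 0008 patch, not of released PostgreSQL):

-- Rewrites the table using a VOLATILE default: replaying this DDL verbatim
-- would produce different data on each node, so the proposed patch makes the
-- replication worker fail on it.
ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();

-- Non-volatile default: deterministic on every node (and since PostgreSQL 11
-- it does not force a table rewrite at all), so replicating the DDL as-is is fine.
ALTER TABLE foo ADD COLUMN baz double precision DEFAULT 0;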
Re: [RFC] building postgres with meson -v8
Hi, On 2022-04-20 15:09:31 +0200, Peter Eisentraut wrote: > On 13.04.22 12:26, Peter Eisentraut wrote: > > Some feedback and patches for your branch at > > 3274198960c139328fef3c725cee1468bbfff469: > > Here is another patch. It adds support for building ecpg. Cool! I have merged this, with a few changes (split parse.pl change out, changed its invocation in Solution.pm, indentation, explicitly using shared_library() rather than library(), indentation). But there's need for some more - exports.txt handling is needed for windows (and everywhere else, but not as crucially) - hence CI currently being broken on windows. I've done that in a VM, and it indeed fixes the issues. But it needs to be generalized, I just copied and pasted stuff around... Greetings, Andres Freund
Re: [RFC] building postgres with meson -v8
Hi, On 2022-04-27 21:56:27 +0200, Peter Eisentraut wrote: > Here is a patch that adds in NLS. Cool! I know very little about translations, so I was reticent tackling this... > For example, we could move the list of languages from the meson.build files > into separate LINGUAS files, which could be shared with the makefile-based > build system. I need to research this a bit more. Yea, that'd be nice. > The annoying thing is that the i18n module doesn't appear to have a way to > communicate with feature options or dependencies, so there isn't a way to > tell it to only do its things when some option is enabled, or conversely to > check whether the module found the things it needs and to enable or disable > an option based on that. So right now for example if you explicitly disable > the 'nls' option, the binaries are built without NLS but the .mo files are > still built and installed. One partial way to deal with that, I think, would be to change all the subdir('po') invocations to subdir('po', if_found: libintl). If we don't want that for some reason, is there a reason a simple if libintl.found() wouldn't work? > In any case, this works for the main use cases and gets us a step forward, > so it's worth considering. Agreed. Greetings, Andres Freund
Re: fix cost subqueryscan wrong parallel cost
I wrote: > So I think the actual problem here is that we leave the decision > to elide no-op SubqueryScan nodes till setrefs.c. We should detect > that earlier, and when it applies, label the SubqueryScanPath with > the exact cost of its child. Hmm ... actually, while doing that seems like it'd be a good idea, it doesn't have much bearing on the case at hand. I approximated the results of such a change on this test case by just removing the "cpu_tuple_cost" component from cost_subqueryscan: @@ -1420,7 +1420,7 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root, get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost); startup_cost = qpqual_cost.startup; - cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; + cpu_per_tuple = qpqual_cost.per_tuple; run_cost = cpu_per_tuple * baserel->tuples; /* tlist eval costs are paid per output row, not per tuple scanned */ and what I got was Unique (cost=28144.85..29344.85 rows=12 width=12) -> Sort (cost=28144.85..28444.85 rows=12 width=12) Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c -> Append (cost=0.00..1750.00 rows=12 width=12) -> Subquery Scan on "*SELECT* 1" (cost=0.00..575.00 rows=6 width=12) -> Gather (cost=0.00..575.00 rows=6 width=12) Workers Planned: 2 -> Parallel Seq Scan on t (cost=0.00..575.00 rows=25000 width=12) -> Subquery Scan on "*SELECT* 2" (cost=0.00..575.00 rows=6 width=12) -> Gather (cost=0.00..575.00 rows=6 width=12) Workers Planned: 2 -> Parallel Seq Scan on t t_1 (cost=0.00..575.00 rows=25000 width=12) The subqueryscans are being costed the way we want now, but it's still going for the append-atop-gathers plan. So I dug a bit deeper, and found that generate_union_paths does also create the gather-atop-append plan, but *it's costed at 1750 units*, exactly the same as the other path. So add_path is just making an arbitrary choice between two paths of identical cost, and it happens to pick this one. (If I don't make the above tweak to cost_subqueryscan, the same thing happens, although the two competing paths now each have cost 2950.) So: why do we come out with a cost of 1750 for the very same plan that in the UNION ALL case is costed at 1400? AFAICS it's because the UNION ALL case thinks that the two inputs of the parallel append produce 25000 rows apiece so the parallel append produces 5 rows, and it costs the append's overhead on that basis. But in the UNION case, the parallel append sees two inputs that are claimed to return 6 rows, so the append produces 12 rows, meaning more append overhead. We can't readily get EXPLAIN to print this tree since it's rejected by add_path, but what I'm seeing (with the above costing hack) is {GATHERPATH :rows 12 :startup_cost 0.00 :total_cost 1750.00 :subpath {APPENDPATH :parallel_aware true :parallel_safe true :parallel_workers 2 :rows 12 :startup_cost 0.00 :total_cost 1750.00 :subpaths ( {SUBQUERYSCANPATH :rows 6 :startup_cost 0.00 :total_cost 575.00 :subpath {PATH :parallel_aware true :parallel_safe true :parallel_workers 2 :rows 25000 :startup_cost 0.00 :total_cost 575.00 } } {SUBQUERYSCANPATH :rows 6 :startup_cost 0.00 :total_cost 575.00 :subpath {PATH :parallel_aware true :parallel_safe true :parallel_workers 2 :rows 25000 :startup_cost 0.00 :total_cost 575.00 } } ) } } In short, these SubqueryScans are being labeled as producing 6 rows when their input only produces 25000 rows, which is surely insane. 
So: even though the SubqueryScan itself isn't parallel-aware, the number of rows it processes has to be de-rated according to the number of workers involved. Perhaps something like the original patch in this thread is indeed correct. But I can't help feeling that this definition of path_rows is mighty unintuitive and error-prone, and that if cost_subqueryscan is wrong then it's likely got lots of company. Maybe we need to take two steps back and rethink the whole approach. regards, tom lane
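For context, the "de-rating" Tom refers to is what the costing of parallel-aware scans already does: a parallel path's row count is the number of rows each worker is expected to process, not the total. A rough sketch of that existing logic, paraphrased from memory from cost_seqscan() in costsize.c (not a verbatim quote, and not the patch under discussion):

    /* Adjust costing for parallelism, if used (excerpt from cost_seqscan). */
    if (path->parallel_workers > 0)
    {
        double      parallel_divisor = get_parallel_divisor(path);

        /* The CPU cost is divided among all the workers. */
        cpu_run_cost /= parallel_divisor;

        /*
         * In a parallel plan, the row count needs to represent the number of
         * tuples processed per worker, so divide it out as well.
         */
        path->rows = clamp_row_est(path->rows / parallel_divisor);
    }

The open question in this thread is whether cost_subqueryscan() needs the same per-worker treatment even though SubqueryScan itself is not parallel-aware.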
Re: allow specifying action when standby encounters incompatible parameter settings
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hello

The patch applied and tested fine. I think for this kind of exception, it really is up to the administrator how he/she should proceed to resolve it, depending on his/her business application. Leaving things configurable by the user is generally a nice and modest change. I also like that you leave the parameter as an enum entry, so it is possible to extend it with other possible actions, such as automatically adjusting to match the new value.

- Cary Huang
HighGo Software Canada
Re: fix cost subqueryscan wrong parallel cost
On Fri, Apr 29, 2022 at 11:09 AM Tom Lane wrote:
>
> In short, these SubqueryScans are being labeled as producing 60000 rows
> when their input only produces 25000 rows, which is surely insane.
>
> So: even though the SubqueryScan itself isn't parallel-aware, the number
> of rows it processes has to be de-rated according to the number of workers
> involved.

Right, so why does baserel.rows show 60,000 here when path->subpath->rows only shows 25,000? Because if you substitute path->subpath->rows for baserel.rows in cost_subquery you get (with your cost change above):

 Incremental Sort  (cost=27875.50..45577.57 rows=120000 width=12) (actual time=165.285..235.749 rows=60000 loops=1)
   Sort Key: "*SELECT* 1".a, "*SELECT* 1".c
   Presorted Key: "*SELECT* 1".a
   Full-sort Groups: 10  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
   Pre-sorted Groups: 10  Sort Method: quicksort  Average Memory: 521kB  Peak Memory: 521kB
   ->  Unique  (cost=27794.85..28994.85 rows=120000 width=12) (actual time=157.882..220.501 rows=60000 loops=1)
         ->  Sort  (cost=27794.85..28094.85 rows=120000 width=12) (actual time=157.881..187.232 rows=120000 loops=1)
               Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
               Sort Method: external merge  Disk: 2600kB
               ->  Gather  (cost=0.00..1400.00 rows=120000 width=12) (actual time=0.197..22.705 rows=120000 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12) (actual time=0.015..13.101 rows=40000 loops=3)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..6.864 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..3.708 rows=30000 loops=2)
                           ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..6.918 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..3.769 rows=30000 loops=2)
 Planning Time: 0.137 ms
 Execution Time: 239.958 ms
(19 rows)

Which shows your 1400 cost goal from union all, and the expected row counts, for gather-atop-append.

The fact that (baserel.rows > path->subpath->rows) here seems like a straight bug: there are no filters involved in this case but in the presence of filters baserel->rows should be strictly (<= path->subpath->rows), right?

David J.
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Hi,

On 2020-12-16 18:12:39 +0900, Fujii Masao wrote:
> -	/* Wait to be signaled by UnpinBuffer() */
> +	/*
> +	 * Wait to be signaled by UnpinBuffer().
> +	 *
> +	 * We assume that only UnpinBuffer() and the timeout requests established
> +	 * above can wake us up here. WakeupRecovery() called by walreceiver or
> +	 * SIGHUP signal handler, etc cannot do that because it uses the different
> +	 * latch from that ProcWaitForSignal() waits on.
> +	 */
> 	ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
>
> 	/*

Isn't this comment bogus? The latch could e.g. be set by procsignal_sigusr1_handler(), which the startup process uses. Or it could already be set, when entering ResolveRecoveryConflictWithBufferPin(). Why is it even relevant that we only get woken up by UnpinBuffer()?

Greetings,

Andres Freund
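For reference, ProcWaitForSignal() is only a thin wrapper around the process latch, which is why any code path that sets the latch can end this wait; roughly (paraphrased from memory from src/backend/storage/lmgr/proc.c, not a verbatim quote):

    void
    ProcWaitForSignal(uint32 wait_event_info)
    {
        /* Anything that sets MyLatch terminates the wait, not just UnpinBuffer(). */
        (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
                         wait_event_info);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }

That is the basis for the objection above: the startup process's latch can be set by procsignal_sigusr1_handler() and others, not only by UnpinBuffer() and the timeout handlers.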
Re: fix cost subqueryscan wrong parallel cost
"David G. Johnston" writes:
> The fact that (baserel.rows > path->subpath->rows) here seems like a
> straight bug: there are no filters involved in this case but in the
> presence of filters baserel->rows should be strictly (<=
> path->subpath->rows), right?

No, because the subpath's rowcount has been derated to represent the number of rows any one worker is expected to process. Since the SubqueryScan is below the Gather, it has to represent that number of rows too. Like I said, this design is pretty confusing; though I do not have a better alternative to suggest.

Anyway, after chewing on this for awhile, it strikes me that the sanest way to proceed is for cost_subqueryscan to compute its rows estimate as the subpath's rows estimate times the selectivity of the subqueryscan's quals (if any). That'd be the natural way to proceed for most sorts of non-bottom-level paths to begin with.

I think there are a few reasons why cost_subqueryscan currently does it the way it does:

* By analogy to other sorts of relation-scan nodes. But those don't have any subpath that they could consult instead. This analogy is really a bit faulty, since SubqueryScan isn't a relation scan node in any ordinary meaning of that term.

* To ensure that all paths for the same relation have the same rowcount estimate (modulo different parameterization or parallelism). But we'll have that anyway because the only possible path type for an unflattened RTE_SUBQUERY rel is a SubqueryScan, so there's no risk of different cost_xxx functions arriving at slightly different estimates.

* To avoid redundant computation of the quals' selectivity. This is slightly annoying; but in practice the quals will always be RestrictInfos which will cache the per-qual selectivities, so it's not going to cost us very much to perform that calculation over again.

So perhaps we should do it more like the attached, which produces this plan for the UNION case:

 Unique  (cost=28994.85..30194.85 rows=120000 width=12)
   ->  Sort  (cost=28994.85..29294.85 rows=120000 width=12)
         Sort Key: t.a, t.b, t.c
         ->  Gather  (cost=0.00..2600.00 rows=120000 width=12)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..2600.00 rows=50000 width=12)
                     ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
                     ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

It'd still be a good idea to fix the cost estimation to not charge extra for SubqueryScans that will be elided later, because this Parallel Append's cost should be 1400 not 2600. But as I showed upthread, that won't affect the plan choice for this particular test case.

regards, tom lane

Attachment: change-cost_subqueryscan-rowcount-estimate-2.patch (text/x-diff)
Re: fix cost subqueryscan wrong parallel cost
I wrote:
> So perhaps we should do it more like the attached, which produces
> this plan for the UNION case:

sigh ... actually attached this time.

regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b787c6f81a..18749e842d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1395,6 +1395,7 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	List	   *qpquals;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
 
@@ -1402,11 +1403,24 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 	Assert(baserel->relid > 0);
 	Assert(baserel->rtekind == RTE_SUBQUERY);
 
-	/* Mark the path with the correct row estimate */
+	/*
+	 * We compute the rowcount estimate as the subplan's estimate times the
+	 * selectivity of relevant restriction clauses.  In simple cases this will
+	 * come out the same as baserel->rows; but when dealing with parallelized
+	 * paths we must do it like this to get the right answer.
+	 */
 	if (param_info)
-		path->path.rows = param_info->ppi_rows;
+		qpquals = list_concat_copy(param_info->ppi_clauses,
+								   baserel->baserestrictinfo);
 	else
-		path->path.rows = baserel->rows;
+		qpquals = baserel->baserestrictinfo;
+
+	path->path.rows = clamp_row_est(path->subpath->rows *
+									clauselist_selectivity(root,
+														   qpquals,
+														   0,
+														   JOIN_INNER,
+														   NULL));
 
 	/*
 	 * Cost of path is cost of evaluating the subplan, plus cost of evaluating
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 21c429226f..0d8d77140a 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1487,14 +1487,12 @@ explain (costs off) select * from t union select * from t order by 1,3;
          ->  Unique
                ->  Sort
                      Sort Key: t.a, t.b, t.c
-                     ->  Append
-                           ->  Gather
-                                 Workers Planned: 2
+                     ->  Gather
+                           Workers Planned: 2
+                           ->  Parallel Append
                                  ->  Parallel Seq Scan on t
-                           ->  Gather
-                                 Workers Planned: 2
                                  ->  Parallel Seq Scan on t t_1
-(13 rows)
+(11 rows)
 
 -- Full sort, not just incremental sort can be pushed below a gather merge path
 -- by generate_useful_gather_paths.
Re: failures in t/031_recovery_conflict.pl on CI
Hi,

Attached are patches for this issue. They add a test case for deadlock conflicts to make sure that case isn't broken. I also tested the recovery conflict tests in the back branches, and they work there with a reasonably small set of changes.

Questions:

- I'm planning to backpatch the test as 031_recovery_conflict.pl, even though preceding numbers are unused. It seems way more problematic to use a different number in the backbranches than to have gaps?

- The test uses pump_until() and wait_for_log(), which don't exist in the backbranches. For now I've just inlined the implementation, but I guess we could also backpatch their introduction?

- There are a few incompatibilities in the test with older branches:
  - older branches don't have allow_in_place_tablespaces - I think just skipping tablespace conflicts is fine, they're comparatively simple. Eventually it might make sense to backpatch allow_in_place_tablespaces, our test coverage in the area is quite poor.
  - the stats tests can't easily be made reliable in the backbranches - which is ok, as the conflict itself is verified via the log
  - some branches don't have log_recovery_conflict_waits; since it's not critical to the test, it's ok to just not include it there

I played with the idea of handling the differences using version comparisons in the code, and having the test be identical across branches. Since it's something we don't do so far, I'm leaning against it, but ...

> - For HEAD we have to replace the disable_all_timeouts() calls, it breaks the
>   replay progress reporting. Is there a reason to keep them in the
>   backbranches? Hard to see how an extension or something could rely on it,
>   but ...?

I've left it as is for now, will start a separate thread.

> - There's the following comment in ResolveRecoveryConflictWithBufferPin():
>
>   "We assume that only UnpinBuffer() and the timeout requests established
>    above can wake us up here."
>
>   That bogus afaict? There's plenty other things that can cause MyProc->latch
>   to be set. Is it worth doing something about this at the same time? Right
>   now we seem to call ResolveRecoveryConflictWithBufferPin() in rapid
>   succession initially.

The comment is more recent than I had realized. I raised this separately in https://postgr.es/m/20220429191815.xewxjlpmq7mxhsr2%40alap3.anarazel.de

pgindent uses some crazy formatting nearby:

		SendRecoveryConflictWithBufferPin(
										  PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);

I'm tempted to clean that up in passing by having just one SendRecoveryConflictWithBufferPin() call instead of two, storing the type of conflict in a local variable? Doesn't look entirely pretty either, but ...

I'm very doubtful of this claim above ResolveRecoveryConflictWithBufferPin(), btw. But that'd be a non-backpatchable cleanup, I think:

 * The ProcWaitForSignal() sleep normally done in LockBufferForCleanup()
 * (when not InHotStandby) is performed here, for code clarity.

Greetings,

Andres Freund

>From 08b71cceefabe48ca80b1b15752031b27d05229d Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Fri, 29 Apr 2022 12:50:10 -0700
Subject: [PATCH vHEAD 1/2] Fix possibility of self-deadlock in ResolveRecoveryConflictWithBufferPin().
Author:
Reviewed-By:
Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7n...@alap3.anarazel.de
Backpatch:
---
 src/backend/storage/ipc/standby.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 2850867323b..8c5e8432e73 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -46,6 +46,7 @@ static HTAB *RecoveryLockLists;
 
 /* Flags set by timeout handlers */
 static volatile sig_atomic_t got_standby_deadlock_timeout = false;
+static volatile sig_atomic_t got_standby_delay_timeout = false;
 static volatile sig_atomic_t got_standby_lock_timeout = false;
 
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
@@ -793,7 +794,8 @@ ResolveRecoveryConflictWithBufferPin(void)
 	}
 
 	/*
-	 * Wait to be signaled by UnpinBuffer().
+	 * Wait to be signaled by UnpinBuffer() or for the wait to be interrupted
+	 * by one of the timeouts established above.
 	 *
 	 * We assume that only UnpinBuffer() and the timeout requests established
 	 * above can wake us up here. WakeupRecovery() called by walreceiver or
@@ -802,7 +804,9 @@ ResolveRecoveryConflictWithBufferPin(void)
 	 */
 	ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
 
-	if (got_standby_deadlock_timeout)
+	if (got_standby_delay_timeout)
+		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+	else if (got_standby_deadlock_timeout)
 	{
 		/*
 		 * Send out a request for hot-standby backends to check themselves for
@@ -828,6 +832,7 @@ ResolveRecoveryConflictWithBufferPin(void)
 	 * individually, but that'd be slower
Re: [RFC] building postgres with meson -v8
Hi, On 2022-04-29 11:00:43 -0700, Andres Freund wrote: > On 2022-04-27 21:56:27 +0200, Peter Eisentraut wrote: > > Here is a patch that adds in NLS. > > Cool! I know very little about translations, so I was reticent tackling > this... > > The annoying thing is that the i18n module doesn't appear to have a way to > > communicate with feature options or dependencies, so there isn't a way to > > tell it to only do its things when some option is enabled, or conversely to > > check whether the module found the things it needs and to enable or disable > > an option based on that. So right now for example if you explicitly disable > > the 'nls' option, the binaries are built without NLS but the .mo files are > > still built and installed. > > One partial way to deal with that, I think, would be to change all the > subdir('po') invocations to subdir('po', if_found: libintl). If we don't want > that for some reason, is there a reason a simple if libintl.found() wouldn't > work? Merged into my tree now, using if_found. I've also made the intl check work with older meson versions, since I didn't include your version requirement upgrades. For now I "fixed" the ecpg issue on windows by just not building ecpg there. Ecpg also needs tests ported... Greetings, Andres Freund
Re: fix cost subqueryscan wrong parallel cost
On Fri, Apr 29, 2022 at 12:31 PM Tom Lane wrote:
> "David G. Johnston" writes:
> > The fact that (baserel.rows > path->subpath->rows) here seems like a
> > straight bug: there are no filters involved in this case but in the
> > presence of filters baserel->rows should be strictly (<=
> > path->subpath->rows), right?
>
> No, because the subpath's rowcount has been derated to represent the
> number of rows any one worker is expected to process. Since the
> SubqueryScan is below the Gather, it has to represent that number
> of rows too. Like I said, this design is pretty confusing; though
> I do not have a better alternative to suggest.
>
> Anyway, after chewing on this for awhile, it strikes me that the
> sanest way to proceed is for cost_subqueryscan to compute its rows
> estimate as the subpath's rows estimate times the selectivity of
> the subqueryscan's quals (if any).

This is what Robert was getting at, and I followed up on. The question I ended up at is why doesn't baserel->rows already produce the value you now propose to calculate directly within cost_subquery:

set_baserel_size_estimates
    (multiplies rel->tuples - which is derated - by selectivity, sets rel->rows)
set_subquery_size_estimates
    rel->subroot = subquery_planner(...)
    // my expectation is that sub_final_rel->cheapest_total_path->rows is the derated number of rows;
    // the fact you can reach the derated amount later by using path->subpath->rows seems to affirm this.
    sets rel->tuples from sub_final_rel->cheapest_total_path->rows
set_subquery_pathlist
    (executes the sizing call stack above, then proceeds to create_subqueryscan_path which in turn calls cost_subquery)
set_rel_size
...

> * By analogy to other sorts of relation-scan nodes. But those don't
> have any subpath that they could consult instead. This analogy is
> really a bit faulty, since SubqueryScan isn't a relation scan node
> in any ordinary meaning of that term.

I did observe this copy-and-paste dynamic; I take it this is why we cannot or would not just change all of the baserel->rows usages to path->subpath->rows.

> So perhaps we should do it more like the attached, which produces
> this plan for the UNION case:

The fact this changes row counts in a costing function is bothersome - it would be nice to go the other direction and remove the if block. More generally, path->path.rows should be read-only by the time we get to costing. But I'm not out to start a revolution here either. But it feels like we are just papering over a bug in how baserel->rows is computed, per my analysis above.

In short, the solution seems like it should, and in fact does here, fix the observed problem. I'm fine with that.

David J.
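To make the call stack above concrete, the two assignments being discussed look roughly like this (paraphrased fragments from costsize.c, quoted from memory rather than verbatim):

    /* In set_subquery_size_estimates(): the rel's raw tuple count comes
     * from the subquery's cheapest total path ... */
    sub_final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
    rel->tuples = sub_final_rel->cheapest_total_path->rows;

    /* ... and in set_baserel_size_estimates(): rel->rows is that tuple
     * count scaled by the selectivity of the baserestrict quals. */
    nrows = rel->tuples *
        clauselist_selectivity(root,
                               rel->baserestrictinfo,
                               0,
                               JOIN_INNER,
                               NULL);
    rel->rows = clamp_row_est(nrows);

If that reading is right, baserel->rows is derived from a complete (non-partial) path of the subquery, while the subpath of the parallel SubqueryScanPath is a partial path, which would explain the 60,000 vs. 25,000 discrepancy observed above.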
Re: bogus: logical replication rows/cols combinations
On 4/29/22 06:48, Amit Kapila wrote:
> On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra
> wrote:
>>
>> On 4/28/22 14:26, Peter Eisentraut wrote:
>>> On 27.04.22 12:33, Amit Kapila wrote:
>>>
>>> I wonder how we handle the combination of
>>>
>>> pub1: publish=insert WHERE (foo)
>>> pub2: publish=update WHERE (bar)
>>>
>>> I think it would be incorrect if the combination is
>>>
>>> pub1, pub2: publish=insert,update WHERE (foo OR bar).
>>
>> That's a good question, actually. No, we don't combine the publications
>> like this, the row filters are kept "per action".
>>
>
> Right, and it won't work even if try to combine in this case because
> of replica identity restrictions.
>
>> But the exact behavior turns out to be rather confusing in this case.
>>
>> (Note: This has nothing to do with column lists.)
>>
>> Consider an example similar to what Alvaro posted earlier:
>>
>>   create table uno (a int primary key, b int, c int);
>>
>>   create publication uno for table uno where (a > 0)
>>     with (publish='insert');
>>
>>   create publication dos for table uno where (a < 0)
>>     with (publish='update');
>>
>> And do this:
>>
>>   insert into uno values (1, 2, 3), (-1, 3, 4)
>>
>> which on the subscriber produces just one row, because (a<0) replicates
>> only updates:
>>
>>    a | b | c
>>   ---+---+---
>>    1 | 2 | 3
>>   (1 row)
>>
>> Now, let's update the (a<0) row.
>>
>>   update uno set a = 2 where a = -1;
>>
>> It might seem reasonable to expect the updated row (2,3,4) to appear on
>> the subscriber, but no - that's not what happens. Because we have (a<0)
>> for UPDATE, and we evaluate this on the old row (matches) and new row
>> (does not match). And pgoutput_row_filter() decides the update needs to
>> be converted to DELETE, despite the old row was not replicated at all.
>>
>
> Right, but we don't know what previously would have happened maybe the
> user would have altered the publication action after the initial row
> is published in which case this DELETE is required as is shown in the
> example below. We can only make the decision based on the current
> tuple. For example:
>
>   create table uno (a int primary key, b int, c int);
>
>   create publication uno for table uno where (a > 0)
>     with (publish='insert');
>
>   create publication dos for table uno where (a < 0)
>     with (publish='insert');
>
>   -- create subscription for both these publications.
>
>   insert into uno values (1, 2, 3), (-1, 3, 4);
>
>   Alter publication dos set (publish='update');
>
>   update uno set a = 2 where a = -1;
>
> Now, in this case, the old row was replicated and we would need a
> DELETE corresponding to it.
>

I think such issues due to ALTER of the publication are somewhat expected, and I think users will understand they might need to resync the subscription or something like that.

A similar example might be just changing the where condition,

  create publication p for table t where (a > 10);

and then

  alter publication p set table t where (a > 15);

If we replicated any rows with (a > 10) and (a <= 15), we'll just stop replicating them. But if we re-create the subscription, we end up with a different set of rows on the subscriber, omitting rows with (a <= 15).

In principle we'd need to replicate the ALTER somehow, to delete or insert the rows that start/stop matching the row filter. It's a bit similar to not replicating DDL, perhaps.

But I think the issue I've described is different, because you don't have to change the subscriptions at all and you'll still have the problem.
I mean, imagine doing this:

-- publisher
create table t (a int primary key, b int);
create publication p for table t where (a > 10) with (publish='update');

-- subscriber
create table t (a int primary key, b int);
create subscription s connection '...' publication p;

-- publisher
insert into t select i, i from generate_series(1,20) s(i);
update t set b = b * 10;

-- subscriber
--> has no rows in "t"
--> recreate the subscription
drop subscription s;
create subscription s connection '...' publication p;

--> now it has all the rows with (a>10), because tablesync ignores publication actions

The reason why I find this really annoying is that it makes it almost impossible to setup two logical replicas that'd be "consistent", unless you create them at the same time (= without any writes in between). And it's damn difficult to think about the inconsistencies.

IMHO this all stems from allowing row filters and restricting pubactions at the same time (notice this only used a single publication). So maybe the best option would be to disallow combining these two features? That would ensure the row filter is always applied to all actions in a consistent manner, preventing all these issues.

Maybe that's not possible - maybe there are valid use cases that would need such combination, and you mentioned replica identity might be an issue (and maybe requiring RIF with row filters is not desirable).

So maybe we should at least warn against this in the documentation?
Re: failures in t/031_recovery_conflict.pl on CI
Andres Freund writes: > Questions: > - I'm planning to backpatch the test as 031_recovery_conflict.pl, even though > preceding numbers are unused. It seems way more problematic to use a > different number in the backbranches than have gaps? +1 > - The test uses pump_until() and wait_for_log(), which don't exist in the > backbranches. For now I've just inlined the implementation, but I guess we > could also backpatch their introduction? I'd backpatch --- seems unlikely this will be the last need for 'em. > pgindent uses some crazy formatting nearby: > SendRecoveryConflictWithBufferPin( > > PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK); I do not believe that that line break is pgindent's fault. If you just fold it into one line it should stay that way. regards, tom lane
Re: bogus: logical replication rows/cols combinations
On Sat, Apr 30, 2022 at 2:02 AM Tomas Vondra wrote: > > On 4/29/22 06:48, Amit Kapila wrote: > > On Thu, Apr 28, 2022 at 11:00 PM Tomas Vondra > > I think such issues due to ALTER of the publication are somewhat > expected, and I think users will understand they might need to resync > the subscription or something like that. > > A similar example might be just changing the where condition, > > create publication p for table t where (a > 10); > > and then > > alter publication p set table t where (a > 15); > > If we replicated any rows with (a > 10) and (a <= 15), we'll just stop > replicating them. But if we re-create the subscription, we end up with a > different set of rows on the subscriber, omitting rows with (a <= 15). > > In principle we'd need to replicate the ALTER somehow, to delete or > insert the rows that start/stop matching the row filter. It's a bit > similar to not replicating DDL, perhaps. > > But I think the issue I've described is different, because you don't > have to change the subscriptions at all and you'll still have the > problem. I mean, imagine doing this: > > -- publisher > create table t (a int primary key, b int); > create publication p for table t where (a > 10) with (publish='update'); > > -- subscriber > create table t (a int primary key, b int); > create subscription s connection '...' publication p; > > -- publisher > insert into t select i, i from generate_series(1,20) s(i); > update t set b = b * 10; > > -- subscriber > --> has no rows in "t" > --> recreate the subscription > drop subscription s; > create subscription s connection '...' publication p; > > --> now it has all the rows with (a>10), because tablesync ignores > publication actions > > > The reason why I find this really annoying is that it makes it almost > impossible to setup two logical replicas that'd be "consistent", unless > you create them at the same time (= without any writes in between). And > it's damn difficult to think about the inconsistencies. > I understood your case related to the initial sync and it is with or without rowfilter. > > IMHO this all stems from allowing row filters and restricting pubactions > at the same time (notice this only used a single publication). So maybe > the best option would be to disallow combining these two features? That > would ensure the row filter filter is always applied to all actions in a > consistent manner, preventing all these issues. > > Maybe that's not possible - maybe there are valid use cases that would > need such combination, and you mentioned replica identity might be an > issue > Yes, that is the reason we can't combine the row filters for all pubactions. > (and maybe requiring RIF with row filters is not desirable). > > So maybe we should at least warn against this in the documentation? > Yeah, I find this as the most suitable thing to do to address your concern. I would like to add this information to the 'Initial Snapshot' page with some examples (both with and without a row filter). > > > > True, I think to some extent we rely on users to define it sanely > > otherwise currently also it can easily lead to even replication being > > stuck. This can happen when the user is trying to operate on the same > > table and define publication/subscription on multiple nodes for it. > > See [1] where we trying to deal with such a problem. > > > > [1] - https://commitfest.postgresql.org/38/3610/ > > > > That seems to deal with a circular replication, i.e. two logical > replication links - a bit like a multi-master. 
> Not sure how is that related to the issue we're discussing here?

It is not directly related to what we are discussing here, but I was trying to emphasize the point that users need to define the logical replication via pub/sub sanely; otherwise they might see some weird behaviors like that.

--
With Regards,
Amit Kapila.
Progress report removal of temp files and temp relation files using ereport_startup_progress
Hi,

At times, there can be many temp files (under pgsql_tmp) and temp relation files left over after a crash, and their removal during startup may take a while, during which users have no clue about what's going on in the server before it comes up online. Here's a proposal to use ereport_startup_progress to report the progress of the file removal.

Thoughts?

Regards,
Bharath Rupireddy.

v1-0001-Progress-report-removal-of-temp-files-and-temp-re.patch
Description: Binary data
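To illustrate the proposal, here is a minimal, hypothetical sketch of what such a call site could look like. It is not the attached patch; the helper name is made up, and it assumes the directory scan runs somewhere the startup-progress timeout machinery (log_startup_progress_interval) is active, the way ereport_startup_progress() is already used for syncing the data directory and resetting unlogged relations:

    /*
     * Hypothetical sketch only; assumes postgres.h, storage/fd.h and
     * postmaster/startup.h are included and the startup-progress timeout
     * has been armed with begin_startup_progress_phase().
     */
    static void
    RemoveTempFilesWithProgress(const char *tmpdirname)
    {
        DIR        *dir;
        struct dirent *de;

        dir = AllocateDir(tmpdirname);
        while ((de = ReadDirExtended(dir, tmpdirname, LOG)) != NULL)
        {
            char        path[MAXPGPATH * 2];

            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;

            snprintf(path, sizeof(path), "%s/%s", tmpdirname, de->d_name);

            /* Emits a LOG line at most once per log_startup_progress_interval. */
            ereport_startup_progress("removing temporary files, elapsed time: %ld.%02d s, current path: %s",
                                     path);

            if (unlink(path) < 0 && errno != ENOENT)
                ereport(LOG,
                        (errcode_for_file_access(),
                         errmsg("could not remove file \"%s\": %m", path)));
        }
        FreeDir(dir);
    }

The real cleanup in fd.c also recurses into per-backend subdirectories and checks file name prefixes; the sketch leaves that out to keep the progress-reporting call visible.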