Re: Function for listing pg_wal/summaries directory
On Mon, Oct 07, 2024 at 10:07:10AM +0900, Michael Paquier wrote:
> On Fri, Oct 04, 2024 at 10:02:11AM -0500, Nathan Bossart wrote:
>> Could you explain why you feel the existing support functions are
>> insufficient?
>
> Because it is not possible to outsource the scan of pg_wal/summaries/
> to a different role, no?

I was under the impression that you could do this with
pg_available_wal_summaries() [0].

[0] https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-WAL-SUMMARY

-- 
nathan
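For what it's worth, that delegation only needs a GRANT on the function. A
minimal sketch, assuming a hypothetical role name and that EXECUTE has not
already been granted more broadly:

    -- Run as a superuser (or the function's owner); wal_monitor is made up.
    GRANT EXECUTE ON FUNCTION pg_available_wal_summaries() TO wal_monitor;

    -- Then, as wal_monitor, list summaries without reading pg_wal/summaries/:
    SELECT tli, start_lsn, end_lsn FROM pg_available_wal_summaries();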
Re: Changing the state of data checksums in a running cluster
Hi,

I did a quick review today. First a couple minor comments:

1) monitoring.sgml typos: number of database -> number of databases,
   calcuated -> calculated

2) unnecessary newline in heapam.c (no other changes)

3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, shadowing
   earlier variable

4) unnecessary comment change in bufmgr.c (no other changes)

5) unnecessary include and newline in bulk_write.c (no other changes)

6) unnecessary newline in pgstat.c (no other changes)

7) controldata.c - maybe this

       if (oldctrl->data_checksum_version == 2)

   should use PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION instead of the magic
   constant? For "off" we use "0" which seems somewhat acceptable, but for
   other values it's less obvious what the meaning is.

8) xlog_internal.h - xl_checksum_state should be added to typedefs

9) system_functions.sql - Isn't it weird that this only creates the new
   pg_enable_data_checksums function, but not pg_disable_data_checksums? It
   also means it doesn't revoke EXECUTE from public on it, which I guess it
   probably should? Or why should this be different for the two functions?

   Also the error message seems to differ:

     test=> select pg_enable_data_checksums();
     ERROR: permission denied for function pg_enable_data_checksums

     test=> select pg_disable_data_checksums();
     ERROR: must be superuser

   Probably harmless, but seems a bit strange.

But there also seems to be a more serious problem with recovery. I did a
simple script that does a loop of

* start a cluster
* initialize a small pgbench database (scale 1 - 100)
* run "long" pgbench
* call pg_enable_data_checksums(), wait for it to complete
* stop the cluster with "-m immediate"
* start the cluster

And this unfortunately hits this assert:

    bool
    AbsorbChecksumsOnBarrier(void)
    {
        Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
        LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
        return true;
    }

Based on our short discussion about this, the controlfile gets updated right
after pg_enable_data_checksums() completes. The immediate stop however forces
a recovery since the last checkpoint, which means we see the XLOG_CHECKSUMS
WAL message again, and set the barrier. And then we exit recovery, try to
start checkpointer and it trips over this, because the control file already
has the "on" value :-(

I'm not sure what's the best way to fix this. Would it be possible to
remember we saw the XLOG_CHECKSUMS during recovery, and make the assert a
noop in that case? Or not set the barrier when exiting recovery. I'm not sure
the relaxed assert would remain meaningful, though. What would it check on
standbys, for example?

Maybe a better way would be to wait for a checkpoint before updating the
controlfile, similar to what we do at the beginning? Possibly even with the
same "fast=true/false" logic. That would prevent us from seeing the
XLOG_CHECKSUMS wal record with the updated flag. It would extend the "window"
where a crash would mean we have to redo the checksums, but I don't think
that matters much. For small databases who cares, and for large databases it
should not be a meaningful difference (setting the checksums already ran over
multiple checkpoints, so one checkpoint is not a big difference).

regards

-- 
Tomas Vondra
Re: Refactoring postmaster's code to cleanup after child exit
Hi,

On 2024-10-05 20:51:50 +0100, Dagfinn Ilmari Mannsåker wrote:
> Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un()
> on Windows, so that can be fixed by bumping the Perl version in
> https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
> to something more modern (such as 5.40.0.1), and only skipping the test
> if on Windows if Socket is too old.
>
> The decision to use 5.26 seems to come from the initial creation of the
> CI images in 2021 (when 5.34 was current), with the comment «newer
> versions don't currently work correctly for plperl». That claim is
> worth revisiting, and fixing if it's still the case.

I think we fixed the issues that were known at the time. I think I tried
upgrading to something newer at some point and there were some weird, but
fixable, encoding issues. Unfortunately I don't have the bandwidth to tackle
this rn.

Greetings,

Andres Freund
Re: Add minimal C example and SQL registration example for custom table access methods.
Glad to hear it. Thank you!

On Mon, Oct 7, 2024 at 2:50 AM Michael Paquier wrote:
>
> On Fri, May 24, 2024 at 03:59:08PM -0300, Fabrízio de Royes Mello wrote:
> > Nice... LGTM!
>
> I have noticed that this was still in the CF. After fixing a couple
> of inconsistencies in the markups and the names, trimming down the
> list of headers to avoid rot and adding a static in front of the const,
> the result looked pretty much OK, so applied.
> --
> Michael
Re: POC, WIP: OR-clause support for indexes
I assume v40 is the latest version.

In group_similar_or_args we can add a bool variable, so:

    bool        matched = false;

    foreach(lc, orargs)
    {
        if (match_index_to_operand(nonConstExpr, colnum, index))
        {
            matches[i].indexnum = indexnum;
            matches[i].colnum = colnum;
            matches[i].opno = opno;
            matches[i].inputcollid = clause->inputcollid;
            matched = true;
            break;
        }
    }

    ...

    if (!matched)
        return orargs;

    /* Sort clauses to make similar clauses go together */
    qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);

I guess it can save some cycles?
Re: Add trim_trailing_whitespace to editorconfig file
On Thu, 5 Sept 2024 at 23:28, Jelte Fennema-Nio wrote:
> Okay, I've done this now.

Is this blocked on anything? I feel it's ready to merge at this point. I'd
really like to not have this problem with trailing whitespace in sgml files
anymore.
Re: First draft of PG 17 release notes
On Mon, Oct 7, 2024 at 07:25:11PM -0400, Bruce Momjian wrote: > > Yes. This change on CREATE INDEX was introduced by 2af07e2f7 together with > > other commands, but it was missed to be mentioned in the commit message > > although the description was added to the documentation. > > > > The change on CEATE MATERIALIZED VIEW was introduced by a separate commit > > b4da732fd, since which the REFRESH logic is used when creating a matview. > > Should we add here a link to that commit, too? > > I developed the attached patch which adds the two commands and the > commit item. Okay, updated commit after running src/tools/add_commit_links.pl. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml index ad814737745..0ea9d96a47f 100644 --- a/doc/src/sgml/release-17.sgml +++ b/doc/src/sgml/release-17.sgml @@ -125,6 +125,8 @@ @@ -132,11 +134,14 @@ Author: Jeff Davis Change functions to use a safe during maintenance operations (Jeff Davis) § + § This prevents maintenance operations (ANALYZE, - CLUSTER, REFRESH + CLUSTER, CREATE + INDEX, CREATE + MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM) from performing unsafe access. Functions used by expression indexes and materialized views that
Re: Doc: typo in config.sgml
> On Tue, 1 Oct 2024 22:20:55 +0900
> Yugo Nagata wrote:
>
>> On Tue, 1 Oct 2024 15:16:52 +0900
>> Yugo NAGATA wrote:
>>
>> > On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
>> > Tatsuo Ishii wrote:
>> >
>> > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
>> > > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
>> > > >> point in Unicode. i.e. U+00A0).
>> > > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
>> > > >
>> > > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
>> > > > ([ and ] were not necessary.)
>> > > >
>> > > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters
>> > > > in charset.sgml, but I think it is better to specify both LC_ALL=C
>> > > > and "\xC2\xA0" for making sure detecting nbsp.
>> > > >
>> > > > One problem is that -P option can be used in only GNU grep, and grep
>> > > > in mac doesn't support it.
>> > > >
>> > > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
>> > > > assume the shell is bash.
>> > > >
>> > > > Maybe, better way is use perl itself rather than grep as following.
>> > > >
>> > > > `perl -ne '/\xC2\xA0/ and print' `
>> > > >
>> > > > I attached a patch fixed in this way.
>> > >
>> > > GNU sed can also be used without setting LC_ALL:
>> > >
>> > > sed -n /"\xC2\xA0"/p
>> > >
>> > > However I am not sure if non-GNU sed can do this too...
>> >
>> > Although I've not check it myself, BSD sed doesn't support \x escape
>> > according to [1].
>> >
>> > [1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
>> >
>> > By the way, I've attached a patch a bit modified to use the plural form
>> > statement as same as check-tabs.
>> >
>> > Non-breaking **spaces** appear in SGML/XML files
>>
>> The previous patch was broken because the perl command failed to return
>> the correct result. I've attached an updated patch to fix the return
>> value. In passing, I added line breaks for long lines.
>
> I've attached a updated patch.
> I added the comment to explain why Perl is used instead of grep or sed.

Looks good to me. If there's no objection, I will commit this to master
branch.

Best regards,
-- 
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Re: First draft of PG 17 release notes
On Tue, Oct 1, 2024 at 04:36:09PM +0200, Laurenz Albe wrote:
> I think that the removal of the "adminpack" extension should
> be listed in the section "migration to v17" as an incompatibility.
>
> I have seen one complaint that pg_upgrade fails if the extension
> is installed, but a dump/restore would also throw an error.

Agreed, moved.

-- 
  Bruce Momjian  https://momjian.us
  EDB            https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means
  "Am I going to die soon?"
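As a side note for anyone hitting that pg_upgrade failure: a possible
workaround, assuming nothing in the old cluster still relies on the
extension's functions, is simply to drop it before upgrading:

    -- Run in every database of the old cluster where adminpack is installed.
    DROP EXTENSION adminpack;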
Re: Collation & ctype method table, and extension hooks
On Fri, 2024-10-04 at 15:24 +0200, Andreas Karlsson wrote:
> Great! I had been planning to do this myself so great to see that you
> already did it before me. Will take a look at this work later.

Great! We'll need to test whether there are any regressions in the regex &
pattern matching code due to the indirection. What would be a good test for
that? Just running it over long strings?

Regards,
Jeff Davis
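Not from the thread, but one rough sketch of the "long strings" idea: a query
like the following, timed before and after the change, would at least push a
lot of characters through the regex machinery via the new method table
(string contents and sizes are arbitrary):

    SELECT count(*)
    FROM generate_series(1, 1000)
    WHERE repeat('abcxyz', 10000) ~ '(abc)+xyz$';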
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Hi,

It’s been a long gap in the activity of this thread, and I apologize for the
delay. However, I have now returned and reviewed the other threads [1],[2]
that have made changes in this area.

I would like to share a summary of the discussion that took place among
Robert, Andres, Bharath, and Tom on this thread, to make it easier to move
forward.

Robert was dissatisfied with the approach used in the patch to report
progress for the checkpointer process, as he felt the current mechanism is
not suitable. He proposed allocating a dedicated chunk of shared memory in
CheckpointerShmemStruct. Bharath opposed this, suggesting instead to use
PgStat_CheckpointerStats. Andres somewhat supported Robert’s idea but noted
that using PgStat_CheckpointerStats would allow for more code reuse.

The discussion then shifted towards the statistics handling for the
checkpointer process. Robert expressed dissatisfaction with the current
statistics handling mechanism. Andres explained the rationale behind the
existing setup and the improvements made in pg_stat_io. He also mentioned the
overhead of the changecount mechanism when updating for every buffer write.
However, for updates involving a single parameter at a time, performance can
be improved on platforms that support atomic 64-bit writes (indicated by
#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY). He also shared performance
metrics demonstrating good results with this approach. Tom agreed to use this
and restrict it to the specific case.

But I am not quite clear on the direction ahead. Let me summarise the
approaches based on the above discussion.

Approach-1: The approach used in the current patch, which uses the existing
mechanism of progress reporting. The advantage of this approach is that the
machinery is already in place and ready to use. However, it is not suitable
for the checkpointer process because only the checkpointer process runs the
checkpoint, even if the command is issued from a different backend. The
current mechanism is designed for any backend to report progress for any
command it is running, and we don’t know which command that will be at any
given point in time, or how many backends will be running any given command
simultaneously. Hence, this approach does not fit the checkpointer.
Additionally, there is complexity involved in mapping down to integers and
back.

Approach-2: Allocate a dedicated chunk of shared memory in
CheckpointerShmemStruct with an appropriate name and size. This approach
eliminates the complexity involved in Approach-1 related to mapping down to
integers and back. However, it requires building the necessary machinery to
suit checkpointer progress reporting, which might be similar to the current
progress reporting mechanism.

Approach-3: Using PgStat_CheckpointerStats to store the progress information.
Have we completely ruled out this approach?

Additionally, all three approaches require improvements in the changecount
mechanism on platforms that support atomic 64-bit writes. I’m inclined to
favor Approach-2 because it provides a clearer method for reporting progress
for the checkpointer process, with the additional work required to implement
the necessary machinery. However, I’m still uncertain about the best path
forward. Please share your thoughts.

[1]: https://www.postgresql.org/message-id/flat/CAOtHd0ApHna7_p6mvHoO%2BgLZdxjaQPRemg3_o0a4ytCPijLytQ%40mail.gmail.com#74ae447064932198495aa6d722fdc092
[2]: https://www.postgresql.org/message-id/CALj2ACVxX2ii=66rypxrweze2esbripmj0ahfrfhuexjcc7...@mail.gmail.com

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Tue, Jan 31, 2023 at 11:16 PM vignesh C wrote:
>
> On Thu, 8 Dec 2022 at 00:33, Andres Freund wrote:
> >
> > Hi,
> >
> > On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote:
> > > Please find attached a rebase in v7.
> >
> > cfbot complains that the docs don't build:
> > https://cirrus-ci.com/task/6694349031866368?logs=docs_build#L296
> >
> > [03:24:27.317] ref/checkpoint.sgml:66: element para: validity error :
> > Element para is not declared in para list of possible children
> >
> > I've marked the patch as waitin-on-author for now.
> >
> >
> > There's been a bunch of architectural feedback too, but tbh, I don't know
> > if we came to any conclusion on that front...
>
> There has been no updates on this thread for some time, so this has
> been switched as Returned with Feedback. Feel free to open it in the
> next commitfest if you plan to continue on this.
>
> Regards,
> Vignesh
Commit fest 2024-09
Hi all,

The commit fest 2024-09 should have ended one week ago, so I have been taking
one step ahead and I have begun cleaning up things. As of now, there is still
a total of 113 patches marked as "Needs Review", and I hope that my
non-auto-vacuuming will be over tomorrow. If you are an author of patches
still listed, I would recommend closing or moving your entries by yourself to
limit errors, because, well, I am going to mishandle some of them for sure.

I am done processing the patches waiting on author. Some of them have been
waiting on input from their author for a rather long time, up to six months,
so I have marked them as returned with feedback where it was clear that a
patch has been given up. This is where a "Not Interested" or a "Lack of
Interest" field would have been handy.

If you feel that one or more of your patches have been handled in an
incorrect way, feel free to reply back to this thread, to update your patch
entry to a better state, or to reply back to me using this thread.

Thanks,
-- 
Michael
Re: Add parallel columns for pg_stat_statements
Le lun. 7 oct. 2024 à 02:18, Michael Paquier a écrit : > On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote: > > I'm not sure I follow. That would mean that every time a query is > executed, > > it always gets the same amount of workers. Which is not guaranteed to be > > true. > > > > I would agree, though, that parallelized_queries_launched is probably not > > that interesting. I could get rid of it if you think it should go away. > > My point is that these stats are useful to know which action may have > to be taken when reaching a mean, and numbers in pg_stat_statements > offer hints that something is going wrong and that a closer lookup at > an EXPLAIN plan may be required, particularly if the total number of > workers planned and launched aggregated in the counters is unbalanced > across queries. If the planned/launched ratio is balanced across most > queries queries, a GUC adjustment may be OK. If the ratio is very > unbalanced in a lower set of queries, I'd also look at tweaking GUCs > instead like the per_gather. These counters give information that one > or the other may be required. > > > Well, I don't see this as an overlap. Rather more information. > > Later versions of Benoit's patch have been accumulating this data in > the executor state. v4 posted at [1] has the following diffs: > --- a/src/include/nodes/execnodes.h > +++ b/src/include/nodes/execnodes.h > @@ -724,6 +724,9 @@ typedef struct EState > */ > List *es_insert_pending_result_relations; > List *es_insert_pending_modifytables; > + > + int es_workers_launched; > + int es_workers_planned; > } EState; > > Your v2 posted on this thread has that: > @@ -707,6 +707,12 @@ typedef struct EState > struct EPQState *es_epq_active; > > booles_use_parallel_mode; /* can we use parallel > workers? */ > + booles_used_parallel_mode; /* was executed in > parallel */ > + int es_parallelized_workers_launched; > + int es_parallelized_workers_planned; > + int es_parallelized_nodes; /* # of > parallelized nodes */ > + int es_parallelized_nodes_all_workers; /* # of > nodes with all workers launched */ > + int es_parallelized_nodes_no_worker; /* # of > nodes with no workers launched */ > > es_parallelized_workers_launched and es_workers_launched are the same > thing in both. > > My bad. I agree this is the way to go. See patch v3-0001 attached. > > On this, I would agree with you. They are not that particularly useful to > > get better setting for parallel GUCs. I can drop them if you want. > > Yep. I would remove them for now. This leads to more bloat. > > Done. See patch v3-0002 attached. > > Did this on the v2 version of the patch (attached here). > > > > Thanks for your review. If you want the parallelized_queries_launched > > column and the parallelized_nodes_* columns dropped, I can do that on a > v3 > > patch. > > I'd recommend to split that into more independent patches: > - Introduce the two counters in EState with the incrementations done > in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may > want to coordinate with Benoit to avoid duplicating the work). > - Expand pg_stat_statements to use them for DMLs, SELECTs, well where > they matter. > - Look at expanding that for utilities that can do parallel jobs: > CREATE INDEX and VACUUM, but this has lower priority to me, and this > can reuse the same counters as the ones added by patch 2. > > The first two are done. The last one is beyond my scope. I'm now working on Benoit's patch to make it work with my v3-0001 patch. 
I'll send the resulting patch on his thread. > [1]: > https://www.postgresql.org/message-id/6ecad3ad-835c-486c-9ebd-da87a9a97...@dalibo.com > [2]: https://www.postgresql.org/message-id/zv46wtmjltuu2...@paquier.xyz > -- > Michael > Regards. -- Guillaume. From 95b300eeff0168f2618418102df660f1ba9b9113 Mon Sep 17 00:00:00 2001 From: Guillaume Lelarge Date: Mon, 7 Oct 2024 08:45:36 +0200 Subject: [PATCH v3 1/2] Introduce two new counters in EState They will be used by two other patchs to populate new columns in pg_stat_database and pg_statements. --- src/backend/executor/execUtils.c | 3 +++ src/backend/executor/nodeGather.c | 3 +++ src/backend/executor/nodeGatherMerge.c | 3 +++ src/include/nodes/execnodes.h | 3 +++ 4 files changed, 12 insertions(+) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 5737f9f4eb..1908481999 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -162,6 +162,9 @@ CreateExecutorState(void) estate->es_jit_flags = 0; estate->es_jit = NULL; + estate->es_parallelized_workers_launched = 0; + estate->es_parallelized_workers_planned = 0; + /* * Retur
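To make the planned/launched analysis above concrete, a lookup of the kind
Michael describes could look roughly like this, assuming the
pg_stat_statements columns end up with the names used in these patches
(parallel_workers_planned / parallel_workers_launched); treat it as a sketch,
not the final column names:

    -- Statements whose launched/planned worker ratio is unbalanced are the
    -- ones worth a closer look with EXPLAIN before touching any GUC.
    SELECT queryid,
           parallel_workers_planned  AS planned,
           parallel_workers_launched AS launched,
           round(parallel_workers_launched::numeric
                 / NULLIF(parallel_workers_planned, 0), 2) AS launch_ratio
    FROM pg_stat_statements
    WHERE parallel_workers_planned > 0
    ORDER BY launch_ratio;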
Re: Unlinking Parallel Hash Join inner batch files sooner
On Tue, May 14, 2024 at 02:56:37PM -0400, Robert Haas wrote:
> It doesn't appear to me that this got committed. On the procedural
> question, I would personally treat it as a non-back-patchable bug fix
> i.e. master-only but without regard to feature freeze. However, I can
> see arguments for either treating it as a back-patchable fix or for
> waiting until v18 development opens. What would you like to do?

I am not seeing anything committed, either. Thomas, an update?
-- 
Michael
Re: On disable_cost
On 03.10.2024 23:10, Laurenz Albe wrote:

On Thu, 2024-10-03 at 14:24 -0400, Robert Haas wrote:

On Thu, Oct 3, 2024 at 1:35 PM Alena Rybakina wrote:

I prepared a patch that includes the information we can add.

One general thing to think about is that we really document very little about
EXPLAIN. That might not be good, but we should consider whether it will look
strange if we document a bunch of stuff about this and still don't talk about
anything else. (This is not a comment on this specific patch, which I have
not examined. It's just a general thought.)

I think we should still add it because it might cause a lot of
misunderstanding.

The "EXPLAIN Basics" already mention "enable_seqscan", so I think it is
alright to expand on that a bit. Here is my take on a documentation patch
(assuming David's "Disabled: true" wording):

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index ff689b65245..db906841472 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -578,6 +578,28 @@ WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
    discussed below.

+
+Some plan node types cannot be completely disabled. For example, there is
+no other access method than a sequential scan for a table with no index.
+If you told the planner to disregard a certain node type, but it is forced
+to use it nonetheless, you will see the plan node marked as
+Disabled in the output of EXPLAIN:
+
+
+CREATE TABLE dummy (t text);
+
+SET enable_seqscan = off;
+
+EXPLAIN SELECT * FROM dummy;
+
+                       QUERY PLAN
+-----------------------------------------------------------
+ Seq Scan on dummy  (cost=0.00..23.60 rows=1360 width=32)
+   Disabled: true
+
+
+
+   subplan

Sorry for the late reply, I needed time to look into this feature to respond
to your email.

I think this is not entirely correct. I tested the last version of the patch
[0]: I created a table and disabled sequential scanning, so there were no
other options for the optimizer to scan table t1. It still displayed that it
has disabled nodes.

However you are right that this display will not appear for all nodes that
only contain a data collection procedure, such as Append, MergeAppend,
Gather, etc. And I agree with you that we should add information about it. I
also think it’s worth adding additional information that this option does not
appear in the postgres_fdw extension.

-- 
Regards,
Alena Rybakina
Postgres Professional
Re: Parallel workers stats in pg_stat_database
Hey, Le mer. 2 oct. 2024 à 11:12, Benoit Lobréau a écrit : > Hi, > > Thanks for your imput ! I will fix the doc as proposed and do the split > as soon as I have time. > > I've done the split, but I didn't go any further than that. Two patches attached: * v5-0001 adds the metrics (same patch than v3-0001 for pg_stat_statements) * v5-0002 handles the metrics for pg_stat_database. "make check" works, and I also did a few other tests without any issues. Regards. -- Guillaume. From 9413829616bdc0806970647c15d7d6bbd96489d1 Mon Sep 17 00:00:00 2001 From: Guillaume Lelarge Date: Mon, 7 Oct 2024 08:45:36 +0200 Subject: [PATCH v5 1/2] Introduce two new counters in EState They will be used by two other patchs to populate new columns in pg_stat_database and pg_statements. --- src/backend/executor/execUtils.c | 3 +++ src/backend/executor/nodeGather.c | 3 +++ src/backend/executor/nodeGatherMerge.c | 3 +++ src/include/nodes/execnodes.h | 3 +++ 4 files changed, 12 insertions(+) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 5737f9f4eb..1908481999 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -162,6 +162,9 @@ CreateExecutorState(void) estate->es_jit_flags = 0; estate->es_jit = NULL; + estate->es_parallelized_workers_launched = 0; + estate->es_parallelized_workers_planned = 0; + /* * Return the executor state structure */ diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c index 5d4ffe989c..0fb915175a 100644 --- a/src/backend/executor/nodeGather.c +++ b/src/backend/executor/nodeGather.c @@ -182,6 +182,9 @@ ExecGather(PlanState *pstate) /* We save # workers launched for the benefit of EXPLAIN */ node->nworkers_launched = pcxt->nworkers_launched; + estate->es_parallelized_workers_launched += pcxt->nworkers_launched; + estate->es_parallelized_workers_planned += pcxt->nworkers_to_launch; + /* Set up tuple queue readers to read the results. */ if (pcxt->nworkers_launched > 0) { diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c index 45f6017c29..149ab23d90 100644 --- a/src/backend/executor/nodeGatherMerge.c +++ b/src/backend/executor/nodeGatherMerge.c @@ -223,6 +223,9 @@ ExecGatherMerge(PlanState *pstate) /* We save # workers launched for the benefit of EXPLAIN */ node->nworkers_launched = pcxt->nworkers_launched; + estate->es_parallelized_workers_launched += pcxt->nworkers_launched; + estate->es_parallelized_workers_planned += pcxt->nworkers_to_launch; + /* Set up tuple queue readers to read the results. */ if (pcxt->nworkers_launched > 0) { diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index aab59d681c..f898590ece 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -708,6 +708,9 @@ typedef struct EState bool es_use_parallel_mode; /* can we use parallel workers? */ + int es_parallelized_workers_launched; + int es_parallelized_workers_planned; + /* The per-query shared memory area to use for parallel execution. 
*/ struct dsa_area *es_query_dsa; -- 2.46.2 From 873c03b99e1ccc05bac4c71b5f070e0dbe0bd779 Mon Sep 17 00:00:00 2001 From: benoit Date: Mon, 7 Oct 2024 10:12:46 +0200 Subject: [PATCH v5 2/2] Adds four parallel workers stat columns to pg_stat_database * parallel_workers_planned * parallel_workers_launched * parallel_maint_workers_planned * parallel_maint_workers_launched --- doc/src/sgml/monitoring.sgml | 36 +++ src/backend/access/brin/brin.c| 4 +++ src/backend/access/nbtree/nbtsort.c | 4 +++ src/backend/catalog/system_views.sql | 4 +++ src/backend/commands/vacuumparallel.c | 5 +++ src/backend/executor/execMain.c | 7 src/backend/utils/activity/pgstat_database.c | 36 +++ src/backend/utils/adt/pgstatfuncs.c | 12 +++ src/include/catalog/pg_proc.dat | 20 +++ src/include/pgstat.h | 7 src/test/regress/expected/rules.out | 4 +++ src/test/regress/expected/select_parallel.out | 11 ++ src/test/regress/expected/vacuum_parallel.out | 19 ++ src/test/regress/sql/select_parallel.sql | 8 + src/test/regress/sql/vacuum_parallel.sql | 11 ++ 15 files changed, 188 insertions(+) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 331315f8d3..9567ca5bd1 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3611,6 +3611,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage + + + parallel_workers_planned bigint + + + Number of parallel workers planned by queries on this database + + + + + + parallel_workers_launched
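With the columns added by the 0002 patch above, a per-database check would
look something like this (column names taken from the patch; the feature is
still under review, so they may change):

    SELECT datname,
           parallel_workers_planned,
           parallel_workers_launched,
           parallel_maint_workers_planned,
           parallel_maint_workers_launched
    FROM pg_stat_database
    WHERE datname IS NOT NULL;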
Re: Enable data checksums by default
> On 3 Oct 2024, at 23:13, Nathan Bossart wrote:
>
> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>> I have committed 0001 (the new option) and 0004 (the docs tweak). I think
>> there is consensus for the rest, too, but I'll leave it for a few more days
>> to think about. I guess the test failure has to be addressed.
>
> Here is a rebased patch with the test fix (for cfbot). I have made no
> other changes.

+Data checksums are enabled by default. They can be
+disabled by use of --no-data-checksums.

Nitpick, but I think this should be an xref, like how we link to --no-locale
in the -E docs, instead.

LGTM otherwise.

-- 
Daniel Gustafsson
Re: per backend I/O statistics
Hi,

On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote:
> On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:
> > On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
> >> As an additional benefit of this approach, the client can set a
> >> connection variable, for example, no_backend_iostats to true, or set
> >> its inverse variable to false, to restrict memory usage to only the
> >> required backends.
> >
> > Thanks for the feedback!
> >
> > If we were to add an on/off switch button, I think I'd vote for a global
> > one instead. Indeed, I see this feature more like an "Administrator" one,
> > where the administrator wants to be able to find out which session is
> > responsible of what (from an I/O point of view): like being able to
> > answer "which session is generating this massive amount of reads"?
> >
> > If we allow each session to disable the feature then the administrator
> > would lose this ability.
>
> Hmm, I've been studying this patch,

Thanks for looking at it!

> and I am not completely sure to
> agree with this feeling of using fixed-numbered stats, actually, after
> reading the whole and seeing the structure of the patch
> (FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
> know exactly the number of slots we need to know for the
> fixed-numbered stats as MaxBackends may change).

Right, that's a new way of dealing with "unknown" number of slots (and it has
cons as you mentioned in [1]).

> If we make these
> kind of stats variable-numbered, does it have to actually involve many
> creations or removals of the stats entries, though? One point is that
> the number of entries to know about is capped by max_connections,
> which is a PGC_POSTMASTER. That's the same kind of control as
> replication slots. So one approach would be to reuse entries in the
> dshash and use in the hashing key the number in the procarrays. If a
> new connection spawns and reuses a slot that was used in the past,
> then reset all the existing fields and assign its PID.

Yeah, like it's done currently with the "fixed-numbered" stats proposal. That
sounds reasonable to me, I'll look at this proposed approach and come back
with a new patch version, thanks!

> Another thing is the consistency of the data that we'd like to keep at
> shutdown. If the connections have a balanced amount of stats shared
> among them, doing decision-making based on them is kind of easy. But
> that may cause confusion if the activity is unbalanced across the
> sessions. We could also not flush them to disk as an option, but it
> still seems more useful to me to save this data across restarts if one
> takes frequent snapshots of the new system view reporting everything,
> so as it is possible to get an idea of the deltas across the snapshots
> for each connection slot.

The idea that has been implemented so far in this patch is that we still
maintain an aggregated version of the stats (visible through pg_stat_io) and
that only the aggregated stats are flushed/read to/from disk (means we don't
flush the per-backend stats). I think that it makes sense that way.

The way I see it is that the per-backend I/O stats is more for current
activity instrumentation. So it's not clear to me what would be the benefits
of restoring the per-backend stats at startup knowing that: 1) we restored
the aggregated stats and 2) the sessions that were responsible for the
restored stats are gone.

[1]: https://www.postgresql.org/message-id/Zuz5iQ4AjcuOMx_w%40paquier.xyz

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: ECPG cleanup and fix for clang compile-time problem
On Saturday, October 5, 2024, Tom Lane wrote: > > > Rebase needed after f22e84df1, so here's an update that rebases > up to HEAD and adds the missing "static". No other changes. > > (Anybody want to review this? I'm getting tired of rebasing it, > and we're missing out on the clang build time savings.) Sorry for the delay, I'll respond in a couple days.
Re: per backend I/O statistics
Hi,

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
> On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:
> > No problem at all! (I re-explained because I'm not always 100% sure that
> > my explanations are crystal clear ;-) )
>
> We've discussed a bit this patch offline, but after studying the patch
> I doubt that this part is a good idea:
>
> +   /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
> +   PgStatShared_IO io;
>  } PgStat_ShmemControl;
>
> We are going to be in trouble if we introduce a second member in this
> routine that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
> relies on the fact that all its members have deterministic offset
> positions in this structure.

Agree that it would be an issue should we have to add a new
FLEXIBLE_ARRAY_MEMBER.

> So this lacks flexibility. This choice
> is caused by the fact that we don't exactly know the number of
> backends because that's controlled by the PGC_POSTMASTER GUC
> max_connections so the size of the structure would be undefined.

Right.

> There is a parallel with replication slot statistics here, where we
> save the replication slot data in the dshash based on their index
> number in shmem. Hence, wouldn't it be better to do like replication
> slot stats, where we use the dshash and a key based on the procnum of
> each backend or auxiliary process (ProcNumber in procnumber.h)? If at
> restart max_connections is lower than what was previously used, we
> could discard entries that would not fit anymore into the charts.
> This is probably not something that happens often, so I'm not really
> worried about forcing the removal of these stats depending on how the
> upper-bound of ProcNumber evolves.

Yeah, I'll look at implementing the dshash based on their procnum and see
where it goes.

> So, using a new kind of ID and making this kind variable-numbered may
> ease the implementation quite a bit, while avoiding any extensibility
> issues with the shmem portion of the patch if these are
> fixed-numbered.

Agree.

> This would rely on the fact that we would use the ProcNumber for the
> dshash key, and this information is not provided in pg_stat_activity.
> Perhaps we should add this information in pg_stat_activity so as it
> would be easily possible to do joins with a SQL function that returns
> a SRF with all the stats associated with a given connection slot
> (auxiliary or backend process)?

I'm not sure that's needed. What has been done in the previous versions is to
get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like
GetNumberFromPGProc(BackendPidGetProc(pid)). Do you see an issue with that
approach?

> The active PIDs of the live sessions are not stored in the active
> stats, why not?

That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
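For illustration, the usage pattern Bertrand describes (joining on pid rather
than exposing ProcNumber) would look roughly like this, assuming the function
keeps the name and SRF shape of the in-progress patch, which may still
change:

    -- Per-backend I/O of the live client sessions; pg_stat_get_backend_io()
    -- is the patch's SQL function, so its exact signature is not final.
    SELECT a.pid, a.datname, a.usename, io.*
    FROM pg_stat_activity AS a,
         LATERAL pg_stat_get_backend_io(a.pid) AS io
    WHERE a.backend_type = 'client backend';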
Re: Commit fest 2024-09
> On 7 Oct 2024, at 09:47, Michael Paquier wrote: > The commit fest 2024-09 should have ended one week ago, so I have been > taking one step ahead and I have begun cleaning up things. Thanks for doing this, much appreciated! -- Daniel Gustafsson
Re: Use heap scan routines directly in vac_update_datfrozenxid()
On 2024-Oct-06, Tom Lane wrote:

> Soumyadeep Chakraborty writes:
> > Attached is a simple patch to directly use heap scan routines in
> > vac_update_datfrozenxid(), avoiding the multilayer overhead from the
> > sysscan infrastructure.

Though if there's anybody with a Postgres fork using catalog tables that
aren't heapam, then they aren't going to be happy with this change. (I
remember Tom commenting that Salesforce used to do that, I wonder if they
still do.)

> I would think the overhead of that is minuscule. If it isn't,
> we should try to make it so, not randomly hack up some of the callers
> to avoid it. The intention certainly was that it wouldn't cost
> anything compared to what happens within the actual table access.

I suspect the problem is not the systable layer per se, but the fact that it
has to go through the table AM interface. So by replacing systable with the
heap routines, it's actually _two_ layers of indirection that are being
skipped over.

systable seems indeed quite lean, or at least it was up to postgres 11 ...
but it's not clear to me that it continues to be. The table AM API is heavily
centered on slots, and I think having to build heap tuples from slots might
be slow.

I wonder if it would be better to add "systable_getnext_slot" which returns a
slot and omit the conversion to heaptuple. Callers for which it's significant
could skip that conversion by dealing with a slot instead. Converting just
one or two critical spots (such as vac_update_datfrozenxid, maybe
pg_publication.c) should be easy enough.

-- 
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)
Re: Add contrib/pg_logicalsnapinspect
Hi, On Wed, Sep 25, 2024 at 05:04:18PM -0700, Masahiko Sawada wrote: > I've reviewed v10 patch and here are some comments: Thanks for looking at it! > > +static void > +ValidateAndRestoreSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk, > + const char *path) > > This function and SnapBuildRestore() have duplicate codes. Can we have > a common function that reads the snapshot data from disk to > SnapBuildOnDisk, and have both ValidateAndRestoreSnapshotFile() and > SnapBuildRestore() call it? Right. I had in mind to remove the duplicate code while proposing the "in core" functions version of this patch, see [1]. Now that snapbuild_internal.h is there, I do not see any reason not to remove the duplicate code. That's done in v11 attached: ValidateAndRestoreSnapshotFile() has been modified a bit and can now be used in the new module and in SnapBuildRestore(). Bonus points, as compared to v10, it allows to: 1. move the SnapBuildRestoreContents() declaration back from snapbuild_internal.h to its original location (snapbuild.c) 2. same as above for the SnapBuildOnDiskConstantSize, SnapBuildOnDiskNotChecksummedSize, SNAPBUILD_MAGIC and SNAPBUILD_VERSION definitions 3. remove the changes in pg_crc32c.h as the extras "PGDLLIMPORT" are not needed anymore > --- > +CREATE FUNCTION pg_get_logical_snapshot_meta(IN in_lsn pg_lsn, > (snip) > +CREATE FUNCTION pg_get_logical_snapshot_info(IN in_lsn pg_lsn, > > Is there any reason why both functions take a pg_lsn value as an > argument? Given that the main usage would be to use these functions > with pg_ls_logicalsnapdir(), I think it would be easier for users if > these functions take a filename as a function argument. That way, we > can use these functions like: > > select info.* from pg_ls_logicalsnapdir(), > pg_get_logical_snapshot_info(name) as info; I think that makes sense. It also simplfies the tests and the examples, done that way in v11. > > If there are use cases where specifying a LSN is easier, I think we > would have both types. > > > +static const char *const SnapBuildStateDesc[] = { > +[SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start", > +[SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building", > +[SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full", > +[SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent", > +}; > > I think that it'd be better to have a dedicated function that returns > a string representation of the given state by using a switch > statement. That way, we don't need SNAPBUILD_STATE_INCR and a compiler > warning would help realize a missing state if a new state is > introduced in the future. Yeah, that sounds reasonable. Done in v11: the switch does not handle "default" so that [-Wswitch] would report a warning in case of enumeration value not handled in the switch (v11 keeps a remark on top of the SnapBuildState enum definition though). > It needs a function call but I believe it won't be a problem in this use case. Agree. [1]: https://www.postgresql.org/message-id/ZtGKi5FdW%2Bky%2B0fV%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 52fbead00e079132881ec51a3f913cd784ccd356 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Wed, 14 Aug 2024 08:46:05 + Subject: [PATCH v11] Add contrib/pg_logicalinspect Provides SQL functions that allow to inspect logical decoding components. 
It currently allows to inspect the contents of serialized logical snapshots of a running database cluster, which is useful for debugging or educational purposes. --- contrib/Makefile | 1 + contrib/meson.build | 1 + contrib/pg_logicalinspect/.gitignore | 4 + contrib/pg_logicalinspect/Makefile| 31 ++ .../expected/logical_inspect.out | 52 contrib/pg_logicalinspect/logicalinspect.conf | 1 + contrib/pg_logicalinspect/meson.build | 39 +++ .../pg_logicalinspect--1.0.sql| 43 +++ contrib/pg_logicalinspect/pg_logicalinspect.c | 199 + .../pg_logicalinspect.control | 5 + .../specs/logical_inspect.spec| 34 +++ doc/src/sgml/contrib.sgml | 1 + doc/src/sgml/filelist.sgml| 1 + doc/src/sgml/pglogicalinspect.sgml| 150 ++ src/backend/replication/logical/snapbuild.c | 279 -- src/include/replication/snapbuild.h | 6 +- src/include/replication/snapbuild_internal.h | 197 + 17 files changed, 823 insertions(+), 221 deletions(-) 7.5% contrib/pg_logicalinspect/expected/ 5.2% contrib/pg_logicalinspect/specs/ 27.9% contrib/pg_logicalinspect/ 14.2% doc/src/sgml/ 25.0% src/backend/replication/logical/ 19.7% src/include/replication/ diff
Re: System views for versions reporting
> On Sun, Oct 06, 2024 at 12:01:29PM GMT, Joe Conway wrote:
> On 10/6/24 11:36, Dmitry Dolgov wrote:
> > Hi,
> >
> > Based on the feedback in [1], here is my attempt at implementing system
> > views for versions reporting. It adds pg_system_versions for showing
> > things like core version, compiler, LLVM, etc, and pg_system_libraries
> > for showing linked shared objects. I think everyone has agreed that the
> > first was a good idea, where the second was only suggested -- after some
> > thinking I find shared objects useful enough to include here as well.
> >
> > The main idea is to facilitate bug reporting. In particular, in many JIT
> > related bug reports one of the first questions is often "which LLVM
> > version is used?". Gathering such information is a manual process,
> > mistakes could be made when verifying llvm-config output or anywhere
> > else. Having a system view for such purposes makes the process a bit
> > more robust.
> >
> > The first three patches are essential for this purpose, the fourth one
> > is somewhat independent and could be considered in isolation. The output
> > looks like this:
> >
> > =# select * from pg_system_versions;
> >    name   |   version    |     type
> > ----------+--------------+--------------
> >  Arch     | x86_64-linux | Compile Time
> >  ICU      | 15.1         | Compile Time
> >  Core     | 18devel      | Compile Time
> >  Compiler | gcc-14.0.1   | Compile Time
> >  LLVM     | 18.1.6       | Run Time
>
> I'm not sure why ICU is "Compile Time" rather than "Run Time" when it is
> not statically linked.

It reports U_UNICODE_VERSION at compile time. It's not necessarily correct
though, I can try to replace it with the runtime version. I think there was
some ICU functionality (something like u_getUnicodeVersion), which is maybe a
better fit.

> Also, if we are going to include ICU here, shouldn't we
> also include libc version?

Yeah, why not. One of my goals here is to identify a balanced set of useful
versions to report.

> > =# select * from pg_system_libraries;
> >         name
> > ---------------------
> >  /lib64/libkrb5.so.3
> >  /lib64/libz.so.1
> >  linux-vdso.so.1
> >  /lib64/libxml2.so.2
> >  [...]
> >
> > Any feedback is appreciated.
>
> I think it would be nice to include a sha256 hash (or something similar)
> of the libraries as well, so that they can be checked against known good
> values.

I was thinking about getting more info to show in this view, but haven't
found any reasonable way to achieve that. So I would agree with Tom on that.
Re: Make default subscription streaming option as Parallel
On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Vignesh,
>
> > One key point to consider is that the lock on transaction objects will
> > be held for a longer duration when using streaming in parallel. This
> > occurs because the parallel apply worker initiates the transaction as
> > soon as streaming begins, maintaining the lock until the transaction
> > is fully completed. As a result, for long-running transactions, this
> > extended lock can hinder concurrent access that requires a lock.
>
> So, the current default is the most conservative and can run anywhere; your
> proposal is a bit optimistic, right?

Yes, that is correct.

> One comment for your patch;
> Shouldn't we add a streaming=off case for pg_dump test? You added lines but
> no one passes it.

I have updated the existing pg_dump tests to cover the streaming options; the
attached patch has the changes for the same.

Regards,
Vignesh

Attachment: v2-0001-Make-default-value-for-susbcription-streaming-opt.patch
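For context, the behavior the patch would make the default is what can
already be requested explicitly today (subscription and publication names
below are made up):

    -- 'streaming = parallel' applies large in-progress transactions via
    -- parallel apply workers instead of spooling them to files.
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION mypub
        WITH (streaming = parallel);

    -- Or switch an existing subscription:
    ALTER SUBSCRIPTION mysub SET (streaming = parallel);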
Re: Add new COPY option REJECT_LIMIT
Thanks for your review! On Thu, Oct 3, 2024 at 4:27 PM jian he wrote: mentioning maxerror is a bigint type or explicitly mentioning the maximum allowed value of "maxerror" would be great. Added a description that it allows positive bigint. On Thu, Oct 3, 2024 at 11:43 PM Fujii Masao wrote: + if (opts_out->reject_limit && !opts_out->on_error) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + /*- translator: first and second %s are the names of COPY +* option, e.g. ON_ERROR, third is the value of the COPY option, +* e.g. IGNORE */ +errmsg("COPY %s requires %s to be set to %s", +"REJECT_LIMIT", "ON_ERROR", "IGNORE"))); This check might be better moved to other place, for example at the bottom of ProcessCopyOptions(). I noticed a comment in the code: "Check for incompatible options (must do these two before inserting defaults)." You added your condition after this comment and before inserting the defaults, but it's not related to that process. So, moving this check to other place seems more appropriate. Agreed. Moved it to the bottom of ProcessCopyOptions(). While reviewing, I also noticed that the check for "opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP" is similarly placed before setting the defaults, which might not be correct. This check should probably be moved as well. Additionally, the comment mentioning "must do these two" should be updated to "must do these three." These changes should be handled in a separate patch. Agreed and attached 0002 patch. + if (cstate->opts.reject_limit && Wouldn't it be clearer to use cstate->opts.reject_limit > 0 for better readability? Agreed. + cstate->num_errors > cstate->opts.reject_limit) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), +errmsg("skipped more than REJECT_LIMIT rows: \"%lld\",", + (long long) cstate->opts.reject_limit))); To make the error message clearer, similar to the NOTICE about skipped rows, how about rewording it to: "skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility"? Agreed. Also considering when REJECT_LIMIT is specified to 1, attached patch uses errmsg_plural() instead of errmsg. -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 360e170b8aee378c0f498d8899d9f3b5d41d0310 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 7 Oct 2024 19:57:51 +0900 Subject: [PATCH v7 1/2] Subject: Add REJECT_LIMIT option to COPY command. 9e2d870 enabled the COPY FROM command to skip soft errors, but there was no limitation about the number of errors and skipped all the soft errors. In some cases there would be a limitation how many errors are tolerable. This patch introduces REJECT_LIMIT to specify how many errors can be skipped. --- doc/src/sgml/ref/copy.sgml | 19 + src/backend/commands/copy.c | 33 + src/backend/commands/copyfrom.c | 9 src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 10 + src/test/regress/sql/copy2.sql | 21 ++ 6 files changed, 93 insertions(+) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index b9413d4892..59a17d7793 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * } FORCE_NULL { ( column_name [, ...] 
) | * } ON_ERROR error_action +REJECT_LIMIT maxerror ENCODING 'encoding_name' LOG_VERBOSITY verbosity @@ -413,6 +414,24 @@ COPY { table_name [ ( + +REJECT_LIMIT + + + Specifies the maximum number of errors tolerated while converting a + column's input value to its data type, when ON_ERROR is + set to ignore. + If the input causes more errors than the specified value, the COPY + command fails, even with ON_ERROR set to ignore. + This clause must be used with ON_ERROR=ignore + and maxerror must be positive bigint. + If not specified, ON_ERROR=ignore + allows an unlimited number of errors, meaning COPY will + skip all erroneous data. + + + + ENCODING diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 03eb7a4eba..befab92074 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -418,6 +418,23 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* keep compiler quiet */ } +/* + * Extract REJECT_LIMIT value from a Def
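Putting the documented syntax from the patch above together, usage would look
like this (hypothetical table and file names; REJECT_LIMIT only exists with
this patch applied):

    -- Skip rows that fail data type conversion, but abort the COPY once more
    -- than 10 rows have been skipped.
    COPY measurements FROM '/tmp/measurements.csv'
        WITH (FORMAT csv, ON_ERROR ignore, REJECT_LIMIT 10);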
Re: Doc: typo in config.sgml
On Tue, 1 Oct 2024 22:20:55 +0900 Yugo Nagata wrote: > On Tue, 1 Oct 2024 15:16:52 +0900 > Yugo NAGATA wrote: > > > On Tue, 01 Oct 2024 10:33:50 +0900 (JST) > > Tatsuo Ishii wrote: > > > > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in > > > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code > > > >> point in Unicode. i.e. U+00A0). > > > >> So grep -P "[\xC2\xA0]" should work to detect nbsp. > > > > > > > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment. > > > > ([ and ] were not necessary.) > > > > > > > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters > > > > in charset.sgml, > > > > but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for > > > > making sure detecting > > > > nbsp. > > > > > > > > One problem is that -P option can be used in only GNU grep, and grep in > > > > mac doesn't support it. > > > > > > > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can > > > > assume the shell is bash. > > > > > > > > Maybe, better way is use perl itself rather than grep as following. > > > > > > > > `perl -ne '/\xC2\xA0/ and print' ` > > > > > > > > I attached a patch fixed in this way. > > > > > > GNU sed can also be used without setting LC_ALL: > > > > > > sed -n /"\xC2\xA0"/p > > > > > > However I am not sure if non-GNU sed can do this too... > > > > Although I've not check it myself, BSD sed doesn't support \x escape > > according to [1]. > > > > [1] > > https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference > > > > By the way, I've attached a patch a bit modified to use the plural form > > statement > > as same as check-tabs. > > > > Non-breaking **spaces** appear in SGML/XML files > > The previous patch was broken because the perl command failed to return the > correct result. > I've attached an updated patch to fix the return value. In passing, I added > line breaks > for long lines. I've attached a updated patch. I added the comment to explain why Perl is used instead of grep or sed. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile index 9c9bbfe375..65ed32cd0a 100644 --- a/doc/src/sgml/Makefile +++ b/doc/src/sgml/Makefile @@ -194,7 +194,7 @@ MAKEINFO = makeinfo ## # Quick syntax check without style processing -check: postgres.sgml $(ALLSGML) check-tabs +check: postgres.sgml $(ALLSGML) check-tabs check-nbsp $(XMLLINT) $(XMLINCLUDE) --noout --valid $< @@ -257,7 +257,15 @@ endif # sqlmansectnum != 7 # tabs are harmless, but it is best to avoid them in SGML files check-tabs: - @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1) + @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \ + (echo "Tabs appear in SGML/XML files" 1>&2; exit 1) + +# Non-breaking spaces are harmless, but it is best to avoid them in SGML files. +# Use perl command because non-GNU grep or sed could not have hex escape sequence. +check-nbsp: + @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \ + $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \ + (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1) ## ## Clean
Re: Psql meta-command conninfo+
On Sun, Oct 6, 2024 at 11:17 PM Hunaid Sohail wrote:
>
> PQpass - no need
>

I would include this as presence/absence. I concur on all of the rest.

> For PQparameterStatus, some parameters are already used.
> server_version and application_name were already discussed and removed in
> v12 and v29 respectively. Do we need other parameters?
>

Ok, I'll need to go read the reasoning for why they are deemed unneeded and
form an opinion one way or the other.

>> Within that framework having \conninfo[+[CSE][…]] be the command -
>> printing out only the table specified would be the behavior (specifying no
>> suffix letters prints all three) - would be an option.
>
> 3 separate tables without suffix?
>

Yes, the tables need headers specific to their categories. I do like the idea
of having 4 though, placing settings into their own. Premised on having all
or most of the available parameters being on the table. If it only ends up
being a few of them then keeping those in the status table makes sense.

David J.
Re: Converting tab-complete.c's else-if chain to a switch
Thomas Munro writes: > On Mon, Oct 7, 2024 at 12:11 PM Tom Lane wrote: >> Hmm, I should think that if you break anything in tab-complete.in.c, >> the breakage would propagate to tab-complete.c without difficulty. >> Do you have an example of something that the preprocessor would mask? > Fair point, and nope. Thought withdrawn. Pushed, then. The original motivation for all this was to try to get tab completion working in MSVC builds. I'm not planning to do anything more towards that myself, but maybe somebody else would like to have a go at it. regards, tom lane
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 12:02 PM Tom Lane wrote: > Peter Geoghegan writes: > > To be clear, I don't think that it's essential that we have equivalent > > behavior in those cases where the patch applies its transformations. I > > have no objections to committing the patch without any handling for > > that. > > Oy. I don't agree with that *at all*. An "optimization" that changes > query semantics is going to be widely seen as a bug. I think that you must have misinterpreted what I meant by "equivalent behavior". The context was important. I really meant: "Ideally, the patch's transformations would produce an equivalent execution strategy to what we already get in when IN() is used directly, *even in the presence of constants of mixed though related types*. Ideally, the final patch would somehow be able to generate a SAOP with one array of the same common type in cases where an analogous IN() query can do the same. But I'm not going to insist on adding something for that now." Importantly, I meant equivalent outcome in terms of execution strategy, across similar queries where the patch sometimes succeeds in generating a SAOP, and sometimes fails -- I wasn't trying to say anything about query semantics. This wasn't intended to be a rigorous argument (if it was then I'd have explained why my detailed and rigorous proposal didn't break query semantics). -- Peter Geoghegan
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 12:02 PM Tom Lane wrote: > Peter Geoghegan writes: > > To be clear, I don't think that it's essential that we have equivalent > > behavior in those cases where the patch applies its transformations. I > > have no objections to committing the patch without any handling for > > that. > > Oy. I don't agree with that *at all*. An "optimization" that changes > query semantics is going to be widely seen as a bug. I think everyone agrees on that. The issue is that I don't know how to implement the optimization Peter wants without changing the query semantics, and it seems like Alexander doesn't either. By committing the patch without that optimization, we're *avoiding* changing the query semantics. -- Robert Haas EDB: http://www.enterprisedb.com
Re: On disable_cost
On Mon, 2024-10-07 at 11:14 -0400, Robert Haas wrote: > I accept that my commit created this problem and I'm > certainly willing to be involved too if we need to sort out more > things. Thank you. I think it is great that disabled nodes are now handled better, so I appreciate the change as such. But I had to focus on the one fly in the ointment; you know how it is... Yours, Laurenz Albe
Re: Should rolpassword be toastable?
Committed with the limit set to 512 bytes. We have plenty of time to adjust this limit as needed before it takes effect in v18. -- nathan
Re: POC, WIP: OR-clause support for indexes
Peter Geoghegan writes: > To be clear, I don't think that it's essential that we have equivalent > behavior in those cases where the patch applies its transformations. I > have no objections to committing the patch without any handling for > that. Oy. I don't agree with that *at all*. An "optimization" that changes query semantics is going to be widely seen as a bug. regards, tom lane
Re: On disable_cost
On Mon, 2024-10-07 at 10:17 +0300, Alena Rybakina wrote: > > diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml > > index ff689b65245..db906841472 100644 > > --- a/doc/src/sgml/perform.sgml > > +++ b/doc/src/sgml/perform.sgml > > @@ -578,6 +578,28 @@ WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; > > discussed below. > > > > > > + > > +Some plan node types cannot be completely disabled. For example, > > there is > > +no other access method than a sequential scan for a table with no > > index. > > +If you told the planner to disregard a certain node type, but it is > > forced > > +to use it nonetheless, you will see the plan node marked as > > +Disabled in the output of EXPLAIN: > > + > > + > > +CREATE TABLE dummy (t text); > > + > > +SET enable_seqscan = off; > > + > > +EXPLAIN SELECT * FROM dummy; > > + > > +QUERY PLAN > > +-- > > + Seq Scan on dummy (cost=0.00..23.60 rows=1360 width=32) > > + Disabled: true > > + > > + > > + > > + > > > > > > subplan > > I think this is not entirely correct. I tested last version of the > patch [0]: I created a table and disabled sequential scanning, so > there were no other options for optimizer to scan table t1. it still > displayed that it has disabled nodes. Isn't that exactly what my doc patch shows? > However you are right that this display will not appear for all > nodes that only contain a data collection procedure, such as Append, > MergeAppend, Gather, etc. And I agree with you that we should > information about it. I also think it’s worth adding additional > information that this option does not appear in the postgres_fdw > extension. I cannot quite follow that either... Yours, Laurenz Albe
Re: bt Scankey in another contradictory case
On Fri, Aug 30, 2024 at 10:32 AM Peter Geoghegan wrote: > It doesn't make a huge difference in practice, because we'll still end > the scan once the leaf level is reached. But it matters more when > array keys are involved, where there might be more than one descent to > the leaf level. Plus we might as well just be thorough about this > stuff. Was your "explain (analyze, buffers) select * from c_s where x >4000 and y >10 and y <10 order by x desc" example intended to illustrate that these earlier remarks of mine about the problem not being so bad aren't always correct? With your patch, we can detect contradictory quals regardless of their required-ness. There will be some cases where (with your patch) we'll now avoid a very inefficient full index scan -- contrary to what I said about it back on August 30. If that is what you meant, then I accept your argument. I didn't quite get your point before, but this is a logical, useful argument. (You didn't really need to convince me, but this argument still helps your patch.) -- Peter Geoghegan
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 12:02 PM Tom Lane wrote: > Oy. I don't agree with that *at all*. An "optimization" that changes > query semantics is going to be widely seen as a bug. I don't believe that I said otherwise? It's just rather unclear what query semantics really mean here, in detail. At least to me. But it's obvious that (for example) it would not be acceptable if a cast were to visibly fail, where that hadn't happened before. -- Peter Geoghegan
Re: On disable_cost
On 2024-Oct-03, Robert Haas wrote: > One general thing to think about is that we really document very > little about EXPLAIN. That might not be good, but we should consider > whether it will look strange if we document a bunch of stuff about > this and still don't talk about anything else. I completely agree that we document very little about EXPLAIN. However, I disagree that we should continue to do so. I'd rather take the opportunity to _add_ more details that we currently omit, and make the documentation more complete. A short blurb about Disabled Nodes such as the one Laurenz proposed seems an excellent way to start; we can add more later, as people propose them. We don't have to stop here, and we don't have to stay at statu quo re. other points. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
Re: On disable_cost
On Mon, Oct 7, 2024 at 11:28 AM Alvaro Herrera wrote: > On 2024-Oct-03, Robert Haas wrote: > > One general thing to think about is that we really document very > > little about EXPLAIN. That might not be good, but we should consider > > whether it will look strange if we document a bunch of stuff about > > this and still don't talk about anything else. > > I completely agree that we document very little about EXPLAIN. However, > I disagree that we should continue to do so. I'd rather take the > opportunity to _add_ more details that we currently omit, and make the > documentation more complete. A short blurb about Disabled Nodes such as > the one Laurenz proposed seems an excellent way to start; we can add > more later, as people propose them. We don't have to stop here, and we > don't have to stay at statu quo re. other points. Sure, that all makes sense. I was just raising it as a point to consider. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Re: bt Scankey in another contradictory case
On Sun, Sep 1, 2024 at 11:44 AM bigbro...@hotmail.com wrote: > I have reanalysed the code of function _bt_first. I notice that using a > multi-attribute index > if we can't identify the starting boundaries and the following attributes > markded not required , > that means we need start at first or last page in the index to examine every > tuple to satisfy the > qual or not, in the meantime the scan will be stopped while the first > attribute evaluated failed. I'm not sure of what it is that you're trying to draw attention to. The behavior of _bt_first doesn't seem relevant to this patch to me. For one thing, _bt_first doesn't actually care about whether or not a scan key is marked required -- _bt_first independently applies its own similar rules to determine which scan keys can be used in the insertion scan key used to find an initial position on the leaf level. More importantly, whether a scan key is marked required doesn't seem relevant to this patch (that is relevant to _bt_preprocess_keys, but doesn't seem relevant to the changes that you propose to make to _bt_preprocess_keys). > For instance: > create table c_s( x int, y int); > insert into c_s select generate_series(1, 2), generate_series(1, > 2); > create index my_c_s_idx on c_s using btree(x,y); > explain (analyze, buffers) select * from c_s where x >4000 and y >10 and > y <10 order by x desc; Your patch (which I haven't tested for myself) is based on the observation that "y > 10 and y < 10" is a contradictory condition. This is true regardless of any other factor, such as whether we're able to mark the "y" scan key required. All that _bt_first has to do is return when it notices "so->qual_ok" was set to false within _bt_preprocess_keys. It doesn't matter if there is an "x > 4000", and it doesn't matter if you use a composite index like this that completely omits a condition on "x". > What's more, if i change the number 4000 to 1000. > - > Sort (cost=441.01..441.01 rows=1 width=8) (actual time=2.974..2.975 rows=0 > loops=1) >Sort Key: x DESC >Sort Method: quicksort Memory: 25kB >Buffers: shared hit=89 >-> Seq Scan on c_s (cost=0.00..441.00 rows=1 width=8) (actual > time=2.971..2.971 rows=0 loops=1) > Filter: ((x > 1000) AND (y > 10) AND (y < 10)) > Rows Removed by Filter: 2 > Buffers: shared hit=89 > Planning: >Buffers: shared hit=2 > Planning Time: 0.113 ms > Execution Time: 2.990 ms > (12 rows) > > The planner choosed the Seq Scan, and the executor have done the unnecessary > jobs 2 times. It's true that a sequential scan won't ever apply the logic from _bt_preprocess_keys, nor any similar logic. A sequential scan with contradictory quals will therefore not be detected in cases where it would have been detected had we used an index scan with the same quals. This inconsistency doesn't make much sense; it's just an implementation deficiency. It's historical. We've talked about moving the current _bt_preprocess_keys logic to plan time before. See Tom's remarks at the end of this email: https://www.postgresql.org/message-id/2587523.1647982...@sss.pgh.pa.us I think that it would be possible to move _bt_preprocess_keys to plan time, and to generalize it to work with sequential scans, but that is a much larger project. It seems out of scope. I think that you should just focus on the immediate problem of not detecting contradictory quals that happen to involve (> or >=) and (< or <=) operators. It's a completely independent problem, and one that is well scoped. -- Peter Geoghegan
Re: Proposal to Enable/Disable Index using ALTER INDEX
On Mon, Sep 23, 2024 at 11:14 AM Peter Eisentraut wrote: > I think a better approach would be to make the list of disabled indexes > a GUC setting, which would then internally have an effect similar to > enable_indexscan, meaning it would make the listed indexes unattractive > to the planner. > > This seems better than the proposed DDL command, because you'd be able > to use this per-session, instead of forcing a global state, and even > unprivileged users could use it. > > (I think we have had proposals like this before, but I can't find the > discussion I'm thinking of right now.) I feel like a given user could want either one of these things. If you've discovered that a certain index is causing your production application to pick the wrong index, disabling it and thereby affecting all backends is what you want. If you're trying to experiment with different query plans without changing anything for other backends, being able to set some session-local state is better. I don't understand the argument that one of these is categorically better than the other. -- Robert Haas EDB: http://www.enterprisedb.com
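For illustration, the two approaches being compared would look roughly like this; both are proposed syntax only, and the GUC name below is made up for the sake of the example:

    -- proposed DDL: mark an index as disabled for every session
    ALTER INDEX my_index DISABLE;

    -- GUC-style alternative: only affects the current session
    SET disabled_indexes = 'my_index';
    EXPLAIN SELECT * FROM my_table WHERE id = 42;
    RESET disabled_indexes;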
Re: long-standing data loss bug in initial sync of logical replication
On Fri, 4 Oct 2024 at 12:52, Shlok Kyal wrote: > > Hi Kuroda-san, > > Thanks for reviewing the patch. > > > > 1. > > I feel the name of SnapBuildDistributeNewCatalogSnapshot() should be > > updated because it > > distributes two objects: catalog snapshot and invalidation messages. Do you > > have good one > > in your mind? I considered > > "SnapBuildDistributeNewCatalogSnapshotAndInValidations" or > > "SnapBuildDistributeItems" but seems not good :-(. > > I have renamed the function to 'SnapBuildDistributeSnapshotAndInval'. > Thoughts? > > > 2. > > Hmm, still, it is overengineering for me to add a new type of invalidation > > message > > only for the publication. According to the ExecRenameStmt() we can > > implement an > > arbitrary rename function like RenameConstraint() and RenameDatabase(). > > Regaring the ALTER PUBLICATION OWNER TO, I feel adding > > CacheInvalidateRelcacheAll() > > and InvalidatePublicationRels() is enough. > > I agree with you. > > > > > I attached a PoC which implements above. It could pass tests on my env. > > Could you > > please see it tell me how you think? > > I have tested the POC and it is working as expected. The changes look > fine to me. I have created a patch for the same. > Currently, we are passing 'PUBLICATION_PART_ALL' as an argument to > function 'GetPublicationRelations' and > 'GetAllSchemaPublicationRelations'. Need to check if we can use > 'PUBLICATION_PART_ROOT' or 'PUBLICATION_PART_LEAF' depending on the > 'publish_via_partition_root' option. Will test and address this in the > next version of the patch. For now, I have added a TODO. I have tested this part. I observed that ,whenever we insert data in a partition table, the function 'get_rel_sync_entry' is called and a hash entry is created for the corresponding leaf node relid. So I feel while invalidating here we can specify 'PUBLICATION_PART_LEAF' . I have made the corresponding changes 0002 patch. I have also modified the tests in 0001 patch. These changes are only related to syntax of writing tests. Thanks and Regards, Shlok Kyal v13-0002-Selective-Invalidation-of-Cache.patch Description: Binary data v13-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch Description: Binary data
Re: Enhance create subscription reference manual
On Thu, 03 Oct 2024 12:23:34 +0900 (JST) Tatsuo Ishii wrote: > > parameter in this case (it is an "optional" parameter, though). However, > > when we refer to the stored catalog value, we should call it an option or > > a property and calling it parameter is not suitable. > > Not sure. The stored catalog value of a subscription can be changed > ALTER SUBSCRIPTION. In the ALTER SUBSCRIPTION manual, the placeholders > for these properties are "parameter". So I think we should use > "parameter" in this case at least for the stored catalog values of > subscriptions. > > > If so, I feel that "the failover" in the following statement means > > the catalog value (or the failover feature itself), so we should not > > rewrite this to "the failover parameter". > > My conclusion is we should rewrite it as "the failover parameter" for > the reason above. > > >> To initiate replication, you must manually create the replication slot, > >> enable the failover if required, enable the subscription, and refresh the > >> subscription. > > > > Instead, should we use "failover option"? > > Yes. because "enable the failover" actually means an operation using > ALTER SUBSCRIPTION IMO. After reading the above, I think you would prefer "failover parameter" to "failover option". However, after all, I'm fine with either any way. If we use "the failover parameter", I would read "enable the failover parameter" as "enable the failover parameter on executing ALTER SUBSCRIPTION command". Otherwise in the "failover option" case, I would read "enable the failover option" as "enable the subscription's failover option by executing ALTER SUBSCRIPTION command". Regards, Yugo Nagata > > > Or, if it would mean to the failover > > feature rather than the parameter, is it not proper to add tag to > > this > > "failover"? > > I don't think so. > > Best reagards, > -- > Tatsuo Ishii > SRA OSS K.K. > English: http://www.sraoss.co.jp/index_en/ > Japanese:http://www.sraoss.co.jp -- Yugo NAGATA
Re: Set query_id for query contained in utility statement
On Mon, Oct 7, 2024 at 1:39 PM Michael Paquier wrote: > > On Fri, Oct 04, 2024 at 08:16:00PM +0800, jian he wrote: > > about v5 0001 > > select_with_parens: > > '(' select_no_parens ')' { $$ = $2; } > > | '(' select_with_parens ')' { $$ = $2; } > > ; > > > > > > toplevel | calls | query > > --+---+--- > > t| 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t > > t| 0 | SELECT toplevel, calls, query FROM pg_stat_statements+ > > | | ORDER BY query COLLATE "C", toplevel > > t| 1 | explain (select $1) > > f| 1 | select $1); > > > > query "select $1);" looks weird. not sure how to improve it, > > or this should be the expected behavior? > > Good point, this is confusing. The point is that having only > stmt_location is not enough to detect where the element in the query > you want to track is because it only points at its start location in > the full query string. In an ideal world, what we should have is its > start and end, pass it down to pgss_store(), and store only this > subquery between the start and end positions in the stats entry. > Making that right through the parser may be challenging, though. > It turns out UPDATE/DELETE/MERGE and other utility statements cannot have arbitrary parentheses with EXPLAIN. The attached patches can solve this specific problem. (based on v5-0001-Track-location-to-extract-relevant-part-in-nested.patch) The main gotcha is to add location information for the statement that is being explained. typedef struct ExplainStmt { NodeTag type; Node *query; /* the query (see comments above) */ List *options; /* list of DefElem nodes */ ParseLoc location; /* location of the statement being explained */ } ExplainStmt; explain select 1; explain (select 1); explain (((select 1))); The above three explained select queries will be normalized to one select query. v5-0001-exposse_explained_stmt_location.no-cfbot Description: Binary data
Re: [BUG FIX]Connection fails with whitespace after keepalives parameter value
On 2024/10/06 18:35, Michael Paquier wrote: On Thu, Oct 03, 2024 at 08:12:28PM -0400, Tom Lane wrote: OK, if there's no objections let's push both remaining patches to HEAD only. Done as of f22e84df1dea and 430ce189fc45. Commit 430ce189fc45 unexpectedly caused psql to report the error "error: trailing data found" when a connection URI contains a whitespace, e.g., in a parameter value. For example, the following command used to work but no longer does after this commit: $ psql -d "postgresql://localhost:5432/postgres?application_name=a b" I'm not sure if this URI format is valid (according to RFC 3986), though. + for (const char *s = q; *s == ' '; s++) + { + q++; + continue; + } Is the "continue" really necessary? Also could we simplify it like this? for (; *q == ' '; q++); Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
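To spell out the two equivalent forms being discussed (a sketch only, with the surrounding parsing context omitted):

    /* committed form: s and q advance in lockstep over any spaces */
    for (const char *s = q; *s == ' '; s++)
    {
        q++;
        continue;
    }

    /* suggested simplification: advance q past the spaces directly */
    for (; *q == ' '; q++)
        ;

Both loops leave q pointing at the first non-space character.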
[WIP]Vertical Clustered Index (columnar store extension) - take2
Hi All,

Suggestions
===========
When analyzing real-time data collected by PostgreSQL, it can be difficult to tune the current PostgreSQL server for satisfactory performance. Therefore, we propose Vertical Clustered Index (VCI), an in-memory column store feature that holds data in a state suitable for business analysis and is also expected to improve analysis performance. With VCI, you can expect to run analysis 7.8 times faster. This is achieved by the analytics engine, which optimizes parallel processing of column-oriented data, in addition to the fact that VCI stores data in a columnar format, enabling efficient retrieval of only the columns needed for analysis.

Similar Features
================
One column store feature available with Postgres is the Citus columnar table: installing the citus extension allows columnar tables to be used through the columnar access method. That feature is intended for analyzing accumulated data, so you cannot update or delete data. VCI supports data updates and deletions, which enables you to analyze not only accumulated data but also data that arrives in real time.

Implementing VCI
================
To introduce an updatable column store, we explain how updates to row-oriented data are propagated to column-oriented data. VCI has two storage areas:

- Write Optimized Storage (WOS)
- Read Optimized Storage (ROS)

WOS: the WOS stores data for all columns in the VCI in a row-oriented format. All newly added data is stored in the WOS relation along with the transaction information. Using the WOS to delete and update newly added data has no significant performance impact compared to deleting from columnar storage. When inserting/updating/deleting, data is written synchronously to the WOS. The WOS does not compress or index the data, which avoids the overhead of converting to columnar format while updating the data.

ROS: the ROS is the storage area where all column data is stored. After a certain amount of data accumulates in the WOS, the ROS control daemon converts it to column data asynchronously with respect to updates. The column data transformation compresses and indexes the data and writes it to the ROS.

Searching for data: since there are two storage formats, the SELECT process needs to convert the WOS data to a local ROS to determine whether each tuple is visible or invisible. This conversion cost depends on the number of tuples present in the WOS, which may introduce some performance overhead. Search results are obtained by referencing the local ROS and the ROS in parallel.

These implementation ideas are also posted on Fujitsu's blog for your reference. [1]

Past discussions
================
We've proposed this feature before. [2] That thread also explains the details, so please check it. In the previous thread, we suggested implementing the feature as a modification to the PostgreSQL backend code. Based on the feedback we received at that time, we think we need to re-implement this feature as pluggable storage using the table access method API. I also received feedback on the required features from a PostgreSQLPro member. We believe it is necessary to deal with these issues in stages:

- Need to provide vector processing for nodes (filter, grand aggregate, aggregation with group by...) to speed up computation
- Requires parallel processing support such as scanning

It is assumed that the re-implementation will also require additional functionality in the current table access method API, which would be useful not only for VCI but also for other access methods. Therefore, we decided to propose the VCI feature to the community and proceed with development.
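As a purely hypothetical sketch of how such an extension might look to users once re-implemented through the table access method API (neither the extension name nor the syntax below exists today; it only illustrates the pluggable-storage route described above):

    CREATE EXTENSION vci;   -- hypothetical extension providing a table access method
    CREATE TABLE sales (id bigint, region text, amount numeric) USING vci;
    INSERT INTO sales VALUES (1, 'apac', 100.0);             -- lands in the row-oriented WOS first
    SELECT region, sum(amount) FROM sales GROUP BY region;   -- reads the ROS plus converted WOS data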
Request
=======
Are members of pgsql-hackers interested in the VCI feature? We welcome your comments and suggestions on this feature. In particular, if you have any questions, required features, or implementation concerns, please let me know.

[1] https://www.postgresql.fastware.com/blog/improve-data-analysis-performance-without-impacting-business-transactions-with-vertical-clustered-index
[2] https://www.postgresql.org/message-id/cajrrpgfac7wc9nk6ptty6yn-nn+hcy8xolah2doyhvg5d6h...@mail.gmail.com

Regards, Aya Iwata FUJITSU LIMITED
Re: On disable_cost
On Sat, Oct 5, 2024 at 3:35 PM Tom Lane wrote: > BTW, getting off the question of EXPLAIN output for a moment, > I don't understand why disable_cost is still a thing. The > one remaining usage seems trivial to replace, as attached. I believe I commented on that somewhere upthread, but maybe I meant to and didn't or maybe you didn't see it in the flurry of emails. Basically, I wasn't confident that it made sense to treat this as the same kind of thing as other cases where we increment disabled_nodes. Because I couldn't make up my mind what to do and didn't get clear feedback from anybody else, I did nothing. The thing is, if somebody says enable_mergejoin=false, presumably they REALLY, REALLY don't want a merge join. If we start using that same mechanism for other purposes -- like making sure that a hash join doesn't overrun work_mem -- then the user might get a merge join anyway because we've represented a hash join that is big, but not disabled, in the same way that we represent as merge join that is actually disabled. I'm pretty uncomfortable with that. Sure, the user probably doesn't want us to overrun work_mem either, but when push comes to shove, shouldn't a very explicit user instruction like "don't use a merge join, l don't want that!" take precedence over any sort of planner estimate? Estimates can be wrong, and the user is in charge. -- Robert Haas EDB: http://www.enterprisedb.com
Re: On disable_cost
On Fri, Oct 4, 2024 at 10:37 PM David Rowley wrote: > I'd encourage anyone else on the sidelines who has an opinion on how > to display the disabled-ness of a plan node in EXPLAIN to speak up > now, even if it's just a +1 to something someone has already written. > It would be nice to see what more people think. > > As a DBA when I set one or more of the enable_* settings to false, and explain a query, I need to know: 1, that the plan shown to me is constrained, 2, which constraints are in place, and 3, which constraints were violated. The Settings option to Explain fulfills my second need. It is not a precise solution nor is it automatic. Because of these two things it really doesn't qualify as fulfilling the first need. To fulfill the first need I would want to see a data block containing the following information: How many (>= 1) enable_* settings are set to false. This is the bare requirement, but we can also include a count of how many violations exist, thus aggregating the count of the third need. This information is not specific to any node and thus should be displayed outside of the execution tree, the specific choice consistent with the output format under consideration. The detail for the third need, violations, is tied to specific executor nodes. The information provided here should inform me as to which specific setting was violated as well as, if possible, why. This is effectively three pieces of information: "Disabled: * (footnote)" The word disabled is the indicator that this node type was requested to not be included in the query plan. The * tells me exactly which of the disabled settings is at play here, reducing the cognitive burden of memorizing node types to settings. The footnote would be a reference into the documentation under the enable_* setting that explains why this node is appearing in the query plan even though I explicitly asked for it to be excluded. In a verbose output (add a new violations option for this) it would even be possible to save the trip to the documentation by adding the footnote text to the explain output. Now, existing proposals include another piece of data - for every node calculate how many violations occur in its tree (inclusive). I'm not understanding the motivation for this data. Knowing which nodes are violations seems like it is enough. I could always count, and processing tools could add this aggregate to their displays, but the count itself only seems useful at the scope of the entire query plan. And possibly sub-plans. So, by way of example: set enable_sort=0; explain (costs off, settings, violations) select * from lp order by a; Append -> Index Only Scan using lp1_a_idx on lp1 lp_1 -> Sort Disabled: Sort (A) Sort Key: lp_2.a -> Seq Scan on lp2 lp_2 Disabled Planner Settings: 1 Disabled Node Violations: 1 Settings: ... enable_sort = false Violation Reasons: Sort (A): The query contains an order by clause over data coming from a table being sequentially scanned. This scan's output must be sorted to fulfill the order by requirement. I was considering doing a layout like: Sort (disabled_sort.A) (cost...) (actual...) but having its own line on those nodes having the violation seems reasonable. It should be noticeable when the violations occur and this does stand out. The less pronounced format would be more appropriate for the "Disabled: #" display that would appear on every single node; which I believe is counter-productive. 
Only marking the violation provides the same amount of detail and allows for the computation of those counts should the need arise. As a DBA, though, I do not know how to use that count in a meaningful way. In text format we place additional information at the bottom of the query result. It is worth considering whether to place information before the planner tree. If that is acceptable the two "Disabled Planner %:" counts should be moved to before the node tree. This draws immediate attention to the explain output consumer that this plan is constrained and that other options, like settings and violations, should be added to the explain command to show additional details. But the two counts and the node detail "Disabled: * (footnote)" will always be visible. The footnote definitely is its own feature added to increase usability. I'm expecting it to not be accepted given the current design of explain, and also it seems quite difficult to get good data out of the planner to make the display accurate. But if we tell someone that a setting they disable is violated they are going to wonder why. David J.
Re: POC, WIP: OR-clause support for indexes
On Fri, Oct 4, 2024 at 2:40 PM Robert Haas wrote: > > On Fri, Oct 4, 2024 at 2:20 PM Peter Geoghegan wrote: > > On Fri, Oct 4, 2024 at 2:00 PM Alexander Korotkov > > wrote: > > > Yes, transformAExprIn() does the work to coerce all the expressions in > > > the right part to the same type. Similar logic could be implemented > > > in match_orclause_to_indexcol(). What worries me is whether it's > > > quite late stage for this kind of work. transformAExprIn() works > > > during parse stage, when we need to to resolve types, operators etc. > > > And we do that once. > > > > I agree that it would be a bit awkward. Especially having spent so > > much time talking about doing this later on, not during parsing. That > > doesn't mean that it's necessarily the wrong thing to do, though. > > True, but we also can't realistically use select_common_type() here. I > mean, it thinks that we have a ParseState and that there might be > values with type UNKNOWNOID floating around. By the time we reach the > planner, neither thing is true. And honestly, it looks to me like > that's pointing to a deeper problem with your idea. OK. To be clear, I don't think that it's essential that we have equivalent behavior in those cases where the patch applies its transformations. I have no objections to committing the patch without any handling for that. It's an important patch, and I really want it to get into 18 in a form that everybody can live with. -- Peter Geoghegan
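For concreteness, the kind of case being discussed looks like this (illustrative only): transformAExprIn() coerces the IN-list constants to one common type at parse time, whereas the OR form reaches the planner as separate OpExprs whose constants keep their own (related) types:

    SELECT * FROM t WHERE a IN (1, 2::smallint, 3::bigint);
    SELECT * FROM t WHERE a = 1 OR a = 2::smallint OR a = 3::bigint;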
Re: On disable_cost
On Sat, Oct 5, 2024 at 1:37 AM David Rowley wrote: > Thanks for explaining your point of view. I've not shifted my opinion > any, so I guess we just disagree. I feel a strong enough dislike for > the current EXPLAIN output to feel it's worth working harder to have a > better output. > > I won't push my point any further unless someone else appears > supporting Laurenz and I. Thank you for working on getting rid of the > disabled_cost. I think what we have is now much better than before. > The EXPLAIN output is the only part I dislike about this work. > > I'd encourage anyone else on the sidelines who has an opinion on how > to display the disabled-ness of a plan node in EXPLAIN to speak up > now, even if it's just a +1 to something someone has already written. > It would be nice to see what more people think. I think you have adequate consensus to proceed with this. I'd just ask that you don't disappear completely if it turns out that there are problems. I accept that my commit created this problem and I'm certainly willing to be involved too if we need to sort out more things. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Changing the state of data checksums in a running cluster
Tomas Vondra writes: > 3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, > shadowing earlier variable All the ListCell variables can be eliminated by using the foreach_ptr and foreach_oid macros instead of plain foreach. - ilmari
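For reference, the rewrite being suggested looks roughly like this (variable and list names are hypothetical):

    /* before: explicit ListCell iteration */
    ListCell   *lc;

    foreach(lc, relid_list)
    {
        Oid         relid = lfirst_oid(lc);

        process_relation(relid);
    }

    /* after: foreach_oid declares the loop variable itself, no ListCell needed */
    foreach_oid(relid, relid_list)
    {
        process_relation(relid);
    }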
Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.
On Fri, Sep 27, 2024 at 02:10:47PM -0500, Nathan Bossart wrote: > Alright. I was able to back-patch it to v12 without too much trouble, > fortunately. I'll commit that soon unless anyone else has feedback. Committed, thanks! I refrained from introducing INT64_HEX_FORMAT/UINT64_HEX_FORMAT in c.h because I felt there was a nonzero chance of that causing problems with third-party code on the back-branches. We could probably add them for v18, though. -- nathan
Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.
On Mon, Oct 07, 2024 at 02:00:05PM -0500, Nathan Bossart wrote: > I refrained from introducing INT64_HEX_FORMAT/UINT64_HEX_FORMAT in c.h > because I felt there was a nonzero chance of that causing problems with > third-party code on the back-branches. We could probably add them for v18, > though. Like so... -- nathan >From 03f7536acd06b91e6891ccfc86470a5a51b45736 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 7 Oct 2024 14:16:07 -0500 Subject: [PATCH 1/1] add INT64_HEX_FORMAT and UINT64_HEX_FORMAT --- contrib/postgres_fdw/option.c| 2 +- src/backend/utils/error/csvlog.c | 2 +- src/backend/utils/error/elog.c | 4 ++-- src/backend/utils/error/jsonlog.c| 2 +- src/include/c.h | 2 ++ src/test/modules/test_radixtree/test_radixtree.c | 2 -- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c index d740893918..9a6785720d 100644 --- a/contrib/postgres_fdw/option.c +++ b/contrib/postgres_fdw/option.c @@ -522,7 +522,7 @@ process_pgfdw_appname(const char *appname) appendStringInfoString(&buf, application_name); break; case 'c': - appendStringInfo(&buf, "%" INT64_MODIFIER "x.%x", MyStartTime, MyProcPid); + appendStringInfo(&buf, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); break; case 'C': appendStringInfoString(&buf, cluster_name); diff --git a/src/backend/utils/error/csvlog.c b/src/backend/utils/error/csvlog.c index eab8df3fcc..acdffb6d06 100644 --- a/src/backend/utils/error/csvlog.c +++ b/src/backend/utils/error/csvlog.c @@ -120,7 +120,7 @@ write_csvlog(ErrorData *edata) appendStringInfoChar(&buf, ','); /* session id */ - appendStringInfo(&buf, "%" INT64_MODIFIER "x.%x", MyStartTime, MyProcPid); + appendStringInfo(&buf, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); appendStringInfoChar(&buf, ','); /* Line number */ diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 987ff98067..c71508c923 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -2944,12 +2944,12 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata) { charstrfbuf[128]; - snprintf(strfbuf, sizeof(strfbuf) - 1, "%" INT64_MODIFIER "x.%x", + snprintf(strfbuf, sizeof(strfbuf) - 1, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); appendStringInfo(buf, "%*s", padding, strfbuf); } else - appendStringInfo(buf, "%" INT64_MODIFIER "x.%x", MyStartTime, MyProcPid); + appendStringInfo(buf, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); break; case 'p': if (padding != 0) diff --git a/src/backend/utils/error/jsonlog.c b/src/backend/utils/error/jsonlog.c index 2c7b14cacb..492383a89e 100644 --- a/src/backend/utils/error/jsonlog.c +++ b/src/backend/utils/error/jsonlog.c @@ -168,7 +168,7 @@ write_jsonlog(ErrorData *edata) } /* Session id */ - appendJSONKeyValueFmt(&buf, "session_id", true, "%" INT64_MODIFIER "x.%x", + appendJSONKeyValueFmt(&buf, "session_id", true, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); /* Line number */ diff --git a/src/include/c.h b/src/include/c.h index 3297d89ff0..9520b89b00 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -550,6 +550,8 @@ typedef unsigned long long int uint64; /* snprintf format strings to use for 64-bit integers */ #define INT64_FORMAT "%" INT64_MODIFIER "d" #define UINT64_FORMAT "%" INT64_MODIFIER "u" +#define INT64_HEX_FORMAT "%" INT64_MODIFIER "x" +#define UINT64_HEX_FORMAT "%" INT64_MODIFIER "x" /* * 128-bit signed and unsigned integers diff --git a/src/test/modules/test_radixtree/test_radixtree.c 
b/src/test/modules/test_radixtree/test_radixtree.c index 1d9165a3a2..3e6863f660 100644 --- a/src/test/modules/test_radixtree/test_radixtree.c +++ b/src/test/modules/test_radixtree/test_radixtree.c @@ -23,8 +23,6 @@ /* uncomment to use shared memory for the tree */ /* #define TEST_SHARED_RT */ -#define UINT64_HEX_FORMAT "%" INT64_MODIFIER "X" - /* Convenient macros to test results */ #define EXPECT_TRUE(expr) \ do { \ -- 2.39.5 (Apple Git-154)
Re: pg_upgrade check for invalid databases
On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote: > > On 1 Oct 2024, at 00:20, Tom Lane wrote: > > > > Daniel Gustafsson writes: > >>> On 30 Sep 2024, at 16:55, Tom Lane wrote: > >>> TBH I'm not finding anything very much wrong with the current > >>> behavior... this has to be a rare situation, do we need to add > >>> debatable behavior to make it easier? > > > >> One argument would be to make the checks consistent, pg_upgrade generally > >> tries > >> to report all the offending entries to help the user when fixing the source > >> database. Not sure if it's a strong enough argument for carrying code > >> which > >> really shouldn't see much use though. > > > > OK, but the consistency argument would be to just report and fail. > > I don't think there's a precedent in other pg_upgrade checks for > > trying to fix problems automatically. > > Correct, sorry for being unclear. The consistency argument would be to expand > pg_upgrade to report all invalid databases rather than just the first found; > attempting to fix problems would be a new behavior. Yes, historically pg_upgrade will fail if it finds anything unusual, mostly because what it does normally is already scary enough. If users want pg_upgrade to do cleanups, it would be enabled by a separate flag, or even a new command-line app. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: First draft of PG 17 release notes
On Mon, Sep 30, 2024 at 02:20:21PM +0900, Yugo NAGATA wrote: > On Sat, 28 Sep 2024 21:19:11 -0400 > Bruce Momjian wrote: > > > On Thu, Sep 26, 2024 at 02:19:21PM +0900, Yugo Nagata wrote: > > > On Thu, 9 May 2024 00:03:50 -0400 > > > Bruce Momjian wrote: > > > > > > > I have committed the first draft of the PG 17 release notes; you can > > > > see the results here: > > > > > > I propose to improve the following description in "Migration to Version > > > 17" > > > section by adding CREATE INDEX and CREATE MATERIALIZED VIEW into the > > > command list. > > > > > > > > > Change functions to use a safe > > > during maintenance operations (Jeff Davis) > > > § > > > > > > > > > It is suggested in the thread [1] that users could not notice the > > > behaviour > > > of CREATE INDEX is changed because the explicit command name is not > > > listed in > > > the release notes. So, I think it is better to add CREATE INDEX and > > > CREATE MATERIALIZED VIEW into the command list. > > > > > > I've attached a patch. > > > > It this a valid change? Seems so. > > Yes. This change on CREATE INDEX was introduced by 2af07e2f7 together with > other commands, but it was missed to be mentioned in the commit message > although the description was added to the documentation. > > The change on CEATE MATERIALIZED VIEW was introduced by a separate commit > b4da732fd, since which the REFRESH logic is used when creating a matview. > Should we add here a link to that commit, too? I developed the attached patch which adds the two commands and the commit item. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml index ad814737745..7acdee05e42 100644 --- a/doc/src/sgml/release-17.sgml +++ b/doc/src/sgml/release-17.sgml @@ -125,6 +125,8 @@ @@ -136,7 +138,9 @@ Author: Jeff Davis This prevents maintenance operations (ANALYZE, - CLUSTER, REFRESH + CLUSTER, CREATE + INDEX, CREATE + MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM) from performing unsafe access. Functions used by expression indexes and materialized views that
Re: [BUG FIX]Connection fails with whitespace after keepalives parameter value
On Tue, Oct 08, 2024 at 01:19:59AM +0900, Fujii Masao wrote: > Commit 430ce189fc45 unexpectedly caused psql to report the error > "error: trailing data found" when a connection URI contains > a whitespace, e.g., in a parameter value. For example, > the following command used to work but no longer does after this commit: > > $ psql -d "postgresql://localhost:5432/postgres?application_name=a b" > > I'm not sure if this URI format is valid (according to RFC 3986), > though. I may be missing something, of course, but I am under the impression that ' ' is invalid, meaning that you should use "a%20b" here to get what you want as %20 would be translated to a space character. > Is the "continue" really necessary? Also could we simplify it like this? > > for (; *q == ' '; q++); Sure, it could be changed this way. -- Michael
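Spelled out, the percent-encoded form of the URI discussed above is the one expected to keep working:

    $ psql -d "postgresql://localhost:5432/postgres?application_name=a%20b"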
Re: Avoiding superfluous buffer locking during nbtree backwards scans
On Mon, Sep 02, 2024 at 01:31:55PM +0200, Matthias van de Meent wrote: > I noticed I attached an older version of the patch which still had 1 > assertion failure case remaining (thanks cfbot), so here's v3 which > solves that problem. Peter G., this is in your area of expertise. Could you look at what Matthias has proposed? -- Michael
Re: Support specify tablespace for each merged/split partition
On Thu, Aug 08, 2024 at 09:47:20AM +0800, Junwang Zhao wrote: > Thanks for your review. The SPLIT/MERGE grammar has been reverted in 3890d90c1508, so this patch concept does not apply anymore. -- Michael
Re: Commit fest 2024-09
On Mon, Oct 07, 2024 at 04:47:12PM +0900, Michael Paquier wrote: > The commit fest 2024-09 should have ended one week ago, so I have been > taking one step ahead and I have begun cleaning up things. As of now, > there is still a total of 113 patches marked as "Needs Review", and I > hope that my non-auto-vacuuming will be over tomorrow. And the CF is closed. Final score: Committed: 90. Moved to next CF: 202. Withdrawn: 14. Rejected: 6. Returned with Feedback: 54. Total: 366. > If you feel that one or more of your patches have been handled in an > incorrect way, feel free to reply back to this thread, to update your > patch entry to a better state, or to reply back to me using this > thread. The number of entries that had an incorrect state or were left unanswered was much higher than the last time I did such a cleanup. Moving CF entries marked as "Needs Review" in bulk is perhaps not the best thing, because this contributes to the bloat in the CF entries. Please double-check your CF entries. -- Michael
Re: Function for listing pg_wal/summaries directory
On 2024/10/07 23:35, Nathan Bossart wrote: On Mon, Oct 07, 2024 at 10:07:10AM +0900, Michael Paquier wrote: On Fri, Oct 04, 2024 at 10:02:11AM -0500, Nathan Bossart wrote: Could you explain why you feel the existing support functions are insufficient? Because it is not possible to outsource the scan of pg_wal/summaries/ to a different role, no? I was under the impression that you could do this with pg_available_wal_summaries() [0]. [0] https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-WAL-SUMMARY One benefit of supporting something like pg_ls_summariesdir() is that it allows us to view the last modification time of each WAL summary file and estimate when they'll be removed based on wal_summary_keep_time. Of course, we could also extend the existing function to report the last modification time if this use case is valid, though. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
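For comparison, the existing function already exposes the summaries themselves, just not file-level metadata such as size or modification time; for example:

    -- one row per WAL summary file: timeline, start LSN, end LSN
    SELECT tli, start_lsn, end_lsn
      FROM pg_available_wal_summaries()
     ORDER BY end_lsn DESC
     LIMIT 5;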
Re: Expand applicability of aggregate's sortop optimization
On Tue, Oct 01, 2024 at 04:25:59PM +0700, Andrei Lepikhov wrote: > I have written a sketch patch to implement the idea with aggregate > prosupport in code - see the attachment. > My intention was to provide an example, not a ready-to-commit patch. > As I see, the only problem there lies in the tests - CREATE AGGREGATE > doesn't allow us to set a prosupport routine, and we need to update the > pg_proc table manually. Please note that the CF bot is red. -- Michael
pgindent fails with perl 5.40
I get this error when running pgindent with perl 5.40: Attempt to call undefined import method with arguments ("devnull") via package "File::Spec" (Perhaps you forgot to load the package?) at src/tools/pgindent/pgindent line 10. BEGIN failed--compilation aborted at src/tools/pgindent/pgindent line 10. It definitely worked with perl 5.38 before. Not sure if something's wrong on my side. Is anybody else already using 5.40? For the moment, I just changed the File::Spec import to make it work: diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent index 48d83bc434..028d057ea4 100755 --- a/src/tools/pgindent/pgindent +++ b/src/tools/pgindent/pgindent @@ -7,7 +7,7 @@ use warnings FATAL => 'all'; use Cwd qw(abs_path getcwd); use File::Find; -use File::Spec qw(devnull); +use File::Spec; use File::Temp; use IO::Handle; use Getopt::Long; -- Erik
Re: [PATCH] pg_permissions
On Thu, Jun 13, 2024 at 07:34:30AM +0200, Joel Jacobson wrote: > Hmm, strange, the commitfest system didn't pick up the email with patch 0006 > for some reason, > with message id 0c5a6b79-408c-4910-9b2e-4aa9a7b30...@app.fastmail.com > > It's rebased to latest HEAD, so not sure why. > > Maybe it got confused when I quickly afterwards sent a new email without a > patch? > > Here is a new attempt, file content unchanged, just named to 0007 and added > "pg_get_acl" to the name. The CF bot has been red for some time now. Please rebase. -- Michael
Re: pgindent fails with perl 5.40
I wrote: > I get this error when running pgindent with perl 5.40: > > Attempt to call undefined import method with arguments ("devnull") via > package "File::Spec" (Perhaps you forgot to load the package?) at > src/tools/pgindent/pgindent line 10. > BEGIN failed--compilation aborted at src/tools/pgindent/pgindent line 10. > > It definitely worked with perl 5.38 before. Not sure if something's > wrong on my side. Ah, it's intentional: https://metacpan.org/release/HAARG/perl-5.40.0/view/pod/perldelta.pod#Calling-the-import-method-of-an-unknown-package-produces-a-warning > Calling the import method of an unknown package produces a warning > [...] > It will also detect cases where a user passes an argument when using a > package that does not provide its own import > [...] Because we use fatal warnings, pgindent fails. -- Erik
Re: Improve EXPLAIN output for multicolumn B-Tree Index
On Tue, Jul 02, 2024 at 03:44:01AM +, masahiro.ik...@nttdata.com wrote: > Although I plan to support "Rows Removed by Non Key Filtered"(or "Skip Scan > Filtered"), > I'd like to know whether the current direction is good. One of my concerns is > there might > be a better way to extract quals for boundary conditions in btexplain(). The CF bot has been red for some time now, please provide a rebase. -- Michael
Re: Assorted style changes with a tiny improvement
On Tue, Jul 02, 2024 at 02:39:20PM -0300, Ranier Vilela wrote: > This is a series of patches to change styles, in assorted sources. > IMO, this improves a tiny bit and is worth trying. > > 1. Avoid dereference iss_SortSupport if it has nulls. > 2. Avoid dereference plan_node_id if no dsm area. > 3. Avoid dereference spill partitions if zero ntuples. > 4. Avoid possible useless palloc call with zero size. > 5. Avoid redundant variable initialization in foreign. > 6. Check the cheap test first in ExecMain. > 7. Check the cheap test first in pathkeys. > 8. Delay possible expensive bms_is_empty call in sub_select. > 9. Reduce some scope in execPartition. > 10. Reduce some scope for TupleDescAttr array dereference. > 11. Remove useless duplicate test in ruleutils. > This is already checked at line 4566. > > 12. Remove useless duplicate test in string_utils. > This is already checked at line 982. You have something here, but not everything is worth changing without a reason to do so, other than code "correctness". For example, bms_is_empty() is a NULL comparison, so it does not matter. I don't see a point in the three dereference patches, either, as these states are expected AFAIK so it does not matter. If we crash, it's actually an indication that something has gone wrong. Same comment about the two remove-useless patches and the two reduce-some-scope. The point about group_keys_reorder_by_pathkeys() is to be proved; I doubt it matters. Same for the ExecBuildSlotValueDescription() business to check for the acl_result before bms_is_member() does not really matter performance-wise. The allocation in execTuplesMatchPrepare() is indeed something that we'd better avoid, that's minimal but memory that can be avoided is always less error-prone. pg_options_to_table() also, is a bit better this way. Applied these two, let's move on. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Add log_transaction setting
On Wed, Aug 14, 2024 at 09:08:00PM +0300, Сергей Соловьев wrote: > From b5e779771e8a7582951aff6f43a716e9e5975884 Mon Sep 17 00:00:00 2001 > From: "Sergey Solovev" > Date: Thu, 4 Jul 2024 17:02:13 +0300 > Subject: [PATCH] Add log_transaction configuration setting CF bot is red, please provide a rebase. And also, perhaps avoid HTML messages.. -- Michael
Re: Partition-wise join with whole row vars
On Fri, Jul 12, 2024 at 02:39:13PM +0300, Alexander Pyhalov wrote: > I was looking at enabling partition-wise join with whole row vars. My main > motivation > was to enable push down DML with partition-wise join in postgres_fdw. The > work is > based on the earlier patches of Ashutosh Bapat [1]. The CF bot has been red for some time now. Please provide a rebase. -- Michael
Re: Parallel CREATE INDEX for GIN indexes
On Fri, Jul 12, 2024 at 05:34:25PM +0200, Tomas Vondra wrote: > I got to do the detailed benchmarking on the latest version of the patch > series, so here's the results. My goal was to better understand the > impact of each patch individually - especially the two parts introduced > by Matthias, but not only - so I ran the test on a build with each of > the 0001-0009 patches. _gin_parallel_build_main() is introduced in 0001. Please make sure to pass down a query ID. -- Michael
Re: WIP: parallel GiST index builds
On Tue, Jul 30, 2024 at 11:05:56AM +0200, Tomas Vondra wrote: > I tried implementing this, see the attached 0002 patch that replaces the > fake LSN with an atomic counter in shared memory. It seems to work (more > testing needed), but I can't say I'm very happy with the code :-( While quickly scanning the code, please be careful about the query ID that should be passed down to _gist_parallel_build_main(). -- Michael
RE: long-standing data loss bug in initial sync of logical replication
Dear Shlok, > I have tested this part. I observed that ,whenever we insert data in a > partition table, the function 'get_rel_sync_entry' is called and a > hash entry is created for the corresponding leaf node relid. So I feel > while invalidating here we can specify 'PUBLICATION_PART_LEAF' . I > have made the corresponding changes 0002 patch. I also verified this and it seems true. The root table is a virtual table and the actual changes are recorded in the leaf ones. It is the same for the WAL layer. Logical decoding obtains info from WAL records, so leaf tables are passed to the pgoutput layer as "relation". I.e., I think it is enough to invalidate the relcache of the leaf tables. > I have also modified the tests in 0001 patch. These changes are only > related to syntax of writing tests. LGTM. I found small improvements, please find the attached. Best regards, Hayato Kuroda FUJITSU LIMITED minor_fix.diffs Description: minor_fix.diffs
Re: Function for listing pg_wal/summaries directory
On Tue, Oct 08, 2024 at 12:41:16PM +0900, Fujii Masao wrote: > One benefit of supporting something like pg_ls_summariesdir() is that > it allows us to view the last modification time of each WAL summary file > and estimate when they'll be removed based on wal_summary_keep_time. > > Of course, we could also extend the existing function to report > the last modification time if this use case is valid, though. My argument is about knowing the size of each file, for monitoring of disk space. The retention can be controlled by a GUC based on time, and this function requires knowing about the file name format. -- Michael
Re: Consider the number of columns in the sort cost model
On 8/23/24 01:46, Andrei Lepikhov wrote: Just a rebase onto current master to make cf-bot be happy. -- regards, Andrei Lepikhov From 73472897442516f0df4ed945cda28d125df08197 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Tue, 8 Oct 2024 11:20:46 +0700 Subject: [PATCH v2] Introduce the number of columns in the cost-sort model. The sort node calls the comparison operator for each pair of attributes for each couple of tuples. However, the current cost model uses a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging. This technique can lead to incorrect estimations when sorting on three, four, or more columns, quite common in practice. Moreover, further elaboration of the optimiser forms more strict requirements for the balance of sortings, as caused by IncrementalSort, MergeAppend, and MergeJoin. In this patch, the multiplier is a linear function of a number of columns. Member 1.0 needs to smooth the fact that dependence on the number of columns is weaker than linear. It is an extreme formula. The number of comparisons depends on the distinct values in each column. As a TODO, we can natively elaborate this model by the next step, involving distinct statistics to make estimations more precise. --- .../postgres_fdw/expected/postgres_fdw.out| 15 +++-- contrib/postgres_fdw/postgres_fdw.c | 2 +- src/backend/optimizer/path/costsize.c | 34 ++ src/backend/optimizer/plan/createplan.c | 2 +- src/backend/optimizer/plan/planner.c | 2 +- src/backend/optimizer/prep/prepunion.c| 2 +- src/backend/optimizer/util/pathnode.c | 8 +-- src/include/optimizer/cost.h | 2 +- src/test/regress/expected/aggregates.out | 13 ++-- src/test/regress/expected/join.out| 20 +++--- src/test/regress/expected/partition_join.out | 8 ++- src/test/regress/expected/union.out | 66 +-- 12 files changed, 95 insertions(+), 79 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index f2bcd6aa98..5da33ab69e 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J -- left outer join + nullable clause EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; - QUERY PLAN - - Foreign Scan +QUERY PLAN +--- + Sort Output: t1.a, fprt2.b, fprt2.c - Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) - Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10 WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST -(4 rows) + Sort Key: t1.a, fprt2.b, fprt2.c + -> Foreign Scan + Output: t1.a, fprt2.b, fprt2.c + Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) + Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10 WHERE ((r5.a < 10)) +(7 rows) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; a | b | c diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index adc62576d1..f9a2e88a17 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -3667,7 +3667,7 @@ 
adjust_foreign_grouping_path_cost(PlannerInfo *root, cost_sort(&sort_path, root, - pathkeys, + list_length(pathkeys), 0, *p_startup_cost + *p_run_cost, retrieved_rows, diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..a4d5e94b65 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsiz
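For readers skimming the thread, here is a minimal standalone sketch of the shape of the change the commit message above describes. The exact coefficients and the place where the multiplier is applied live in the patch itself, so the "new" formula below is illustrative only.

    #include <stdio.h>

    /* default value of the cpu_operator_cost GUC, used only for illustration */
    static const double cpu_operator_cost = 0.0025;

    /* Current model: a flat multiplier, independent of the sort-key width. */
    static double
    sort_cmp_cost_old(void)
    {
        return 2.0 * cpu_operator_cost;
    }

    /*
     * Shape of the proposed model: linear in the number of sort columns,
     * with the additive 1.0 smoothing the fact that the real dependence
     * on column count is weaker than linear (per the commit message).
     */
    static double
    sort_cmp_cost_new(int n_sort_columns)
    {
        return (1.0 + n_sort_columns) * cpu_operator_cost;
    }

    int
    main(void)
    {
        for (int ncols = 1; ncols <= 4; ncols++)
            printf("%d column(s): old %.5f, new %.5f\n",
                   ncols, sort_cmp_cost_old(), sort_cmp_cost_new(ncols));
        return 0;
    }

As the commit message notes, a later refinement could scale this by per-column distinct-value estimates rather than the raw column count.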
Re: Make COPY format extendable: Extract COPY TO format implementations
On Sat, Sep 28, 2024 at 8:56 AM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Fri, 27 Sep 2024 16:33:13 -0700, > Masahiko Sawada wrote: > > >> * 0005 (that add "void *opaque" to Copy{From,To}StateData) > >> has a bit negative impact for FROM and a bit positive > >> impact for TO > >> * But I don't know why. This doesn't change per row > >> related codes. Increasing Copy{From,To}StateData size > >> ("void *opaque" is added) may be related. > > > > I was surprised that the 0005 patch made COPY FROM slower (with fewer > > rows) and COPY TO faster overall in spite of just adding one struct > > field and some functions. > > Me too... > > > I'm interested in why the performance trends of COPY FROM are > > different between fewer than 6M rows and more than 6M rows. > > My hypothesis: > > With this patch set: > 1. One row processing is faster than master. > 2. Non row related processing is slower than master. > > If we have many rows, 1. impact is greater than 2. impact. > > > > Separating the patches into two parts (one is for COPY TO and another > > one is for COPY FROM) could be a good idea. It would help reviews and > > investigate performance regression in COPY FROM cases. And I think we > > can commit them separately. > > > > Also, could you please rebase the patches as they conflict with the > > current HEAD? > > OK. I've prepared 2 patch sets: > > v20: It just rebased on master. It still mixes COPY TO and > COPY FROM implementations. > > v21: It's based on v20 but splits COPY TO implementations > and COPY FROM implementations. > 0001-0005 includes only COPY TO related changes. > 0006-0010 includes only COPY FROM related changes. > > (v21 0001 + 0006) == (v20 v0001), > (v21 0002 + 0007) == (v20 v0002) and so on. > > > I'll run some benchmarks on my environment as well. > Thank you for updating the patches! I've run the same benchmark script on my various machines (Mac, Linux (with Intel CPU and Ryzen CPU) and Raspberry Pi etc). I've not investigated the results in depth yet but let me share the results. Please find the attached file, extensible_copy_benchmark_20241007.pdf. In the benchmark, I've applied the v20 patch set and 'master' in the result refers to a19f83f87966. And I disabled CPU turbo boost where possible. Overall, v20 patch got a similar or better performance in both COPY FROM and COPY TO compared to master except for on MacOS. I'm not sure that changes made to master since the last benchmark run by Tomas and Suto-san might contribute to these results. I'll try to investigate the performance regression that happened on MacOS. I think that other performance differences in my results seem to be within noises and could be acceptable. Of course, it would be great if others also could try to run benchmark tests. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com extensible_copy_benchmark_20241007.pdf Description: Adobe PDF document
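To make the 0005 discussion a bit more concrete: the struct and callback names below are invented for illustration (the real definitions live in the patch set), but they show the pattern being benchmarked. Copy{From,To}StateData grows a single "void *opaque" pointer, a custom format handler allocates its private state once at startup, and the per-row path only dereferences that pointer, which is why a per-row slowdown from this change alone would be surprising.

    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in for CopyToStateData with the added opaque member */
    typedef struct CopyToStateSketch
    {
        /* ... existing members elided ... */
        void       *opaque;         /* format-specific state, owned by the handler */
    } CopyToStateSketch;

    /* hypothetical private state of a custom COPY TO format */
    typedef struct MyFormatState
    {
        long        rows_seen;
    } MyFormatState;

    /* hypothetical start callback: allocate per-COPY state once */
    static void
    my_format_to_start(CopyToStateSketch *cstate)
    {
        MyFormatState *state = malloc(sizeof(MyFormatState));

        state->rows_seen = 0;
        cstate->opaque = state;
    }

    /* hypothetical per-row callback: only touches already-allocated state */
    static void
    my_format_to_one_row(CopyToStateSketch *cstate)
    {
        MyFormatState *state = (MyFormatState *) cstate->opaque;

        state->rows_seen++;
    }

    int
    main(void)
    {
        CopyToStateSketch cstate = {0};

        my_format_to_start(&cstate);
        for (int i = 0; i < 3; i++)
            my_format_to_one_row(&cstate);
        printf("rows seen: %ld\n", ((MyFormatState *) cstate.opaque)->rows_seen);
        free(cstate.opaque);
        return 0;
    }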
Re: On disable_cost
Robert Haas writes: > On Sat, Oct 5, 2024 at 3:35 PM Tom Lane wrote: >> BTW, getting off the question of EXPLAIN output for a moment, >> I don't understand why disable_cost is still a thing. The >> one remaining usage seems trivial to replace, as attached. > I believe I commented on that somewhere upthread, but maybe I meant to > and didn't or maybe you didn't see it in the flurry of emails. > Basically, I wasn't confident that it made sense to treat this as the > same kind of thing as other cases where we increment disabled_nodes. I don't buy your argument that this case is so special that it warrants preserving disable_cost. I certainly didn't think it was special when I added it. There may be another way to do this that doesn't rely on disabling the path in the same way as the user-accessible knobs do, but I don't really believe it's worth the trouble to think of one. And I definitely don't want to keep disable_cost around even for just one usage, because then we've not fixed the user-experience aspect of this (that is, "why does this plan have a ridiculously high cost?"), nor have we fixed all the concerns you had about higher-level planning decisions being skewed by that cost. One other point here is that if disable_cost remains exposed as a global variable (as it is in HEAD), there is no reason to expect that any extensions that are using it will get on board with the new approach. regards, tom lane
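For anyone following along, a toy sketch of the contrast being discussed (the structure and field names are schematic, not the real planner data structures): the old mechanism hides a path by inflating its cost with the disable_cost constant, which is what produces the "ridiculously high cost" in EXPLAIN, while the disabled_nodes approach counts disabled paths separately and leaves the displayed costs meaningful.

    #include <stdio.h>

    /* toy stand-in for a planner Path, just to contrast the two styles */
    typedef struct ToyPath
    {
        double      startup_cost;
        double      total_cost;
        int         disabled_nodes;
    } ToyPath;

    /* the historical constant from costsize.c */
    static const double disable_cost = 1.0e10;

    /* old style: make the unwanted path look absurdly expensive */
    static void
    disable_path_by_cost(ToyPath *p)
    {
        p->startup_cost += disable_cost;
        p->total_cost += disable_cost;
    }

    /* new style: count the disabled node; costs stay sane and comparable */
    static void
    disable_path_by_count(ToyPath *p)
    {
        p->disabled_nodes++;
    }

    int
    main(void)
    {
        ToyPath     a = {10.0, 100.0, 0};
        ToyPath     b = {10.0, 100.0, 0};

        disable_path_by_cost(&a);
        disable_path_by_count(&b);
        printf("cost-based:  total_cost=%.1f\n", a.total_cost);
        printf("count-based: disabled_nodes=%d, total_cost=%.1f\n",
               b.disabled_nodes, b.total_cost);
        return 0;
    }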
Re: pg_parse_json() should not leak token copies on failure
On Wed, Oct 2, 2024 at 10:45 AM Andrew Dunstan wrote: > Generally looks good. Should we have a check in > setJsonLexContextOwnsTokens() that we haven't started parsing yet, for > the incremental case? Good catch. Added in v4. > > At the moment, we have a test matrix consisting of "standard frontend" > > and "shlib frontend" tests for the incremental parser. I'm planning > > for the v4 patch to extend that with a "owned/not owned" dimension; > > any objections? > > > > Sounds reasonable. This is also done, along with runs of pgindent/perltidy. I've added a 0002 as well. While running tests under Valgrind, it complained about the uninitialized dummy_lex on the stack, which is now fixed. (That bug was introduced with my patch in 0785d1b8b and is technically orthogonal to 0001, but I figured I could track it here for now.) This is also how I found out that my existing fuzzers weren't checking for uninitialized memory, like I thought they were. Grumble... Thanks, --Jacob 1: 81b1d7cdb6 ! 1: 6a0d088cbe jsonapi: add lexer option to keep token ownership @@ Commit message of any tokens it wants to persist past the callback lifetime, but the lexer will handle necessary cleanup on failure. +A -o option has been added to test_json_parser_incremental to exercise +the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP +tests make use of it. (The test program now cleans up allocated memory, +so that tests can be usefully run under leak sanitizers.) + ## src/common/jsonapi.c ## +@@ src/common/jsonapi.c: struct JsonParserStack + */ + struct JsonIncrementalState + { ++ boolstarted; + boolis_last_chunk; + boolpartial_completed; + jsonapi_StrValType partial_token; @@ src/common/jsonapi.c: static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSem static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem); static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex); @@ src/common/jsonapi.c: makeJsonLexContextIncremental(JsonLexContext *lex, int enc +void +setJsonLexContextOwnsTokens(JsonLexContext *lex, bool owned_by_context) +{ ++ if (lex->incremental && lex->inc_state->started) ++ { ++ /* ++ * Switching this flag after parsing has already started is a ++ * programming error. ++ */ ++ Assert(false); ++ return; ++ } ++ + if (owned_by_context) + lex->flags |= JSONLEX_CTX_OWNS_TOKENS; + else @@ src/common/jsonapi.c: inc_lex_level(JsonLexContext *lex) + if (lex->incremental) + { + /* -+ * Ensure freeJsonLexContext() remains safe even if no fname is assigned -+ * at this level. ++ * Ensure freeJsonLexContext() remains safe even if no fname is ++ * assigned at this level. 
+ */ + lex->pstack->fnames[lex->lex_level] = NULL; + } @@ src/common/jsonapi.c: inc_lex_level(JsonLexContext *lex) static inline void dec_lex_level(JsonLexContext *lex) { -+ set_fname(lex, NULL); /* free the current level's fname, if needed */ ++ set_fname(lex, NULL); /* free the current level's fname, if needed */ lex->lex_level -= 1; } @@ src/common/jsonapi.c: freeJsonLexContext(JsonLexContext *lex) FREE(lex->pstack); } +@@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex, + lex->input = lex->token_terminator = lex->line_start = json; + lex->input_length = len; + lex->inc_state->is_last_chunk = is_last; ++ lex->inc_state->started = true; + + /* get the initial token */ + result = json_lex(lex); @@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex, if (sfunc != NULL) { @@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex, + + /* + * Either ownership of the token passed to the -+ * callback, or we need to free it now. Either way, -+ * clear our pointer to it so it doesn't get freed -+ * in the future. ++ * callback, or we need to free it now. Either ++ * way, clear our pointer to it so it doesn't get ++ * fr
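A short usage sketch of the new API, pieced together from the hunks above (PostgreSQL headers assumed; callback setup and error handling elided). The key points from the thread: setJsonLexContextOwnsTokens() has to be called before the first chunk is parsed, which is what the new "started" flag asserts, and tokens the lexer owns are released by freeJsonLexContext() even when parsing fails partway.

    #include "postgres.h"
    #include "common/jsonapi.h"
    #include "mb/pg_wchar.h"

    /* sketch: parse a JSON document with lexer-owned tokens */
    static JsonParseErrorType
    parse_json_with_owned_tokens(const char *json, size_t len,
                                 const JsonSemAction *sem)
    {
        JsonLexContext lex = {0};
        JsonParseErrorType result;

        makeJsonLexContextIncremental(&lex, PG_UTF8, true);

        /*
         * Must happen before any input is handed to the parser; with the v4
         * patch, doing it later trips the new Assert on inc_state->started.
         */
        setJsonLexContextOwnsTokens(&lex, true);

        /* single chunk here; real callers may feed many chunks */
        result = pg_parse_json_incremental(&lex, sem, json, len, true);

        /* tokens still owned by the lexer are freed here, even on failure */
        freeJsonLexContext(&lex);
        return result;
    }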
Re: Doc: typo in config.sgml
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote: > > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote: > > > I think there's an unnecessary underscore in config.sgml. > > > > I was wrong. The particular byte sequence just looked like an underscore > > in my editor, but the byte sequence is actually 0xc2a0, which must be a > > "non breaking space" encoded in UTF-8. I guess someone mistakenly > > inserted a non-breaking space while editing config.sgml. > > I wonder if it would be worth adding a check for this like we have for tabs? > The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp > (doing so made me realize we don't have an equivalent meson target). Can we check for any character outside the supported range of SGML? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
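Not the patch itself (the attached rule lives in the doc/src/sgml Makefile), but just to make the byte pattern concrete for anyone grepping their tree: a no-break space is the two-byte UTF-8 sequence 0xC2 0xA0, so a check only has to trap that pair, e.g.:

    #include <stdio.h>

    /* return 1 if the NUL-terminated string contains U+00A0 encoded as UTF-8 */
    static int
    has_nbsp(const unsigned char *s)
    {
        for (; *s != '\0'; s++)
        {
            if (s[0] == 0xC2 && s[1] == 0xA0)
                return 1;
        }
        return 0;
    }

    int
    main(void)
    {
        const char *plain = "shared_buffers (integer)";
        const char *tainted = "shared_buffers\xC2\xA0(integer)";

        printf("plain: %d, tainted: %d\n",
               has_nbsp((const unsigned char *) plain),
               has_nbsp((const unsigned char *) tainted));
        return 0;
    }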
Re: pg_upgrade check for invalid databases
On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote: > On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote: >> Correct, sorry for being unclear. The consistency argument would be to >> expand >> pg_upgrade to report all invalid databases rather than just the first found; >> attempting to fix problems would be a new behavior. > > Yes, historically pg_upgrade will fail if it finds anything unusual, > mostly because what it does normally is already scary enough. If users > what pg_upgrade to do cleanups, it would be enabled by a separate flag, > or even a new command-line app. While I suspect it's rare that someone CTRL-C's out of an accidental DROP DATABASE and then runs pg_upgrade before trying to recover the data, I agree with the principle of having pg_upgrade fail by default for things like this. If we did add a new flag, the new invalid database report that Daniel mentions could say something like "try again with --skip-invalid-databases to have pg_upgrade automatically drop invalid databases." -- nathan
Re: Refactoring postmaster's code to cleanup after child exit
On 05/10/2024 22:51, Dagfinn Ilmari Mannsåker wrote: Heikki Linnakangas writes: Sadly Windows' IO::Socket::UNIX hasn't been implemented on Windows (or at least on this perl distribution we're using in Cirrus CI): Socket::pack_sockaddr_un not implemented on this architecture at C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872. so I'll have to disable this test on Windows anyway. Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un() on Windows, so that can be fixed by bumping the Perl version in https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl to something more modern (such as 5.40.0.1), and only skipping the test if on Windows if Socket is too old. The decision to use 5.26 seems to come from the initial creation of the CI images in 2021 (when 5.34 was current), with the comment «newer versions don't currently work correctly for plperl». That claim is worth revisiting, and fixing if it's still the case. Yeah, it would be nice to update it. I wonder if commit 341f4e002d461a3c5513cb864490cddae2b43a64 fixed whatever the problem was. In the meanwhile, here is a one more version of the test patches, with a SKIP that checks that IO::Socket::UNIX works. -- Heikki Linnakangas Neon (https://neon.tech) From 9b499e8cdf09a127d7506837d5e2a697dd342820 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 8 Oct 2024 00:54:30 +0300 Subject: [PATCH v6 1/3] Add test for connection limits Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec...@iki.fi --- src/test/Makefile | 2 +- src/test/meson.build | 1 + src/test/postmaster/Makefile | 23 ++ src/test/postmaster/README| 27 +++ src/test/postmaster/meson.build | 12 +++ .../postmaster/t/001_connection_limits.pl | 79 +++ 6 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 src/test/postmaster/Makefile create mode 100644 src/test/postmaster/README create mode 100644 src/test/postmaster/meson.build create mode 100644 src/test/postmaster/t/001_connection_limits.pl diff --git a/src/test/Makefile b/src/test/Makefile index dbd3192874..abdd6e5a98 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -12,7 +12,7 @@ subdir = src/test top_builddir = ../.. include $(top_builddir)/src/Makefile.global -SUBDIRS = perl regress isolation modules authentication recovery subscription +SUBDIRS = perl postmaster regress isolation modules authentication recovery subscription ifeq ($(with_icu),yes) SUBDIRS += icu diff --git a/src/test/meson.build b/src/test/meson.build index c3d0dfedf1..67376e4b7f 100644 --- a/src/test/meson.build +++ b/src/test/meson.build @@ -4,6 +4,7 @@ subdir('regress') subdir('isolation') subdir('authentication') +subdir('postmaster') subdir('recovery') subdir('subscription') subdir('modules') diff --git a/src/test/postmaster/Makefile b/src/test/postmaster/Makefile new file mode 100644 index 00..dfcce9c9ee --- /dev/null +++ b/src/test/postmaster/Makefile @@ -0,0 +1,23 @@ +#- +# +# Makefile for src/test/postmaster +# +# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group +# Portions Copyright (c) 1994, Regents of the University of California +# +# src/test/postmaster/Makefile +# +#- + +subdir = src/test/postmaster +top_builddir = ../../.. 
+include $(top_builddir)/src/Makefile.global + +check: + $(prove_check) + +installcheck: + $(prove_installcheck) + +clean distclean: + rm -rf tmp_check diff --git a/src/test/postmaster/README b/src/test/postmaster/README new file mode 100644 index 00..7e47bf5cff --- /dev/null +++ b/src/test/postmaster/README @@ -0,0 +1,27 @@ +src/test/postmaster/README + +Regression tests for postmaster +=== + +This directory contains a test suite for postmaster's handling of +connections, connection limits, and startup/shutdown sequence. + + +Running the tests += + +NOTE: You must have given the --enable-tap-tests argument to configure. + +Run +make check +or +make installcheck +You can use "make installcheck" if you previously did "make install". +In that case, the code in the installation tree is tested. With +"make check", a temporary installation tree is built from the current +sources and then tested. + +Either way, this test initializes, starts, and stops a test Postgres +cluster. + +See src/test/perl/README for more info about running these tests. diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build new file mode 100644 index 00..c2de2e0eb5 --- /dev/null +++ b/src/test/postmaster/meson.build @@ -0,0 +1,12 @@ +# Copyright (c) 2022-2024, PostgreSQL Global Development Group + +tests += { + 'n
Re: First draft of PG 17 release notes
On Sun, Sep 29, 2024 at 06:33:29PM +0530, Amit Kapila wrote: > > > It is better to write the above statement as: > > > "pg_upgrade now preserves replication slots on > > > publishers and full subscription's state on subscribers". This is > > > because replication slots are preserved on publishers. The subscribers > > > preserve the subscription state. > > > > So, as I understand it, this preservation only happens when the _old_ > > Postgres version is 17+. > > > > Yes. > > > Do we want to try and explain that in the > > Postgres 17 release notes? > > > > It would be good if we can capture that information without bloating > the release document. However, this information is already present in > pg_upgrade docs, so users have a way to know the same even if we can't > mention it in the release notes. I have developed the attached patch to mention it is "logical" slots, and to mention its future use. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml index eb65d1d725d..1e620d810d4 100644 --- a/doc/src/sgml/release-17.sgml +++ b/doc/src/sgml/release-17.sgml @@ -63,7 +63,10 @@ pg_upgrade now - preserves replication slots on both publishers and subscribers + preserves logical replication slots on publishers and full + subscription state on subscribers. This will allow upgrades + to future major versions to continue logical replication without + requiring copy to resynchronize.
Re: Add contrib/pg_logicalsnapinspect
Hi, here are some review comments for patch v11. == contrib/pg_logicalinspect/specs/logical_inspect.spec 1. nit - Add some missing spaces after commas (,) in the SQL. nit - Also update the expected results accordingly == doc/src/sgml/pglogicalinspect.sgml 2. + + + The pg_logicalinspect functions are called + using a text argument that can be extracted from the output name of the + pg_ls_logicalsnapdir() function. + + 2a. wording The wording "using a text argument that can be extracted" seems like a hangover from the previous implementation; it does not even say what that "text argument" means. Why not just say it is a snapshot filename, something like below? SUGGESTION: The pg_logicalinspect functions are called passing a snapshot filename to be inspected. For example, pass a name obtained from the pg_ls_logicalsnapdir() function. ~ 2b. formatting nit - In the previous implementation the extraction of the LSN was trickier, so this part was worthy of an SGML "NOTE". Now that it is just a filename, I don't know if it needs to be a special note anymore. ~~~ 3. +postgres=# SELECT meta.* FROM pg_ls_logicalsnapdir(), +pg_get_logical_snapshot_meta(name) AS meta; + +-[ RECORD 1 ] +magic| 1369563137 +checksum | 1028045905 +version | 6 3a. If you are going to wrap the SQL across multiple lines like this, then you should show the psql continuation prompt, so that the example looks the same as what the user would see. ~ 3b. FYI, the output of that can return multiple records, which is b.i) probably not what you intended to demonstrate b.ii) not the same as what the example says e.g., I got this: test_pub=# SELECT meta.* FROM pg_ls_logicalsnapdir(), test_pub-# pg_get_logical_snapshot_meta(name) AS meta; -[ RECORD 1 ] magic| 1369563137 checksum | 681884630 version | 6 -[ RECORD 2 ] magic| 1369563137 checksum | 2213048308 version | 6 -[ RECORD 3 ] magic| 1369563137 checksum | 3812680762 version | 6 -[ RECORD 4 ] magic| 1369563137 checksum | 3759893001 version | 6 ~~~ (Also those #3a, #3b comments apply to both examples) == src/backend/replication/logical/snapbuild.c 4. - SnapBuild builder; - - /* variable amount of TransactionIds follows */ -} SnapBuildOnDisk; - #define SnapBuildOnDiskConstantSize \ offsetof(SnapBuildOnDisk, builder) #define SnapBuildOnDiskNotChecksummedSize \ Is it better to try to keep those "Size" macros defined along with wherever the SnapBuildOnDisk is defined? Otherwise, if the structure is ever changed, adjusting the macros could be easily overlooked. ~~~ 5. ValidateAndRestoreSnapshotFile nit - See [1] #4 suggestion to declare 'sz' at scope where used. The previous reason not to change this (e.g. "mainly inspired from SnapBuildRestore") seems less relevant because now most lines of this function have already been modified for other reasons. ~~~ 6. SnapBuildRestore: + if (fd < 0 && errno == ENOENT) + return false; + else if (fd < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not open file \"%s\": %m", path))); I think this code fragment looked like this before, and you only relocated it, but it still seems a bit awkward to write this way. Since so much else has changed, how about also improving this in passing, like below: if (fd < 0) { if (errno == ENOENT) return false; ereport(ERROR, (errcode_for_file_access(), errmsg("could not open file \"%s\": %m", path))); } == [1] https://www.postgresql.org/message-id/ZusgK0/B8F/1rqft%40ip-10-97-1-34.eu-west-3.compute.internal Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/contrib/pg_logicalinspect/expected/logical_inspect.out b/contrib/pg_logicalinspect/expected/logical_inspect.out index 219afd6..71dea4a 100644 --- a/contrib/pg_logicalinspect/expected/logical_inspect.out +++ b/contrib/pg_logicalinspect/expected/logical_inspect.out @@ -37,7 +37,7 @@ table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null COMMIT (3 rows) -step s1_get_logical_snapshot_info: SELECT info.state,info.catchange_count,array_length(info.catchange_xip,1) AS catchange_array_length,info.committed_count,array_length(info.committed_xip,1) AS committed_array_length FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info ORDER BY 2; +step s1_get_logical_snapshot_info: SELECT info.state, info.catchange_count, array_length(info.catchange_xip, 1) AS catchange_array_length, info.committed_count, array_length(info.committed_xip, 1) AS committed_array_length FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info ORDER BY 2; state |catchange_count|catchange_array_length|committed_count|committed_array_length --+---+--+---+-- consistent| 0| | 2| 2 diff --git a/contrib/pg_
Re: long-standing data loss bug in initial sync of logical replication
Hi Kuroda-san, > > I have also modified the tests in 0001 patch. These changes are only > > related to syntax of writing tests. > > LGTM. I found small improvements, please find the attached. I have applied the changes and updated the patch. Thanks & Regards, Shlok Kyal From 07f94de76be177d0e39762cb2bd36a4bc04a7993 Mon Sep 17 00:00:00 2001 From: Shlok Kyal Date: Fri, 23 Aug 2024 14:02:20 +0530 Subject: [PATCH v14 1/2] Distribute invalidatons if change in catalog tables Distribute invalidations to inprogress transactions if the current committed transaction change any catalog table. --- .../replication/logical/reorderbuffer.c | 5 +- src/backend/replication/logical/snapbuild.c | 34 ++- src/include/replication/reorderbuffer.h | 4 + src/test/subscription/t/100_bugs.pl | 267 ++ 4 files changed, 296 insertions(+), 14 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 22bcf171ff..c5dfc1ab06 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -221,9 +221,6 @@ int debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED */ static ReorderBufferTXN *ReorderBufferGetTXN(ReorderBuffer *rb); static void ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn); -static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *rb, - TransactionId xid, bool create, bool *is_new, - XLogRecPtr lsn, bool create_as_top); static void ReorderBufferTransferSnapToParent(ReorderBufferTXN *txn, ReorderBufferTXN *subtxn); @@ -622,7 +619,7 @@ ReorderBufferReturnRelids(ReorderBuffer *rb, Oid *relids) * (with the given LSN, and as top transaction if that's specified); * when this happens, is_new is set to true. */ -static ReorderBufferTXN * +ReorderBufferTXN * ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create, bool *is_new, XLogRecPtr lsn, bool create_as_top) { diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 0450f94ba8..1f7c24cad0 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -300,7 +300,7 @@ static void SnapBuildFreeSnapshot(Snapshot snap); static void SnapBuildSnapIncRefcount(Snapshot snap); -static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn); +static void SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid); static inline bool SnapBuildXidHasCatalogChanges(SnapBuild *builder, TransactionId xid, uint32 xinfo); @@ -859,18 +859,21 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid, } /* - * Add a new Snapshot to all transactions we're decoding that currently are - * in-progress so they can see new catalog contents made by the transaction - * that just committed. This is necessary because those in-progress - * transactions will use the new catalog's contents from here on (at the very - * least everything they do needs to be compatible with newer catalog - * contents). + * Add a new Snapshot and invalidation messages to all transactions we're + * decoding that currently are in-progress so they can see new catalog contents + * made by the transaction that just committed. This is necessary because those + * in-progress transactions will use the new catalog's contents from here on + * (at the very least everything they do needs to be compatible with newer + * catalog contents). 
*/ static void -SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn) +SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid) { dlist_iter txn_i; ReorderBufferTXN *txn; + ReorderBufferTXN *curr_txn; + + curr_txn = ReorderBufferTXNByXid(builder->reorder, xid, false, NULL, InvalidXLogRecPtr, false); /* * Iterate through all toplevel transactions. This can include @@ -913,6 +916,14 @@ SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn) SnapBuildSnapIncRefcount(builder->snapshot); ReorderBufferAddSnapshot(builder->reorder, txn->xid, lsn, builder->snapshot); + + /* + * Add invalidation messages to the reorder buffer of inprogress + * transactions except the current committed transaction + */ + if (txn->xid != xid && curr_txn->ninvalidations > 0) + ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn, + curr_txn->ninvalidations, curr_txn->invalidations); } } @@ -1184,8 +1195,11 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, /* refcount of the snapshot builder for the new snapshot */ SnapBuildSnapIncRefcount(builder->snapshot); - /* add a new catalog snapshot to all currently running transactions */ - SnapBuildDistributeNewCatalogSnapshot(builder, lsn); +
Re: Expand applicability of aggregate's sortop optimization
On 10/8/24 08:19, Michael Paquier wrote: On Tue, Oct 01, 2024 at 04:25:59PM +0700, Andrei Lepikhov wrote: I have written a sketch patch to implement the idea with aggregate prosupport in code - see the attachment. My intention was to provide an example, not a ready-to-commit patch. As I see, the only problem there lies in the tests - CREATE AGGREGATE doesn't allow us to set a prosupport routine, and we need to update the pg_proc table manually. Please note that the CF bot is red. Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail. Also, I see that the union test fails: left join lateral (select t1.tenthous from tenk2 t2 union all (values(1))) on true limit 1; -QUERY PLAN - Limit - -> Nested Loop Left Join - -> Seq Scan on tenk1 t1 - -> Append - -> Index Only Scan using tenk2_hundred on tenk2 t2 - -> Result -(6 rows) - +ERROR: deadlock detected +DETAIL: Process 414506 waits for AccessShareLock on relation 24696 of database 16387; blocked by process 414511. +Process 414511 waits for AccessExclusiveLock on relation 16421 of database 16387; blocked by process 414506. +HINT: See server log for query details. But I wonder if new prosupport routine caused that. Please, let me know if it is not a well-known issue. -- regards, Andrei Lepikhov From c3678a52d8cac9293a50ff5a4bab951155d45c5c Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Tue, 1 Oct 2024 16:08:11 +0700 Subject: [PATCH v1] Introduce prosupport helpers for aggregates. For example, add minmax_support routine to simplify min/max aggregates. --- src/backend/optimizer/plan/planagg.c | 76 ++ src/backend/optimizer/prep/prepagg.c | 26 + src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 93 src/include/nodes/supportnodes.h | 7 ++ src/test/regress/expected/aggregates.out | 128 ++- src/test/regress/sql/aggregates.sql | 55 ++ src/tools/pgindent/typedefs.list | 1 + 8 files changed, 340 insertions(+), 48 deletions(-) diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index afb5445b77..598da61af8 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -33,6 +33,7 @@ #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/supportnodes.h" #include "optimizer/cost.h" #include "optimizer/optimizer.h" #include "optimizer/pathnode.h" @@ -43,6 +44,7 @@ #include "parser/parse_clause.h" #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" +#include "utils/fmgrprotos.h" #include "utils/lsyscache.h" #include "utils/syscache.h" @@ -510,3 +512,77 @@ fetch_agg_sort_op(Oid aggfnoid) return aggsortop; } + +Datum +minmax_support(PG_FUNCTION_ARGS) +{ + Node *rawreq = (Node *) PG_GETARG_POINTER(0); + Oid aggsortop; + + if (IsA(rawreq, SupportRequestMinMax)) + { + SupportRequestMinMax *req = (SupportRequestMinMax *) rawreq; + Aggref *aggref = req->aggref; + + if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL) + PG_RETURN_POINTER(NULL); + + aggsortop = fetch_agg_sort_op(aggref->aggfnoid); + if (!OidIsValid(aggsortop)) + PG_RETURN_POINTER(NULL); /* not a MIN/MAX aggregate */ + + if (aggref->aggorder != NIL) + { + SortGroupClause *orderClause; + TargetEntry *curTarget; + + curTarget = (TargetEntry *) linitial(aggref->args); + + /* + * If the order clause is the same column as the one we're + * aggregating, we can still use the index: It is undefined which + * value is MIN() or MAX(), as well as which value is first or + * last when sorted. 
So, we can still use the index IFF the + * aggregated expression equals the expression used in the + * ordering operation. + */ + + /* + * We only accept a single argument to min/max aggregates, + * orderings that have more clauses won't provide correct results. + */ + Assert(list_length(aggref->aggorder) == 1); + + orderClause = castNode(SortGroupClause, linitial(aggref->aggorder)); + + if (orderClause->tleSortGroupRef != curTarget->ressortgroupref) +elog(ERROR, "Aggregate order clause isn't found in target list"); + + if (orderClause->sortop != aggsortop) + { +List *btclasses; +ListCell *lc; + +btclasses = get_op_btree_interpretation(orderClause->sortop); + +foreach(lc, btclasses) +{ + OpBtreeInterpretation *interpretation; + + interpretation = (OpBtreeInterpretation *) lfirst(lc); + if (op_in_opfamily(aggsortop, interpretation->opfamily_id)) + { + aggref->aggorder = NIL; + break; + } +} + +list_free_deep(btclasses); + } + else +aggref->aggorder = NIL; + } + } + + PG_RETURN_POINTER(NULL);
Re: MergeJoin beats HashJoin in the case of multiple hash clauses
On 7/8/24 19:45, Andrei Lepikhov wrote: On 3/11/2023 23:43, Tomas Vondra wrote: On 9/11/23 10:04, Lepikhov Andrei wrote: * Determine bucketsize fraction and MCV frequency for the inner * relation. We use the smallest bucketsize or MCV frequency estimated * for any individual hashclause; this is undoubtedly conservative. I'm sure this may lead to inflated cost for "good" cases (where the actual bucket size really is a product), which may push the optimizer to use the less efficient/slower join method. Yes, It was contradictory idea, though. IMHO the only principled way forward is to get a better ndistinct estimate (which this implicitly does), perhaps by using extended statistics. I haven't tried, but I guess it'd need to extract the clauses for the inner side, and call estimate_num_groups() on it. And I've done it. Sorry for so long response. This patch employs of extended statistics for estimation of the HashJoin bucket_size. In addition, I describe the idea in more convenient form here [1]. Obviously, it needs the only ndistinct to make a prediction that allows to reduce computational cost of this statistic. Minor change to make cfbot happier. -- regards, Andrei Lepikhov From 73e4d16812431b91fb760d994d0198baab2fe034 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Mon, 13 May 2024 16:15:02 +0700 Subject: [PATCH v1] Use extended statistics for precise estimation of bucket size in hash join. Recognizing the real-life complexity where columns in the table often have functional dependencies, PostgreSQL's estimation of the number of distinct values over a set of columns can be underestimated (or much rarely, overestimated) when dealing with multi-clause JOIN. In the case of hash join, it can end up with a small number of predicted hash buckets and, as a result, picking non-optimal merge join. To improve the situation, we introduce one additional stage of bucket size estimation - having two or more join clauses estimator lookup for extended statistics and use it for multicolumn estimation. Clauses are grouped into lists, each containing expressions referencing the same relation. The result of the multicolumn estimation made over such a list is combined with others according to the caller's logic. Clauses that are not estimated are returned to the caller for further estimation. --- src/backend/optimizer/path/costsize.c | 12 +- src/backend/utils/adt/selfuncs.c | 172 ++ src/include/utils/selfuncs.h | 4 + 3 files changed, 187 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..af1cb8ba78 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -4294,9 +4294,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, } else { + List *otherclauses; + innerbucketsize = 1.0; innermcvfreq = 1.0; - foreach(hcl, hashclauses) + + /* At first, try to estimate bucket size using extended statistics. 
*/ + otherclauses = estimate_multivariate_bucketsize(root, + inner_path->parent, + hashclauses, + &innerbucketsize); + + /* Pass through the remaining clauses */ + foreach(hcl, otherclauses) { RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl); Selectivity thisbucketsize; diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 03d7fb5f48..7a57f444c8 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -3752,6 +3752,178 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows, return numdistinct; } +/* + * Try to estimate the bucket size of the hash join inner side when the join + * condition contains two or more clauses by employing extended statistics. + * + * The main idea of this approach is that the distinct value generated by + * multivariate estimation on two or more columns would provide less bucket size + * than estimation on one separate column. + * + * IMPORTANT: It is crucial to synchronise the approach of combining different + * estimations with the caller's method. + * Return a list of clauses that didn't fetch any extended statistics. + */ +List * +estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner, + List *hashclauses, + Selectivity *innerbucketsize) +{ + List *clauses = list_copy(hashclauses); + List *otherclauses = NIL; + double ndistinct = 1.0; + + if (list_length(hashclauses) <= 1) + /* + * Nothing to do for a single clause. Could we employ univariate + * extended stat here? + */ + return hashclauses; + + while (clauses != NIL) + { + ListCell *lc; + int relid = -1; + List *varinfos = NIL; + List *saveList = NIL; + double mvndistinct; + List *origin_varinfos; + int group_relid = -1; + RelOptInfo *group_rel = NULL; + ListCell *lc1, *lc2; + + /* + * Find clauses, referenci
Re: Incremental Sort Cost Estimation Instability
On 9/23/24 20:02, Andrei Lepikhov wrote: On 12/9/2024 12:12, David Rowley wrote: On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov wrote: Initial problem causes wrong cost_sort estimation. Right now I think about providing cost_sort() the sort clauses instead of (or in addition to) the pathkeys. I'm not quite sure why the sort clauses matter any more than the EquivalenceClass. If the EquivalanceClass defines that all members will have the same value for any given row, then, if we had to choose any single member to drive the n_distinct estimate from, isn't the most accurate distinct estimate from the member with the smallest n_distinct estimate? (That assumes the less distinct member has every value the more distinct member has, which might not be true) Finally, I implemented this approach in code (see attachment). The effectiveness may be debatable, but general approach looks even better than previous one. Change status to 'Need review'. Minor change to make compiler and cfbot happy. -- regards, Andrei Lepikhov From bdaf7cd46eb10353b6cbaf5f22d1ea835881db24 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Mon, 23 Sep 2024 14:30:28 +0200 Subject: [PATCH v3] Stabilize incremental sort estimation. Carefully identify an expression that can represent the path key on sort estimation. Columns may have different distinct estimations that can trigger wrong cost estimations and choice of sort strategy. Sorting has only pathkeys as input for the cost estimation. This patch, instead of a blind choice of the first equivalence class member, attempts to find columns that are used under the estimating sort operator and chooses the most negligible ndistinct value. --- src/backend/optimizer/path/costsize.c | 52 --- src/backend/optimizer/util/pathnode.c | 1 + src/include/optimizer/cost.h | 1 + 3 files changed, 50 insertions(+), 4 deletions(-) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..01f166f504 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1984,6 +1984,50 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost, *run_cost = cpu_operator_cost * tuples; } +static Expr * +identify_sort_expression(PlannerInfo *root, PathKey *key, Relids relids) +{ + EquivalenceMember *candidate = linitial(key->pk_eclass->ec_members); + double ndistinct = -1.0; /* Not defined */ + + if (root == NULL || list_length(key->pk_eclass->ec_members) == 1) + /* Fast path */ + return candidate->em_expr; + + foreach_node(EquivalenceMember, em, key->pk_eclass->ec_members) + { + VariableStatData vardata; + boolisdefault = true; + doublendist = -1.0; + + /* Sorting over single partition? */ + if (em->em_is_child && bms_num_members(relids) > 1) + continue; + + if (!bms_is_subset(em->em_relids, relids)) + /* + * Avoid expressions not based on a table column. At least, the + * candidate value already initialised as the first EC member. 
+ */ + continue; + + /* Let's check candidate's ndistinct value */ + examine_variable(root, (Node *) em->em_expr, 0, &vardata); + if (HeapTupleIsValid(vardata.statsTuple)) + ndist = get_variable_numdistinct(&vardata, &isdefault); + ReleaseVariableStats(vardata); + + if (ndistinct < 0.0 || (!isdefault && ndist < ndistinct)) + { + candidate = em; + ndistinct = ndist; + } + } + + Assert(candidate != NULL); + return candidate->em_expr; +} + /* * cost_incremental_sort * Determines and returns the cost of sorting a relation incrementally, when @@ -1999,6 +2043,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost, void cost_incremental_sort(Path *path, PlannerInfo *root, List *pathkeys, int presorted_keys, + Relids relids, int input_disabled_nodes, Cost input_startup_cost, Cost input_total_cost, double input_tuples, int width, Cost comparison_cost, int sort_mem, @@ -2053,21 +2098,20 @@ cost_incremental_sort(Path *path, foreach(l, pathkeys) { PathKey*key = (PathKey *) lfirst(l); - EquivalenceMember *member = (EquivalenceMember *) - linitial(key->pk_eclass->ec_members); + Expr *expr = identify_sort_expression(root, key, relids); /* * Check if the expression contains Var with "varno 0" so that we * don't call estimate_num_groups in that case. */ - if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr))) + if (bms_is_member(0, pull_varnos(root, (Node *) expr))) { unknown_varno = true; break; } /* expression not containing any Vars with "varno 0" */ - presortedExprs = lappend(presortedExprs, member->em_expr); + presortedExprs = lappend(presortedExprs, expr); if (foreach_current_index(l) + 1 >= presorted_keys) break; diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index fc97bf6ee2..8ed566b1bc 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/
Re: per backend I/O statistics
On Mon, Oct 07, 2024 at 09:54:21AM +, Bertrand Drouvot wrote: > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: >> This would rely on the fact that we would use the ProcNumber for the >> dshash key, and this information is not provided in pg_stat_activity. >> Perhaps we should add this information in pg_stat_activity so as it >> would be easily possible to do joins with a SQL function that returns >> a SRF with all the stats associated with a given connection slot >> (auxiliary or backend process)? > > I'm not sure that's needed. What has been done in the previous versions is > to get the stats based on the pid (see pg_stat_get_backend_io()) where the > procnumber is retrieved with something like > GetNumberFromPGProc(BackendPidGetProc(pid)). Ah, I see. So you could just have the proc number in the key to control the upper-bound on the number of possible stats entries in the dshash. Assuming that none of this data is persisted to the stats file at shutdown and that the stats of a single entry are reset each time a new backend reuses a previous proc slot, that would be OK by me, I guess. >> The active PIDs of the live sessions are not stored in the active >> stats, why not? > > That was not needed. We can still retrieve the stats based on the pid thanks > to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having > to actually store the pid in the stats. I think that's fine because the pid > only matters at "display" time (pg_stat_get_backend_io()). Okay, per the above and the persistency of the stats. -- Michael signature.asc Description: PGP signature
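To spell out the lookup pattern Bertrand describes (the surrounding stats machinery is hypothetical, but BackendPidGetProc() and GetNumberFromPGProc() are the calls named above): the pid is only resolved at display time, and the resulting ProcNumber is what would key the dshash entry, which also bounds the number of possible stats entries.

    #include "postgres.h"
    #include "storage/proc.h"
    #include "storage/procarray.h"
    #include "storage/procnumber.h"

    /* sketch: resolve a backend PID to the key of its per-backend I/O stats */
    static void
    report_backend_io_stats(int pid)
    {
        PGPROC     *proc;
        ProcNumber  procno;

        proc = BackendPidGetProc(pid);
        if (proc == NULL)
            return;             /* backend already gone; nothing to show */

        procno = GetNumberFromPGProc(proc);

        /* ... fetch the dshash entry keyed by procno and build the SRF row ... */
        (void) procno;
    }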
Re: First draft of PG 17 release notes
On Mon, 2024-10-07 at 20:11 -0400, Bruce Momjian wrote: > On Tue, Oct 1, 2024 at 04:36:09PM +0200, Laurenz Albe wrote: > > I think that the removal of the "adminpack" extension should > > be listed in the section "migration to v17" as an incompatibility. > > > > I have seen one complaint that pg_upgrade fails if the extension > > is installed, but a dump/restore would also throw an error. > > Agreed. moved. Thank you! Yours, Laurenz Albe
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 10:06 PM jian he wrote: > > assume v40 is the latest version. make_bitmap_paths_for_or_group { /* * First, try to match the whole group to the one index. */ orargs = list_make1(ri); indlist = build_paths_for_OR(root, rel, orargs, other_clauses); if (indlist != NIL) { bitmapqual = choose_bitmap_and(root, rel, indlist); jointcost = bitmapqual->total_cost; jointlist = list_make1(bitmapqual); } /* * Also try to match all containing clauses 'one-by-one. */ foreach(lc, args) { orargs = list_make1(lfirst(lc)); indlist = build_paths_for_OR(root, rel, orargs, other_clauses); if (indlist == NIL) { splitlist = NIL; break; } bitmapqual = choose_bitmap_and(root, rel, indlist); } if other_clauses is not NIL, then "try to match all containing clauses 'one-by-one" the foreach loop "foreach(lc, args)" will apply other_clauses in build_paths_for_OR every time. then splitcost will obviously be higher than jointcost. if other_clauses is NIL. "foreach(lc, args)" will have list_length(args) startup cost. So overall, it looks like jointcost will alway less than splitcost, the only corner case would be both are zero. anyway, in make_bitmap_paths_for_or_group, above line "Pick the best option." I added: if (splitcost <= jointcost && splitcost != 0 && jointcost != 0) elog(INFO, "%s:%d splitcost <= jointcost and both is not zero", __FILE_NAME__, __LINE__); and the regress tests passed. That means we don't need to iterate "((BoolExpr *) ri->orclause)->args" in make_bitmap_paths_for_or_group ?
Re: Expand applicability of aggregate's sortop optimization
On Tue, 8 Oct 2024 at 18:47, Andrei Lepikhov wrote: > Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail. Please leave the cat version bump out of your patch. It's a waste of time and resource if you plan to post another patch every time a committer bumps the cat version. David
Re: Pgoutput not capturing the generated columns
On Fri, Oct 4, 2024 at 9:43 AM Peter Smith wrote: > > Hi Shubham, I don't have any new comments for the patch v36-0002. > > But, according to my records, there are multiple old comments not yet > addressed for this patch. I am giving reminders for those below so > they don't get accidentally overlooked. Please re-confirm and at the > next posted version please respond individually to each of these to > say if they are addressed or not. > > == > > 1. General > From review v31 [1] comment #1. Patches 0001 and 0002 should be merged. > > == > src/backend/replication/logical/tablesync.c > > make_copy_attnamelist: > > 2. > From review v31 [1] comment #4. Make the detailed useful error message > common if possible. > > ~~~ This comment is still open. Will fix this in next versions of patches. > > fetch_remote_table_info: > > 3. > From review v31 [1] comment #5. I was not sure if this logic is > sophisticated enough to handle the case when the same table has > gencols but there are multiple subscribed publications and the > 'publish_generated_columns' parameter differs. Is this scenario > tested? > > ~ > > 4. > + * Get column lists for each relation, and check if any of the > + * publications have the 'publish_generated_columns' parameter enabled. > > From review v32 [2] comment #1. This needs some careful testing. I was > not sure if sufficient to just check the 'publish_generated_columns' > flag. Now that "column lists take precedence" it is quite possible for > all publications to say 'publish_generated_columns=false', but the > publication can still publish gencols *anyway* if they are specified > in a column list. > > == > [1] review v31 18/9 - > https://www.postgresql.org/message-id/flat/CAHv8Rj%2BKOoh58Uf5k2MN-%3DA3VdV60kCVKCh5ftqYxgkdxFSkqg%40mail.gmail.com#f2f3b48080f96ea45e1410f5b1cd9735 > [2] review v32 24/9 - > https://www.postgresql.org/message-id/CAHut%2BPu7EcK_JTgWS7GzeStHk6Asb1dmEzCJU2TJf%2BW1Zy30LQ%40mail.gmail.com > I have fixed the comments and posted the v37 patches for them. Please refer to the updated v37 Patches here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8Rj%2BRnw%2B_SfSyyrvWL49AfJzx4O8YVvdU9gB%2BSQdt3%3DqF%2BA%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Partition-wise join with whole row vars
Michael Paquier писал(а) 2024-10-08 05:01: On Fri, Jul 12, 2024 at 02:39:13PM +0300, Alexander Pyhalov wrote: I was looking at enabling partition-wise join with whole row vars. My main motivation was to enable push down DML with partition-wise join in postgres_fdw. The work is based on the earlier patches of Ashutosh Bapat [1]. The CF bot is red for some time now. Please provide a rebase. Hi. Attaching rebased patches. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom cd7c56dab70c48b7f2a104041ee6bb734050f4bb Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Wed, 27 Dec 2023 17:31:23 +0300 Subject: [PATCH 6/6] postgres_fdw: fix partition-wise DML --- contrib/postgres_fdw/deparse.c| 12 +- .../postgres_fdw/expected/postgres_fdw.out| 160 ++ contrib/postgres_fdw/postgres_fdw.c | 98 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 38 + src/backend/optimizer/util/appendinfo.c | 32 +++- src/include/optimizer/appendinfo.h| 1 + 6 files changed, 323 insertions(+), 18 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index f879ff3100f..d4c657fa4d5 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2309,7 +2309,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root, appendStringInfoString(buf, "UPDATE "); deparseRelation(buf, rel); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex); appendStringInfoString(buf, " SET "); @@ -2336,7 +2336,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root, reset_transmission_modes(nestlevel); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) { List *ignore_conds = NIL; @@ -2352,7 +2352,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root, if (additional_conds != NIL) list_free_deep(additional_conds); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) deparseExplicitTargetList(returningList, true, retrieved_attrs, &context); else @@ -2417,10 +2417,10 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root, appendStringInfoString(buf, "DELETE FROM "); deparseRelation(buf, rel); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) { List *ignore_conds = NIL; @@ -2435,7 +2435,7 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root, if (additional_conds != NIL) list_free_deep(additional_conds); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) deparseExplicitTargetList(returningList, true, retrieved_attrs, &context); else diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 46f005307e1..f1af06b7a6b 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -10184,6 +10184,166 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a (4 rows) reset enable_sort; +-- test partition-wise DML +EXPLAIN (COSTS OFF, VERBOSE) +UPDATE fprt1 SET b=fprt1.b+1 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0; + QUERY PLAN +-- + Update on public.fprt1 + Foreign Update on public.ftprt1_p1 fprt1_1 + Foreign Update on public.ftprt1_p2 fprt1_2 + -> Append + -> Foreign Update + Remote SQL: UPDATE public.fprt1_p1 r3 SET b = (r3.b + 1) FROM public.fprt2_p1 r5 WHERE ((r3.a = r5.b)) AND (((r5.a % 
25) = 0)) + -> Foreign Update + Remote SQL: UPDATE public.fprt1_p2 r4 SET b = (r4.b + 1) FROM public.fprt2_p2 r6 WHERE ((r4.a = r6.b)) AND (((r6.a % 25) = 0)) +(8 rows) + +UPDATE fprt1 SET b=fprt1.b+1 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0; +EXPLAIN (COSTS OFF, VERBOSE) +UPDATE fprt1 SET b=(fprt1.a+fprt2.b)/2 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0; + QUERY PLAN +--- + Update on public.fprt1 + Foreign Update on public.ftprt1_p1 fprt1_1 + Foreign Update on public.ftprt1_p2 fprt1_2 + -> Append + -> Foreign Update + Remote SQL: UPDATE public.fprt1_p1 r3 SET b
Re: First draft of PG 17 release notes
On Tue, Oct 8, 2024 at 6:25 AM Bruce Momjian wrote: > > On Sun, Sep 29, 2024 at 06:33:29PM +0530, Amit Kapila wrote: > > > > It is better to write the above statement as: > > > > "pg_upgrade now preserves replication slots on > > > > publishers and full subscription's state on subscribers". This is > > > > because replication slots are preserved on publishers. The subscribers > > > > preserve the subscription state. > > > > > > So, as I understand it, this preservation only happens when the _old_ > > > Postgres version is 17+. > > > > > > > Yes. > > > > > Do we want to try and explain that in the > > > Postgres 17 release notes? > > > > > > > It would be good if we can capture that information without bloating > > the release document. However, this information is already present in > > pg_upgrade docs, so users have a way to know the same even if we can't > > mention it in the release notes. > > I have developed the attached patch to mention it is "logical" slots, > and to mention its future use. > LGTM. Thanks! -- With Regards, Amit Kapila.
Re: Proposal: allow database-specific role memberships
Denis Laxalde wrote: Michael Paquier wrote: On Wed, Sep 07, 2022 at 12:50:32PM +0500, Ibrar Ahmed wrote: The patch requires a rebase, please do that. Hunk #5 succeeded at 454 (offset 28 lines). 1 out of 5 hunks FAILED -- saving rejects to file doc/src/sgml/ref/grant.sgml.rej There have been no updates on this thread for one month, so this has been switched to RwF. I took the liberty of rebasing this (old) patch, originally authored by Kenaniah Cerny. As the original commitfest entry, https://commitfest.postgresql.org/36/3374/, was "stalled", I created a new one at https://commitfest.postgresql.org/50/5284/; hoping this is okay. This is about adding an "IN DATABASE " clause to the GRANT and REVOKE commands, allowing role membership to be controlled at database scope rather than cluster-wide. This could be interesting in combination with predefined roles, e.g.: GRANT pg_read_all_data TO bob IN DATABASE app; GRANT pg_maintain TO dba IN DATABASE metrics; without having to grant too many privileges when a user is supposed to operate on only some databases. The logic of the original patch (as of its version 11) is preserved. One noticeable change concerns the tests: they were moved to src/test/regress (they were previously in 'unsafe_tests'), with proper cleanup, and now avoid using superuser as well as modifying templates. Is this a feature that's still interesting? (Feedback in the thread, from 2022, was a bit mixed.) Personally, I have a few concerns regarding the feature and its implementation: - The IN DATABASE clause does not make much sense for some roles, like pg_read_all_stats (the implementation does not guard against this). - An 'IN SCHEMA' clause might be a natural supplementary feature. However, the current implementation, relying on a new 'dbid' column added to the pg_auth_members catalog, might not fit well in that case.