Useless LEFT JOIN breaks MIN/MAX optimization
Hi hackers! My colleague gave me an interesting case related to min max optimization. Adding a useless left join to the select min from t query breaks the min/max read optimization from the index. What is meant is shown in the example below: drop table if exists t1; drop table if exists t2; create table t1 (id int not null, mod text); insert into t1 select id, (id % 10)::text from generate_series(1,10) id; create unique index on t1(id); create index on t1(mod); This is the best plan for this query, since we only need one minimum value for this index. And it works perfectly: explain select min(mod) from t1; explain select min(mod) from t1; QUERY PLAN Result (cost=0.33..0.34 rows=1 width=32) InitPlan 1 (returns $0) -> Limit (cost=0.29..0.33 rows=1 width=32) -> Index Only Scan using t1_mod_idx on t1 (cost=0.29..3861.54 rows=99500 width=32) Index Cond: (mod IS NOT NULL) (5 rows) create table t2 (id int not null); insert into t2 select id from generate_series(1,10) id; create unique index on t2(id); But if we add a join, we fall into a sec scan without options: explain select min(t1.mod) from t1 left join t2 on t1.id = t2.id; postgres=# explain select min(t1.mod) from t1 left join t2 on t1.id = t2.id; QUERY PLAN - Aggregate (cost=1693.00..1693.01 rows=1 width=32) -> Seq Scan on t1 (cost=0.00..1443.00 rows=10 width=32) I have implemented a patch that solves this problem - allowing to consider and join expressions for trial optimization. I am glad for feedback and review! -- Regards, Alena Rybakina Postgres Professional From eb8fe49f68e198217a8f3e92ee424ff897f9e21e Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Thu, 27 Feb 2025 11:26:44 +0300 Subject: [PATCH] Apply min_max_transformation for eliminated join relations after successfuly applying join removal optimization. --- src/backend/optimizer/plan/planagg.c | 14 +++-- src/test/regress/expected/aggregates.out | 40 src/test/regress/sql/aggregates.sql | 17 ++ 3 files changed, 69 insertions(+), 2 deletions(-) diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index 64605be3178..b1d44987fb1 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -125,9 +125,19 @@ preprocess_minmax_aggregates(PlannerInfo *root) return; jtnode = linitial(jtnode->fromlist); } - if (!IsA(jtnode, RangeTblRef)) + + /* + * Let's consider applying optimization to the remaining + * single relations after join removal optimization. + * To do this, we need to make sure that fromlist is empty and + * quals contains only one RTE relation.aggs_list. 
+ */ + if (IsA(jtnode, JoinExpr) && jtnode->fromlist == NIL && IsA(jtnode->quals, RangeTblRef)) + rtr = (RangeTblRef *) (jtnode->quals); + else if (IsA(jtnode, RangeTblRef)) + rtr = (RangeTblRef *) jtnode; + else return; - rtr = (RangeTblRef *) jtnode; rte = planner_rt_fetch(rtr->rtindex, root); if (rte->rtekind == RTE_RELATION) /* ordinary relation, ok */ ; diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index 8c4f8ce27ed..79fe2fcce4d 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -1329,6 +1329,46 @@ from int4_tbl t0; (5 rows) rollback; +--Check min/max optimization for join expressions +create unique index tenk1_unique1_idx on tenk1(unique1); +create index tenk1_unique2_idx on tenk1(unique2); +create unique index INT4_TBL_f1_idx on INT4_TBL(f1); +explain (COSTS OFF) +select min(t1.unique2) from tenk1 t1 left join INT4_TBL t2 on t1.unique1 = t2.f1; + QUERY PLAN +-- + Result + InitPlan 1 + -> Limit + -> Index Scan using tenk1_unique2_idx on tenk1 t1 + Index Cond: (unique2 IS NOT NULL) +(5 rows) + +explain (COSTS OFF) +select min(t1.unique2) from tenk1 t1 inner join INT4_TBL t2 on t1.unique1 = t2.f1; + QUERY PLAN + + Aggregate + -> Nested Loop + -> Seq Scan on int4_tbl t2 + -> Index Scan using tenk1_unique1_idx on tenk1 t1 + Index Cond: (unique1 = t2.f1) +(5 rows) + +explain (COSTS OFF) +select min(t1.unique2) from tenk1 t1 left outer join INT4_TBL t2 on t1.unique1 = t2.f1; + QUERY PLAN +-- + Result + InitPlan 1 + -> Limit + -> Index Scan using tenk1_unique2_idx on tenk1 t1 + Index Cond: (unique2 IS NOT NULL) +(5 rows) + +DROP INDEX tenk1_unique1_idx; +DROP INDEX
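Until something like this patch lands, the query can be rewritten by hand as a stopgap: because the LEFT JOIN is on t2's unique key it can neither add nor remove t1 rows, so the aggregate over the join equals the aggregate over t1 alone, and the rewrite keeps the index-backed MIN/MAX path available. A minimal sketch reusing the tables above:

-- equivalent to the joined query, since the removable join cannot change the result
EXPLAIN SELECT min(mod) FROM t1;

-- the aggregate can also be hoisted into an InitPlan explicitly
EXPLAIN SELECT (SELECT min(mod) FROM t1) AS min_mod;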
Re: Changing shared_buffers without restart
> On Tue, Feb 25, 2025 at 10:52:05AM GMT, Dmitry Dolgov wrote: > > On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote: > > TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via > > changing shared memory mapping layout. Any feedback is appreciated. > > Hi, > > Here is a new version of the patch, which contains a proposal about how to > coordinate shared memory resizing between backends. The rest is more or less > the same, a feedback about coordination is appreciated. It's a lot to read, > but > the main difference is about: Just one note, there are still a couple of compilation warnings in the code, which I haven't addressed yet. Those will go away in the next version.
Re: Enhances pg_createsubscriber documentation for the -d option.
On Wed, Feb 26, 2025 at 4:48 PM vignesh C wrote: > > On Wed, 26 Feb 2025 at 14:35, Amit Kapila wrote: > > > > How about changing it slightly as follows to make it clear: "If > > -doption is not provided, the database name will be > > obtained from -P. If the database name is not > > specified in either the -d option or > > -P, an error will be reported."? > > Looks good to me, here is an updated v2 version with the changes. > Pushed. -- With Regards, Amit Kapila.
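For illustration, the documented behavior looks like this on the command line (the data directory and connection strings below are made up):

# no -d: the database name is taken from the dbname in -P
pg_createsubscriber -D /var/lib/postgresql/standby \
  -P "host=primary.example.com dbname=app_db"

# error: no -d, and no dbname in -P either
pg_createsubscriber -D /var/lib/postgresql/standby \
  -P "host=primary.example.com"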
RE: long-standing data loss bug in initial sync of logical replication
On Monday, February 24, 2025 5:50 PM Amit Kapila wrote: > > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada > wrote: > > > > I confirmed that the proposed patch fixes these issues. I have one > > question about the patch: > > > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the > > following code: > > > >/* > > * If we don't have a base snapshot yet, there are no changes in this > > * transaction which in turn implies we don't yet need a snapshot at > > * all. We'll add a snapshot when the first change gets queued. > > * > > * NB: This works correctly even for subtransactions because > > * ReorderBufferAssignChild() takes care to transfer the base > snapshot > > * to the top-level transaction, and while iterating the changequeue > > * we'll get the change from the subtxn. > > */ > >if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid)) > >continue; > > > > Is there any case where we need to distribute inval messages to > > transactions that don't have the base snapshot yet but eventually need > > the inval messages? > > > > Good point. It is mentioned that for snapshots: "We'll add a snapshot > when the first change gets queued.". I think we achieve this via > builder->committed.xip array such that when we set a base snapshot for > a transaction, we use that array to form a snapshot. However, I don't > see any such consideration for invalidations. Now, we could either > always add invalidations to xacts that don't have base_snapshot yet or > have a mechanism similar committed.xid array. But it is better to > first reproduce the problem. I think distributing invalidations to a transaction that has not yet built a base snapshot is un-necessary. This is because, during the process of building its base snapshot, such a transaction will have already recorded the XID of the transaction that altered the publication information into its array of committed XIDs. Consequently, it will reflect the latest changes in the catalog from the beginning. In the context of logical decoding, this scenario is analogous to decoding a new transaction initiated after the catalog-change transaction has been committed. The original issue arises because the catalog cache was constructed using an outdated snapshot that could not reflect the latest catalog changes. However, this is not a problem in cases without a base snapshot. Since the existing catalog cache should have been invalidated upon decoding the committed catalog-change transaction, the subsequent transactions will construct a new cache with the latest snapshot. I also considered the scenario where only a sub-transaction has a base snapshot that has not yet been transferred to its top-level transaction. However, I think this is not problematic because a sub-transaction transfers its snapshot immediately upon building it (see ReorderBufferSetBaseSnapshot). The only exception is if the sub-transaction is independent (i.e., not yet associated with its top-level transaction). In such a case, the sub-transaction is treated as a top-level transaction, and invalidations will be distributed to this sub-transaction after applying the patch which is sufficient to resolve the issue. Considering the complexity of this topic, I think it would be better to add some comments like the attachment Best Regards, Hou zj 0001-add-comments-for-txns-without-base-snapshot.patch Description: 0001-add-comments-for-txns-without-base-snapshot.patch
Re: Race condition in replication slot usage introduced by commit f41d846
On Wed, Feb 26, 2025 at 7:18 PM Amit Kapila wrote: > > On Wed, Feb 26, 2025 at 5:40 PM Nisha Moond wrote: > > > > Attached the patch v1 fixing this issue. > > > > Issue reported by: Hou Zhijie. > > > > Thanks for the report and patch. I'll look into it. > Pushed the patch with a slightly different comment. -- With Regards, Amit Kapila.
Re: Restrict copying of invalidated replication slots
On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada wrote: > > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila wrote: > > > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with > > this operation. So, isn't it possible that the source slot exists at > > the later position in ReplicationSlotCtl->replication_slots and the > > loop traversing slots is already ahead from the point where the newly > > copied slot is created? > > Good point. I think it's possible. > > > If so, the newly created slot won't be marked > > as invalid whereas the source slot will be marked as invalid. I agree > > that even in such a case, at a later point, the newly created slot > > will also be marked as invalid. > > The wal_status of the newly created slot would immediately become > 'lost' and the next checkpoint will invalidate it. Do we need to do > something to deal with this case? > + /* Check if source slot became invalidated during the copy operation */ + if (second_slot_contents.data.invalidated != RS_INVAL_NONE) + ereport(ERROR, + errmsg("cannot copy replication slot \"%s\"", +NameStr(*src_name)), + errdetail("The source replication slot was invalidated during the copy operation.")); How about adding a similar test as above for MyReplicationSlot? That should suffice the need because we have already acquired the new slot by this time and invalidation should signal this process before marking the new slot as invalid. -- With Regards, Amit Kapila.
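A minimal sketch of that extra check, mirroring the hunk quoted above (not a committed change; the exact placement and message wording would need review):

+ /* Also check the newly created slot we hold, in case it was invalidated */
+ if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
+     ereport(ERROR,
+             errmsg("cannot copy replication slot \"%s\"",
+                    NameStr(*src_name)),
+             errdetail("The newly created replication slot was invalidated during the copy operation."));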
Re: Draft for basic NUMA observability
Hi Jakub, On Wed, Feb 26, 2025 at 09:48:41AM +0100, Jakub Wartak wrote: > On Mon, Feb 24, 2025 at 5:11 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Mon, Feb 24, 2025 at 09:06:20AM -0500, Andres Freund wrote: > > > Does the issue with "new" backends seeing pages as not present exist both > > > with > > > and without huge pages? > > > > That's a good point and from what I can see it's correct with huge pages > > being > > used (it means all processes see the same NUMA node assignment regardless of > > access patterns). > > Hi Bertrand , please see that nearby thread. I've got quite the > opposite results. I need page-fault memory or I get invalid results > ("-2"). What kernel version are you using ? (I've tried it on two > 6.10.x series kernels , virtualized in both cases; one was EPYC [real > NUMA, but not VM so not real hardware]) Thanks for sharing your numbers! It looks like that with hp enabled then the shared_buffers plays a role. 1. With hp, shared_buffers 4GB: huge_pages_status --- on (1 row) shared_buffers 4GB (1 row) NOTICE: os_page_count=2048 os_page_size=2097152 pages_per_blk=0.003906 numa_zone_id | count --+ | 507618 0 | 1054 -2 | 15616 (3 rows) 2. With hp, shared_buffers 23GB: huge_pages_status --- on (1 row) shared_buffers 23GB (1 row) NOTICE: os_page_count=11776 os_page_size=2097152 pages_per_blk=0.003906 numa_zone_id | count --+- | 2997974 0 | 16682 (2 rows) 3. no hp, shared_buffers 23GB: huge_pages_status --- off (1 row) shared_buffers 23GB (1 row) ERROR: extension "pg_buffercache" already exists NOTICE: os_page_count=6029312 os_page_size=4096 pages_per_blk=2.00 numa_zone_id | count --+- | 2997975 -2 | 16482 1 | 199 (3 rows) Maybe the kernel is taking some decisions based on the HugePages_Rsvd, I've no ideas. Anyway there is little than we can do and the "touchpages" patch seems to provide "accurate" results. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Draft for basic NUMA observability
On Wed, Feb 26, 2025 at 6:13 PM Bertrand Drouvot wrote: [..] > > Meanwhile v5 is attached with slight changes to try to make cfbot happy: > > Thanks for the updated version! > > FWIW, I had to do a few changes to get an error free compiling experience with > autoconf/or meson and both with or without the libnuma configure option. > > Sharing here as .txt files: > > Also the pg_buffercache test fails without the libnuma configure option. Maybe > some tests should depend of the libnuma configure option. [..] Thank you so much for this, Bertrand! I've applied those, played a bit with configure and meson, reproduced the test error, and fixed it by silencing that NOTICE in the tests. So v6 is attached even before I get a chance to start using that CI. Still waiting for some input and tests regarding the earlier touchpages attempt; docs are still missing... -J. v6-0003-Add-pg_shmem_numa_allocations-to-show-NUMA-zones-.patch Description: Binary data v6-0002-Extend-pg_buffercache-with-new-view-pg_buffercach.patch Description: Binary data v6-0001-Add-optional-dependency-to-libnuma-for-basic-NUMA.patch Description: Binary data
Re: Fix api misuse (src/bin/pg_amcheck/pg_amcheck.c)
On Thu, Feb 27, 2025 at 02:08, Michael Paquier wrote: > On Wed, Feb 26, 2025 at 07:04:25PM +0530, vignesh C wrote: > > On Tue, 18 Feb 2025 at 18:18, Ranier Vilela wrote: > >> Similar to commit 5b94e27 [1] > >> The correct function when freeing memory in the frontend, > >> using the function PQescapeIdentifier, must be > >> PQfreemem. > >> > >> Trivial patch attached. > > > > Thanks, this change looks good to me. > > There's no point in leaving that unfixed in pg_amcheck, so done down > to 14 with 48e4ae9a0707. > Thank you, Michael. best regards, Ranier Vilela
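For readers less familiar with the libpq rule involved: strings returned by PQescapeIdentifier() (and PQescapeLiteral()) are allocated by libpq and must be released with PQfreemem(), not free(). A small self-contained sketch (the connection string is a placeholder):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	char	   *ident;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	ident = PQescapeIdentifier(conn, "my table", strlen("my table"));
	if (ident != NULL)
	{
		printf("escaped: %s\n", ident);
		PQfreemem(ident);		/* correct; plain free() would be API misuse */
	}

	PQfinish(conn);
	return 0;
}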
Re: Replace IN VALUES with ANY in WHERE clauses during optimization
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Hi Ivan Kush, I tested the patch with `commands.sql` and observed noticeable improvements in planning and execution time, especially with multiple tables. Even single-table queries show small time reductions (0.02–0.04 ms). The patch optimizes `IN` clauses effectively, particularly with `VALUES`. For example, `col IN (VALUES ('a'), ('b'), ('c'))` now behaves similarly to `col IN ('a', 'b', 'c')`, leading to faster execution and reduced planning overhead. Regards, Postgresql Contributors - NewtGlobal
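To make the comparison concrete, this is the kind of query pair being compared (the table and literals are made up, and the resulting plans depend on the patch version):

EXPLAIN SELECT * FROM t WHERE col IN (VALUES ('a'), ('b'), ('c'));
-- with the patch, planned much like the equivalent array form
EXPLAIN SELECT * FROM t WHERE col = ANY (ARRAY['a', 'b', 'c']);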
Re: NOT ENFORCED constraint feature
On 2025-Feb-27, Amul Sul wrote: > Attached is the rebased patch set against the latest master head, > which also includes a *new* refactoring patch (0001). In this patch, > I’ve re-added ATExecAlterChildConstr(), which is required for the main > feature patch (0008) to handle recursion from different places while > altering enforceability. I think you refer to ATExecAlterConstrEnforceability, which claims (falsely) that it is a subroutine to ATExecAlterConstrRecurse; in reality it is called from ATExecAlterConstraintInternal or at least that's what I see in your 0008. So I wonder if you haven't confused yourself here. If nothing else, that comment needs to be fixed. I didn't review these patches. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Porque francamente, si para saber manejarse a uno mismo hubiera que rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)
Re: New "raw" COPY format
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Hi Joel, After testing the patch, I observed that for single-column tables, the format evolved from SINGLE to RAW and finally to LIST to handle diverse data more flexibly. For example, the command: \COPY test.foo2 FROM '/home/newtdba/postgres-cf-5300/testfiles/testname.txt' WITH (FORMAT LIST); works with CSV, TXT, and RAW files without specifying column names. This LIST format is effective for copying data to/from single-column tables but requires specifying the correct format. The new status of this patch is: Needs review
Re: SQLFunctionCache and generic plans
Pavel Stehule писал(а) 2025-02-26 22:34: hI I can confirm 60% speedup for execution of function fx and fx3 - both functions are very primitive, so for real code the benefit can be higher Unfortunately, there is about 5% slowdown for inlined code, and for just plpgsql code too. I tested fx4 create or replace function fx4(int) returns int immutable as $$ begin return $1 + $1; end $$ language plpgsql; and fx2 create or replace function fx2(int) returns int as $$ select 2 * $1; $$ language sql immutable; and execution of patched code is about 5% slower. It is strange so this patch has a negative impact on plpgsql execution. Regards Pavel Hi. I've tried to reproduce slowdown and couldn't. create or replace function fx4(int) returns int immutable as $$ begin return $1 + $1; end $$ language plpgsql; do $$ begin for i in 1..500 loop perform fx4((random()*100)::int); -- or fx2 end loop; end; $$; OLD results: Time: 8268.614 ms (00:08.269) Time: 8178.836 ms (00:08.179) Time: 8306.694 ms (00:08.307) New (patched) results: Time: 7743.945 ms (00:07.744) Time: 7803.109 ms (00:07.803) Time: 7736.735 ms (00:07.737) Not sure why new is faster (perhaps, some noise?) - looking at perf flamegraphs I don't see something evident. create or replace function fx2(int) returns int as $$ select 2 * $1; $$ language sql immutable; do $$ begin for i in 1..500 loop perform fx2((random()*100)::int); -- or fx2 end loop; end; $$; OLD results: Time: 5346.471 ms (00:05.346) Time: 5359.222 ms (00:05.359) Time: 5316.747 ms (00:05.317) New (patched) results: Time: 5188.363 ms (00:05.188) Time: 5225.322 ms (00:05.225) Time: 5203.667 ms (00:05.204) -- Best regards, Alexander Pyhalov, Postgres Professional
Re: psql \dh: List High-Level (Root) Tables and Indexes
Thanks Vignesh for the review! > Currently we are supporting only PG13 and higher versions. I understand that servers older than PG13 are no longer supported. But on the client side, we still have this notice at the top of describe.c file, which indicates that the client should support 9.2+. * Support for the various \d ("describe") commands. Note that the current * expectation is that all functions in this file will succeed when working * with servers of versions 9.2 and up. It's okay to omit irrelevant * information for an old server, but not to fail outright. (But failing * against a pre-9.2 server is allowed.) I'm just following the instructions here so as not to break anything unwanted, and you can see for instance \dP is doing the same. That said, I'm totally fine with removing the "if" from my patch, but first I think a committer should update the above comment to the least supported version for client code. Best Regards, Sadeq Dousti
Re: new commitfest transition guidance
On 2025-02-27 Th 12:16 AM, Michael Paquier wrote: On Mon, Feb 03, 2025 at 12:22:52PM +0100, Peter Eisentraut wrote: CF 2025-01 has just ended, so I suggest that everyone try this now. We can check in perhaps two weeks whether this results in lots of stuff falling through the cracks or still too much stuff with unclear status being moved forward, and then see what that might mean going forward. CF 2025-03 is to begin in more or less 48 hours, and we have still a grand total of 72 patches still listed in CF 2025-01: https://commitfest.postgresql.org/51/ It's a good score, as 286 patches have been moved without doing any kind of massive bulky and manual vacuum work on all these entries. As these have not been moved by their respective authors and/or reviewers, perhaps, based on the guidance I am reading from this thread, it would be time to give up on these rather than move them around? There are 4 marked "Ready For Committer" - all authored by committers :-) Maybe the message isn't getting through. After I got the email, I moved one yesterday that I am listed on as an author (although I'm not really), but the real author had not moved it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: SQLFunctionCache and generic plans
čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov < a.pyha...@postgrespro.ru> napsal: > Pavel Stehule писал(а) 2025-02-26 22:34: > > hI > > > > I can confirm 60% speedup for execution of function fx and fx3 - both > > functions are very primitive, so for real code the benefit can be > > higher > > > > Unfortunately, there is about 5% slowdown for inlined code, and for > > just plpgsql code too. > > > > I tested fx4 > > > > create or replace function fx4(int) returns int immutable as $$ begin > > return $1 + $1; end $$ language plpgsql; > > > > and fx2 > > > > create or replace function fx2(int) returns int as $$ select 2 * $1; > > $$ > > language sql immutable; > > > > and execution of patched code is about 5% slower. It is strange so > > this patch has a negative impact on plpgsql execution. > > > > Regards > > > > Pavel > > Hi. I've tried to reproduce slowdown and couldn't. > > create or replace function fx4(int) returns int immutable as $$ begin > return $1 + $1; end $$ language plpgsql; > > do $$ > begin >for i in 1..500 loop > perform fx4((random()*100)::int); -- or fx2 >end loop; > end; > $$; > > OLD results: > Time: 8268.614 ms (00:08.269) > Time: 8178.836 ms (00:08.179) > Time: 8306.694 ms (00:08.307) > > New (patched) results: > Time: 7743.945 ms (00:07.744) > Time: 7803.109 ms (00:07.803) > Time: 7736.735 ms (00:07.737) > > Not sure why new is faster (perhaps, some noise?) - looking at perf > flamegraphs I don't see something evident. > > create or replace function fx2(int) returns int as $$ select 2 * $1; $$ > language sql immutable; > do $$ > begin >for i in 1..500 loop > perform fx2((random()*100)::int); -- or fx2 >end loop; > end; > $$; > > OLD results: > Time: 5346.471 ms (00:05.346) > Time: 5359.222 ms (00:05.359) > Time: 5316.747 ms (00:05.317) > > New (patched) results: > Time: 5188.363 ms (00:05.188) > Time: 5225.322 ms (00:05.225) > Time: 5203.667 ms (00:05.204) > I'll try to get profiles. Regards Pavel > > -- > Best regards, > Alexander Pyhalov, > Postgres Professional >
Add support for EXTRA_REGRESS_OPTS for meson
Hi, We use EXTRA_REGRESS_OPTS to make sure the whole test suite passes with our extension loaded and since I prefer develop in meson over using autotools and make the lack of support for EXTRA_REGRESS_OPTS in meson has bugged me for a while. I have implemented support for it as an environment variable we read in the testwrap script instead of adding it as a configuration option to meson.build. The reason for this is that I do not like having to run "meson reconfigure" all the time plus that for the PG_TEST_EXTRA we ended up having to add an environment variable anyway. To use this run e.g. the following: EXTRA_REGRESS_OPTS="--load-extension=pgcrypto" meson test Question: Would it make sense to rename it to PG_REGRESS_EXTRA_OPTS or something similar while we already touch touch this code to make the various options easier to remember? Andreas From 87ce622a19315b679bbd5691e01c96261bc0c4c8 Mon Sep 17 00:00:00 2001 From: Andreas Karlsson Date: Thu, 27 Feb 2025 00:23:44 +0100 Subject: [PATCH] meson: Add support for EXTRA_REGRESS_OPTS Add support for the EXTRA_REGRESS_OPTS environment variable in meson which works just like with make and applies to all regress, ecpg and isolation tests. TAP tests which support it under make will continue to support the option. Run it with e.g: EXTRA_REGRESS_OPTS="--temp-config=test.conf" meson test --- src/tools/testwrap | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/tools/testwrap b/src/tools/testwrap index 8ae8fb79ba7..f52ca376da1 100755 --- a/src/tools/testwrap +++ b/src/tools/testwrap @@ -2,6 +2,7 @@ import argparse import shutil +import shlex import subprocess import os import sys @@ -51,7 +52,12 @@ env_dict = {**os.environ, if "PG_TEST_EXTRA" not in env_dict and args.pg_test_extra: env_dict["PG_TEST_EXTRA"] = args.pg_test_extra -sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE) +if args.testname in ['regress', 'isolation', 'ecpg'] and 'EXTRA_REGRESS_OPTS' in env_dict: +test_command = args.test_command + shlex.split(env_dict['EXTRA_REGRESS_OPTS']) +else: +test_command = args.test_command + +sp = subprocess.Popen(test_command, env=env_dict, stdout=subprocess.PIPE) # Meson categorizes a passing TODO test point as bad # (https://github.com/mesonbuild/meson/issues/13183). Remove the TODO # directive, so Meson computes the file result like Perl does. This could -- 2.47.2
Re: Add support for EXTRA_REGRESS_OPTS for meson
Hi, On 2025-02-27 13:53:17 +0100, Andreas Karlsson wrote: > We use EXTRA_REGRESS_OPTS to make sure the whole test suite passes with our > extension loaded and since I prefer develop in meson over using autotools > and make the lack of support for EXTRA_REGRESS_OPTS in meson > has bugged me for a while. Yep, we should add support for that. TEMP_CONFIG probably too. > Question: Would it make sense to rename it to PG_REGRESS_EXTRA_OPTS or > something similar while we already touch touch this code to make the various > options easier to remember? I'd not tackle that at the same time personally. > @@ -51,7 +52,12 @@ env_dict = {**os.environ, > if "PG_TEST_EXTRA" not in env_dict and args.pg_test_extra: > env_dict["PG_TEST_EXTRA"] = args.pg_test_extra > > -sp = subprocess.Popen(args.test_command, env=env_dict, > stdout=subprocess.PIPE) > +if args.testname in ['regress', 'isolation', 'ecpg'] and > 'EXTRA_REGRESS_OPTS' in env_dict: > +test_command = args.test_command + > shlex.split(env_dict['EXTRA_REGRESS_OPTS']) > +else: > +test_command = args.test_command > + > +sp = subprocess.Popen(test_command, env=env_dict, stdout=subprocess.PIPE) > # Meson categorizes a passing TODO test point as bad > # (https://github.com/mesonbuild/meson/issues/13183). Remove the TODO > # directive, so Meson computes the file result like Perl does. This could I hacked up something similar before, for TEMP_CONFIG, and found that I needed to do something like this: +if 'TEMP_CONFIG' in os.environ and \ +args.testname in ['regress', 'isolation', 'ecpg']: +# be careful to insert before non-option args, otherwise it'll fail +# e.g. on windows +args.test_command.insert(1, '--temp-config='+os.environ['TEMP_CONFIG']) Greetings, Andres Freund
Re: Small memory fixes for pg_createsubcriber
On Thu, Feb 27, 2025 at 02:51, Michael Paquier wrote: > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote: > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void) > > locale->db_locale, > > strlen(locale->db_locale)); > > else > > - datlocale_literal = pg_strdup("NULL"); > > + datlocale_literal = PQescapeLiteral(conn_new_template1, > > + "NULL", > > + strlen("NULL")); > > Yeah, I've considered that but hardcoding NULL twice felt a bit weird, > as well. Perhaps it's slightly cleaner to use an intermediate > variable given then as an argument of PQescapeLiteral()? > Yeah, I also think it would look good like this. v1 attached. best regards, Ranier Vilela v1-avoid-mix-api-pg_upgrade.patch Description: Binary data
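For reference, the intermediate-variable variant under discussion would look roughly like this (a sketch only; the variable name is made up and the attached v1 patch may differ):

	else
	{
		const char *fallback = "NULL";

		datlocale_literal = PQescapeLiteral(conn_new_template1,
											fallback,
											strlen(fallback));
	}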
Re: Serverside SNI support in libpq
> On 24 Feb 2025, at 22:51, Jacob Champion > wrote: > > On Wed, Feb 19, 2025 at 3:13 PM Daniel Gustafsson wrote: >> Are there any blockers for getting this in? > >> + SSL_context = ssl_init_context(isServerStart, host); > > I'm still not quite following the rationale behind the SSL_context > assignment. To maybe illustrate, attached are some tests that I > expected to pass, but don't. > > After adding an additional host and reloading the config, the behavior > of the original fallback host seems to change. Am I misunderstanding > the designed fallback behavior, have I misdesigned my test, or is this > a bug? Thanks for the tests; they did in fact uncover a bug in how fallback was handled, which is now fixed. In doing so I revamped how the default context handling is done: it now always uses the GUCs in postgresql.conf for consistency. The attached v6 rebase contains this fix, your tests, and general cleanup and comment work. -- Daniel Gustafsson v6-0001-Serverside-SNI-support-for-libpq.patch Description: Binary data
Re: Logging which local address was connected to in log_line_prefix
On Mon, Nov 18, 2024 at 10:07 AM Jim Jones wrote: > 2024-11-18 16:00:42.720 CET [3135117] -> 192.168.178.27 STATEMENT: > ... > 2024-11-18 16:01:23.273 CET [3114980] -> [local] LOG: received SIGHUP, > ... 2024-11-18 16:01:46.769 CET [3114981] -> [local] LOG: checkpoint > Is it supposed to be like this? > Great question. I think "supposed to" is a bit of a stretch, but I presume it's the difference between a client connecting and using its connection information versus an already existing backend process, which is always going to be "local". Overall this makes sense, as that checkpoint example above is coming from the checkpointer background process at 3114981, not the backend process that happened to trigger it. And 3114981 has no way of knowing the details of the caller's connection. FWIW, the patch still applies cleanly to head as of 2/27/2025, so no rebase needed. Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support
Re: per backend WAL statistics
On Thu, Feb 27, 2025 at 07:47:09AM +, Bertrand Drouvot wrote: > That's how I did it initially but decided to move it to pgstat_backend.c. The > reason was that it's fully linked to "per backend" stats and that there is > no SQL api on top of it (while I think that's the case for almost all the ones > in pgstatfuncs.c). Thoughts? Okay by me with pgstat_fetch_stat_backend in parallel, why not exposing this part as well.. Perhaps that could be useful for some extension? I'd rather have out-of-core code do these lookups with the same sanity checks in place for the procnumber and slot lookups. The name was inconsistent with the rest of the file, so I have settled to a pgstat_fetch_stat_backend_by_pid() to be more consistent. A second thing is to properly initialize bktype if defined by the caller. > Makes sense. I did not had in mind to submit a new patch version (to at least > implement the above) without getting your final thoughts on your first > comment. > But since a rebase is needed anyway,then please find attached a new version. > It > just implements your last comment. Attached is a rebased version of the rest. -- Michael From 42fd2860e3f06d525936187f46aad06892073078 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 6 Jan 2025 10:00:00 + Subject: [PATCH v12] per backend WAL statistics Now that commit 9aea73fc61 added backend-level statistics to pgstats (and per backend IO statistics) we can more easily add per backend statistics. This commit adds per backend WAL statistics using the same layer as pg_stat_wal, except that it is now possible to know how much WAL activity is happening in each backend rather than an overall aggregate of all the activity. A function called pg_stat_get_backend_wal() is added to access this data depending on the PID of a backend. The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes are not included in this set of statistics. XXX: bump catalog version XXX: bump of stats file format not required, as backend stats do not persist on disk. 
--- src/include/catalog/pg_proc.dat | 7 +++ src/include/pgstat.h| 37 ++-- src/include/utils/pgstat_internal.h | 3 +- src/backend/utils/activity/pgstat_backend.c | 64 + src/backend/utils/activity/pgstat_wal.c | 1 + src/backend/utils/adt/pgstatfuncs.c | 26 - src/test/regress/expected/stats.out | 14 + src/test/regress/sql/stats.sql | 6 ++ doc/src/sgml/monitoring.sgml| 19 ++ 9 files changed, 156 insertions(+), 21 deletions(-) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index cd9422d0bacf..3e35f8b8e99a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5954,6 +5954,13 @@ proargmodes => '{o,o,o,o,o}', proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}', prosrc => 'pg_stat_get_wal' }, +{ oid => '8037', descr => 'statistics: backend WAL activity', + proname => 'pg_stat_get_backend_wal', provolatile => 'v', + proparallel => 'r', prorettype => 'record', proargtypes => 'int4', + proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}', + proargmodes => '{i,o,o,o,o,o}', + proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}', + prosrc => 'pg_stat_get_backend_wal' }, { oid => '6248', descr => 'statistics: information about WAL prefetching', proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't', provolatile => 'v', prorettype => 'record', proargtypes => '', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 4aad10b0b6d5..06359b9157d2 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -340,24 +340,6 @@ typedef struct PgStat_IO PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; } PgStat_IO; -typedef struct PgStat_Backend -{ - TimestampTz stat_reset_timestamp; - PgStat_BktypeIO io_stats; -} PgStat_Backend; - -/* - - * PgStat_BackendPending Non-flushed backend stats. - * - - */ -typedef struct PgStat_BackendPending -{ - /* - * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO. - */ - PgStat_PendingIO pending_io; -} PgStat_BackendPending; - typedef struct PgStat_StatDBEntry { PgStat_Counter xact_commit; @@ -500,6 +482,25 @@ typedef struct PgStat_WalStats TimestampTz stat_reset_timestamp; } PgStat_WalStats; +typedef struct PgStat_Backend +{ + TimestampTz stat_reset_timestamp; + PgStat_BktypeIO io_stats; + PgStat_WalCounters wal_counters; +} PgStat_Backend; + +/* - + * PgStat_BackendPending Non-flushed backend stats. + * - + */ +typedef struct PgStat_BackendPending +{ + /* + * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO. + */ + PgStat_PendingIO pending_io; +} PgStat_BackendPending; + /* * Functions in pgstat.c */ diff --git a/src/include/utils/pgstat_internal.h b/sr
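For reference, usage of the new function would look roughly like this (names taken from the pg_proc.dat entry in the patch above; nothing here is committed yet):

-- WAL activity of the current backend; any live backend PID can be passed
SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset
FROM pg_stat_get_backend_wal(pg_backend_pid());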
Re: Disabling vacuum truncate for autovacuum
On Tue, Feb 18, 2025 at 01:56:09PM -0600, Nathan Bossart wrote: > On Mon, Jan 27, 2025 at 03:38:39PM -0500, Robert Haas wrote: >> Also, how sure are we that turning this off globally is a solid idea? >> Off-hand, it doesn't sound that bad: there are probably situations in >> which autovacuum never truncates anything anyway just because the tail >> blocks of the relations always contain at least 1 tuple. But we should >> be careful not to add a setting that is far more likely to get people >> into trouble than to get them out of it. It would be good to hear what >> other people think about the risk vs. reward tradeoff in this case. > > My first reaction is that a global setting is probably fine most of the > time. I'm sure it's possible to get into bad situations if you try hard > enough, but that's not a unique characteristic. There are probably many > situations where the truncation is wasted effort because we'll just end up > extending the relation shortly afterwards, anyway. In any case, it's > already possible to achieve $SUBJECT with a trivial script that sets > storage parameters on all tables, so IMHO it would be silly to withhold a > global setting that does the same thing just on principle. I spent some time on this one today. A couple of notes: * Since the reloption is a Boolean, there isn't a good way to tell whether it is actually set for the table or if it just inherited the default value. This is important to know because we want the reloption to override the GUC. I considered a bunch of different ways to handle this, but everything felt like a cowboy hack. The cleanest cowboy hack I could come up with is an optional offset field in relopt_parse_elt that points to a variable that stores whether the option was explicitly set. * I didn't see a good GUC category for vacuum_truncate. I suppose we could create a new one, but for now I've just stashed it into the autovacuum one. Bikeshedding welcome. Thoughts? -- nathan >From e360b56acd1d3bd05c9df6cfc4586e51edce357e Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 27 Feb 2025 21:09:37 -0600 Subject: [PATCH v2 1/1] Add vacuum_truncate GUC. --- doc/src/sgml/config.sgml | 22 +++ doc/src/sgml/ref/create_table.sgml| 10 + doc/src/sgml/ref/vacuum.sgml | 3 ++- src/backend/access/common/reloptions.c| 11 -- src/backend/commands/vacuum.c | 17 ++ src/backend/utils/misc/guc_tables.c | 9 src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/reloptions.h | 1 + src/include/commands/vacuum.h | 1 + src/include/utils/rel.h | 1 + 10 files changed, 60 insertions(+), 16 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e55700f35b8..069ac35762f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -8934,6 +8934,28 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + vacuum_truncate (boolean) + +vacuum_truncate configuration parameter + + + + + Enables or disables vacuum to try to truncate off any empty pages at + the end of the table. The default value is true. + If true, VACUUM and autovacuum + do the truncation and the disk space for the truncated pages is + returned to the operating system. Note that the truncation requires + an ACCESS EXCLUSIVE lock on the table. The + TRUNCATE parameter of + VACUUM, if + specified, overrides the value of this parameter. The setting can be + overridden for individual tables by changing table storage parameters. 
+ + + + diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 0a3e520f215..3c2315b1a8e 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1688,15 +1688,7 @@ WITH ( MODULUS numeric_literal, REM - Enables or disables vacuum to try to truncate off any empty pages - at the end of this table. The default value is true. - If true, VACUUM and - autovacuum do the truncation and the disk space for - the truncated pages is returned to the operating system. - Note that the truncation requires ACCESS EXCLUSIVE - lock on the table. The TRUNCATE parameter - of VACUUM, if specified, overrides the value - of this option. + Per-table value for parameter. diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml index 971b1237d47..bd5dcaf86a5 100644 --- a/doc/src/sgml/ref/vacuum.sgml +++ b/doc/src/sgml/ref/vacuum.sgml @@ -265,7 +265,8 @@ VACUUM [ ( option [, ...] ) ] [ vacuum_truncate + and is the default unless + is set to false or
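To make the intended precedence concrete, a small sketch (the table name is made up; the GUC exists only in the patch above, while the reloption and the VACUUM option already exist):

ALTER SYSTEM SET vacuum_truncate = off;              -- proposed global default
SELECT pg_reload_conf();
ALTER TABLE queue_table SET (vacuum_truncate = on);  -- reloption overrides the GUC
VACUUM (TRUNCATE true) queue_table;                  -- explicit option overrides both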
Re: Confine vacuum skip logic to lazy_scan_skip
On Fri, Feb 28, 2025 at 11:58 AM Melanie Plageman wrote: > On Thu, Feb 27, 2025 at 1:08 PM Tom Lane wrote: > > I wonder if it'd be a good idea to add something like > > > > Assert(stream->distance == 1); > > Assert(stream->pending_read_nblocks == 0); > > Assert(stream->per_buffer_data_size == 0); > > + Assert(per_buffer_data == NULL); > > > > in read_stream_next_buffer. I doubt that this will shut Coverity > > up, but it would help to catch caller coding errors, i.e. passing > > a per_buffer_data pointer when there's no per-buffer data. > > I think this is a good stopgap. I was discussing adding this assert > off-list with Thomas and he wanted to detail his more ambitious plans > for type safety improvements in the read stream API. Less on the order > of a redesign and more like a separate read_stream_next_buffer()s for > when there is per buffer data and when there isn't. And a by-value and > by-reference version for the one where there is data. Here's what I had in mind. Is it better? From 68e9424b590051959142917459eb4ea074589b79 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 28 Feb 2025 10:48:29 +1300 Subject: [PATCH 1/2] Improve API for retrieving data from read streams. Dealing with the per_buffer_data argument to read_stream_next_buffer() has proven a bit clunky. Provide some new wrapper functions/macros: buffer = read_stream_get_buffer(rs); buffer = read_stream_get_buffer_and_value(rs, &my_int); buffer = read_stream_get_buffer_and_pointer(rs, &my_pointer_to_int); These improve readability and type safety via assertions. --- contrib/pg_prewarm/pg_prewarm.c | 4 +- contrib/pg_visibility/pg_visibility.c| 6 +-- src/backend/access/heap/heapam.c | 2 +- src/backend/access/heap/heapam_handler.c | 2 +- src/backend/access/heap/vacuumlazy.c | 6 +-- src/backend/storage/aio/read_stream.c| 12 + src/backend/storage/buffer/bufmgr.c | 4 +- src/include/storage/read_stream.h| 64 8 files changed, 87 insertions(+), 13 deletions(-) diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c index a2f0ac4af0c..f6ae266d7b0 100644 --- a/contrib/pg_prewarm/pg_prewarm.c +++ b/contrib/pg_prewarm/pg_prewarm.c @@ -208,11 +208,11 @@ pg_prewarm(PG_FUNCTION_ARGS) Buffer buf; CHECK_FOR_INTERRUPTS(); - buf = read_stream_next_buffer(stream, NULL); + buf = read_stream_get_buffer(stream); ReleaseBuffer(buf); ++blocks_done; } - Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer); + Assert(read_stream_get_buffer(stream) == InvalidBuffer); read_stream_end(stream); } diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c index 7f268a18a74..e7187a46c9d 100644 --- a/contrib/pg_visibility/pg_visibility.c +++ b/contrib/pg_visibility/pg_visibility.c @@ -556,7 +556,7 @@ collect_visibility_data(Oid relid, bool include_pd) Buffer buffer; Page page; - buffer = read_stream_next_buffer(stream, NULL); + buffer = read_stream_get_buffer(stream); LockBuffer(buffer, BUFFER_LOCK_SHARE); page = BufferGetPage(buffer); @@ -569,7 +569,7 @@ collect_visibility_data(Oid relid, bool include_pd) if (include_pd) { - Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer); + Assert(read_stream_get_buffer(stream) == InvalidBuffer); read_stream_end(stream); } @@ -752,7 +752,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen) 0); /* Loop over every block in the relation. 
*/ - while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer) + while ((buffer = read_stream_get_buffer(stream)) != InvalidBuffer) { bool check_frozen = all_frozen; bool check_visible = all_visible; diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index fa7935a0ed3..86f280069e0 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -609,7 +609,7 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir) scan->rs_dir = dir; - scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL); + scan->rs_cbuf = read_stream_get_buffer(scan->rs_read_stream); if (BufferIsValid(scan->rs_cbuf)) scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf); } diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index e78682c3cef..7487896b06c 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -1010,7 +1010,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream) * re-acquire sharelock for each tuple, but since we aren't doing much * work per tuple, the extra lock traffic is probably better avoided. */ - hscan->rs_cbuf = read_stream_next_buffer(stream, NULL); + hscan->rs_cbuf = read_stream_get_buffer(stream); if (!BufferIsValid(hscan
Re: Confine vacuum skip logic to lazy_scan_skip
On Fri, Feb 28, 2025 at 2:29 PM Thomas Munro wrote: > On Fri, Feb 28, 2025 at 11:58 AM Melanie Plageman > wrote: > > On Thu, Feb 27, 2025 at 1:08 PM Tom Lane wrote: > > > I wonder if it'd be a good idea to add something like > > > > > > Assert(stream->distance == 1); > > > Assert(stream->pending_read_nblocks == 0); > > > Assert(stream->per_buffer_data_size == 0); > > > + Assert(per_buffer_data == NULL); > > > > > > in read_stream_next_buffer. I doubt that this will shut Coverity > > > up, but it would help to catch caller coding errors, i.e. passing > > > a per_buffer_data pointer when there's no per-buffer data. > > > > I think this is a good stopgap. I was discussing adding this assert > > off-list with Thomas and he wanted to detail his more ambitious plans > > for type safety improvements in the read stream API. Less on the order > > of a redesign and more like a separate read_stream_next_buffer()s for > > when there is per buffer data and when there isn't. And a by-value and > > by-reference version for the one where there is data. > > Here's what I had in mind. Is it better? Here's a slightly better one. I think when you use read_stream_get_buffer_and_value(stream, &value), or read_stream_put_value(stream, space, value), then we should assert that sizeof(value) strictly matches the available space, as shown. But, new in v2, if you use read_stream_get_buffer_and_pointer(stream, &pointer), then sizeof(*pointer) should only have to be <= the storage space, not ==, because someone might plausibly want to make per_buffer_data_size variable at runtime (ie decide when they construct the stream), and then be able to retrieve a pointer to the start of a struct with a flexible array or something like that. In v1 I was just trying to assert that it was a pointer-to-a-pointer-to-something and no more (in a confusing compile-time assertion), but v2 is simpler, and is happy with a pointer to a pointer to something that doesn't exceed the space (run-time assertion). From b2dd9c90f970a889deea2c2e9e16097e4e06ece8 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 28 Feb 2025 10:48:29 +1300 Subject: [PATCH v2 1/2] Improve API for retrieving data from read streams. Dealing with the per_buffer_data argument to read_stream_next_buffer() has proven a bit clunky. Provide some new wrapper functions/macros: buffer = read_stream_get_buffer(rs); buffer = read_stream_get_buffer_and_value(rs, &my_int); buffer = read_stream_get_buffer_and_pointer(rs, &my_pointer_to_int); These improve readability and type safety via assertions. 
--- contrib/pg_prewarm/pg_prewarm.c | 4 +- contrib/pg_visibility/pg_visibility.c| 6 +-- src/backend/access/heap/heapam.c | 2 +- src/backend/access/heap/heapam_handler.c | 2 +- src/backend/access/heap/vacuumlazy.c | 6 +-- src/backend/storage/aio/read_stream.c| 12 ++ src/backend/storage/buffer/bufmgr.c | 4 +- src/include/storage/read_stream.h| 55 8 files changed, 78 insertions(+), 13 deletions(-) diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c index a2f0ac4af0c..f6ae266d7b0 100644 --- a/contrib/pg_prewarm/pg_prewarm.c +++ b/contrib/pg_prewarm/pg_prewarm.c @@ -208,11 +208,11 @@ pg_prewarm(PG_FUNCTION_ARGS) Buffer buf; CHECK_FOR_INTERRUPTS(); - buf = read_stream_next_buffer(stream, NULL); + buf = read_stream_get_buffer(stream); ReleaseBuffer(buf); ++blocks_done; } - Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer); + Assert(read_stream_get_buffer(stream) == InvalidBuffer); read_stream_end(stream); } diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c index 7f268a18a74..e7187a46c9d 100644 --- a/contrib/pg_visibility/pg_visibility.c +++ b/contrib/pg_visibility/pg_visibility.c @@ -556,7 +556,7 @@ collect_visibility_data(Oid relid, bool include_pd) Buffer buffer; Page page; - buffer = read_stream_next_buffer(stream, NULL); + buffer = read_stream_get_buffer(stream); LockBuffer(buffer, BUFFER_LOCK_SHARE); page = BufferGetPage(buffer); @@ -569,7 +569,7 @@ collect_visibility_data(Oid relid, bool include_pd) if (include_pd) { - Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer); + Assert(read_stream_get_buffer(stream) == InvalidBuffer); read_stream_end(stream); } @@ -752,7 +752,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen) 0); /* Loop over every block in the relation. */ - while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer) + while ((buffer = read_stream_get_buffer(stream)) != InvalidBuffer) { bool check_frozen = all_frozen; bool check_visible = all_visible; diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index fa7935a0ed3..86f280069e0 100644 --- a/src/backend/access/h
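Summarizing the proposed call shapes (names and assertion behavior are taken from the patch and the description above; stream construction is omitted and none of this is committed):

Buffer      buf;
int         value;      /* copied out of the per-buffer space */
int        *value_ptr;  /* points into the per-buffer space */

buf = read_stream_get_buffer(stream);                          /* no per-buffer data */
buf = read_stream_get_buffer_and_value(stream, &value);        /* asserts sizeof(value) matches the space exactly */
buf = read_stream_get_buffer_and_pointer(stream, &value_ptr);  /* asserts sizeof(*value_ptr) fits within the space */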
Re: Log connection establishment timings
Hi, On Thu, Feb 27, 2025 at 05:55:19PM -0500, Melanie Plageman wrote: > On Thu, Feb 27, 2025 at 11:30 AM Andres Freund wrote: > > > > On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote: > > > > > However, since that is a bigger project (with more refactoring, etc), > > > he suggested that we change log_connections to a GUC_LIST > > > (ConfigureNamesString) with options like "none", "received, > > > "authenticated", "authorized", "all". > > > > Yep. > > I've done a draft of this in attached v7 (see 0001). Thanks for the patch! > It still needs polishing (e.g. I need to figure out where to put the new guc > hook > functions and enums and such) yeah, I wonder if it would make sense to create new dedicated "connection_logging" file(s). >, but I want to see if this is a viable direction forward. > > I'm worried the list of possible connection log messages could get > unwieldy if we add many more. Thinking out loud, I'm not sure we'd add "many more" but maybe what we could do is to introduce some predefined groups like: 'basic' (that would be equivalent to 'received, + timings introduced in 0002') 'security' (equivalent to 'authenticated,authorized') Not saying we need to create this kind of predefined groups now, just an idea in case we'd add many more: that could "help" the user experience, or are you more concerned about the code maintenance? Just did a quick test, (did not look closely at the code), and it looks like that: 'all': does not report the "received" (but does report authenticated and authorized) 'received': does not work? 'authenticated': works 'authorized': works Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
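In postgresql.conf terms, the list-based proposal would look something like this (the stage names come straight from the discussion above and are not settled):

log_connections = 'received,authenticated,authorized'   # individual stages
log_connections = 'all'                                  # every stage
log_connections = 'none'                                 # log nothing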
Re: Extend postgres_fdw_get_connections to return remote backend pid
> For now, I think it makes sense to keep postgres_fdw_get_connections() > aligned with the current implementation. Otherwise, explaining what > remote_backend_pid = NULL means could be confusing, especially since > pipeline mode isn't supported yet in postgres_fdw. Makes sense. I have created a patch according to the above suggestions. Please review. -- Sagar Dilip Shedge, SDE AWS v2-0001-Extend-postgres_fdw_get_connections-to-return-rem.patch Description: Binary data
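Assuming the patch simply adds a column to the function's existing output (the column name below follows the thread; the final shape may differ), usage would be along these lines:

SELECT server_name, remote_backend_pid
FROM postgres_fdw_get_connections();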
Re: Adding support for SSLKEYLOGFILE in the frontend
Attached a v6 with O_NOFOLLOW removed, I noticed that it failed on windows. Please let me know if I should do this in any other way that is portable across platforms. Thanks On Thu, Feb 27, 2025 at 11:39 PM Abhishek Chanda wrote: > > Thanks for the review, Daniel. Please find v5 of this patch attached. > I tried to bump up the minimum OpenBSD version in installation.sgml, > do I need to add this anywhere else? Please let me know if this needs > anything else. > > Thanks > > On Thu, Feb 20, 2025 at 4:25 PM Daniel Gustafsson wrote: > > > > > On 20 Feb 2025, at 04:36, Abhishek Chanda wrote: > > > Please find attached a new version of this patch. I rebased on master, > > > added better error reporting and skipped the permissions check on > > > windows. Please let me know if this needs any changes. I tested this > > > locally using meson running all TAP tests. > > > > Thanks for the new version, a few comments on this one from reading: > > > > +./src/test/ssl/key.txt > > The toplevel .gitignore should not be used for transient test output, there > > is > > a .gitignore in src/test/ssl for that. The key.txt file should also be > > placed > > in PostgreSQL::Test::Utils::tempdir and then cleaned up after the test. > > > > > > + > xreflabel="sslkeylogfile"> > > The documentation should state that the file will use the NSS format. > > > > > > + if (log_file == NULL) { > > + libpq_append_conn_error(conn, "could not open ssl key log > > ... > > + } > > This raise an error for not being able to operate on the file, but it > > doesn't > > error out from the function and instead keeps going with more operations on > > the > > file which couldn't be opened. > > > > > > + if (chmod(conn->sslkeylogfile, 0600) == -1) { > > Rather than opening and try to set permissions separately, why not use > > open(2) > > and set the umask? We probably also want to set O_NOFOLLOW. > > > > > > + if (conn->sslkeylogfile && strlen(conn->sslkeylogfile) > 0) > > + SSL_CTX_set_keylog_callback(SSL_context, SSL_CTX_keylog_cb); > > This callback does exist in OpenSSL 1.1.1 but it's only available in > > LibreSSL > > from OpenBSD 7.1 and onwards where we support 7.0 as the earliest version. > > This feature seems like a good enough reason to bump the minimum LibreSSL > > version, and 7.1 is well out support (7.5 goes EOL next), but it would need > > to > > get done here. > > > > > > +# Skip file mode check on Windows > > +return 1 if $windows_os; > > It should be up to each individual test to determine what to skip, not the > > library code (and it should use SKIP:). src/test/ssl/t/001_ssltests.pl is > > an > > example which skips permissions checks on Windows. Also, I'm not sure we > > need > > a generic function in the testlibrary for something so basic. > > > > > > +print STDERR sprintf("%s mode must be %04o\n", > > + $path, $expected_file_mode); > > Test code should not print anything to stdout/stderr, it should use the TAP > > compliant methods like diag() etc for this. > > > > > > + "connect with server root certand sslkeylogfile=key.txt"); > > s/certand/cert and/ ? > > > > -- > > Daniel Gustafsson > > > > > -- > Thanks and regards > Abhishek Chanda -- Thanks and regards Abhishek Chanda v6-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch Description: Binary data
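For context, the intended usage is a plain libpq connection option (illustrative only; the parameter exists only in this patch). The named file receives TLS secrets in the NSS key log format, so it needs the same care as a private key and can be fed to tools such as Wireshark to decrypt captured traffic:

psql "host=db.example.com sslmode=require sslkeylogfile=/tmp/pg_tls_keys.txt"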
Re: Disabling vacuum truncate for autovacuum
On Thu, 2025-02-27 at 21:35 -0600, Nathan Bossart wrote: > I spent some time on this one today. A couple of notes: > > * Since the reloption is a Boolean, there isn't a good way to tell whether > it is actually set for the table or if it just inherited the default > value. This is important to know because we want the reloption to > override the GUC. I agree with that, without being able to think of a better solution. > * I didn't see a good GUC category for vacuum_truncate. I suppose we could > create a new one, but for now I've just stashed it into the autovacuum > one. Bikeshedding welcome. Why not use "Client Connection Defaults / Statement Behavior", just like for "vacuum_freeze_min_age"? I don't think that "autovacuum" is appropriate, since it applies to manual VACUUM as well. Yours, Laurenz Albe
Re: Restrict copying of invalidated replication slots
On Fri, Feb 28, 2025 at 5:10 AM Masahiko Sawada wrote: > > On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila wrote: > > > > On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada > > wrote: > > > > > > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila > > > wrote: > > > > > > > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with > > > > this operation. So, isn't it possible that the source slot exists at > > > > the later position in ReplicationSlotCtl->replication_slots and the > > > > loop traversing slots is already ahead from the point where the newly > > > > copied slot is created? > > > > > > Good point. I think it's possible. > > > > > > > If so, the newly created slot won't be marked > > > > as invalid whereas the source slot will be marked as invalid. I agree > > > > that even in such a case, at a later point, the newly created slot > > > > will also be marked as invalid. > > > > > > The wal_status of the newly created slot would immediately become > > > 'lost' and the next checkpoint will invalidate it. Do we need to do > > > something to deal with this case? > > > > > > > + /* Check if source slot became invalidated during the copy operation */ > > + if (second_slot_contents.data.invalidated != RS_INVAL_NONE) > > + ereport(ERROR, > > + errmsg("cannot copy replication slot \"%s\"", > > +NameStr(*src_name)), > > + errdetail("The source replication slot was invalidated during the > > copy operation.")); > > > > How about adding a similar test as above for MyReplicationSlot? That > > should suffice the need because we have already acquired the new slot > > by this time and invalidation should signal this process before > > marking the new slot as invalid. > > IIUC in the scenario you mentioned, the loop traversing slots already > passed the position of newly created slot in > ReplicationSlotCtl->replication_slots array, so > MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no? > Right, I don't have a simpler solution for this apart from making it somehow serialize this operation with InvalidateObsoleteReplicationSlots(). I don't think we need to go that far to handle the scenario being discussed. It is better to add a comment for this race and why it won't harm. -- With Regards, Amit Kapila.
Re: NOT ENFORCED constraint feature
On Thu, Feb 27, 2025 at 4:48 PM Álvaro Herrera wrote: > > On 2025-Feb-27, Amul Sul wrote: > > > Attached is the rebased patch set against the latest master head, > > which also includes a *new* refactoring patch (0001). In this patch, > > I’ve re-added ATExecAlterChildConstr(), which is required for the main > > feature patch (0008) to handle recursion from different places while > > altering enforceability. > > I think you refer to ATExecAlterConstrEnforceability, which claims > (falsely) that it is a subroutine to ATExecAlterConstrRecurse; in > reality it is called from ATExecAlterConstraintInternal or at least > that's what I see in your 0008. So I wonder if you haven't confused > yourself here. If nothing else, that comments needs fixed. I didn't > review these patches. > Yeah, that was intentional. I wanted to avoid recursion again by hitting ATExecAlterChildConstr() at the end of ATExecAlterConstraintInternal(). Also, I realized the value doesn’t matter since recurse = false is explicitly set inside the cmdcon->alterEnforceability condition. I wasn’t fully satisfied with how we handled the recursion decision (code design), so I’ll give it more thought. If I don’t find a better approach, I’ll add clearer comments to explain the reasoning. Regards, Amul
Re: Statistics Import and Export
> > +--- error: relation is wrong type > +SELECT pg_catalog.pg_restore_relation_stats( > +'relation', 0::oid, > +'relpages', 17::integer, > +'reltuples', 400.0::real, > +'relallvisible', 4::integer); > > Why do you need to specify all the stats (relpages, reltuples, etc)? > To exercise this you could just do: > select pg_catalog.pg_restore_relation_stats('relation', 0::oid); > In the above case, it's historical inertia in that the pg_set_* call required all those parameters, as well as a fear that the code - now or in the future - might evaluate "can anything actually change from this call" and short circuit out before actually trying to make sense of the reg_class oid. But we can assuage that fear with just one of the three stat parameters, and I'll adjust accordingly. > Since I haven't been following along with this feature development, I > don't think I can get comfortable enough with all of the changes in > this test diff to commit them. I can't really say if this is the set > of tests that is representative and sufficient for this feature. > That's fine, I hadn't anticipated that you'd review this patch, let alone commit it. > If you agree with me that the failure tests could be shorter, I'm > happy to commit that, but I don't really feel comfortable assessing > what the right set of full tests is. The set of tests is as short as I feel comfortable with. I'll give the parameter lists one more pass and repost.
Re: Log connection establishment timings
Hi, On Thu, Feb 27, 2025 at 11:14:56AM -0500, Andres Freund wrote: > I don't think the timing overhead is a relevant factor here - compared to the > fork of a new connection or performing authentication the cost of taking a few > timestamps is negligible. A timestamp costs 10s to 100s of cycles, a fork many > many millions. Even if you have a really slow timestamp function, it's still > going to be way way cheaper. That's a very good point; it puts things in perspective. The difference in scale is so significant that the timing collection shouldn't be a concern. Fair point! Now I'm wondering what would change if connections ran on a multi-threaded model instead. I think we would reach the same conclusion, since thread creation overhead is still substantial (allocating stack space, initializing thread state, and other kernel-level operations) compared with even a really slow timestamp function. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Licence preamble update
On Thu, Feb 27, 2025 at 04:56:05PM +, Dave Page wrote: > Per some brief discussion on the core list, the attached patch updates the > licence preamble to more accurately reflect the use of Postgres vs. > PostgreSQL (see https://www.postgresql.org/about/policies/project-name/ for > background from many years ago). > --- a/COPYRIGHT > +++ b/COPYRIGHT > @@ -1,5 +1,5 @@ > PostgreSQL Database Management System > -(formerly known as Postgres, then as Postgres95) > +(also known as Postgres, formerly as Postgres95) > > Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group I'm not seeing this change as aligned with https://www.postgresql.org/about/policies/project-name/, which says Postgres "is an alias or nickname and is not the official name of the project." The official product name did change Postgres -> Postgres95 -> PostgreSQL, with "Postgres" holding the status of a nickname since Postgres95 became the official name. Today's text matches that history, and the proposed text doesn't. Can you share more from the brief discussion? Changing a license file is an eyebrow-raising event, so we should do it only if the win is clear. There may be an argument for making this change, but I'm missing it currently.
Re: Reduce TupleHashEntryData struct size by half
On Thu, 13 Feb 2025 at 14:01, Jeff Davis wrote: > Attached a new patchset that doesn't change the API for > ExecCopySlotMinimalTuple(). Instead, I'm using > ExecFetchSlotMinimalTuple(), which avoids the extra memcpy if the slot > is TTSOpsMinimalTuple, which is what HashAgg uses. I separated out the > change to use ExecFetchSlotMinimalTuple() into 0004 to illustrate the > performance impact. Some review comments: * my_log2() takes a "long" parameter type but transitionSpace is a "Size". These types aren't the same width on all platforms we support. Maybe pg_nextpower2_size_t() is a better option. * Should the following have MAXALIGN(tupleSize) on the right side of the expression? tupleChunkSize = tupleSize I understand this was missing before, but both bump.c does consume at least MAXALIGN() in BumpAlloc(). * while (16 * maxBlockSize > work_mem * 1024L) The 1024L predates the change made in 041e8b95. "1024L" needs to be replaced with "(Size) 1024" Maybe you could just replace the while loop and the subsequent "if" check with: /* * Like CreateWorkExprContext(), use smaller sizings for smaller work_mem, * to avoid large jumps in memory usage. */ maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16); /* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */ maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE); /* and no smaller than ALLOCSET_DEFAULT_INITSIZE */ maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE); I believe that gives the same result without the looping. * In hash_create_memory(), can you get rid of the minContextSize and initBlockSize variables? * Is it worth an Assert() theck additionalsize > 0? * The amount of space available is the additionalsize requested in the call * to BuildTupleHashTable(). If additionalsize was specified as zero, no * additional space is available and this function should not be called. */ static inline void * TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry) { return (char *) entry->firstTuple - hashtable->additionalsize; } Benchmarks: I was also experimenting with the performance of this using the following test case: create table c (a int not null); insert into c select a from generate_Series(1,1000) a; vacuum freeze analyze c; Query: select a,count(*) from c group by a; Average TPS over 10x 10 second runs with -M prepared master: 3653.9853988 v7-0001: 3741.815979 v7-0001+0002: 3737.4313064 v7-0001+0002+0003: 3834.6271706 v7-0001+0002+0004+0004: 3912.1158887 I also tried out changing hash_agg_check_limits() so that it no longer calls MemoryContextMemAllocated and instead uses ->mem_allocated directly and with all the other patches got: v7-0001+0002+0004+0004+extra: 4049.0732381 We probably shouldn't do exactly that as it be better not to access that internal field from outside the memory context code. A static inline function in memutils.h that handles the non-recursive callers might be nice. I've attached my results of running your test in graph form. I also see a small regression for these small scale tests. I wondering if it's worth mocking up some code to see what the performance would be like without the additional memcpy() that's new to LookupTupleHashEntry_internal(). How about hacking something up that adds an additionalsize field to TupleDesc and then set that field to your additional size and have heap_form_minimal_tuple() allocate that much extra memory? David
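On the last point, the "static inline function in memutils.h that handles the non-recursive callers" could be as small as the sketch below; the function name is invented here, and mem_allocated is the field that MemoryContextMemAllocated() already reads.

    /* Sketch for memutils.h: non-recursive variant of
     * MemoryContextMemAllocated(), for hot paths such as
     * hash_agg_check_limits() that only care about a single context. */
    static inline Size
    MemoryContextMemAllocatedSelf(MemoryContext context)
    {
        return context->mem_allocated;
    }

hash_agg_check_limits() could then call this instead of the recursive function, without reaching into the memory context internals at its call site.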
Remove extra Sort node above a btree-compatible Index Scan
Hello hackers, While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1], I noticed some unexpected plan diffs when running the xtree and treeb test modules. Specifically, an additional Sort node appears on top of an Index Only Scan, even when the index key and sort key are the same. For example: --- src/test/modules/xtree/expected/interval.out +++ src/test/modules/xtree/results/interval.out @@ -375,10 +375,12 @@ SET enable_seqscan TO false; EXPLAIN (COSTS OFF) SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1; - QUERY PLAN - - Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1 -(1 row) +QUERY PLAN +-- + Sort + Sort Key: f1 + -> Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1 +(3 rows) I’ve attached a patch that removes this unnecessary Sort node for B-tree-compatible indexes. This change should: - Reduce the number of test failures in the xtree module from 43 to 30 - Reduce the size of regression.diffs from 234K to 123K Since pathkey comparison is a hot path in query planning and exercised by many test queries, I plan to gather more performance metrics. However, in a simple test running make installcheck with and without the patch, I observed no negative impact on the runtime of the regression test suite (which doesn’t include other btree-like indexes) and a positive impact on the same regression tests for xtree. Regression tests (same plans): w/o patch: make installcheck 1.36s user 2.21s system 12% cpu 28.018 total w/ patch: make installcheck 1.32s user 2.12s system 12% cpu 28.089 total xtree tests: w/o patch (inferior plan w/ extra sort node): make installcheck 1.52s user 2.42s system 10% cpu 36.817 total w/ patch (good plan): make installcheck 1.52s user 2.48s system 12% cpu 32.201 total I’ve marked the patch as no-cfbot, as it applies only on top of the aforementioned patch series [1]. Thoughts? [1] https://www.postgresql.org/message-id/a5dfb7cd-7a89-48ab-a913-e5304eee0854%40eisentraut.org Best, Alex v1-0001-Remove-unnecessary-Sort-node-above-Index-Scan-of-.patch.no-cfbot Description: Binary data
Re: Remove extra Sort node above a btree-compatible Index Scan
On Fri, Feb 28, 2025 at 12:23 AM Alexandra Wang wrote: > While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1], > I noticed some unexpected plan diffs when running the xtree and treeb > test modules. Specifically, an additional Sort node appears on top of > an Index Only Scan, even when the index key and sort key are the same. > For example: > > --- src/test/modules/xtree/expected/interval.out > +++ src/test/modules/xtree/results/interval.out What's the xtree module? There's no such test module? -- Peter Geoghegan
Re: Adding support for SSLKEYLOGFILE in the frontend
Thanks for the review, Daniel. Please find v5 of this patch attached. I tried to bump up the minimum OpenBSD version in installation.sgml, do I need to add this anywhere else? Please let me know if this needs anything else. Thanks On Thu, Feb 20, 2025 at 4:25 PM Daniel Gustafsson wrote: > > > On 20 Feb 2025, at 04:36, Abhishek Chanda wrote: > > Please find attached a new version of this patch. I rebased on master, > > added better error reporting and skipped the permissions check on > > windows. Please let me know if this needs any changes. I tested this > > locally using meson running all TAP tests. > > Thanks for the new version, a few comments on this one from reading: > > +./src/test/ssl/key.txt > The toplevel .gitignore should not be used for transient test output, there is > a .gitignore in src/test/ssl for that. The key.txt file should also be placed > in PostgreSQL::Test::Utils::tempdir and then cleaned up after the test. > > > + xreflabel="sslkeylogfile"> > The documentation should state that the file will use the NSS format. > > > + if (log_file == NULL) { > + libpq_append_conn_error(conn, "could not open ssl key log ... > + } > This raise an error for not being able to operate on the file, but it doesn't > error out from the function and instead keeps going with more operations on > the > file which couldn't be opened. > > > + if (chmod(conn->sslkeylogfile, 0600) == -1) { > Rather than opening and try to set permissions separately, why not use open(2) > and set the umask? We probably also want to set O_NOFOLLOW. > > > + if (conn->sslkeylogfile && strlen(conn->sslkeylogfile) > 0) > + SSL_CTX_set_keylog_callback(SSL_context, SSL_CTX_keylog_cb); > This callback does exist in OpenSSL 1.1.1 but it's only available in LibreSSL > from OpenBSD 7.1 and onwards where we support 7.0 as the earliest version. > This feature seems like a good enough reason to bump the minimum LibreSSL > version, and 7.1 is well out support (7.5 goes EOL next), but it would need to > get done here. > > > +# Skip file mode check on Windows > +return 1 if $windows_os; > It should be up to each individual test to determine what to skip, not the > library code (and it should use SKIP:). src/test/ssl/t/001_ssltests.pl is an > example which skips permissions checks on Windows. Also, I'm not sure we need > a generic function in the testlibrary for something so basic. > > > +print STDERR sprintf("%s mode must be %04o\n", > + $path, $expected_file_mode); > Test code should not print anything to stdout/stderr, it should use the TAP > compliant methods like diag() etc for this. > > > + "connect with server root certand sslkeylogfile=key.txt"); > s/certand/cert and/ ? > > -- > Daniel Gustafsson > -- Thanks and regards Abhishek Chanda v5-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch Description: Binary data
Re: Log connection establishment timings
Hi, On Thu, Feb 27, 2025 at 11:08:04AM -0500, Melanie Plageman wrote: > I was just talking to Andres off-list and he mentioned that the volume > of log_connections messages added in recent releases can really be a > problem for users. He said ideally we would emit one message which > consolidated these (and make sure we did so for failed connections too > detailing the successfully completed stages). > > However, since that is a bigger project (with more refactoring, etc), > he suggested that we change log_connections to a GUC_LIST > (ConfigureNamesString) with options like "none", "received, > "authenticated", "authorized", "all". > > Then we could add one like "established" for the final message and > timings my patch set adds. I think the overhead of an additional log > message being emitted probably outweighs the overhead of taking those > additional timings. > > String GUCs are a lot more work than enum GUCs, so I was thinking if > there is a way to do it as an enum. > > I think we want the user to be able to specify a list of all the log > messages they want included, not just have each one include the > previous ones. So, then it probably has to be a list right? There is > no good design that would fit as an enum. Interesting idea... Yeah, that would sound weird with an enum. I could think about providing an enum per possible combination but I think that would generate things like 2^N enum and won't be really user friendly (also that would double each time we'd want to add a new possible "value" becoming quickly unmanageable). So yeah, I can't think of anything better than GUC_LIST. > > I think that's somehow also around code maintenance (not only LOC), say for > > example > > if we want to add more "child_type" in the check (no need to remember to > > update both > > locations). > > I didn't include checking the child_type in that function since it is > unrelated to instr_time, so it sadly wouldn't help with that. We could > macro-ize the child_type check were we to add another child_type. Yup but my idea was to put all those line: " if (Log_connections && (child_type == B_BACKEND || child_type == B_WAL_SENDER)) { instr_time fork_time = ((BackendStartupData *) startup_data)->fork_time; conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time); } " into a dedicated helper function. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
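The helper being suggested here would simply wrap the quoted block, so the "which child types get timed" policy and the timing capture live in one place. conn_timing and INSTR_TIME_GET_DURATION_SINCE() come from the patch under discussion (they are not existing core names), and the helper's name is made up for illustration.

    /* Sketch: consolidate the fork-duration capture behind one helper. */
    static void
    capture_fork_duration(BackendType child_type, void *startup_data)
    {
        if (!Log_connections)
            return;

        if (child_type == B_BACKEND || child_type == B_WAL_SENDER)
        {
            instr_time  fork_time = ((BackendStartupData *) startup_data)->fork_time;

            conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
        }
    }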
Re: [BUG]: the walsender does not update its IO statistics until it exits
On Wed, Feb 26, 2025 at 09:48:50AM +, Bertrand Drouvot wrote: > Yeah I think that makes sense, done that way in the attached. > > Speaking about physical walsender, I moved the test to 001_stream_rep.pl > instead > (would also fail without the fix). Hmm. I was doing some more checks with this patch, and on closer look I am wondering if the location you have chosen for the stats reports is too aggressive: this requires a LWLock for the WAL sender backend type taken in exclusive mode, with each step of WalSndLoop() taken roughly each time a record or a batch of records is sent. A single installcheck with a primary/standby setup can lead to up to 50k stats report calls. With smaller records, the loop can become hotter, can't it? Also, there can be a high number of WAL senders on a single node, and I've heard of some customers with complex logical decoding deployments with dozens of logical WAL senders. Isn't there a risk of having this code path become a point of contention? It seems to me that we should benchmark this change more carefully, perhaps even reduce the frequency of the report calls. -- Michael signature.asc Description: PGP signature
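One shape the "reduce the frequency of the report calls" idea could take is a simple rate limit inside WalSndLoop(), so a hot loop sending many small records does not take the stats lock on every iteration. The reporting call below is only a placeholder for whatever call the patch adds; just the throttling around it is being sketched.

    /* Sketch: report walsender I/O statistics at most once per second. */
    #define WALSND_STATS_REPORT_INTERVAL_MS 1000

    static TimestampTz last_stats_report = 0;

    static void
    maybe_report_walsender_stats(void)
    {
        TimestampTz now = GetCurrentTimestamp();

        if (!TimestampDifferenceExceeds(last_stats_report, now,
                                        WALSND_STATS_REPORT_INTERVAL_MS))
            return;             /* too soon since the last report */

        last_stats_report = now;
        report_walsender_io_stats();    /* placeholder for the patch's report call */
    }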
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
Dear Amit, > I understand that we may not have a clear use case for this to work in > single-user mode. But how will we define the boundary in similar > cases? I mean, we should have some rule for such exposed functions, > and it should be followed uniformly. Now, if one needs a bigger or > complex change to make the function work in single-user mode, then it > is easy to take an exception from what is currently being followed in > the code. However, if the change is as trivial as you proposed in the > first email, why not go with that and make this function work in > single-user mode? Hmm, this opinion is the complete opposite of what the other reviewers have said, so I would like to ask them again how they feel. I have also added Tom, who pointed this out in the initial thread. Question: how do you feel about restricting the slot-related SQL functions in single-user mode? Nobody has come up with a use case for them there, we have no concrete rule or similar precedent to point to, and the band-aid needed to make them work does seem very small. Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
On Thu, Feb 27, 2025 at 1:29 PM Hayato Kuroda (Fujitsu) wrote: > > > > > Which other functions do we see similar restrictions? I checked > > "sequence manipulation functions" (1), and "Transaction ID and > > Snapshot Information Functions" (2) but couldn't see similar > > restrictions. > > > > (1) - https://www.postgresql.org/docs/current/functions-sequence.html > > (2) - > > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INF > > O-SNAPSHOT > > I grepped sources and could not find explicit limitations neither. So...this > might > be overkill. > Yes, this is my worry. > But I still think the restriction is OK for the slot - no need to do > some efforts for accepting single-user mode, just close the cover. > I understand that we may not have a clear use case for this to work in single-user mode. But how will we define the boundary in similar cases? I mean, we should have some rule for such exposed functions, and it should be followed uniformly. Now, if one needs a bigger or complex change to make the function work in single-user mode, then it is easy to take an exception from what is currently being followed in the code. However, if the change is as trivial as you proposed in the first email, why not go with that and make this function work in single-user mode? -- With Regards, Amit Kapila.
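For context, the restriction side of this debate amounts to a short guard near the top of the slot-related SQL functions (or in ReplicationSlotAcquire()), while the alternative Amit prefers is the small fix from the first email that lets them work. The placement and message below are illustrative only.

    /* Sketch of the proposed restriction: refuse slot manipulation in
     * single-user mode instead of crashing later. */
    if (!IsUnderPostmaster)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("replication slots cannot be used in single-user mode")));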
Re: Remove extra Sort node above a btree-compatible Index Scan
Alexandra Wang writes: > I’ve attached a patch that removes this unnecessary Sort node for > B-tree-compatible indexes. This does not look right at all. You can't just ignore the opfamily etc. while deciding whether two pathkeys represent the same sort ordering, as you did in create_mergejoin_plan(). I don't like pathkeys_have_same_sortop() either. The pathkey data structures were designed to let pointer comparison be sufficient for deciding whether pathkeys are equivalent: see the "canonical pathkey" stuff in pathkeys.c. I think that this patch may be band-aiding over some breakage of that concept, but you've not provided enough context to figure out what the underlying problem is. regards, tom lane
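The "canonical pathkey" design Tom refers to means that, within one planner run, each distinct combination of equivalence class, opfamily, sort direction, and nulls ordering is represented by exactly one PathKey object, so deciding whether two orderings match never needs to inspect sortops, only pointers. A simplified sketch of the idea follows (the real code is compare_pathkeys() in pathkeys.c; same_ordering() is a made-up name):

    #include "nodes/pg_list.h"

    /* Two pathkey lists describe the same sort ordering iff the
     * corresponding elements are the very same canonical PathKey objects. */
    static bool
    same_ordering(List *keys1, List *keys2)
    {
        ListCell   *lc1,
                   *lc2;

        if (list_length(keys1) != list_length(keys2))
            return false;

        forboth(lc1, keys1, lc2, keys2)
        {
            if (lfirst(lc1) != lfirst(lc2))
                return false;   /* different canonical pathkeys, different ordering */
        }
        return true;
    }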
RE: SIMD optimization for list_sort
Hi All, Thank you so much for the feedback. > I don't think "another extension might use it someday" makes a very strong > case, > particularly for something that requires a new dependency. The x86-simdsort library would be an optional dependency in Postgres, and the new list sort implementation that uses it does not impact any of the existing workflows in Postgres. We have at least one use case where pgvector gains 7-10% in HNSW index build time using the SIMD sort implementation, so we feel this patch is worth upstreaming. We are open to further discussion on this. > Note that our current implementation is highly optimized for low-cardinality > inputs. > This is needed for aggregate queries. I found this write-up of a > couple scalar and vectorized sorts, and they show this library doing > very poorly on very-low cardinality inputs. I would look into that > before trying to get something in shape to share. > > https://github.com/Voultapher/sort-research-rs/blob/main/writeup/intel_avx512/text.md We ran our extension to stress list sort with low-cardinality inputs. For example, for an array of 100k elements with repeated values in the range 1-10, we still see a gain of around 20% in throughput. We will collect more data for low-cardinality inputs, and with AVX2 as well. Thanks and regards Rakshit Rajeev
Re: Get rid of WALBufMappingLock
On Wed, Feb 26, 2025 at 01:48:47PM +0200, Alexander Korotkov wrote: > Thank you for offering the help. Updated version of patch is attached > (I've added one memory barrier there just in case). I would > appreciate if you could run it on batta, hachi or similar hardware. Doing a revert of the revert done in 3fb58625d18f proves that reproducing the error is not really difficult. I've done a make installcheck-world USE_MODULE_DB=1 -j N without assertions, and the point that saw a failure quickly is one of the tests of pgbench: PANIC: could not find WAL buffer for 0/19D366 This one happened for the test "concurrent OID generation" and CREATE TYPE. Of course, as it is a race condition, it is random, but it's taking me only a couple of minutes to see the original issue on my buildfarm host. With assertion failures enabled, same story, and same failure from the pgbench TAP test. Saying that, I have also done similar tests with your v12 for a couple of hours and this looks stable under installcheck-world. I can see that you've reworked quite a bit the surroundings of InitializedFrom in this one. If you apply that once again at some point, the buildfarm will be judge in the long-term, but I am rather confident by saying that the situation looks better here, at least. One thing I would consider doing if you want to gain confidence is tests like the one I saw causing problems with pgbench, with DDL patterns stressing specific paths like this CREATE TYPE case. -- Michael signature.asc Description: PGP signature
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
On Thu, Dec 19, 2024 at 8:29 AM Robert Haas wrote: > Cool. I don't want to commit this right before my vacation but I'll > look into it in the new year if nobody beats me to it. Nobody has beaten me to it, but I thought of one potential issue. Suppose that somebody has a foreign table that exists today with one of these properties set to the empty string. Such a foreign table is not usable for queries, because the queries won't make it past scan.l on the remote side, but it's allowed to exist. With this patch, it's not allowed to exist any more. That would be fine if no such objects exist in the wild, but if they do, people will get a pg_dump or pg_upgrade failure when they try to upgrade to a release that includes this change. Does that mean that we shouldn't commit this patch? I'm inclined to think that it's probably still OK, because (1) few users are likely to have such objects, (2) the workaround is easy, just drop the object or fix it to do something useful, just as users already need to do in plenty of other, similar cases and (3) unclear error messages are not great and future users might benefit from this error message being better. However, one could take the opposite position and say that (C1) it is unlikely that anyone will try to do this in the first place and (C2) the error message that is currently generated when you try to do this isn't really all that unclear, and therefore it's not worth creating even a minor dump-and-reload hazard; and that position also seems pretty defensible to me. Other opinions? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan wrote: > I need more feedback about this. I don't understand your perspective here. > > If I commit the skip scan patch, but don't have something like this > instrumentation in place, it seems quite likely that users will > complain about how opaque its behavior is. While having this > instrumentation isn't quite a blocker to committing the skip scan > patch, it's not too far off, either. I want to be pragmatic. Any > approach that's deemed acceptable is fine by me, provided it > implements approximately the same behavior as the patch that I wrote > implements. > > Where is this state that tracks the number of index searches going to > live, if not in IndexScanDesc? I get why you don't particularly care > for that. But I don't understand what the alternative you favor looks > like. +1 for having some instrumentation. I do not agree with Tom that these are numbers that only Peter Geoghegan and 2-3 other people will ever understand. I grant that it's not going to make sense to everyone, but the number of people to which it will make sense I would guess is probably in the hundreds or the thousands rather than the single digits. Good documentation could help. So, where should we store that information? The thing that's odd about using IndexScanDesc is that it doesn't contain any other instrumentation currently, or at least not that I can see. Everything else that EXPLAIN prints in terms of index scan is printed by show_instrumentation_count() from planstate->instrument. So it seems reasonable to think maybe this should be part of planstate->instrument, too, but there seem to be at least two problems with that idea. First, that struct has just four counters (ntuples, ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of them are in use -- a plain index scan does not use ntuples2 -- but I think this optimization can apply to both index and index-only scans, so we just don't have room. Second, those existing counters are used for things that we can count in the executor, but the executor won't know how many index searches occur down inside the AM. So I see the following possibilities: 1. Add a new field to struct Instrumentation. Make index_getnext_slot() and possibly other similar functions return a count of index searches via a new parameter, and use that to bump the new struct Instrumentation counter. It's a little ugly to have to add a special-purpose parameter for this, but it doesn't look like there are tons and tons of calls so maybe it's OK. 2. Add a new field to BufferUsage. Then the AM can bump this field and explain.c will know about it the same way it knows about other changes to pgBufferUsage. However, this is not really about buffer usage and the new field seems utterly unlike the other things that are in that structure, so this seems really bad to me. 3. Add a new field to IndexScanDesc, as you originally proposed. This seems similar to #1: it's still shoveling instrumentation data around in a way that we don't currently, but instead of shoveling it through a new parameter, we shovel it through a new structure member. Either way, the new thing (parameter or structure member) doesn't really look like it belongs with what's already there, so it seems like conservation of ugliness. 4. Add a new field to the btree-specific structure referenced by the IndexScanDesc's opaque pointer. I think this is what Matthias was proposing. It doesn't seem particularly hard to implement. and seems similar to #1 and #3. 
It is not clear to me that any of #1, #3, and #4 are radically better than any of the other ones, with the following exception: it would be a poor idea to choose #4 over #3 if this field will ultimately be used for a bunch of different AMs, and a poor idea to choose #3 over #4 if it's always going to be interesting only for btree. I'll defer to you on which of those things is the case, but with the request that you think about what is practically likely to happen and not advocate too vigorously based on an argument that makes prominent use of the phrase "in theory". To be honest, I don't really like any of these options very much: they all seem a tad inelegant. But sometimes that is a necessary evil when inventing something new. I believe if I were implementing this myself I would probably try #1 first; if that ended up seeming too ugly, then I would fall back to #3 or #4. Does that help? -- Robert Haas EDB: http://www.enterprisedb.com
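To make option #1 slightly more concrete: the counter would live next to ntuples and friends in struct Instrumentation, and the AM would have to hand the number of searches back up through the index_getnext_* path so the executor node can bump it. Everything named below (nsearches, nsearches_delta, the commented-out extra out-parameter) is hypothetical, not a proposal of the final shape.

    /* Sketch of option #1; names are hypothetical. */

    /* instrument.h: one more counter alongside ntuples/nfiltered */
    double      nsearches;      /* # of index searches started by this node */

    /* nodeIndexscan.c: the AM reports searches started by this fetch */
    uint64      nsearches_delta = 0;

    if (index_getnext_slot(scandesc, ForwardScanDirection, slot
                           /* , &nsearches_delta   <- new out-parameter */ ))
    {
        if (node->ss.ps.instrument)
            node->ss.ps.instrument->nsearches += nsearches_delta;
    }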
RE: Improve CRC32C performance on SSE4.2
Hi John, Going back to your earlier comment: > > > I'm not a fan of exposing these architecture-specific details to > > > places that consult the capabilities. That requires things like +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42) > > Isn't that one thing currently pg_cpu_have(FEATURE)? > > No, I mean the one thing would be "do we have hardware CRC?", and each > architecture would set (and check if needed) the same slot. > > I'm not 100% sure this is the better way, and there may be a disadvantage I > haven't thought of, but this makes the most sense to me. I'm willing to hear > reasons to do it differently, but I'm thinking those reasons need to be > weighed > against the loss of abstraction. IMO, CPU SIMD (SSE4.2, AVX, etc.) features are a module of their own separate from capabilities/features supported in postgres (CRC32C, bitcount, etc.). Exposing architecture-specific details to the features that need to check against them is a way to make code more modular and reusable. It is reasonable to expect developers writing SIMD specific functions to naturally understand the runtime architecture requirements for that function which is easily accessible by just checking the value of pg_cpu_have(PG_CPU_FEATURE_..). If another feature in postgres requires SSE4.2, then the CPU initialization module doesn't need to be modified at all. They just have to worry about their feature and its CPU requirements. > Just so we're on the same page, the current separate files do initialization, > not > dispatch. I'm seeing conflicting design goals > here: > > 1. Have multiple similar dispatch functions with the only difference (for now) > being certain names (not in the patch, just discussed). > 2. Put initialization routines in the same file, even though they have > literally > nothing in common. Sorry for not being clear. I will move the initialization routines to separate files. Just haven’t gotten to it yet with v10. > > > This also makes it easier to add more implementations in the future without > having to make the dispatch function work for both ARM and x86. > > I don't quite understand, since with 0001 it already works (at least in > theory) for 3 > architectures with hardware CRC, plus those without, and I don't see it > changing > with more architectures. The reason its working across all architectures as of now is because there is only one runtime check for CRC32C feature across x86, ARM and LongArch. (PCLMUL dispatch happens inside the SSE42). If and when we add the AVX-512 implementation, won't the dispatch logic will look different for x86 and ARM? Along with that, the header file pg_crc32c.h will also look a lot messier with a whole bunch of nested #ifdefs. IMO, the header file should be devoid of any architecture dispatch logic and simply contain declarations of various implementations (see https://github.com/r-devulap/postgressql/blob/cf80f7375f24d2fb5cd29e95dc73d4988fc6d066/src/include/port/pg_crc32c.h for example). The dispatch logic should be handled separately in a C file. > > +/* Access to pg_cpucap for modules that need runtime CPUID information > +*/ bool pg_cpu_have(int feature_id) { > +return pg_cpucap[feature_id]; > } > > Putting this in a separate translation may have to do the decision to turn > v8's > #defines into an enum? In any case, this means doing a function call to decide > which function to call. That makes no sense. The reason it is a separate translational unit is because I intended pg_cpucap to be a read only variable outside of pg_cpucap.c. 
If the function overhead is not preferred, then I can get rid of it. > > The goal of v9/10 was to centralize the initialization from cpuid etc, but it > also did > a major re-architecture of the runtime-checks at the same. That made review > more difficult. I assumed we wanted to move the runtime checks to a central place and that needed this re-architecture. > > +#if defined(__x86_64__) || defined ... > +pg_cpucap_x86(); > +#elif defined(__aarch64__) || defined ... > +pg_cpucap_arm(); > +#endif > > I view this another conflict in design goals: After putting everything in the > same > file, the routines still have different names and have to be guarded > accordingly. I > don't really see the advantage, especially since these guard symbols don't > even > match the ones that guard the actual initialization steps. I tried to imply > that in > my last review, but maybe I should have been more explicit. > > I think the least painful step is to take the x86 initialization from v10, > which is > looking great, but > - keep separate initialization files > - don't whack around the runtime representation, at least not in the same > patch Sure. I can fix that in v11. Raghuveer
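The separation argued for above would leave pg_crc32c.h with nothing but declarations and keep the choice of implementation in a C file, roughly as below. pg_cpu_have() and PG_CPU_FEATURE_SSE42 are the names used earlier in this thread (they come from the patch); pg_comp_crc32c_sb8() and pg_comp_crc32c_sse42() are the existing portable and SSE4.2 implementations; the dispatch function names are illustrative.

    /* Sketch: runtime dispatch kept in a .c file, header free of
     * architecture #ifdefs. */
    typedef pg_crc32c (*pg_crc32c_fn) (pg_crc32c crc, const void *data, size_t len);

    static pg_crc32c_fn pg_comp_crc32c_impl = pg_comp_crc32c_sb8;  /* portable fallback */

    void
    pg_crc32c_choose_implementation(void)
    {
        if (pg_cpu_have(PG_CPU_FEATURE_SSE42))
            pg_comp_crc32c_impl = pg_comp_crc32c_sse42;
        /* AVX-512, ARM, or LoongArch checks would be added here, without
         * touching the header at all. */
    }

    pg_crc32c
    pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
    {
        return pg_comp_crc32c_impl(crc, data, len);
    }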
Re: Update docs for UUID data type
On Thu, Feb 27, 2025 at 1:26 PM Andy Alsup wrote: > > I've submitted it for the upcoming commitfest. The link is: > https://commitfest.postgresql.org/patch/5604/ > Thanks for all your help in reviewing these changes. Thank you for the patch! Regarding the 0001 patch, I think we can put uuidv4() and gen_random_uuid() in the same row since they are effectively identical functions. For example, we have precedent such as char_length() and character_length(). The 0002 patch looks good to me. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Small memory fixes for pg_createsubcriber
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote: > Yeah, I also think it would look good like this. It's the least confusing option, indeed. I've reduced a bit the diffs and done that down to v16 for the pg_upgrade part where this exists. Double-checking the tree, it does not seem that we have simolar holes now.. I hope that I'm not wrong. -- Michael signature.asc Description: PGP signature
Re: new commitfest transition guidance
On Thu, Feb 27, 2025 at 11:14 AM Tom Lane wrote: > Robert Haas writes: > > But let's not make the mistake of saying "we're not going to move > > things automatically because we want to find out if the authors are > > still interested" and then getting really concerned when some stuff > > doesn't get moved. That's missing the whole point. > > +1. Having a significant fraction of patches drop off the radar > was the desired outcome, wasn't it? Well, not if the authors really are still actively caring about them. But there's been a funny evolution of our thinking about CommitFests over the years. When I first started managing CommitFests, I evicted patches that the author didn't update -- following a review -- within four days. Not four business days - four calendar days. After all, it was a CommitFest -- it was supposed to be for patches that were ready to be committed. That policy was, I think, viewed by many as too draconian, and it probably was. But now we've gotten to a point where the one month gap between CommitFest N and CommitFest N+1 is thought to be so short that it might be unreasonable to expect a patch author to move their patch forward sometime during that time. And I think that's clearly going too far the other way. Perhaps even way, way too far the other way. I think it's completely fine if somebody has a patch that they update occasionally as they have and it putters along for a few years and it finally either gets committed or it doesn't. I one hundred percent support part-time developers having the opportunity to participate as and when their schedule permits and I think that is an important part of being a good and welcoming open source community. But there are also people who are working very hard and very actively to progress patches and staying on top of the CF status and any new review emails every day, and that is ALSO a great way to do development, and it's reasonable to treat those cases differently. I'm not sure exactly how we can best do that, but it makes my head hurt every time I find something in the CF where the patch author was like "well, I'll get back to this at some point" and then three CFs later it's still sitting there in "Needs Review" or something. Probably not moving things forward automatically is only one part of that problem -- we could very much use some better tooling for judging how active certain patches are and, if not so active, whether that's more a lack of reviewers or more author inactivity. But we're not doing ourselves any favors by working super-hard to keep everything that anybody might potentially care about in the same bucket as things that are actually active. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Confine vacuum skip logic to lazy_scan_skip
Hi. Em ter., 18 de fev. de 2025 às 11:31, Melanie Plageman < melanieplage...@gmail.com> escreveu: > On Sun, Feb 16, 2025 at 1:12 PM Tom Lane wrote: > > > > Thomas Munro writes: > > > Thanks! It's green again. > > > > The security team's Coverity instance complained about this patch: > > > > *** CID 1642971: Null pointer dereferences (FORWARD_NULL) > > > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/heap/vacuumlazy.c: > 1295 in lazy_scan_heap() > > 1289buf = read_stream_next_buffer(stream, > &per_buffer_data); > > 1290 > > 1291/* The relation is exhausted. */ > > 1292if (!BufferIsValid(buf)) > > 1293break; > > 1294 > > >>> CID 1642971: Null pointer dereferences (FORWARD_NULL) > > >>> Dereferencing null pointer "per_buffer_data". > > 1295blk_info = *((uint8 *) per_buffer_data); > > 1296CheckBufferIsPinnedOnce(buf); > > 1297page = BufferGetPage(buf); > > 1298blkno = BufferGetBlockNumber(buf); > > 1299 > > 1300vacrel->scanned_pages++; > > > > Basically, Coverity doesn't understand that a successful call to > > read_stream_next_buffer must set per_buffer_data here. I don't > > think there's much chance of teaching it that, so we'll just > > have to dismiss this item as "intentional, not a bug". > > Is this easy to do? Like is there a list of things from coverity to ignore? > > > I do have a suggestion: I think the "per_buffer_data" variable > > should be declared inside the "while (true)" loop not outside. > > That way there is no chance of a value being carried across > > iterations, so that if for some reason read_stream_next_buffer > > failed to do what we expect and did not set per_buffer_data, > > we'd be certain to get a null-pointer core dump rather than > > accessing data from a previous buffer. > > Done and pushed. Thanks! > Per Coverity. CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL) 8. var_deref_op: Dereferencing null pointer per_buffer_data. I think that function *read_stream_next_buffer* can return a invalid per_buffer_data pointer, with a valid buffer. Sorry if I'm wrong, but the function is very suspicious. Attached a patch, which tries to fix. best regards, Ranier Vilela fix-possible-invalid-pointer-read_stream.patch Description: Binary data
Re: Confine vacuum skip logic to lazy_scan_skip
Hi, On 2025-02-27 14:32:28 -0300, Ranier Vilela wrote: > Hi. > > Em ter., 18 de fev. de 2025 às 11:31, Melanie Plageman < > melanieplage...@gmail.com> escreveu: > > > On Sun, Feb 16, 2025 at 1:12 PM Tom Lane wrote: > > > > > > Thomas Munro writes: > > > > Thanks! It's green again. > > > > > > The security team's Coverity instance complained about this patch: > > > > > > *** CID 1642971: Null pointer dereferences (FORWARD_NULL) > > > > > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/heap/vacuumlazy.c: > > 1295 in lazy_scan_heap() > > > 1289buf = read_stream_next_buffer(stream, > > &per_buffer_data); > > > 1290 > > > 1291/* The relation is exhausted. */ > > > 1292if (!BufferIsValid(buf)) > > > 1293break; > > > 1294 > > > >>> CID 1642971: Null pointer dereferences (FORWARD_NULL) > > > >>> Dereferencing null pointer "per_buffer_data". > > > 1295blk_info = *((uint8 *) per_buffer_data); > > > 1296CheckBufferIsPinnedOnce(buf); > > > 1297page = BufferGetPage(buf); > > > 1298blkno = BufferGetBlockNumber(buf); > > > 1299 > > > 1300vacrel->scanned_pages++; > > > > > > Basically, Coverity doesn't understand that a successful call to > > > read_stream_next_buffer must set per_buffer_data here. I don't > > > think there's much chance of teaching it that, so we'll just > > > have to dismiss this item as "intentional, not a bug". > > > > Is this easy to do? Like is there a list of things from coverity to ignore? > > > > > I do have a suggestion: I think the "per_buffer_data" variable > > > should be declared inside the "while (true)" loop not outside. > > > That way there is no chance of a value being carried across > > > iterations, so that if for some reason read_stream_next_buffer > > > failed to do what we expect and did not set per_buffer_data, > > > we'd be certain to get a null-pointer core dump rather than > > > accessing data from a previous buffer. > > > > Done and pushed. Thanks! > > > Per Coverity. > > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL) > 8. var_deref_op: Dereferencing null pointer per_buffer_data. That's exactly what the messages you quoted are discussing, no? > Sorry if I'm wrong, but the function is very suspicious. How so? > diff --git a/src/backend/storage/aio/read_stream.c > b/src/backend/storage/aio/read_stream.c > index 04bdb5e6d4..18e9b4f3c4 100644 > --- a/src/backend/storage/aio/read_stream.c > +++ b/src/backend/storage/aio/read_stream.c > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void > **per_buffer_data) > > READ_BUFFERS_ISSUE_ADVICE : 0))) > { > /* Fast return. */ > + if (per_buffer_data) > + *per_buffer_data = > get_per_buffer_data(stream, oldest_buffer_index); > return buffer; > } A few lines above: Assert(stream->per_buffer_data_size == 0); The fast path isn't used when per buffer data is used. Adding a check for per_buffer_data and assigning something to it is nonsensical. Greetings, Andres Freund
Re: Log connection establishment timings
Hi, On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote: > I was just talking to Andres off-list and he mentioned that the volume > of log_connections messages added in recent releases can really be a > problem for users. For some added color: I've seen plenty systems where almost all the log volume is log_connection messages, which they *have* to enable for various regulatory reasons. It'd still be a lot if we just emitted one message for each connection, but logging three (and possibly four with $thread), for each connection makes it substantially worse. > He said ideally we would emit one message which consolidated these (and make > sure we did so for failed connections too detailing the successfully > completed stages). A combined message would also not *quite* replace all use-cases, e.g. if you want to debug arriving connections or auth problems you do want the additional messages. But yea, for normal operation, I do think most folks want just one message. > However, since that is a bigger project (with more refactoring, etc), > he suggested that we change log_connections to a GUC_LIST > (ConfigureNamesString) with options like "none", "received, > "authenticated", "authorized", "all". Yep. > Then we could add one like "established" for the final message and > timings my patch set adds. I think the overhead of an additional log > message being emitted probably outweighs the overhead of taking those > additional timings. To bikeshed a bit: "established" could be the TCP connection establishment just as well. I'd go for "completed" or "timings". > String GUCs are a lot more work than enum GUCs, so I was thinking if > there is a way to do it as an enum. > > I think we want the user to be able to specify a list of all the log > messages they want included, not just have each one include the > previous ones. So, then it probably has to be a list right? There is > no good design that would fit as an enum. I don't see a way to comfortably shove this into an enum either. Greetings, Andres Freund
Re: Confine vacuum skip logic to lazy_scan_skip
Hi, On 2025-02-27 12:44:24 -0500, Andres Freund wrote: > > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL) > > 8. var_deref_op: Dereferencing null pointer per_buffer_data. > > That's exactly what the messages you quoted are discussing, no? Ah, no, it isn't. But I still think the coverity alert and the patch don't make sense, as per the below: > > diff --git a/src/backend/storage/aio/read_stream.c > > b/src/backend/storage/aio/read_stream.c > > index 04bdb5e6d4..18e9b4f3c4 100644 > > --- a/src/backend/storage/aio/read_stream.c > > +++ b/src/backend/storage/aio/read_stream.c > > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void > > **per_buffer_data) > > > > READ_BUFFERS_ISSUE_ADVICE : 0))) > > { > > /* Fast return. */ > > + if (per_buffer_data) > > + *per_buffer_data = > > get_per_buffer_data(stream, oldest_buffer_index); > > return buffer; > > } > > A few lines above: > Assert(stream->per_buffer_data_size == 0); > > The fast path isn't used when per buffer data is used. Adding a check for > per_buffer_data and assigning something to it is nonsensical. Greetings, Andres Freund
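Putting the two observations together (the fast path only runs when the stream was created without per-buffer data, and the earlier advice to scope the pointer to one loop iteration), the consuming pattern from the vacuumlazy.c excerpt quoted upthread looks like this; variable names are shortened, and the dereference is only safe because that particular stream is built with a non-zero per-buffer data size.

    while (true)
    {
        void       *per_buffer_data = NULL;     /* scoped to one iteration */
        Buffer      buf;
        uint8       blk_info;

        buf = read_stream_next_buffer(stream, &per_buffer_data);

        if (!BufferIsValid(buf))
            break;              /* stream exhausted */

        /* valid only because this stream carries per-buffer data */
        blk_info = *((uint8 *) per_buffer_data);

        /* ... process the block ... */
    }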
Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
On Sat, Feb 8, 2025 at 8:47 AM Michail Nikolaev wrote: > Just some commit messages + few cleanups. I'm worried about this: +These longer pin lifetimes can cause buffer exhaustion with messages like "no +unpinned buffers available" when the index has many pages that have similar +ordering; but future work can figure out how to best work that out. I think that we should have some kind of upper bound on the number of pins that can be acquired at any one time, in order to completely avoid these problems. Solving that problem will probably require GiST expertise that I don't have right now. -- Peter Geoghegan
Re: Statistics Import and Export
I know I'm coming late to this, but I would like us to rethink having statistics dumped by default. I was caught by this today, as I was doing two dumps in a row, but the output changed between runs solely because the stats got updated. It got me thinking about all the use cases of pg_dump I've seen over the years. I think this has the potential to cause a lot of problems for things like automated scripts. It certainly violates the principle of least astonishment to have dumps change when no user interaction has happened. Alternatively, we could put stats into SECTION_POST_DATA, No, that would make the above-mentioned problem much worse. Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support
Re: Disabling vacuum truncate for autovacuum
On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe wrote: > > On Thu, 2025-01-23 at 22:33 -0800, Gurjeet Singh wrote: > > > > I am also wondering if having an autovacuum setting to control it would > > > > be > > > > a good idea for a feature. > > > > > > I'm all for that. > > > > Please see attached an initial patch to disable truncation behaviour in > > autovacuum. This patch retains the default behavior of autovacuum truncating > > relations. The user is allowed to change the behaviour and disable relation > > truncations system-wide by setting autovacuum_disable_vacuum_truncate = > > true. > > Better parameter names welcome :-) > > I hope it is possible to override the global setting with the > "vacuum_truncate" > option on an individual table. Current patch behaviour is that if the autovacuum_vacuum_truncate is false, then autovacuum will _not_ truncate any relations. If the parameter's value is true (the default), then the relation's reloption will be honored. A table-owner, or the database-owner, may enable truncation of a table, as they may be trying to be nice and return the unused disk space back to the OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the health of the entire db instance, as well as of any replicas of the db instance), wants to disable truncation across all databases to ensure that the replication does not get stuck, then IMHO Postgres should empower the sysadmin to make that decision, and override the relation-level setting enabled by the table- owner or the database-owner. > > One additional improvement I can think of is to emit a WARNING or NOTICE > > message > > that truncate operation is being skipped, perhaps only if the truncation > > would've freed up space over a certain threshold. > > Interesting idea, but I think it is independent from this patch. Agreed. I'll consider writing a separate patch for this. > > Perhaps there's value in letting this parameter be specified at database > > level, > > but I'm not able to think of a reason why someone would want to disable this > > behaviour on just one database. So leaving the parameter context to be the > > same > > as most other autovacuum parameters: SIGHUP. > > I can imagine setting that on only a certain database. Different databases > typically have different applications, which have different needs. Makes sense. I don't think anything special needs to be done in the patch to address this. > Eventually, the patch should have documentation and regression tests. Documentation added. Pointers on if, where, and what kind of tests to add will be appreciated. On Mon, Jan 27, 2025 at 12:38 PM Robert Haas wrote: > > On Mon, Jan 27, 2025 at 4:55 AM Laurenz Albe wrote: > > My suggestion for the parameter name is "autovacuum_disable_truncate". > > Then it would have a different name than the reloption, and the > opposite sense. In most cases, we've been able to keep those matching > -- autovacuum vs. autovacuum_enabled being, I believe, the only > current mismatch. If we want to maintain the convention of autovacuum parameters names to be of the format "autovacuum_" then I believe the name autovacuum_vacuum_truncate (boolean) would be better, as compared to my original proposal (autovacuum_disable_vacuum_truncate), or Laurenz's proposal above. The default value should be true, to match the current autovacuum behaviour. > Also, how sure are we that turning this off globally is a solid idea? 
> Off-hand, it doesn't sound that bad: there are probably situations in > which autovacuum never truncates anything anyway just because the tail > blocks of the relations always contain at least 1 tuple. But we should > be careful not to add a setting that is far more likely to get people > into trouble than to get them out of it. It would be good to hear what > other people think about the risk vs. reward tradeoff in this case. Taking silence from others to be a sign of no opposition, I'm moving forward with the patch. On Tue, Feb 18, 2025 at 11:56 AM Nathan Bossart wrote: > > On Mon, Jan 27, 2025 at 03:38:39 PM -0500, Robert Haas wrote: > > Also, how sure are we that turning this off globally is a solid idea? > In any case, it's > already possible to achieve $SUBJECT with a trivial script that sets > storage parameters on all tables, so IMHO it would be silly to withhold a > global setting that does the same thing just on principle. +1 For documentation of this GUC, I borrowed heavily from the relevant sections of CREATE TABLE and VACUUM docs. There are 3 ways I wrote one of the sentences in the docs. I picked the last one, as it is concise and clearer than the others. If others feel a different choice of words would be better, I'm all ears. If false, autovacuum will not perform the truncation, even if the vacuum_truncate option has been set to true for the table being processed. If false, autovacuum will not perform the truncation, and it ignores the v
Re: Statistics Import and Export
On Thu, Feb 27, 2025 at 10:01 PM Corey Huinker wrote: > +--- error: relation is wrong type >> +SELECT pg_catalog.pg_restore_relation_stats( >> +'relation', 0::oid, >> +'relpages', 17::integer, >> +'reltuples', 400.0::real, >> +'relallvisible', 4::integer); >> >> Why do you need to specify all the stats (relpages, reltuples, etc)? >> To exercise this you could just do: >> select pg_catalog.pg_restore_relation_stats('relation', 0::oid); >> > > In the above case, it's historical inertia in that the pg_set_* call > required all those parameters, as well as a fear that the code - now or in > the future - might evaluate "can anything actually change from this call" > and short circuit out before actually trying to make sense of the reg_class > oid. But we can assuage that fear with just one of the three stat > parameters, and I'll adjust accordingly. > * reduced relstats parameters specified to the minimum needed to verify the error and avoid a theoretical future logic short-circuit described above. * version parameter usage reduced to absolute minimum - verifying that it is accepted and ignored, though Melanie's patch may introduce a need to bring it back in a place or two. 84 lines deleted. Not great, not terrible. I suppose if we really trusted the TAP test databases to have "one of everything" in terms of tables with all the datatypes, and sufficient rows to generate interesting stats, plus some indexes of each, then we could get rid of those two, but I feel very strongly that it would be a minor savings at a major cost to clarity. From d0dfca62eb8ce27fb4bfce77bd2ee835738899a8 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Wed, 26 Feb 2025 21:02:44 -0500 Subject: [PATCH v2] Organize and deduplicate statistics import tests. Many changes, refactorings, and rebasings have taken their toll on the statistics import tests. Now that things appear more stable and the pg_set_* functions are gone in favor of using pg_restore_* in all cases, it's safe to remove duplicates, combine tests where possible, and make the test descriptions a bit more descriptive and uniform. Additionally, parameters that were not strictly needed to demonstrate the purpose(s) of a test were removed to reduce clutter. --- src/test/regress/expected/stats_import.out | 680 - src/test/regress/sql/stats_import.sql | 554 +++-- 2 files changed, 466 insertions(+), 768 deletions(-) diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out index 1f150f7b08d..7bd7bfb3e7b 100644 --- a/src/test/regress/expected/stats_import.out +++ b/src/test/regress/expected/stats_import.out @@ -13,19 +13,48 @@ CREATE TABLE stats_import.test( tags text[] ) WITH (autovacuum_enabled = false); CREATE INDEX test_i ON stats_import.test(id); +-- +-- relstats tests +-- +--- error: relation is wrong type +SELECT pg_catalog.pg_restore_relation_stats( +'relation', 0::oid, +'relpages', 17::integer); +WARNING: argument "relation" has type "oid", expected type "regclass" +ERROR: "relation" cannot be NULL +-- error: relation not found +SELECT pg_catalog.pg_restore_relation_stats( +'relation', 0::oid::regclass, +'relpages', 17::integer); +ERROR: could not open relation with OID 0 +-- error: odd number of variadic arguments cannot be pairs +SELECT pg_restore_relation_stats( +'relation', 'stats_import.test'::regclass, +'relallvisible'); +ERROR: variadic arguments must be name/value pairs +HINT: Provide an even number of variadic arguments that can be divided into pairs. 
+-- error: argument name is NULL +SELECT pg_restore_relation_stats( +'relation', 'stats_import.test'::regclass, +NULL, '17'::integer); +ERROR: name at variadic position 3 is NULL +-- error: argument name is not a text type +SELECT pg_restore_relation_stats( +'relation', '0'::oid::regclass, +17, '17'::integer); +ERROR: name at variadic position 3 has type "integer", expected type "text" -- starting stats SELECT relpages, reltuples, relallvisible FROM pg_class -WHERE oid = 'stats_import.test'::regclass; +WHERE oid = 'stats_import.test_i'::regclass; relpages | reltuples | relallvisible --+---+--- -0 |-1 | 0 +1 | 0 | 0 (1 row) -BEGIN; -- regular indexes have special case locking rules -SELECT -pg_catalog.pg_restore_relation_stats( +BEGIN; +SELECT pg_catalog.pg_restore_relation_stats( 'relation', 'stats_import.test_i'::regclass, 'relpages', 18::integer); pg_restore_relation_stats @@ -50,32 +79,6 @@ WHERE relation = 'stats_import.test_i'::regclass AND (1 row) COMMIT; -SELECT -pg_catalog.pg_restore_relation_stats( -'relation', 'stats_import.test_i'::regclass, -'relpages', 19::integer ); - pg_restore_relation_stats - t -(1 row) - --- clear -SE
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 27 Feb 2025 15:24:26 -0800, Masahiko Sawada wrote: > Pushed the 0001 patch. Thanks! > Regarding the 0002 patch, I realized we stopped exposing > NextCopyFromRawFields() function: > > --- a/src/include/commands/copy.h > +++ b/src/include/commands/copy.h > @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState > *pstate, Relation rel, Node *where > extern void EndCopyFrom(CopyFromState cstate); > extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, > Datum *values, bool *nulls); > -extern bool NextCopyFromRawFields(CopyFromState cstate, > - char ***fields, int *nfields); > > I think that this change is not relevant with the refactoring and > probably we should keep it exposed as extension might be using it. > Considering that we added pg_attribute_always_inline to the function, > does it work even if we omit pg_attribute_always_inline to its > function declaration in the copy.h file? Unfortunately, no. The inline + constant argument optimization requires "static". How about the following? static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) { ... } bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) { return NextCopyFromRawFieldsInternal(cstate, fields, nfields, cstate->opts.csv_mode); } Thanks, -- kou
Re: Improve documentation regarding custom settings, placeholders, and the administrative functions
On Feb 28, 2025 at 02:27 +0800, David G. Johnston , wrote: > On Thu, Feb 6, 2025 at 8:04 AM Zhang Mingli wrote: > > On Feb 6, 2025 at 10:49 +0800, David G. Johnston > > , wrote: > > > > > > Yes, and when it does it is doing so in lieu of an error, and thus it is > > > not returning the value of the setting. It does not mean the value taken > > > on by the named parameter is NULL, which is not possible. i.e., SHOW will > > > never produce NULL. > > Hi, > > > > OK, LGTM. > > > > Thank you for the review. If you would like to add yourself as the reviewer > and mark this ready to commit here is the commitfest entry. > https://commitfest.postgresql.org/patch/5548/ > David J. Hi, Done. -- Zhang Mingli HashData
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, Feb 27, 2025 at 3:42 PM Robert Haas wrote: > +1 for having some instrumentation. I do not agree with Tom that these > are numbers that only Peter Geoghegan and 2-3 other people will ever > understand. I grant that it's not going to make sense to everyone, but > the number of people to which it will make sense I would guess is > probably in the hundreds or the thousands rather than the single > digits. I agree that it's likely to be of interest to only a minority of users. But a fairly large minority. > Good documentation could help. It's easy to produce an example that makes intuitive sense. For example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and 5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index Searches: 5" -- one search per "a" value. Such an example might be more useful than my original pgbench_accounts example. Do you think that that would help? > So, where should we store that information? > > The thing that's odd about using IndexScanDesc is that it doesn't > contain any other instrumentation currently, or at least not that I > can see. It is unique right now, but perhaps only because this is the first piece of instrumentation that: A. We must track at the level of an individual index scan -- not at the level of an index relation, like pgstat_count_index_scan(), nor at the level of the whole system, like BufferUsage state. AND B. Requires that we count something that fundamentally lives inside the index AM -- something that we cannot reasonably track/infer from the executor proper (more on that below, in my response to your scheme #1). > First, that struct has just four counters (ntuples, > ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of > them are in use -- a plain index scan does not use ntuples2 -- but I > think this optimization can apply to both index and index-only scans, > so we just don't have room. Right, index-only scans have exactly the same requirements as plain index scans/bitmap index scans. > 1. Add a new field to struct Instrumentation. Make > index_getnext_slot() and possibly other similar functions return a > count of index searches via a new parameter, and use that to bump the > new struct Instrumentation counter. It's a little ugly to have to add > a special-purpose parameter for this, but it doesn't look like there > are tons and tons of calls so maybe it's OK. That seems like the strongest possible alternative to the original scheme used in the current draft patch (scheme #3). This scheme #1 has the same issue as scheme #3, though: it still requires an integer counter that tracks the number of index searches (something a bit simpler than that, such as a boolean flag set once per amgettuple call, won't do). This is due to there being no fixed limit on the number of index searches required during any single amgettuple call: in general the index AM may perform quite a few index searches before it is able to return the first tuple to the scan (or before it can return the next tuple). The only difference that I can see between scheme #1 and scheme #3 is that the former requires 2 counters instead of just 1. And, we'd still need to have 1 out of the 2 counters located either in IndexScanDesc itself (just like scheme #3), or located in some other struct that can at least be accessed through IndexScanDesc (like the index AM opaque state, per scheme #4). After all, *every* piece of state known to any amgettuple routine must ultimately come from IndexScanDesc (or from backend global state, per scheme #2). 
(Actually, I supposed it is technically possible to avoid storing anything in IndexScanDesc by inventing another amgettuple argument, just for this. That seems like a distinction without a difference, though.) > 2. Add a new field to BufferUsage. Then the AM can bump this field and > explain.c will know about it the same way it knows about other changes > to pgBufferUsage. However, this is not really about buffer usage and > the new field seems utterly unlike the other things that are in that > structure, so this seems really bad to me. I agree. The use of global variables seems quite inappropriate for something like this. It'll result in wrong output whenever an index scan uses an InitPlan in its qual when that InitPlan is itself a plan that contains an index scan (this is already an issue with "Buffers:" instrumentation, but this would be much worse). > 3. Add a new field to IndexScanDesc, as you originally proposed. This > seems similar to #1: it's still shoveling instrumentation data around > in a way that we don't currently, but instead of shoveling it through > a new parameter, we shovel it through a new structure member. Either > way, the new thing (parameter or structure member) doesn't really look > like it belongs with what's already there, so it seems like > conservation of ugliness. Perhaps a comment noting why the new counter lives in IndexScanDesc would help? > 4. Add a new field to th
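To make the "Index Searches: 5" example from earlier in this message concrete, here is a minimal sketch (the table, index, and data distribution are assumptions, and the exact plan shape and counter value depend on the proposed patch):

CREATE TABLE t (a int, b int);
CREATE INDEX t_a_b_idx ON t (a, b);
INSERT INTO t SELECT (i % 5) + 1, i FROM generate_series(1, 100000) i;
ANALYZE t;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM t WHERE a BETWEEN 1 AND 5 AND b = 12345;
-- With the proposed instrumentation, a skip scan here would be expected to
-- report one descent of the index per distinct "a" value, e.g.
-- "Index Searches: 5".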
Re: Update docs for UUID data type
Masahiko, I have combined the gen_random_uuid() and uuidv4() into a single row, as you suggested. Please find the v5 patch, which has been squashed into a single commit. Best regards, Andy Alsup On Thu, Feb 27, 2025 at 5:02 PM Masahiko Sawada wrote: > On Thu, Feb 27, 2025 at 1:26 PM Andy Alsup wrote: > > > > I've submitted it for the up-coming commitfest. The link is: > https://commitfest.postgresql.org/patch/5604/ > > Thanks for all your help in reviewing these changes. > > Thank you for the patch! > > Regarding the 0001 patch, I think we can put uuidv4() and > get_random_uuid() in the same row since they are effectively identical > functions. For example, we have precedent such as char_length() and > character_length(). > > The 0002 patch looks good to me. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com > v5-0001-Docs-for-UUID-funcs-formatted-in-table-and-UUID-d.patch Description: Binary data
Re: Amcheck verification of GiST and GIN
On Sat, 22 Feb 2025 at 03:51, Mark Dilger wrote: > > > > > On Feb 21, 2025, at 12:50 PM, Mark Dilger > > wrote: > > > > Could one of the patch authors take a look? > > I turned the TAP test which triggers the error into a regression test that > does likewise, for ease of stepping through the test, if anybody should want > to do that. I'm attaching that patch here, but please note that I'm not > expecting this to be committed. Hi! Your efforts are much appreciated! I used this patch to derive a smaller repro. > this seems to either be a bug in the checking code complaining about > perfectly valid tuple order, I'm doubtful this is the case. I have added some more logging to gin_index_check, and here is output after running attached: ``` DEBUG: processing entry tree page at blk 2, maxoff: 125 DEBUG: comparing for offset 78 category 0 DEBUG: comparing for offset 79 category 2 DEBUG: comparing for offset 80 category 3 DEBUG: comparing for offset 81 category 0 LOG: index "ginidx" has wrong tuple order on entry tree page, block 2, offset 81, rightlink 4294967295 DEBUG: comparing for offset 82 category 0 DEBUG: comparing for offset 100 category 0 DEBUG: comparing for offset 101 category 2 DEBUG: comparing for offset 102 category 3 DEBUG: comparing for offset 103 category 0 LOG: index "ginidx" has wrong tuple order on entry tree page, block 2, offset 103, rightlink 4294967295 DEBUG: comparing for offset 104 category 0 DEBUG: comparing for offset 105 category 0 ``` So, we have an entry tree page, where there is tuple on offset 80, with gin tuple category = 3, and then it goes category 0 again. And one more such pattern on the same page. The ginCompareEntries function compares the gin tuples category first. I do not understand how this would be a valid order on the page, given that ginCompareEntries used in `ginget.c` logic. . Maybe I'm missing something vital about GIN. -- Best regards, Kirill Reshke 0001-Much-smaller-repro.patch Description: Binary data
Re: SQLFunctionCache and generic plans
Hi čt 27. 2. 2025 v 21:45 odesílatel Tom Lane napsal: > Pavel Stehule writes: > > čt 27. 2. 2025 v 20:52 odesílatel Tom Lane napsal: > >> So taken together, our results are all over the map, anywhere > >> from 7% speedup to 7% slowdown. My usual rule of thumb is that > > > Where do you see 7% speedup? Few lines up you wrote 0.7% faster. > > Alexander got that on the fx4 case, according to his response a > few messages ago [1]. It'd be good if someone else could reproduce > that, because right now we have two "it's slower" results versus > only one "it's faster". > ok here is a profile from master 6.98% postgres postgres [.] hash_bytes 6.30% postgres postgres [.] palloc0 3.57% postgres postgres [.] SearchCatCacheInternal 3.29% postgres postgres [.] AllocSetAlloc 2.65% postgres plpgsql.so [.] exec_stmts 2.55% postgres postgres [.] expression_tree_walker_impl 2.34% postgres postgres [.] _SPI_execute_plan 2.13% postgres postgres [.] CheckExprStillValid 2.02% postgres postgres [.] fmgr_info_cxt_security 1.89% postgres postgres [.] ExecInitFunc 1.51% postgres postgres [.] ExecInterpExpr 1.48% postgres postgres [.] ResourceOwnerForget 1.44% postgres postgres [.] AllocSetReset 1.35% postgres postgres [.] MemoryContextCreate 1.30% postgres plpgsql.so [.] plpgsql_exec_function 1.29% postgres libc.so.6 [.] __memcmp_sse2 1.24% postgres postgres [.] MemoryContextDelete 1.13% postgres postgres [.] check_stack_depth 1.11% postgres postgres [.] AllocSetContextCreateInternal 1.09% postgres postgres [.] resolve_polymorphic_argtypes 1.08% postgres postgres [.] hash_search_with_hash_value 1.07% postgres postgres [.] standard_ExecutorStart 1.07% postgres postgres [.] ExprEvalPushStep 1.04% postgres postgres [.] ExecInitExprRec 0.95% postgres plpgsql.so [.] plpgsql_estate_setup 0.91% postgres postgres [.] ExecReadyInterpretedExp and from patched 7.08% postgres postgres [.] hash_bytes 6.25% postgres postgres [.] palloc0 3.52% postgres postgres [.] SearchCatCacheInternal 3.30% postgres postgres [.] AllocSetAlloc 2.39% postgres postgres [.] expression_tree_walker_impl 2.37% postgres plpgsql.so [.] exec_stmts 2.15% postgres postgres [.] _SPI_execute_plan 2.10% postgres postgres [.] CheckExprStillValid 1.94% postgres postgres [.] fmgr_info_cxt_security 1.71% postgres postgres [.] ExecInitFunc 1.41% postgres postgres [.] AllocSetReset 1.40% postgres postgres [.] ExecInterpExpr 1.38% postgres postgres [.] ExprEvalPushStep 1.34% postgres postgres [.] ResourceOwnerForget 1.31% postgres postgres [.] MemoryContextDelete 1.24% postgres libc.so.6 [.] __memcmp_sse2 1.21% postgres postgres [.] MemoryContextCreate 1.18% postgres postgres [.] AllocSetContextCreateInternal 1.17% postgres postgres [.] hash_search_with_hash_value 1.13% postgres postgres [.] resolve_polymorphic_argtypes 1.13% postgres plpgsql.so [.] plpgsql_exec_function 1.03% postgres postgres [.] standard_ExecutorStart 0.98% postgres postgres [.] ExecInitExprRec 0.96% postgres postgres [.] check_stack_depth looks so there is only one significant differences ExprEvalPushStep 1.07 x 1.38% Regards Pavel compiled without assertions on gcc 15 with 02 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz stepping : 7 microcode : 0x2f cpu MHz : 2691.102 cache size : 6144 KB > regards, tom lane > > [1] > https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru >
Re: Statistics Import and Export
On Tue, 2025-02-25 at 11:11 +0530, Ashutosh Bapat wrote: > So the dumped statistics are not restored exactly. The reason for > this > is the table statistics is dumped before dumping ALTER TABLE ... ADD > CONSTRAINT command which changes the statistics. I think all the > pg_restore_relation_stats() calls should be dumped after all the > schema and data modifications have been done. OR what's the point in > dumping statistics only to get rewritten even before restore > finishes. In your example, it's not so bad because the stats are actually better: the index is built after the data is present, and therefore relpages and reltuples are correct. The problem is more clear if you use --no-data. If you load data, ANALYZE, pg_dump --no-data, then reload the sql file, then the stats are lost. That workflow is very close to what pg_upgrade does. We solved the problem for pg_upgrade in commit 71b66171d0 by simply not updating the statistics when building an index and IsBinaryUpgrade. To solve the issue with dump --no-data, I propose that we change the test in 71b66171d0 to only update the stats if the physical relpages is non-zero. Patch attached: * If the dump is --no-data, or during pg_upgrade, the table will be empty, so the physical relpages will be zero and the restored stats won't be overwritten. * If (like in your example) the dump includes data, the new stats are based on real data, so they are better anyway. This is sort of like the case where autoanalyze kicks in. * If the dump is --statistics-only, then there won't be any indexes created in the SQL file, so when you restore the stats, they will remain until you do something else to change them. * If your example really is a problem, you'd need to dump first with - -no-statistics, and then with --statistics-only, and restore the two SQL files in order. Alternatively, we could put stats into SECTION_POST_DATA, which was already discussed[*], and we decided against it (though there was not a clear consensus). Regards, Jeff Davis *: https://www.postgresql.org/message-id/1798867.1712376328%40sss.pgh.pa.us From 86c03cb525b49d24019c5c0ea8ec36bb82b3c58a Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Thu, 27 Feb 2025 17:06:00 -0800 Subject: [PATCH v1] Do not update stats on empty table when building index. We previously fixed this for binary upgrade in 71b66171d0, but a similar problem exists when using pg_dump --no-data without pg_upgrade involved. Fix both problems by not updating the stats when the table has no pages. Reported-by: Ashutosh Bapat Discussion: https://postgr.es/m/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=c+fnxdvfprwv...@mail.gmail.com --- src/backend/catalog/index.c| 17 +++-- src/test/regress/expected/stats_import.out | 22 +- src/test/regress/sql/stats_import.sql | 12 3 files changed, 44 insertions(+), 7 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index f37b990c81d..1a3fdeab350 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2833,11 +2833,7 @@ index_update_stats(Relation rel, if (reltuples == 0 && rel->rd_rel->reltuples < 0) reltuples = -1; - /* - * Don't update statistics during binary upgrade, because the indexes are - * created before the data is moved into place. 
- */ - update_stats = reltuples >= 0 && !IsBinaryUpgrade; + update_stats = reltuples >= 0; /* * Finish I/O and visibility map buffer locks before @@ -2850,7 +2846,16 @@ index_update_stats(Relation rel, { relpages = RelationGetNumberOfBlocks(rel); - if (rel->rd_rel->relkind != RELKIND_INDEX) + /* + * Don't update statistics when the relation is completely empty. This + * is important during binary upgrade, because at the time the schema + * is loaded, the data has not yet been moved into place. It's also + * useful when restoring a dump containing only schema and statistics. + */ + if (relpages == 0) + update_stats = false; + + if (update_stats && rel->rd_rel->relkind != RELKIND_INDEX) visibilitymap_count(rel, &relallvisible, NULL); } diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out index 1f150f7b08d..4c81fb60c91 100644 --- a/src/test/regress/expected/stats_import.out +++ b/src/test/regress/expected/stats_import.out @@ -12,14 +12,34 @@ CREATE TABLE stats_import.test( arange int4range, tags text[] ) WITH (autovacuum_enabled = false); +SELECT +pg_catalog.pg_restore_relation_stats( +'relation', 'stats_import.test'::regclass, +'relpages', 18::integer, + 'reltuples', 21::real, + 'relallvisible', 24::integer); + pg_restore_relation_stats +--- + t +(1 row) + +-- creating an index on an empty table shouldn't overwrite stats CREATE INDEX test_i ON stats_import.test(id); +SELECT relpages, reltuples, relallvisible +FROM pg_class +WHERE oid = 'stats_import.test'::regclass; + relpa
Re: Update docs for UUID data type
I've submitted it for the up-coming commitfest. The link is: https://commitfest.postgresql.org/patch/5604/ Thanks for all your help in reviewing these changes. Best Regards, Andy Alsup On Thu, Feb 27, 2025 at 1:58 PM Laurenz Albe wrote: > On Wed, 2025-02-26 at 22:11 -0500, Andy Alsup wrote: > > Please find the latest patch files attached. > > This is good to go. If you add it to the commitfest, I'm happy to > mark it "ready for committer". > > Yours, > Laurenz Albe >
Re: moving some code out of explain.c
Robert Haas writes: > OK, here is v2. One slightly unfortunate thing about this is that we > end up with a line that is over 80 characters: > extern DestReceiver *CreateExplainSerializeDestReceiver(struct > ExplainState *es); > Aside from perhaps shortening the function name, I don't see how to avoid > that. Can't get terribly excited about that. But speaking of CreateExplainSerializeDestReceiver, I see that a declaration of it is still there at the bottom of explain.h: extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es); #endif /* EXPLAIN_H */ Oversight I assume? regards, tom lane
Re: Should work_mem be stable for a prepared statement?
On Fri, 28 Feb 2025 at 07:42, Jeff Davis wrote: > https://www.postgresql.org/message-id/CAJVSvF6s1LgXF6KB2Cz68sHzk%2Bv%2BO_vmwEkaon%3DH8O9VcOr-tQ%40mail.gmail.com > > James pointed out something interesting, which is that a prepared > statement enforces the work_mem limit at execution time, which might be > different from the work_mem at the time the statement was prepared. There's a similar but not quite the same situation with the enable_* GUCs. The executor isn't going to pick up a new value for these like it will for work_mem, but I think portions of the same argument can be made, i.e. Someone might not like that turning off enable_seqscan after doing PREPARE and EXECUTE once does not invalidate their plan. > My first reaction is that it's not right because the costing for the > plan is completely bogus with a different work_mem. It would make more > sense to me if we either (a) enforced work_mem as it was at the time of > planning; or (b) replanned if executed with a different work_mem > (similar to how we replan sometimes with different parameters). If we were to fix this then a) effectively already happens for the enable_* GUCs, so b) would be the only logical way to fix. > But I'm not sure whether someone might be relying on the existing > behavior? It looks like there was a bit of discussion on this topic about 18 years ago in [1], but it didn't seem to end with a very conclusive outcome. I did learn that we once didn't have a method to invalidate cached plans, so perhaps the current behaviour is a remnant of the previous lack of infrastructure. David [1] https://www.postgresql.org/message-id/15168.1174410673%40sss.pgh.pa.us
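A minimal sketch of the enable_seqscan scenario described above (the table contents are assumptions, and whether the generic plan is cached on the first execution depends on the usual custom-vs-generic plan heuristics):

CREATE TABLE t (id int PRIMARY KEY, payload text);
INSERT INTO t SELECT i, md5(i::text) FROM generate_series(1, 100000) i;
ANALYZE t;

PREPARE q AS SELECT count(*) FROM t WHERE id BETWEEN 1 AND 90000;

EXPLAIN EXECUTE q;            -- typically a Seq Scan while enable_seqscan is on
SET enable_seqscan = off;
EXPLAIN EXECUTE q;            -- a cached generic plan would still show Seq Scan;
                              -- the GUC change does not invalidate it, whereas a
                              -- new work_mem value is picked up at execution time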
Re: moving some code out of explain.c
On Thu, Feb 27, 2025 at 4:12 PM Tom Lane wrote: > +1, but how about explain_dr.h too? It doesn't seem though that > we can avoid #include "executor/instrument.h" there, so it'd be > a little asymmetrical. But the executor inclusion doesn't seem > nearly as much almost-circular. OK, here is v2. One slightly unfortunate thing about this is that we end up with a line that is over 80 characters: extern DestReceiver *CreateExplainSerializeDestReceiver(struct ExplainState *es); Aside from perhaps shortening the function name, I don't see how to avoid that. -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Avoid-including-explain.h-in-explain_format.h-and.patch Description: Binary data
Re: Restrict copying of invalidated replication slots
On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila wrote: > > On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada > wrote: > > > > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila wrote: > > > > > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with > > > this operation. So, isn't it possible that the source slot exists at > > > the later position in ReplicationSlotCtl->replication_slots and the > > > loop traversing slots is already ahead from the point where the newly > > > copied slot is created? > > > > Good point. I think it's possible. > > > > > If so, the newly created slot won't be marked > > > as invalid whereas the source slot will be marked as invalid. I agree > > > that even in such a case, at a later point, the newly created slot > > > will also be marked as invalid. > > > > The wal_status of the newly created slot would immediately become > > 'lost' and the next checkpoint will invalidate it. Do we need to do > > something to deal with this case? > > > > + /* Check if source slot became invalidated during the copy operation */ > + if (second_slot_contents.data.invalidated != RS_INVAL_NONE) > + ereport(ERROR, > + errmsg("cannot copy replication slot \"%s\"", > +NameStr(*src_name)), > + errdetail("The source replication slot was invalidated during the > copy operation.")); > > How about adding a similar test as above for MyReplicationSlot? That > should suffice the need because we have already acquired the new slot > by this time and invalidation should signal this process before > marking the new slot as invalid. IIUC in the scenario you mentioned, the loop traversing slots already passed the position of newly created slot in ReplicationSlotCtl->replication_slots array, so MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Should work_mem be stable for a prepared statement?
David Rowley writes: > On Fri, 28 Feb 2025 at 07:42, Jeff Davis wrote: >> My first reaction is that it's not right because the costing for the >> plan is completely bogus with a different work_mem. It would make more >> sense to me if we either (a) enforced work_mem as it was at the time of >> planning; or (b) replanned if executed with a different work_mem >> (similar to how we replan sometimes with different parameters). > If we were to fix this then a) effectively already happens for the > enable_* GUCs, so b) would be the only logical way to fix. Given that nobody's complained about this for twenty-plus years, I can't get excited about adding complexity to do either thing. regards, tom lane
Re: Should work_mem be stable for a prepared statement?
On Thu, 2025-02-27 at 17:04 -0500, Tom Lane wrote: > Given that nobody's complained about this for twenty-plus years, > I can't get excited about adding complexity to do either thing. I had in mind some refactoring in this area, which ideally would not add complexity. It might provide some nice benefits, but would introduce this behavior change, which makes it slightly more than a refactoring. It sounds like the behavior change would be desirable or at least neutral. I will have to try it out and see if the refactoring is a net improvement or turns into a mess. Regards, Jeff Davis
Re: Improve documentation regarding custom settings, placeholders, and the administrative functions
On Thu, Feb 6, 2025 at 8:04 AM Zhang Mingli wrote: > On Feb 6, 2025 at 10:49 +0800, David G. Johnston < > david.g.johns...@gmail.com>, wrote: > > > Yes, and when it does it is doing so in lieu of an error, and thus it is > not returning the value of the setting. It does not mean the value taken > on by the named parameter is NULL, which is not possible. i.e., SHOW will > never produce NULL. > > Hi, > > OK, LGTM. > > Thank you for the review. If you would like to add yourself as the reviewer and mark this ready to commit here is the commitfest entry. https://commitfest.postgresql.org/patch/5548/ David J.
Re: Should work_mem be stable for a prepared statement?
On Thu, Feb 27, 2025 at 1:42 PM Jeff Davis wrote: > It would make more sense to me if we either (a) enforced work_mem as it > was at the time of planning; or (b) replanned if executed with a different > work_mem (similar to how we replan sometimes with different parameters). > Definitely (b). But I'm not sure whether someone might be relying on the existing behavior? > I cannot fathom a reason why. Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support
Re: moving some code out of explain.c
Robert Haas writes: > The thing that was bugging me a bit is that explain_format.h includes > explain.h. Yeah, I did not like that at all either -- it makes things a bit circular, and I fear IWYU is going to make stupid recommendations like not including explain.h in explain.c. Did you look at avoiding that with our standard trick of writing detail-free struct declarations? That is, explain_format.h would need struct ExplainState;/* avoid including explain.h here */ and then s/ExplainState/struct ExplainState/ in all the function declarations. You'd still need to get List from someplace, but it could be gotten by including primnodes.h or even pg_list.h. regards, tom lane
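A minimal sketch of the trick being suggested here, reusing two of the existing declarations as examples (exactly which declarations end up in explain_format.h is an assumption):

/* explain_format.h -- sketch only */
#ifndef EXPLAIN_FORMAT_H
#define EXPLAIN_FORMAT_H

#include "nodes/pg_list.h"

struct ExplainState;            /* avoid including explain.h here */

extern void ExplainPropertyText(const char *qlabel, const char *value,
                                struct ExplainState *es);
extern void ExplainOpenGroup(const char *objtype, const char *labelname,
                             bool labeled, struct ExplainState *es);

#endif                          /* EXPLAIN_FORMAT_H */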
Re: Statistics Import and Export commit related issue.
On Tue, Feb 25, 2025 at 1:31 AM jian he wrote: > hi. > the thread "Statistics Import and Export" is quite hot. > so a new thread for some minor issue i found. > > > static int > _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH) > { > > if (strcmp(te->desc, "STATISTICS DATA") == 0) > { > if (!ropt->dumpStatistics) > return 0; > else > res = REQ_STATS; > } > > /* If it's statistics and we don't want statistics, maybe ignore it */ > if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0) > return 0; > } > As you can see, there is some duplicate code there. > in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS > DATA") == 0)`` after switch (curSection). > so pg_dump --section option does not influence statistics dump. > attached is the proposed change. > +1, and I think the patch provided is good as-is. > > > in dumpTableData_copy, we have: > pg_log_info("dumping contents of table \"%s.%s\"", > tbinfo->dobj.namespace->dobj.name, classname); > dumpRelationStats add a pg_log_info would be a good thing. > commit: > https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395 > didn't have any pg_log_info. > I don't have a strong opinion on this one, but I lean against doing it. I think table data got a log message because dumping table data is an operation that takes up a major % of the dump's runtime, whereas statistics are just some catalog operations. There are log messages for the global catalog operations (ex. read in all user defined functions, read in all inheritance relationships), but not per-table catalog operations. If we end up going to a global export of attribute statistics then I'd definitely want a log message for that operation. > > > https://www.postgresql.org/docs/devel/app-pgrestore.html > """ > --jobs=number-of-jobs > Run the most time-consuming steps of pg_restore — those that load > data, create indexes, or create constraints — concurrently, using up > to number-of-jobs concurrent sessions. > """ > we may need to change the above sentence? > pg_restore restore STATISTICS DATA also doing it concurrently if > --jobs is specified. I also have no strong opinion here, but I think the sentence is fine as is because it specifies "most time-consuming" and a single catalog row operation is much smaller than a COPY load, or the sequential scans needed for index builds and constraint verification.
Re: SQL:2023 JSON simplified accessor support
Hello hackers, On Thu, Feb 27, 2025 at 9:46 AM Alexandra Wang wrote: > Summary of changes: > > v8-0001 through v8-0005: > Refactoring and preparatory steps for the actual implementation. > > v8-0006 (Implement read-only dot notation for JSONB): > I removed the vars field (introduced in v7) from JsonbSubWorkspace > after realizing that JsonPathVariable is not actually needed for > dot-notation. > > v8-0007 (Allow wildcard member access for JSONB): > I'm aware that the #if 0 in check_indirection() is not ideal. I > haven’t removed it yet because I’m still reviewing other cases—beyond > our JSONB simplified accessor use case—where this check should remain > strict. I’ll post an additional patch to address this. > I made the following minor changes in v9: - More detailed commit messages - Additional tests - Use "?column?" as the column name for trailing .*. Other than that, the patches remain the same as the previous version: 0001 - 0005: preparation steps for the actual implementation and do not change or add new behavior. 0006: jsonb dot notation and sliced subscripting 0007: jsonb wildcard member access Thanks, Alex v9-0003-Export-jsonPathFromParseResult.patch Description: Binary data v9-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-Not.patch Description: Binary data v9-0004-Extract-coerce_jsonpath_subscript.patch Description: Binary data v9-0001-Allow-transformation-of-only-a-sublist-of-subscri.patch Description: Binary data v9-0006-Implement-read-only-dot-notation-for-jsonb.patch Description: Binary data v9-0007-Allow-wild-card-member-access-for-jsonb.patch Description: Binary data v9-0005-Eanble-String-node-as-field-accessors-in-generic-.patch Description: Binary data
Re: Make COPY format extendable: Extract COPY TO format implementations
On Tue, Feb 25, 2025 at 6:08 PM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Tue, 25 Feb 2025 17:14:43 -0800, > Masahiko Sawada wrote: > > > I've attached updated patches. > > Thanks. > > I found one more missing last ".": > > 0002: > > > --- a/src/backend/commands/copyfrom.c > > +++ b/src/backend/commands/copyfrom.c > > > @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo > > > +/* > > + * Built-in format-specific routines. One-row callbacks are defined in > > + * copyfromparse.c > > + */ > > copyfromparse.c -> copyfromparse.c. > > > Could you push them? > Pushed the 0001 patch. Regarding the 0002 patch, I realized we stopped exposing NextCopyFromRawFields() function: --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where extern void EndCopyFrom(CopyFromState cstate); extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); -extern bool NextCopyFromRawFields(CopyFromState cstate, - char ***fields, int *nfields); I think that this change is not relevant with the refactoring and probably we should keep it exposed as extension might be using it. Considering that we added pg_attribute_always_inline to the function, does it work even if we omit pg_attribute_always_inline to its function declaration in the copy.h file? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Statistics Import and Export
On Wed, Feb 26, 2025 at 9:19 PM Corey Huinker wrote: >>> >>> +1, let's shorten those queries. The coast is probably pretty >>> clear now if you want to go do that. >> >> >> On it. So, I started reviewing this and my original thought about shortening the queries testing pg_restore_relation_stats() wasn't included in your patch. For example: +--- error: relation is wrong type +SELECT pg_catalog.pg_restore_relation_stats( +'relation', 0::oid, +'relpages', 17::integer, +'reltuples', 400.0::real, +'relallvisible', 4::integer); Why do you need to specify all the stats (relpages, reltuples, etc)? To exercise this you could just do: select pg_catalog.pg_restore_relation_stats('relation', 0::oid); Since I haven't been following along with this feature development, I don't think I can get comfortable enough with all of the changes in this test diff to commit them. I can't really say if this is the set of tests that is representative and sufficient for this feature. If you agree with me that the failure tests could be shorter, I'm happy to commit that, but I don't really feel comfortable assessing what the right set of full tests is. - Melanie
Re: moving some code out of explain.c
On Tue, Feb 18, 2025 at 1:11 PM Peter Geoghegan wrote: > On Tue, Feb 18, 2025 at 1:04 PM Robert Haas wrote: > > I think in cases like this there is a tendency to want to achieve an > > even split of the original file, but in practice I think that tends > > not to be difficult to achieve. These two patches combined move about > > 1000 lines of code into separate files, which I think is actually > > quite a good result for a refactoring of this sort. We might want to > > split it up even more, but I don't feel like that has to be done in > > the first set of patches or that all such patches have to come from > > me. So I propose to do just this much for now. > > > > Comments? > > Seems like a good idea to me. Thanks for the quick response! I have committed these patches. I see that the Redis-FDW test is failing; it will now need to include "commands/explain_format.h" -- unless we're missing something, of course. -- Robert Haas EDB: http://www.enterprisedb.com
Re: moving some code out of explain.c
On Thu, Feb 27, 2025 at 1:24 PM Robert Haas wrote: > Thanks for the quick response! I have committed these patches. I recently did something similar myself when I moved all of the nbtree preprocessing code into its own file. The obvious downside is that it tends to make "git blame" much less useful. It was definitely worth it, though. -- Peter Geoghegan
Should work_mem be stable for a prepared statement?
As a part of this discussion: https://www.postgresql.org/message-id/CAJVSvF6s1LgXF6KB2Cz68sHzk%2Bv%2BO_vmwEkaon%3DH8O9VcOr-tQ%40mail.gmail.com James pointed out something interesting, which is that a prepared statement enforces the work_mem limit at execution time, which might be different from the work_mem at the time the statement was prepared. For instance: SET work_mem='1GB'; PREPARE foo AS ...; -- plans using 1GB limit SET work_mem='1MB'; EXECUTE foo; -- enforces 1MB limit My first reaction is that it's not right because the costing for the plan is completely bogus with a different work_mem. It would make more sense to me if we either (a) enforced work_mem as it was at the time of planning; or (b) replanned if executed with a different work_mem (similar to how we replan sometimes with different parameters). But I'm not sure whether someone might be relying on the existing behavior? If we were to implement (a) or (b), we couldn't use the work_mem global directly, we'd need to save it in the plan, and enforce using the plan's saved value. But that might be a good change anyway. In theory we might need to do something similar for hash_mem_multiplier, too. Regards, Jeff Davis
Re: SQLFunctionCache and generic plans
Pavel Stehule writes: > čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov < > a.pyha...@postgrespro.ru> napsal: >>> Unfortunately, there is about 5% slowdown for inlined code, and for >>> just plpgsql code too. >> Hi. I've tried to reproduce slowdown and couldn't. > I'll try to get profiles. I tried to reproduce this too. What I got on my usual development workstation (RHEL8/gcc 8.5.0 on x86_64) was: fx2 example: v6 patch about 2.4% slower than HEAD fx4 example: v6 patch about 7.3% slower than HEAD I was quite concerned after that result, but then I tried it on another machine (macOS/clang 16.0.0 on Apple M1) and got: fx2 example: v6 patch about 0.2% slower than HEAD fx4 example: v6 patch about 0.7% faster than HEAD (These are average-of-three-runs tests on --disable-cassert builds; I trust you guys were not doing performance tests on assert-enabled builds?) So taken together, our results are all over the map, anywhere from 7% speedup to 7% slowdown. My usual rule of thumb is that you can see up to 2% variation in this kind of microbenchmark even when "nothing has changed", just due to random build details like whether critical loops cross a cacheline or not. 7% is pretty well above that threshold, but maybe it's just random build variation anyway. Furthermore, since neither example involves functions.c at all (fx2 would be inlined, and fx4 isn't SQL-language), it's hard to see how the patch would directly affect either example unless it were adding overhead to plancache.c. And I don't see any changes there that would amount to meaningful overhead for the existing use-case with a raw parse tree. So right at the moment I'm inclined to write this off as measurement noise. Perhaps it'd be worth checking a few more platforms, though. regards, tom lane
Re: SQLFunctionCache and generic plans
Pavel Stehule writes: > čt 27. 2. 2025 v 20:52 odesílatel Tom Lane napsal: >> So taken together, our results are all over the map, anywhere >> from 7% speedup to 7% slowdown. My usual rule of thumb is that > Where do you see 7% speedup? Few lines up you wrote 0.7% faster. Alexander got that on the fx4 case, according to his response a few messages ago [1]. It'd be good if someone else could reproduce that, because right now we have two "it's slower" results versus only one "it's faster". regards, tom lane [1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru
Re: SQLFunctionCache and generic plans
Alexander Pyhalov writes: > Now sql functions plans are actually saved. The most of it is a > simplified version of plpgsql plan cache. Perhaps, I've missed > something. A couple of thoughts about v6: * I don't think it's okay to just summarily do this: /* It's stale; unlink and delete */ fcinfo->flinfo->fn_extra = NULL; MemoryContextDelete(fcache->fcontext); fcache = NULL; when fmgr_sql sees that the cache is stale. If we're doing a nested call of a recursive SQL function, this'd be cutting the legs out from under the outer recursion level. plpgsql goes to some lengths to do reference-counting of function cache entries, and I think you need the same here. * I don't like much of anything about 0004. It's messy and it gives up all the benefit of plan caching in some pretty-common cases (anywhere where the user was sloppy about what data type is being returned). I wonder if we couldn't solve that with more finesse by applying check_sql_fn_retval() to the query tree after parse analysis, but before we hand it to plancache.c? This is different from what happens now because we'd be applying it before not after query rewrite, but I don't think that query rewrite ever changes the targetlist results. Another point is that the resultTargetList output might change subtly, but I don't think we care there either: I believe we only examine that output for its result data types and resjunk markers. (This is nonobvious and inadequately documented, but for sure we couldn't try to execute that tlist --- it's never passed through the planner.) * One diff that caught my eye was - if (!ActiveSnapshotSet() && - plansource->raw_parse_tree && - analyze_requires_snapshot(plansource->raw_parse_tree)) + if (!ActiveSnapshotSet() && StmtPlanRequiresRevalidation(plansource)) Because StmtPlanRequiresRevalidation uses stmt_requires_parse_analysis, this is basically throwing away the distinction between stmt_requires_parse_analysis and analyze_requires_snapshot. I do not think that's okay, for the reasons explained in analyze_requires_snapshot. We should expend the additional notational overhead to keep those things separate. * I'm also not thrilled by teaching StmtPlanRequiresRevalidation how to do something equivalent to stmt_requires_parse_analysis for Query trees. The entire point of the current division of labor is for it *not* to know that, but keep the knowledge of the properties of different statement types in parser/analyze.c. So it looks to me like we need to add a function to parser/analyze.c that does this. Not quite sure what to call it though. querytree_requires_parse_analysis() would be a weird name, because if it's a Query tree then it's already been through parse analysis. Maybe querytree_requires_revalidation()? regards, tom lane
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan wrote: > I need more feedback about this. I don't understand your perspective here. Attached is a version of the patch that will apply cleanly against HEAD. (This is from v26 of my skip scan patch, which is why I've skipped so many version numbers compared to the last patch posted on this thread.) I still haven't changed anything about the implementation, since this is just to keep CFTester happy. -- Peter Geoghegan v26-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch Description: Binary data
Re: [Doc] Improve hostssl related descriptions and option presentation
On Mon, Apr 22, 2024 at 2:20 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > Thoughts anyone? > > On Thu, Feb 1, 2024 at 3:47 PM David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> Motivated by a recent complaint [1] I found the hostssl related material >> in our docs quite verbose and even repetitive. Some of that is normal >> since we have both an overview/walk-through section as well as a reference >> section. But the overview in particular was self-repetitive. Here is a >> first pass at some improvements. The commit message contains both the >> intent of the patch and some thoughts/questions still being considered. >> >> David J. >> >> [1] >> https://www.postgresql.org/message-id/170672433664.663.11895343120533141715%40wrigleys.postgresql.org >> > Withdrawn as this now has conflicts from recent changes and no one seems interested in voicing an opinion on this for or against. David J.
Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
On Wed, Feb 26, 2025 at 4:09 PM Jeff Davis wrote: > > My idea was that we'd attach work_mem to each Path node and Plan node > at create time. For example, in create_agg_path() it could do: > > pathnode->path.node_work_mem = work_mem; > > And then add to copy_generic_path_info(): > > dest->node_work_mem = src->node_work_mem; > > (and similarly for other nodes; at least those that care about > work_mem) > > Then, at execution time, use node->ps.ss.plan.node_work_mem instead of > work_mem. This is essentially what patches 2 and 3 do. Some comments: First, there's no need to set the workmem_limit at Path time, since it's not needed until the Plan is init-ted into a PlanState. So I set it on the Plan but not on the Path. Second, the logic to assign a workmem_limit to the Agg node is a bit more complicated than in your example, because the Agg could be either a Hash or a Sort. If it's a Hash, it gets work_mem * hash_mem_limit; and if it's a Sort it gets either 0 or work_mem. We can adjust the logic so that it gets work_mem instead of 0, by pushing the complexity out of the original workmem_limit assignment and into later code blocks -- e.g., in an extension -- but we still need to decide whether the Agg is a Hash or a Sort. This is why Patch 2 does: switch (agg->aggstrategy) { case AGG_HASHED: case AGG_MIXED: /* * Because nodeAgg.c will combine all AGG_HASHED nodes into a * single phase, it's easier to store the hash working-memory * limit on the first AGG_{HASHED,MIXED} node, and set it to zero * for all subsequent AGG_HASHED nodes. */ agg->plan.workmem_limit = is_first ? get_hash_memory_limit() / 1024 : 0; break; case AGG_SORTED: /* * Also store the sort-output working-memory limit on the first * AGG_SORTED node, and set it to zero for all subsequent * AGG_SORTED nodes. * * We'll need working-memory to hold the "sort_out" only if this * isn't the last Agg node (in which case there's no one to sort * our output). */ agg->plan.workmem_limit = *is_first_sort && !is_last ? work_mem : 0; *is_first_sort = false; break; default: break; } Notice that the logic also sets the limit to 0 on certain Agg nodes -- this can be avoided, at the cost of added complexity later. The added complexity is because, for example, for Hash Aggs, all Hash tables share the same overall workmem_limit. So any workmem_limit set on subsequent hash Agg nodes would be ignored, which means that setting that limit adds conmplexity by obscuring the underlying logic. > Similarly, we could track the node_mem_wanted, which would be the > estimated amount of memory the node would use if unlimited memory were > available. I believe that's already known (or a simple calculation) at > costing time, and we can propagate it from the Path to the Plan the > same way. And this is what Patch 3 does. As you point out, this is already known, or, if not, it's a simple calculation. This is all that Patch 3 does. > (A variant of this approach could carry the values into the PlanState > nodes as well, and the executor would that value instead.) That's not needed, though, and would violate existing PG conventions: we don't copy anything from Plan to PlanState, because the assumption is that the PlanState always has access to its corresponding Plan. (The reason we copy from Path to Plan, I believe, is that we drop all Paths, to save memory; because we generally have many more Paths than Plans.) 
> Extensions like yours could have a GUC like ext.query_work_mem and use > planner_hook to modify plan after standard_planner is done, walking the > tree and adjusting each Plan node's node_work_mem to obey > ext.query_work_mem. Another extension might hook in at path generation > time with set_rel_pathlist_hook or set_join_pathlist_hook to create > paths with lower node_work_mem settings that total up to > ext.query_work_mem. I don't set workmem_limit at Path time, because it's not needed there; but I do set the workmem (estimate) at Path time, exactly so that future optimizer hooks could make use of a Path's workmem (estimate) to decide between different Paths. Patch 3 sets workmem (estimate) on the Path and copies it to the Plan. As you know, there's deliberately not a 1-1 correspondence between Path and Plan (the way there is generally a 1-1 correspondence between Plan and PlanState), so Patch 3 has to do some additional work to propagate the workmem (estimate) from Path to Plan. You can see existing examples of similar work inside file createplan.c. Creating a Plan from a Path is not generally as simple as just copying the Path's fields over; there are lots of special cases. Although Patch 3 sets workmem (estimate) on the Plan, inside createplan.c, Patch 2 doesn't set workmem_limit inside createplan.c. An earlier draft of the patchset *did* set it there, but because of all the special casing in createplan.c, this ended up becoming difficul
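For illustration, a minimal sketch of the planner_hook approach described in the quoted paragraph; note that the workmem_limit field and the ext.query_work_mem GUC exist only in the proposed patch set, and the rescaling walker itself is omitted:

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
workmem_planner(Query *parse, const char *query_string,
                int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    if (prev_planner_hook)
        result = prev_planner_hook(parse, query_string,
                                   cursorOptions, boundParams);
    else
        result = standard_planner(parse, query_string,
                                  cursorOptions, boundParams);

    /*
     * Walk result->planTree here and rescale each node's workmem_limit
     * (a field added by the proposed patch) so that the per-node limits
     * sum to ext.query_work_mem.  The walker is omitted from this sketch.
     */
    return result;
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = workmem_planner;
}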
Re: Confine vacuum skip logic to lazy_scan_skip
On Thu, Feb 27, 2025 at 14:49, Andres Freund wrote: > Hi, > > On 2025-02-27 12:44:24 -0500, Andres Freund wrote: > > > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL) > > > 8. var_deref_op: Dereferencing null pointer per_buffer_data. > > > > That's exactly what the messages you quoted are discussing, no? > > Ah, no, it isn't. But I still think the coverity alert and the patch don't > make sense, as per the below: > > > > diff --git a/src/backend/storage/aio/read_stream.c > b/src/backend/storage/aio/read_stream.c > > > index 04bdb5e6d4..18e9b4f3c4 100644 > > > --- a/src/backend/storage/aio/read_stream.c > > > +++ b/src/backend/storage/aio/read_stream.c > > > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void > **per_buffer_data) > > > > READ_BUFFERS_ISSUE_ADVICE : 0))) > > > { > > > /* Fast return. */ > > > + if (per_buffer_data) > > > + *per_buffer_data = > get_per_buffer_data(stream, oldest_buffer_index); > > > return buffer; > > > } > > > > A few lines above: > > Assert(stream->per_buffer_data_size == 0); > > > > The fast path isn't used when per buffer data is used. Adding a check > for > > per_buffer_data and assigning something to it is nonsensical. > Perhaps. But IMHO the fast path combined with the void **per_buffer_data parameter is still a serious risk at runtime. best regards, Ranier Vilela
Re: Docs for pg_basebackup needs v17 note for incremental backup
On Thu, Feb 6, 2025 at 2:46 PM Daniel Gustafsson wrote: > I'd be happy to help getting this in, do you have a suggested wording? > > Thank you. Attached. David J. From fc9768fa1de17529cddc3f3ac1fba208f7500083 Mon Sep 17 00:00:00 2001 From: "David G. Johnston" Date: Thu, 27 Feb 2025 10:58:57 -0700 Subject: [PATCH] Document pg_basebackup incremental backup requires v17 server. --- doc/src/sgml/ref/pg_basebackup.sgml | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index c2d721208b..9659f76042 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -1005,10 +1005,11 @@ PostgreSQL documentation pg_basebackup works with servers of the same - or an older major version, down to 9.1. However, WAL streaming mode (-X - stream) only works with server version 9.3 and later, and tar format + or older major version, down to 9.1. However, WAL streaming mode (-X + stream) only works with server version 9.3 and later, the tar format (--format=tar) only works with server version 9.5 - and later. + and later, and incremental backup (--incremental) only works + with server version 17 and later. -- 2.34.1
Re: Confine vacuum skip logic to lazy_scan_skip
Andres Freund writes: > Ah, no, it isn't. But I still think the coverity alert and the patch don't > make sense, as per the below: Coverity's alert makes perfect sense if you posit that Coverity doesn't assume that this read_stream_next_buffer call will only be applied to a stream that has per_buffer_data_size > 0. (Even if it did understand that, I wouldn't assume that it's smart enough to see that the fast path will never be taken.) I wonder if it'd be a good idea to add something like Assert(stream->distance == 1); Assert(stream->pending_read_nblocks == 0); Assert(stream->per_buffer_data_size == 0); + Assert(per_buffer_data == NULL); in read_stream_next_buffer. I doubt that this will shut Coverity up, but it would help to catch caller coding errors, i.e. passing a per_buffer_data pointer when there's no per-buffer data. On the whole I doubt we can get rid of this warning without some significant redesign of the read_stream API, and I don't think it's worth the trouble. Coverity is a tool not a requirement. I'm content to just dismiss the warning. regards, tom lane
Re: Confine vacuum skip logic to lazy_scan_skip
On Thu, Feb 27, 2025 at 1:08 PM Tom Lane wrote: > > I wonder if it'd be a good idea to add something like > > Assert(stream->distance == 1); > Assert(stream->pending_read_nblocks == 0); > Assert(stream->per_buffer_data_size == 0); > + Assert(per_buffer_data == NULL); > > in read_stream_next_buffer. I doubt that this will shut Coverity > up, but it would help to catch caller coding errors, i.e. passing > a per_buffer_data pointer when there's no per-buffer data. I think this is a good stopgap. I was discussing adding this assert off-list with Thomas and he wanted to detail his more ambitious plans for type safety improvements in the read stream API. Less a redesign and more like separate read_stream_next_buffer() variants for when there is per-buffer data and when there isn't, plus a by-value and a by-reference version for the one where there is data. I'll plan to add this assert tomorrow if that discussion doesn't materialize. - Melanie
Re: Document How Commit Handles Aborted Transactions
Ahmed, Thank you for the review. I'm a bit confused by the reports of apply and compile errors. I didn't touch anything involving "varlist" and see no errors then or now in my Meson ninja build. Nor does the CI report any. On Fri, Feb 14, 2025 at 2:01 PM Ahmed Ashour wrote: > Feedback: > - > - The patch was manually applied due to conflicts in `advanced.sgml` and > `commit.sgml`. > - Fixed invalid SGML structure by wrapping `` in > ``. > If those errors were/are real this wouldn't be ready to commit. But as they seem to be a local environment issue on your end, and you agree with the content, I'll keep it ready to commit. David J.
Re: moving some code out of explain.c
On Thu, Feb 27, 2025 at 2:21 PM Álvaro Herrera wrote: > On 2025-Feb-27, Robert Haas wrote: > > I see that the Redis-FDW test is failing; it will now need to include > > "commands/explain_format.h" -- unless we something, of course. > > I wonder if it was really a useful idea to move the declarations of > those functions from explain.h to the new explain_format.h file. It > seems that this new file now has a bunch of declarations that have > always been something of a public interface, together with others that > are only of internal explain.c interest, such as > ExplainOpenSetAsideGroup() and friends. I'm not completely certain about what I did with the header files, but for a somewhat different reason than what you mention here. I don't see ExplainOpenSetAsideGroup() as being any different from anything else that I put in explain_format.h -- it's something that code might want to call if it's generating EXPLAIN output, and otherwise not. But maybe I'm missing something -- what makes you see it as different from the other stuff? The thing that was bugging me a bit is that explain_format.h includes explain.h. It would be nice if those were more independent of each other, but I'm not sure if there's a reasonably clean way of accomplishing that. When deciding what to do there, it's also worth keeping in mind that there may be more opportunities to move stuff out of explain.c, so we might not want to get too dogmatic about the header file organization just yet. But, I'm certainly open to ideas. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Add regression tests of ecpg command notice (error / warning)
On Tue, Feb 18, 2025 at 12:49 PM Ryo Kanbayashi wrote:
>
> On Thu, Feb 13, 2025 at 10:49 PM Fujii Masao wrote:
> >
> > On 2025/02/06 8:57, Ryo Kanbayashi wrote:
> > > On Wed, Feb 5, 2025 at 9:31 PM Ryo Kanbayashi wrote:
> > >>
> > >> Hi hackers,
> > >>
> > >> When I wrote the patch for the ecpg command notice bug, I recognized
> > >> the need for regression tests of ecpg command notices, and I said
> > >> that I would write the tests.
> >
> > Thanks for working on this!
> >
> > >> I explain the implementation of this patch.
> > >>
> > >> What this patch does:
> > >> - adds regression tests which exercise ecpg command notices such as
> > >>   warnings and errors
> > >> - tests notices implemented in the ecpg.addons file
> > >>
> > >> Basic policy of the implementation:
> > >> - do it in a way that matches the existing pg_regress method as much
> > >>   as possible
> > >> - avoid approaches that increase the scope of influence
> > >>
> > >> Next, I list answers to points that are likely to be raised:
> > >> - shell scripts and bat files are used to avoid a non-zero exit code
> > >>   from the ecpg command making the tests fail, and to avoid adding C
> > >>   code for executing a binary in a cross-platform way
> > >> - python code is used because I couldn't write meson.build in a way
> > >>   that appropriately describes the dependencies on the materials used
> > >>   by the tests without it; please help me...
> > >> - as you said, kicking off this kind of test with pg_regress brings a
> > >>   needless PG server process along, but pg_regress doesn't run tests
> > >>   without one, and changing that would be a modification with a
> > >>   not-small scope of influence
> >
> > Wouldn't it be simpler to use the existing TAP test mechanism,
> > as shown in the attached patch? Please note that this patch is very WIP,
> > so there would be many places that need further implementation and
> > refinement.
>
> Fujii San,
>
> Thank you for the review and for pointing to a better implementation.
> I will rewrite my patch based on your reference implementation :)

Fujii San and other hackers,

I have rewritten my patch in TAP test style :)
The build files are also updated.

Commitfest entry: https://commitfest.postgresql.org/patch/5543/

---
Great regards,
Ryo Kanbayashi
NTT Open Source Software Center

ecpg-notice-regress-patch-tap-ver.patch
Description: Binary data
Re: moving some code out of explain.c
Robert Haas writes:
> On Thu, Feb 27, 2025 at 3:05 PM Tom Lane wrote:
>> Did you look at avoiding that with our standard trick of writing
>> detail-free struct declarations?

> OK, here's a patch that does that. Thoughts?

+1, but how about explain_dr.h too? It doesn't seem though that we can avoid
#include "executor/instrument.h" there, so it'd be a little asymmetrical. But
that executor inclusion doesn't seem nearly as close to circular.

			regards, tom lane
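As a general C illustration (and only an assumption on my part about why that
particular include can't be dropped): a member held by pointer needs nothing
more than a forward declaration, while a member embedded by value needs the
complete struct definition, which is what the #include provides.

    /* With only a forward declaration in scope... */
    struct Instrumentation;

    typedef struct HoldsPointer
    {
        struct Instrumentation *instr;  /* fine: a pointer needs only the tag */
    } HoldsPointer;

    /*
     * typedef struct HoldsValue
     * {
     *     struct Instrumentation instr;   -- would not compile here: an
     *                                        embedded by-value member needs
     *                                        the full definition from
     *                                        instrument.h
     * } HoldsValue;
     */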
Re: moving some code out of explain.c
On Thu, Feb 27, 2025 at 3:05 PM Tom Lane wrote:
> Robert Haas writes:
> > The thing that was bugging me a bit is that explain_format.h includes
> > explain.h.
>
> Yeah, I did not like that at all either -- it makes things a bit
> circular, and I fear IWYU is going to make stupid recommendations
> like not including explain.h in explain.c.
>
> Did you look at avoiding that with our standard trick of writing
> detail-free struct declarations? That is, explain_format.h
> would need
>
> 	struct ExplainState;	/* avoid including explain.h here */
>
> and then s/ExplainState/struct ExplainState/ in all the function
> declarations. You'd still need to get List from someplace, but
> it could be gotten by including primnodes.h or even pg_list.h.

OK, here's a patch that does that. Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

v1-0001-Avoid-including-commands-explain.h-in-commands-ex.patch
Description: Binary data
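A sketch of how the trick reads in a header, modeled on the suggestion quoted
above. The declarations shown are existing EXPLAIN-output functions, but
which ones the posted patch actually touches, and whether it picks pg_list.h
over primnodes.h, is assumed here for illustration:

    /* explain_format.h, sketched with the detail-free declaration trick */
    #include "nodes/pg_list.h"          /* for List, instead of explain.h */

    struct ExplainState;                /* avoid including explain.h here */

    extern void ExplainOpenGroup(const char *objtype, const char *labelname,
                                 bool labeled, struct ExplainState *es);
    extern void ExplainPropertyText(const char *qlabel, const char *value,
                                    struct ExplainState *es);
    extern void ExplainPropertyList(const char *qlabel, List *data,
                                    struct ExplainState *es);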
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
On Thu, Feb 27, 2025 at 4:08 PM Tom Lane wrote:
> I was down on it to start with, on the grounds that it wasn't adding
> enough ease-of-use to justify even a relatively small amount of added
> code. I'm not sure that the dump-reload failure angle is much of a
> concern in reality, but I would have voted not to commit before and
> I still do.

OK, fair. Anyone want to argue the other side?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: moving some code out of explain.c
On 2025-Feb-27, Robert Haas wrote:
> I see that the Redis-FDW test is failing; it will now need to include
> "commands/explain_format.h" -- unless we do something, of course.

I wonder if it was really a useful idea to move the declarations of those
functions from explain.h to the new explain_format.h file. It seems that this
new file now has a bunch of declarations that have always been something of a
public interface, together with others that are only of internal explain.c
interest, such as ExplainOpenSetAsideGroup() and friends.

--
Álvaro Herrera    PostgreSQL Developer    https://www.EnterpriseDB.com/