Useless LEFT JOIN breaks MIN/MAX optimization

2025-02-27 Thread Alena Rybakina

Hi hackers!

My colleague gave me an interesting case related to min/max 
optimization. Adding a useless left join to a "select min(...) from t" 
query breaks the min/max optimization that reads the value from the index.

What is meant is shown in the example below:

drop table if exists t1;
drop table if exists t2;

create table t1 (id int not null, mod text);
insert into t1 select id, (id % 10)::text from generate_series(1,10) id;
create unique index on t1(id);
create index on t1(mod);

This is the best plan for this query, since we only need the single minimum 
value from this index. And it works perfectly:

explain select min(mod) from t1;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Result  (cost=0.33..0.34 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.29..0.33 rows=1 width=32)
           ->  Index Only Scan using t1_mod_idx on t1  (cost=0.29..3861.54 rows=99500 width=32)
                 Index Cond: (mod IS NOT NULL)
(5 rows)

create table t2 (id int not null);
insert into t2 select id from generate_series(1,10) id;
create unique index on t2(id);

But if we add a join, we fall back to a seq scan with no alternative:
explain select min(t1.mod) from t1 left join t2 on t1.id = t2.id;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=1693.00..1693.01 rows=1 width=32)
   ->  Seq Scan on t1  (cost=0.00..1443.00 rows=10 width=32)
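
For comparison (a sketch, assuming the same tables as above): the
planner's join-removal machinery already drops this useless join for a
plain, non-aggregate query, which is what the patch builds on:

explain select t1.mod from t1 left join t2 on t1.id = t2.id;
-- expected to be planned as a scan of t1 alone, with t2 removed entirely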

I have implemented a patch that solves this problem by allowing join 
expressions to be considered for this optimization as well. I would be 
glad for feedback and review!


--
Regards,
Alena Rybakina
Postgres Professional
From eb8fe49f68e198217a8f3e92ee424ff897f9e21e Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 27 Feb 2025 11:26:44 +0300
Subject: [PATCH] Apply min_max_transformation for eliminated join relations
 after successfully applying join removal optimization.

---
 src/backend/optimizer/plan/planagg.c | 14 +++--
 src/test/regress/expected/aggregates.out | 40 ++++++++++++++++++++
 src/test/regress/sql/aggregates.sql      | 17 +++++++++
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 64605be3178..b1d44987fb1 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -125,9 +125,19 @@ preprocess_minmax_aggregates(PlannerInfo *root)
 			return;
 		jtnode = linitial(jtnode->fromlist);
 	}
-	if (!IsA(jtnode, RangeTblRef))
+
+	/*
+	 * Let's consider applying optimization to the remaining
+	 * single relations after join removal optimization.
+	 * To do this, we need to make sure that fromlist is empty and
+	 * quals contains only one RTE relation.
+	 */
+	if (IsA(jtnode, JoinExpr) && jtnode->fromlist == NIL && IsA(jtnode->quals, RangeTblRef))
+		rtr = (RangeTblRef *) (jtnode->quals);
+	else if (IsA(jtnode, RangeTblRef))
+		rtr = (RangeTblRef *) jtnode;
+	else
 		return;
-	rtr = (RangeTblRef *) jtnode;
 	rte = planner_rt_fetch(rtr->rtindex, root);
 	if (rte->rtekind == RTE_RELATION)
 		 /* ordinary relation, ok */ ;
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8c4f8ce27ed..79fe2fcce4d 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1329,6 +1329,46 @@ from int4_tbl t0;
 (5 rows)
 
 rollback;
+-- Check min/max optimization for join expressions
+create unique index tenk1_unique1_idx on tenk1(unique1);
+create index tenk1_unique2_idx on tenk1(unique2);
+create unique index INT4_TBL_f1_idx on INT4_TBL(f1);
+explain (COSTS OFF)
+select min(t1.unique2) from tenk1 t1 left join INT4_TBL t2 on t1.unique1 = t2.f1;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Scan using tenk1_unique2_idx on tenk1 t1
+                 Index Cond: (unique2 IS NOT NULL)
+(5 rows)
+
+explain (COSTS OFF)
+select min(t1.unique2) from tenk1 t1 inner join INT4_TBL t2 on t1.unique1 = t2.f1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         ->  Seq Scan on int4_tbl t2
+         ->  Index Scan using tenk1_unique1_idx on tenk1 t1
+               Index Cond: (unique1 = t2.f1)
+(5 rows)
+
+explain (COSTS OFF)
+select min(t1.unique2) from tenk1 t1 left outer join INT4_TBL t2 on t1.unique1 = t2.f1;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Scan using tenk1_unique2_idx on tenk1 t1
+                 Index Cond: (unique2 IS NOT NULL)
+(5 rows)
+
+DROP INDEX tenk1_unique1_idx;
+DROP INDEX 

Re: Changing shared_buffers without restart

2025-02-27 Thread Dmitry Dolgov
> On Tue, Feb 25, 2025 at 10:52:05AM GMT, Dmitry Dolgov wrote:
> > On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
> > TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
> > changing shared memory mapping layout. Any feedback is appreciated.
>
> Hi,
>
> Here is a new version of the patch, which contains a proposal about how to
> coordinate shared memory resizing between backends. The rest is more or less
> the same, a feedback about coordination is appreciated. It's a lot to read, 
> but
> the main difference is about:

Just one note: there are still a couple of compilation warnings in the
code, which I haven't addressed yet. Those will go away in the next
version.




Re: Enhances pg_createsubscriber documentation for the -d option.

2025-02-27 Thread Amit Kapila
On Wed, Feb 26, 2025 at 4:48 PM vignesh C  wrote:
>
> On Wed, 26 Feb 2025 at 14:35, Amit Kapila  wrote:
> >
> > How about changing it slightly as follows to make it clear: "If the
> > -d option is not provided, the database name will be
> > obtained from -P. If the database name is not
> > specified in either the -d option or
> > -P, an error will be reported."?
>
> Looks good to me, here is an updated v2 version with the changes.
>

Pushed.

-- 
With Regards,
Amit Kapila.




RE: long-standing data loss bug in initial sync of logical replication

2025-02-27 Thread Zhijie Hou (Fujitsu)
On Monday, February 24, 2025 5:50 PM Amit Kapila  
wrote:
> 
> On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
>  wrote:
> >
> > I confirmed that the proposed patch fixes these issues. I have one
> > question about the patch:
> >
> > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> > following code:
> >
> >    /*
> >     * If we don't have a base snapshot yet, there are no changes in this
> >     * transaction which in turn implies we don't yet need a snapshot at
> >     * all. We'll add a snapshot when the first change gets queued.
> >     *
> >     * NB: This works correctly even for subtransactions because
> >     * ReorderBufferAssignChild() takes care to transfer the base snapshot
> >     * to the top-level transaction, and while iterating the changequeue
> >     * we'll get the change from the subtxn.
> >     */
> >    if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> >        continue;
> >
> > Is there any case where we need to distribute inval messages to
> > transactions that don't have the base snapshot yet but eventually need
> > the inval messages?
> >
> 
> Good point. It is mentioned that for snapshots: "We'll add a snapshot
> when the first change gets queued.". I think we achieve this via
> builder->committed.xip array such that when we set a base snapshot for
> a transaction, we use that array to form a snapshot. However, I don't
> see any such consideration for invalidations. Now, we could either
> always add invalidations to xacts that don't have base_snapshot yet or
> have a mechanism similar committed.xid array. But it is better to
> first reproduce the problem.

I think distributing invalidations to a transaction that has not yet built a
base snapshot is unnecessary. This is because, during the process of building
its base snapshot, such a transaction will have already recorded the XID of the
transaction that altered the publication information into its array of
committed XIDs. Consequently, it will reflect the latest changes in the catalog
from the beginning. In the context of logical decoding, this scenario is
analogous to decoding a new transaction initiated after the catalog-change
transaction has been committed.

The original issue arises because the catalog cache was constructed using an
outdated snapshot that could not reflect the latest catalog changes. However,
this is not a problem in cases without a base snapshot. Since the existing
catalog cache should have been invalidated upon decoding the committed
catalog-change transaction, the subsequent transactions will construct a new
cache with the latest snapshot.

I also considered the scenario where only a sub-transaction has a base snapshot
that has not yet been transferred to its top-level transaction. However, I
think this is not problematic because a sub-transaction transfers its snapshot
immediately upon building it (see ReorderBufferSetBaseSnapshot). The only
exception is if the sub-transaction is independent (i.e., not yet associated
with its top-level transaction). In such a case, the sub-transaction is treated
as a top-level transaction, and invalidations will be distributed to this
sub-transaction after applying the patch which is sufficient to resolve the
issue.

Considering the complexity of this topic, I think it would be better to add
some comments, as in the attached patch.

Best Regards,
Hou zj


0001-add-comments-for-txns-without-base-snapshot.patch
Description: 0001-add-comments-for-txns-without-base-snapshot.patch


Re: Race condition in replication slot usage introduced by commit f41d846

2025-02-27 Thread Amit Kapila
On Wed, Feb 26, 2025 at 7:18 PM Amit Kapila  wrote:
>
> On Wed, Feb 26, 2025 at 5:40 PM Nisha Moond  wrote:
> >
> > Attached the patch v1 fixing this issue.
> >
> > Issue reported by: Hou Zhijie.
> >
>
> Thanks for the report and patch. I'll look into it.
>

Pushed the patch with a slightly different comment.

-- 
With Regards,
Amit Kapila.




Re: Restrict copying of invalidated replication slots

2025-02-27 Thread Amit Kapila
On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada  wrote:
>
> On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila  wrote:
> >
> > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with
> > this operation. So, isn't it possible that the source slot exists at
> > the later position in ReplicationSlotCtl->replication_slots and the
> > loop traversing slots is already ahead from the point where the newly
> > copied slot is created?
>
> Good point. I think it's possible.
>
> > If so, the newly created slot won't be marked
> > as invalid whereas the source slot will be marked as invalid. I agree
> > that even in such a case, at a later point, the newly created slot
> > will also be marked as invalid.
>
> The wal_status of the newly created slot would immediately become
> 'lost' and the next checkpoint will invalidate it. Do we need to do
> something to deal with this case?
>

+ /* Check if source slot became invalidated during the copy operation */
+ if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
+     ereport(ERROR,
+             errmsg("cannot copy replication slot \"%s\"",
+                    NameStr(*src_name)),
+             errdetail("The source replication slot was invalidated during the copy operation."));

How about adding a similar test as above for MyReplicationSlot? That
should suffice the need because we have already acquired the new slot
by this time and invalidation should signal this process before
marking the new slot as invalid.
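
A sketch of the suggested check (placement and message hypothetical,
following the pattern quoted above):

+ /* The new slot is already acquired at this point */
+ if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
+     ereport(ERROR,
+             errmsg("cannot copy replication slot \"%s\"",
+                    NameStr(*src_name)),
+             errdetail("The newly created replication slot was invalidated during the copy operation."));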

-- 
With Regards,
Amit Kapila.




Re: Draft for basic NUMA observability

2025-02-27 Thread Bertrand Drouvot
Hi Jakub,

On Wed, Feb 26, 2025 at 09:48:41AM +0100, Jakub Wartak wrote:
> On Mon, Feb 24, 2025 at 5:11 PM Bertrand Drouvot
>  wrote:
> >
> > Hi,
> >
> > On Mon, Feb 24, 2025 at 09:06:20AM -0500, Andres Freund wrote:
> > > Does the issue with "new" backends seeing pages as not present exist both 
> > > with
> > > and without huge pages?
> >
> > That's a good point and from what I can see it's correct with huge pages 
> > being
> > used (it means all processes see the same NUMA node assignment regardless of
> > access patterns).
> 
> Hi Bertrand, please see that nearby thread. I've got quite the
> opposite results. I need page-fault memory or I get invalid results
> ("-2"). What kernel version are you using? (I've tried it on two
> 6.10.x series kernels, virtualized in both cases; one was EPYC [real
> NUMA, but not VM so not real hardware])

Thanks for sharing your numbers!

It looks like, with huge pages enabled, shared_buffers plays a role.

1. With hp, shared_buffers 4GB:

 huge_pages_status
---
 on
(1 row)

 shared_buffers

 4GB
(1 row)

NOTICE:  os_page_count=2048 os_page_size=2097152 pages_per_blk=0.003906
 numa_zone_id | count
--------------+--------
              | 507618
            0 |   1054
           -2 |  15616
(3 rows)

2. With hp, shared_buffers 23GB:

 huge_pages_status
---
 on
(1 row)

 shared_buffers

 23GB
(1 row)

NOTICE:  os_page_count=11776 os_page_size=2097152 pages_per_blk=0.003906
 numa_zone_id |  count
--------------+---------
              | 2997974
            0 |   16682
(2 rows)

3. no hp, shared_buffers 23GB:

 huge_pages_status
---
 off
(1 row)

 shared_buffers

 23GB
(1 row)

ERROR:  extension "pg_buffercache" already exists
NOTICE:  os_page_count=6029312 os_page_size=4096 pages_per_blk=2.00
 numa_zone_id |  count
--------------+---------
              | 2997975
           -2 |   16482
            1 |     199
(3 rows)

Maybe the kernel is making some decisions based on HugePages_Rsvd; I have
no idea. Anyway, there is little we can do, and the "touchpages" patch
seems to provide "accurate" results.
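
For context, a sketch of the underlying query (npages, pages and status
assumed to be prepared by the caller; move_pages() is from <numaif.h>,
linked via libnuma):

	/* nodes == NULL means "query": status[i] receives the node id, or
	 * -ENOENT (-2) when this process has not faulted the page in yet,
	 * which is what the "touchpages" patch works around. */
	long		rc = move_pages(0 /* self */, npages, pages, NULL, status, 0);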

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Draft for basic NUMA observability

2025-02-27 Thread Jakub Wartak
On Wed, Feb 26, 2025 at 6:13 PM Bertrand Drouvot
 wrote:
[..]
> > Meanwhile v5 is attached with slight changes to try to make cfbot happy:
>
> Thanks for the updated version!
>
> FWIW, I had to do a few changes to get an error free compiling experience with
> autoconf/or meson and both with or without the libnuma configure option.
>
> Sharing here as .txt files:
>
> Also the pg_buffercache test fails without the libnuma configure option. Maybe
> some tests should depend on the libnuma configure option.
[..]

Thank you so much for this Bertrand !

I've applied those, played a little bit with configure and meson, and
reproduced the test error and fixed it by silencing that NOTICE in the
tests. So v6 is attached even before I get a chance to start using
that CI. Still waiting for some input and tests regarding that earlier
touchpages attempt; docs are still missing...

-J.


v6-0003-Add-pg_shmem_numa_allocations-to-show-NUMA-zones-.patch
Description: Binary data


v6-0002-Extend-pg_buffercache-with-new-view-pg_buffercach.patch
Description: Binary data


v6-0001-Add-optional-dependency-to-libnuma-for-basic-NUMA.patch
Description: Binary data


Re: Fix api misuse (src/bin/pg_amcheck/pg_amcheck.c)

2025-02-27 Thread Ranier Vilela
On Thu, Feb 27, 2025, at 02:08, Michael Paquier wrote:

> On Wed, Feb 26, 2025 at 07:04:25PM +0530, vignesh C wrote:
> > On Tue, 18 Feb 2025 at 18:18, Ranier Vilela  wrote:
> >> Similar to commit 5b94e27 [1]
> >> The correct function when freeing memory in the frontend,
> >> using the function PQescapeIdentifier, must be
> >> PQfreemem.
> >>
> >> Trivial patch attached.
> >
> > Thanks, this change looks good to me.
>
> There's no point in leaving that unfixed in pg_amcheck, so done down
> to 14 with 48e4ae9a0707.
>
Thank you Michael.

best regards,
Ranier Vilela


Re: Replace IN VALUES with ANY in WHERE clauses during optimization

2025-02-27 Thread newtglobal postgresql_contributors
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Hi Ivan Kush
I tested the patch with `commands.sql` and observed noticeable improvements in 
planning and execution time, especially with multiple tables. Even single-table 
queries show small time reductions (0.02–0.04 ms). The patch optimizes `IN` 
clauses effectively, particularly with `VALUES`. For example, `col IN (VALUES 
('a'), ('b'), ('c'))` now behaves similarly to `col IN ('a', 'b', 'c')`, 
leading to faster execution and reduced planning overhead. 
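
A sketch of the rewrite being described (table name hypothetical):

explain select * from t where col in (values ('a'), ('b'), ('c'));
-- with the patch, expected to be planned as if written:
-- select * from t where col = any (array['a', 'b', 'c']);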
Regards, 
Postgresql Contributors - NewtGlobal

Re: NOT ENFORCED constraint feature

2025-02-27 Thread Álvaro Herrera
On 2025-Feb-27, Amul Sul wrote:

> Attached is the rebased patch set against the latest master head,
> which also includes a *new* refactoring patch (0001). In this patch,
> I’ve re-added ATExecAlterChildConstr(), which is required for the main
> feature patch (0008) to handle recursion from different places while
> altering enforceability.

I think you refer to ATExecAlterConstrEnforceability, which claims
(falsely) that it is a subroutine to ATExecAlterConstrRecurse; in
reality it is called from ATExecAlterConstraintInternal or at least
that's what I see in your 0008.  So I wonder if you haven't confused
yourself here.  If nothing else, that comments needs fixed.  I didn't
review these patches.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?"  (Mafalda)




Re: New "raw" COPY format

2025-02-27 Thread newtglobal postgresql_contributors
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Hi Joel,
After testing the patch, I observed that for single-column tables, the format 
evolved from SINGLE to RAW and finally to LIST to handle diverse data more 
flexibly. For example, the command: \COPY test.foo2 FROM 
'/home/newtdba/postgres-cf-5300/testfiles/testname.txt' WITH (FORMAT LIST); 
works with CSV, TXT, and RAW files without specifying column names. This LIST 
format is effective for copying data to/from single-column tables but requires 
specifying the correct format.

The new status of this patch is: Needs review


Re: SQLFunctionCache and generic plans

2025-02-27 Thread Alexander Pyhalov

Pavel Stehule wrote on 2025-02-26 22:34:

Hi

I can confirm 60% speedup for execution of function fx and fx3 - both
functions are very primitive, so for real code the benefit can be
higher

Unfortunately, there is about 5% slowdown for inlined code, and for
just plpgsql code too.

I tested fx4

create or replace function fx4(int) returns int immutable as $$ begin
return $1 + $1; end $$ language plpgsql;

and fx2

create or replace function fx2(int) returns int as $$ select 2 * $1;
$$
language sql immutable;

and execution of the patched code is about 5% slower. It is strange that
this patch has a negative impact on plpgsql execution.

Regards

Pavel


Hi. I've tried to reproduce the slowdown and couldn't.

create or replace function fx4(int) returns int immutable as $$ begin 
return $1 + $1; end $$ language plpgsql;


do $$
begin
  for i in 1..500 loop
perform fx4((random()*100)::int); -- or fx2
  end loop;
end;
$$;

OLD results:
Time: 8268.614 ms (00:08.269)
Time: 8178.836 ms (00:08.179)
Time: 8306.694 ms (00:08.307)

New (patched) results:
Time: 7743.945 ms (00:07.744)
Time: 7803.109 ms (00:07.803)
Time: 7736.735 ms (00:07.737)

Not sure why the new one is faster (perhaps some noise?) - looking at perf 
flamegraphs I don't see anything evident.


create or replace function fx2(int) returns int as $$ select 2 * $1; $$ 
language sql immutable;

do $$
begin
  for i in 1..500 loop
perform fx2((random()*100)::int); -- or fx2
  end loop;
end;
$$;

OLD results:
Time: 5346.471 ms (00:05.346)
Time: 5359.222 ms (00:05.359)
Time: 5316.747 ms (00:05.317)

New (patched) results:
Time: 5188.363 ms (00:05.188)
Time: 5225.322 ms (00:05.225)
Time: 5203.667 ms (00:05.204)

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: psql \dh: List High-Level (Root) Tables and Indexes

2025-02-27 Thread Sadeq Dousti
Thanks Vignesh for the review!


> Currently we are supporting only PG13 and higher versions.

I understand that servers older than PG13 are no longer supported. But on
the client side, we still have this notice at the top of describe.c file,
which indicates that the client should support 9.2+.

 * Support for the various \d ("describe") commands.  Note that the current
 * expectation is that all functions in this file will succeed when working
 * with servers of versions 9.2 and up.  It's okay to omit irrelevant
 * information for an old server, but not to fail outright.  (But failing
 * against a pre-9.2 server is allowed.)

I'm just following the instructions here so as not to break anything
unwanted, and you can see for instance \dP is doing the same.
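
For reference, a sketch of the kind of version gate in question (the
cutoff shown is hypothetical; the pattern follows other \d commands in
describe.c):

	if (pset.sversion < 130000)
	{
		char		sverbuf[32];

		pg_log_error("The server (version %s) does not support this command.",
					 formatPGVersionNumber(pset.sversion, false,
										   sverbuf, sizeof(sverbuf)));
		return true;
	}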

That said, I'm totally fine with removing the "if" from my patch, but first
I think a committer should update the above comment to reflect the lowest
server version that the client code must support.

Best Regards,
Sadeq Dousti


Re: new commitfest transition guidance

2025-02-27 Thread Andrew Dunstan



On 2025-02-27 Th 12:16 AM, Michael Paquier wrote:

On Mon, Feb 03, 2025 at 12:22:52PM +0100, Peter Eisentraut wrote:

CF 2025-01 has just ended, so I suggest that everyone try this now.  We can
check in perhaps two weeks whether this results in lots of stuff falling
through the cracks or still too much stuff with unclear status being moved
forward, and then see what that might mean going forward.

CF 2025-03 is to begin in more or less 48 hours, and we have still a
grand total of 72 patches still listed in CF 2025-01:
https://commitfest.postgresql.org/51/

It's a good score, as 286 patches have been moved without doing any
kind of massive bulk manual vacuuming work on all these entries.

As these have not been moved by their respective authors and/or
reviewers, perhaps, based on the guidance I am reading from this
thread, it would be time to give up on these rather than move them
around?




There are 4 marked "Ready For Committer" - all authored by committers :-)

Maybe the message isn't getting through. After I got the email message, 
I moved one yesterday that I am listed on as an author (although I'm not 
really), but the real author had not moved it.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQLFunctionCache and generic plans

2025-02-27 Thread Pavel Stehule
On Thu, Feb 27, 2025 at 13:25, Alexander Pyhalov <a.pyha...@postgrespro.ru> wrote:

> Pavel Stehule wrote on 2025-02-26 22:34:
> > Hi
> >
> > I can confirm 60% speedup for execution of function fx and fx3 - both
> > functions are very primitive, so for real code the benefit can be
> > higher
> >
> > Unfortunately, there is about 5% slowdown for inlined code, and for
> > just plpgsql code too.
> >
> > I tested fx4
> >
> > create or replace function fx4(int) returns int immutable as $$ begin
> > return $1 + $1; end $$ language plpgsql;
> >
> > and fx2
> >
> > create or replace function fx2(int) returns int as $$ select 2 * $1;
> > $$
> > language sql immutable;
> >
> > and execution of the patched code is about 5% slower. It is strange that
> > this patch has a negative impact on plpgsql execution.
> >
> > Regards
> >
> > Pavel
>
> Hi. I've tried to reproduce the slowdown and couldn't.
>
> create or replace function fx4(int) returns int immutable as $$ begin
> return $1 + $1; end $$ language plpgsql;
>
> do $$
> begin
>for i in 1..500 loop
>  perform fx4((random()*100)::int); -- or fx2
>end loop;
> end;
> $$;
>
> OLD results:
> Time: 8268.614 ms (00:08.269)
> Time: 8178.836 ms (00:08.179)
> Time: 8306.694 ms (00:08.307)
>
> New (patched) results:
> Time: 7743.945 ms (00:07.744)
> Time: 7803.109 ms (00:07.803)
> Time: 7736.735 ms (00:07.737)
>
> Not sure why the new one is faster (perhaps some noise?) - looking at perf
> flamegraphs I don't see anything evident.
>
> create or replace function fx2(int) returns int as $$ select 2 * $1; $$
> language sql immutable;
> do $$
> begin
>for i in 1..500 loop
>  perform fx2((random()*100)::int); -- or fx2
>end loop;
> end;
> $$;
>
> OLD results:
> Time: 5346.471 ms (00:05.346)
> Time: 5359.222 ms (00:05.359)
> Time: 5316.747 ms (00:05.317)
>
> New (patched) results:
> Time: 5188.363 ms (00:05.188)
> Time: 5225.322 ms (00:05.225)
> Time: 5203.667 ms (00:05.204)
>

I'll try to get profiles.

Regards

Pavel

>
> --
> Best regards,
> Alexander Pyhalov,
> Postgres Professional
>


Add support for EXTRA_REGRESS_OPTS for meson

2025-02-27 Thread Andreas Karlsson

Hi,

We use EXTRA_REGRESS_OPTS to make sure the whole test suite passes with 
our extension loaded, and since I prefer developing in meson over using 
autotools and make, the lack of support for EXTRA_REGRESS_OPTS in meson 
has bugged me for a while.

I have implemented support for it as an environment variable we read in 
the testwrap script instead of adding it as a configuration option to 
meson.build. The reason for this is that I do not like having to run 
"meson reconfigure" all the time plus that for the PG_TEST_EXTRA we 
ended up having to add an environment variable anyway.


To use this run e.g. the following:

  EXTRA_REGRESS_OPTS="--load-extension=pgcrypto" meson test

Question: Would it make sense to rename it to PG_REGRESS_EXTRA_OPTS or 
something similar while we are already touching this code to make the 
various options easier to remember?


Andreas
From 87ce622a19315b679bbd5691e01c96261bc0c4c8 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson 
Date: Thu, 27 Feb 2025 00:23:44 +0100
Subject: [PATCH] meson: Add support for EXTRA_REGRESS_OPTS

Add support for the EXTRA_REGRESS_OPTS environment variable in meson
which works just like with make and applies to all regress, ecpg and
isolation tests. TAP tests which support it under make will continue
to support the option.

Run it with e.g:

EXTRA_REGRESS_OPTS="--temp-config=test.conf" meson test
---
 src/tools/testwrap | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/tools/testwrap b/src/tools/testwrap
index 8ae8fb79ba7..f52ca376da1 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -2,6 +2,7 @@
 
 import argparse
 import shutil
+import shlex
 import subprocess
 import os
 import sys
@@ -51,7 +52,12 @@ env_dict = {**os.environ,
 if "PG_TEST_EXTRA" not in env_dict and args.pg_test_extra:
 env_dict["PG_TEST_EXTRA"] = args.pg_test_extra
 
-sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
+if args.testname in ['regress', 'isolation', 'ecpg'] and 'EXTRA_REGRESS_OPTS' in env_dict:
+    test_command = args.test_command + shlex.split(env_dict['EXTRA_REGRESS_OPTS'])
+else:
+    test_command = args.test_command
+
+sp = subprocess.Popen(test_command, env=env_dict, stdout=subprocess.PIPE)
 # Meson categorizes a passing TODO test point as bad
 # (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
 # directive, so Meson computes the file result like Perl does.  This could
-- 
2.47.2



Re: Add support for EXTRA_REGRESS_OPTS for meson

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 13:53:17 +0100, Andreas Karlsson wrote:
> We use EXTRA_REGRESS_OPTS to make sure the whole test suite passes with our
> extension loaded, and since I prefer developing in meson over using autotools
> and make, the lack of support for EXTRA_REGRESS_OPTS in meson
> has bugged me for a while.

Yep, we should add support for that.  TEMP_CONFIG probably too.


> Question: Would it make sense to rename it to PG_REGRESS_EXTRA_OPTS or
> something similar while we are already touching this code to make the various
> options easier to remember?

I'd not tackle that at the same time personally.


> @@ -51,7 +52,12 @@ env_dict = {**os.environ,
>  if "PG_TEST_EXTRA" not in env_dict and args.pg_test_extra:
>  env_dict["PG_TEST_EXTRA"] = args.pg_test_extra
>
> -sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
> +if args.testname in ['regress', 'isolation', 'ecpg'] and 'EXTRA_REGRESS_OPTS' in env_dict:
> +    test_command = args.test_command + shlex.split(env_dict['EXTRA_REGRESS_OPTS'])
> +else:
> +    test_command = args.test_command
> +
> +sp = subprocess.Popen(test_command, env=env_dict, stdout=subprocess.PIPE)
>  # Meson categorizes a passing TODO test point as bad
>  # (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
>  # directive, so Meson computes the file result like Perl does.  This could

I hacked up something similar before, for TEMP_CONFIG, and found that I needed
to do something like this:

+if 'TEMP_CONFIG' in os.environ and \
+args.testname in ['regress', 'isolation', 'ecpg']:
+# be careful to insert before non-option args, otherwise it'll fail
+# e.g. on windows
+args.test_command.insert(1, '--temp-config='+os.environ['TEMP_CONFIG'])
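
Applying the same trick to EXTRA_REGRESS_OPTS might look like this (a
sketch, with the splice position assumed rather than tested):

+# Splice the extra options in right after the program name, so they
+# precede positional arguments (matters on Windows, per the above).
+if args.testname in ['regress', 'isolation', 'ecpg'] and \
+        'EXTRA_REGRESS_OPTS' in os.environ:
+    args.test_command[1:1] = shlex.split(os.environ['EXTRA_REGRESS_OPTS'])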

Greetings,

Andres Freund




Re: Small memory fixes for pg_createsubscriber

2025-02-27 Thread Ranier Vilela
On Thu, Feb 27, 2025, at 02:51, Michael Paquier wrote:

> On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> > @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> >   locale->db_locale,
> >   strlen(locale->db_locale));
> >   else
> > - datlocale_literal = pg_strdup("NULL");
> > + datlocale_literal = PQescapeLiteral(conn_new_template1,
> > + "NULL",
> > + strlen("NULL"));
>
> Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
> as well.  Perhaps it's slightly cleaner to use an intermediate
> variable that is then given as an argument to PQescapeLiteral()?
>
Yeah, I also think it would look good like this.

v1 attached.

best regards,
Ranier Vilela


v1-avoid-mix-api-pg_upgrade.patch
Description: Binary data


Re: Serverside SNI support in libpq

2025-02-27 Thread Daniel Gustafsson
> On 24 Feb 2025, at 22:51, Jacob Champion  
> wrote:
> 
> On Wed, Feb 19, 2025 at 3:13 PM Daniel Gustafsson  wrote:
>> Are there any blockers for getting this in?
> 
>> +   SSL_context = ssl_init_context(isServerStart, host);
> 
> I'm still not quite following the rationale behind the SSL_context
> assignment. To maybe illustrate, attached are some tests that I
> expected to pass, but don't.
> 
> After adding an additional host and reloading the config, the behavior
> of the original fallback host seems to change. Am I misunderstanding
> the designed fallback behavior, have I misdesigned my test, or is this
> a bug?

Thanks for the tests, they did in fact uncover a bug in how fallback was
handled, which is now fixed.  In doing so I revamped how the default context
handling is done; it now always uses the GUCs in postgresql.conf for
consistency.  The attached v6 rebase contains this fix and your tests, as
well as general cleanup and comment writing etc.

--
Daniel Gustafsson



v6-0001-Serverside-SNI-support-for-libpq.patch
Description: Binary data


Re: Logging which local address was connected to in log_line_prefix

2025-02-27 Thread Greg Sabino Mullane
On Mon, Nov 18, 2024 at 10:07 AM Jim Jones 
wrote:

> 2024-11-18 16:00:42.720 CET [3135117] -> 192.168.178.27 STATEMENT:
> ...
> 2024-11-18 16:01:23.273 CET [3114980] -> [local] LOG:  received SIGHUP,
> ...

> 2024-11-18 16:01:46.769 CET [3114981] -> [local] LOG:  checkpoint
>
> Is it supposed to be like this?
>

Great question. I think "supposed to" is a bit of a stretch, but I presume
it's the difference between a client connecting and using its connection
information versus an already existing backend process, which is always
going to be "local".

Overall this makes sense, as that checkpoint example above is coming from
the checkpointer background process at 3114981, not the backend process
that happened to trigger it. And 3114981 has no way of knowing the details
of the caller's connection.

FWIW, the patch still applies cleanly to head as of 2/27/2025, so no rebase
needed.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: per backend WAL statistics

2025-02-27 Thread Michael Paquier
On Thu, Feb 27, 2025 at 07:47:09AM +, Bertrand Drouvot wrote:
> That's how I did it initially but decided to move it to pgstat_backend.c. The
> reason was that it's fully linked to "per backend" stats and that there is
> no SQL API on top of it (while I think that's the case for almost all the ones
> in pgstatfuncs.c). Thoughts?

Okay by me with pgstat_fetch_stat_backend in parallel, why not
expose this part as well.  Perhaps that could be useful for some
extension?  I'd rather have out-of-core code do these lookups with the
same sanity checks in place for the procnumber and slot lookups.

The name was inconsistent with the rest of the file, so I have settled
to a pgstat_fetch_stat_backend_by_pid() to be more consistent.  A
second thing is to properly initialize bktype if defined by the
caller.

> Makes sense. I did not have in mind to submit a new patch version (to at
> least implement the above) without getting your final thoughts on your first
> comment.
> But since a rebase is needed anyway, please find attached a new version. It
> just implements your last comment.

Attached is a rebased version of the rest.
--
Michael
From 42fd2860e3f06d525936187f46aad06892073078 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 6 Jan 2025 10:00:00 +
Subject: [PATCH v12] per backend WAL statistics

Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.

This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.

The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.

XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
 src/include/catalog/pg_proc.dat |  7 +++
 src/include/pgstat.h| 37 ++--
 src/include/utils/pgstat_internal.h |  3 +-
 src/backend/utils/activity/pgstat_backend.c | 64 +
 src/backend/utils/activity/pgstat_wal.c |  1 +
 src/backend/utils/adt/pgstatfuncs.c | 26 -
 src/test/regress/expected/stats.out | 14 +
 src/test/regress/sql/stats.sql  |  6 ++
 doc/src/sgml/monitoring.sgml| 19 ++
 9 files changed, 156 insertions(+), 21 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bacf..3e35f8b8e99a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
   proargmodes => '{o,o,o,o,o}',
   proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+  proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+  proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+  proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o}',
+  proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+  prosrc => 'pg_stat_get_backend_wal' },
 { oid => '6248', descr => 'statistics: information about WAL prefetching',
   proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d5..06359b9157d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
-typedef struct PgStat_Backend
-{
-	TimestampTz stat_reset_timestamp;
-	PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* -
- * PgStat_BackendPending	Non-flushed backend stats.
- * -
- */
-typedef struct PgStat_BackendPending
-{
-	/*
-	 * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
-	 */
-	PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
 typedef struct PgStat_StatDBEntry
 {
 	PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO io_stats;
+	PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* -
+ * PgStat_BackendPending	Non-flushed backend stats.
+ * -
+ */
+typedef struct PgStat_BackendPending
+{
+	/*
+	 * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+	 */
+	PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
 /*
  * Functions in pgstat.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/sr
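
For illustration, a hypothetical call matching the pg_proc.dat entry added
above:

SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset
FROM pg_stat_get_backend_wal(pg_backend_pid());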

Re: Disabling vacuum truncate for autovacuum

2025-02-27 Thread Nathan Bossart
On Tue, Feb 18, 2025 at 01:56:09PM -0600, Nathan Bossart wrote:
> On Mon, Jan 27, 2025 at 03:38:39PM -0500, Robert Haas wrote:
>> Also, how sure are we that turning this off globally is a solid idea?
>> Off-hand, it doesn't sound that bad: there are probably situations in
>> which autovacuum never truncates anything anyway just because the tail
>> blocks of the relations always contain at least 1 tuple. But we should
>> be careful not to add a setting that is far more likely to get people
>> into trouble than to get them out of it. It would be good to hear what
>> other people think about the risk vs. reward tradeoff in this case.
> 
> My first reaction is that a global setting is probably fine most of the
> time.  I'm sure it's possible to get into bad situations if you try hard
> enough, but that's not a unique characteristic.  There are probably many
> situations where the truncation is wasted effort because we'll just end up
> extending the relation shortly afterwards, anyway.  In any case, it's
> already possible to achieve $SUBJECT with a trivial script that sets
> storage parameters on all tables, so IMHO it would be silly to withhold a
> global setting that does the same thing just on principle.

I spent some time on this one today.  A couple of notes:

* Since the reloption is a Boolean, there isn't a good way to tell whether
  it is actually set for the table or if it just inherited the default
  value.  This is important to know because we want the reloption to
  override the GUC.  I considered a bunch of different ways to handle this,
  but everything felt like a cowboy hack.  The cleanest cowboy hack I could
  come up with is an optional offset field in relopt_parse_elt that points
  to a variable that stores whether the option was explicitly set (see the
  sketch after these notes).

* I didn't see a good GUC category for vacuum_truncate.  I suppose we could
  create a new one, but for now I've just stashed it into the autovacuum
  one.  Bikeshedding welcome.
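
A minimal sketch of that "isset offset" idea (the new field name is
hypothetical; the existing fields are from src/include/access/reloptions.h):

typedef struct
{
	const char *optname;		/* option's name */
	relopt_type opttype;		/* option's datatype */
	int			offset;			/* offset of field in result struct */
	int			isset_offset;	/* hypothetical: offset of a bool set to
								 * true only when the option is given
								 * explicitly; 0 means "not tracked" */
} relopt_parse_elt;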

Thoughts?

-- 
nathan
>From e360b56acd1d3bd05c9df6cfc4586e51edce357e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 27 Feb 2025 21:09:37 -0600
Subject: [PATCH v2 1/1] Add vacuum_truncate GUC.

---
 doc/src/sgml/config.sgml  | 22 +++
 doc/src/sgml/ref/create_table.sgml| 10 +
 doc/src/sgml/ref/vacuum.sgml  |  3 ++-
 src/backend/access/common/reloptions.c| 11 --
 src/backend/commands/vacuum.c | 17 ++
 src/backend/utils/misc/guc_tables.c   |  9 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/reloptions.h   |  1 +
 src/include/commands/vacuum.h |  1 +
 src/include/utils/rel.h   |  1 +
 10 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e55700f35b8..069ac35762f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8934,6 +8934,28 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH 
csv;

   
 
+  
+   vacuum_truncate (boolean)
+   
+vacuum_truncate configuration 
parameter
+   
+   
+   
+
+ Enables or disables vacuum to try to truncate off any empty pages at
+ the end of the table.  The default value is true.
+ If true, VACUUM and autovacuum
+ do the truncation and the disk space for the truncated pages is
+ returned to the operating system.  Note that the truncation requires
+ an ACCESS EXCLUSIVE lock on the table.  The
+ TRUNCATE parameter of
+ VACUUM, if
+ specified, overrides the value of this parameter.  The setting can be
+ overridden for individual tables by changing table storage parameters.
+
+   
+  
+
  
 
 
diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index 0a3e520f215..3c2315b1a8e 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1688,15 +1688,7 @@ WITH ( MODULUS numeric_literal, REM
 
 
  
-  Enables or disables vacuum to try to truncate off any empty pages
-  at the end of this table. The default value is true.
-  If true, VACUUM and
-  autovacuum do the truncation and the disk space for
-  the truncated pages is returned to the operating system.
-  Note that the truncation requires ACCESS EXCLUSIVE
-  lock on the table. The TRUNCATE parameter
-  of VACUUM, if 
specified, overrides the value
-  of this option.
+  Per-table value for  parameter.
  
 

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 971b1237d47..bd5dcaf86a5 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -265,7 +265,8 @@ VACUUM [ ( option [, ...] ) ] [ vacuum_truncate
+  and is the default unless 
+  is set to false or 

Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Thomas Munro
On Fri, Feb 28, 2025 at 11:58 AM Melanie Plageman
 wrote:
> On Thu, Feb 27, 2025 at 1:08 PM Tom Lane  wrote:
> > I wonder if it'd be a good idea to add something like
> >
> > Assert(stream->distance == 1);
> > Assert(stream->pending_read_nblocks == 0);
> > Assert(stream->per_buffer_data_size == 0);
> > +   Assert(per_buffer_data == NULL);
> >
> > in read_stream_next_buffer.  I doubt that this will shut Coverity
> > up, but it would help to catch caller coding errors, i.e. passing
> > a per_buffer_data pointer when there's no per-buffer data.
>
> I think this is a good stopgap. I was discussing adding this assert
> off-list with Thomas and he wanted to detail his more ambitious plans
> for type safety improvements in the read stream API. Less on the order
> of a redesign and more like separate read_stream_next_buffer() variants for
> when there is per-buffer data and when there isn't. And a by-value and
> by-reference version for the one where there is data.

Here's what I had in mind.  Is it better?
From 68e9424b590051959142917459eb4ea074589b79 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 28 Feb 2025 10:48:29 +1300
Subject: [PATCH 1/2] Improve API for retrieving data from read streams.

Dealing with the per_buffer_data argument to read_stream_next_buffer()
has proven a bit clunky.  Provide some new wrapper functions/macros:

buffer = read_stream_get_buffer(rs);
buffer = read_stream_get_buffer_and_value(rs, &my_int);
buffer = read_stream_get_buffer_and_pointer(rs, &my_pointer_to_int);

These improve readability and type safety via assertions.
---
 contrib/pg_prewarm/pg_prewarm.c  |  4 +-
 contrib/pg_visibility/pg_visibility.c|  6 +--
 src/backend/access/heap/heapam.c |  2 +-
 src/backend/access/heap/heapam_handler.c |  2 +-
 src/backend/access/heap/vacuumlazy.c |  6 +--
 src/backend/storage/aio/read_stream.c| 12 +
 src/backend/storage/buffer/bufmgr.c  |  4 +-
 src/include/storage/read_stream.h| 64 
 8 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index a2f0ac4af0c..f6ae266d7b0 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -208,11 +208,11 @@ pg_prewarm(PG_FUNCTION_ARGS)
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = read_stream_next_buffer(stream, NULL);
+			buf = read_stream_get_buffer(stream);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7f268a18a74..e7187a46c9d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -556,7 +556,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 			Buffer		buffer;
 			Page		page;
 
-			buffer = read_stream_next_buffer(stream, NULL);
+			buffer = read_stream_get_buffer(stream);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 			page = BufferGetPage(buffer);
@@ -569,7 +569,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 
 	if (include_pd)
 	{
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
@@ -752,7 +752,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 		0);
 
 	/* Loop over every block in the relation. */
-	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+	while ((buffer = read_stream_get_buffer(stream)) != InvalidBuffer)
 	{
 		bool		check_frozen = all_frozen;
 		bool		check_visible = all_visible;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..86f280069e0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -609,7 +609,7 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 
 	scan->rs_dir = dir;
 
-	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	scan->rs_cbuf = read_stream_get_buffer(scan->rs_read_stream);
 	if (BufferIsValid(scan->rs_cbuf))
 		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e78682c3cef..7487896b06c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1010,7 +1010,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
 	 * re-acquire sharelock for each tuple, but since we aren't doing much
 	 * work per tuple, the extra lock traffic is probably better avoided.
 	 */
-	hscan->rs_cbuf = read_stream_next_buffer(stream, NULL);
+	hscan->rs_cbuf = read_stream_get_buffer(stream);
 	if (!BufferIsValid(hscan

Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Thomas Munro
On Fri, Feb 28, 2025 at 2:29 PM Thomas Munro  wrote:
> On Fri, Feb 28, 2025 at 11:58 AM Melanie Plageman
>  wrote:
> > On Thu, Feb 27, 2025 at 1:08 PM Tom Lane  wrote:
> > > I wonder if it'd be a good idea to add something like
> > >
> > > Assert(stream->distance == 1);
> > > Assert(stream->pending_read_nblocks == 0);
> > > Assert(stream->per_buffer_data_size == 0);
> > > +   Assert(per_buffer_data == NULL);
> > >
> > > in read_stream_next_buffer.  I doubt that this will shut Coverity
> > > up, but it would help to catch caller coding errors, i.e. passing
> > > a per_buffer_data pointer when there's no per-buffer data.
> >
> > I think this is a good stopgap. I was discussing adding this assert
> > off-list with Thomas and he wanted to detail his more ambitious plans
> > for type safety improvements in the read stream API. Less on the order
> > of a redesign and more like separate read_stream_next_buffer() variants for
> > when there is per-buffer data and when there isn't. And a by-value and
> > by-reference version for the one where there is data.
>
> Here's what I had in mind.  Is it better?

Here's a slightly better one.  I think when you use
read_stream_get_buffer_and_value(stream, &value), or
read_stream_put_value(stream, space, value), then we should assert
that sizeof(value) strictly matches the available space, as shown.  But,
new in v2, if you use read_stream_get_buffer_and_pointer(stream,
&pointer), then sizeof(*pointer) should only have to be <= the
storage space, not ==, because someone might plausibly want to make
per_buffer_data_size variable at runtime (i.e. decide when they
construct the stream), and then be able to retrieve a pointer to the
start of a struct with a flexible array or something like that.  In v1
I was just trying to assert that it was a
pointer-to-a-pointer-to-something and no more (in a confusing
compile-time assertion), but v2 is simpler, and is happy with a
pointer to a pointer to something that doesn't exceed the space
(run-time assertion).
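
To make the two contracts concrete (a sketch, with variable names assumed):

	int		my_int;
	int	   *my_pointer_to_int;

	/* asserts sizeof(my_int) == per_buffer_data_size (by-value copy) */
	buffer = read_stream_get_buffer_and_value(stream, &my_int);

	/* asserts sizeof(*my_pointer_to_int) <= per_buffer_data_size */
	buffer = read_stream_get_buffer_and_pointer(stream, &my_pointer_to_int);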
From b2dd9c90f970a889deea2c2e9e16097e4e06ece8 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 28 Feb 2025 10:48:29 +1300
Subject: [PATCH v2 1/2] Improve API for retrieving data from read streams.

Dealing with the per_buffer_data argument to read_stream_next_buffer()
has proven a bit clunky.  Provide some new wrapper functions/macros:

buffer = read_stream_get_buffer(rs);
buffer = read_stream_get_buffer_and_value(rs, &my_int);
buffer = read_stream_get_buffer_and_pointer(rs, &my_pointer_to_int);

These improve readability and type safety via assertions.
---
 contrib/pg_prewarm/pg_prewarm.c  |  4 +-
 contrib/pg_visibility/pg_visibility.c|  6 +--
 src/backend/access/heap/heapam.c |  2 +-
 src/backend/access/heap/heapam_handler.c |  2 +-
 src/backend/access/heap/vacuumlazy.c |  6 +--
 src/backend/storage/aio/read_stream.c| 12 ++
 src/backend/storage/buffer/bufmgr.c  |  4 +-
 src/include/storage/read_stream.h| 55 
 8 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index a2f0ac4af0c..f6ae266d7b0 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -208,11 +208,11 @@ pg_prewarm(PG_FUNCTION_ARGS)
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = read_stream_next_buffer(stream, NULL);
+			buf = read_stream_get_buffer(stream);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7f268a18a74..e7187a46c9d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -556,7 +556,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 			Buffer		buffer;
 			Page		page;
 
-			buffer = read_stream_next_buffer(stream, NULL);
+			buffer = read_stream_get_buffer(stream);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 			page = BufferGetPage(buffer);
@@ -569,7 +569,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 
 	if (include_pd)
 	{
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
@@ -752,7 +752,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 		0);
 
 	/* Loop over every block in the relation. */
-	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+	while ((buffer = read_stream_get_buffer(stream)) != InvalidBuffer)
 	{
 		bool		check_frozen = all_frozen;
 		bool		check_visible = all_visible;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..86f280069e0 100644
--- a/src/backend/access/h

Re: Log connection establishment timings

2025-02-27 Thread Bertrand Drouvot
Hi,

On Thu, Feb 27, 2025 at 05:55:19PM -0500, Melanie Plageman wrote:
> On Thu, Feb 27, 2025 at 11:30 AM Andres Freund  wrote:
> >
> > On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
> >
> > > However, since that is a bigger project (with more refactoring, etc),
> > > he suggested that we change log_connections to a GUC_LIST
> > > (ConfigureNamesString) with options like "none", "received,
> > > "authenticated", "authorized", "all".
> >
> > Yep.
> 
> I've done a draft of this in attached v7 (see 0001).

Thanks for the patch!

> It still needs polishing (e.g. I need to figure out where to put the new
> GUC hook functions and enums and such)

Yeah, I wonder if it would make sense to create new dedicated
"connection_logging" file(s).

>, but I want to see if this is a viable direction forward.
> 
> I'm worried the list of possible connection log messages could get
> unwieldy if we add many more.

Thinking out loud, I'm not sure we'd add "many more" but maybe what we could do
is to introduce some predefined groups like:

'basic' (that would be equivalent to 'received, + timings introduced in 0002')
'security' (equivalent to 'authenticated,authorized')

Not saying we need to create this kind of predefined groups now, just an idea
in case we'd add many more: that could "help" the user experience, or are you
more concerned about the code maintenance?
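
For instance, hypothetical postgresql.conf settings under the proposed
scheme:

log_connections = 'received,authenticated,authorized'
# or, with a predefined group like the ones suggested above:
log_connections = 'security'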

Just did a quick test (did not look closely at the code), and it looks like
this:

'all': does not report the "received" (but does report authenticated and 
authorized)
'received': does not work?
'authenticated': works
'authorized': works

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Extend postgres_fdw_get_connections to return remote backend pid

2025-02-27 Thread Sagar Shedge
> For now, I think it makes sense to keep postgres_fdw_get_connections()
> aligned with the current implementation. Otherwise, explaining what
> remote_backend_pid = NULL means could be confusing, especially since
> pipeline mode isn't supported yet in postgres_fdw.

Makes sense. I have created a patch according to the above suggestions.
Please review.

-- 
Sagar Dilip Shedge,
SDE AWS


v2-0001-Extend-postgres_fdw_get_connections-to-return-rem.patch
Description: Binary data


Re: Adding support for SSLKEYLOGFILE in the frontend

2025-02-27 Thread Abhishek Chanda
Attached is a v6 with O_NOFOLLOW removed; I noticed that it failed on
Windows. Please let me know if I should do this in any other way that
is portable across platforms.
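
For reference, a portable sketch of the open()-with-mode approach from the
review quoted below (O_NOFOLLOW applied only where the platform defines it):

	/* Create the keylog file 0600 atomically instead of fopen() + chmod() */
	int			flags = O_WRONLY | O_APPEND | O_CREAT;
	FILE	   *log_file = NULL;
	int			fd;

#ifdef O_NOFOLLOW
	flags |= O_NOFOLLOW;
#endif
	fd = open(conn->sslkeylogfile, flags, 0600);
	if (fd >= 0)
		log_file = fdopen(fd, "a");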

Thanks

On Thu, Feb 27, 2025 at 11:39 PM Abhishek Chanda
 wrote:
>
> Thanks for the review, Daniel. Please find v5 of this patch attached.
> I tried to bump up the minimum OpenBSD version in installation.sgml,
> do I need to add this anywhere else? Please let me know if this needs
> anything else.
>
> Thanks
>
> On Thu, Feb 20, 2025 at 4:25 PM Daniel Gustafsson  wrote:
> >
> > > On 20 Feb 2025, at 04:36, Abhishek Chanda  wrote:
> > > Please find attached a new version of this patch. I rebased on master,
> > > added better error reporting and skipped the permissions check on
> > > windows. Please let me know if this needs any changes. I tested this
> > > locally using meson running all TAP tests.
> >
> > Thanks for the new version, a few comments on this one from reading:
> >
> > +./src/test/ssl/key.txt
> > The toplevel .gitignore should not be used for transient test output, there 
> > is
> > a .gitignore in src/test/ssl for that.  The key.txt file should also be 
> > placed
> > in PostgreSQL::Test::Utils::tempdir and then cleaned up after the test.
> >
> >
> > +  > xreflabel="sslkeylogfile">
> > The documentation should state that the file will use the NSS format.
> >
> >
> > +   if (log_file == NULL) {
> > +   libpq_append_conn_error(conn, "could not open ssl key log 
> > ...
> > +   }
> > This raise an error for not being able to operate on the file, but it 
> > doesn't
> > error out from the function and instead keeps going with more operations on 
> > the
> > file which couldn't be opened.
> >
> >
> > +   if (chmod(conn->sslkeylogfile, 0600) == -1) {
> > Rather than opening and try to set permissions separately, why not use 
> > open(2)
> > and set the umask?  We probably also want to set O_NOFOLLOW.
> >
> >
> > +   if (conn->sslkeylogfile && strlen(conn->sslkeylogfile) > 0)
> > +   SSL_CTX_set_keylog_callback(SSL_context, SSL_CTX_keylog_cb);
> > This callback does exist in OpenSSL 1.1.1 but it's only available in 
> > LibreSSL
> > from OpenBSD 7.1 and onwards where we support 7.0 as the earliest version.
> > This feature seems like a good enough reason to bump the minimum LibreSSL
> > version, and 7.1 is well out support (7.5 goes EOL next), but it would need 
> > to
> > get done here.
> >
> >
> > +# Skip file mode check on Windows
> > +return 1 if $windows_os;
> > It should be up to each individual test to determine what to skip, not the
> > library code (and it should use SKIP:).  src/test/ssl/t/001_ssltests.pl is 
> > an
> > example which skips permissions checks on Windows.  Also, I'm not sure we 
> > need
> > a generic function in the testlibrary for something so basic.
> >
> >
> > +print STDERR sprintf("%s mode must be %04o\n",
> > +   $path, $expected_file_mode);
> > Test code should not print anything to stdout/stderr, it should use the TAP
> > compliant methods like diag() etc for this.
> >
> >
> > +   "connect with server root certand sslkeylogfile=key.txt");
> > s/certand/cert and/ ?
> >
> > --
> > Daniel Gustafsson
> >
>
>
> --
> Thanks and regards
> Abhishek Chanda



-- 
Thanks and regards
Abhishek Chanda


v6-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch
Description: Binary data


Re: Disabling vacuum truncate for autovacuum

2025-02-27 Thread Laurenz Albe
On Thu, 2025-02-27 at 21:35 -0600, Nathan Bossart wrote:
> I spent some time on this one today.  A couple of notes:
> 
> * Since the reloption is a Boolean, there isn't a good way to tell whether
>   it is actually set for the table or if it just inherited the default
>   value.  This is important to know because we want the reloption to
>   override the GUC.

I agree with that, without being able to think of a better solution.

> * I didn't see a good GUC category for vacuum_truncate.  I suppose we could
>   create a new one, but for now I've just stashed it into the autovacuum
>   one.  Bikeshedding welcome.

Why not use "Client Connection Defaults / Statement Behavior", just like for
"vacuum_freeze_min_age"?  I don't think that "autovacuum" is appropriate,
since it applies to manual VACUUM as well.

Yours,
Laurenz Albe




Re: Restrict copying of invalidated replication slots

2025-02-27 Thread Amit Kapila
On Fri, Feb 28, 2025 at 5:10 AM Masahiko Sawada  wrote:
>
> On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila  wrote:
> >
> > On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila  
> > > wrote:
> > > >
> > > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with
> > > > this operation. So, isn't it possible that the source slot exists at
> > > > the later position in ReplicationSlotCtl->replication_slots and the
> > > > loop traversing slots is already ahead from the point where the newly
> > > > copied slot is created?
> > >
> > > Good point. I think it's possible.
> > >
> > > > If so, the newly created slot won't be marked
> > > > as invalid whereas the source slot will be marked as invalid. I agree
> > > > that even in such a case, at a later point, the newly created slot
> > > > will also be marked as invalid.
> > >
> > > The wal_status of the newly created slot would immediately become
> > > 'lost' and the next checkpoint will invalidate it. Do we need to do
> > > something to deal with this case?
> > >
> >
> > + /* Check if source slot became invalidated during the copy operation */
> > + if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
> > + ereport(ERROR,
> > + errmsg("cannot copy replication slot \"%s\"",
> > +NameStr(*src_name)),
> > + errdetail("The source replication slot was invalidated during the
> > copy operation."));
> >
> > How about adding a similar test as above for MyReplicationSlot? That
> > should suffice because we have already acquired the new slot by this
> > time, and invalidation should signal this process before marking the
> > new slot as invalid.
>
> IIUC in the scenario you mentioned, the loop traversing slots already
> passed the position of newly created slot in
> ReplicationSlotCtl->replication_slots array, so
> MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no?
>

Right, I don't have a simpler solution for this apart from somehow
serializing this operation with
InvalidateObsoleteReplicationSlots(). I don't think we need to go that
far to handle the scenario being discussed. It is better to add a
comment explaining this race and why it won't cause harm.
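A minimal sketch of such a comment (the wording is an assumption, not from a
posted patch):

/*
 * NB: The source slot can be invalidated concurrently by
 * InvalidateObsoleteReplicationSlots().  If that loop has already passed
 * the array position of the slot we just created, the copy is not marked
 * invalid here.  That is harmless: the copy's wal_status immediately
 * becomes 'lost', and the next checkpoint will invalidate it.
 */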

-- 
With Regards,
Amit Kapila.




Re: NOT ENFORCED constraint feature

2025-02-27 Thread Amul Sul
On Thu, Feb 27, 2025 at 4:48 PM Álvaro Herrera  wrote:
>
> On 2025-Feb-27, Amul Sul wrote:
>
> > Attached is the rebased patch set against the latest master head,
> > which also includes a *new* refactoring patch (0001). In this patch,
> > I’ve re-added ATExecAlterChildConstr(), which is required for the main
> > feature patch (0008) to handle recursion from different places while
> > altering enforceability.
>
> I think you refer to ATExecAlterConstrEnforceability, which claims
> (falsely) that it is a subroutine to ATExecAlterConstrRecurse; in
> reality it is called from ATExecAlterConstraintInternal or at least
> that's what I see in your 0008.  So I wonder if you haven't confused
yourself here.  If nothing else, that comment needs fixing.  I didn't
review these patches.
>

Yeah, that was intentional. I wanted to avoid recursion again by
hitting ATExecAlterChildConstr() at the end of
ATExecAlterConstraintInternal(). Also, I realized the value doesn’t
matter since recurse = false is explicitly set inside the
cmdcon->alterEnforceability condition. I wasn’t fully satisfied with
how we handled the recursion decision (code design), so I’ll give it
more thought. If I don’t find a better approach, I’ll add clearer
comments to explain the reasoning.


Regards,
Amul




Re: Statistics Import and Export

2025-02-27 Thread Corey Huinker
>
> +--- error: relation is wrong type
> +SELECT pg_catalog.pg_restore_relation_stats(
> +'relation', 0::oid,
> +'relpages', 17::integer,
> +'reltuples', 400.0::real,
> +'relallvisible', 4::integer);
>
> Why do you need to specify all the stats (relpages, reltuples, etc)?
> To exercise this you could just do:
> select pg_catalog.pg_restore_relation_stats('relation', 0::oid);
>

In the above case, it's historical inertia in that the pg_set_* call
required all those parameters, as well as a fear that the code - now or in
the future - might evaluate "can anything actually change from this call"
and short-circuit out before actually trying to make sense of the regclass
oid. But we can assuage that fear with just one of the three stat
parameters, and I'll adjust accordingly.


> Since I haven't been following along with this feature development, I
> don't think I can get comfortable enough with all of the changes in
> this test diff to commit them. I can't really say if this is the set
> of tests that is representative and sufficient for this feature.
>

That's fine, I hadn't anticipated that you'd review this patch, let alone
commit it.


> If you agree with me that the failure tests could be shorter, I'm
> happy to commit that, but I don't really feel comfortable assessing
> what the right set of full tests is.


The set of tests is as short as I feel comfortable with. I'll give the
parameter lists one more pass and repost.


Re: Log connection establishment timings

2025-02-27 Thread Bertrand Drouvot
Hi,

On Thu, Feb 27, 2025 at 11:14:56AM -0500, Andres Freund wrote:
> I don't think the timing overhead is a relevant factor here - compared to the
> fork of a new connection or performing authentication the cost of taking a few
> timestamps is negligible. A timestamp costs 10s to 100s of cycles, a fork many,
> many millions. Even if you have a really slow timestamp function, it's still
> going to be way, way cheaper.

That's a very good point, it has to be put in perspective. The difference in
scale is so significant that the timing collection shouldn't be a concern.
Fair point!

Now I'm wondering: what "if" connections were handled with a multi-threaded model?

I think we could reach the same conclusion, as thread creation overhead is
still substantial (allocating stack space, initializing thread state, and other
kernel-level operations) compared to even a really slow timestamp function.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Licence preamble update

2025-02-27 Thread Noah Misch
On Thu, Feb 27, 2025 at 04:56:05PM +, Dave Page wrote:
> Per some brief discussion on the core list, the attached patch updates the
> licence preamble to more accurately reflect the use of Postgres vs.
> PostgreSQL (see https://www.postgresql.org/about/policies/project-name/ for
> background from many years ago).

> --- a/COPYRIGHT
> +++ b/COPYRIGHT
> @@ -1,5 +1,5 @@
>  PostgreSQL Database Management System
> -(formerly known as Postgres, then as Postgres95)
> +(also known as Postgres, formerly as Postgres95)
>  
>  Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group

I'm not seeing this change as aligned with
https://www.postgresql.org/about/policies/project-name/, which says Postgres
"is an alias or nickname and is not the official name of the project."  The
official product name did change Postgres -> Postgres95 -> PostgreSQL, with
"Postgres" holding the status of a nickname since Postgres95 became the
official name.  Today's text matches that history, and the proposed text
doesn't.  Can you share more from the brief discussion?  Changing a license
file is an eyebrow-raising event, so we should do it only if the win is clear.
There may be an argument for making this change, but I'm missing it currently.




Re: Reduce TupleHashEntryData struct size by half

2025-02-27 Thread David Rowley
On Thu, 13 Feb 2025 at 14:01, Jeff Davis  wrote:
> Attached a new patchset that doesn't change the API for
> ExecCopySlotMinimalTuple(). Instead, I'm using
> ExecFetchSlotMinimalTuple(), which avoids the extra memcpy if the slot
> is TTSOpsMinimalTuple, which is what HashAgg uses. I separated out the
> change to use ExecFetchSlotMinimalTuple() into 0004 to illustrate the
> performance impact.

Some review comments:

* my_log2() takes a "long" parameter type but transitionSpace is a
"Size". These types aren't the same width on all platforms we support.
Maybe pg_nextpower2_size_t() is a better option.

* Should the following have MAXALIGN(tupleSize) on the right side of
the expression?

tupleChunkSize = tupleSize

I understand this was missing before, but bump.c does consume at
least MAXALIGN() in BumpAlloc().

* while (16 * maxBlockSize > work_mem * 1024L)

The 1024L predates the change made in 041e8b95. "1024L" needs to be
replaced with "(Size) 1024"

Maybe you could just replace the while loop and the subsequent "if" check with:

/*
 * Like CreateWorkExprContext(), use smaller sizings for smaller work_mem,
 * to avoid large jumps in memory usage.
 */
maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);

/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);

/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);

I believe that gives the same result without the looping.

* In hash_create_memory(), can you get rid of the minContextSize and
initBlockSize variables?

* Is it worth an Assert() check that additionalsize > 0?

 * The amount of space available is the additionalsize requested in the call
 * to BuildTupleHashTable(). If additionalsize was specified as zero, no
 * additional space is available and this function should not be called.
 */
static inline void *
TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
{
return (char *) entry->firstTuple - hashtable->additionalsize;
}
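A minimal sketch of that assertion, layered onto the function quoted above:

static inline void *
TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
{
    /* Callers that requested no additional space must not end up here. */
    Assert(hashtable->additionalsize > 0);
    return (char *) entry->firstTuple - hashtable->additionalsize;
}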

Benchmarks:

I was also experimenting with the performance of this using the
following test case:

create table c (a int not null);
insert into c select a from generate_Series(1,1000) a;
vacuum freeze analyze c;

Query: select a,count(*) from c group by a;

Average TPS over 10x 10 second runs with -M prepared

master: 3653.9853988
v7-0001: 3741.815979
v7-0001+0002: 3737.4313064
v7-0001+0002+0003: 3834.6271706
v7-0001+0002+0003+0004: 3912.1158887

I also tried out changing hash_agg_check_limits() so that it no longer
calls MemoryContextMemAllocated and instead uses ->mem_allocated
directly and with all the other patches got:

v7-0001+0002+0003+0004+extra: 4049.0732381

We probably shouldn't do exactly that, as it would be better not to access
that internal field from outside the memory context code.  A static
inline function in memutils.h that handles the non-recursive callers
might be nice.
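A minimal sketch of such a helper, assuming the caller knows the context has
no children (the name is an assumption):

static inline Size
MemoryContextMemAllocatedSelf(MemoryContext context)
{
    /* Like MemoryContextMemAllocated(), minus the recursive walk. */
    return context->mem_allocated;
}

hash_agg_check_limits() could then use this without reaching into the struct
directly.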

I've attached my results of running your test in graph form. I also
see a small regression for these small-scale tests. I'm wondering if
it's worth mocking up some code to see what the performance would be
like without the additional memcpy() that's new to
LookupTupleHashEntry_internal().  How about hacking something up that
adds an additionalsize field to TupleDesc and then sets that field to
your additional size and has heap_form_minimal_tuple() allocate that
much extra memory?

David


Remove extra Sort node above a btree-compatible Index Scan

2025-02-27 Thread Alexandra Wang
Hello hackers,

While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
I noticed some unexpected plan diffs when running the xtree and treeb
test modules. Specifically, an additional Sort node appears on top of
an Index Only Scan, even when the index key and sort key are the same.
For example:

--- src/test/modules/xtree/expected/interval.out
+++ src/test/modules/xtree/results/interval.out
@@ -375,10 +375,12 @@
 SET enable_seqscan TO false;
 EXPLAIN (COSTS OFF)
 SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
- QUERY PLAN
-
- Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
-(1 row)
+QUERY PLAN
+--
+ Sort
+   Sort Key: f1
+   ->  Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
+(3 rows)


I’ve attached a patch that removes this unnecessary Sort node for
B-tree-compatible indexes. This change should:
- Reduce the number of test failures in the xtree module from 43 to 30
- Reduce the size of regression.diffs from 234K to 123K

Since pathkey comparison is a hot path in query planning and exercised
by many test queries, I plan to gather more performance metrics.
However, in a simple test running make installcheck with and without
the patch, I observed no negative impact on the runtime of the
regression test suite (which doesn’t include other btree-like indexes)
and a positive impact on the same regression tests for xtree.

Regression tests (same plans):
w/o patch:
make installcheck  1.36s user 2.21s system 12% cpu 28.018 total
w/ patch:
make installcheck  1.32s user 2.12s system 12% cpu 28.089 total

xtree tests:
w/o patch (inferior plan w/ extra sort node):
make installcheck  1.52s user 2.42s system 10% cpu 36.817 total
w/ patch (good plan):
make installcheck  1.52s user 2.48s system 12% cpu 32.201 total

I’ve marked the patch as no-cfbot, as it applies only on top of the
aforementioned patch series [1].

Thoughts?

[1]
https://www.postgresql.org/message-id/a5dfb7cd-7a89-48ab-a913-e5304eee0854%40eisentraut.org

Best,
Alex


v1-0001-Remove-unnecessary-Sort-node-above-Index-Scan-of-.patch.no-cfbot
Description: Binary data


Re: Remove extra Sort node above a btree-compatible Index Scan

2025-02-27 Thread Peter Geoghegan
On Fri, Feb 28, 2025 at 12:23 AM Alexandra Wang
 wrote:
> While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
> I noticed some unexpected plan diffs when running the xtree and treeb
> test modules. Specifically, an additional Sort node appears on top of
> an Index Only Scan, even when the index key and sort key are the same.
> For example:
>
> --- src/test/modules/xtree/expected/interval.out
> +++ src/test/modules/xtree/results/interval.out

What's the xtree module? There's no such test module?

-- 
Peter Geoghegan




Re: Adding support for SSLKEYLOGFILE in the frontend

2025-02-27 Thread Abhishek Chanda
Thanks for the review, Daniel. Please find v5 of this patch attached.
I tried to bump up the minimum OpenBSD version in installation.sgml,
do I need to add this anywhere else? Please let me know if this needs
anything else.

Thanks

On Thu, Feb 20, 2025 at 4:25 PM Daniel Gustafsson  wrote:
>
> > On 20 Feb 2025, at 04:36, Abhishek Chanda  wrote:
> > Please find attached a new version of this patch. I rebased on master,
> > added better error reporting and skipped the permissions check on
> > windows. Please let me know if this needs any changes. I tested this
> > locally using meson running all TAP tests.
>
> Thanks for the new version, a few comments on this one from reading:
>
> +./src/test/ssl/key.txt
> The toplevel .gitignore should not be used for transient test output, there is
> a .gitignore in src/test/ssl for that.  The key.txt file should also be placed
> in PostgreSQL::Test::Utils::tempdir and then cleaned up after the test.
>
>
> +  xreflabel="sslkeylogfile">
> The documentation should state that the file will use the NSS format.
>
>
> +   if (log_file == NULL) {
> +   libpq_append_conn_error(conn, "could not open ssl key log ...
> +   }
> This raises an error for not being able to operate on the file, but it doesn't
> error out from the function and instead keeps going with more operations on
> the file which couldn't be opened.
>
>
> +   if (chmod(conn->sslkeylogfile, 0600) == -1) {
> Rather than opening and trying to set permissions separately, why not use open(2)
> and set the umask?  We probably also want to set O_NOFOLLOW.
>
>
> +   if (conn->sslkeylogfile && strlen(conn->sslkeylogfile) > 0)
> +   SSL_CTX_set_keylog_callback(SSL_context, SSL_CTX_keylog_cb);
> This callback does exist in OpenSSL 1.1.1 but it's only available in LibreSSL
> from OpenBSD 7.1 and onwards, where we support 7.0 as the earliest version.
> This feature seems like a good enough reason to bump the minimum LibreSSL
> version, and 7.1 is well out of support (7.5 goes EOL next), but it would need
> to get done here.
>
>
> +# Skip file mode check on Windows
> +return 1 if $windows_os;
> It should be up to each individual test to determine what to skip, not the
> library code (and it should use SKIP:).  src/test/ssl/t/001_ssltests.pl is an
> example which skips permissions checks on Windows.  Also, I'm not sure we need
> a generic function in the test library for something so basic.
>
>
> +print STDERR sprintf("%s mode must be %04o\n",
> +   $path, $expected_file_mode);
> Test code should not print anything to stdout/stderr, it should use the TAP
> compliant methods like diag() etc for this.
>
>
> +   "connect with server root certand sslkeylogfile=key.txt");
> s/certand/cert and/ ?
>
> --
> Daniel Gustafsson
>


-- 
Thanks and regards
Abhishek Chanda


v5-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch
Description: Binary data


Re: Log connection establishment timings

2025-02-27 Thread Bertrand Drouvot
Hi,

On Thu, Feb 27, 2025 at 11:08:04AM -0500, Melanie Plageman wrote:
> I was just talking to Andres off-list and he mentioned that the volume
> of log_connections messages added in recent releases can really be a
> problem for users. He said ideally we would emit one message which
> consolidated these (and make sure we did so for failed connections too
> detailing the successfully completed stages).
> 
> However, since that is a bigger project (with more refactoring, etc),
> he suggested that we change log_connections to a GUC_LIST
> (ConfigureNamesString) with options like "none", "received",
> "authenticated", "authorized", "all".
> 
> Then we could add one like "established" for the final message and
> timings my patch set adds. I think the overhead of an additional log
> message being emitted probably outweighs the overhead of taking those
> additional timings.
> 
> String GUCs are a lot more work than enum GUCs, so I was thinking if
> there is a way to do it as an enum.
> 
> I think we want the user to be able to specify a list of all the log
> messages they want included, not just have each one include the
> previous ones. So, then it probably has to be a list right? There is
> no good design that would fit as an enum.

Interesting idea... Yeah, that would sound weird with an enum. I could think
about providing an enum per possible combination, but that would generate
something like 2^N enum values and wouldn't be really user friendly (the count
would also double each time we'd want to add a new possible "value", quickly
becoming unmanageable).

So yeah, I can't think of anything better than GUC_LIST.
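For illustration, a list-valued GUC entry in guc_tables.c might look roughly
like this (the category, boot value, and hook names are assumptions, not from
a posted patch):

{
    {"log_connections", PGC_SU_BACKEND, LOGGING_WHAT,
        gettext_noop("Logs selected stages of connection establishment."),
        NULL,
        GUC_LIST_INPUT
    },
    &log_connections_string,
    "",
    check_log_connections, assign_log_connections, NULL
},

The check hook would presumably split the value with SplitGUCList() and
validate each element.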

> > I think that's somehow also around code maintenance (not only LOC), say for 
> > example
> > if we want to add more "child_type" in the check (no need to remember to 
> > update both
> > locations).
> 
> I didn't include checking the child_type in that function since it is
> unrelated to instr_time, so it sadly wouldn't help with that. We could
> macro-ize the child_type check were we to add another child_type.

Yup, but my idea was to put all those lines:

"
if (Log_connections &&
(child_type == B_BACKEND || child_type == B_WAL_SENDER))
{
instr_time  fork_time = ((BackendStartupData *) 
startup_data)->fork_time;

conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
}
"

into a dedicated helper function.
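A minimal sketch of such a helper (the function name is an assumption):

static void
report_fork_duration(BackendType child_type, const void *startup_data)
{
    if (Log_connections &&
        (child_type == B_BACKEND || child_type == B_WAL_SENDER))
    {
        instr_time  fork_time = ((BackendStartupData *) startup_data)->fork_time;

        conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
    }
}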

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [BUG]: the walsender does not update its IO statistics until it exits

2025-02-27 Thread Michael Paquier
On Wed, Feb 26, 2025 at 09:48:50AM +, Bertrand Drouvot wrote:
> Yeah I think that makes sense, done that way in the attached.
> 
> Speaking about the physical walsender, I moved the test to 001_stream_rep.pl
> instead (it would also fail without the fix).

Hmm.  I was doing some more checks with this patch, and on closer look
I am wondering if the location you have chosen for the stats reports
is too aggressive: this requires a LWLock for the WAL sender backend
type taken in exclusive mode, with each step of WalSndLoop() taken
roughly each time a record or a batch of records is sent.  A single
installcheck with a primary/standby setup can lead to up to 50k stats
report calls.

With smaller records, the loop can become hotter, can't it?  Also,
there can be a high number of WAL senders on a single node, and I've
heard of some customers with complex logical decoding deployments with
dozens of logical WAL senders.  Isn't there a risk of having this code
path become a point of contention?  It seems to me that we should
benchmark this change more carefully, perhaps even reduce the
frequency of the report calls.
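For illustration, a time-based throttle inside WalSndLoop() might look roughly
like this (the interval and the exact report call are assumptions):

#define WALSND_STATS_INTERVAL_MS 1000

static TimestampTz last_stats_report = 0;
TimestampTz now = GetCurrentTimestamp();

if (TimestampDifferenceExceeds(last_stats_report, now,
                               WALSND_STATS_INTERVAL_MS))
{
    pgstat_flush_io(false);     /* whichever report call the patch adds */
    last_stats_report = now;
}

That would bound the LWLock traffic regardless of record size or the number
of walsenders.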
--
Michael


signature.asc
Description: PGP signature


RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

2025-02-27 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> I understand that we may not have a clear use case for this to work in
> single-user mode. But how will we define the boundary in similar
> cases? I mean, we should have some rule for such exposed functions,
> and it should be followed uniformly. Now, if one needs a bigger or
> complex change to make the function work in single-user mode, then it
> is easy to take an exception from what is currently being followed in
> the code. However, if the change is as trivial as you proposed in the
> first email, why not go with that and make this function work in
> single-user mode?

Hmm, this opinion is the complete opposite of the other reviewers'. I want
to ask them again how they feel. I also added Tom, who pointed this out in
the initial thread.

Question: how do you feel about restricting the SQL functions for slots during
single-user mode? Nobody has considered use cases for it; we have no concrete
theory and no similar cases. And the band-aid needed to make it work seems
very small.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: ReplicationSlotRelease() crashes when the instance is in the single user mode

2025-02-27 Thread Amit Kapila
On Thu, Feb 27, 2025 at 1:29 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> >
> > Which other functions do we see similar restrictions? I checked
> > "sequence manipulation functions" (1), and "Transaction ID and
> > Snapshot Information Functions" (2) but couldn't see similar
> > restrictions.
> >
> > (1) - https://www.postgresql.org/docs/current/functions-sequence.html
> > (2) -
> > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INF
> > O-SNAPSHOT
>
> I grepped the sources and could not find explicit limitations either.
> So... this might be overkill.
>

Yes, this is my worry.

> But I still think the restriction is OK for the slot - no need to do
> some efforts for accepting single-user mode, just close the cover.
>

I understand that we may not have a clear use case for this to work in
single-user mode. But how will we define the boundary in similar
cases? I mean, we should have some rule for such exposed functions,
and it should be followed uniformly. Now, if one needs a bigger or
complex change to make the function work in single-user mode, then it
is easy to take an exception from what is currently being followed in
the code. However, if the change is as trivial as you proposed in the
first email, why not go with that and make this function work in
single-user mode?

-- 
With Regards,
Amit Kapila.




Re: Remove extra Sort node above a btree-compatible Index Scan

2025-02-27 Thread Tom Lane
Alexandra Wang  writes:
> I’ve attached a patch that removes this unnecessary Sort node for
> B-tree-compatible indexes.

This does not look right at all.  You can't just ignore the opfamily
etc. while deciding whether two pathkeys represent the same sort
ordering, as you did in create_mergejoin_plan().  I don't like
pathkeys_have_same_sortop() either.  The pathkey data structures
were designed to let pointer comparison be sufficient for deciding
whether pathkeys are equivalent: see the "canonical pathkey" stuff
in pathkeys.c.  I think that this patch may be band-aiding over some
breakage of that concept, but you've not provided enough context to
figure out what the underlying problem is.

regards, tom lane




RE: SIMD optimization for list_sort

2025-02-27 Thread R, Rakshit
Hi All,

Thank you so much for the feedback. 

> I don't think "another extension might use it someday" makes a very strong 
> case, 
> particularly for something that requires a new dependency.

The x86-simdsort library is an optional dependency in Postgres. Also, the new
list sort implementation which uses the x86-simdsort library does not impact
any of the existing workflows in Postgres. We have at least one use case where
pgvector gets a 7-10% gain in HNSW index build time using the SIMD sort
implementation. Hence we feel that this patch could be upstreamed. Open to
further discussion on the same.


> Note that our current implemention is highly optimized for low-cardinality 
> inputs.
> This is needed for aggregate queries. I found this write-up of a 
> couple scalar and vectorized sorts, and they show this library doing 
> very poorly on very-low cardinality inputs. I would look into that 
> before trying to get something in shape to share.
> 
> https://github.com/Voultapher/sort-research-
> rs/blob/main/writeup/intel_avx512/text.md

We ran our extension to stress list sort with low-cardinality inputs. For
example, for an array of size 100k with repeated values in the range 1-10, we
still see a gain of around 20% in throughput. We will collect more data for
low-cardinality inputs, and with AVX2 too.

Thanks and regards
Rakshit Rajeev


Re: Get rid of WALBufMappingLock

2025-02-27 Thread Michael Paquier
On Wed, Feb 26, 2025 at 01:48:47PM +0200, Alexander Korotkov wrote:
> Thank you for offering the help.  Updated version of patch is attached
> (I've added one memory barrier there just in case).  I would
> appreciate if you could run it on batta, hachi or similar hardware.

Doing a revert of the revert done in 3fb58625d18f proves that
reproducing the error is not really difficult.  I've done a make
installcheck-world USE_MODULE_DB=1 -j N without assertions, and the
point that saw a failure quickly is one of the tests of pgbench:
PANIC:  could not find WAL buffer for 0/19D366

This one happened for the test "concurrent OID generation" and CREATE
TYPE.  Of course, as it is a race condition, it is random, but it's
taking me only a couple of minutes to see the original issue on my
buildfarm host. With assertions enabled, same story, and the same
failure from the pgbench TAP test.

That said, I have also done similar tests with your v12 for a couple
of hours, and this looks stable under installcheck-world. I can see
that you've reworked the surroundings of InitializedFrom quite a bit
in this one. If you apply that once again at some point, the
buildfarm will be the judge in the long term, but I am rather confident
in saying that the situation looks better here, at least.

One thing I would consider doing if you want to gain confidence is
tests like the one I saw causing problems with pgbench, with DDL
patterns stressing specific paths like this CREATE TYPE case.
--
Michael


signature.asc
Description: PGP signature


Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2025-02-27 Thread Robert Haas
On Thu, Dec 19, 2024 at 8:29 AM Robert Haas  wrote:
> Cool. I don't want to commit this right before my vacation but I'll
> look into it in the new year if nobody beats me to it.

Nobody has beaten me to it, but I thought of one potential issue.
Suppose that somebody has a foreign table that exists today with one
of these properties set to the empty string. Such a foreign table is
not usable for queries, because the queries won't make it past scan.l
on the remote side, but it's allowed to exist. With this patch, it's
not allowed to exist any more. That would be fine if no such objects
exist in the wild, but if they do, people will get a pg_dump or
pg_upgrade failure when they try to upgrade to a release that includes
this change. Does that mean that we shouldn't commit this patch?

I'm inclined to think that it's probably still OK, because (1) few
users are likely to have such objects, (2) the workaround is easy,
just drop the object or fix it to do something useful, just as users
already need to do in plenty of other, similar cases and (3) unclear
error messages are not great and future users might benefit from this
error message being better. However, one could take the opposite
position and say that (C1) it is unlikely that anyone will try to do
this in the first place and (C2) the error message that is currently
generated when you try to do this isn't really all that unclear, and
therefore it's not worth creating even a minor dump-and-reload hazard;
and that position also seems pretty defensible to me.

Other opinions?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-02-27 Thread Robert Haas
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan  wrote:
> I need more feedback about this. I don't understand your perspective here.
>
> If I commit the skip scan patch, but don't have something like this
> instrumentation in place, it seems quite likely that users will
> complain about how opaque its behavior is. While having this
> instrumentation isn't quite a blocker to committing the skip scan
> patch, it's not too far off, either. I want to be pragmatic. Any
> approach that's deemed acceptable is fine by me, provided it
> implements approximately the same behavior as the patch that I wrote
> implements.
>
> Where is this state that tracks the number of index searches going to
> live, if not in IndexScanDesc? I get why you don't particularly care
> for that. But I don't understand what the alternative you favor looks
> like.

+1 for having some instrumentation. I do not agree with Tom that these
are numbers that only Peter Geoghegan and 2-3 other people will ever
understand. I grant that it's not going to make sense to everyone, but
the number of people to which it will make sense I would guess is
probably in the hundreds or the thousands rather than the single
digits. Good documentation could help.

So, where should we store that information?

The thing that's odd about using IndexScanDesc is that it doesn't
contain any other instrumentation currently, or at least not that I
can see. Everything else that EXPLAIN prints in terms of index scan is
printed by show_instrumentation_count() from planstate->instrument. So
it seems reasonable to think maybe this should be part of
planstate->instrument, too, but there seem to be at least two problems
with that idea. First, that struct has just four counters (ntuples,
ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of
them are in use -- a plain index scan does not use ntuples2 -- but I
think this optimization can apply to both index and index-only scans,
so we just don't have room. Second, those existing counters are used
for things that we can count in the executor, but the executor won't
know how many index searches occur down inside the AM. So I see the
following possibilities:

1. Add a new field to struct Instrumentation. Make
index_getnext_slot() and possibly other similar functions return a
count of index searches via a new parameter, and use that to bump the
new struct Instrumentation counter. It's a little ugly to have to add
a special-purpose parameter for this, but it doesn't look like there
are tons and tons of calls so maybe it's OK.

2. Add a new field to BufferUsage. Then the AM can bump this field and
explain.c will know about it the same way it knows about other changes
to pgBufferUsage. However, this is not really about buffer usage and
the new field seems utterly unlike the other things that are in that
structure, so this seems really bad to me.

3. Add a new field to IndexScanDesc, as you originally proposed. This
seems similar to #1: it's still shoveling instrumentation data around
in a way that we don't currently, but instead of shoveling it through
a new parameter, we shovel it through a new structure member. Either
way, the new thing (parameter or structure member) doesn't really look
like it belongs with what's already there, so it seems like
conservation of ugliness.

4. Add a new field to the btree-specific structure referenced by the
IndexScanDesc's opaque pointer. I think this is what Matthias was
proposing. It doesn't seem particularly hard to implement, and seems
similar to #1 and #3.

It is not clear to me that any of #1, #3, and #4 are radically better
than any of the other ones, with the following exception: it would be
a poor idea to choose #4 over #3 if this field will ultimately be used
for a bunch of different AMs, and a poor idea to choose #3 over #4 if
it's always going to be interesting only for btree. I'll defer to you
on which of those things is the case, but with the request that you
think about what is practically likely to happen and not advocate too
vigorously based on an argument that makes prominent use of the phrase
"in theory". To be honest, I don't really like any of these options
very much: they all seem a tad inelegant. But sometimes that is a
necessary evil when inventing something new. I believe if I were
implementing this myself I would probably try #1 first; if that ended
up seeming too ugly, then I would fall back to #3 or #4.
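To make #1 slightly more concrete, a rough sketch of the struct change (the
new field's name is an assumption; existing counters abbreviated):

typedef struct Instrumentation
{
    /* ... existing fields elided ... */
    double      ntuples;        /* # tuples processed */
    double      ntuples2;       /* secondary node-specific tuple counter */
    double      nfiltered1;     /* # tuples removed by scanqual or joinqual */
    double      nfiltered2;     /* # tuples removed by "other" quals */
    double      nsearches;      /* new: # index searches inside the AM */
} Instrumentation;

The executor would bump nsearches from whatever count the AM hands back
through the new parameter.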

Does that help?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Improve CRC32C performance on SSE4.2

2025-02-27 Thread Devulapalli, Raghuveer
Hi John, 

Going back to your earlier comment: 

> > > I'm not a fan of exposing these architecture-specific details to
> > > places that consult the capabilities. That requires things like
> > > +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)

> > Isn't that one thing currently pg_cpu_have(FEATURE)?
> 
> No, I mean the one thing would be "do we have hardware CRC?", and each
> architecture would set (and check if needed) the same slot.
> 
> I'm not 100% sure this is the better way, and there may be a disadvantage I
> haven't thought of, but this makes the most sense to me. I'm willing to hear
> reasons to do it differently, but I'm thinking those reasons need to be 
> weighed
> against the loss of abstraction.

IMO, CPU SIMD (SSE4.2, AVX, etc.) features are a module of their own, separate
from capabilities/features supported in postgres (CRC32C, bitcount, etc.).
Exposing architecture-specific details to the features that need to check
against them is a way to make code more modular and reusable. It is reasonable
to expect developers writing SIMD-specific functions to naturally understand
the runtime architecture requirements for that function, which are easily
checked via pg_cpu_have(PG_CPU_FEATURE_..). If another feature in postgres
requires SSE4.2, then the CPU initialization module doesn't need to be
modified at all; developers just have to worry about their feature and its
CPU requirements.

> Just so we're on the same page, the current separate files do initialization, 
> not
> dispatch. I'm seeing conflicting design goals
> here:
> 
> 1. Have multiple similar dispatch functions with the only difference (for now)
> being certain names (not in the patch, just discussed).
> 2. Put initialization routines in the same file, even though they have 
> literally
> nothing in common.

Sorry for not being clear. I will move the initialization routines to separate 
files. Just haven’t gotten to it yet with v10. 

> 
> > This also makes it easier to add more implementations in the future without
> > having to make the dispatch function work for both ARM and x86.
> 
> I don't quite understand, since with 0001 it already works (at least in 
> theory) for 3
> architectures with hardware CRC, plus those without, and I don't see it 
> changing
> with more architectures.

The reason it's working across all architectures as of now is that there is
only one runtime check for the CRC32C feature across x86, ARM and LoongArch
(PCLMUL dispatch happens inside the SSE42 path). If and when we add the AVX-512
implementation, won't the dispatch logic look different for x86 and ARM?
Along with that, the header file pg_crc32c.h will also look a lot messier with
a whole bunch of nested #ifdefs.  IMO, the header file should be devoid of any
architecture dispatch logic and simply contain declarations of the various
implementations (see
https://github.com/r-devulap/postgressql/blob/cf80f7375f24d2fb5cd29e95dc73d4988fc6d066/src/include/port/pg_crc32c.h
for example). The dispatch logic should be handled separately in a C file.
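A minimal sketch of that separation, assuming function-pointer dispatch (the
chooser's name is an assumption; pg_comp_crc32c_sb8 and pg_comp_crc32c_sse42
are the existing implementations):

/* in a dispatch C file, not in pg_crc32c.h */
pg_crc32c   (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len)
            = pg_comp_crc32c_sb8;       /* portable fallback */

void
pg_crc32c_choose_implementation(void)
{
    if (pg_cpu_have(PG_CPU_FEATURE_SSE42))
        pg_comp_crc32c = pg_comp_crc32c_sse42;
}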

> 
> +/* Access to pg_cpucap for modules that need runtime CPUID information */
> +bool
> +pg_cpu_have(int feature_id)
> +{
> +    return pg_cpucap[feature_id];
> +}
> 
> Putting this in a separate translation unit may have to do with the decision
> to turn v8's #defines into an enum? In any case, this means doing a function
> call to decide which function to call. That makes no sense.

The reason it is a separate translation unit is that I intended pg_cpucap
to be a read-only variable outside of pg_cpucap.c. If the function overhead is
not preferred, then I can get rid of it.

> 
> The goal of v9/10 was to centralize the initialization from cpuid etc, but it
> also did a major re-architecture of the runtime checks at the same time. That
> made review more difficult.

I assumed we wanted to move the runtime checks to a central place and that 
needed this re-architecture. 

> 
> +#if defined(__x86_64__) || defined ...
> +pg_cpucap_x86();
> +#elif defined(__aarch64__) || defined ...
> +pg_cpucap_arm();
> +#endif
> 
> I view this as another conflict in design goals: After putting everything in
> the same file, the routines still have different names and have to be guarded
> accordingly. I don't really see the advantage, especially since these guard
> symbols don't even match the ones that guard the actual initialization steps.
> I tried to imply that in my last review, but maybe I should have been more
> explicit.
> 
> I think the least painful step is to take the x86 initialization from v10,
> which is looking great, but
> - keep separate initialization files
> - don't whack around the runtime representation, at least not in the same
>   patch

Sure. I can fix that in v11. 

Raghuveer






Re: Update docs for UUID data type

2025-02-27 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 1:26 PM Andy Alsup  wrote:
>
> I've submitted it for the upcoming commitfest. The link is:
> https://commitfest.postgresql.org/patch/5604/
> Thanks for all your help in reviewing these changes.

Thank you for the patch!

Regarding the 0001 patch, I think we can put uuidv4() and
get_random_uuid() in the same row since they are effectively identical
functions. For example, we have precedent such as char_length() and
character_length().

The 0002 patch looks good to me.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Small memory fixes for pg_createsubcriber

2025-02-27 Thread Michael Paquier
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> Yeah, I also think it would look good like this.

It's the least confusing option, indeed.  I've reduced the diffs a bit
and applied that down to v16 for the pg_upgrade part where this exists.

Double-checking the tree, it does not seem that we have similar holes
now...  I hope that I'm not wrong.
--
Michael


signature.asc
Description: PGP signature


Re: new commitfest transition guidance

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 11:14 AM Tom Lane  wrote:
> Robert Haas  writes:
> > But let's not make the mistake of saying "we're not going to move
> > things automatically because we want to find out if the authors are
> > still interested" and then getting really concerned when some stuff
> > doesn't get moved. That's missing the whole point.
>
> +1.  Having a significant fraction of patches drop off the radar
> was the desired outcome, wasn't it?

Well, not if the authors really are still actively caring about them.
But there's been a funny evolution of our thinking about CommitFests
over the years. When I first started managing CommitFests, I evicted
patches that the author didn't update -- following a review -- within
four days. Not four business days - four calendar days. After all, it
was a CommitFest -- it was supposed to be for patches that were ready
to be committed. That policy was, I think, viewed by many as too
draconian, and it probably was. But now we've gotten to a point where
the one month gap between CommitFest N and CommitFest N+1 is thought
to be so short that it might be unreasonable to expect a patch author
to move their patch forward sometime during that time. And I think
that's clearly going too far the other way.

Perhaps even way, way too far the other way. I think it's completely
fine if somebody has a patch that they update occasionally as they have
time, and it putters along for a few years until it finally either gets
committed or it doesn't. I one hundred percent support part-time
developers having the opportunity to participate as and when their
schedule permits and I think that is an important part of being a good
and welcoming open source community. But there are also people who are
working very hard and very actively to progress patches and staying on
top of the CF status and any new review emails every day, and that is
ALSO a great way to do development, and it's reasonable to treat those
cases differently. I'm not sure exactly how we can best do that, but
it makes my head hurt every time I find something in the CF where the
patch author was like "well, I'll get back to this at some point" and
then three CFs later it's still sitting there in "Needs Review" or
something.

Probably not moving things forward automatically is only one part of
that problem -- we could very much use some better tooling for judging
how active certain patches are and, if not so active, whether that's
more a lack of reviewers or more author inactivity. But we're not
doing ourselves any favors by working super-hard to keep everything
that anybody might potentially care about in the same bucket as things
that are actually active.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Ranier Vilela
Hi.

On Tue, Feb 18, 2025 at 11:31, Melanie Plageman <melanieplage...@gmail.com> wrote:

> On Sun, Feb 16, 2025 at 1:12 PM Tom Lane  wrote:
> >
> > Thomas Munro  writes:
> > > Thanks!  It's green again.
> >
> > The security team's Coverity instance complained about this patch:
> >
> > *** CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> >
> /srv/coverity/git/pgsql-git/postgresql/src/backend/access/heap/vacuumlazy.c:
> 1295 in lazy_scan_heap()
> > 1289buf = read_stream_next_buffer(stream,
> &per_buffer_data);
> > 1290
> > 1291/* The relation is exhausted. */
> > 1292if (!BufferIsValid(buf))
> > 1293break;
> > 1294
> > >>> CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> > >>> Dereferencing null pointer "per_buffer_data".
> > 1295blk_info = *((uint8 *) per_buffer_data);
> > 1296CheckBufferIsPinnedOnce(buf);
> > 1297page = BufferGetPage(buf);
> > 1298blkno = BufferGetBlockNumber(buf);
> > 1299
> > 1300vacrel->scanned_pages++;
> >
> > Basically, Coverity doesn't understand that a successful call to
> > read_stream_next_buffer must set per_buffer_data here.  I don't
> > think there's much chance of teaching it that, so we'll just
> > have to dismiss this item as "intentional, not a bug".
>
> Is this easy to do? Like is there a list of things from coverity to ignore?
>
> > I do have a suggestion: I think the "per_buffer_data" variable
> > should be declared inside the "while (true)" loop not outside.
> > That way there is no chance of a value being carried across
> > iterations, so that if for some reason read_stream_next_buffer
> > failed to do what we expect and did not set per_buffer_data,
> > we'd be certain to get a null-pointer core dump rather than
> > accessing data from a previous buffer.
>
> Done and pushed. Thanks!
>
Per Coverity.

CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
8. var_deref_op: Dereferencing null pointer per_buffer_data.

I think that the function *read_stream_next_buffer* can return
an invalid per_buffer_data pointer along with a valid buffer.

Sorry if I'm wrong, but the function is very suspicious.

Attached is a patch that tries to fix this.

best regards,
Ranier Vilela


fix-possible-invalid-pointer-read_stream.patch
Description: Binary data


Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 14:32:28 -0300, Ranier Vilela wrote:
> Hi.
> 
> On Tue, Feb 18, 2025 at 11:31, Melanie Plageman <melanieplage...@gmail.com> wrote:
> 
> > On Sun, Feb 16, 2025 at 1:12 PM Tom Lane  wrote:
> > >
> > > Thomas Munro  writes:
> > > > Thanks!  It's green again.
> > >
> > > The security team's Coverity instance complained about this patch:
> > >
> > > *** CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> > >
> > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/heap/vacuumlazy.c:
> > 1295 in lazy_scan_heap()
> > > 1289buf = read_stream_next_buffer(stream,
> > &per_buffer_data);
> > > 1290
> > > 1291/* The relation is exhausted. */
> > > 1292if (!BufferIsValid(buf))
> > > 1293break;
> > > 1294
> > > >>> CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> > > >>> Dereferencing null pointer "per_buffer_data".
> > > 1295blk_info = *((uint8 *) per_buffer_data);
> > > 1296CheckBufferIsPinnedOnce(buf);
> > > 1297page = BufferGetPage(buf);
> > > 1298blkno = BufferGetBlockNumber(buf);
> > > 1299
> > > 1300vacrel->scanned_pages++;
> > >
> > > Basically, Coverity doesn't understand that a successful call to
> > > read_stream_next_buffer must set per_buffer_data here.  I don't
> > > think there's much chance of teaching it that, so we'll just
> > > have to dismiss this item as "intentional, not a bug".
> >
> > Is this easy to do? Like is there a list of things from coverity to ignore?
> >
> > > I do have a suggestion: I think the "per_buffer_data" variable
> > > should be declared inside the "while (true)" loop not outside.
> > > That way there is no chance of a value being carried across
> > > iterations, so that if for some reason read_stream_next_buffer
> > > failed to do what we expect and did not set per_buffer_data,
> > > we'd be certain to get a null-pointer core dump rather than
> > > accessing data from a previous buffer.
> >
> > Done and pushed. Thanks!
> >
> Per Coverity.
> 
> CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
> 8. var_deref_op: Dereferencing null pointer per_buffer_data.

That's exactly what the messages you quoted are discussing, no?


> Sorry if I'm wrong, but the function is very suspicious.

How so?



> diff --git a/src/backend/storage/aio/read_stream.c 
> b/src/backend/storage/aio/read_stream.c
> index 04bdb5e6d4..18e9b4f3c4 100644
> --- a/src/backend/storage/aio/read_stream.c
> +++ b/src/backend/storage/aio/read_stream.c
> @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void 
> **per_buffer_data)
>   
> READ_BUFFERS_ISSUE_ADVICE : 0)))
>   {
>   /* Fast return. */
> + if (per_buffer_data)
> + *per_buffer_data = 
> get_per_buffer_data(stream, oldest_buffer_index);
>   return buffer;
>   }

A few lines above:
Assert(stream->per_buffer_data_size == 0);

The fast path isn't used when per buffer data is used.  Adding a check for
per_buffer_data and assigning something to it is nonsensical.

Greetings,

Andres Freund




Re: Log connection establishment timings

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
> I was just talking to Andres off-list and he mentioned that the volume
> of log_connections messages added in recent releases can really be a
> problem for users.

For some added color: I've seen plenty of systems where almost all the log
volume is log_connection messages, which they *have* to enable for various
regulatory reasons.  It'd still be a lot if we just emitted one message for
each connection, but logging three (and possibly four with $thread) for each
connection makes it substantially worse.


> He said ideally we would emit one message which consolidated these (and make
> sure we did so for failed connections too detailing the successfully
> completed stages).

A combined message would also not *quite* replace all use-cases, e.g. if you
want to debug arriving connections or auth problems you do want the additional
messages.  But yea, for normal operation, I do think most folks want just one
message.


> However, since that is a bigger project (with more refactoring, etc),
> he suggested that we change log_connections to a GUC_LIST
> (ConfigureNamesString) with options like "none", "received",
> "authenticated", "authorized", "all".

Yep.


> Then we could add one like "established" for the final message and
> timings my patch set adds. I think the overhead of an additional log
> message being emitted probably outweighs the overhead of taking those
> additional timings.

To bikeshed a bit: "established" could be the TCP connection establishment
just as well. I'd go for "completed" or "timings".


> String GUCs are a lot more work than enum GUCs, so I was thinking if
> there is a way to do it as an enum.
> 
> I think we want the user to be able to specify a list of all the log
> messages they want included, not just have each one include the
> previous ones. So, then it probably has to be a list right? There is
> no good design that would fit as an enum.

I don't see a way to comfortably shove this into an enum either.

Greetings,

Andres Freund




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 12:44:24 -0500, Andres Freund wrote:
> > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
> > 8. var_deref_op: Dereferencing null pointer per_buffer_data.
> 
> That's exactly what the messages you quoted are discussing, no?

Ah, no, it isn't. But I still think the coverity alert and the patch don't
make sense, as per the below:

> > diff --git a/src/backend/storage/aio/read_stream.c 
> > b/src/backend/storage/aio/read_stream.c
> > index 04bdb5e6d4..18e9b4f3c4 100644
> > --- a/src/backend/storage/aio/read_stream.c
> > +++ b/src/backend/storage/aio/read_stream.c
> > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void 
> > **per_buffer_data)
> > 
> > READ_BUFFERS_ISSUE_ADVICE : 0)))
> > {
> > /* Fast return. */
> > +   if (per_buffer_data)
> > +   *per_buffer_data = 
> > get_per_buffer_data(stream, oldest_buffer_index);
> > return buffer;
> > }
> 
> A few lines above:
>   Assert(stream->per_buffer_data_size == 0);
> 
> The fast path isn't used when per buffer data is used.  Adding a check for
> per_buffer_data and assigning something to it is nonsensical.

Greetings,

Andres Freund




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-02-27 Thread Peter Geoghegan
On Sat, Feb 8, 2025 at 8:47 AM Michail Nikolaev
 wrote:
> Just some commit messages + few cleanups.

I'm worried about this:

+These longer pin lifetimes can cause buffer exhaustion with messages like "no
+unpinned buffers available" when the index has many pages that have similar
+ordering; but future work can figure out how to best work that out.

I think that we should have some kind of upper bound on the number of
pins that can be acquired at any one time, in order to completely
avoid these problems. Solving that problem will probably require GiST
expertise that I don't have right now.

-- 
Peter Geoghegan




Re: Statistics Import and Export

2025-02-27 Thread Greg Sabino Mullane
I know I'm coming late to this, but I would like us to rethink having
statistics dumped by default. I was caught by this today, as I was doing
two dumps in a row, but the output changed between runs solely because the
stats got updated. It got me thinking about all the use cases of pg_dump
I've seen over the years. I think this has the potential to cause a lot of
problems for things like automated scripts. It certainly violates the
principle of least astonishment to have dumps change when no user
interaction has happened.

> Alternatively, we could put stats into SECTION_POST_DATA,


No, that would make the above-mentioned problem much worse.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: Disabling vacuum truncate for autovacuum

2025-02-27 Thread Gurjeet Singh
On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe  wrote:
>
> On Thu, 2025-01-23 at 22:33 -0800, Gurjeet Singh wrote:
> > > > I am also wondering if having an autovacuum setting to control it would 
> > > > be
> > > > a good idea for a feature.
> > >
> > > I'm all for that.
> >
> > Please see attached an initial patch to disable truncation behaviour in
> > autovacuum. This patch retains the default behavior of autovacuum truncating
> > relations. The user is allowed to change the behaviour and disable relation
> > truncations system-wide by setting autovacuum_disable_vacuum_truncate = 
> > true.
> > Better parameter names welcome :-)
>
> I hope it is possible to override the global setting with the 
> "vacuum_truncate"
> option on an individual table.

The current patch behaviour is that if autovacuum_vacuum_truncate is false,
autovacuum will _not_ truncate any relations. If the parameter's value is true
(the default), then the relation's reloption will be honored.

A table-owner, or the database-owner, may enable truncation of a table, as they
may be trying to be nice and return the unused disk space back to the
OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the
health of the entire db instance, as well as of any replicas of the db
instance),
wants to disable truncation across all databases to ensure that the replication
does not get stuck, then IMHO Postgres should empower the sysadmin to make
that decision, and override the relation-level setting enabled by the table-
owner or the database-owner.
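A minimal sketch of that precedence, assuming the proposed GUC name and a
hypothetical reloption accessor:

static bool
should_attempt_truncation_sketch(Relation rel)
{
    /* The global off switch wins over any per-table setting ... */
    if (!autovacuum_vacuum_truncate)
        return false;

    /* ... otherwise honor the per-table vacuum_truncate reloption. */
    return RelationGetVacuumTruncate(rel);  /* hypothetical accessor */
}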

> > One additional improvement I can think of is to emit a WARNING or NOTICE 
> > message
> > that truncate operation is being skipped, perhaps only if the truncation
> > would've freed up space over a certain threshold.
>
> Interesting idea, but I think it is independent from this patch.

Agreed. I'll consider writing a separate patch for this.

> > Perhaps there's value in letting this parameter be specified at database 
> > level,
> > but I'm not able to think of a reason why someone would want to disable this
> > behaviour on just one database. So leaving the parameter context to be the 
> > same
> > as most other autovacuum parameters: SIGHUP.
>
> I can imagine setting that on only a certain database. Different databases
> typically have different applications, which have different needs.

Makes sense. I don't think anything special needs to be done in the patch to
address this.

> Eventually, the patch should have documentation and regression tests.

Documentation added. Pointers on whether, where, and what kind of tests to add
would be appreciated.

On Mon, Jan 27, 2025 at 12:38 PM Robert Haas  wrote:
>
> On Mon, Jan 27, 2025 at 4:55 AM Laurenz Albe  wrote:
> > My suggestion for the parameter name is "autovacuum_disable_truncate".
>
> Then it would have a different name than the reloption, and the
> opposite sense. In most cases, we've been able to keep those matching
> -- autovacuum vs. autovacuum_enabled being, I believe, the only
> current mismatch.

If we want to maintain the convention that autovacuum parameter names are of
the format "autovacuum_" plus the name of the corresponding VACUUM option,
then I believe the name autovacuum_vacuum_truncate (boolean) would be better
than my original proposal (autovacuum_disable_vacuum_truncate) or Laurenz's
proposal above. The default value should be true, to match the current
autovacuum behaviour.

> Also, how sure are we that turning this off globally is a solid idea?
> Off-hand, it doesn't sound that bad: there are probably situations in
> which autovacuum never truncates anything anyway just because the tail
> blocks of the relations always contain at least 1 tuple. But we should
> be careful not to add a setting that is far more likely to get people
> into trouble than to get them out of it. It would be good to hear what
> other people think about the risk vs. reward tradeoff in this case.

Taking silence from others to be a sign of no opposition, I'm moving forward
with the patch.

On Tue, Feb 18, 2025 at 11:56 AM Nathan Bossart
 wrote:
>
> On Mon, Jan 27, 2025 at 03:38:39 PM -0500, Robert Haas wrote:
> > Also, how sure are we that turning this off globally is a solid idea?

> In any case, it's
> already possible to achieve $SUBJECT with a trivial script that sets
> storage parameters on all tables, so IMHO it would be silly to withhold a
> global setting that does the same thing just on principle.

+1
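
Such a script could be as simple as this rough sketch (one database at a
time; catalog schemas excluded):

    DO $$
    DECLARE t regclass;
    BEGIN
        FOR t IN SELECT c.oid FROM pg_class c
                 JOIN pg_namespace n ON n.oid = c.relnamespace
                 WHERE c.relkind = 'r'
                   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            EXECUTE format('ALTER TABLE %s SET (vacuum_truncate = false)', t);
        END LOOP;
    END $$;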

For documentation of this GUC, I borrowed heavily from the relevant sections of
CREATE TABLE and VACUUM docs.

I wrote one of the sentences in the docs in 3 different ways, and picked the
last one, as it is the most concise and clear. If others feel a different
choice of words would be better, I'm all ears.

 If false, autovacuum will not perform the
 truncation, even if the vacuum_truncate option has
 been set to true for the table being processed.

 If false, autovacuum will not perform the
 truncation, and it ignores the v

Re: Statistics Import and Export

2025-02-27 Thread Corey Huinker
On Thu, Feb 27, 2025 at 10:01 PM Corey Huinker 
wrote:

> +--- error: relation is wrong type
>> +SELECT pg_catalog.pg_restore_relation_stats(
>> +'relation', 0::oid,
>> +'relpages', 17::integer,
>> +'reltuples', 400.0::real,
>> +'relallvisible', 4::integer);
>>
>> Why do you need to specify all the stats (relpages, reltuples, etc)?
>> To exercise this you could just do:
>> select pg_catalog.pg_restore_relation_stats('relation', 0::oid);
>>
>
> In the above case, it's historical inertia in that the pg_set_* call
> required all those parameters, as well as a fear that the code - now or in
> the future - might evaluate "can anything actually change from this call"
> and short circuit out before actually trying to make sense of the reg_class
> oid. But we can assuage that fear with just one of the three stat
> parameters, and I'll adjust accordingly.
>

* reduced relstats parameters specified to the minimum needed to verify the
error and avoid a theoretical future logic short-circuit described above.
* version parameter usage reduced to absolute minimum - verifying that it
is accepted and ignored, though Melanie's patch may introduce a need to
bring it back in a place or two.

84 lines deleted. Not great, not terrible.

I suppose if we really trusted the TAP test databases to have "one of
everything" in terms of tables with all the datatypes, and sufficient rows
to generate interesting stats, plus some indexes of each, then we could get
rid of those two, but I feel very strongly that it would be a minor savings
at a major cost to clarity.
From d0dfca62eb8ce27fb4bfce77bd2ee835738899a8 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Wed, 26 Feb 2025 21:02:44 -0500
Subject: [PATCH v2] Organize and deduplicate statistics import tests.

Many changes, refactorings, and rebasings have taken their toll on the
statistics import tests. Now that things appear more stable and the
pg_set_* functions are gone in favor of using pg_restore_* in all cases,
it's safe to remove duplicates, combine tests where possible, and make
the test descriptions a bit more descriptive and uniform.

Additionally, parameters that were not strictly needed to demonstrate
the purpose(s) of a test were removed to reduce clutter.
---
 src/test/regress/expected/stats_import.out | 680 -
 src/test/regress/sql/stats_import.sql  | 554 +++--
 2 files changed, 466 insertions(+), 768 deletions(-)

diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index 1f150f7b08d..7bd7bfb3e7b 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -13,19 +13,48 @@ CREATE TABLE stats_import.test(
 tags text[]
 ) WITH (autovacuum_enabled = false);
 CREATE INDEX test_i ON stats_import.test(id);
+--
+-- relstats tests
+--
+--- error: relation is wrong type
+SELECT pg_catalog.pg_restore_relation_stats(
+'relation', 0::oid,
+'relpages', 17::integer);
+WARNING:  argument "relation" has type "oid", expected type "regclass"
+ERROR:  "relation" cannot be NULL
+-- error: relation not found
+SELECT pg_catalog.pg_restore_relation_stats(
+'relation', 0::oid::regclass,
+'relpages', 17::integer);
+ERROR:  could not open relation with OID 0
+-- error: odd number of variadic arguments cannot be pairs
+SELECT pg_restore_relation_stats(
+'relation', 'stats_import.test'::regclass,
+'relallvisible');
+ERROR:  variadic arguments must be name/value pairs
+HINT:  Provide an even number of variadic arguments that can be divided into pairs.
+-- error: argument name is NULL
+SELECT pg_restore_relation_stats(
+'relation', 'stats_import.test'::regclass,
+NULL, '17'::integer);
+ERROR:  name at variadic position 3 is NULL
+-- error: argument name is not a text type
+SELECT pg_restore_relation_stats(
+'relation', '0'::oid::regclass,
+17, '17'::integer);
+ERROR:  name at variadic position 3 has type "integer", expected type "text"
 -- starting stats
 SELECT relpages, reltuples, relallvisible
 FROM pg_class
-WHERE oid = 'stats_import.test'::regclass;
+WHERE oid = 'stats_import.test_i'::regclass;
  relpages | reltuples | relallvisible 
 --+---+---
-0 |-1 | 0
+1 | 0 | 0
 (1 row)
 
-BEGIN;
 -- regular indexes have special case locking rules
-SELECT
-pg_catalog.pg_restore_relation_stats(
+BEGIN;
+SELECT pg_catalog.pg_restore_relation_stats(
 'relation', 'stats_import.test_i'::regclass,
 'relpages', 18::integer);
  pg_restore_relation_stats 
@@ -50,32 +79,6 @@ WHERE relation = 'stats_import.test_i'::regclass AND
 (1 row)
 
 COMMIT;
-SELECT
-pg_catalog.pg_restore_relation_stats(
-'relation', 'stats_import.test_i'::regclass,
-'relpages', 19::integer );
- pg_restore_relation_stats 

- t
-(1 row)
-
--- clear
-SE

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-27 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 27 Feb 2025 15:24:26 -0800,
  Masahiko Sawada  wrote:

> Pushed the 0001 patch.

Thanks!

> Regarding the 0002 patch, I realized we stopped exposing
> NextCopyFromRawFields() function:
> 
>  --- a/src/include/commands/copy.h
> +++ b/src/include/commands/copy.h
> @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
> *pstate, Relation rel, Node *where
>  extern void EndCopyFrom(CopyFromState cstate);
>  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
>  Datum *values, bool *nulls);
> -extern bool NextCopyFromRawFields(CopyFromState cstate,
> - char ***fields, int *nfields);
> 
> I think that this change is not relevant with the refactoring and
> probably we should keep it exposed as extension might be using it.
> Considering that we added pg_attribute_always_inline to the function,
> does it work even if we omit pg_attribute_always_inline to its
> function declaration in the copy.h file?

Unfortunately, no. The inline + constant argument
optimization requires "static".

How about the following?

static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int 
*nfields, bool is_csv)
{
...
}

bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields, 
cstate->opts.csv_mode);
}


Thanks,
-- 
kou




Re: Improve documentation regarding custom settings, placeholders, and the administrative functions

2025-02-27 Thread Zhang Mingli
On Feb 28, 2025 at 02:27 +0800, David G. Johnston , 
wrote:
> On Thu, Feb 6, 2025 at 8:04 AM Zhang Mingli  wrote:
> > On Feb 6, 2025 at 10:49 +0800, David G. Johnston 
> > , wrote:
> > >
> > > Yes, and when it does it is doing so in lieu of an error, and thus it is 
> > > not returning the value of the setting.  It does not mean the value taken 
> > > on by the named parameter is NULL, which is not possible. i.e., SHOW will 
> > > never produce NULL.
> > Hi,
> >
> > OK, LGTM.
> >
>
> Thank you for the review.  If you would like to add yourself as the reviewer 
> and mark this ready to commit here is the commitfest entry.
> https://commitfest.postgresql.org/patch/5548/
> David J.
Hi, Done.


--
Zhang Mingli
HashData


Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-02-27 Thread Peter Geoghegan
On Thu, Feb 27, 2025 at 3:42 PM Robert Haas  wrote:
> +1 for having some instrumentation. I do not agree with Tom that these
> are numbers that only Peter Geoghegan and 2-3 other people will ever
> understand. I grant that it's not going to make sense to everyone, but
> the number of people to which it will make sense I would guess is
> probably in the hundreds or the thousands rather than the single
> digits.

I agree that it's likely to be of interest to only a minority of
users. But a fairly large minority.

> Good documentation could help.

It's easy to produce an example that makes intuitive sense. For
example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and
5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index
Searches: 5" -- one search per "a" value. Such an example might be
more useful than my original pgbench_accounts example.
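
Something along these lines (hypothetical table; EXPLAIN output abridged
and approximate):

    CREATE INDEX ON t (a, b);
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM t WHERE a BETWEEN 1 AND 5 AND b = 12345;
    --  Index Scan using t_a_b_idx on t
    --    Index Cond: ((a >= 1) AND (a <= 5) AND (b = 12345))
    --    Index Searches: 5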

Do you think that that would help?

> So, where should we store that information?
>
> The thing that's odd about using IndexScanDesc is that it doesn't
> contain any other instrumentation currently, or at least not that I
> can see.

It is unique right now, but perhaps only because this is the first
piece of instrumentation that:

A. We must track at the level of an individual index scan -- not at
the level of an index relation, like pgstat_count_index_scan(), nor at
the level of the whole system, like BufferUsage state.

AND

B. Requires that we count something that fundamentally lives inside
the index AM -- something that we cannot reasonably track/infer from
the executor proper (more on that below, in my response to your scheme
#1).

> First, that struct has just four counters (ntuples,
> ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of
> them are in use -- a plain index scan does not use ntuples2 -- but I
> think this optimization can apply to both index and index-only scans,
> so we just don't have room.

Right, index-only scans have exactly the same requirements as plain
index scans/bitmap index scans.

> 1. Add a new field to struct Instrumentation. Make
> index_getnext_slot() and possibly other similar functions return a
> count of index searches via a new parameter, and use that to bump the
> new struct Instrumentation counter. It's a little ugly to have to add
> a special-purpose parameter for this, but it doesn't look like there
> are tons and tons of calls so maybe it's OK.

That seems like the strongest possible alternative to the original
scheme used in the current draft patch (scheme #3).

This scheme #1 has the same issue as scheme #3, though: it still
requires an integer counter that tracks the number of index searches
(something a bit simpler than that, such as a boolean flag set once
per amgettuple call, won't do). This is due to there being no fixed
limit on the number of index searches required during any single
amgettuple call: in general the index AM may perform quite a few index
searches before it is able to return the first tuple to the scan (or
before it can return the next tuple).

The only difference that I can see between scheme #1 and scheme #3 is
that the former requires 2 counters instead of just 1. And, we'd still
need to have 1 out of the 2 counters located either in IndexScanDesc
itself (just like scheme #3), or located in some other struct that can
at least be accessed through IndexScanDesc (like the index AM opaque
state, per scheme #4). After all, *every* piece of state known to any
amgettuple routine must ultimately come from IndexScanDesc (or from
backend global state, per scheme #2).

(Actually, I suppose it is technically possible to avoid storing
anything in IndexScanDesc by inventing another amgettuple argument,
just for this. That seems like a distinction without a difference,
though.)

> 2. Add a new field to BufferUsage. Then the AM can bump this field and
> explain.c will know about it the same way it knows about other changes
> to pgBufferUsage. However, this is not really about buffer usage and
> the new field seems utterly unlike the other things that are in that
> structure, so this seems really bad to me.

I agree. The use of global variables seems quite inappropriate for
something like this. It'll result in wrong output whenever an index
scan uses an InitPlan in its qual when that InitPlan is itself a plan
that contains an index scan (this is already an issue with "Buffers:"
instrumentation, but this would be much worse).

> 3. Add a new field to IndexScanDesc, as you originally proposed. This
> seems similar to #1: it's still shoveling instrumentation data around
> in a way that we don't currently, but instead of shoveling it through
> a new parameter, we shovel it through a new structure member. Either
> way, the new thing (parameter or structure member) doesn't really look
> like it belongs with what's already there, so it seems like
> conservation of ugliness.

Perhaps a comment noting why the new counter lives in IndexScanDesc would help?

> 4. Add a new field to th

Re: Update docs for UUID data type

2025-02-27 Thread Andy Alsup
Masahiko,

I have combined the gen_random_uuid() and uuidv4() into a single row, as
you suggested. Please find the v5 patch, which has been squashed into a
single commit.

Best regards,
Andy Alsup

On Thu, Feb 27, 2025 at 5:02 PM Masahiko Sawada 
wrote:

> On Thu, Feb 27, 2025 at 1:26 PM Andy Alsup  wrote:
> >
> > I've submitted it for the up-coming commitfest. The link is:
> https://commitfest.postgresql.org/patch/5604/
> > Thanks for all your help in reviewing these changes.
>
> Thank you for the patch!
>
> Regarding the 0001 patch, I think we can put uuidv4() and
> get_random_uuid() in the same row since they are effectively identical
> functions. For example, we have precedent such as char_length() and
> character_length().
>
> The 0002 patch looks good to me.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>


v5-0001-Docs-for-UUID-funcs-formatted-in-table-and-UUID-d.patch
Description: Binary data


Re: Amcheck verification of GiST and GIN

2025-02-27 Thread Kirill Reshke
On Sat, 22 Feb 2025 at 03:51, Mark Dilger  wrote:
>
>
>
> > On Feb 21, 2025, at 12:50 PM, Mark Dilger  
> > wrote:
> >
> > Could one of the patch authors take a look?
>
> I turned the TAP test which triggers the error into a regression test that 
> does likewise, for ease of stepping through the test, if anybody should want 
> to do that.  I'm attaching that patch here, but please note that I'm not 
> expecting this to be committed.

Hi!
Your efforts are much appreciated!
I used this patch to derive a smaller repro.

> this seems to either be a bug in the checking code complaining about 
> perfectly valid tuple order,

I'm doubtful this is the case. I have added some more logging to
gin_index_check, and here is output after running attached:
```
DEBUG:  processing entry tree page at blk 2, maxoff: 125

DEBUG:  comparing for offset 78 category 0
DEBUG:  comparing for offset 79 category 2
DEBUG:  comparing for offset 80 category 3
DEBUG:  comparing for offset 81 category 0
LOG:  index "ginidx" has wrong tuple order on entry tree page, block
2, offset 81, rightlink 4294967295
DEBUG:  comparing for offset 82 category 0

DEBUG:  comparing for offset 100 category 0
DEBUG:  comparing for offset 101 category 2
DEBUG:  comparing for offset 102 category 3
DEBUG:  comparing for offset 103 category 0
LOG:  index "ginidx" has wrong tuple order on entry tree page, block
2, offset 103, rightlink 4294967295
DEBUG:  comparing for offset 104 category 0
DEBUG:  comparing for offset 105 category 0
```
So, we have an entry tree page where there is a tuple at offset 80
with gin tuple category = 3, and then it goes back to category 0
again. And there is one more such pattern on the same page.
The ginCompareEntries function compares the gin tuple categories
first. I do not understand how this could be a valid order on the
page, given how ginCompareEntries is used in the `ginget.c` logic.
Maybe I'm missing something vital about GIN.


-- 
Best regards,
Kirill Reshke


0001-Much-smaller-repro.patch
Description: Binary data


Re: SQLFunctionCache and generic plans

2025-02-27 Thread Pavel Stehule
Hi

čt 27. 2. 2025 v 21:45 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > čt 27. 2. 2025 v 20:52 odesílatel Tom Lane  napsal:
> >> So taken together, our results are all over the map, anywhere
> >> from 7% speedup to 7% slowdown.  My usual rule of thumb is that
>
> > Where do you see 7% speedup? Few lines up you wrote 0.7% faster.
>
> Alexander got that on the fx4 case, according to his response a
> few messages ago [1].  It'd be good if someone else could reproduce
> that, because right now we have two "it's slower" results versus
> only one "it's faster".
>

ok

here is a profile from master

6.98%  postgres  postgres   [.] hash_bytes
   6.30%  postgres  postgres   [.] palloc0
   3.57%  postgres  postgres   [.] SearchCatCacheInternal
   3.29%  postgres  postgres   [.] AllocSetAlloc
   2.65%  postgres  plpgsql.so [.] exec_stmts
   2.55%  postgres  postgres   [.] expression_tree_walker_impl
   2.34%  postgres  postgres   [.] _SPI_execute_plan
   2.13%  postgres  postgres   [.] CheckExprStillValid
   2.02%  postgres  postgres   [.] fmgr_info_cxt_security
   1.89%  postgres  postgres   [.] ExecInitFunc
   1.51%  postgres  postgres   [.] ExecInterpExpr
   1.48%  postgres  postgres   [.] ResourceOwnerForget
   1.44%  postgres  postgres   [.] AllocSetReset
   1.35%  postgres  postgres   [.] MemoryContextCreate
   1.30%  postgres  plpgsql.so [.] plpgsql_exec_function
   1.29%  postgres  libc.so.6  [.] __memcmp_sse2
   1.24%  postgres  postgres   [.] MemoryContextDelete
   1.13%  postgres  postgres   [.] check_stack_depth
   1.11%  postgres  postgres   [.] AllocSetContextCreateInternal
   1.09%  postgres  postgres   [.] resolve_polymorphic_argtypes
   1.08%  postgres  postgres   [.] hash_search_with_hash_value
   1.07%  postgres  postgres   [.] standard_ExecutorStart
   1.07%  postgres  postgres   [.] ExprEvalPushStep
   1.04%  postgres  postgres   [.] ExecInitExprRec
   0.95%  postgres  plpgsql.so [.] plpgsql_estate_setup
   0.91%  postgres  postgres   [.] ExecReadyInterpretedExp

and from patched

   7.08%  postgres  postgres   [.] hash_bytes
   6.25%  postgres  postgres   [.] palloc0
   3.52%  postgres  postgres   [.] SearchCatCacheInternal
   3.30%  postgres  postgres   [.] AllocSetAlloc
   2.39%  postgres  postgres   [.] expression_tree_walker_impl
   2.37%  postgres  plpgsql.so [.] exec_stmts
   2.15%  postgres  postgres   [.] _SPI_execute_plan
   2.10%  postgres  postgres   [.] CheckExprStillValid
   1.94%  postgres  postgres   [.] fmgr_info_cxt_security
   1.71%  postgres  postgres   [.] ExecInitFunc
   1.41%  postgres  postgres   [.] AllocSetReset
   1.40%  postgres  postgres   [.] ExecInterpExpr
   1.38%  postgres  postgres   [.] ExprEvalPushStep
   1.34%  postgres  postgres   [.] ResourceOwnerForget
   1.31%  postgres  postgres   [.] MemoryContextDelete
   1.24%  postgres  libc.so.6  [.] __memcmp_sse2
   1.21%  postgres  postgres   [.] MemoryContextCreate
   1.18%  postgres  postgres   [.] AllocSetContextCreateInternal
   1.17%  postgres  postgres   [.] hash_search_with_hash_value
   1.13%  postgres  postgres   [.] resolve_polymorphic_argtypes
   1.13%  postgres  plpgsql.so [.] plpgsql_exec_function
   1.03%  postgres  postgres   [.] standard_ExecutorStart
   0.98%  postgres  postgres   [.] ExecInitExprRec
   0.96%  postgres  postgres   [.] check_stack_depth

It looks like there is only one significant difference:

ExprEvalPushStep: 1.07% vs. 1.38%

Regards

Pavel

compiled without assertions on gcc 15 with 02

vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB





> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru
>


Re: Statistics Import and Export

2025-02-27 Thread Jeff Davis
On Tue, 2025-02-25 at 11:11 +0530, Ashutosh Bapat wrote:
> So the dumped statistics are not restored exactly. The reason for
> this
> is the table statistics is dumped before dumping ALTER TABLE ... ADD
> CONSTRAINT command which changes the statistics. I think all the
> pg_restore_relation_stats() calls should be dumped after all the
> schema and data modifications have been done. OR what's the point in
> dumping statistics only to get rewritten even before restore
> finishes.

In your example, it's not so bad because the stats are actually better:
the index is built after the data is present, and therefore relpages
and reltuples are correct.

The problem is more clear if you use --no-data. If you load data,
ANALYZE, pg_dump --no-data, then reload the sql file, then the stats
are lost.

That workflow is very close to what pg_upgrade does. We solved the
problem for pg_upgrade in commit 71b66171d0 by simply not updating the
statistics when building an index and IsBinaryUpgrade.

To solve the issue with dump --no-data, I propose that we change the
test in 71b66171d0 to only update the stats if the physical relpages is
non-zero.

Patch attached:

 * If the dump is --no-data, or during pg_upgrade, the table will be
empty, so the physical relpages will be zero and the restored stats
won't be overwritten.

 * If (like in your example) the dump includes data, the new stats are
based on real data, so they are better anyway. This is sort of like the
case where autoanalyze kicks in.

 * If the dump is --statistics-only, then there won't be any indexes
created in the SQL file, so when you restore the stats, they will
remain until you do something else to change them.

 * If your example really is a problem, you'd need to dump first with -
-no-statistics, and then with --statistics-only, and restore the two
SQL files in order.


Alternatively, we could put stats into SECTION_POST_DATA, which was
already discussed[*], and we decided against it (though there was not a
clear consensus).

Regards,
Jeff Davis

*:
https://www.postgresql.org/message-id/1798867.1712376328%40sss.pgh.pa.us
From 86c03cb525b49d24019c5c0ea8ec36bb82b3c58a Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 27 Feb 2025 17:06:00 -0800
Subject: [PATCH v1] Do not update stats on empty table when building index.

We previously fixed this for binary upgrade in 71b66171d0, but a
similar problem exists when using pg_dump --no-data without pg_upgrade
involved. Fix both problems by not updating the stats when the table
has no pages.

Reported-by: Ashutosh Bapat 
Discussion: https://postgr.es/m/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=c+fnxdvfprwv...@mail.gmail.com
---
 src/backend/catalog/index.c| 17 +++--
 src/test/regress/expected/stats_import.out | 22 +-
 src/test/regress/sql/stats_import.sql  | 12 
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f37b990c81d..1a3fdeab350 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2833,11 +2833,7 @@ index_update_stats(Relation rel,
 	if (reltuples == 0 && rel->rd_rel->reltuples < 0)
 		reltuples = -1;
 
-	/*
-	 * Don't update statistics during binary upgrade, because the indexes are
-	 * created before the data is moved into place.
-	 */
-	update_stats = reltuples >= 0 && !IsBinaryUpgrade;
+	update_stats = reltuples >= 0;
 
 	/*
 	 * Finish I/O and visibility map buffer locks before
@@ -2850,7 +2846,16 @@ index_update_stats(Relation rel,
 	{
 		relpages = RelationGetNumberOfBlocks(rel);
 
-		if (rel->rd_rel->relkind != RELKIND_INDEX)
+		/*
+		 * Don't update statistics when the relation is completely empty. This
+		 * is important during binary upgrade, because at the time the schema
+		 * is loaded, the data has not yet been moved into place. It's also
+		 * useful when restoring a dump containing only schema and statistics.
+		 */
+		if (relpages == 0)
+			update_stats = false;
+
+		if (update_stats && rel->rd_rel->relkind != RELKIND_INDEX)
 			visibilitymap_count(rel, &relallvisible, NULL);
 	}
 
diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index 1f150f7b08d..4c81fb60c91 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -12,14 +12,34 @@ CREATE TABLE stats_import.test(
 arange int4range,
 tags text[]
 ) WITH (autovacuum_enabled = false);
+SELECT
+pg_catalog.pg_restore_relation_stats(
+'relation', 'stats_import.test'::regclass,
+'relpages', 18::integer,
+	'reltuples', 21::real,
+	'relallvisible', 24::integer);
+ pg_restore_relation_stats 
+---
+ t
+(1 row)
+
+-- creating an index on an empty table shouldn't overwrite stats
 CREATE INDEX test_i ON stats_import.test(id);
+SELECT relpages, reltuples, relallvisible
+FROM pg_class
+WHERE oid = 'stats_import.test'::regclass;
+ relpa

Re: Update docs for UUID data type

2025-02-27 Thread Andy Alsup
I've submitted it for the up-coming commitfest. The link is:
https://commitfest.postgresql.org/patch/5604/
Thanks for all your help in reviewing these changes.

Best Regards,
Andy Alsup

On Thu, Feb 27, 2025 at 1:58 PM Laurenz Albe 
wrote:

> On Wed, 2025-02-26 at 22:11 -0500, Andy Alsup wrote:
> > Please find the latest patch files attached.
>
> This is good to go.  If you add it to the commitfest, I'm happy to
> mark it "ready for committer".
>
> Yours,
> Laurenz Albe
>


Re: moving some code out of explain.c

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> OK, here is v2. One slightly unfortunate thing about this is that we
> end up with a line that is over 80 characters:

> extern DestReceiver *CreateExplainSerializeDestReceiver(struct
> ExplainState *es);

> Aside from perhaps shortening the function name, I don't see how to avoid 
> that.

Can't get terribly excited about that.  But speaking of
CreateExplainSerializeDestReceiver, I see that a declaration
of it is still there at the bottom of explain.h:

extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);

#endif  /* EXPLAIN_H */

Oversight I assume?

regards, tom lane




Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread David Rowley
On Fri, 28 Feb 2025 at 07:42, Jeff Davis  wrote:
> https://www.postgresql.org/message-id/CAJVSvF6s1LgXF6KB2Cz68sHzk%2Bv%2BO_vmwEkaon%3DH8O9VcOr-tQ%40mail.gmail.com
>
> James pointed out something interesting, which is that a prepared
> statement enforces the work_mem limit at execution time, which might be
> different from the work_mem at the time the statement was prepared.

There's a similar but not quite the same situation with the enable_*
GUCs. The executor isn't going to pick up a new value for these like
it will for work_mem, but I think portions of the same argument can be
made, i.e., someone might not like that turning off enable_seqscan
after doing PREPARE and EXECUTE once does not invalidate their plan.
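
A minimal sketch of that (table name invented; plan_cache_mode is forced
here just to make the plan caching deterministic):

    SET plan_cache_mode = force_generic_plan;
    SET enable_seqscan = on;
    PREPARE p AS SELECT count(*) FROM some_table;
    EXECUTE p;   -- generic plan is built and cached (may be a seq scan)
    SET enable_seqscan = off;
    EXECUTE p;   -- cached plan is reused; the GUC change has no effect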

> My first reaction is that it's not right because the costing for the
> plan is completely bogus with a different work_mem. It would make more
> sense to me if we either (a) enforced work_mem as it was at the time of
> planning; or (b) replanned if executed with a different work_mem
> (similar to how we replan sometimes with different parameters).

If we were to fix this then a) effectively already happens for the
enable_* GUCs, so b) would be the only logical way to fix.

> But I'm not sure whether someone might be relying on the existing
> behavior?

It looks like there was a bit of discussion on this topic about 18
years ago in [1], but it didn't seem to end with a very conclusive
outcome. I did learn that we once didn't have a method to invalidate
cached plans, so perhaps the current behaviour is a remnant of the
previous lack of infrastructure.

David

[1] https://www.postgresql.org/message-id/15168.1174410673%40sss.pgh.pa.us




Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 4:12 PM Tom Lane  wrote:
> +1, but how about explain_dr.h too?  It doesn't seem though that
> we can avoid #include "executor/instrument.h" there, so it'd be
> a little asymmetrical.  But the executor inclusion doesn't seem
> nearly as much almost-circular.

OK, here is v2. One slightly unfortunate thing about this is that we
end up with a line that is over 80 characters:

extern DestReceiver *CreateExplainSerializeDestReceiver(struct
ExplainState *es);

Aside from perhaps shortening the function name, I don't see how to avoid that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Avoid-including-explain.h-in-explain_format.h-and.patch
Description: Binary data


Re: Restrict copying of invalidated replication slots

2025-02-27 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila  wrote:
>
> On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada  
> wrote:
> >
> > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila  wrote:
> > >
> > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with
> > > this operation. So, isn't it possible that the source slot exists at
> > > the later position in ReplicationSlotCtl->replication_slots and the
> > > loop traversing slots is already ahead from the point where the newly
> > > copied slot is created?
> >
> > Good point. I think it's possible.
> >
> > > If so, the newly created slot won't be marked
> > > as invalid whereas the source slot will be marked as invalid. I agree
> > > that even in such a case, at a later point, the newly created slot
> > > will also be marked as invalid.
> >
> > The wal_status of the newly created slot would immediately become
> > 'lost' and the next checkpoint will invalidate it. Do we need to do
> > something to deal with this case?
> >
>
> + /* Check if source slot became invalidated during the copy operation */
> + if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
> + ereport(ERROR,
> + errmsg("cannot copy replication slot \"%s\"",
> +NameStr(*src_name)),
> + errdetail("The source replication slot was invalidated during the
> copy operation."));
>
> How about adding a similar test as above for MyReplicationSlot? That
> should suffice the need because we have already acquired the new slot
> by this time and invalidation should signal this process before
> marking the new slot as invalid.

IIUC, in the scenario you mentioned, the loop traversing slots has already
passed the position of the newly created slot in the
ReplicationSlotCtl->replication_slots array, so
MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread Tom Lane
David Rowley  writes:
> On Fri, 28 Feb 2025 at 07:42, Jeff Davis  wrote:
>> My first reaction is that it's not right because the costing for the
>> plan is completely bogus with a different work_mem. It would make more
>> sense to me if we either (a) enforced work_mem as it was at the time of
>> planning; or (b) replanned if executed with a different work_mem
>> (similar to how we replan sometimes with different parameters).

> If we were to fix this then a) effectively already happens for the
> enable_* GUCs, so b) would be the only logical way to fix.

Given that nobody's complained about this for twenty-plus years,
I can't get excited about adding complexity to do either thing.

regards, tom lane




Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread Jeff Davis
On Thu, 2025-02-27 at 17:04 -0500, Tom Lane wrote:
> Given that nobody's complained about this for twenty-plus years,
> I can't get excited about adding complexity to do either thing.

I had in mind some refactoring in this area, which ideally would not
add complexity. It might provide some nice benefits, but would
introduce this behavior change, which makes it slightly more than a
refactoring.

It sounds like the behavior change would be desirable or at least
neutral. I will have to try it out and see if the refactoring is a net
improvement or turns into a mess.

Regards,
Jeff Davis





Re: Improve documentation regarding custom settings, placeholders, and the administrative functions

2025-02-27 Thread David G. Johnston
On Thu, Feb 6, 2025 at 8:04 AM Zhang Mingli  wrote:

> On Feb 6, 2025 at 10:49 +0800, David G. Johnston <
> david.g.johns...@gmail.com>, wrote:
>
>
> Yes, and when it does it is doing so in lieu of an error, and thus it is
> not returning the value of the setting.  It does not mean the value taken
> on by the named parameter is NULL, which is not possible. i.e., SHOW will
> never produce NULL.
>
> Hi,
>
> OK, LGTM.
>
>
Thank you for the review.  If you would like to add yourself as the
reviewer and mark this ready to commit here is the commitfest entry.

https://commitfest.postgresql.org/patch/5548/

David J.


Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread Greg Sabino Mullane
On Thu, Feb 27, 2025 at 1:42 PM Jeff Davis  wrote:

> It would make more sense to me if we either (a) enforced work_mem as it
> was at the time of planning; or (b) replanned if executed with a different
> work_mem (similar to how we replan sometimes with different parameters).
>

Definitely (b).

But I'm not sure whether someone might be relying on the existing behavior?
>

I cannot fathom a reason why.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: moving some code out of explain.c

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> The thing that was bugging me a bit is that explain_format.h includes
> explain.h.

Yeah, I did not like that at all either -- it makes things a bit
circular, and I fear IWYU is going to make stupid recommendations
like not including explain.h in explain.c.

Did you look at avoiding that with our standard trick of writing
detail-free struct declarations?  That is, explain_format.h
would need

struct ExplainState;/* avoid including explain.h here */

and then s/ExplainState/struct ExplainState/ in all the function
declarations.  You'd still need to get List from someplace, but
it could be gotten by including primnodes.h or even pg_list.h.

regards, tom lane




Re: Statistics Import and Export commit related issue.

2025-02-27 Thread Corey Huinker
On Tue, Feb 25, 2025 at 1:31 AM jian he  wrote:

> hi.
> the thread "Statistics Import and Export" is quite hot.
> so a new thread for some minor issue i found.
>
>
> static int
> _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
> {
>
> if (strcmp(te->desc, "STATISTICS DATA") == 0)
> {
> if (!ropt->dumpStatistics)
> return 0;
> else
> res = REQ_STATS;
> }
>
> /* If it's statistics and we don't want statistics, maybe ignore it */
> if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
> return 0;
> }
> As you can see, there is some duplicate code there.
> in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS
> DATA") == 0)`` after switch (curSection).
> so pg_dump --section option does not influence statistics dump.
> attached is the proposed change.
>

+1, and I think the patch provided is good as-is.


>
>
> in dumpTableData_copy, we have:
> pg_log_info("dumping contents of table \"%s.%s\"",
> tbinfo->dobj.namespace->dobj.name, classname);
> dumpRelationStats add a pg_log_info would be a good thing.
> commit:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395
> didn't have any pg_log_info.
>

I don't have a strong opinion on this one, but I lean against doing it. I
think table data got a log message because dumping table data is an
operation that takes up a major % of the dump's runtime, whereas statistics
are just some catalog operations. There are log messages for the global
catalog operations (ex. read in all user defined functions, read in all
inheritance relationships), but not per-table catalog operations. If we end
up going to a global export of attribute statistics then I'd definitely
want a log message for that operation.


>
>
> https://www.postgresql.org/docs/devel/app-pgrestore.html
> """
> --jobs=number-of-jobs
> Run the most time-consuming steps of pg_restore — those that load
> data, create indexes, or create constraints — concurrently, using up
> to number-of-jobs concurrent sessions.
> """
> we may need to change the above sentence?
> pg_restore restore STATISTICS DATA also doing it concurrently if
> --jobs is specified.


I also have no strong opinion here, but I think the sentence is fine as is
because it specifies "most time-consuming" and a single catalog row
operation is much smaller than a COPY load, or the sequential scans needed
for index builds and constraint verification.


Re: SQL:2023 JSON simplified accessor support

2025-02-27 Thread Alexandra Wang
Hello hackers,

On Thu, Feb 27, 2025 at 9:46 AM Alexandra Wang 
wrote:

> Summary of changes:
>
> v8-0001 through v8-0005:
> Refactoring and preparatory steps for the actual implementation.
>
> v8-0006 (Implement read-only dot notation for JSONB):
> I removed the vars field (introduced in v7) from JsonbSubWorkspace
> after realizing that JsonPathVariable is not actually needed for
> dot-notation.
>
> v8-0007 (Allow wildcard member access for JSONB):
> I'm aware that the #if 0 in check_indirection() is not ideal. I
> haven’t removed it yet because I’m still reviewing other cases—beyond
> our JSONB simplified accessor use case—where this check should remain
> strict. I’ll post an additional patch to address this.
>

I made the following minor changes in v9:
- More detailed commit messages
- Additional tests
- Use "?column?" as the column name for trailing .*.

Other than that, the patches remain the same as the previous
version:
0001 - 0005: preparation steps for the actual implementation and do
not change or add new behavior.
0006: jsonb dot notation and sliced subscripting
0007: jsonb wildcard member access

Thanks,
Alex


v9-0003-Export-jsonPathFromParseResult.patch
Description: Binary data


v9-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-Not.patch
Description: Binary data


v9-0004-Extract-coerce_jsonpath_subscript.patch
Description: Binary data


v9-0001-Allow-transformation-of-only-a-sublist-of-subscri.patch
Description: Binary data


v9-0006-Implement-read-only-dot-notation-for-jsonb.patch
Description: Binary data


v9-0007-Allow-wild-card-member-access-for-jsonb.patch
Description: Binary data


v9-0005-Eanble-String-node-as-field-accessors-in-generic-.patch
Description: Binary data


Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-27 Thread Masahiko Sawada
On Tue, Feb 25, 2025 at 6:08 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 25 Feb 2025 17:14:43 -0800,
>   Masahiko Sawada  wrote:
>
> > I've attached updated patches.
>
> Thanks.
>
> I found one more missing last ".":
>
> 0002:
>
> > --- a/src/backend/commands/copyfrom.c
> > +++ b/src/backend/commands/copyfrom.c
>
> > @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
>
> > +/*
> > + * Built-in format-specific routines. One-row callbacks are defined in
> > + * copyfromparse.c
> > + */
>
> copyfromparse.c -> copyfromparse.c.
>
>
> Could you push them?
>

Pushed the 0001 patch.

Regarding the 0002 patch, I realized we stopped exposing
NextCopyFromRawFields() function:

 --- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
*pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
- char ***fields, int *nfields);

I think that this change is not relevant with the refactoring and
probably we should keep it exposed as extension might be using it.
Considering that we added pg_attribute_always_inline to the function,
does it work even if we omit pg_attribute_always_inline to its
function declaration in the copy.h file?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Statistics Import and Export

2025-02-27 Thread Melanie Plageman
On Wed, Feb 26, 2025 at 9:19 PM Corey Huinker  wrote:
>>>
>>> +1, let's shorten those queries.  The coast is probably pretty
>>> clear now if you want to go do that.
>>
>>
>> On it.

So, I started reviewing this and my original thought about shortening
the queries testing pg_restore_relation_stats() wasn't included in
your patch.

For example:

+--- error: relation is wrong type
+SELECT pg_catalog.pg_restore_relation_stats(
+'relation', 0::oid,
+'relpages', 17::integer,
+'reltuples', 400.0::real,
+'relallvisible', 4::integer);

Why do you need to specify all the stats (relpages, reltuples, etc)?
To exercise this you could just do:
select pg_catalog.pg_restore_relation_stats('relation', 0::oid);

Since I haven't been following along with this feature development, I
don't think I can get comfortable enough with all of the changes in
this test diff to commit them. I can't really say if this is the set
of tests that is representative and sufficient for this feature.

If you agree with me that the failure tests could be shorter, I'm
happy to commit that, but I don't really feel comfortable assessing
what the right set of full tests is.

- Melanie




Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Tue, Feb 18, 2025 at 1:11 PM Peter Geoghegan  wrote:
> On Tue, Feb 18, 2025 at 1:04 PM Robert Haas  wrote:
> > I think in cases like this there is a tendency to want to achieve an
> > even split of the original file, but in practice I think that tends
> > not to be difficult to achieve. These two patches combined move about
> > 1000 lines of code into separate files, which I think is actually
> > quite a good result for a refactoring of this sort. We might want to
> > split it up even more, but I don't feel like that has to be done in
> > the first set of patches or that all such patches have to come from
> > me. So I propose to do just this much for now.
> >
> > Comments?
>
> Seems like a good idea to me.

Thanks for the quick response! I have committed these patches.

I see that the Redis-FDW test is failing; it will now need to include
"commands/explain_format.h" -- unless we  something, of course.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: moving some code out of explain.c

2025-02-27 Thread Peter Geoghegan
On Thu, Feb 27, 2025 at 1:24 PM Robert Haas  wrote:
> Thanks for the quick response! I have committed these patches.

I recently did something similar myself when I moved all of the nbtree
preprocessing code into its own file.

The obvious downside is that it tends to make "git blame" much less
useful. It was definitely worth it, though.

-- 
Peter Geoghegan




Should work_mem be stable for a prepared statement?

2025-02-27 Thread Jeff Davis
As a part of this discussion:

https://www.postgresql.org/message-id/CAJVSvF6s1LgXF6KB2Cz68sHzk%2Bv%2BO_vmwEkaon%3DH8O9VcOr-tQ%40mail.gmail.com

James pointed out something interesting, which is that a prepared
statement enforces the work_mem limit at execution time, which might be
different from the work_mem at the time the statement was prepared.

For instance:

   SET work_mem='1GB';
   PREPARE foo AS ...; -- plans using 1GB limit
   SET work_mem='1MB';
   EXECUTE foo; -- enforces 1MB limit

My first reaction is that it's not right because the costing for the
plan is completely bogus with a different work_mem. It would make more
sense to me if we either (a) enforced work_mem as it was at the time of
planning; or (b) replanned if executed with a different work_mem
(similar to how we replan sometimes with different parameters).

But I'm not sure whether someone might be relying on the existing
behavior?

If we were to implement (a) or (b), we couldn't use the work_mem global
directly, we'd need to save it in the plan, and enforce using the
plan's saved value. But that might be a good change anyway. In theory
we might need to do something similar for hash_mem_multiplier, too.

Regards,
Jeff Davis





Re: SQLFunctionCache and generic plans

2025-02-27 Thread Tom Lane
Pavel Stehule  writes:
> čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov <
> a.pyha...@postgrespro.ru> napsal:
>>> Unfortunately, there is about 5% slowdown for inlined code, and for
>>> just plpgsql code too.

>> Hi. I've tried to reproduce slowdown and couldn't.

> I'll try to get profiles.

I tried to reproduce this too.  What I got on my usual development
workstation (RHEL8/gcc 8.5.0 on x86_64) was:

fx2 example: v6 patch about 2.4% slower than HEAD
fx4 example: v6 patch about 7.3% slower than HEAD

I was quite concerned after that result, but then I tried it on
another machine (macOS/clang 16.0.0 on Apple M1) and got:

fx2 example: v6 patch about 0.2% slower than HEAD
fx4 example: v6 patch about 0.7% faster than HEAD

(These are average-of-three-runs tests on --disable-cassert
builds; I trust you guys were not doing performance tests on
assert-enabled builds?)

So taken together, our results are all over the map, anywhere
from 7% speedup to 7% slowdown.  My usual rule of thumb is that
you can see up to 2% variation in this kind of microbenchmark even
when "nothing has changed", just due to random build details like
whether critical loops cross a cacheline or not.  7% is pretty
well above that threshold, but maybe it's just random build
variation anyway.

Furthermore, since neither example involves functions.c at all
(fx2 would be inlined, and fx4 isn't SQL-language), it's hard
to see how the patch would directly affect either example unless
it were adding overhead to plancache.c.  And I don't see any
changes there that would amount to meaningful overhead for the
existing use-case with a raw parse tree.

So right at the moment I'm inclined to write this off as
measurement noise.  Perhaps it'd be worth checking a few
more platforms, though.

regards, tom lane




Re: SQLFunctionCache and generic plans

2025-02-27 Thread Tom Lane
Pavel Stehule  writes:
> čt 27. 2. 2025 v 20:52 odesílatel Tom Lane  napsal:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown.  My usual rule of thumb is that

> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.

Alexander got that on the fx4 case, according to his response a
few messages ago [1].  It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".

regards, tom lane

[1] 
https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru




Re: SQLFunctionCache and generic plans

2025-02-27 Thread Tom Lane
Alexander Pyhalov  writes:
> Now sql functions plans are actually saved. The most of it is a 
> simplified version of plpgsql plan cache. Perhaps, I've missed 
> something.

A couple of thoughts about v6:

* I don't think it's okay to just summarily do this:

/* It's stale; unlink and delete */
fcinfo->flinfo->fn_extra = NULL;
MemoryContextDelete(fcache->fcontext);
fcache = NULL;

when fmgr_sql sees that the cache is stale.  If we're
doing a nested call of a recursive SQL function, this'd be
cutting the legs out from under the outer recursion level.
plpgsql goes to some lengths to do reference-counting of
function cache entries, and I think you need the same here.

* I don't like much of anything about 0004.  It's messy and it
gives up all the benefit of plan caching in some pretty-common
cases (anywhere where the user was sloppy about what data type
is being returned).  I wonder if we couldn't solve that with
more finesse by applying check_sql_fn_retval() to the query tree
after parse analysis, but before we hand it to plancache.c?
This is different from what happens now because we'd be applying
it before not after query rewrite, but I don't think that
query rewrite ever changes the targetlist results.  Another
point is that the resultTargetList output might change subtly,
but I don't think we care there either: I believe we only examine
that output for its result data types and resjunk markers.
(This is nonobvious and inadequately documented, but for sure we
couldn't try to execute that tlist --- it's never passed through
the planner.)

* One diff that caught my eye was

-   if (!ActiveSnapshotSet() &&
-   plansource->raw_parse_tree &&
-   analyze_requires_snapshot(plansource->raw_parse_tree))
+   if (!ActiveSnapshotSet() && StmtPlanRequiresRevalidation(plansource))

Because StmtPlanRequiresRevalidation uses
stmt_requires_parse_analysis, this is basically throwing away the
distinction between stmt_requires_parse_analysis and
analyze_requires_snapshot.  I do not think that's okay, for the
reasons explained in analyze_requires_snapshot.  We should expend the
additional notational overhead to keep those things separate.

* I'm also not thrilled by teaching StmtPlanRequiresRevalidation
how to do something equivalent to stmt_requires_parse_analysis for
Query trees.  The entire point of the current division of labor is
for it *not* to know that, but keep the knowledge of the properties
of different statement types in parser/analyze.c.  So it looks to me
like we need to add a function to parser/analyze.c that does this.
Not quite sure what to call it though.
querytree_requires_parse_analysis() would be a weird name, because
if it's a Query tree then it's already been through parse analysis.
Maybe querytree_requires_revalidation()?

regards, tom lane




Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-02-27 Thread Peter Geoghegan
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan  wrote:
> I need more feedback about this. I don't understand your perspective here.

Attached is a version of the patch that will apply cleanly against
HEAD. (This is from v26 of my skip scan patch, which is why I've
skipped so many version numbers compared to the last patch posted on
this thread.)

I still haven't changed anything about the implementation, since this
is just to keep CFTester happy.

-- 
Peter Geoghegan


v26-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch
Description: Binary data


Re: [Doc] Improve hostssl related descriptions and option presentation

2025-02-27 Thread David G. Johnston
On Mon, Apr 22, 2024 at 2:20 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Thoughts anyone?
>
> On Thu, Feb 1, 2024 at 3:47 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> Motivated by a recent complaint [1] I found the hostssl related material
>> in our docs quite verbose and even repetitive.  Some of that is normal
>> since we have both an overview/walk-through section as well as a reference
>> section.  But the overview in particular was self-repetitive.  Here is a
>> first pass at some improvements.  The commit message contains both the
>> intent of the patch and some thoughts/questions still being considered.
>>
>> David J.
>>
>> [1]
>> https://www.postgresql.org/message-id/170672433664.663.11895343120533141715%40wrigleys.postgresql.org
>>
>
Withdrawn as this now has conflicts from recent changes and no one seems
interested in voicing an opinion on this for or against.

David J.


Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators

2025-02-27 Thread James Hunter
On Wed, Feb 26, 2025 at 4:09 PM Jeff Davis  wrote:
>
> My idea was that we'd attach work_mem to each Path node and Plan node
> at create time. For example, in create_agg_path() it could do:
>
>   pathnode->path.node_work_mem = work_mem;
>
> And then add to copy_generic_path_info():
>
>   dest->node_work_mem = src->node_work_mem;
>
> (and similarly for other nodes; at least those that care about
> work_mem)
>
> Then, at execution time, use node->ps.ss.plan.node_work_mem instead of
> work_mem.

This is essentially what patches 2 and 3 do. Some comments:

First, there's no need to set the workmem_limit at Path time, since
it's not needed until the Plan is init-ted into a PlanState. So I set
it on the Plan but not on the Path.

Second, the logic to assign a workmem_limit to the Agg node is a bit
more complicated than in your example, because the Agg could be either
a Hash or a Sort. If it's a Hash, it gets the hash memory limit
(work_mem * hash_mem_multiplier); and if it's a Sort, it gets either 0
or work_mem.

We can adjust the logic so that it gets work_mem instead of 0, by
pushing the complexity out of the original workmem_limit assignment
and into later code blocks -- e.g., in an extension -- but we still
need to decide whether the Agg is a Hash or a Sort. This is why Patch
2 does:

switch (agg->aggstrategy)
{
case AGG_HASHED:
case AGG_MIXED:

/*
 * Because nodeAgg.c will combine all AGG_HASHED nodes into a
 * single phase, it's easier to store the hash working-memory
 * limit on the first AGG_{HASHED,MIXED} node, and set it to zero
 * for all subsequent AGG_HASHED nodes.
 */
agg->plan.workmem_limit = is_first ?
get_hash_memory_limit() / 1024 : 0;
break;
case AGG_SORTED:

/*
 * Also store the sort-output working-memory limit on the first
 * AGG_SORTED node, and set it to zero for all subsequent
 * AGG_SORTED nodes.
 *
 * We'll need working-memory to hold the "sort_out" only if this
 * isn't the last Agg node (in which case there's no one to sort
 * our output).
 */
agg->plan.workmem_limit = *is_first_sort && !is_last ?
work_mem : 0;

*is_first_sort = false;
break;
default:
break;
}

Notice that the logic also sets the limit to 0 on certain Agg nodes --
this can be avoided, at the cost of added complexity later. The added
complexity is because, for example, for Hash Aggs, all Hash tables
share the same overall workmem_limit. So any workmem_limit set on
subsequent hash Agg nodes would be ignored, which means that setting
that limit adds complexity by obscuring the underlying logic.

> Similarly, we could track the node_mem_wanted, which would be the
> estimated amount of memory the node would use if unlimited memory were
> available. I believe that's already known (or a simple calculation) at
> costing time, and we can propagate it from the Path to the Plan the
> same way.

And this is what Patch 3 does. As you point out, this is already
known, or, if not, it's a simple calculation. This is all that Patch 3
does.

> (A variant of this approach could carry the values into the PlanState
> nodes as well, and the executor would that value instead.)

That's not needed, though, and would violate existing PG conventions:
we don't copy anything from Plan to PlanState, because the assumption
is that the PlanState always has access to its corresponding Plan.
(The reason we copy from Path to Plan, I believe, is that we drop all
Paths, to save memory; because we generally have many more Paths than
Plans.)

> Extensions like yours could have a GUC like ext.query_work_mem and use
> planner_hook to modify plan after standard_planner is done, walking the
> tree and adjusting each Plan node's node_work_mem to obey
> ext.query_work_mem. Another extension might hook in at path generation
> time with set_rel_pathlist_hook or set_join_pathlist_hook to create
> paths with lower node_work_mem settings that total up to
> ext.query_work_mem.

I don't set workmem_limit at Path time, because it's not needed
there; but I do set the workmem (estimate) at Path time, exactly so
that future optimizer hooks could make use of a Path's workmem
(estimate) to decide between different Paths.
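
Concretely, such an extension could look roughly like this -- a minimal
sketch, where ext_rebalance_workmem() and the ext.query_work_mem GUC are
hypothetical, and workmem_limit is the patchset's field:

    static planner_hook_type prev_planner_hook = NULL;

    static PlannedStmt *
    ext_planner(Query *parse, const char *query_string, int cursorOptions,
                ParamListInfo boundParams)
    {
        PlannedStmt *stmt;

        /* Run the regular planner (or the next hook in the chain) first. */
        if (prev_planner_hook)
            stmt = prev_planner_hook(parse, query_string, cursorOptions,
                                     boundParams);
        else
            stmt = standard_planner(parse, query_string, cursorOptions,
                                    boundParams);

        /*
         * Hypothetical walker: scale each Plan node's workmem_limit so
         * the per-node limits sum to ext.query_work_mem.
         */
        ext_rebalance_workmem(stmt->planTree);

        return stmt;
    }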

Patch 3 sets workmem (estimate) on the Path and copies it to the Plan.
As you know, there's deliberately not a 1-1 correspondence between
Path and Plan (the way there is generally a 1-1 correspondence between
Plan and PlanState), so Patch 3 has to do some additional work to
propagate the workmem (estimate) from Path to Plan. You can see
existing examples of similar work inside file createplan.c. Creating a
Plan from a Path is not generally as simple as just copying the Path's
fields over; there are lots of special cases.
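
For the Sort case, the propagation ends up looking roughly like the
existing create_sort_plan() code -- a sketch, where workmem is the
patchset's field:

    plan = make_sort_from_pathkeys(lefttree, best_path->path.pathkeys, NULL);
    copy_generic_path_info(&plan->plan, (Path *) best_path);   /* cost, rows */
    plan->plan.workmem = best_path->path.workmem;   /* carry the estimate */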

Although Patch 3 sets workmem (estimate) on the Plan, inside
createplan.c, Patch 2 doesn't set workmem_limit inside createplan.c.
An earlier draft of the patchset *did* set it there, but because of
all the special casing in createplan.c, this ended up becoming
difficult.

Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Ranier Vilela
On Thu, Feb 27, 2025 at 2:49 PM Andres Freund 
wrote:

> Hi,
>
> On 2025-02-27 12:44:24 -0500, Andres Freund wrote:
> > > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
> > > 8. var_deref_op: Dereferencing null pointer per_buffer_data.
> >
> > That's exactly what the messages you quoted are discussing, no?
>
> Ah, no, it isn't. But I still think the coverity alert and the patch don't
> make sense, as per the below:
>
> > > diff --git a/src/backend/storage/aio/read_stream.c
> b/src/backend/storage/aio/read_stream.c
> > > index 04bdb5e6d4..18e9b4f3c4 100644
> > > --- a/src/backend/storage/aio/read_stream.c
> > > +++ b/src/backend/storage/aio/read_stream.c
> > > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void
> **per_buffer_data)
> > >
>  READ_BUFFERS_ISSUE_ADVICE : 0)))
> > > {
> > > /* Fast return. */
> > > +   if (per_buffer_data)
> > > +   *per_buffer_data =
> get_per_buffer_data(stream, oldest_buffer_index);
> > > return buffer;
> > > }
> >
> > A few lines above:
> >   Assert(stream->per_buffer_data_size == 0);
> >
> > The fast path isn't used when per buffer data is used.  Adding a check
> for
> > per_buffer_data and assigning something to it is nonsensical.
>
Perhaps.

But the combination of the fast path and the void **per_buffer_data
parameter is, in my opinion, a serious risk at runtime.

best regards,
Ranier Vilela


Re: Docs for pg_basebackup needs v17 note for incremental backup

2025-02-27 Thread David G. Johnston
On Thu, Feb 6, 2025 at 2:46 PM Daniel Gustafsson  wrote:

> I'd be happy to help getting this in, do you have a suggested wording?
>
>
Thank you.

Attached.

David J.
From fc9768fa1de17529cddc3f3ac1fba208f7500083 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Thu, 27 Feb 2025 10:58:57 -0700
Subject: [PATCH] Document pg_basebackup incremental backup requires v17
 server.

---
 doc/src/sgml/ref/pg_basebackup.sgml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c2d721208b..9659f76042 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -1005,10 +1005,11 @@ PostgreSQL documentation
 
   
pg_basebackup works with servers of the same
-   or an older major version, down to 9.1. However, WAL streaming mode (-X
-   stream) only works with server version 9.3 and later, and tar format
+   or older major version, down to 9.1. However, WAL streaming mode (-X
+   stream) only works with server version 9.3 and later, the tar format
(--format=tar) only works with server version 9.5
-   and later.
+   and later, and incremental backup (--incremental) only works
+   with server version 17 and later.
   
 
   
-- 
2.34.1



Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Tom Lane
Andres Freund  writes:
> Ah, no, it isn't. But I still think the coverity alert and the patch don't
> make sense, as per the below:

Coverity's alert makes perfect sense if you posit that Coverity
doesn't assume that this read_stream_next_buffer call will
only be applied to a stream that has per_buffer_data_size > 0.
(Even if it did understand that, I wouldn't assume that it's
smart enough to see that the fast path will never be taken.)

I wonder if it'd be a good idea to add something like

Assert(stream->distance == 1);
Assert(stream->pending_read_nblocks == 0);
Assert(stream->per_buffer_data_size == 0);
+   Assert(per_buffer_data == NULL);

in read_stream_next_buffer.  I doubt that this will shut Coverity
up, but it would help to catch caller coding errors, i.e. passing
a per_buffer_data pointer when there's no per-buffer data.

On the whole I doubt we can get rid of this warning without some
significant redesign of the read_stream API, and I don't think
it's worth the trouble.  Coverity is a tool not a requirement.
I'm content to just dismiss the warning.

regards, tom lane




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Melanie Plageman
On Thu, Feb 27, 2025 at 1:08 PM Tom Lane  wrote:
>
> I wonder if it'd be a good idea to add something like
>
> Assert(stream->distance == 1);
> Assert(stream->pending_read_nblocks == 0);
> Assert(stream->per_buffer_data_size == 0);
> +   Assert(per_buffer_data == NULL);
>
> in read_stream_next_buffer.  I doubt that this will shut Coverity
> up, but it would help to catch caller coding errors, i.e. passing
> a per_buffer_data pointer when there's no per-buffer data.

I think this is a good stopgap. I was discussing adding this assert
off-list with Thomas and he wanted to detail his more ambitious plans
for type-safety improvements in the read stream API. Less a redesign
and more like separate read_stream_next_buffer() variants for when
there is per-buffer data and when there isn't, plus by-value and
by-reference versions of the variant that carries data.
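
Purely as a sketch of the shape (these names are invented here, not
Thomas's actual proposal):

    extern Buffer read_stream_next_buffer_plain(ReadStream *stream);
    extern Buffer read_stream_next_buffer_with_data(ReadStream *stream,
                                                    void **per_buffer_data);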

I'll plan to add this assert tomorrow if that discussion doesn't materialize.

- Melanie




Re: Document How Commit Handles Aborted Transactions

2025-02-27 Thread David G. Johnston
Ahmed,

Thank you for the review.

I'm a bit confused by the reports of apply and compile errors.  I didn't
touch anything involving "varlist" and see no errors then or now in my
Meson ninja build.  Nor does the CI report any.

On Fri, Feb 14, 2025 at 2:01 PM Ahmed Ashour  wrote:

> Feedback:
> -
> - The patch was manually applied due to conflicts in `advanced.sgml` and
> `commit.sgml`.
> - Fixed invalid SGML structure by wrapping `` in
> ``.
>

If those errors were/are real this wouldn't be ready to commit.  But as
they seem to be a local environment issue on your end, and you agree with
the content, I'll keep it ready to commit.

David J.


Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 2:21 PM Álvaro Herrera  wrote:
> On 2025-Feb-27, Robert Haas wrote:
> > I see that the Redis-FDW test is failing; it will now need to include
> > "commands/explain_format.h" -- unless we  something, of course.
>
> I wonder if it was really a useful idea to move the declarations of
> those functions from explain.h to the new explain_format.h file.  It
> seems that this new file now has a bunch of declarations that have
> always been something of a public interface, together with others that
> are only of internal explain.c interest, such as
> ExplainOpenSetAsideGroup() and friends.

I'm not completely sure about what I did with the header files either,
but for a somewhat different reason than the one you mention here.

I don't see ExplainOpenSetAsideGroup() as being any different from
anything else that I put in explain_format.h -- it's something that
code might want to call if it's generating EXPLAIN output, and
otherwise not. But maybe I'm missing something -- what makes you see
it as different from the other stuff?

The thing that was bugging me a bit is that explain_format.h includes
explain.h. It would be nice if those were more independent of each
other, but I'm not sure if there's a reasonably clean way of
accomplishing that. When deciding what to do there, it's also worth
keeping in mind that there may be more opportunities to move stuff out
of explain.c, so we might not want to get too dogmatic about the
header file organization just yet. But, I'm certainly open to ideas.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add regression tests of ecpg command notice (error / warning)

2025-02-27 Thread Ryo Kanbayashi
On Tue, Feb 18, 2025 at 12:49 PM Ryo Kanbayashi
 wrote:
>
> On Thu, Feb 13, 2025 at 10:49 PM Fujii Masao
>  wrote:
> >
> >
> >
> > On 2025/02/06 8:57, Ryo Kanbayashi wrote:
> > > On Wed, Feb 5, 2025 at 9:31 PM Ryo Kanbayashi  
> > > wrote:
> > >>
> > >> Hi hackers,
> > >>
> > >> When I wrote patch of ecpg command notice bug, I recognized needs of
> > >> regression tests for ecpg command notices and I say that I write the
> > >> tests.
> >
> > Thanks for working on this!
> >
> >
> > >> Let me explain the implementation of this patch.
> > >>
> > >> What this patch does:
> > >> - adds regression tests which test ecpg command notices such as
> > >> warnings and errors
> > >> - tests the notices implemented in the ecpg.addons file
> > >>
> > >> Basic policy on implementation:
> > >> - follow the approach of the existing pg_regress command as much as
> > >> possible
> > >> - avoid methods that increase the scope of influence
> > >>
> > >> Next, I list answers to points that are likely to be raised in
> > >> advance below :)
> > >> - shell scripts and bat files are used in order to ...
> > >>  avoid a non-zero exit code from the ecpg command making the tests
> > >> fail, and
> > >>  avoid adding C code for executing a binary in a cross-platform way
> > >> - python code is used because, without it, I couldn't write a
> > >> meson.build that appropriately describes the dependencies on the
> > >> materials used by the tests. Please help me...
> > >> - as you said, running this kind of test through pg_regress entails a
> > >> needless PG server process, but pg_regress doesn't execute tests
> > >> without one, and changing that would require a modification with a
> > >> not-small scope of influence
> >
> > Wouldn't it be simpler to use the existing TAP test mechanism,
> > as shown in the attached patch? Please note that this patch is very WIP,
> > so there would be many places that need further implementation and 
> > refinement.
>
> Fujii San,
>
> Thank you for the review and for pointing out a better implementation.
> I will rewrite my patch based on your reference implementation :)

Fujii San and other hackers,

I have rewritten my patch in TAP test style :)
The build files are also updated.

Commitfest entry:
https://commitfest.postgresql.org/patch/5543/

---
Great regards,
Ryo Kanbayashi
NTT Open Source Software Center


ecpg-notice-regress-patch-tap-ver.patch
Description: Binary data


Re: moving some code out of explain.c

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 27, 2025 at 3:05 PM Tom Lane  wrote:
>> Did you look at avoiding that with our standard trick of writing
>> detail-free struct declarations?

> OK, here's a patch that does that. Thoughts?

+1, but how about explain_dr.h too?  It doesn't seem though that
we can avoid #include "executor/instrument.h" there, so it'd be
a little asymmetrical.  But the executor inclusion doesn't seem
nearly as almost-circular.

regards, tom lane




Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 3:05 PM Tom Lane  wrote:
> Robert Haas  writes:
> > The thing that was bugging me a bit is that explain_format.h includes
> > explain.h.
>
> Yeah, I did not like that at all either -- it makes things a bit
> circular, and I fear IWYU is going to make stupid recommendations
> like not including explain.h in explain.c.
>
> Did you look at avoiding that with our standard trick of writing
> detail-free struct declarations?  That is, explain_format.h
> would need
>
> struct ExplainState;/* avoid including explain.h here */
>
> and then s/ExplainState/struct ExplainState/ in all the function
> declarations.  You'd still need to get List from someplace, but
> it could be gotten by including primnodes.h or even pg_list.h.

OK, here's a patch that does that. Thoughts?
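
For reference, applied to one of the moved declarations the trick looks
like this (using ExplainPropertyText as the example):

    struct ExplainState;            /* avoid including explain.h here */

    extern void ExplainPropertyText(const char *qlabel, const char *value,
                                    struct ExplainState *es);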

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Avoid-including-commands-explain.h-in-commands-ex.patch
Description: Binary data


Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 4:08 PM Tom Lane  wrote:
> I was down on it to start with, on the grounds that it wasn't adding
> enough ease-of-use to justify even a relatively small amount of added
> code.  I'm not sure that the dump-reload failure angle is much of a
> concern in reality, but I would have voted not to commit before and
> I still do.

OK, fair. Anyone want to argue the other side?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: moving some code out of explain.c

2025-02-27 Thread Álvaro Herrera
On 2025-Feb-27, Robert Haas wrote:

> I see that the Redis-FDW test is failing; it will now need to include
> "commands/explain_format.h" -- unless we  something, of course.

I wonder if it was really a useful idea to move the declarations of
those functions from explain.h to the new explain_format.h file.  It
seems that this new file now has a bunch of declarations that have
always been something of a public interface, together with others that
are only of internal explain.c interest, such as
ExplainOpenSetAsideGroup() and friends.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/



