Re: plperl version on the meson setup summary screen

2024-11-28 Thread Zharkov Roman

Hello,

On 2024-11-27 21:50, Andrew Dunstan wrote:

it should give the same answer


Sometimes "version" and "api_versionstring" are different. Here are two 
simple examples from my windows system and from a linux system of one of 
my colleagues:


C:\Temp>perl -MConfig -e "print \"$Config{api_versionstring}\n\"; print 
\"$Config{version}\n\""

5.32.0
5.32.1
C:\Temp>perl -v
This is perl 5, version 32, subversion 1 (v5.32.1) built for 
MSWin32-x64-multi-thread


perl -MConfig -e 'print "$Config{api_versionstring}\n"; print 
"$Config{version}\n"'

5.38.0
5.38.2
perl -v
This is perl 5, version 38, subversion 2 (v5.38.2)


--
Best regards, Roman Zharkov.




Re: Converting SetOp to read its two inputs separately

2024-11-28 Thread David Rowley
On Wed, 20 Nov 2024 at 15:09, Tom Lane  wrote:
> Once I'd wrapped my head around how things are done now (which the
> comments in prepunion.c were remarkably unhelpful about), I saw that
> most of the problem for #2 just requires re-ordering things that
> generate_nonunion_paths was already doing.  As for #1, I have a modest
> proposal: we should get rid of set_operation_ordered_results_useful
> entirely.  It's not the code that actually does useful work, and
> keeping it in sync with the code that does do useful work is hard and
> unnecessary.
>
> 0001-0003 below are the same as before (so the slot-munging TODO is
> still there).  0004 fixes a rather basic bug for nested set-operations
> and gets rid of set_operation_ordered_results_useful along the way.
> Then 0005 does your step 2.

Here's a quick review of all 5 patches together.

1. In setop_load_group(), the primary concern with the result of
setop_compare_slots() seems to be if the tuples match or not. If
you're not too concerned with keeping the Assert checking for
mis-sorts, then it could be much more efficient for many cases to
check the final column first. ExecBuildGroupingEqual() makes use of
this knowledge already, so it's nothing new. Maybe you could just use
an ExprState made by ExecBuildGroupingEqual() instead of
setop_compare_slots() for this case. That would allow JIT to work.

2. (related to #1) I'm unsure if it's also worth deforming all the
needed attributes in one go rather than calling slot_getattr() each
time, which will result in incrementally deforming the tuple.
ExecBuildGroupingEqual() also does that.
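
For illustration, a rough sketch of what #1 and #2 could look like together.
This is not the patch's code: column_datums_equal() is a hypothetical helper,
and it assumes the grouping columns are the first numCols attributes of each
slot.

static bool
tuples_match(TupleTableSlot *s1, TupleTableSlot *s2, int numCols)
{
	/* deform all needed attributes in one go (#2) ... */
	slot_getsomeattrs(s1, numCols);
	slot_getsomeattrs(s2, numCols);

	/* ... then compare from the last column, most likely to differ (#1) */
	for (int col = numCols - 1; col >= 0; col--)
	{
		if (!column_datums_equal(s1, s2, col))	/* hypothetical helper */
			return false;
	}
	return true;
}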

3.
/* XXX hack: force the tuple into minimal form */
/* XXX should not have to do this */
ExecForceStoreHeapTuple(ExecCopySlotHeapTuple(innerslot),
setopstate->ps.ps_ResultTupleSlot, true);
innerslot = setopstate->ps.ps_ResultTupleSlot;

nodeAgg.c seems to do this by using prepare_hash_slot() which deforms
the heap tuple and sets the Datums verbatim rather than making copies
of any byref ones.

4. Since the EXPLAIN output is going to change for SetOps, what are
your thoughts on being more explicit about the setop strategy being
used?  Showing "SetOp Except" isn't that informative.  You've got to
look at the non-text format to know it's using the merge method.  How
about "MergeSetOp Except"?

5. You might not be too worried, but in SetOpState, if you move
need_init up to just below setop_done, you'll close a 7-byte hole in that
struct.
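
To illustrate #5 (this is a made-up layout, not the actual SetOpState
definition): on a typical 64-bit platform a lone bool followed by an
8-byte-aligned member leaves 7 bytes of padding, which grouping the bools
together avoids.

typedef struct ExampleState
{
	/* ... */
	bool		setop_done;		/* 1 byte */
	bool		need_init;		/* placed adjacent: shares the same padding */
	/* 6 bytes of padding here, instead of 7 after each separated bool */
	int64		numOutput;		/* 8-byte aligned member */
} ExampleState;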

I'm also happy if you just push the 0004 patch. Are you planning to
backpatch that? If I'd realised you were proposing to remove that
function, I'd not have fixed the typo Richard mentioned.

David




RE: Rework subscription-related code for psql and pg_dump

2024-11-28 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

Thanks for working on this.
It was not good of me to follow the existing code without confirmation :-(.

I grepped the files in bin/ and could not find any lines that include 
catalog/pg_xxx.h files.
(One exception is pg_control.h. It is not a catalog header but has the same 
prefix.)
The patch basically LGTM.

One comment...
In describeSubscriptions(), only the streaming option is represented as {off, 
on, parallel},
whereas the twophase option is {d, p, e}.
I feel it is a bit strange, so we could fix it to show {disabled, pending, 
enabled} by the same approach.
Thoughts?
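
A rough sketch of the kind of mapping meant here (illustrative only, not the
query psql actually runs):

SELECT subname,
       CASE subtwophasestate
            WHEN 'd' THEN 'disabled'
            WHEN 'p' THEN 'pending'
            WHEN 'e' THEN 'enabled'
       END AS "Two-phase commit"
FROM pg_catalog.pg_subscription;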

Best regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Re: Re: UUID v7

2024-11-28 Thread Masahiko Sawada
On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
 wrote:
>
> I mean adding to the patch not benchmark results but functions, so that 
> everyone can run the comparison themselves on their own equipment. The 
> comparison with UUIDv4 is not very interesting, as the choice is usually 
> between UUIDv7 and an integer key. And I have described many use cases, 
> while your benchmark covers only one, the simplest.

I don't think we should add such benchmark functions, at least not to this
patch. If there already is a well-established workload using UUIDv7,
UUIDv4, etc., users can use pgbench with custom scripts, or it might
make sense to add it to pgbench as a built-in workload, which however
should be a separate patch. Having said that, I think users should use
benchmarks that fit their workloads, and it would not be easy to
establish workloads that are reasonable for most systems.
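
For example, a custom pgbench script along these lines could serve as a
per-workload benchmark; the table name and workload are illustrative
assumptions, and uuidv7() is the function added by the patch under discussion:

-- uuid_insert.sql, assuming: CREATE TABLE uuid_tbl (id uuid PRIMARY KEY);
-- run with e.g.: pgbench -n -c 8 -T 60 -f uuid_insert.sql
INSERT INTO uuid_tbl (id) VALUES (uuidv7());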

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: relfilenode statistics

2024-11-28 Thread Kirill Reshke
On Tue, 5 Nov 2024 at 11:06, Bertrand Drouvot
 wrote:
>
>
> Does it sound ok to you to move forward with the above principle? (I'm +1 on it).
>

Hi! I looked through this thread.
Looks like we are still awaiting a patch that stores more counters
(n_dead_tup, etc.) in the relfilenode stats. So, I assume this should
be moved to the next CF.

I also have a very stupid question:
If we don't have the relation OID when writing buffers out, can we
just store an OID-to-buffertag mapping somewhere and use it?
I suspect that this is a horrible idea, but what's the exact reason?
Is it that we would break too many abstraction layers for such a minor
matter?

-- 
Best regards,
Kirill Reshke




Re: Re: UUID v7

2024-11-28 Thread Sergey Prokhorenko
I mean adding to the patch not benchmark results but functions, so that 
everyone can run the comparison themselves on their own equipment. The comparison with UUIDv4 
is not very interesting, as the choice is usually between UUIDv7 and an integer 
key. And I have described many use cases, while your benchmark covers only 
one, the simplest.


Sent from Yahoo Mail on iPhone


On Thursday, November 28, 2024, at 11:09 AM, Andrey M. Borodin wrote:



> On 28 Nov 2024, at 04:07, Sergey Prokhorenko  
> wrote:
> 
> It would be useful to add a standard comparative benchmark with several 
> parameters and use cases to the patch, so that IT departments can compare 
> UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and 
> conditions.
> 
> I know for a fact that IT departments make such benchmarks of low quality. 
> They usually measure the generation rate, which is meaningless because it is 
> usually excessive. It makes sense to measure the rate of single-threaded and 
> multi-threaded insertion of a large number of records (with and without 
> partitioning), as well as the rate of execution of queries to join big 
> tables, to update or delete a large number of records. It is important to 
> measure memory usage, processor load, etc.

Publishing benchmarks seems to be far beyond what our documentation goes for. 
Mostly because benchmarks are tricky. You can prove anything with benchmarks.

Everyone is welcome to publish benchmark results in their blogs, but IMO docs 
have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39 applied on 
my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from 
generate_series(1,3e7);
INSERT 0 3000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from 
generate_series(1,3e7);
INSERT 0 3000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)


Best regards, Andrey Borodin.




Re: Re: Re: UUID v7

2024-11-28 Thread Kirill Reshke
On Fri, 29 Nov 2024, 09:14 Sergey Prokhorenko, <
sergeyprokhore...@yahoo.com.au> wrote:

> I mean to add not benchmark results to the patch, but functions so that
> everyone can compare themselves on their equipment. The comparison with
> UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and
> an integer key. And I have described many use cases, and in your benchmark
> there is only one, the simplest.
>
>
> Sent from Yahoo Mail on iPhone
> 
>
> On Thursday, November 28, 2024, at 11:09 AM, Andrey M. Borodin <
> x4...@yandex-team.ru> wrote:
>
>
>
> > On 28 Nov 2024, at 04:07, Sergey Prokhorenko <
> sergeyprokhore...@yahoo.com.au> wrote:
> >
> > It would be useful to add a standard comparative benchmark with several
> parameters and use cases to the patch, so that IT departments can compare
> UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and
> conditions.
> >
> > I know for a fact that IT departments make such benchmarks of low
> quality. They usually measure the generation rate, which is meaningless
> because it is usually excessive. It makes sense to measure the rate of
> single-threaded and multi-threaded insertion of a large number of records
> (with and without partitioning), as well as the rate of execution of
> queries to join big tables, to update or delete a large number of records.
> It is important to measure memory usage, processor load, etc.
>
>
> Publishing benchmarks seems to be far beyond what our documentation go
> for. Mostly, because benchmarks are tricky. You can prove anything with
> benchmarks.
>
> Everyone is welcome to publish benchmark results in their blogs, but IMO
> docs have a very different job to do.
>
> I’ll just publish one benchmark in this mailing list. With patch v39
> applied on my MB Air M2 I get:
>
> postgres=# create table table_for_uuidv4(id uuid primary key);
> CREATE TABLE
> Time: 9.479 ms
> postgres=# insert into table_for_uuidv4 select uuidv4() from
> generate_series(1,3e7);
> INSERT 0 3000
> Time: 2003918.770 ms (33:23.919)
> postgres=# create table table_for_uuidv7(id uuid primary key);
> CREATE TABLE
> Time: 3.930 ms
> postgres=# insert into table_for_uuidv7 select uuidv7() from
> generate_series(1,3e7);
> INSERT 0 3000
> Time: 337001.315 ms (05:37.001)
>
> Almost an order of magnitude better :)
>
>
> Best regards, Andrey Borodin.
>
Hi!
Do not top-post on this list.


Re: [PATCH] Add sortsupport for range types and btree_gist

2024-11-28 Thread Michael Paquier
On Thu, Nov 28, 2024 at 11:32:59PM +0500, Andrey M. Borodin wrote:
> Michael, we have 30 tests with checks that need injection
> points. But these 30 tests also test functionality that needs to be
> tested even in builds without injection points.
> Do we have to extract the checks with injection points into a separate
> regression test, so that we can exclude that test in builds without
> injection points?

I've looked at what the patch is doing with injection points, and
that's incorrect.

+CREATE EXTENSION injection_points;
+
+SELECT injection_points_attach('gist-sorted-build', 'notice');

This attaches the points globally, meaning that this is going to create
some noise for any tests running in parallel that take the same code path
as the one where the point is attached.  To make the tests able to run
safely in a concurrent manner, please use injection_points_set_local().  As
a bonus, your points would be automatically detached when the session
exits.
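
A minimal sketch of the suggested pattern (the point name is the one used by
the patch; the rest is illustrative):

CREATE EXTENSION injection_points;
SELECT injection_points_set_local();   -- subsequent attachments are local to this session
SELECT injection_points_attach('gist-sorted-build', 'notice');
-- ... run the statements exercising the sorted-build path ...
-- the point is detached automatically when this session exits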

Any module that requires the module injection_points to be installed
can do a few things, none of which are done in this patch:
- Add EXTRA_INSTALL=src/test/modules/injection_points.
- You could make a test able to run installcheck, but you should use
an extra query that checks if the module is available for installation
by querying pg_available_extensions, and use two possible output files:
one for the case where the module is *not* installed and one for the
case where the module is installed.  A simpler way would be to block
installcheck, or add SQL tests in src/test/modules/injection_points.
Neither option seems well adapted here, as they impact the
portability of the existing tests.

As a whole, I'm very dubious about the need for injection points at
all here.  The claimed benefit of the sortsupport added by this patch is
smaller index sizes, but the tests don't really check that: they just
make sure that the sortsupport routine paths are taken.  What this
should test is not the path taken, but how the new code affects the
index data generated.  Perhaps pageinspect could be something to use to
show the difference in contents, not sure though.
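
For instance, a check along these lines would look at the result rather than
the code path; the table, data volume and use of pg_relation_size() here are
illustrative assumptions, not what the patch does:

CREATE TABLE ranges (r int4range);
INSERT INTO ranges SELECT int4range(g, g + 10) FROM generate_series(1, 100000) g;
CREATE INDEX ranges_gist ON ranges USING gist (r);
-- compare with and without the sortsupport patch applied
SELECT pg_relation_size('ranges_gist') AS index_bytes;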

The number of tests added to contrib/btree_gist/Makefile is not
acceptable for a patch of this size, leading to a lot of bloat.  And
that's harder to maintain in the long term.
--
Michael




Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alena Rybakina

On 29.11.2024 03:04, Alexander Korotkov wrote:

On Thu, Nov 28, 2024 at 9:33 PM Alena Rybakina
 wrote:

On 28.11.2024 22:28, Ranier Vilela wrote:

On Thu, Nov 28, 2024 at 16:03, Alena Rybakina 
 wrote:

Hi! Thank you for the case.

On 28.11.2024 21:00, Alexander Lakhin wrote:

Hello Alexander,

21.11.2024 09:34, Alexander Korotkov wrote:

I'm going to push this if no objections.

Please look at the following query, which triggers an error after
ae4569161:
SET random_page_cost = 1;
CREATE TABLE tbl(u UUID);
CREATE INDEX idx ON tbl USING HASH (u);
SELECT COUNT(*) FROM tbl WHERE u = '' OR
   u = '';

ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625



I found out what the problem is: the index scan method was not generated. We
need to check for this during the OR-clause to SAOP transformation.

There is a patch to fix this problem.

Hi.
Thanks for the quick fix.

But I wonder if it is not possible to avoid all of this if the index is useless?
Maybe move your fix to the beginning of the function?

diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index d827fc9f4d..5ea0b27d01 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3248,6 +3248,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
   Assert(IsA(orclause, BoolExpr));
   Assert(orclause->boolop == OR_EXPR);

+ /* Ignore index if it doesn't support index scans */
+ if(!index->amsearcharray)
+ return NULL;
+

Agreed. I have updated the patch.

   /*
   * Try to convert a list of OR-clauses to a single SAOP expression. Each
   * OR entry must be in the form: (indexkey operator constant) or (constant

The test bug:
EXPLAIN SELECT COUNT(*) FROM tbl WHERE u = '' 
OR u = '';
 QUERY PLAN
--
  Aggregate  (cost=12.46..12.47 rows=1 width=8)
->  Bitmap Heap Scan on tbl  (cost=2.14..12.41 rows=18 width=0)
  Recheck Cond: ((u = '----'::uuid) OR 
(u = '----'::uuid))
  ->  BitmapOr  (cost=2.14..2.14 rows=18 width=0)
->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 width=0)
  Index Cond: (u = 
'----'::uuid)
->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 width=0)
  Index Cond: (u = 
'----'::uuid)
(8 rows)

I slightly revised the fix and added a similar check to
group_similar_or_args().  Could you please review that before
commit?


I agree with the changes. Thank you!

--
Regards,
Alena Rybakina
Postgres Professional





Re: Added prosupport function for estimating numeric generate_series rows

2024-11-28 Thread songjinzhou
> This sort of stuff is best addressed by running the code through
> pgindent, rather than fixing it manually.  Usually we don't insist
> on submitters getting it right; the committer should pgindent it.


 
Hello, thank you and David Rowley for your comments. 


I have used pgindent to adjust the code format, and added comments and the missing 
regression test cases. Here is the v3 patch. 


I look forward to your reply. Thank you very much!


Regards, Song Jinzhou

v3_0001-Added-prosupport-function-for-estimating-numeric-gen.patch
Description: Binary data


Re: Use streaming read API in pgstattuple.

2024-11-28 Thread Nazir Bilal Yavuz
Hi,

> Hello! Thank you for taking a peek. Your review comments have been
> addressed. Since my changes were wrong, I honestly don't know why this
> worked in version 1. By a miracle.
>
> As for CI, i rechecked v1:
>
> ```
> db2=#
> select * from pgstathashindex('test_hashidx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages
> | live_items | dead_items | free_percent
> -+--++--+--+++--
>4 |4 |  0 |1 |0
> |  0 |  0 |  100
> (1 row)
>
> db2=#
> select * from pgstattuple('test_hashidx');
> ERROR:  could not read blocks 6..16 in file "base/16454/16473": read
> only 0 of 90112 bytes
> ```
>
> In v2 this behaves correctly.

Yes, I confirm that it works.

> Should we add `pgstattuple(...)` after `pgstat*type*index(..)`
> everywhere in pgstattuple regression tests?

Not everywhere but at least one pgstattuple(...) call for each index type.

There is one behavior change; it may not be important, but I think it
is worth mentioning:

-for (; blkno < nblocks; blkno++)
+p.last_exclusive = nblocks;
+
+while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
 {
 CHECK_FOR_INTERRUPTS();

-pagefn(&stat, rel, blkno, bstrategy);
+pagefn(&stat, rel, buf);
 }
+
+Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);

With this change we assume that if the stream returns an invalid buffer,
the stream must be finished. We don't check for the case where the stream
didn't finish but somehow returned an invalid buffer. In the upstream
version, if we read an invalid buffer then the postgres server will fail.
But in the patched version, the server will continue to run because it
will think that the stream has reached its end. This could happen for
other streaming read users too; for example, at least the
read_stream_next_buffer() calls in the collect_corrupt_items()
function face the same issue.


--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread jian he
On Fri, Nov 29, 2024 at 2:36 PM Andrei Lepikhov  wrote:
>
> > I forgot to do a local commit before sending v8. Fixed in the attached v9.
> 1. Thread reference in the patch comment doesn't work.
fixed.

I found that a unique expression index can also be used for GROUP BY
column removal,
so I implemented it and also added two tests for it.

now remove_useless_groupby_columns looks like:
  if (index->indexprs == NIL)
  {
--handle unique index
  }
  else
  {
--handle unique expression index
  }

what do you think?
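
For illustration, a hypothetical case of the kind this targets (table and
index are made up; whether the patch reduces exactly this GROUP BY is what
the added tests should demonstrate):

CREATE TABLE t (a int NOT NULL, b int, c int);
CREATE UNIQUE INDEX t_a_expr_uidx ON t ((a * 2));
-- (a * 2) is unique and cannot be NULL, so each group has one row and
-- b and c are functionally dependent on it; the GROUP BY could in
-- principle be reduced to just (a * 2)
SELECT (a * 2), b, c FROM t GROUP BY (a * 2), b, c;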
From ca9302a8c98f2ad4a8f9837ec42ec2da6588e25d Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 29 Nov 2024 14:54:23 +0800
Subject: [PATCH v10 1/1] remove useless group by columns via unique-not-null

In the `remove_useless_groupby_columns` function,
use a unique index and a not-null constraint to eliminate unnecessary columns.
Here, the primary key is treated as a "unique index and not-null".
If there are multiple candidates,
use the unique index with the fewest columns to remove the other columns
in groupby expression.

discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0@Spark
---
 src/backend/optimizer/plan/initsplan.c   | 258 ++-
 src/backend/optimizer/plan/planmain.c|   3 +
 src/backend/optimizer/plan/planner.c | 163 --
 src/backend/optimizer/util/plancat.c |   1 +
 src/include/nodes/pathnodes.h|   2 +
 src/include/optimizer/planmain.h |   1 +
 src/test/regress/expected/aggregates.out |  79 +++
 src/test/regress/sql/aggregates.sql  |  39 
 8 files changed, 382 insertions(+), 164 deletions(-)

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 903c397d40..bd82745f1e 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -1,7 +1,7 @@
 /*-
  *
  * initsplan.c
- *	  Target list, qualification, joininfo initialization routines
+ *	  Target list, group by, qualification, joininfo initialization routines
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -387,6 +387,262 @@ add_vars_to_attr_needed(PlannerInfo *root, List *vars,
 }
 
 
+/*
+ *
+ *	  GROUP BY
+ *
+ */
+
+/*
+ * remove_useless_groupby_columns
+ *		Remove any columns in the GROUP BY clause that are redundant due to
+ *		being functionally dependent on other GROUP BY columns.
+ *
+ * Since some other DBMSes do not allow references to ungrouped columns, it's
+ * not unusual to find all columns listed in GROUP BY even though listing the
+ * primary-key columns, or columns of a unique constraint would be sufficient.
+ * Deleting such excess columns avoids redundant sorting or hashing work, so
+ * it's worth doing.
+ *
+ * Relcache invalidations will ensure that cached plans become invalidated
+ * when the underlying supporting indexes are dropped or if a column's NOT
+ * NULL attribute is removed.
+ */
+void
+remove_useless_groupby_columns(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	Bitmapset **groupbyattnos;
+	Bitmapset **surplusvars;
+	ListCell   *lc;
+	int			relid;
+
+	/* No chance to do anything if there are less than two GROUP BY items */
+	if (list_length(root->processed_groupClause) < 2)
+		return;
+
+	/* Don't fiddle with the GROUP BY clause if the query has grouping sets */
+	if (parse->groupingSets)
+		return;
+
+	/*
+	 * Scan the GROUP BY clause to find GROUP BY items that are simple Vars.
+	 * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k
+	 * that are GROUP BY items.
+	 */
+	groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
+		   (list_length(parse->rtable) + 1));
+	foreach(lc, root->processed_groupClause)
+	{
+		SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+		TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
+		Var		   *var = (Var *) tle->expr;
+
+		/*
+		 * Ignore non-Vars and Vars from other query levels.
+		 *
+		 * XXX in principle, stable expressions containing Vars could also be
+		 * removed, if all the Vars are functionally dependent on other GROUP
+		 * BY items.  But it's not clear that such cases occur often enough to
+		 * be worth troubling over.
+		 */
+		if (!IsA(var, Var) ||
+			var->varlevelsup > 0)
+			continue;
+
+		/* OK, remember we have this Var */
+		relid = var->varno;
+		Assert(relid <= list_length(parse->rtable));
+		groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
+			  var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+
+	/*
+	 * Consider each relation and see if it is possible to remove some of its
+	 * Vars from GROUP BY.  For simplicity and speed, we do the actual removal
+	 * in a separate p

Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alexander Korotkov
On Fri, Nov 29, 2024 at 7:51 AM Alena Rybakina
 wrote:
>
> On 29.11.2024 03:04, Alexander Korotkov wrote:
> > On Thu, Nov 28, 2024 at 9:33 PM Alena Rybakina
> >  wrote:
> >> On 28.11.2024 22:28, Ranier Vilela wrote:
> >>
> >> Em qui., 28 de nov. de 2024 às 16:03, Alena Rybakina 
> >>  escreveu:
> >>> Hi! Thank you for the case.
> >>>
> >>> On 28.11.2024 21:00, Alexander Lakhin wrote:
>  Hello Alexander,
> 
>  21.11.2024 09:34, Alexander Korotkov wrote:
> > I'm going to push this if no objections.
>  Please look at the following query, which triggers an error after
>  ae4569161:
>  SET random_page_cost = 1;
>  CREATE TABLE tbl(u UUID);
>  CREATE INDEX idx ON tbl USING HASH (u);
>  SELECT COUNT(*) FROM tbl WHERE u = '' OR
> u = '';
> 
>  ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
>  LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625
> 
> 
> >>> I found out what the problem is index scan method was not generated. We
> >>> need to check this during OR clauses for SAOP transformation.
> >>>
> >>> There is a patch to fix this problem.
> >> Hi.
> >> Thanks for the quick fix.
> >>
> >> But I wonder if it is not possible to avoid all if the index is useless?
> >> Maybe moving your fix to the beginning of the function?
> >>
> >> diff --git a/src/backend/optimizer/path/indxpath.c 
> >> b/src/backend/optimizer/path/indxpath.c
> >> index d827fc9f4d..5ea0b27d01 100644
> >> --- a/src/backend/optimizer/path/indxpath.c
> >> +++ b/src/backend/optimizer/path/indxpath.c
> >> @@ -3248,6 +3248,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
> >>Assert(IsA(orclause, BoolExpr));
> >>Assert(orclause->boolop == OR_EXPR);
> >>
> >> + /* Ignore index if it doesn't support index scans */
> >> + if(!index->amsearcharray)
> >> + return NULL;
> >> +
> >>
> >> Agree. I have updated the patch
> >>
> >>/*
> >>* Try to convert a list of OR-clauses to a single SAOP expression. Each
> >>* OR entry must be in the form: (indexkey operator constant) or 
> >> (constant
> >>
> >> The test bug:
> >> EXPLAIN SELECT COUNT(*) FROM tbl WHERE u = 
> >> '' OR u = 
> >> '';
> >>  QUERY PLAN
> >> --
> >>   Aggregate  (cost=12.46..12.47 rows=1 width=8)
> >> ->  Bitmap Heap Scan on tbl  (cost=2.14..12.41 rows=18 width=0)
> >>   Recheck Cond: ((u = 
> >> '----'::uuid) OR (u = 
> >> '----'::uuid))
> >>   ->  BitmapOr  (cost=2.14..2.14 rows=18 width=0)
> >> ->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 
> >> width=0)
> >>   Index Cond: (u = 
> >> '----'::uuid)
> >> ->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 
> >> width=0)
> >>   Index Cond: (u = 
> >> '----'::uuid)
> >> (8 rows)
> > I slightly revised the fix and added similar check to
> > group_similar_or_args().  Could you, please, review that before
> > commit?
> >
> I agree with changes. Thank you!

Andrei, Alena, thank you for the feedback. Pushed!

--
Regards,
Alexander Korotkov
Supabase




Re: Strange assertion in procarray.c

2024-11-28 Thread Michael Paquier
On Thu, Nov 28, 2024 at 09:04:36PM +0100, Michail Nikolaev wrote:
> I discovered that a similar issue was previously addressed by Heikki in
> commit [0], where installcheck was disabled for injection point tests.
> However, in the meson build configuration, this was only applied to
> regression tests - the isolation and TAP tests are still running during
> installcheck.

I fail to see how this is related?  The original issue was that this
was impossible to run safely concurrently, but now we have the
facilities to do so.  There are a few cases where using a wait
point has limits, for example outside a transaction context for some
of the processes, but that has not really been an issue up to now.

> As demonstrated in the previously shared reproducer [1], even *local*
> injection points can cause backend crashes through unexpected side effects.
> Therefore, I propose extending the installcheck disable to cover both TAP
> and isolation tests as well.
> 
> I've attached a patch implementing these changes.

@@ -426,6 +427,7 @@ InvalidateCatalogSnapshot(void)
 pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 CatalogSnapshot = NULL;
 SnapshotResetXmin();
+INJECTION_POINT("invalidate_catalog_snapshot_end");
[...]
+step s4_attach_locally { SELECT
injection_points_attach('invalidate_catalog_snapshot_end', 'wait'); }
[...]
#4  0x563f09d22f39 in ExceptionalCondition
(conditionName=0x563f0a072d00 
"TransactionIdIsValid(proc->xid)", fileName=0x563f0a072a4a
"procarray.c", lineNumber=677) at assert.c:66
#5  0x563f096f0684 in ProcArrayEndTransaction
(proc=0x7fbd4d083ac0, latestXid=750) at procarray.c:677
#6  0x563f088c54f3 in AbortTransaction () at xact.c:2946
#7  0x563f088c758d in AbortCurrentTransactionInternal () at
#xact.c:3531
#8  0x563f088c72a6 in AbortCurrentTransaction () at xact.c:3449
#9  0x563f0979c0f7 in PostgresMain (dbname=0x563f0f128100
"isolation_regression", username=0x563f0f1280e8 "ioltas") at
postgres.c:4524
#10 0x563f0978a5e5 in BackendMain (startup_data=0x7ffdcf50cfa8 "",
startup_data_len=4) at backend_startup.c:107
#11 0x563f094d8613 in postmaster_child_launch
(child_type=B_BACKEND, child_slot=1,
startup_data=0x7ffdcf50cfa8 "", startup_data_len=4,
client_sock=0x7ffdcf50cfe0) 

Isn't that pointing to an actual bug with serializable transactions?

What you are describing here is that there is a race condition where it
is possible to trigger an assertion failure when finishing a session
while another one is waiting on an invalidation, if there is a
read/write dependency error in the mix.  Disabling the test hides the
problem; it does not fix it.  And we should do the latter, not the former.
--
Michael




Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Andrei Lepikhov

On 11/29/24 07:04, Alexander Korotkov wrote:

On Thu, Nov 28, 2024 at 9:33 PM Alena Rybakina
I slightly revised the fix and added similar check to
group_similar_or_args().  Could you, please, review that before
commit?

LGTM.
As I see it, we didn't pay attention to this option from the beginning. 
Thanks for fixing it!


--
regards, Andrei Lepikhov




Re: crash with synchronized_standby_slots

2024-11-28 Thread Amit Kapila
On Thu, Nov 28, 2024 at 5:54 PM Alvaro Herrera  wrote:
>
> Gabriele just reported a crash when changing synchronized_standby_slots
> under SIGHUP and logging collector working.  The problem apparently is
> that validate_sync_standby_slots is run for the GUC check routine, and
> it requires acquiring an LWLock; but syslogger doesn't have a PGPROC so
> that doesn't work.
>
> Apparently we already have a hack in place against the same problem in
> postmaster (and elsewhere?), by testing that ReplicationSlotCtl is not
> null.  But that hack seems to be incomplete, as the crash here attests.
>

I tried it on my Windows machine and noticed that ReplicationSlotCtl
is NULL for the syslogger, so the problem doesn't occur there. The reason is
that we don't attach to shared memory in the syslogger, so ideally
ReplicationSlotCtl should be NULL. On Linux systems, however, we inherit
everything through fork, and later, for processes that don't want to
attach to shared memory, we call PGSharedMemoryDetach() from
postmaster_child_launch(). PGSharedMemoryDetach() won't reset the
pointer stored in ReplicationSlotCtl, so it would point to invalid
memory.

> To reproduce, simply start with no synchronized_standby_slots setting
> and logging_collector=on, then change the value in postgresql.conf and
> reload.
>
> One option to fix this might be as attached.  The point here is that
> processes without a PGPROC don't need or care to have a correct setting,
> so we just skip verifying it on those.  AFAICS this is more general than
> the test for ReplicationSlotCtl, so I just removed that one.
>

Yeah, I can't think of a better fix for this.

-- 
With Regards,
Amit Kapila.




Rework subscription-related code for psql and pg_dump

2024-11-28 Thread Michael Paquier
Hi all,

While looking at some of the code related to subscriptions in psql,
coming down to make LOGICALREP_STREAM_* available for client code,
I've also noticed quite a few inconsistencies and mistakes with how
pg_subscription is handled in pg_dump, which have accumulated over the
years from what looks like a set of copy-pastos:
- pg_dump.c includes pg_subscription.h and not pg_subscription_d.h.
This is a mistake as the former should only be included in the backend
code.
- SubscriptionInfo has accumulated dust over the years, using
declared types that don't map to its catalog.  To keep it short
here, all the fields use (char *), leading to more DIY logic in the
code (see two_phase_disabled[]), while most of the fields are booleans
or char values.  Switching to char values allows direct comparisons
with the contents of pg_subscription_d.h, leading to more consistent
code.
- Inconsistent position of fields between SubscriptionInfo and the
catalog pg_subscription.

EXPOSE_TO_CLIENT_CODE has been added to pg_subscription.h so as values
for substream, twophasestate and origin can be used directly in psql
and pg_dump, switching these to use pg_subscription_d.h as this is
client code.

With all that addressed, I am finishing with the patch attached.

Thoughts or comments?
--
Michael
From 097de22e34dec9aab560cad85af48e07d446b432 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 29 Nov 2024 13:02:29 +0900
Subject: [PATCH] Refactor subscription code for psql and pg_dump

---
 src/include/catalog/pg_subscription.h | 44 +
 src/bin/pg_dump/pg_dump.c | 47 +--
 src/bin/pg_dump/pg_dump.h | 16 -
 src/bin/psql/describe.c   |  7 ++--
 4 files changed, 58 insertions(+), 56 deletions(-)

diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index b25f3fea56..beaff6578a 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -23,26 +23,6 @@
 #include "lib/stringinfo.h"
 #include "nodes/pg_list.h"
 
-/*
- * two_phase tri-state values. See comments atop worker.c to know more about
- * these states.
- */
-#define LOGICALREP_TWOPHASE_STATE_DISABLED 'd'
-#define LOGICALREP_TWOPHASE_STATE_PENDING 'p'
-#define LOGICALREP_TWOPHASE_STATE_ENABLED 'e'
-
-/*
- * The subscription will request the publisher to only send changes that do not
- * have any origin.
- */
-#define LOGICALREP_ORIGIN_NONE "none"
-
-/*
- * The subscription will request the publisher to send changes regardless
- * of their origin.
- */
-#define LOGICALREP_ORIGIN_ANY "any"
-
 /* 
  *		pg_subscription definition. cpp turns this into
  *		typedef struct FormData_pg_subscription
@@ -159,6 +139,28 @@ typedef struct Subscription
  * specified origin */
 } Subscription;
 
+#ifdef EXPOSE_TO_CLIENT_CODE
+
+/*
+ * two_phase tri-state values. See comments atop worker.c to know more about
+ * these states.
+ */
+#define LOGICALREP_TWOPHASE_STATE_DISABLED 'd'
+#define LOGICALREP_TWOPHASE_STATE_PENDING 'p'
+#define LOGICALREP_TWOPHASE_STATE_ENABLED 'e'
+
+/*
+ * The subscription will request the publisher to only send changes that do not
+ * have any origin.
+ */
+#define LOGICALREP_ORIGIN_NONE "none"
+
+/*
+ * The subscription will request the publisher to send changes regardless
+ * of their origin.
+ */
+#define LOGICALREP_ORIGIN_ANY "any"
+
 /* Disallow streaming in-progress transactions. */
 #define LOGICALREP_STREAM_OFF 'f'
 
@@ -174,6 +176,8 @@ typedef struct Subscription
  */
 #define LOGICALREP_STREAM_PARALLEL 'p'
 
+#endif			/* EXPOSE_TO_CLIENT_CODE */
+
 extern Subscription *GetSubscription(Oid subid, bool missing_ok);
 extern void FreeSubscription(Subscription *sub);
 extern void DisableSubscription(Oid subid);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index add7f16c90..ec0cdf4ed7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -50,7 +50,7 @@
 #include "catalog/pg_default_acl_d.h"
 #include "catalog/pg_largeobject_d.h"
 #include "catalog/pg_proc_d.h"
-#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
 #include "common/int.h"
@@ -4968,20 +4968,20 @@ getSubscriptions(Archive *fout)
 	i_oid = PQfnumber(res, "oid");
 	i_subname = PQfnumber(res, "subname");
 	i_subowner = PQfnumber(res, "subowner");
+	i_subenabled = PQfnumber(res, "subenabled");
 	i_subbinary = PQfnumber(res, "subbinary");
 	i_substream = PQfnumber(res, "substream");
 	i_subtwophasestate = PQfnumber(res, "subtwophasestate");
 	i_subdisableonerr = PQfnumber(res, "subdisableonerr");
 	i_subpasswordrequired = PQfnumber(res, "subpasswordrequired");
 	i_subrunasowner = PQfnumber(res, "subrunasowner");
+	i_subfailover = PQfnumber(res, "subfailover");
 	i_subconninfo = PQfnumber(res, "subconninfo");
 	i_subslotname = PQfnumber(res, "subslotname");
 	i_su

Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread Andrei Lepikhov

On 11/29/24 09:39, David Rowley wrote:

On Fri, 29 Nov 2024 at 15:02, David Rowley  wrote:

I've attached an updated patch that gets rid of the
get_unique_not_null_attnos() function completely and uses the
RelOptInfo.indexlist and RelOptInfo.notnullattnums.


I forgot to do a local commit before sending v8. Fixed in the attached v9.

1. Thread reference in the patch comment doesn't work.
2. Might it be possible to move remove_useless_groupby_columns in a 
second commit? It would make it easier to review the differences.


--
regards, Andrei Lepikhov




Re: doc fail about ALTER TABLE ATTACH re. NO INHERIT

2024-11-28 Thread Amit Langote
On Wed, Nov 27, 2024 at 12:53 AM Alvaro Herrera  wrote:
> On 2024-Nov-20, Amit Langote wrote:
> > On Tue, Nov 19, 2024 at 6:41 PM Alvaro Herrera  
> > wrote:
>
> > Here's an example that I think matches the above description, which,
> > ISTM, leads to a state similar to what one would encounter with my
> > patch as described earlier:
> >
> > create table parent (a int not null);
> > create table child (a int, constraint a_not_null_child not null a)
> > inherits (parent);
> > NOTICE:  merging column "a" with inherited definition
> >
> > \d+ child
> >   Table "public.child"
> >  Column |  Type   | Collation | Nullable | Default | Storage |
> > Compression | Stats target | Description
> > +-+---+--+-+-+-+--+-
> >  a  | integer |   | not null | | plain   |
> > |  |
> > Not-null constraints:
> > "a_not_null_child" NOT NULL "a" (local, inherited)
> > Inherits: parent
> > Access method: heap
> >
> > I think the inherited constraint should be named parent_a_not_null,
> > but the constraint that gets stored is the user-specified constraint
> > in `create table child`.
>
> Yeah, the user-specified name taking precedence over using a name from
> inheritance was an explicit decision.  I think this is working as
> intended.  An important point is that pg_dump should preserve any
> constraint name; that's for example why we disallow ALTER TABLE DROP
> CONSTRAINT of an inherited constraint: previously I made that command
> reset the 'islocal' flag of the constraint and otherwise do nothing; but
> the problem is that if we allow that, then the constraint gets the wrong
> name after dump/restore.

Ok, I see.  I didn't think about the pg_dump / restore aspect of this.

> > Not sure, but perhaps the following piece of code of
> > AddRelationNotNullConstraints() should also check that the names
> > match?
> >
> > /*
> >  * Search in the list of inherited constraints for any entries on 
> > the
> >  * same column; determine an inheritance count from that.  Also, if 
> > at
> >  * least one parent has a constraint for this column, then we must 
> > not
> >  * accept a user specification for a NO INHERIT one.  Any constraint
> >  * from parents that we process here is deleted from the list: we no
> >  * longer need to process it in the loop below.
> >  */
>
> I was of two minds about checking that the constraint names match, but
> in the end I decided it wasn't useful and limiting, because you cannot
> have a particular name in the children if you want to.
>
> One thing that distinguishes not-null constraints from check ones is
> that when walking down inheritance trees we match by the column they
> apply to, rather than by name as check constraints do.  So the fact that
> the names don't match doesn't harm.  That would not fly for check
> constraints.

Yeah, that makes sense.

> > Unless the inherited NOT NULL constraints are not required to have the
> > same name.
>
> Yep.
>
> > > I agree that both types of constraints should behave as similarly as
> > > possible in as many ways as possible.  Behavioral differences are
> > > unlikely to be cherished by users.
> >
> > To be clear, I suppose we agree on continuing to throw an error when
> > trying to define a NO INHERIT CHECK constraint on a leaf partition.
> > That is to match the behavior we currently (as of 14e87ffa5) have for
> > NOT NULL constraints.
>
> Yeah.  I think we should only worry to the extent that these things
> trouble users over what they can and cannot do with tables.  Adding a NO
> INHERIT constraint to a partition, just so that they can continue to
> have the constraint after they detach and that the constraint doesn't
> affect any tables that are added as children ... doesn't seem a very
> important use case.  _But_, for anybody out there that does care about
> such a thing, we might have an ALTER TABLE command to change a
> constraint from NO INHERIT to INHERIT, and perhaps vice-versa.

+1

-- 
Thanks, Amit Langote




Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread David Rowley
On Fri, 29 Nov 2024 at 19:36, Andrei Lepikhov  wrote:
> 1. Thread reference in the patch comment doesn't work.
> 2. May it be possible to move remove_useless_groupby_columns in the
> second commit? It would be easier to review the differences.

For #1, I rewrote the commit message.

I've split into two patches for ease of review. See attached.

David


v10-0001-Move-remove_useless_groupby_columns-to-initsplan.patch
Description: Binary data


v10-0002-Remove-redundant-GROUP-BY-columns-using-UNIQUE-i.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-11-28 Thread vignesh C
On Fri, 22 Nov 2024 at 17:43, vignesh C  wrote:
>
> On Thu, 21 Nov 2024 at 17:35, Nisha Moond  wrote:
> >
> > On Wed, Nov 20, 2024 at 1:29 PM vignesh C  wrote:
> > >
> > > On Tue, 19 Nov 2024 at 12:43, Nisha Moond  
> > > wrote:
> > > >
> > > > Attached is the v49 patch set:
> > > > - Fixed the bug reported in [1].
> > > > - Addressed comments in [2] and [3].
> > > >
> > > > I've split the patch into two, implementing the suggested idea in
> > > > comment #5 of [2] separately in 001:
> > > >
> > > > Patch-001: Adds additional error reports (for all invalidation types)
> > > > in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> > > > true.
> > > > Patch-002: The original patch with comments addressed.
> > >
> > > This Assert can fail:
> > >
> >
> > Attached v50 patch-set addressing review comments in [1] and [2].
>
> We are setting inactive_since when the replication slot is released.
> We are marking the slot as inactive only if it has been released.
> However, there's a scenario where the network connection between the
> publisher and subscriber is lost: the replication slot is not
> released, but no changes are replicated due to the network problem. In
> this case, no updates would occur on the replication slot for a period
> exceeding replication_slot_inactive_timeout.
> Should we invalidate these replication slots as well, or is this
> intentionally left out?

On further thinking, I felt we can keep the current implementation as
is and simply add a brief comment in the code to address this.
Additionally, we can mention it in the commit message for clarity.

Regards,
Vignesh




Re: CREATE SCHEMA ... CREATE DOMAIN support

2024-11-28 Thread Kirill Reshke
On Thu, 28 Nov 2024 at 10:52, Tom Lane  wrote:

> No, I don't think this should be part of the patch discussed in this
> thread.

Ok, I created a separate thread for this.

How about this one? Do you think the suggested idea is good? Is it
worthwhile to do this, in your opinion?


-- 
Best regards,
Kirill Reshke




Re: Adding OLD/NEW support to RETURNING

2024-11-28 Thread Dean Rasheed
On Tue, 29 Oct 2024 at 13:05, Dean Rasheed  wrote:
>
> Rebased version attached. No other changes.
>

In the light of 7f798aca1d5df290aafad41180baea0ae311b4ee, I guess I
should remove various (void *) casts that this patch adds, copied from
old code. I'll wait a while though, just in case the buildfarm objects
to that other patch.

Regards,
Dean




crash with synchronized_standby_slots

2024-11-28 Thread Alvaro Herrera
Hello,

Gabriele just reported a crash when changing synchronized_standby_slots
under SIGHUP and logging collector working.  The problem apparently is
that validate_sync_standby_slots is run for the GUC check routine, and
it requires acquiring an LWLock; but syslogger doesn't have a PGPROC so
that doesn't work.

Apparently we already have a hack in place against the same problem in
postmaster (and elsewhere?), by testing that ReplicationSlotCtl is not
null.  But that hack seems to be incomplete, as the crash here attests.

To reproduce, simply start with no synchronized_standby_slots setting
and logging_collector=on, then change the value in postgresql.conf and
reload.

One option to fix this might be as attached.  The point here is that
processes without a PGPROC don't need or care to have a correct setting,
so we just skip verifying it on those.  AFAICS this is more general than
the test for ReplicationSlotCtl, so I just removed that one.
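
In code terms, the idea is roughly the following sketch (the actual attached
patch may differ; the MyProc test is an assumption about how "has no PGPROC"
would be detected):

static bool
check_synchronized_standby_slots(char **newval, void **extra, GucSource source)
{
	/*
	 * Processes without a PGPROC (e.g. the syslogger) cannot take LWLocks
	 * and have no use for this setting, so skip validation for them.
	 */
	if (MyProc == NULL)
		return true;

	/* ... existing validation of the slot name list goes here ... */
	return true;
}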

Here's the backtrace Gabriele showed me.

[New LWP 29]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: freddie: logger  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at 
./build/../src/include/port/atomics/generic.h:48
Download failed: Invalid argument.  Continuing without source file 
./build/src/backend/storage/lmgr/./build/../src/include/port/atomics/generic.h.
48  ./build/../src/include/port/atomics/generic.h: No such file or 
directory.
#0  pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at 
./build/../src/include/port/atomics/generic.h:48
No locals.
#1  pg_atomic_read_u32 (ptr=0x7f0cf21a1f84) at 
./build/../src/include/port/atomics.h:237
No locals.
#2  LWLockAttemptLock (lock=lock@entry=0x7f0cf21a1f80, 
mode=mode@entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:796
old_state = 
#3  0x557cbe97ba60 in LWLockAcquire (lock=0x7f0cf21a1f80, 
mode=mode@entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:1235
mustwait = 
proc = 0x0
result = true
extraWaits = 0
__func__ = "LWLockAcquire"
#4  0x557cbe935919 in validate_sync_standby_slots (elemlist=0x7ffda5aa60d0, 
rawname=0x557cfd704c98 "_cnpg_freddie_2") at 
./build/../src/backend/replication/slot.c:2443
ok = true
ok = 
name = 
name__outerloop = 
name__state = {l = , i = }
slot = 
#5  check_synchronized_standby_slots (newval=, 
extra=0x7ffda5aa6300, source=) at 
./build/../src/backend/replication/slot.c:2494
rawname = 0x557cfd704c98 "_cnpg_freddie_2"
ptr = 
elemlist = 0x557cfd7041c8
size = 
ok = 
config = 
#6  0x557cbeae95de in call_string_check_hook 
(conf=conf@entry=0x557cbee2f8b0 , 
newval=newval@entry=0x7ffda5aa62f8, extra=extra@entry=0x7ffda5aa6300, 
source=source@entry=PGC_S_FILE, elevel=elevel@entry=12) at 
./build/../src/backend/utils/misc/guc.c:6925
_save_exception_stack = 0x0
_save_context_stack = 0x0
_local_sigjmp_buf = {{__jmpbuf = {93994266851504, -6990900416883569069, 
140727382860536, 12, 93994266851504, 93995316414744, -6990900416982135213, 
-374885626932111}, __mask_was_saved = 0, __saved_mask = {__val = 
{140727382860536, 140727382860272, 93995316305072, 139693771467072, 
93994263427300, 4662694561863172096, 93994263387537, 140727382860536, 
93994263505961, 93994266851504, 140727382860536, 140727382860336, 
93994263421816, 140727382860536, 12, 93994266851504
_do_rethrow = false
result = true
__func__ = "call_string_check_hook"
#7  0x557cbeaec1ba in parse_and_validate_value (record=0x557cbee2f8b0 
, name=0x557cfd765840 "synchronized_standby_slots", 
value=, source=PGC_S_FILE, elevel=12, newval=0x7ffda5aa62f8, 
newextra=0x7ffda5aa6300) at ./build/../src/backend/utils/misc/guc.c:3260
conf = 0x557cbee2f8b0 
__func__ = "parse_and_validate_value"
#8  0x557cbeaecd6f in set_config_with_handle (name=0x557cfd765840 
"synchronized_standby_slots", handle=, value=0x557cfd765868 
"_cnpg_freddie_2", context=PGC_SIGHUP, source=PGC_S_FILE, srole=10, 
action=GUC_ACTION_SET, changeVal=, elevel=12, is_reload=false) 
at ./build/../src/backend/utils/misc/guc.c:4013
conf = 0x557cbee2f8b0 
orig_context = PGC_SIGHUP
orig_source = PGC_S_FILE
orig_srole = 10
record = 0x557cbee2f8b0 
newval_union = {boolval = 24, intval = -42858216, realval = 
4.6439856710502817e-310, stringval = 0x557cfd720918 "_cnpg_freddie_2", enumval 
= -42858216}
newextra = 0x0
prohibitValueChange = false
makeDefault = 
__func__ = "set_config_with_handle"
#9  0x557cbeaef400 in set_config_option (is_reload=false, elevel=0, 
changeVal=true, action=GUC_ACTION_SET, source=PGC_S_FILE, context=PGC_SIGHUP, 
value=, name=) at 
./build/../sr

Re: Add Pipelining support in psql

2024-11-28 Thread Jelte Fennema-Nio
On Thu, 28 Nov 2024 at 07:43, Michael Paquier  wrote:
> Hmm.  The start, end and sync meta-commands are useful for testing.  I
> find the flush control a bit less interesting, TBH.
>
> What would you use these for?

I guess mostly for interactively playing around with pipelining from psql.

But I think \getresult would be useful for testing too. This would
allow us to test that we can read part of the pipeline, without
sending a sync and waiting for everything.

To be clear \flushrequest and \flush would be necessary to make
\getresult work reliably.




Re: Add Pipelining support in psql

2024-11-28 Thread Anthonin Bonnefoy
On Wed, Nov 27, 2024 at 11:46 AM Kirill Reshke  wrote:
> I'm very doubtful about the \syncpipeline . Maybe we should instead
> support \sync meta-command in psql? This will be a useful contribution
> itself.

\syncpipeline is useful to cover regression tests involving implicit
transaction blocks within a pipeline where a sync acts as an implicit
COMMIT. What would be the use of sending a \sync outside of a
pipeline? All extended queries meta-commands sent by psql already send
a sync if you're not in a pipeline, so the only use of a \sync
meta-command would be to send a single sync (which could be achieved
with \startpipeline followed by a \endpipeline).

On Wed, Nov 27, 2024 at 1:54 PM Jelte Fennema-Nio  wrote:
> This behaviour is expected, it also happens if you send "SELECT 1"
> instead of "begin" in the parse message:
> db1=# \startpipeline
> db1=# SELECT 1 \parse p1
> db1=*#
>
> The reason this happens is that the first command in a pipeline sent
> to the server opens an implicit transaction. See the "implicit COMMIT"
> wording here[1], or look at this code in exec_parse_message:

I don't think that's the case here. Nothing was sent to the server so
there's no active transaction yet. From prompt.c, psql will show a '*'
when PQtransactionStatus is either PQTRANS_ACTIVE or PQTRANS_INTRANS.

Looking at the state of the PGConn, after a \startpipeline, we have:
db->xactStatus: (PGTransactionStatusType) PQTRANS_IDLE
db->asyncStatus: (PGAsyncStatusType) PGASYNC_IDLE
db->pipelineStatus: (PGpipelineStatus) PQ_PIPELINE_ON

After the first command is sent to the pipeline:
db->asyncStatus: (PGAsyncStatusType) PGASYNC_BUSY
db->xactStatus: (PGTransactionStatusType) PQTRANS_IDLE
db->pipelineStatus: (PGpipelineStatus) PQ_PIPELINE_ON

The xactStatus is idle; however, since asyncStatus reports busy, we
fall into the "if (conn->asyncStatus != PGASYNC_IDLE) return
PQTRANS_ACTIVE;" branch and psql displays '*' in the prompt. This might be a
bit misleading since there's no ongoing transaction block on the
server, so maybe it's worth having a new state? I've updated the patch
to display a tentative '|' when there's an ongoing pipeline. On the
other hand, a pipeline will start an implicit transaction block even
without BEGIN, so leaving the '*' may be a good way to reflect that?
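
For reference, a hypothetical sketch of the prompt selection being described
(not the patch's exact code):

#include "libpq-fe.h"

/* '|' while a pipeline is open, '*' inside a transaction block, else none */
static char
transaction_prompt_char(PGconn *conn)
{
	if (PQpipelineStatus(conn) == PQ_PIPELINE_ON)
		return '|';
	if (PQtransactionStatus(conn) == PQTRANS_INTRANS)
		return '*';
	return '\0';
}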

On Wed, Nov 27, 2024 at 1:45 PM Jelte Fennema-Nio  wrote:
> I played around quickly with this patch and it works quite well. A few
> things that would be nice improvements I think. Feel free to change
> the command names:
> 1. Add a \flush command that calls PQflush
> 2. Add a \flushrequest command that calls PQsendFlushRequest
> 3. Add a \getresult command so you can get a result from a query
> without having to close the pipeline
>
> To be clear, not having those additional commands isn't a blocker for
> this patch imho, but I'd definitely miss them if they weren't there
> when I would be using it.

I will need a bit of time to check those and how they can be included
in the regression tests.


v02-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Pass ParseState as down to utility functions.

2024-11-28 Thread Kirill Reshke
Hi hackers!
PFA a patch fixing a number of places where typenameType is called with a NULL pstate.

=== motivation.

Per discussion in a nearby thread for `CREATE SCHEMA ... CREATE DOMAIN
support`. Suggested by Jian He & Tom Lane.

On Thu, 28 Nov 2024 at 10:52, Tom Lane  wrote:
>
> Kirill Reshke  writes:
> > On Wed, 27 Nov 2024 at 23:39, Tom Lane  wrote:
> >> We've fixed a few utility statements so that they can receive
> >> a passed-down ParseState, but not DefineDomain.
>
> > PFA as an independent patch then. Or should we combine these two into one?
>
> No, I don't think this should be part of the patch discussed in this
> thread.
>
> It feels rather random to me to be fixing only DefineDomain;
> I'm sure there's more in the same vein.  I'd like to see a
> patch with a scope along the lines of "fix everything reachable
> within CREATE SCHEMA" or perhaps "fix all calls of typenameType".
> (A quick grep shows that an outright majority of the callers of that
> are passing null ParseState.  I didn't look to see if any of them
> have a good excuse beyond "we didn't do the plumbing work".)
>
> regards, tom lane


I chose the "fix all calls of typenameType" way.

I searched for the typenameType(NULL, ...) pattern within the sources and
changed it to pass pstate where appropriate. These are the AlterType,
DefineDomain and transformOfType cases. There are 2 more usages of this
pattern left, inside ATExecAlterColumnType & ATExecAddOf, which I don't
think need to be addressed (the cure would be worse than the disease).

=== examples
1) CREATE TYPE.
before:

```
db2=# create type int8alias3 (
input = int8alias2in,
output = int8alias2out,
like = int82
);
ERROR:  type "int82" does not exist
db2=# ^C
```

after:

```
db2=# create type int8alias3 (
input = int8alias2in,
output = int8alias2out,
like = int82
);
ERROR:  type "int82" does not exist
LINE 4: like = int82
   ^
db2=#
```

2) TABLE of TYPENAME case

before:

```
db2=# CREATE TABLE example OF mytype2 (PRIMARY KEY (some_id));
ERROR:  type "mytype2" does not exist
db2=#

```

after:

```
db2=# CREATE TABLE example OF mytype2 (PRIMARY KEY (some_id));
ERROR:  type "mytype2" does not exist
LINE 1: CREATE TABLE example OF mytype2 (PRIMARY KEY (some_id));
^
```

3) CREATE DOMAIN - analogous.

=== testing

By hand. Let me know if we can check this in any other way.


-- 
Best regards,
Kirill Reshke


v2-0001-Pass-ParseState-as-down-to-utility-functions.patch
Description: Binary data


Re: Virtual generated columns

2024-11-28 Thread Peter Eisentraut

On 12.11.24 17:08, Peter Eisentraut wrote:

On 11.11.24 12:37, jian he wrote:
On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut 
 wrote:


New patch version.  I've gone through the whole thread again and looked
at all the feedback and various bug reports and test cases and made sure
they are all addressed in the latest patch version.  (I'll send some
separate messages to respond to some individual messages, but I'm
keeping the latest patch here.)


just quickly note the not good error message before you rebase.

src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS 
(2) ;

ERROR:  unrecognized constraint subtype: 4
src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
(2) stored;
ERROR:  unrecognized constraint subtype: 4
src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
(2) virtual;
ERROR:  unrecognized constraint subtype: 4

Reading gram.y, it seems typedef struct Constraint cannot distinguish 
whether we are creating a domain or creating a table.
I cannot find a way to error out in gram.y,

so we have to error out in DefineDomain.


This appears to be a very old problem independent of this patch.  I'll 
take a look at fixing it.


Here is a patch.

I'm on the fence about taking out the default case.  It does catch the 
missing enum values, and I suppose if the struct arrives in 
DefineDomain() with a corrupted contype value that is none of the enum 
values, then we'd just do nothing with it.  Maybe go ahead with this, 
but for backpatching leave the default case in place?


From 09f9484f0d2990b4b9be8d2a8b907c134d6d5ee7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 28 Nov 2024 10:23:26 +0100
Subject: [PATCH] Fix handling of CREATE DOMAIN with GENERATED constraint
 syntax

Stuff like

CREATE DOMAIN foo AS int CONSTRAINT cc GENERATED ALWAYS AS (2) STORED

is not supported for domains, but the parser allows it, because it's
the same syntax as for table constraints.  But CreateDomain() did not
explicitly handle all ConstrType values, so the above would get an
internal error like

ERROR:  unrecognized constraint subtype: 4

Fix that by providing a user-facing error message for all ConstrType
values.  Also, remove the switch default case, so future additions to
ConstrType are caught.

Reported-by: Jian He 
Discussion: 
https://www.postgresql.org/message-id/CACJufxF8fmM=dbm4pdfuv_nkgz2-no0k4yifhrf3-rjxtwj...@mail.gmail.com
---
 src/backend/commands/typecmds.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 859e2191f08..130741e777e 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -1011,10 +1011,14 @@ DefineDomain(CreateDomainStmt *stmt)
 errmsg("specifying constraint 
deferrability not supported for domains")));
break;
 
-   default:
-   elog(ERROR, "unrecognized constraint subtype: 
%d",
-(int) constr->contype);
+   case CONSTR_GENERATED:
+   case CONSTR_IDENTITY:
+   ereport(ERROR,
+   
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("specifying GENERATED 
not supported for domains")));
break;
+
+   /* no default, to let compiler warn about 
missing case */
}
}
 
-- 
2.47.0



Re: More CppAsString2() in psql's describe.c

2024-11-28 Thread Daniel Gustafsson
> On 28 Nov 2024, at 07:34, Michael Paquier  wrote:
> 
> On Tue, Oct 22, 2024 at 09:49:10AM +0900, Michael Paquier wrote:
>> Note that there were a couple of value checks not part of the queries
>> that relied on values from the catalogs for some relpersistences and
>> replidents.  I've fixed them while on it.
>> 
>> Thoughts or comments are welcome.
> 
> So, this one has been sitting in the CF for a couple of weeks now.
> I've looked at it again and the queries are written the same.  There
> was one inconsistency with the ordering of the headers and one
> indentation issue with a query for extended stats.
> 
> Any objections against applying it?  This is the last area of the code
> where we rely on such hardcoded values rather than CppAsString2().
> Note also the pg_am.h inclusion which is incorrect.

LGTM, I didn't scan for omissions but the ones in the patch look right to me.
I sort of wish we had a shorter macro as CppAsString2() gets pretty verbose
when used frequently like this.

--
Daniel Gustafsson





Re: Make COPY format extendable: Extract COPY TO format implementations

2024-11-28 Thread Junwang Zhao
On Thu, Nov 28, 2024 at 2:16 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Wed, 27 Nov 2024 19:49:17 +0800,
>   Junwang Zhao  wrote:
>
> > I just gave this another round of benchmarking tests. I'd like to
> > share the number,
> > since COPY TO has some performance drawbacks, I test only COPY TO. I
> > use the run.sh Tomas provided earlier but use pgbench with a custom script, 
> > you
> > can find it here[0].
> >
> > I tested 3 branches:
> >
> > 1. the master branch
> > 2. all v26 patch sets applied
> > 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
> > if branch in CopyOneRowTo, so I was expecting this slower than master
> >
> > 2 can be about -3%~+3% compared to 1, but what surprised me is that 3
> > is always better than 1 & 2.
> >
> > I reviewed the patch set of 3 and I don't see any magic.
> >
> > You can see the detailed results here[2], I can not upload files so I
> > just shared the google doc link, ping me if you can not open the link.
> >
> > [0]: https://github.com/pghacking/scripts/tree/main/extensible_copy
> > [1]: 
> > https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
> > [2]: 
> > https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
>
> Thanks for sharing your numbers.
>
> 1. and 2. shows that there is at least no significant
> performance regression.

Agreed.

>
> I see the patch set of 3. and I think that the result
> (there is no performance difference between 1. and 3.) isn't
> strange. The patch set adds some if branches but they aren't
> used with "text" format at least in per row process.

It is not used in "text" format, but it adds some assembly code
to the CopyOneRowTo function, so this will have some impact
on the CPU i-cache, I guess.

There is a difference between 1 and 3: 3 is always better than 1,
up to a 4% improvement. I forgot to mention that the comparisons
are in *sheet2*.

>
> Thanks,
> --
> kou



-- 
Regards
Junwang Zhao




Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

2024-11-28 Thread Amit Kapila
On Thu, Nov 21, 2024 at 5:30 PM Shlok Kyal  wrote:
>

Review comments:
===
1.
+
+ /*
+ * true if all generated columns which are part of replica identity are
+ * published or the publication actions do not include UPDATE or DELETE.
+ */
+ bool replident_valid_for_update;
+ bool replident_valid_for_delete;

These names are too generic for the purpose they serve. How about
naming them gencols_valid_for_update and
gencols_valid_for_delete instead?

2. The comment atop RelationBuildPublicationDesc() is only about the row
filter. We should update it for the column list and generated columns as
well.

3. It is better to merge the functionality of the invalid column list
and unpublished generated columns as proposed by Hou-San above.



-- 
With Regards,
Amit Kapila.




Re: [Feature Request] Schema Aliases and Versioned Schemas

2024-11-28 Thread Ashutosh Bapat
On Wed, Nov 27, 2024 at 8:45 PM AbdelAziz Sharaf
 wrote:
>
> Dear PostgreSQL Development Team,
>
> I’d like to propose a new feature for consideration: schema aliases and 
> versions
>
> **Problem Statement:**
> For migrating an old db to a new one, one must use an external tool or define a
> dedicated migration script, where all possible issues could arise
>
> **Proposed Solution:**
> there are two ways I can think of
> - versioned schemas : where every version acts as a separate schema, the
> `latest` one or the one the program requests is the one in use, and each new
> schema can inherit a table, index, view, ... without additional data
> - aliases : where every new schema is defined and migrated separately, then an
> alias is set for the one in use

Isn't this the same as adding the required schema name to the search_path?

-- 
Best Wishes,
Ashutosh Bapat




Re: tab_complete for copy(merge

2024-11-28 Thread Peter Eisentraut

On 19.11.24 14:23, jian he wrote:

On Tue, Nov 19, 2024 at 5:20 PM Kirill Reshke  wrote:

On Tue, 19 Nov 2024, 14:08 jian he,  wrote:

In v17, we support COPY (MERGE ... RETURNING),
so we can add tab completion for it.



Sounds sane


 /* Complete COPY ( with legal query commands */
 else if (Matches("COPY|\\copy", "("))
-   COMPLETE_WITH("SELECT", "TABLE", "VALUES", "INSERT INTO", "UPDATE", "DELETE FROM", "WITH");
+   COMPLETE_WITH("SELECT", "TABLE", "VALUES", "INSERT INTO", "UPDATE", "DELETE FROM", "WITH", "MERGE INTO");


MERGE INTO?



per Synopsis
https://www.postgresql.org/docs/current/sql-merge.html
it should be "MERGE INTO".


Yes, we also complete "MERGE INTO" in other contexts.

Committed like that.





Re: Difference in dump from original and restored database due to NOT NULL constraints on children

2024-11-28 Thread Ashutosh Bapat
On Wed, Nov 27, 2024 at 7:04 PM Alvaro Herrera  wrote:
>
> On 2024-Nov-27, Ashutosh Bapat wrote:
>
> > I noticed that. But two reasons why I chose the backend changes
> > 1. The comment where we add explicit ADD CONSTRAINT is
> > /*
> > * Dump additional per-column properties that we can't handle in the
> > * main CREATE TABLE command.
> > */
> > ... snip
> >
> > /*
> > * If we didn't dump the column definition explicitly above, and
> > * it is not-null and did not inherit that property from a parent,
> > * we have to mark it separately.
> > */
> > if (!shouldPrintColumn(dopt, tbinfo, j) &&
> > tbinfo->notnull_constrs[j] != NULL &&
> > (tbinfo->notnull_islocal[j] && !tbinfo->ispartition && 
> > !dopt->binary_upgrade))
> > ... snip
> >
> > The comment seems to say that we can not handle the NOT NULL
> > constraint property in the CREATE TABLE command. Don't know why. We
> > add CHECK constraints separately in CREATE TABLE even if we didn't add
> > corresponding columns in CREATE TABLE. So there must be a reason not
> > to dump NOT NULL constraints that way and hence we required separate
> > code like above. I am afraid going that direction will show us some
> > other problems.
>
> I don't think this is an important restriction.  We can change that, as
> long as all cases work correctly.  We previously didn't try to use
> "CONSTRAINT foobar NOT NULL a" because 1) we didn't support the
> table-constraint syntax for not-null constraint and 2) not-null
> constraint didn't support names anyway.  We now support that syntax, so
> we can use it.
>

Ok. Here's the patch implementing the same. As you said, it's a much
simpler patch. The test being developed in [1] passes with this
change. pg_dump and pg_upgrade test suites also pass.

[1] 
https://www.postgresql.org/message-id/flat/CAExHW5uvx2LEyrUBdctV5gS25Zeb%2B-eXESkK93siQxWSjYFy6A%40mail.gmail.com#c8ed57b77d2f6132d5b8e1ecb2a8c47b

Adding this to CF for CI run.

-- 
Best Wishes,
Ashutosh Bapat
From 92be7b6f6306045f0cf84dad2625e5fd9cb16101 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 28 Nov 2024 16:21:42 +0530
Subject: [PATCH] Dumping local and inherited NOT NULL constraints on non-local
 columns

A child table is dumped as CREATE TABLE ... INHERITS. If there is any
local NOT NULL constraint on a column not included in CREATE TABLE
command, it is printed separately as an ALTER TABLE ... command.  When
restored this constraint gets the name of the corresponding parent
constraint since CREATE TABLE is executed first and constraint name, if
mentioned, in ALTER TABLE command is ignored. We end up losing the name
of child constraint in the restored database. Instead dump them as part
of the CREATE TABLE command itself so that their given or default name
is preserved.

Author: Ashutosh Bapat
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5tbdgAKDfqjDJ-7Fk6PJtHg8D4zUF6FQ4H2Pq8zK38Nyw%40mail.gmail.com
---
 src/bin/pg_dump/pg_dump.c | 45 +++
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index add7f16c902..2a4ff592fab 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16221,6 +16221,28 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 			  fmtQualifiedDumpable(coll));
 	}
 }
+/* Add local NOT NULL constraint on columns not printed above. */
+else if (!tbinfo->attisdropped[j] &&
+		 tbinfo->notnull_constrs[j] != NULL &&
+		 tbinfo->notnull_islocal[j])
+{
+	/* Format properly if not first attr */
+	if (actual_atts == 0)
+		appendPQExpBufferStr(q, " (");
+	else
+		appendPQExpBufferChar(q, ',');
+	appendPQExpBufferStr(q, "\n");
+	actual_atts++;
+
+	if (tbinfo->notnull_constrs[j][0] == '\0')
+		appendPQExpBuffer(q, "NOT NULL %s",
+		  fmtId(tbinfo->attnames[j]));
+	else
+		appendPQExpBuffer(q, "CONSTRAINT %s NOT NULL %s",
+		  tbinfo->notnull_constrs[j],
+		  fmtId(tbinfo->attnames[j]));
+
+}
 			}
 
 			/*
@@ -16662,29 +16684,6 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 			if (tbinfo->attisdropped[j])
 continue;
 
-			/*
-			 * If we didn't dump the column definition explicitly above, and
-			 * it is not-null and did not inherit that property from a parent,
-			 * we have to mark it separately.
-			 */
-			if (!shouldPrintColumn(dopt, tbinfo, j) &&
-tbinfo->notnull_constrs[j] != NULL &&
-(tbinfo->notnull_islocal[j] && !tbinfo->ispartition && !dopt->binary_upgrade))
-			{
-/* No constraint name desired? */
-if (tbinfo->notnull_constrs[j][0] == '\0')
-	appendPQExpBuffer(q,
-	  "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET NOT NULL;\n",
-	  foreign, qualrelname,
-	  fmtId(tbinfo->attnames[j]));
-else
-	appendPQExpBuffer(q,
-	  "ALTER %sTABLE ONLY %s ADD CONSTRAINT %s NOT NULL %s;\n",
-	  fore

Re: Difference in dump from original and restored database due to NOT NULL constraints on children

2024-11-28 Thread Ashutosh Bapat
On Thu, Nov 28, 2024 at 4:44 PM Ashutosh Bapat
 wrote:
>
> On Wed, Nov 27, 2024 at 7:04 PM Alvaro Herrera  
> wrote:
> >
> > On 2024-Nov-27, Ashutosh Bapat wrote:
> >
> > > I noticed that. But two reasons why I chose the backend changes
> > > 1. The comment where we add explicit ADD CONSTRAINT is
> > > /*
> > > * Dump additional per-column properties that we can't handle in the
> > > * main CREATE TABLE command.
> > > */
> > > ... snip
> > >
> > > /*
> > > * If we didn't dump the column definition explicitly above, and
> > > * it is not-null and did not inherit that property from a parent,
> > > * we have to mark it separately.
> > > */
> > > if (!shouldPrintColumn(dopt, tbinfo, j) &&
> > > tbinfo->notnull_constrs[j] != NULL &&
> > > (tbinfo->notnull_islocal[j] && !tbinfo->ispartition && 
> > > !dopt->binary_upgrade))
> > > ... snip
> > >
> > > The comment seems to say that we can not handle the NOT NULL
> > > constraint property in the CREATE TABLE command. Don't know why. We
> > > add CHECK constraints separately in CREATE TABLE even if we didn't add
> > > corresponding columns in CREATE TABLE. So there must be a reason not
> > > to dump NOT NULL constraints that way and hence we required separate
> > > code like above. I am afraid going that direction will show us some
> > > other problems.
> >
> > I don't think this is an important restriction.  We can change that, as
> > long as all cases work correctly.  We previously didn't try to use
> > "CONSTRAINT foobar NOT NULL a" because 1) we didn't support the
> > table-constraint syntax for not-null constraint and 2) not-null
> > constraint didn't support names anyway.  We now support that syntax, so
> > we can use it.
> >
>
> Ok. Here's the patch implementing the same. As you said, it's a much
> simpler patch. The test being developed in [1] passes with this
> change. pg_dump and pg_upgrade test suites also pass.
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAExHW5uvx2LEyrUBdctV5gS25Zeb%2B-eXESkK93siQxWSjYFy6A%40mail.gmail.com#c8ed57b77d2f6132d5b8e1ecb2a8c47b
>
> Adding this to CF for CI run.

CF entry: https://commitfest.postgresql.org/51/5408/

-- 
Best Wishes,
Ashutosh Bapat




Re: [PATCH] kNN for btree

2024-11-28 Thread Kirill Reshke
On Wed, 31 Jul 2024 at 09:46, Anton A. Melnikov
 wrote:
>
> Hi!
>
> Rebased existing patch on current master to have an actual working version.
> There is an inconsistency with commit 5bf748b86.
>
> Reproduction:
> CREATE TABLE test (a int4);
> INSERT INTO test VALUES (2), (3);
> CREATE INDEX test_idx ON test USING btree(a);
> SET enable_seqscan = OFF;
> SELECT * FROM test WHERE a IN (2, 3) ORDER BY a <-> 5;
>
> Actual result:
> postgres=# SELECT * FROM test WHERE a IN (2, 3) ORDER BY a <-> 5;
>   a
> ---
>   3
> (1 row)
>
> Correct expected result:
> postgres=# SELECT * FROM test WHERE a IN (2, 3) ORDER BY a <-> 5;
>   a
> ---
>   3
>   2
> (2 rows)
>
> Reported an issue in the thread corresponding to commit 5bf748b86.
>
> With the best regards,
>
>
> --
> Anton A. Melnikov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
Hi!
Given the little activity here, there is little chance of this being
committed in the current commitfest, so I moved it to the next one.

I will try to take a look at v19 soon.


-- 
Best regards,
Kirill Reshke




Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread jian he
On Thu, Nov 28, 2024 at 2:11 PM David Rowley  wrote:
>

> I think it'll make more sense to adjust some of the
> existing tests to use a unique constraint instead of a PK and then
> adjust a column's NOT NULL property to check that part of the code is
> working correctly.
>
Looking around, I inserted some tests into indexing.sql,
create_index.sql, and aggregates.sql,
and also added tests for sorting by index OID.

> Another issue with this is that the list of indexes being used is not
> sorted by Oid.  If you look at RelationGetIndexList() you'll see that
> we perform a sort. That gives us more consistency in the planner. I
> think this patch should be doing that too, otherwise, you could end up
> with a plan change after some operation that changes the order that
> the indexes are stored in the pg_index table. It's probably fairly
> unlikely, but it is the sort of issue that someone will eventually
> discover and report.
>

Thanks for the tip!
I have wondered about multiple matches; simply choosing the first one
may have surprising results.
I didn't know that in some cases we use list_sort to make the result
more deterministic.
When there are multiple matches, we need a deterministic way to choose which one,
so please check the attached.
From ad5ceaa641fcf719815779da5a86d8b0cf7fd70d Mon Sep 17 00:00:00 2001
From: jian he 
Date: Thu, 28 Nov 2024 20:28:10 +0800
Subject: [PATCH v6 1/1] remove useless group by columns via unique-not-null

In the `remove_useless_groupby_columns` function,
use a unique index and a not-null constraint to eliminate unnecessary columns.
Here, the primary key is treated as a "unique index and not-null".
If there are multiple candidates,
use the unique index with the fewest columns to remove the other columns
in groupby expression.

discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0@Spar
---
 src/backend/catalog/index.c| 147 +
 src/backend/optimizer/plan/planner.c   |  53 ++--
 src/include/catalog/index.h|   1 +
 src/test/regress/expected/aggregates.out   |  59 -
 src/test/regress/expected/create_index.out |  11 +-
 src/test/regress/expected/indexing.out |  11 ++
 src/test/regress/sql/aggregates.sql|  26 +++-
 src/test/regress/sql/create_index.sql  |   4 +-
 src/test/regress/sql/indexing.sql  |   3 +
 9 files changed, 300 insertions(+), 15 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f9bb721c5f..947dec0a2f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -56,6 +56,7 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
+#include "common/int.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -97,6 +98,14 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+/* Struct for sorting list of Bitmapsets in get_unique_not_null_attnos */
+typedef struct
+{
+	Bitmapset	*unique_notnull_attnos;	/* Bitmapsets contains attribute number
+			that are unique and not null*/
+	Oid			indexoid;/*the unique index oid */
+} unique_nn_info;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -131,7 +140,19 @@ static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
 static void SetReindexPending(List *indexes);
 static void RemoveReindexPending(Oid indexOid);
+static int unique_notnull_sort_by_idxoid(const void *a, const void *b);
 
+/*
+ * Comparison function for sorting unique_nn_info in indexoid order.
+ */
+static int
+unique_notnull_sort_by_idxoid(const void *a, const void *b)
+{
+	const unique_nn_info *unique_a = (const unique_nn_info *) a;
+	const unique_nn_info *unique_b = (const unique_nn_info *) b;
+
+	return pg_cmp_u32(unique_a->indexoid, unique_b->indexoid);
+}
 
 /*
  * relationHasPrimaryKey
@@ -4257,3 +4278,129 @@ RestoreReindexState(const void *reindexstate)
 	/* Note the worker has its own transaction nesting level */
 	reindexingNestLevel = GetCurrentTransactionNestLevel();
 }
+
+/*
+ * get_unique_not_null_attnos
+ *		Returns a List of Bitmapsets with the attribute numbers (offset by
+ *		FirstLowInvalidHeapAttributeNumber) of all valid non-partial unique
+ *		indexes which contain only non-nullable columns.
+ */
+List *
+get_unique_not_null_attnos(Oid relid)
+{
+	Bitmapset	*not_null_attnos = NULL;
+	Relation	pg_index;
+	int			j = 0;
+	int			cnt = 0;
+	HeapTuple	indexTuple;
+	SysScanDesc scan;
+	ScanKeyData skey;
+	List 		*not_null_cs;
+	List	   *result = NIL;
+	List	   *indlist = NIL;
+	List	   *indexOidlist = NIL;
+	unique_nn_info *info;
+	ListCell   *lc,
+*lc2;
+
+	/* Use CookedConstraint to fetch not-null constraint. */
+	not_null_cs = RelationGetNotNullConstraints(relid, true, false);
+	if (not_null_cs == NIL)
+		return

Re: Make pg_stat_io view count IOs as bytes instead of blocks

2024-11-28 Thread Bertrand Drouvot
Hi,

On Wed, Nov 27, 2024 at 11:08:01AM -0500, Melanie Plageman wrote:
> On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz  wrote:
> >
> > Currently, in the pg_stat_io view, IOs are counted as blocks. However, 
> > there are two issues with this approach:
> >
> > 1- The actual number of IO requests to the kernel is lower because IO 
> > requests can be merged before sending the final request. Additionally, it 
> > appears that all IOs are counted in block size.
> 
> I think this is a great idea. It will allow people to tune
> io_combine_limit as you mention below.
> 
> > 2- Some IOs may not align with block size. For example, WAL read IOs are 
> > done in variable bytes and it is not possible to correctly show these IOs 
> > in the pg_stat_io view [1].
> 
> Yep, this makes a lot of sense as a solution.

Thanks for the patch! I also think it makes sense.

A few random comments:

=== 1

+   /*
+* If IO done in bytes and byte is <= 0, this means there is an error
+* while doing an IO. Don't count these IOs.
+*/

s/byte/bytes/?

also:

The pgstat_io_count_checks() parameter is uint64. Does it mean it has to be
changed to int64?

Also from what I can see the calls are done with those values:

- 0
- io_buffers_len * BLCKSZ
- extend_by * BLCKSZ
- BLCKSZ

could io_buffers_len and extend_by be < 0? If not, is the comment correct?

=== 2

+   Assert((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == 
IOOP_EXTEND

and

+   if ((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND) 
&&

What about ordering the enum in IOOp (no bytes/bytes) so that we could check
that io_op >= "our first bytes enum" instead?

Also we could create a macro on top of that to make it clear. And a comment
would be needed around the IOOp definition.

I think that would be simpler to maintain should we add no bytes or bytes op in
the future.
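
A rough sketch of that idea (the enum members below are just today's IOOps
rearranged, and the macro name is invented here, not an existing pgstat
definition):

typedef enum IOOp
{
	/* IOOps not tracked in bytes */
	IOOP_EVICT,
	IOOP_FSYNC,
	IOOP_HIT,
	IOOP_REUSE,
	IOOP_WRITEBACK,

	/* IOOps tracked in bytes, keep these last */
	IOOP_EXTEND,
	IOOP_READ,
	IOOP_WRITE,
} IOOp;

#define IOOP_NUM_TYPES (IOOP_WRITE + 1)

/* true if this IOOp is accounted in bytes */
#define pgstat_is_ioop_tracked_in_bytes(io_op) \
	((io_op) >= IOOP_EXTEND && (io_op) < IOOP_NUM_TYPES)

Then both the Assert() and the if () above become a single call of that
macro, and adding a new op only requires putting it on the right side of
the boundary.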

=== 3

+pgstat_io_count_checks(IOObject io_object, IOContext io_context, IOOp io_op, 
uint64 bytes)
+{
+   Assert((unsigned int) io_object < IOOBJECT_NUM_TYPES);
+   Assert((unsigned int) io_context < IOCONTEXT_NUM_TYPES);
+   Assert((unsigned int) io_op < IOOP_NUM_TYPES);
+   Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, 
io_op));

IOObject and IOContext are passed only for the assertions. What about removing
them from there and putting the asserts in other places?

=== 4

+   /* Only IOOP_READ, IOOP_WRITE and IOOP_EXTEND can do IO in bytes. */

Not sure about "can do IO in bytes" (same wording is used in multiple places).

=== 5

/* Convert to numeric. */

"convert to numeric"? to be consistent with others single line comments around.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Added prosupport function for estimating numeric generate_series rows

2024-11-28 Thread David Rowley
On Fri, 29 Nov 2024 at 06:25, 孤傲小二~阿沐  wrote:
> Hello, thank you very much for your attention and guidance. I have fixed
> and improved the issues you mentioned. Version v2 of the patch is attached
> below.

I've only had a quick look at the patch.

* The following needs a DatumGetFloat8():

+ req->rows = numericvar_to_double_no_overflow(&q) + 1;

* It would also be good to see a comment explaining the following line:

+ if(nstep.sign != var_diff.sign)

Something like: /* When the sign of the step size and the series range
don't match, there are no rows in the series. */

* You should add one test for the generate_series(numeric, numeric).
You've only got tests for the 3 arg version.

Also a few minor things:

* Missing space after "if"

+ if(arg3)

* We always have an empty line after variable declarations, that's missing in:

+ NumericVar q;
+ init_var(&q);

* No need for the braces in:

+ if (NUMERIC_IS_SPECIAL(step))
+ {
+ goto cleanup;
+ }

David




Re: Enhancing Memory Context Statistics Reporting

2024-11-28 Thread Rahila Syed
Hi Tomas,

Thank you for the review.

>
>
> 1) I read through the thread, and in general I agree with the reasoning
> for removing the file part - it seems perfectly fine to just dump as
> much as we can fit into a buffer, and then summarize the rest. But do we
> need to invent a "new" limit here? The other places logging memory
> contexts do something like this:
>
>MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
>
> Which means we only print the 100 memory contexts at the top, and that's
> it. Wouldn't that give us a reasonable memory limit too?
>
I think this prints more than 100 memory contexts, since 100 denotes the
max_level, and contexts at each level could have up to 100 children.
This limit seems much higher than what I am currently storing in the DSA,
which is approx. 7000 contexts.  I will verify this again.


> 2) I see the function got renamed to pg_get_process_memory_contexts(),
> but the comment still says pg_get_remote_backend_memory_contexts().

Fixed

>
> 3) I don't see any SGML docs for this new function. I was a bit unsure
> what the "summary" argument is meant to do. The comment does not explain
> that either.
>
Added docs.
The intention behind adding a summary argument is to report statistics for
contexts at levels 0 and 1, i.e. TopMemoryContext and its immediate children.

> 4) I wonder if the function needs to return PID. I mean, the caller
> knows which PID it is for, so it seems rather unnecessary.

Perhaps it can be used to ascertain that the information indeed belongs to
the requested pid.

5) In the "summary" mode, it might be useful to include info about how
> many child contexts were aggregated. It's useful to know whether there
> was 1 child or 1 children. In the regular (non-summary) mode it'd
> always be "1", probably, but maybe it'd interact with the limit in (1).
> Not sure.
>
> Sure,  I will add this in the next iteration.

>
> 6) I feel a bit uneasy about the whole locking / communication scheme.
> In particular, I'm worried about lockups, deadlocks, etc. So I decided
> to do a trivial stress-test - just run the new function through pgbench
> with many clients.
>
> The memstats.sql script does just this:
>
>   SELECT * FROM pg_get_process_memory_contexts(
> (SELECT pid FROM pg_stat_activity
>   WHERE pid != pg_backend_pid()
>   ORDER BY random() LIMIT 1)
> , false);
>
> where the inner query just picks a PID for some other backend, and asks
> for memory context stats for that.
>
> And just run it like this on a scale 1 pgbench database:
>
>   pgbench -n -f memstats.sql -c 10 test
>
> And it gets stuck *immediately*. I've seen it to wait for other client
> backends and auxiliary processes like autovacuum launcher.
>
> This is absolutely idle system, there's no reason why a process would
> not respond almost immediately.


In my reproduction, this issue occurred because the process was terminated
while the requesting backend was waiting on the condition variable to be
signaled by it. I don't see any solution other than having the waiting
client backend time out using ConditionVariableTimedSleep.

In the patch, since the timeout was set to a high value, pgbench ended up
stuck
waiting for the timeout to occur. The failure happens less frequently after
I added an
additional check for the process's existence, but it cannot be entirely
avoided. This is because a process can terminate after we check for its
existence but
before it signals the client. In such cases, the client will not receive
any signal.

> I wonder if e.g. autovacuum launcher may
> not be handling these requests, or what if client backends can wait in a
> cycle.


I did not see a cyclic wait in client backends during the pgbench stress test.


>
> 7) I've also seen this error:
>
>   pgbench: error: client 6 script 0 aborted in command 0 query 0: \
>   ERROR:  can't attach the same segment more than once
>
> I haven't investigated it, but it seems like a problem handling errors,
> where we fail to detach from a segment after a timeout.


Thanks for the hint, fixed by adding a missing call to dsa_detach after
timeout.


>
>   > I opted for DSAs over DSMs to enable memory reuse by freeing
>   > segments for subsequent statistics copies of the same backend,
>   > without needing to recreate DSMs for each request.
>
> I feel like this might be a premature optimization - I don't have a
> clear idea how expensive it is to create DSM per request, but my
> intuition is that it's cheaper than processing the contexts and
> generating the info.
>
> I'd just remove that, unless someone demonstrates it really matters. I
> don't really worry about how expensive it is to process a request
> (within reason, of course) - it will happen only very rarely. It's more
> important to make sure there's no overhead when no one asks the backend
> for memory context info, and simplicity.
>
> Also, how expensive it is to just keep the DSA "just in case"? Imagine
> someone asks for the memory context info once - isn't

Re: Enhancing Memory Context Statistics Reporting

2024-11-28 Thread Tomas Vondra
On 11/29/24 00:23, Rahila Syed wrote:
> Hi Tomas,
> 
> Thank you for the review.
> 
> 
> 
> 1) I read through the thread, and in general I agree with the reasoning
> for removing the file part - it seems perfectly fine to just dump as
> much as we can fit into a buffer, and then summarize the rest. But do we
> need to invent a "new" limit here? The other places logging memory
> contexts do something like this:
> 
>    MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
> 
> Which means we only print the 100 memory contexts at the top, and that's
> it. Wouldn't that give us a reasonable memory limit too?
> 
> I think this prints more than 100 memory contexts, since 100 denotes the
> max_level
> and contexts at each level could have up to 100 children. This limit
> seems much higher than
> what I am currently storing in DSA which is approx. 7000 contexts.  I
> will verify this again.
>  

Yeah, you may be right. I don't remember what exactly that limit does.

> 
> 2) I see the function got renamed to pg_get_process_memory_contexts(),
> but the comment still says pg_get_remote_backend_memory_contexts().
> 
> Fixed 
> 
> 
> 3) I don't see any SGML docs for this new function. I was a bit unsure
> what the "summary" argument is meant to do. The comment does not explain
> that either.
> 
> Added docs. 
> Intention behind adding a summary argument is to report statistics of
> contexts at level 0 
> and 1 i.e TopMemoryContext and its immediate children. 
> 

OK

> 4) I wonder if the function needs to return PID. I mean, the caller
> knows which PID it is for, so it seems rather unnecessary.
> 
> Perhaps it can be used to ascertain that the information indeed belongs to 
> the requested pid.
> 

I find that a bit ... suspicious. By this logic we'd include the input
parameters in every result, but we don't. So why is this case different?

> 5) In the "summary" mode, it might be useful to include info about how
> many child contexts were aggregated. It's useful to know whether there
> was 1 child or 1 children. In the regular (non-summary) mode it'd
> always be "1", probably, but maybe it'd interact with the limit in (1).
> Not sure.
> 
> Sure,  I will add this in the next iteration. 
> 

OK

> 
> 6) I feel a bit uneasy about the whole locking / communication scheme.
> In particular, I'm worried about lockups, deadlocks, etc. So I decided
> to do a trivial stress-test - just run the new function through pgbench
> with many clients.
> 
> The memstats.sql script does just this:
> 
>   SELECT * FROM pg_get_process_memory_contexts(
>     (SELECT pid FROM pg_stat_activity
>       WHERE pid != pg_backend_pid()
>       ORDER BY random() LIMIT 1)
>     , false);
> 
> where the inner query just picks a PID for some other backend, and asks
> for memory context stats for that.
> 
> And just run it like this on a scale 1 pgbench database:
> 
>   pgbench -n -f memstats.sql -c 10 test
> 
> And it gets stuck *immediately*. I've seen it to wait for other client
> backends and auxiliary processes like autovacuum launcher.
> 
> This is absolutely idle system, there's no reason why a process would
> not respond almost immediately.
> 
>  
> In my reproduction, this issue occurred because the process was terminated 
> while the requesting backend was waiting on the condition variable to be 
> signaled by it. I don’t see any solution other than having the waiting
> client 
> backend timeout using ConditionVariableTimedSleep.
> 
> In the patch, since the timeout was set to a high value, pgbench ended
> up stuck 
> waiting for the timeout to occur. The failure happens less frequently
> after I added an
> additional check for the process's existence, but it cannot be entirely 
> avoided. This is because a process can terminate after we check for its
> existence but 
> before it signals the client. In such cases, the client will not receive
> any signal.
> 

Hmmm, I see. I guess there's no way to know if a process responds to us,
but I guess it should be possible to wake up regularly and check if the
process still exists? Wouldn't that solve the case you mentioned?
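
Something along these lines is what I have in mind (pure sketch, untested;
memstats_cv, memory_stats_ready(), target_pid and the wait event are
placeholders for whatever the patch actually uses):

    ConditionVariablePrepareToSleep(&memstats_cv);
    for (;;)
    {
        if (memory_stats_ready())       /* placeholder for the real check */
            break;

        /* give up as soon as the target process has exited */
        if (BackendPidGetProc(target_pid) == NULL &&
            AuxiliaryPidGetProc(target_pid) == NULL)
            break;

        /* sleep at most a second, then re-check */
        ConditionVariableTimedSleep(&memstats_cv, 1000,
                                    WAIT_EVENT_GET_MEMORY_CONTEXT);
    }
    ConditionVariableCancelSleep();

That way a terminated process can only delay the caller by one sleep
interval, not by the whole timeout.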

> I wonder if e.g. autovacuum launcher may
> not be handling these requests, or what if client backends can wait in a
> cycle.
> 
>  
> Did not see a cyclic wait in client backends due to the pgbench stress test.
>  

Not sure, but if I modify the query to only request memory contexts from
non-client processes, i.e.

  SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
  WHERE pid != pg_backend_pid()
AND backend_type != 'client backend'
  ORDER BY random() LIMIT 1)
, false);

then it gets stuck and reports this:

  pgbench -n -f select.sql -c 4 -T 10 test
  pgbench (18devel)
  WARNING:  Wait for 105029 process to publish stats timed out, ...

But process 105

Re: More CppAsString2() in psql's describe.c

2024-11-28 Thread Tom Lane
Daniel Gustafsson  writes:
> On 28 Nov 2024, at 19:25, Corey Huinker  wrote:
>>> I sort of wish we had a shorter macro as CppAsString2() gets pretty verbose
>>> when used frequently like this.

>> I don't quite understand the etymology of the name (it's some variation on 
>> C++'s std::to_string, plus...something), but if I did, I'd probably find the 
>> name less icky.

> AFAIK, Cpp stands for "C Pre Processor" as it is a preprocessor expansion into
> a string, and the 2 is simply because CppAsString2() calls CppAsString().  The
> reason for the call is to perform macro expansion before converting to a string.

Yeah.  That name was fine when there were just a few occurrences,
but now it seems we've got hundreds, so +1 for something shorter.

I don't like suggestions as generic as STR() though; that's not very
meaningful and also seems like it might collide with some other usage.
How about something like SYM2LIT ("symbol to literal") or SYM2STR?
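
Whichever name wins, the alias itself would be a trivial one-liner on top
of the existing macro, e.g. (name not settled, just a sketch):

    /* shorter alias; semantics identical to CppAsString2() */
    #define SYM2STR(symbol) CppAsString2(symbol)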

regards, tom lane




Re: Changing shared_buffers without restart

2024-11-28 Thread Matthias van de Meent
On Thu, 28 Nov 2024 at 19:57, Tom Lane  wrote:
>
> Matthias van de Meent  writes:
> > On Thu, 28 Nov 2024 at 18:19, Robert Haas  wrote:
> >> [...] It's unclear to me why
> >> operating systems don't offer better primitives for this sort of thing
> >> -- in theory there could be a system call that sets aside a pool of
> >> address space and then other system calls that let you allocate
> >> shared/unshared memory within that space or even at specific
> >> addresses, but actually such things don't exist.
>
> > Isn't that more a stdlib/malloc issue? AFAIK, Linux's mmap(2) syscall
> > allows you to request memory from the OS at arbitrary addresses - it's
> > just that stdlib's malloc doens't expose the 'alloc at this address'
> > part of that API.
>
> I think what Robert is concerned about is that there is exactly 0
> guarantee that that will succeed, because you have no control over
> system-driven allocations of address space (for example, loading
> of extensions or JIT code).  In fact, given things like ASLR, there
> is pressure on the kernel crew to make that *less* predictable not
> more so.

I see what you mean, but I think that shouldn't be much of an issue.
I'm not a kernel hacker, but I've never heard about anyone arguing to
remove mmap's mapping-overwriting behavior for user-controlled
mappings - it seems too useful as a way to guarantee relative memory
addresses (agreed, there is now mseal(2), but that is the user asking
for security on their own mapping, this isn't applied to arbitrary
mappings).

I mean, we can do the following to get a nice contiguous empty address
space no other mmap(NULL)s will get put into:

/* reserve size bytes of memory */
base = mmap(NULL, size, PROT_NONE, ...flags, ...);
/* use the first small_size bytes of that reservation */
allocated_in_reserved = mmap(base, small_size, PROT_READ |
PROT_WRITE, MAP_FIXED, ...);

With the PROT_NONE protection option the OS doesn't actually allocate
any backing memory, but guarantees no other mmap(NULL, ...) will get
placed in that area such that it overlaps with that allocation until
the area is munmap-ed, thus allowing us to reserve a chunk of address
space without actually using (much) memory. Deallocations have to go
through mmap(... PROT_NONE, ...) instead of munmap if we'd want to
keep the full area reserved, but I think that's not that much of an
issue.

I also highly doubt Linux will remove or otherwise limit the PROT_NONE
option to such a degree that we won't be able to "balloon" the memory
address space for (e.g.) dynamic shared buffer resizing.

See also: FreeBSD's MAP_GUARD mmap flag, Window's MEM_RESERVE and
MEM_RESERVE_PLACEHOLDER flags for VirtualAlloc[2][Ex].
See also [0] where PROT_NONE is explicitly called out as a tool for
reserving memory address space.

> So even if we devise a method that seems to work reliably
> today, we could have little faith that it would work with next year's
> kernels.

I really don't think that userspace memory address space reservations
through e.g. PROT_NONE or MEM_RESERVE[_PLACEHOLDER] will be retired
anytime soon, at least not without the relevant kernels also providing
effective alternatives.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.gnu.org/software/libc/manual/html_node/Memory-Protection.html




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-11-28 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 28 Nov 2024 19:02:57 +0800,
  Junwang Zhao  wrote:

>> > I tested 3 branches:
>> >
>> > 1. the master branch
>> > 2. all v26 patch sets applied
>> > 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
>> > if branch in CopyOneRowTo, so I was expecting this slower than master
>> >
>> > You can see the detailed results here[2], I can not upload files so I
>> > just shared the google doc link, ping me if you can not open the link.
>> >
>> > [1]: 
>> > https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
>> > [2]: 
>> > https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
>>
>> Thanks for sharing your numbers.
>>
>> 1. and 2. shows that there is at least no significant
>> performance regression.
> 
> Agreed.

Can we focus on only 1. and 2. in this thread?

>> I see the patch set of 3. and I think that the result
>> (there is no performance difference between 1. and 3.) isn't
>> strange. The patch set adds some if branches but they aren't
>> used with "text" format at least in per row process.
> 
> It is not used in "text" format, but it adds some assembly code
> to the CopyOneRowTo function, so this will have some impact
> on the cpu i cache I guess.
> 
> There is a difference between 1 and 3: 3 is always better than 1,
> up to a 4% improvement

Can we discuss 1. and 3. in the [1] thread?

(Anyway, we may want to confirm whether these numbers are
reproducible or not as the first step.)

>  I forgot to mention that the comparisons
> are in *sheet2*.

Thanks. I missed it.


Thanks,
-- 
kou




Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread jian he
Minor changes in get_unique_not_null_attnos:

* cosmetic changes
* if the returned list length is less than 2, there is no need to sort.
From 58094504364d8a9eb51cbc77a75626aef39b17f0 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 29 Nov 2024 09:23:27 +0800
Subject: [PATCH v7 1/1] remove useless group by columns via unique-not-null

In the `remove_useless_groupby_columns` function,
use a unique index and a not-null constraint to eliminate unnecessary columns.
Here, the primary key is treated as a "unique index and not-null".
If there are multiple candidates,
use the unique index with the fewest columns to remove the other columns
in groupby expression.

discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0@Spar
---
 src/backend/catalog/index.c| 149 +
 src/backend/optimizer/plan/planner.c   |  53 ++--
 src/include/catalog/index.h|   1 +
 src/test/regress/expected/aggregates.out   |  59 +++-
 src/test/regress/expected/create_index.out |  11 +-
 src/test/regress/expected/indexing.out |  11 ++
 src/test/regress/sql/aggregates.sql|  26 +++-
 src/test/regress/sql/create_index.sql  |   4 +-
 src/test/regress/sql/indexing.sql  |   3 +
 src/tools/pgindent/typedefs.list   |   1 +
 10 files changed, 303 insertions(+), 15 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f9bb721c5f..2555997c3c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -56,6 +56,7 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
+#include "common/int.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -97,6 +98,14 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+/* Struct for sorting list of Bitmapsets in get_unique_not_null_attnos */
+typedef struct
+{
+	Bitmapset	*unique_notnull_attnos;	/* Bitmapset contains attribute number
+			that are unique and not null*/
+	Oid			indexoid;/*the unique index oid */
+} unique_nn_info;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -131,7 +140,19 @@ static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
 static void SetReindexPending(List *indexes);
 static void RemoveReindexPending(Oid indexOid);
+static int unique_notnull_sort_by_idxoid(const void *a, const void *b);
 
+/*
+ * Comparison function for sorting unique_nn_info in indexoid order.
+ */
+static int
+unique_notnull_sort_by_idxoid(const void *a, const void *b)
+{
+	const unique_nn_info *unique_a = (const unique_nn_info *) a;
+	const unique_nn_info *unique_b = (const unique_nn_info *) b;
+
+	return pg_cmp_u32(unique_a->indexoid, unique_b->indexoid);
+}
 
 /*
  * relationHasPrimaryKey
@@ -4257,3 +4278,131 @@ RestoreReindexState(const void *reindexstate)
 	/* Note the worker has its own transaction nesting level */
 	reindexingNestLevel = GetCurrentTransactionNestLevel();
 }
+
+/*
+ * get_unique_not_null_attnos
+ *		Returns a sorted List of Bitmapsets with the attribute numbers (offset
+ *		by FirstLowInvalidHeapAttributeNumber) of all valid non-partial unique
+ *		indexes which contain only non-nullable columns.
+ */
+List *
+get_unique_not_null_attnos(Oid relid)
+{
+	Bitmapset	*not_null_attnos = NULL;
+	Relation	pg_index;
+	int			j = 0;
+	HeapTuple	indexTuple;
+	SysScanDesc scan;
+	ScanKeyData skey;
+	List 		*not_null_cs;
+	List	   *result = NIL;
+	List	   *indlist = NIL;
+	List	   *indexOidlist = NIL;
+	unique_nn_info *info	= NULL;
+	ListCell   *lc,
+*lc2;
+
+	/* Use CookedConstraint to fetch not-null constraint. */
+	not_null_cs = RelationGetNotNullConstraints(relid, true, false);
+	if (not_null_cs == NIL)
+		return NULL;
+
+	foreach_ptr(CookedConstraint, cc, not_null_cs)
+		not_null_attnos = bms_add_member(not_null_attnos, cc->attnum - FirstLowInvalidHeapAttributeNumber);
+
+	/* Scan pg_index for unique index of the target rel */
+	pg_index = table_open(IndexRelationId, AccessShareLock);
+
+	ScanKeyInit(&skey,
+Anum_pg_index_indrelid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(relid));
+	scan = systable_beginscan(pg_index, IndexIndrelidIndexId, true,
+			  NULL, 1, &skey);
+
+	while (HeapTupleIsValid(indexTuple = systable_getnext(scan)))
+	{
+		Bitmapset  	*unique_notnull_attnos = NULL;
+		Form_pg_index index = (Form_pg_index) GETSTRUCT(indexTuple);
+		Datum		adatum;
+		bool		isNull;
+		bool		all_notnull = true;
+		int			numkeys;
+		Datum	   *keys;
+		int			nelems;
+
+		/* we're only interested in unique index */
+		if (!index->indisunique)
+			continue;
+
+		/* Skip invalid or deferred index */
+		if (!index->indisvalid || !index->indimmediate)
+			continue;
+
+		/* Skip expression index or predicate index */
+		if (!heap_attisnull(indexT

Re: Added prosupport function for estimating numeric generate_series rows

2024-11-28 Thread Tom Lane
David Rowley  writes:
> Also a few minor things:
> * Missing space after "if"
> + if(arg3)
> * We always have an empty line after variable declarations, that's missing in:
> + NumericVar q;
> + init_var(&q);

This sort of stuff is best addressed by running the code through
pgindent, rather than fixing it manually.  Usually we don't insist
on submitters getting it right; the committer should pgindent it.

regards, tom lane




Re: CI CompilerWarnings test fails on 15 in mingw_cross_warning

2024-11-28 Thread Thomas Munro
Pushed.  CI has finally returned to its full verdant splendour in the
back-branches.




Re: More CppAsString2() in psql's describe.c

2024-11-28 Thread Michael Paquier
On Thu, Nov 28, 2024 at 10:40:35AM +0100, Daniel Gustafsson wrote:
> LGTM, I didn't scan for omissions but the ones in the patch look right to me.
> I sort of wish we had a shorter macro as CppAsString2() get's pretty verbose
> when used frequently like this.

Thanks for the reviews, applied what I had.  I've managed to miss a
c.contype = 'n' in describeOneTableDetails(), actually, so I've fixed
it while on it.

Here are some notes about the state of things in describe.c:
- RELKIND_SPECIAL does not exist anymore, see listTables().
- pg_depend.deptype, that cannot be changed currently as
  DependencyType is in dependency.h.
- pol.polcmd wants some ACLs now in acl.h.
- tgenabled has some hardcoded values now in trigger.h.
- Event triggers with evtenabled has values in trigger.h
(TRIGGER_FIRES_ON_ORIGIN).
- attcompression has two values for lz4 and pglz.
- substream, where LOGICALREP_STREAM_ON & friends should be added to
pg_subscription_d.h to allow the move.  Perhaps we should just do that
for the subscription case.  That's worth a patch of its own as it
changes pg_subscription_d.h.  I'll spawn a new thread about that.
--
Michael


signature.asc
Description: PGP signature


Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alexander Korotkov
On Thu, Nov 28, 2024 at 9:33 PM Alena Rybakina
 wrote:
>
> On 28.11.2024 22:28, Ranier Vilela wrote:
>
> Em qui., 28 de nov. de 2024 às 16:03, Alena Rybakina 
>  escreveu:
>>
>> Hi! Thank you for the case.
>>
>> On 28.11.2024 21:00, Alexander Lakhin wrote:
>> > Hello Alexander,
>> >
>> > 21.11.2024 09:34, Alexander Korotkov wrote:
>> >> I'm going to push this if no objections.
>> >
>> > Please look at the following query, which triggers an error after
>> > ae4569161:
>> > SET random_page_cost = 1;
>> > CREATE TABLE tbl(u UUID);
>> > CREATE INDEX idx ON tbl USING HASH (u);
>> > SELECT COUNT(*) FROM tbl WHERE u = '' OR
>> >   u = '';
>> >
>> > ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
>> > LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625
>> >
>> >
>> I found out what the problem is: the index scan method was not generated. We
>> need to check this during the OR-clause to SAOP transformation.
>>
>> There is a patch to fix this problem.
>
> Hi.
> Thanks for the quick fix.
>
> But I wonder if it is not possible to avoid all if the index is useless?
> Maybe moving your fix to the beginning of the function?
>
> diff --git a/src/backend/optimizer/path/indxpath.c 
> b/src/backend/optimizer/path/indxpath.c
> index d827fc9f4d..5ea0b27d01 100644
> --- a/src/backend/optimizer/path/indxpath.c
> +++ b/src/backend/optimizer/path/indxpath.c
> @@ -3248,6 +3248,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
>   Assert(IsA(orclause, BoolExpr));
>   Assert(orclause->boolop == OR_EXPR);
>
> + /* Ignore index if it doesn't support index scans */
> + if(!index->amsearcharray)
> + return NULL;
> +
>
> Agreed. I have updated the patch
>
>   /*
>   * Try to convert a list of OR-clauses to a single SAOP expression. Each
>   * OR entry must be in the form: (indexkey operator constant) or (constant
>
> The test for the bug:
> EXPLAIN SELECT COUNT(*) FROM tbl WHERE u = '' 
> OR u = '';
> QUERY PLAN
> --
>  Aggregate  (cost=12.46..12.47 rows=1 width=8)
>->  Bitmap Heap Scan on tbl  (cost=2.14..12.41 rows=18 width=0)
>  Recheck Cond: ((u = '----'::uuid) OR 
> (u = '----'::uuid))
>  ->  BitmapOr  (cost=2.14..2.14 rows=18 width=0)
>->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 width=0)
>  Index Cond: (u = 
> '----'::uuid)
>->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 width=0)
>  Index Cond: (u = 
> '----'::uuid)
> (8 rows)

I slightly revised the fix and added a similar check to
group_similar_or_args().  Could you please review that before
commit?

--
Regards,
Alexander Korotkov
Supabase


v3-0001-Skip-not-SOAP-supported-indexes-while-transformin.patch
Description: Binary data


Re: Truncate logs by max_log_size

2024-11-28 Thread Jim Jones



On 28.11.24 20:20, Kirill Gavrilov wrote:
>   Here is version 3 of this patch. I found another place where this
> setting can be applied. 
>   Also added some documentation and specified that this setting
> truncates queries by size in bytes.

Thanks. It is now possible to change the parameter using SET

postgres=# SET max_log_size TO 15;
SET
postgres=# SHOW max_log_size ;
 max_log_size
--
 15
(1 row)


In the postgresql.conf the default value is set to 0

#max_log_size = 0

But if we take a look at the parameter it shows something else

postgres=# SHOW max_log_size ;
 max_log_size
--
 5242880
(1 row)

Perhaps it would be best to keep the default value in the
postgresql.conf consistent with the documentation and the actual
max_log_size value. IMHO 0 (or -1) should disable it.
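
For reference, a sketch of how the guc_tables.c entry might look with 0 as
the boot value meaning "disabled" (group, context and wording below are only
guesses on my part, not taken from the patch):

    {
        {"max_log_size", PGC_SUSET, LOGGING_WHAT,
            gettext_noop("Sets the maximum length of logged statements, in bytes."),
            gettext_noop("Zero disables truncation."),
            GUC_UNIT_BYTE
        },
        &max_log_size,
        0, 0, INT_MAX,
        NULL, NULL, NULL
    },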

There are also several compilation issues:

postgres.c: In function ‘exec_simple_query’:
postgres.c:1040:9: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
 1040 | bool copied = false;
  | ^~~~
elog.c: In function ‘EmitErrorReport’:
elog.c:1697:9: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
 1697 | const char* old_query_string = debug_query_string;
  | ^
elog.c:1744:23: warning: passing argument 1 of ‘pfree’ discards ‘const’
qualifier from pointer target type [-Wdiscarded-qualifiers]
 1744 | pfree(debug_query_string);
  |   ^~
In file included from ../../../../src/include/postgres.h:47,
 from elog.c:55:
../../../../src/include/utils/palloc.h:86:25: note: expected ‘void *’
but argument is of type ‘const char *’
   86 | extern void pfree(void *pointer);
  |   ~~^~~
elog.c: In function ‘build_query_log’:
elog.c:3798:9: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
 3798 | size_t query_len = strlen(query);
  | ^~
elog.c:3801:24: warning: return discards ‘const’ qualifier from pointer
target type [-Wdiscarded-qualifiers]
 3801 | return query;
  |    ^
elog.c:3805:9: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
 3805 | size_t query_log_len = pg_mbcliplen(query, query_len,
max_log_size);
  | ^~

and there is a trailing whitespace at:

+ *    If query needs to be truncated, copied will be set to true

/home/jim/patches/max_log_query/V3-0001-parameter-max_log_size-to-truncate-logs.patch:141:
trailing whitespace.
 *  If query needs to be truncated, copied will be set to true
Checking patch doc/src/sgml/config.sgml...
Hunk #1 succeeded at 7865 (offset -48 lines).
Checking patch src/backend/tcop/postgres.c...
Hunk #1 succeeded at 71 (offset 1 line).
Hunk #2 succeeded at 1031 (offset 1 line).
Hunk #3 succeeded at 1083 (offset 1 line).
Hunk #4 succeeded at 1382 (offset 1 line).
Hunk #5 succeeded at 1393 (offset 1 line).
Checking patch src/backend/utils/error/elog.c...
Hunk #4 succeeded at 3781 (offset -3 lines).
Checking patch src/backend/utils/misc/guc_tables.c...
Hunk #1 succeeded at 3714 (offset -10 lines).
Checking patch src/backend/utils/misc/postgresql.conf.sample...
Hunk #1 succeeded at 615 (offset -2 lines).
Checking patch src/bin/pg_ctl/t/004_logrotate.pl...
Checking patch src/include/utils/elog.h...
Applied patch doc/src/sgml/config.sgml cleanly.
Applied patch src/backend/tcop/postgres.c cleanly.
Applied patch src/backend/utils/error/elog.c cleanly.
Applied patch src/backend/utils/misc/guc_tables.c cleanly.
Applied patch src/backend/utils/misc/postgresql.conf.sample cleanly.
Applied patch src/bin/pg_ctl/t/004_logrotate.pl cleanly.
Applied patch src/include/utils/elog.h cleanly.
warning: 1 line adds whitespace errors.

-- 
Jim





Re: Changing shared_buffers without restart

2024-11-28 Thread Tom Lane
Matthias van de Meent  writes:
> I mean, we can do the following to get a nice contiguous empty address
> space no other mmap(NULL)s will get put into:

> /* reserve size bytes of memory */
> base = mmap(NULL, size, PROT_NONE, ...flags, ...);
> /* use the first small_size bytes of that reservation */
> allocated_in_reserved = mmap(base, small_size, PROT_READ |
> PROT_WRITE, MAP_FIXED, ...);

> With the PROT_NONE protection option the OS doesn't actually allocate
> any backing memory, but guarantees no other mmap(NULL, ...) will get
> placed in that area such that it overlaps with that allocation until
> the area is munmap-ed, thus allowing us to reserve a chunk of address
> space without actually using (much) memory.

Well, that's all great if it works portably.  But I don't see one word
in either POSIX or the Linux mmap(2) man page that promises those
semantics for PROT_NONE.  I also wonder how well a giant chunk of
"unbacked" address space will interoperate with the OOM killer,
top(1)'s display of used memory, and other things that have caused us
headaches with large shared-memory arenas.

Maybe those issues are all in the past and this'll work great.
I'm not holding my breath though.

regards, tom lane




Re: Remove an unnecessary check on semijoin_target_ok() on postgres_fdw.c

2024-11-28 Thread Tender Wang
On Fri, Nov 29, 2024 at 00:02, Alexander Pyhalov wrote:

> Tender Wang писал(а) 2024-10-09 10:26:
> > Hi,
> >When I debug FDW join pushdown codes, I found below codes in
> > semijoin_target_ok():
> > if (bms_is_member(var->varno, innerrel->relids) &&
> >
> > !bms_is_member(var->varno, outerrel->relids))
> >
> > As far as I know, if a var belongs to the innerrel of joinrel, it's
> > not possible that it
> > may belong to the outerrel. So if the bms_is_member(var->varno,
> > innerrel->relids)
> > returns TRUE, then !bms_is_member(var->varno, outerrel->relids) must
> > be TRUE.
> > If bms_is_member(var->varno, innerrel->relids) returns FALSE,
> > !bms_is_member(var->varno, outerrel->relids) will not execute due to
> > short circuit.
> >
> > So I think we can remove the "!bms_is_member(var->varno,
> > outerrel->relids)" from if.
> > Any thoughts?
>
> Hi.
> Seems good to me.
> --
> Best regards,
> Alexander Pyhalov,
> Postgres Professional
>

Thanks for looking at that.

-- 
Thanks,
Tender Wang


Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread David Rowley
On Fri, 29 Nov 2024 at 01:33, jian he  wrote:
>
> On Thu, Nov 28, 2024 at 2:11 PM David Rowley  wrote:
> > I think it'll make more sense to adjust some of the
> > existing tests to use a unique constraint instead of a PK and then
> > adjust a column's NOT NULL property to check that part of the code is
> > working correctly.
> >
> looking around,  i inserted some tests to indexing.sql,
> create_index.sql, aggregates.sql
> also added tests for sort by index oid.

I meant the tests added by d4c3a156c. I don't think there's any need
to have tests in more than one file.

> > Another issue with this is that the list of indexes being used is not
> > sorted by Oid.

> When there are multiple matches, we need a determined way  to choose which 
> one.
> so please check attached.

Looking closer at get_relation_info(), it looks like the
RelOptInfo.indexlist will be in descending Oid order, per:

/*
* We've historically used lcons() here.  It'd make more sense to
* use lappend(), but that causes the planner to change behavior
* in cases where two indexes seem equally attractive.  For now,
* stick with lcons() --- few tables should have so many indexes
* that the O(N^2) behavior of lcons() is really a problem.
*/
indexinfos = lcons(info, indexinfos);

I'm starting to feel like we might be trying to include too much logic
to mimic this behaviour.  The main problem is that the RelOptInfos
have not yet been created for the given relation when
remove_useless_groupby_columns() is called. That's why we're having to
query the catalogue tables like this.  I do see a comment in
preprocess_groupclauses() that mentions:

 * Note: we return a fresh List, but its elements are the same
 * SortGroupClauses appearing in parse->groupClause.  This is important
 * because later processing may modify the processed_groupClause list.

So there's already a warning that there may still be future changes to
the processed_groupClause. It'd be nice if we could delay calling
remove_useless_groupby_columns() until the RelOptInfos are built as
we'd not have to query the catalogue tables to make this work. The
soonest point is just after the call to add_base_rels_to_query() in
query_planner().  That would mean doing more work on the
processed_groupClause in query_planner() rather than in
grouping_planner(), which is a little strange. It looks like the first
time we look at processed_groupClause and make a decision based on the
actual value of it is in standard_qp_callback(), i.e. after
add_base_rels_to_query().

I've attached an updated patch that gets rid of the
get_unique_not_null_attnos() function completely and uses the
RelOptInfo.indexlist and RelOptInfo.notnullattnums.  I also realised
that there's no need to check for NOT NULL constraints with unique
indexes defined with NULLS NOT DISTINCT as having NULLs unique means
we can still determine the functional dependency. One other possibly
good outcome of this is that it's much easier to do the
pg_statistic.avgwidth thing I talked about as RelOptInfo stores those
in attr_widths, however, looking at
table_block_relation_estimate_size(), I see we might not always
populate those.
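
For reference, the kind of case this transformation targets looks like the
following (an illustrative example only; the table and column names are made
up, not taken from the patch's regression tests):

CREATE TABLE t (a int NOT NULL, b int, c int);
CREATE UNIQUE INDEX t_a_uniq ON t (a);

-- With a unique index on the NOT NULL column "a", columns "b" and "c" are
-- functionally dependent on "a", so the planner can drop them from the
-- GROUP BY and group on "a" alone.
EXPLAIN (COSTS OFF) SELECT a, b, c FROM t GROUP BY a, b, c;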

I ended up re-homing remove_useless_groupby_columns() in initsplan.c.
I couldn't see anywhere else it fitted.

David


v8-0001-remove-useless-group-by-columns-via-unique-not-nu.patch
Description: Binary data


Memory leak in WAL sender with pgoutput (v10~)

2024-11-28 Thread Michael Paquier
Hi all,

This is a follow-up of ea792bfd93ab and this thread where I've noticed
that some memory was still leaking when running sysbench with a
logical replication setup:
https://www.postgresql.org/message-id/zz7srcbxxhoyn...@paquier.xyz

Reproducing the problem is quite simple, and can be done with the
following steps (please adapt, like previous thread):
1) Initialize a primary and set up some empty tables:
NUM_TABLES=500
PRIMARY_PORT=5432
STANDBY_PORT=5433
sysbench oltp_read_only --db-driver=pgsql \
 --pgsql-port=$PRIMARY_PORT \
 --pgsql-db=postgres \
 --pgsql-user=postgres \
 --pgsql-host=/tmp \
 --tables=$NUM_TABLES --table_size=0 \
 --report-interval=1 \
 --threads=1 prepare
2) Create a standby, promote it:
STANDBY_DATA=/define/your/standby/path/
pg_basebackup --checkpoint=fast -D $STANDBY_DATA -p $PRIMARY_PORT -h /tmp/ -R
echo "port = $STANDBY_PORT" >> $STANDBY_DATA/postgresql.conf
pg_ctl -D $STANDBY_DATA start
pg_ctl -D $STANDBY_DATA promote
3) Publication and subscription
psql -p $PRIMARY_PORT -c "CREATE PUBLICATION pub1 FOR ALL TABLES;"
psql -p $STANDBY_PORT -c "CREATE SUBSCRIPTION sub1 CONNECTION
'port=$PRIMARY_PORT user=postgres dbname=postgres PUBLICATION pub1 WITH
(enabled, create_slot, slot_name='pub1_to_sub1', copy_data=false);"
4) Run sysbench:
sysbench oltp_write_only --db-driver=pgsql \
 --pgsql-port=$PRIMARY_PORT \
 --pgsql-db=postgres \
 --pgsql-user=postgres \
 --pgsql-host=/tmp \
 --tables=$NUM_TABLES --table_size=100 \
 --report-interval=1 \
 --threads=$NUM_THREADS run \
 --time=5000

With that, I've mentioned internally that there was an extra leak and
left the problem lying aside for a few days before being able to come
back to it.  In the meantime, Jeff Davis reviewed the code and noted
that we leak memory in CacheMemoryContext: get_rel_sync_entry() frees
the publication list with list_free_deep(), but the publication names
are pstrdup()'d in GetPublication(), and list_free_deep() does not know
about that, so the memory of the strdup'd publication names stays
around.  That's wrong.

Back to today, I've done more benchmarking using the above steps and
logged periodic memory samples of the WAL sender with
pg_log_backend_memory_contexts() (every 100s, 50 data points), and analyzed the
evolution of the whole thing. "Grand Total" used memory keeps
increasing due to one memory context: CacheMemoryContext.  So yes,
Jeff has spotted what looks like the only leak we have. 

Attached is a graph that shows the evolution of memory between HEAD
and a patched version for the used memory of CacheMemoryContext
(ylabel is in bytes, could not get that right as my gnuplot skills are
bad).

One thing to note in this case is that I've made the leak more
noticeable with the following tweak, which forces the publications to
be reloaded each time we go through get_rel_sync_entry(); otherwise the
memory takes much longer to pile up.  It does not change the problem,
it just speeds up the detection by a large factor:
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2061,12 +2061,13 @@ get_rel_sync_entry(PGOutputData *data, Relation 
relation)
 List   *rel_publications = NIL;
 
 /* Reload publications if needed before use. */
-if (!publications_valid)
+elog(WARNING, "forcing reload publications");
 {
 oldctx = MemoryContextSwitchTo(CacheMemoryContext);
 if (data->publications)
 {
 list_free_deep(data->publications);

This problem has existed since the introduction of logical replication,
down to v10.  It would be sad to add an extra loop over the publications
to free the publication names explicitly, and the issue would still be
hidden from the existing callers of GetPublication(), so I'd suggest
changing the Publication structure to use a NAMEDATALEN string so that
the name is freed together with the struct, as in the attached.
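
Roughly, the proposed structure change looks like this (a sketch; the
other fields of Publication are elided and unchanged):

typedef struct Publication
{
	Oid		oid;
	char	name[NAMEDATALEN];	/* was "char *name", pstrdup()'d by
								 * GetPublication() */
	/* ... remaining fields unchanged ... */
} Publication;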

That means an ABI break.  I've looked around and did not notice any
out-of-core code relying on sizeof(Publication).  That does not mean
that nothing would break, of course, but the odds should be pretty
good as this is a rather internal structure.  One way would be to just
fix that on HEAD and call it a day, but I'm aware of heavy logical
replication users whose workloads could see this memory bloat because
they keep WAL senders around for long periods.

Thoughts or comments?
--
Michael
0,3968872,4014248
100,4011832,4014248
200,3997928,4001048
300,404,4001048
400,4040608,3984520
500,4042552,3999448
600,4053576,3999448
700,4067464,3938632
800,4068992,3998120
900,4080632,3998120
1000,4090056,3890312
1100,4083968,3998120
1200,4106728,3998120
1300,4119200,3949688
1400,4104680,3998120
1500,4134536,3998120
1600,4144408,3905128
1700,4137608,3998120
1800,4159744,3998120
1900,4171488

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-11-28 Thread vignesh C
On Wed, 27 Nov 2024 at 16:25, Nisha Moond  wrote:
>
> On Wed, Nov 27, 2024 at 8:39 AM Peter Smith  wrote:
> >
> > Hi Nisha,
> >
> > Here are some review comments for the patch v50-0002.
> >
> > ==
> > src/backend/replication/slot.c
> >
> > InvalidatePossiblyObsoleteSlot:
> >
> > 1.
> > + if (now &&
> > + TimestampDifferenceExceeds(s->inactive_since, now,
> > +replication_slot_inactive_timeout_sec * 1000))
> >
> > Previously this was using an additional call to 
> > SlotInactiveTimeoutCheckAllowed:
> >
> > + if (SlotInactiveTimeoutCheckAllowed(s) &&
> > + TimestampDifferenceExceeds(s->inactive_since, now,
> > +replication_slot_inactive_timeout * 1000))
> >
> > Is it OK to skip that call? e.g. can the slot fields possibly change
> > between assigning the 'now' and acquiring the mutex? If not, then the
> > current code is fine. The only reason for asking is because it is
> > slightly suspicious that it was not done this "easy" way in the first
> > place.
> >
> Good catch! While the mutex was being acquired right after the now
> assignment, there was a rare chance of another process modifying the
> slot in the meantime. So, I reverted the change in v51. To optimize
> the SlotInactiveTimeoutCheckAllowed() call, it's sufficient to check
> it here instead of during the 'now' assignment.
>
> Attached v51 patch-set addressing all comments in [1] and [2].

Few comments:
1) replication_slot_inactive_timeout could be mentioned in the logical
replication configuration documentation, with something like:
"Logical replication slots are also affected by replication_slot_inactive_timeout."

2.a) Is this change applicable only to the inactive timeout, or does it
also apply to the other invalidation causes like WAL removed, wal_level,
etc.? If it applies to all of them we could move this to the first patch
and update the commit message:
+* If the slot can be acquired, do so and mark it as
invalidated. If
+* the slot is already ours, mark it as invalidated.
Otherwise, we'll
+* signal the owning process below and retry.
 */
-   if (active_pid == 0)
+   if (active_pid == 0 ||
+   (MyReplicationSlot == s &&
+active_pid == MyProcPid))

2.b) Also, this MyReplicationSlot and active_pid check can be on the same line:
+   (MyReplicationSlot == s &&
+active_pid == MyProcPid))


3) The error detail should start with an upper-case letter here, similar to how the others are done:
+   case RS_INVAL_INACTIVE_TIMEOUT:
+   appendStringInfo(&err_detail,
_("inactivity exceeded the time limit set by \"%s\"."),
+
"replication_slot_inactive_timeout");
+   break;

4) Since this change is not related to this patch, we can move this to
the first patch and update the commit message:
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1508,7 +1508,7 @@ ReplSlotSyncWorkerMain(char *startup_data,
size_t startup_data_len)
 static void
 update_synced_slots_inactive_since(void)
 {
-   TimestampTz now = 0;
+   TimestampTz now;

/*
 * We need to update inactive_since only when we are promoting
standby to
@@ -1523,6 +1523,9 @@ update_synced_slots_inactive_since(void)
/* The slot sync worker or SQL function mustn't be running by now */
Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);

+   /* Use same inactive_since time for all slots */
+   now = GetCurrentTimestamp();

5) Since this change is not related to this patch, we can move this to
the first patch.
@@ -2250,6 +2350,7 @@ RestoreSlotFromDisk(const char *name)
boolrestored = false;
int readBytes;
pg_crc32c   checksum;
+   TimestampTz now;

/* no need to lock here, no concurrent access allowed yet */

@@ -2410,6 +2511,9 @@ RestoreSlotFromDisk(const char *name)
NameStr(cp.slotdata.name)),
 errhint("Change \"wal_level\" to be
\"replica\" or higher.")));

+   /* Use same inactive_since time for all slots */
+   now = GetCurrentTimestamp();
+
/* nothing can be active yet, don't lock anything */
for (i = 0; i < max_replication_slots; i++)
{
@@ -2440,9 +2544,11 @@ RestoreSlotFromDisk(const char *name)
/*
 * Set the time since the slot has become inactive
after loading the
 * slot from the disk into memory. Whoever acquires
the slot i.e.
-* makes the slot active will reset it.
+* makes the slot active will reset it. Avoid calling
+* ReplicationSlotSetInactiveSince() here, as it will
not set the time
+* for invalid slots.
 */
-   slot->inactive_since = GetCurrentTimestamp();
+

Re: [PATCH] SVE popcount support

2024-11-28 Thread Kirill Reshke
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama  wrote:
>
> Attachments protected by Amazon: 0001-SVE-popcount-support.patch | 
> SVE-popcount-support-PostgreSQL.png |
>
> Please find attached a patch to PostgreSQL implementing SVE popcount. I used 
> John Naylor's test_popcount module [0] to put together the attached graphs. 
> This test didn't show any regressions with a relatively small number of 
> bytes, and it showed the expected improvements with many bytes.
>
>
>
> [0] 
> https://postgr.es/m/CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=g...@mail.gmail.com

Hi! To register an entry in the commitfest you need to send the patch in
one of these formats:

https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F

This is useful for reviewers who use cfbot or cputube.

--
Best regards,
Kirill Reshke




Re: Fix comment in reorderbuffer.h

2024-11-28 Thread Daniel Gustafsson
> On 28 Nov 2024, at 15:14, vignesh C  wrote:
> 
> On Thu, 28 Nov 2024 at 06:40, Peter Smith  wrote:
>> 
>> Hi, while reviewing another patch I noticed a poorly worded code comment.
>> 
>> Please find the attached trivial fix.
> 
> I agree with this finding, the patch looks good to me.

Agreed, and pushed. Thanks!

--
Daniel Gustafsson





Re: Truncate logs by max_log_size

2024-11-28 Thread Jim Jones



On 25.11.24 17:52, Kirill Reshke wrote:
> Hello! Please deliver the most recent patch version and fix the issues
> Jim identified [0] as the current commitfest draws to a close.
> Do not forget to include both parts of this patch (This was actually
> developed off-list, and we are now using this on our cloud PostgreSQL
> distribution on versions 12–17).
>
> [0] 
> https://www.postgresql.org/message-id/35096a36-04d4-480b-a7cd-a2d8151fb737%40uni-muenster.de

In addition to these points, this feature seems to fail with queries
containing special characters (more than one byte):

With this "max_log_size"..

postgres=# SHOW max_log_size;
 max_log_size
--
 20
(1 row)

... and this query ..

postgres=# SELECT "ÄÜÖ" FROM t;
ERROR:  relation "t" does not exist
LINE 1: SELECT "ÄÜÖ" FROM t;

.. this is the [truncated] log entry we get ..

2024-11-28 14:58:57.912 CET [2258876] ERROR:  relation "t" does not
exist at character 19
2024-11-28 14:58:57.912 CET [2258876] STATEMENT:  SELECT "ÄÜÖ" FROM

... although the query originally had exactly 20 characters:

postgres=# SELECT length('SELECT "ÄÜÖ" FROM t;');
 length
--------
     20
(1 row)


postgres=# SELECT length('ÄÜÖ'::bytea), length('AUO'::bytea);
 length | length
--------+--------
      6 |      3
(1 row)

If it is supposed to be like this, it should be clearly stated so in the
docs.
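
If instead the intent is to truncate by bytes while still respecting
character boundaries, something along these lines could be used in the
truncation code (only a sketch on my side, assuming the message is
available as a plain char *; pg_mbcliplen() clips a byte limit back to a
character boundary):

	/* Keep at most max_log_size bytes, but never cut a multibyte
	 * character in half. */
	int		clipped = pg_mbcliplen(msg, strlen(msg), max_log_size);
	char   *truncated = pnstrdup(msg, clipped);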

-- 
Jim





[PATCH] SVE popcount support

2024-11-28 Thread Malladi, Rama
Attachments protected by Amazon:

[0001-SVE-popcount-support.patch]
https://us-west-2.secure-attach.amazon.com/a29c9ff9-1f9b-430f-9b3c-07fde9a419aa/f9178627-0600-4527-bc5c-7e4cb9ef6e9a
[SVE-popcount-support-PostgreSQL.png]
https://us-west-2.secure-attach.amazon.com/a29c9ff9-1f9b-430f-9b3c-07fde9a419aa/13c252c4-c45e-447c-9e55-fe637f8d345c



Please find attached a patch to PostgreSQL implementing SVE popcount. I used 
John Naylor's test_popcount module [0] to put together the attached graphs. 
This test didn't show any regressions with a relatively small number of bytes, 
and it showed the expected improvements with many bytes.

[0] 
https://postgr.es/m/CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=g...@mail.gmail.com


Re: Useless field ispartitioned in CreateStmtContext

2024-11-28 Thread Kirill Reshke
On Tue, 5 Nov 2024 at 16:51, hugo <2689496...@qq.com> wrote:
>
> Hi, Kirill
>
> Sorry for the late reply, thanks for your suggestion.
> A simple fix has been added to the attached patch.
>
> --
> hugo
>

Hi! This field is actually used after 14e87ff, just like Álvaro stated in [0].

Patch status is now Rejected.

[0] 
https://www.postgresql.org/message-id/202411051209.hzs5jktf6e3s@alvherre.pgsql

-- 
Best regards,
Kirill Reshke




Re: Changing shared_buffers without restart

2024-11-28 Thread Robert Haas
On Thu, Nov 28, 2024 at 11:30 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> on Linux, this fact is already used in the patch. The idea that we could put
> upper boundary on the size of other mappings based on total available memory
> comes from the fact that anonymous mappings, that are much larger than memory,
> will fail without overcommit. With overcommit it becomes different, but if
> allocations are hitting that limit I can imagine there are bigger problems 
> than
> shared buffer resize.
>
> This approach follows the same ideas already used in the patch, and have the
> same trade offs: no address changes, but questions about portability.

I definitely welcome the fact that you have some platform-specific
knowledge of the Linux behavior, because that's expertise that is
obviously quite useful here and which I lack. I'm personally not
overly concerned about whether it works on every other platform -- I
would prefer an implementation that works everywhere, but I'd rather
have one that works on Linux than have nothing. It's unclear to me why
operating systems don't offer better primitives for this sort of thing
-- in theory there could be a system call that sets aside a pool of
address space and then other system calls that let you allocate
shared/unshared memory within that space or even at specific
addresses, but actually such things don't exist.

All that having been said, what does concern me a bit is our ability
to predict what Linux will do well enough to keep what we're doing
safe; and also whether the Linux behavior might abruptly change in the
future. Users would be sad if we released this feature and then a
future kernel upgrade causes PostgreSQL to completely stop working. I
don't know how the Linux kernel developers actually feel about this
sort of thing, but if I imagine myself as a kernel developer, I can
totally see myself saying "well, we never promised that this would
work in any particular way, so we're free to change it whenever we
like." We've certainly used that argument here countless times.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Added prosupport function for estimating numeric generate_series rows

2024-11-28 Thread 孤傲小二~阿沐
Hello, thank you very much for your attention and guidance. I have fixed and
improved the issues you mentioned. Version v2 of the patch is attached
below.


Regarding regression testing, I implemented it last time with the help of some
of the facilities of generate_series_timestamp_support. Everything passes in
the Cirrus CI tests.


Looking forward to your reply, thank you very much.


Original message

From: "Dean Rasheed"

v2_0001-Added-prosupport-function-for-estimating-numeric-gen.patch
Description: Binary data


Re: Truncate logs by max_log_size

2024-11-28 Thread Kirill Gavrilov
On Thu, Nov 28, 2024 at 5:23 PM Jim Jones  wrote:

>
>
> On 25.11.24 17:52, Kirill Reshke wrote:
> > Hello! Please deliver the most recent patch version and fix the issues
> > Jim identified [0] as the current commitfest draws to a close.
> > Do not forget to include both parts of this patch (This was actually
> > developed off-list, and we are now using this on our cloud PostgreSQL
> > distribution on versions 12–17).
> >
> > [0]
> https://www.postgresql.org/message-id/35096a36-04d4-480b-a7cd-a2d8151fb737%40uni-muenster.de
>
> In addition to these points, this feature seems to fail with queries
> containing special characters (more than one byte):
>
> With this "max_log_size"..
>
> postgres=# SHOW max_log_size;
>  max_log_size
> --
>  20
> (1 row)
>
> ... and this query ..
>
> postgres=# SELECT "ÄÜÖ" FROM t;
> ERROR:  relation "t" does not exist
> LINE 1: SELECT "ÄÜÖ" FROM t;
>
> .. this is the [truncated] log entry we get ..
>
> 2024-11-28 14:58:57.912 CET [2258876] ERROR:  relation "t" does not
> exist at character 19
> 2024-11-28 14:58:57.912 CET [2258876] STATEMENT:  SELECT "ÄÜÖ" FROM
>
> ... although the query originally had exactly 20 characters:
>
> postgres=# SELECT length('SELECT "ÄÜÖ" FROM t;');
>  length
> 
>  20
> (1 row)
>
>
> postgres=# SELECT length('ÄÜÖ'::bytea), length('AUO'::bytea);
>  length | length
> +
>   6 |  3
> (1 row)
>
> If it is supposed to be like this, it should be clearly stated so in the
> docs.
>
> --
> Jim
>
Here is version 3 of this patch. I found another place where this
setting can be applied.
Also added some documentation and specified that this setting truncates
queries by size in bytes.


V3-0001-parameter-max_log_size-to-truncate-logs.patch
Description: Binary data


Re: More CppAsString2() in psql's describe.c

2024-11-28 Thread Daniel Gustafsson
> On 28 Nov 2024, at 19:25, Corey Huinker  wrote:

>> LGTM, I didn't scan for omissions but the ones in the patch look right to me.
>> I sort of wish we had a shorter macro as CppAsString2() get's pretty verbose
>> when used frequently like this.
> 
> I don't quite understand the etymology of the name (it's some variation on 
> C++'s std::to_string, plus...something), but if I did, I'd probably find the 
> name less icky.

AFAIK, Cpp stands for "C Pre Processor", as it is a preprocessor expansion into
a string, and the 2 is simply because CppAsString2() calls CppAsString().  The
reason for the extra call is to perform macro expansion before converting to a
string.
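
The definitions (paraphrased from c.h) make that clearer:

#define CppAsString(identifier) #identifier
#define CppAsString2(x) CppAsString(x)

/*
 * CppAsString(RELKIND_RELATION)  -> "RELKIND_RELATION"  (no expansion)
 * CppAsString2(RELKIND_RELATION) -> "'r'"               (argument expanded
 *                                    first, which is what describe.c wants
 *                                    when splicing values into SQL text)
 */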

I agree that it's not icky, it just gets very verbose compared to, for example,
how translation of strings is done with _().

--
Daniel Gustafsson



Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Ranier Vilela
Em qui., 28 de nov. de 2024 às 16:03, Alena Rybakina <
a.rybak...@postgrespro.ru> escreveu:

> Hi! Thank you for the case.
>
> On 28.11.2024 21:00, Alexander Lakhin wrote:
> > Hello Alexander,
> >
> > 21.11.2024 09:34, Alexander Korotkov wrote:
> >> I'm going to push this if no objections.
> >
> > Please look at the following query, which triggers an error after
> > ae4569161:
> > SET random_page_cost = 1;
> > CREATE TABLE tbl(u UUID);
> > CREATE INDEX idx ON tbl USING HASH (u);
> > SELECT COUNT(*) FROM tbl WHERE u = '' OR
> >   u = '';
> >
> > ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
> > LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625
> >
> >
> I found out what the problem is index scan method was not generated. We
> need to check this during OR clauses for SAOP transformation.
>
> There is a patch to fix this problem.
>
Hi.
Thanks for the quick fix.

But I wonder if it is not possible to avoid all of this work if the index is useless?
Maybe move your fix to the beginning of the function?

diff --git a/src/backend/optimizer/path/indxpath.c
b/src/backend/optimizer/path/indxpath.c
index d827fc9f4d..5ea0b27d01 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3248,6 +3248,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
  Assert(IsA(orclause, BoolExpr));
  Assert(orclause->boolop == OR_EXPR);

+ /* Ignore index if it doesn't support index scans */
+ if(!index->amsearcharray)
+ return NULL;
+
  /*
  * Try to convert a list of OR-clauses to a single SAOP expression. Each
  * OR entry must be in the form: (indexkey operator constant) or (constant

The test bug:
EXPLAIN SELECT COUNT(*) FROM tbl WHERE u =
'' OR u =
'';
QUERY PLAN
--
 Aggregate  (cost=12.46..12.47 rows=1 width=8)
   ->  Bitmap Heap Scan on tbl  (cost=2.14..12.41 rows=18 width=0)
 Recheck Cond: ((u = '----'::uuid)
OR (u = '----'::uuid))
 ->  BitmapOr  (cost=2.14..2.14 rows=18 width=0)
   ->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9
width=0)
 Index Cond: (u =
'----'::uuid)
   ->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9
width=0)
 Index Cond: (u =
'----'::uuid)
(8 rows)

best regards,
Ranier Vilela


Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alena Rybakina

On 28.11.2024 22:28, Ranier Vilela wrote:
Em qui., 28 de nov. de 2024 às 16:03, Alena Rybakina 
 escreveu:


Hi! Thank you for the case.

On 28.11.2024 21:00, Alexander Lakhin wrote:
> Hello Alexander,
>
> 21.11.2024 09:34, Alexander Korotkov wrote:
>> I'm going to push this if no objections.
>
> Please look at the following query, which triggers an error after
> ae4569161:
> SET random_page_cost = 1;
> CREATE TABLE tbl(u UUID);
> CREATE INDEX idx ON tbl USING HASH (u);
> SELECT COUNT(*) FROM tbl WHERE u =
'' OR
>   u = '';
>
> ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
> LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625
>
>
I found out what the problem is index scan method was not
generated. We
need to check this during OR clauses for SAOP transformation.

There is a patch to fix this problem.

Hi.
Thanks for the quick fix.

But I wonder if it is not possible to avoid all if the index is useless?
Maybe moving your fix to the beginning of the function?

diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c

index d827fc9f4d..5ea0b27d01 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3248,6 +3248,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
  Assert(IsA(orclause, BoolExpr));
  Assert(orclause->boolop == OR_EXPR);

+ /* Ignore index if it doesn't support index scans */
+ if(!index->amsearcharray)
+ return NULL;
+

Agreed. I have updated the patch.

  /*
  * Try to convert a list of OR-clauses to a single SAOP expression. Each
  * OR entry must be in the form: (indexkey operator constant) or 
(constant


The test bug:
EXPLAIN SELECT COUNT(*) FROM tbl WHERE u = 
'' OR u = 
'';

QUERY PLAN
--
 Aggregate  (cost=12.46..12.47 rows=1 width=8)
   ->  Bitmap Heap Scan on tbl  (cost=2.14..12.41 rows=18 width=0)
         Recheck Cond: ((u = 
'----'::uuid) OR (u = 
'----'::uuid))

         ->  BitmapOr  (cost=2.14..2.14 rows=18 width=0)
               ->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 
width=0)
                     Index Cond: (u = 
'----'::uuid)
               ->  Bitmap Index Scan on idx  (cost=0.00..1.07 rows=9 
width=0)
                     Index Cond: (u = 
'----'::uuid)

(8 rows)


Thank you

--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index d827fc9f4d9..5ea0b27d014 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3248,6 +3248,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	Assert(IsA(orclause, BoolExpr));
 	Assert(orclause->boolop == OR_EXPR);
 
+	/* Ignore index if it doesn't support index scans */
+	if(!index->amsearcharray)
+		return NULL;
+
 	/*
 	 * Try to convert a list of OR-clauses to a single SAOP expression. Each
 	 * OR entry must be in the form: (indexkey operator constant) or (constant
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 1b0a5f0e9e1..ef8bbf4748c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1233,6 +1233,23 @@ SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';
 14
 (1 row)
 
+-- Transform OR-clauses to SAOP's shouldn't be chosen
+SET random_page_cost = 1;
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tenk1 WHERE stringu1 = 'TV' OR  stringu1 = 'TVAAAB';
+ QUERY PLAN 
+
+ Aggregate
+   ->  Bitmap Heap Scan on tenk1
+ Recheck Cond: ((stringu1 = 'TV'::name) OR (stringu1 = 'TVAAAB'::name))
+ ->  BitmapOr
+   ->  Bitmap Index Scan on hash_tuplesort_idx
+ Index Cond: (stringu1 = 'TV'::name)
+   ->  Bitmap Index Scan on hash_tuplesort_idx
+ Index Cond: (stringu1 = 'TVAAAB'::name)
+(8 rows)
+
+RESET random_page_cost;
 DROP INDEX hash_tuplesort_idx;
 RESET maintenance_work_mem;
 --
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index ddd0d9ad396..0e6529f3f3d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -372,6 +372,11 @@ CREATE INDEX hash_tuplesort_idx ON tenk1 USING hash (stringu1 name_ops) WITH (fi
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';

Re: Fix comment in reorderbuffer.h

2024-11-28 Thread vignesh C
On Thu, 28 Nov 2024 at 06:40, Peter Smith  wrote:
>
> Hi, while reviewing another patch I noticed a poorly worded code comment.
>
> Please find the attached trivial fix.

I agree with this finding, the patch looks good to me.

Regards,
Vignesh




Re: Changing shared_buffers without restart

2024-11-28 Thread Dmitry Dolgov
> On Wed, Nov 27, 2024 at 04:05:47PM GMT, Robert Haas wrote:
> On Wed, Nov 27, 2024 at 3:48 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > My understanding is that clashing of mappings (either at creation time
> > or when resizing) could happen only withing the process address space,
> > and the assumption is that by the time we prepare the mapping layout all
> > the rest of mappings for the process are already done.
>
> I don't think that's correct at all. First, the user could type LOAD
> 'whatever' at any time. But second, even if they don't or you prohibit
> them from doing so, the process could allocate memory for any of a
> million different things, and that could require mapping a new region
> of memory, and the OS could choose to place that just after an
> existing mapping, or at least close enough that we can't expand the
> object size as much as desired.
>
> If we had an upper bound on the size of shared_buffers and could
> reserve that amount of address space at startup time but only actually
> map a portion of it, then we could later remap and expand into the
> reserved space. Without that, I think there's absolutely no guarantee
> that the amount of address space that we need is available when we
> want to extend a mapping.

I've just done a couple of experiments, and I think this could be addressed by
careful placement of the mappings as well, based on two assumptions: for a new
mapping the kernel always picks the lowest address that leaves enough space,
and an upper bound on the memory needed by the other mappings can be derived
from the total available memory. With that in mind, the shared mapping layout
has to have a large gap at the start, between the lowest address and the
shared mappings used for buffers and the rest -- the gap where all the other
mappings (allocations, libraries, madvise, etc.) will land. It's similar to
the address space reservation you mentioned above, it reduces the possibility
of clashing significantly, and it looks something like this:

01339000-0139e000 [heap]
0139e000-014aa000 [heap]
7f2dd72f6000-7f2dfbc9c000 /memfd:strategy (deleted)
7f2e0209c000-7f2e269b /memfd:checkpoint (deleted)
7f2e2cdb-7f2e516b4000 /memfd:iocv (deleted)
7f2e57ab4000-7f2e7c478000 /memfd:descriptors (deleted)
7f2ebc478000-7f2ee8d3c000 /memfd:buffers (deleted)
^ note the distance between two mappings,
  which is intended for resize
7f3168d3c000-7f318d60 /memfd:main (deleted)
^ here is where the gap starts
7f4194c0-7f4194e7d000
^ this one is an anonymous maping created due to large
  memory allocation after shared mappings were created
7f419500-7f419527d000
7f41952dc000-7f4195416000
7f4195416000-7f419560 /dev/shm/PostgreSQL.2529797530
7f419560-7f41a311d000 /usr/lib/locale/locale-archive
7f41a317f000-7f41a320
7f41a320-7f41a3201000 /usr/lib64/libicudata.so.74.2

The assumption about picking the lowest address is just how it works right now
on Linux; this fact is already used in the patch. The idea that we could put an
upper boundary on the size of the other mappings based on total available
memory comes from the fact that anonymous mappings much larger than memory will
fail without overcommit. With overcommit it becomes different, but if
allocations are hitting that limit I can imagine there are bigger problems than
a shared buffer resize.

This approach follows the same ideas already used in the patch, and has the
same trade-offs: no address changes, but questions about portability.
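
For illustration, a minimal sketch of how such a mapping could then be grown in
place on Linux (names and sizes are made up and error handling is omitted; this
is not the patch's code, just the idea behind the reserved gap):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Requires Linux; memfd_create() needs glibc >= 2.27. */
static void *
grow_shared_region(int memfd, void *base, size_t old_size, size_t new_size)
{
	/*
	 * Extend the backing memfd, then grow the mapping without moving it.
	 * Without MREMAP_MAYMOVE this only succeeds if the address range right
	 * after "base" is still unused -- which is exactly what the deliberate
	 * gaps in the mapping layout above are meant to guarantee.
	 */
	ftruncate(memfd, new_size);
	return mremap(base, old_size, new_size, 0);
}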




Re: Using read stream in autoprewarm

2024-11-28 Thread Matheus Alcantara
On Wednesday, November 27th, 2024 at 11:19 AM, Nazir Bilal Yavuz 
 wrote:

> > v2-0001-Use-read-stream-in-autoprewarm.patch
> > + bool *rs_have_free_buffer = per_buffer_data;
> > +
> > +
> > + *rs_have_free_buffer = true;
> > +
> >
> > Not sure if I understand why this variable is needed, it seems that it is only
> > written and never read? Just as comparison, the block_range_read_stream_cb
> > callback used on pg_prewarm seems to not use the per_buffer_data parameter.

>
>
> Actually, it is read in the main loop of the
> autoprewarm_database_main() function:
>
> /* There are no free buffers left in shared buffers, break the loop. */
> else if (!(*rs_have_free_buffer))
> break;
>
> apw_read_stream_next_block() callback function sets
> rs_have_free_buffer's value to false when there is no free buffer left
> in the shared buffers. And the code above terminates the main loop in
> the autoprewarm_database_main() function when it is set to false.
>
> block_range_read_stream_cb() callback is used when the callback only
> needs to loop over the block numbers. However, for the autoprewarm
> case; the callback function needs to do additional checks so another
> callback and the use of this variable are required.

Ohh, I see, thanks very much for the explanation.

> v3 is attached.

Thanks.

I don't know if there is another way that this patch could be tested?
Looking forward to other reviews on this.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com





Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alena Rybakina

Hi! Thank you for the case.

On 28.11.2024 21:00, Alexander Lakhin wrote:

Hello Alexander,

21.11.2024 09:34, Alexander Korotkov wrote:

I'm going to push this if no objections.


Please look at the following query, which triggers an error after 
ae4569161:

SET random_page_cost = 1;
CREATE TABLE tbl(u UUID);
CREATE INDEX idx ON tbl USING HASH (u);
SELECT COUNT(*) FROM tbl WHERE u = '' OR
  u = '';

ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625


I found out what the problem is: the index scan method was not generated. We
need to check for this during the OR-clause to SAOP transformation.


There is a patch to fix this problem.

--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index d827fc9f4d9..61110db65dd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3303,6 +3303,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
 		if (get_op_rettype(opno) != BOOLOID)
 			break;
 
+		/* Ignore index if it doesn't support index scans */
+		if(!index->amsearcharray)
+			break;
+
 		/*
 		 * Check for clauses of the form: (indexkey operator constant) or
 		 * (constant operator indexkey).  Determine indexkey side first, check
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index ddd0d9ad396..0e6529f3f3d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -372,6 +372,11 @@ CREATE INDEX hash_tuplesort_idx ON tenk1 USING hash (stringu1 name_ops) WITH (fi
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';
 SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';
+-- Transform OR-clauses to SAOP's shouldn't be chosen
+SET random_page_cost = 1;
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tenk1 WHERE stringu1 = 'TV' OR  stringu1 = 'TVAAAB';
+RESET random_page_cost;
 DROP INDEX hash_tuplesort_idx;
 RESET maintenance_work_mem;
 


Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alena Rybakina
Sorry, I was in a hurry and forgot to add the test result. I have updated the
patch.


On 28.11.2024 22:03, Alena Rybakina wrote:

Hi! Thank you for the case.

On 28.11.2024 21:00, Alexander Lakhin wrote:

Hello Alexander,

21.11.2024 09:34, Alexander Korotkov wrote:

I'm going to push this if no objections.


Please look at the following query, which triggers an error after 
ae4569161:

SET random_page_cost = 1;
CREATE TABLE tbl(u UUID);
CREATE INDEX idx ON tbl USING HASH (u);
SELECT COUNT(*) FROM tbl WHERE u = '' OR
  u = '';

ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625


I found out what the problem is index scan method was not generated. 
We need to check this during OR clauses for SAOP transformation.


There is a patch to fix this problem.


--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index d827fc9f4d9..61110db65dd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3303,6 +3303,10 @@ match_orclause_to_indexcol(PlannerInfo *root,
 		if (get_op_rettype(opno) != BOOLOID)
 			break;
 
+		/* Ignore index if it doesn't support index scans */
+		if(!index->amsearcharray)
+			break;
+
 		/*
 		 * Check for clauses of the form: (indexkey operator constant) or
 		 * (constant operator indexkey).  Determine indexkey side first, check
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 1b0a5f0e9e1..ef8bbf4748c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1233,6 +1233,23 @@ SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';
 14
 (1 row)
 
+-- Transform OR-clauses to SAOP's shouldn't be chosen
+SET random_page_cost = 1;
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tenk1 WHERE stringu1 = 'TV' OR  stringu1 = 'TVAAAB';
+ QUERY PLAN 
+
+ Aggregate
+   ->  Bitmap Heap Scan on tenk1
+ Recheck Cond: ((stringu1 = 'TV'::name) OR (stringu1 = 'TVAAAB'::name))
+ ->  BitmapOr
+   ->  Bitmap Index Scan on hash_tuplesort_idx
+ Index Cond: (stringu1 = 'TV'::name)
+   ->  Bitmap Index Scan on hash_tuplesort_idx
+ Index Cond: (stringu1 = 'TVAAAB'::name)
+(8 rows)
+
+RESET random_page_cost;
 DROP INDEX hash_tuplesort_idx;
 RESET maintenance_work_mem;
 --
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index ddd0d9ad396..0e6529f3f3d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -372,6 +372,11 @@ CREATE INDEX hash_tuplesort_idx ON tenk1 USING hash (stringu1 name_ops) WITH (fi
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';
 SELECT count(*) FROM tenk1 WHERE stringu1 = 'TV';
+-- Transform OR-clauses to SAOP's shouldn't be chosen
+SET random_page_cost = 1;
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tenk1 WHERE stringu1 = 'TV' OR  stringu1 = 'TVAAAB';
+RESET random_page_cost;
 DROP INDEX hash_tuplesort_idx;
 RESET maintenance_work_mem;
 


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-11-28 Thread Masahiro Ikeda

On 2024-11-26 07:32, Peter Geoghegan wrote:
On Mon, Nov 25, 2024 at 4:07 AM Masahiro Ikeda 
 wrote:

One thing I am concerned about is that it reduces the cases where the
optimization of _bt_checkkeys_look_ahead() works effectively, as the
approach
skips the skip immediately on the first occurrence per page.


I noticed that with the recent v17 revision of the patch, my original
MDAM paper "sales_mdam_paper" test case (the complicated query in the
introductory email of this thread) was about 2x slower. That's just
not okay, obviously. But the issue was relatively easy to fix: it was
fixed by making _bt_readpage not apply the "skipskip" optimization
when on the first page for the current primitive index scan -- we
already do this with the "precheck" optimization, so it's natural to
do it with the "skipskip" optimization as well.

The "sales_mdam_paper" test case involves thousands of primitive index
scans that each access only one leaf page. But each leaf page returns
2 non-adjoining tuples, with quite a few non-matching tuples "in
between" the matching tuples. There is one matching tuple for "store =
200", and another for "store = 250" -- and there's non-matching stores
201 - 249 between these two, which we want _bt_checkkeys_look_ahead to
skip over. This is exactly the kind of case where the
_bt_checkkeys_look_ahead() optimization is expected to help.


Great! Your new approach strikes a good balance between the trade-offs
of "skipskip" and "look ahead" optimization. Although the regression 
case

I provided seems to be a corner case, your regression case is realistic
and should be addressed.


Again, the above results are provided for reference, as I believe that
many users prioritize stability and I'd like to take your new 
approach.


Adversarial cases specifically designed to "make the patch look bad"
are definitely useful review feedback. Ideally, the patch will be 100%
free of regressions -- no matter how unlikely (or even silly) they may
seem. I always prefer to not have to rely on anybody's opinion of what
is likely or unlikely.  :-)

A quick test seems to show that this particular regression is more or
less fixed by v18. As you said, the _bt_checkkeys_look_ahead stuff is
the issue here (same with the MDAM sales query). You should confirm
that the issue has actually been fixed, though.


Thanks to your new patch, I have confirmed that the issue is fixed.

I have no comments on the new patches. If I find any new regression
cases, I'll report them.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-11-28 Thread Junwang Zhao
On Fri, Nov 29, 2024 at 9:07 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 28 Nov 2024 19:02:57 +0800,
>   Junwang Zhao  wrote:
>
> >> > I tested 3 branches:
> >> >
> >> > 1. the master branch
> >> > 2. all v26 patch sets applied
> >> > 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
> >> > if branch in CopyOneRowTo, so I was expecting this slower than master
> >> >
> >> > You can see the detailed results here[2], I can not upload files so I
> >> > just shared the google doc link, ping me if you can not open the link.
> >> >
> >> > [1]: 
> >> > https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
> >> > [2]: 
> >> > https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
> >>
> >> Thanks for sharing your numbers.
> >>
> >> 1. and 2. shows that there is at least no significant
> >> performance regression.
> >
> > Agreed.
>
> Can we focus on only 1. and 2. in this thread?
>
> >> I see the patch set of 3. and I think that the result
> >> (there is no performance difference between 1. and 3.) isn't
> >> strange. The patch set adds some if branches but they aren't
> >> used with "text" format at least in per row process.
> >
> > It is not used in "text" format, but it adds some assembly code
> > to the CopyOneRowTo function, so this will have some impact
> > on the cpu i cache I guess.
> >
> > There is difference between 1 and 3, 3 is always better than 1
> > upto 4% improvement
>
> Can we discuss 1. and 3. in the [1] thread?

This thread and the [1] thread are kind of interleaved; I chose this thread
to share the numbers because I think this feature should be committed
first, with *copy to json* then adapted as a contrib module.

Committers on this thread seem worried about a performance
drawback, so what I tried to show is that *if 2 is slightly worse than 1,
but better than 3*, then we can commit 2 first; but I did not get
the expected numbers.

>
> (Anyway, we may want to confirm whether these numbers are
> reproducible or not as the first step.)
>
> >  I forgot to mention that the comparisons
> > are in *sheet2*.
>
> Thanks. I missed it.
>
>
> Thanks,
> --
> kou



-- 
Regards
Junwang Zhao




Re: Remove useless GROUP BY columns considering unique index

2024-11-28 Thread David Rowley
On Fri, 29 Nov 2024 at 15:02, David Rowley  wrote:
> I've attached an updated patch that gets rid of the
> get_unique_not_null_attnos() function completely and uses the
> RelOptInfo.indexlist and RelOptInfo.notnullattnums.

I forgot to do a local commit before sending v8. Fixed in the attached v9.

David


v9-0001-remove-useless-group-by-columns-via-unique-not-nu.patch
Description: Binary data


Re: Converting SetOp to read its two inputs separately

2024-11-28 Thread David Rowley
On Thu, 14 Nov 2024 at 22:33, Richard Guo  wrote:
> BTW, I noticed a typo in the comment for function

> - * Returns false when sorted paths are not any more useful then unsorted
> + * Returns false when sorted paths are not any more useful than unsorted

I pushed a fix for that.

Thanks.

David




Re: Using read stream in autoprewarm

2024-11-28 Thread Kirill Reshke
On Wed, 27 Nov 2024 at 19:20, Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> Thank you for looking into this!
>
> On Wed, 27 Nov 2024 at 16:50, Matheus Alcantara  wrote:
> > I've executed the same test of 5 databases with each of them having 1 table 
> > of
> > 3GB of size and I've got very similar results.
> >
> > I've also tested using a single database with 4 tables with ~60GB of size 
> > and
> > the results compared with master was more closer but still an improvement. 
> > Note
> > that I've also increased the default shared_buffers to 7GB to see how it 
> > works
> > with large buffer pools.
> >   - patched: 5.4259 s
> >   - master: 5.53186 s
>
> Thanks for the testing.
>
> > Not to much to say about the code, I'm currently learning more about the 
> > read
> > stream API and Postgresql hacking itself. Just some minor points and 
> > questions
> > about the patches.
> >
> >
> > v2-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch
> > --- a/src/backend/storage/buffer/freelist.c
> > +/*
> > + * get_number_of_free_buffers -- a lockless way to get the number of free
> > + *  buffers in 
> > buffer pool.
> > + *
> > + * Note that result continuosly changes as free buffers are moved out by 
> > other
> > + * operations.
> > + */
> > +int
> > +get_number_of_free_buffers(void)
> >
> > typo on continuosly -> continuously
>
> Done.
>
> > v2-0001-Use-read-stream-in-autoprewarm.patch
> > +   bool   *rs_have_free_buffer = per_buffer_data;
> > +
> > +
> > +   *rs_have_free_buffer = true;
> > +
> >
> > Not sure if I understand why this variable is needed, it seems that it is 
> > only
> > written and never read? Just as comparison, the block_range_read_stream_cb
> > callback used on pg_prewarm seems to not use the per_buffer_data parameter.
>
> Actually, it is read in the main loop of the
> autoprewarm_database_main() function:
>
> /* There are no free buffers left in shared buffers, break the loop. 
> */
> else if (!(*rs_have_free_buffer))
> break;
>
> apw_read_stream_next_block() callback function sets
> rs_have_free_buffer's value to false when there is no free buffer left
> in the shared buffers. And the code above terminates the main loop in
> the autoprewarm_database_main() function when it is set to false.
>
> block_range_read_stream_cb() callback is used when the callback only
> needs to loop over the block numbers. However, for the autoprewarm
> case; the callback function needs to do additional checks so another
> callback and the use of this variable are required.
>
> v3 is attached.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Hi!

> + old_blk = &(p->block_info[p->pos - 1]);
> + cur_blk = &(p->block_info[p->pos]);
Should we Assert(p->pos > 0 && p->pos < *something*)

Patch tested with no regression.


-- 
Best regards,
Kirill Reshke




RE: Introduce XID age and inactive timeout based replication slot invalidation

2024-11-28 Thread Hayato Kuroda (Fujitsu)
Dear Nisha,

> 
> Attached v51 patch-set addressing all comments in [1] and [2].
>

Thanks for working on the feature! I've started to review the patch.
Here are my comments - sorry if some of them have already been discussed;
the thread is too long to follow completely.

Comments for 0001
=

01. binary_upgrade_logical_slot_has_caught_up

ISTM that error_if_invalid is set to true when the slot can be moved forward,
otherwise it is set to false. Regarding binary_upgrade_logical_slot_has_caught_up,
however, only valid slots will be passed to the function (see pg_upgrade/info.c),
so I feel it is OK to set it to true. Thoughts?

02. ReplicationSlotAcquire

Following other functions, we add a note for the translator when parameters
represent common nouns or GUC names. I feel we should add such a comment
for the RS_INVAL_WAL_LEVEL part as well.


Comments for 0002
=

03. check_replication_slot_inactive_timeout

Can we override replication_slot_inactive_timeout to zero when pg_upgrade (and
also pg_createsubscriber?) starts a server process? Several parameters are
already specified via the -c option at that time. This can avoid an error
during the upgrade. Note that this part is still needed even if you accept the
comment, since users can manually start the server in upgrade mode.

04. ReplicationSlotAcquire

Same comment as 02.

05. ReportSlotInvalidation

Same comment as 02.

06. found bug

While testing the patch, I found that slots can be invalidated too early when
the GUC is quite large. I think this is because of an overflow in
InvalidatePossiblyObsoleteSlot().

- Reproducer

I set replication_slot_inactive_timeout to INT_MAX, executed the commands
below, and found that the slot was invalidated.

```
postgres=# SHOW replication_slot_inactive_timeout;
 replication_slot_inactive_timeout 
---
 2147483647s
(1 row)
postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 
'test_decoding');
 slot_name |lsn
---+---
 test  | 0/18B7F38
(1 row)
postgres=# CHECKPOINT ;
CHECKPOINT
postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM 
pg_replication_slots ;
 slot_name |inactive_since | invalidation_reason 
---+---+-
 test  | 2024-11-28 07:50:25.927594+00 | inactive_timeout
(1 row)
```

- analysis

In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
is passed as the third argument of TimestampDifferenceExceeds(), which is also
a plain int. This causes an overflow, so the parameter is treated as a small
value.
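
In code terms, the problematic pattern is the following (a sketch; the
condition is the one quoted earlier in this thread):

	/*
	 * replication_slot_inactive_timeout_sec is an int GUC, so with a value
	 * close to INT_MAX the multiplication by 1000 wraps around, and the int
	 * "msec" argument of TimestampDifferenceExceeds() becomes small or
	 * negative, making the slot look like it has already timed out.
	 */
	if (TimestampDifferenceExceeds(s->inactive_since, now,
								   replication_slot_inactive_timeout_sec * 1000))
	{
		/* the slot gets invalidated far too early */
	}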

- solution

I think there are two possible solutions; you can choose either:

a. Cap the maximum value at INT_MAX/1000, or
b. Change the unit to milliseconds.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: UUID v7

2024-11-28 Thread Andrey M. Borodin



> On 28 Nov 2024, at 04:07, Sergey Prokhorenko  
> wrote:
> 
> It would be useful to add a standard comparative benchmark with several 
> parameters and use cases to the patch, so that IT departments can compare 
> UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and 
> conditions.
> 
> I know for a fact that IT departments make such benchmarks of low quality. 
> They usually measure the generation rate, which is meaningless because it is 
> usually excessive. It makes sense to measure the rate of single-threaded and 
> multi-threaded insertion of a large number of records (with and without 
> partitioning), as well as the rate of execution of queries to join big 
> tables, to update or delete a large number of records. It is important to 
> measure memory usage, processor load, etc.

Publishing benchmarks seems to be far beyond what our documentation goes for,
mostly because benchmarks are tricky: you can prove anything with benchmarks.

Everyone is welcome to publish benchmark results in their blogs, but IMO docs 
have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39 applied on 
my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from 
generate_series(1,3e7);
INSERT 0 3000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from 
generate_series(1,3e7);
INSERT 0 3000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)


Best regards, Andrey Borodin.



Re: Remove an unnecessary check on semijoin_target_ok() on postgres_fdw.c

2024-11-28 Thread Alexander Pyhalov

Tender Wang писал(а) 2024-10-09 10:26:

Hi,
   When I debug FDW join pushdown codes, I found below codes in
semijoin_target_ok():
if (bms_is_member(var->varno, innerrel->relids) &&

!bms_is_member(var->varno, outerrel->relids))

As far as I know, if a var belongs to the innerrel of joinrel, it's
not possible that it
may belong to the outerrel. So if the bms_is_member(var->varno,
innerrel->relids)
returns TRUE, then !bms_is_member(var->varno, outerrel->relids) must
be TRUE.
If bms_is_member(var->varno, innerrel->relids) returns FALSE,
!bms_is_member(var->varno, outerrel->relids) will not execute due to
short circuit.

So I think we can remove the "!bms_is_member(var->varno,
outerrel->relids)" from if.
Any thoughts?


Hi.
Seems good to me.
--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Added prosupport function for estimating numeric generate_series rows

2024-11-28 Thread Dean Rasheed
On Thu, 28 Nov 2024 at 07:47, 孤傲小二~阿沐  wrote:
>
> Hello hackers, I saw a recent submission: Teach planner how to estimate rows 
> for timestamp generate_series. I provide a patch for the numeric type here, 
> and a simple test is as follows:
>
> I really want to know your thoughts, please give me feedback. Thank you.
>

Good idea.

Some random review comments:

This should test for special inputs, NaN and infinity (it doesn't make
sense to convert those to NumericVars). generate_series() produces an
error for all such inputs, so the support function can just not
produce an estimate for these cases (the same as when the step size is
zero).

NumericVars initialised using init_var() should be freed using
free_var(). That can be avoided for the 3 inputs, by using
init_var_from_num(), rather than set_var_from_num(), which saves
copying digit arrays. It should then be possible to write this using a
single additional allocated NumericVar and one init_var()/free_var()
pair.

There's no need to use floor(), since the div_var() call already
produces a floored integer result.

It could use some regression test cases.
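
For example, something along these lines could go into the regression tests
(an illustrative case of my own; the expected estimate is
floor((stop - start) / step) + 1):

-- floor((10.0 - 1.0) / 0.5) + 1 = 19 rows expected in the planner's estimate
EXPLAIN SELECT * FROM generate_series(1.0, 10.0, 0.5);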

Regards,
Dean




postgres_fdw could deparse ArrayCoerceExpr

2024-11-28 Thread Alexander Pyhalov

Hi.

Recently, we were surprised by the following behavior: a prepared
statement selecting data from a foreign table with a varchar(N) field
couldn't push down the "field = ANY($1)" expression when it switched to a
generic plan. The case is also covered by the test in the attached patch.
The reproducer is simple:


create extension postgres_fdw;
create server local foreign data wrapper postgres_fdw;
create user MAPPING FOR CURRENT_USER SERVER local;
create table test (c varchar(255));
create foreign table ftest (c varchar(255)) server local options 
(table_name 'test');


set plan_cache_mode to force_generic_plan ; -- just for demonstration; this
can happen with the default plan_cache_mode if the planner decides that a
generic plan is preferable


prepare s(varchar[]) as select * from ftest where c = any ($1);

explain verbose execute s('{test}');
  QUERY PLAN
--
 Foreign Scan on public.ftest  (cost=100.00..143.43 rows=7 width=516)
   Output: c
   Filter: ((ftest.c)::text = ANY (($1)::text[]))
   Remote SQL: SELECT c FROM public.test

The issue is that we need to translate the input array type from varchar[]
to text[].


Attaching a patch to allow postgres_fdw to deparse such a conversion.


--
Best regards,
Alexander Pyhalov,
Postgres ProfessionalFrom efc5eb338db6621243e25e0b0281ae69c465974d Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Wed, 27 Nov 2024 14:07:39 +0300
Subject: [PATCH] postgres_fdw could deparse ArrayCoerceExpr

---
 contrib/postgres_fdw/deparse.c| 50 +++
 .../postgres_fdw/expected/postgres_fdw.out| 21 
 contrib/postgres_fdw/sql/postgres_fdw.sql |  9 
 3 files changed, 80 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 4680d517331..008237cb8f8 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -160,6 +160,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context);
 static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node,
 	 deparse_expr_cxt *context);
 static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context);
+static void deparseArrayCoerceExpr(ArrayCoerceExpr *node, deparse_expr_cxt *context);
 static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context);
 static void deparseNullTest(NullTest *node, deparse_expr_cxt *context);
 static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
@@ -696,6 +697,34 @@ foreign_expr_walker(Node *node,
 	state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_ArrayCoerceExpr:
+			{
+ArrayCoerceExpr *e = (ArrayCoerceExpr *) node;
+
+/*
+ * Recurse to input subexpression.
+ */
+if (!foreign_expr_walker((Node *) e->arg,
+		 glob_cxt, &inner_cxt, case_arg_cxt))
+	return false;
+
+/*
+ * T_ArrayCoerceExpr must not introduce a collation not
+ * derived from an input foreign Var (same logic as for a
+ * function).
+ */
+collation = e->resultcollid;
+if (collation == InvalidOid)
+	state = FDW_COLLATE_NONE;
+else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+		 collation == inner_cxt.collation)
+	state = FDW_COLLATE_SAFE;
+else if (collation == DEFAULT_COLLATION_OID)
+	state = FDW_COLLATE_NONE;
+else
+	state = FDW_COLLATE_UNSAFE;
+			}
+			break;
 		case T_BoolExpr:
 			{
 BoolExpr   *b = (BoolExpr *) node;
@@ -2913,6 +2942,9 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
 		case T_RelabelType:
 			deparseRelabelType((RelabelType *) node, context);
 			break;
+		case T_ArrayCoerceExpr:
+			deparseArrayCoerceExpr((ArrayCoerceExpr *) node, context);
+			break;
 		case T_BoolExpr:
 			deparseBoolExpr((BoolExpr *) node, context);
 			break;
@@ -3501,6 +3533,24 @@ deparseRelabelType(RelabelType *node, deparse_expr_cxt *context)
 		   node->resulttypmod));
 }
 
+/*
+ * Deparse a ArrayCoerceExpr (array-type conversion) node.
+ */
+static void
+deparseArrayCoerceExpr(ArrayCoerceExpr *node, deparse_expr_cxt *context)
+{
+	deparseExpr(node->arg, context);
+
+	/*
+	 * No difference how to deparse explicit cast, but if we omit implicit
+	 * cast in the query, it'll be more user-friendly
+	 */
+	if (node->coerceformat != COERCE_IMPLICIT_CAST)
+		appendStringInfo(context->buf, "::%s",
+		 deparse_type_name(node->resulttype,
+		   node->resulttypmod));
+}
+
 /*
  * Deparse a BoolExpr node.
  */
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f2bcd6aa98c..55a8ac9020e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1169,6 +1169,27 @@ SELECT * FROM ft1 WHERE CASE c3 COLLATE "C" WHEN c6 THEN true ELSE c3 < 'bar' EN
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
 (4 rows)
 
+-- Test array type conversion pushdown
+SET plan_cach

Re: pg_stat_statements and "IN" conditions

2024-11-28 Thread Kirill Reshke
On Wed, 14 Aug 2024 at 01:06, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Sun, Aug 11, 2024 at 09:34:55PM GMT, Dmitry Dolgov wrote:
> > > On Sun, Aug 11, 2024 at 07:54:05PM +0300, Sergei Kornilov wrote:
> > >
> > > This feature will improve my monitoring. Even in patch 0001. I think 
> > > there are many other people in the thread who think this is useful. So 
> > > maybe we should move it forward? Any complaints about the overall design? 
> > > I see in the discussion it was mentioned that it would be good to measure 
> > > performance difference.
> > >
> > > PS: patch cannot be applied at this time, it needs another rebase.
> >
> > Yeah, it seems like most people are fine with the first patch and the
> > simplest approach. I'm going to post a rebased version and a short
> > thread summary soon.
>
> Ok, here is the rebased version. If anyone would like to review them, below is
> the short summary of the thread. Currently the patch series contains 4 
> changes:
>
> * 0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patch
>
>   Implements the simplest way to handle constant arrays, if the array contains
>   only constants it will be reduced. This is the basis, if I read it correctly
>   Nathan and Michael expressed that they're mostly fine with this one.
>
>   Michael seems to be skeptical about the "merged" flag in the LocationLen
>   struct, but from what I see the proposed alternative has problems as well.
>   There was also a note that the loop over constants has to be benchmarked,
>   but it's not entirely clear to me in which dimensions to benchmark (i.e. what
>   are the expectations). For both I'm waiting on a reply to my questions.
>
> * 0002-Reusable-decimalLength-functions.patch
>
>   A small refactoring to make the already existing "powers" functionality reusable
>   for following patches.
>
> * 0003-Merge-constants-in-ArrayExpr-into-groups.patch
>
>   Makes handling of constant arrays smarter by taking into account the number of
>   elements in the array. This way records are merged into groups by powers of 10,
>   i.e. arrays with length 55 will land in the group 10-99, with length 555 in the
>   group 100-999, etc. This was proposed by Alvaro, and personally I like this
>   approach, because it remediates the issue of one-size-fits-all for the static
>   threshold. But there are opinions that this introduces too much complexity.
>
> * 0004-Introduce-query_id_const_merge_threshold.patch
>
>   Fine tuning for the previous patch, makes only arrays with the length over a
>   certain threshold to be reduced.
>
> On top of that, Yasuo Honda and Jakub Wartak have provided a couple of
> practical examples where handling of constant arrays has to be improved.
> David Geier pointed out some examples that might be confusing as well.
> All those are definitely worth addressing, but out of scope of this patch
> for now.
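
As a side note, the power-of-10 grouping described for 0003 above can be
illustrated with a tiny standalone sketch (the helper name and exact bucketing
are illustrative, not the patch's code):

#include <stdio.h>

/* Map an array length onto the lower bound of its power-of-10 group,
 * so 55 falls into the 10-99 group and 555 into the 100-999 group. */
static int
length_group_lower_bound(int nelems)
{
    int     lower = 1;

    while (nelems >= lower * 10)
        lower *= 10;
    return lower;
}

int
main(void)
{
    printf("%d %d %d\n",
           length_group_lower_bound(5),     /* 1   -> group 1-9     */
           length_group_lower_bound(55),    /* 10  -> group 10-99   */
           length_group_lower_bound(555));  /* 100 -> group 100-999 */
    return 0;
}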

Hi! Can you please send a rebased version of this?

-- 
Best regards,
Kirill Reshke




Re: Using read_stream in index vacuum

2024-11-28 Thread Kirill Reshke
On Mon, 18 Nov 2024 at 16:34, Andrey M. Borodin  wrote:
>
>
>
> > On 2 Nov 2024, at 02:36, Kirill Reshke  wrote:
> >
> > I noticed CI failure for this patch. This does not look like a flap.
>
> Seems like vacuum did not start index cleanup. I’ve added "index_cleanup on".
> Thanks!
>
>
> Best regards, Andrey Borodin.

Hi!

0001 Looks mature. Some comments:
1)
>+# This ensures autovacuum do not run
>+$node->append_conf('postgresql.conf', 'autovacuum = off');
The other option is to set `autovacuum = off` in the relation DDL. I'm not
sure which option is better.

2) Are these used?
my $psql_err = '';
my $psql_out = '';


Should we add TAP testing to 0002 & 0003 like 0001 already has?


-- 
Best regards,
Kirill Reshke




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-11-28 Thread Kirill Reshke
On Tue, 19 Nov 2024 at 13:52, jian he  wrote:
>
> On Sat, Nov 16, 2024 at 5:55 PM Kirill Reshke  wrote:
> >
> > I am attaching my v8 for reference.
> >
>
> in your v8.
>
>
> REJECT_LIMIT
> 
>  
>   Specifies the maximum number of errors tolerated while converting a
>   column's input value to its data type, when ON_ERROR 
> is
>   set to ignore.
>   If the input contains more erroneous rows than the specified
> value, the COPY
>   command fails, even with ON_ERROR set to
> ignore.
>  
> 
>
>
> then the above description does not match the following example (I think):
>
> create table t(a int not null);
> COPY t FROM STDIN WITH (on_error set_to_null, reject_limit 2);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> a
> >> \.
> ERROR:  null value in column "a" of relation "t" violates not-null constraint
> DETAIL:  Failing row contains (null).
> CONTEXT:  COPY t, line 1, column a: "a"
>
> Overall, I think
> making the domain not-null align with column level not-null would be a
> good thing.
>
>
>  
>   Specifies how to behave when encountering an error converting a column's
>   input value into its data type.
>   An error_action value of
>   stop means fail the command,
>   ignore means discard the input row and
> continue with the next one, and
>   set_to_null means replace columns containing
> erroneous input values with null and move to the
> next row.
>
> "and move to the next row" is wrong?
> I think it should be " and move to the next field".

Hi! There is not much time left in this CF, so I have moved the entry to the next one.
If you are going to keep working on this patch, I'm waiting for your feedback
or a v9 patch that addresses the issues brought up in this discussion.

-- 
Best regards,
Kirill Reshke




Re: Changing shared_buffers without restart

2024-11-28 Thread Tom Lane
Matthias van de Meent  writes:
> On Thu, 28 Nov 2024 at 18:19, Robert Haas  wrote:
>> [...] It's unclear to me why
>> operating systems don't offer better primitives for this sort of thing
>> -- in theory there could be a system call that sets aside a pool of
>> address space and then other system calls that let you allocate
>> shared/unshared memory within that space or even at specific
>> addresses, but actually such things don't exist.

> Isn't that more a stdlib/malloc issue? AFAIK, Linux's mmap(2) syscall
> allows you to request memory from the OS at arbitrary addresses - it's
> just that stdlib's malloc doesn't expose the 'alloc at this address'
> part of that API.

I think what Robert is concerned about is that there is exactly 0
guarantee that that will succeed, because you have no control over
system-driven allocations of address space (for example, loading
of extensions or JIT code).  In fact, given things like ASLR, there
is pressure on the kernel crew to make that *less* predictable not
more so.  So even if we devise a method that seems to work reliably
today, we could have little faith that it would work with next year's
kernels.

regards, tom lane




Re: [PATCH] Add sortsupport for range types and btree_gist

2024-11-28 Thread Andrey M. Borodin



> On 25 Nov 2024, at 20:40, Bernd Helmle  wrote:
> 
> 

Hi Bernd!

Thanks for the new patch version.

There are still some problems with tests.
0. You rely on the order of test execution: the "init.sql" test must run before any
other test, and I doubt that is guaranteed.
1. There's a typo float8__buffering in Makefile
2. cidr type seems to be left behind
3. Tests do not seem to work when your configuration lacks injection points.

You can see how tests with injection points are excluded in other modules...

Perhaps, let's ask Michael.
Michael, we have 30 tests with checks that need injection points. But these 30
tests also test functionality that needs to be tested even in builds without
injection points.
Do we have to extract the checks that need injection points into a separate
regression test, so that we can exclude that test in builds without injection points?

Thanks!


Best regards, Andrey Borodin.



Re: POC, WIP: OR-clause support for indexes

2024-11-28 Thread Alexander Lakhin

Hello Alexander,

21.11.2024 09:34, Alexander Korotkov wrote:

I'm going to push this if no objections.


Please look at the following query, which triggers an error after ae4569161:
SET random_page_cost = 1;
CREATE TABLE tbl(u UUID);
CREATE INDEX idx ON tbl USING HASH (u);
SELECT COUNT(*) FROM tbl WHERE u = '' OR
  u = '';

ERROR:  XX000: ScalarArrayOpExpr index qual found where not allowed
LOCATION:  ExecIndexBuildScanKeys, nodeIndexscan.c:1625

Best regards,
Alexander




Re: Changing shared_buffers without restart

2024-11-28 Thread Matthias van de Meent
On Thu, 28 Nov 2024 at 18:19, Robert Haas  wrote:
>
> [...] It's unclear to me why
> operating systems don't offer better primitives for this sort of thing
> -- in theory there could be a system call that sets aside a pool of
> address space and then other system calls that let you allocate
> shared/unshared memory within that space or even at specific
> addresses, but actually such things don't exist.

Isn't that more a stdlib/malloc issue? AFAIK, Linux's mmap(2) syscall
allows you to request memory from the OS at arbitrary addresses - it's
just that stdlib's malloc doesn't expose the 'alloc at this address'
part of that API.

Windows seems to have an equivalent API in VirtualAlloc*. Both the
Windows API and Linux's mmap have an optional address argument, which
(when not NULL) is where the allocation will be placed (some
conditions apply, based on flags and specific API used), so, assuming
we have some control on where to allocate memory, we should be able to
reserve enough memory by using these APIs.
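
A minimal Linux-only (and 64-bit) sketch of that kind of call - asking mmap(2)
for an anonymous shared mapping at a caller-chosen address; the hint address
and size are arbitrary examples, and MAP_FIXED_NOREPLACE (Linux >= 4.17) makes
the call fail instead of clobbering an existing mapping:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
    void   *hint = (void *) 0x7f0000000000UL;   /* arbitrary example address */
    size_t  size = 64 * 1024 * 1024;
    void   *p;

    p = mmap(hint, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
             -1, 0);
    if (p == MAP_FAILED)
        perror("mmap");
    else
        printf("mapped %zu bytes at %p (hint was %p)\n", size, p, hint);
    return 0;
}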

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: More CppAsString2() in psql's describe.c

2024-11-28 Thread Corey Huinker
>
>
> LGTM, I didn't scan for omissions but the ones in the patch look right to
> me.
> I sort of wish we had a shorter macro as CppAsString2() gets pretty
> verbose
> when used frequently like this.
>

I don't quite understand the etymology of the name (it's some variation on
C++'s std::to_string, plus...something), but if I did, I'd probably find
the name less icky.

STR(), C_STR(), STRING(), and CSTRING() all seem to be available...
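
For reference, a self-contained sketch of the pattern in question; the two
macro definitions are copied in so the snippet compiles on its own, and the
RELKIND_RELATION stand-in value is an assumption for illustration, not taken
from pg_class.h:

#include <stdio.h>

#define CppAsString(identifier) #identifier
#define CppAsString2(x) CppAsString(x)

#define RELKIND_RELATION 'r'    /* illustrative stand-in */

int
main(void)
{
    /* string concatenation yields "WHERE c.relkind = 'r'" at compile time */
    const char *frag = "WHERE c.relkind = " CppAsString2(RELKIND_RELATION);

    puts(frag);
    return 0;
}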


Re: Changing shared_buffers without restart

2024-11-28 Thread Dmitry Dolgov
> On Thu, Nov 28, 2024 at 12:18:54PM GMT, Robert Haas wrote:
>
> All that having been said, what does concern me a bit is our ability
> to predict what Linux will do well enough to keep what we're doing
> safe; and also whether the Linux behavior might abruptly change in the
> future. Users would be sad if we released this feature and then a
> future kernel upgrade causes PostgreSQL to completely stop working. I
> don't know how the Linux kernel developers actually feel about this
> sort of thing, but if I imagine myself as a kernel developer, I can
> totally see myself saying "well, we never promised that this would
> work in any particular way, so we're free to change it whenever we
> like." We've certainly used that argument here countless times.

Agree, at the moment I can't say for sure how reliable this behavior is
in the long term. I'll try to see if there are ways to get more confidence
about that.




Re: UUID v7

2024-11-28 Thread Peter Eisentraut

On 27.11.24 00:11, Masahiko Sawada wrote:

On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio  wrote:


On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
 wrote:

gen_uuidv7() is OK


I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()


uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the 
renaissance of UUIDs. So we should not depend on legacy technology names


agreed



It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
is discussion whether we should add 'gen_' or 'get_' but let's go back
to the previously-agreed function name 'uuidv7()' for now. We can
rename it later if we find a better name.

I've attached the new version patch that incorporated all comments and
renamed the functions. Also I avoided using 'if defined(__darwin__) ||
defined(_MSC_VER)' twice.


* doc/src/sgml/func.sgml

The function variant uuidv7(interval) is not documented.


* src/backend/utils/adt/uuid.c

+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{

This should be a block comment, like

/*
 * Set the ...
 */

+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
...
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)

Is "ns" the timestamp argument?  What format is it?  Explain.

+   /*
+* Shift the current timestamp by the given interval. To make correct
+* calculating the time shift, we convert the UNIX epoch to TimestampTz
+* and use timestamptz_pl_interval(). Since this calculation is done 
with

+* microsecond precision, we carry back the nanoseconds.
+*/

This needs a bit of grammar tweaking, I think: "To make correct calculating"

I don't know what the meaning of "carry back" is.

+   Interval   *span = PG_GETARG_INTERVAL_P(0);

Not sure why this is named "span"?  Maybe "shift" would be better?
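
(As an aside, going back to uuid_set_version() quoted above: here is a
standalone sketch of what such a version/variant helper does per RFC 9562.
It is illustrative only, with a plain 16-byte array standing in for
pg_uuid_t, and is not the patch's code.)

#include <stdint.h>
#include <stdio.h>

static void
set_version_and_variant(uint8_t uuid[16], uint8_t version)
{
    uuid[6] = (uuid[6] & 0x0F) | (uint8_t) (version << 4);  /* version, octet 6 */
    uuid[8] = (uuid[8] & 0x3F) | 0x80;                       /* variant 10xx, octet 8 */
}

int
main(void)
{
    uint8_t u[16] = {0};

    set_version_and_variant(u, 7);
    printf("octet6=0x%02X octet8=0x%02X\n", u[6], u[8]);     /* 0x70, 0x80 */
    return 0;
}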


* src/include/catalog/pg_proc.dat

+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp 
shifted on specific interval',


better "shifted by"?


* src/test/regress/expected/opr_sanity.out

+uuidv4()
+uuidv7()
+uuidv7(interval)

Functions without arguments don't need to be marked leakproof.

uuidv7(interval) internally calls timestamptz_pl_interval(), which is
not leakproof, so I don't think that classification is sound.


* src/test/regress/sql/uuid.sql

+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7

Make the whitespace of the comment consistent with the rest of the file.

-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 
'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis 
test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 
'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test 
vector for v1


Here as well.





Re: Strange assertion in procarray.c

2024-11-28 Thread Michail Nikolaev
Hello, Heikki, Nathan and Michael!

Oh, please excuse my impudence in bringing you all here, but I finally
found that almost the same issue was already fixed by Heikki [0]: that
commit disabled installcheck for the injection point tests. However, in
the meson build configuration this was only applied to the regression
tests - the isolation and TAP tests still run during installcheck.

As demonstrated in the previously shared reproducer [1], even *local*
injection points can cause backend crashes through unexpected side effects.
Therefore, I propose extending the installcheck disable to cover both the
TAP and isolation tests as well.

A patch with these changes is attached.

Best regards,
Mikhail.

[0]:
https://github.com/postgres/postgres/commit/e2e3b8ae9ed73fcd3096c5ca93971891a7767388
[1]:
https://www.postgresql.org/message-id/flat/CANtu0ojbx6%3DesP8euQgzD1CN6tigTQvDmupwEmLTHZT%3D6_yx_A%40mail.gmail.com#18544d553544da67b4fc1ef764df3c3d

From e56b6de29aea01916d35d22e9a59241a04202b37 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Wed, 27 Nov 2024 02:10:17 +0100
Subject: [PATCH v2] Test to reproduce issue with crash caused by passing throw
 injection points during transaction aborting (caused by call to
 injection_init_shmem).

---
 src/backend/utils/time/snapmgr.c  |  2 +
 src/test/modules/injection_points/Makefile|  2 +-
 .../injection_points/expected/crash.out   | 26 ++
 src/test/modules/injection_points/meson.build |  1 +
 .../modules/injection_points/specs/crash.spec | 47 +++
 5 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/injection_points/expected/crash.out
 create mode 100644 src/test/modules/injection_points/specs/crash.spec

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 7d2b34d4f20..3a7357a050d 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -64,6 +64,7 @@
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/injection_point.h"
 
 
 /*
@@ -426,6 +427,7 @@ InvalidateCatalogSnapshot(void)
pairingheap_remove(&RegisteredSnapshots, 
&CatalogSnapshot->ph_node);
CatalogSnapshot = NULL;
SnapshotResetXmin();
+   INJECTION_POINT("invalidate_catalog_snapshot_end");
}
 }
 
diff --git a/src/test/modules/injection_points/Makefile 
b/src/test/modules/injection_points/Makefile
index 0753a9df58c..da8930ea49f 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -13,7 +13,7 @@ PGFILEDESC = "injection_points - facility for injection 
points"
 REGRESS = injection_points reindex_conc
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace
+ISOLATION = basic crash inplace
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/crash.out 
b/src/test/modules/injection_points/expected/crash.out
new file mode 100644
index 000..7d7f298c786
--- /dev/null
+++ b/src/test/modules/injection_points/expected/crash.out
@@ -0,0 +1,26 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s4_attach_locally wx1 rxwy2 c1 ry3 c2 c3
+injection_points_set_local
+--
+  
+(1 row)
+
+step s4_attach_locally: SELECT 
injection_points_attach('invalidate_catalog_snapshot_end', 'wait');
+injection_points_attach
+---
+   
+(1 row)
+
+step wx1: update D1 set id = id + 1;
+step rxwy2: update D2 set id = (select id+1 from D1);
+step c1: COMMIT;
+step ry3: select id from D2;
+id
+--
+ 1
+(1 row)
+
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among 
transactions
+step c3: COMMIT;
diff --git a/src/test/modules/injection_points/meson.build 
b/src/test/modules/injection_points/meson.build
index 58f19001157..6079187dd57 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -43,6 +43,7 @@ tests += {
   'isolation': {
 'specs': [
   'basic',
+  'crash',
   'inplace',
 ],
   },
diff --git a/src/test/modules/injection_points/specs/crash.spec 
b/src/test/modules/injection_points/specs/crash.spec
new file mode 100644
index 000..55e6a5434ab
--- /dev/null
+++ b/src/test/modules/injection_points/specs/crash.spec
@@ -0,0 +1,47 @@
+setup
+{
+ create table D1 (id int primary key not null);
+ create table D2 (id int primary key not null);
+ insert into D1 values (1);
+ insert into D2 values (1);
+
+ CREATE EXTENSION injection_points;
+}
+
+teardown
+{
+ DROP TABLE D1, D2;
+ DROP EXTENSION injection_points;
+}
+
+session s1
+setup  { BEGIN ISOLATION LEVEL SERIALIZABLE;}
+step wx1   { update D1 set id = id + 1; }
+step c1   

Re: [PATCH] Add support for displaying database service in psql prompt

2024-11-28 Thread Michael Banck
Hi,

On Tue, Nov 19, 2024 at 07:47:58PM -0500, Greg Sabino Mullane wrote:
> Compiled and tested: works fine, so +1 from me. Honestly, I was surprised
> %s was still available. :)

Thanks. Was that a full review? You kept the commitfest item in the "Needs
Review" state.


Michael




Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2024-11-28 Thread Alena Rybakina

Hi!

On 27.11.2024 16:36, Matthias van de Meent wrote:

On Wed, 27 Nov 2024 at 14:22, Alena Rybakina  wrote:

Sorry it took me so long to answer, I had some minor health complications

On 12.11.2024 23:00, Peter Geoghegan wrote:

On Sun, Nov 10, 2024 at 2:00 PM Alena Rybakina
 wrote:

Or maybe I was affected by fatigue, but I don’t understand this point, to be 
honest. I see from the documentation and your first letter that it specifies 
how many times in total the tuple search would be performed during the index 
execution. Is that not quite right?

Well, nodes that appear on the inner side of a nested loop join (and
in a few other contexts) generally have their row counts (and a few
other things) divided by the total number of executions. The idea is
that we're showing the average across all executions of the node -- if
the user wants the true absolute number, they're expected to multiply
nrows by nloops themselves. This is slightly controversial behavior,
but it is long established (weirdly, we never divide by nloops for
"Buffers").

I understood what you mean, and I have faced this situation before, when I saw
far more actual rows than there could possibly be, caused by the number of
tuples scanned per loop. [0]

[0] 
https://www.postgresql.org/message-id/flat/9f4a159b-f527-465f-b82e-38b4b7df8...@postgrespro.ru

Initial versions of my patch didn't do this. The latest version does
divide like this, though. In general it isn't all that likely that an
inner index scan would have more than a single primitive index scan,
in any case, so which particular behavior I use here (divide vs don't
divide) is not something that I feel strongly about.

I think we should divide them because by dividing the total buffer usage by the 
number of loops, the user finds the average buffer consumption per loop. This gives
them a clearer picture of the resource intensity per basic unit of work.

I disagree; I think the whole "dividing by number of loops and
rounding up to integer" was the wrong choice for tuple count, as that
makes it difficult if not impossible to determine the actual produced
count when it's less than the number of loops. Data is lost in the
rounding/processing, and I don't want to have lost that data.

Same applies for ~scans~ searches: If we do an index search, we should
show it in the count as total sum, not partial processed value. If a
user is interested in per-loopcount values, then they can derive that
value from the data they're presented with; but that isn't true when
we present only the divided-and-rounded value.
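
A tiny standalone illustration of that information loss: once the per-loop
value is rounded, a total smaller than the number of loops can no longer be
recovered (the numbers are made up for the example):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  total = 7;      /* true total across all executions */
    double  nloops = 10;

    double  shown = rint(total / nloops);   /* rounded per-loop average */
    double  recovered = shown * nloops;     /* best a reader can reconstruct */

    printf("shown=%.0f, recovered=%.0f, true total=%.0f\n",
           shown, recovered, total);        /* 1, 10, 7 */
    return 0;
}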

To be honest, I don't understand how that would be helpful, because buffer
usage is distributed unevenly from loop to loop, isn't it?


I thought that dividing the buffer usage by the number of loops helps us
normalize the metric to account for the repeated iterations. This gives 
us a clearer picture of the resource intensity per basic unit of work, 
rather than just the overall total. Each loop may consume a different 
amount of buffer space, but by averaging it out, we're smoothing those 
fluctuations into a more representative measure.


Moreover, this does not correspond to another metric shown nearby -
the number of rows processed by the inner node. Wouldn't a user who
evaluates the query plan be confused by such a discrepancy?


--
Regards,
Alena Rybakina
Postgres Professional





Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

2024-11-28 Thread Michail Nikolaev
Hello, everyone!

I've improved the test stability. The revised version should provide
consistent results in all test runs.

Best regards,
Mikhail.

From e9a562587a509ace581d7ccf40eb41eb73b93b6f Mon Sep 17 00:00:00 2001
From: nkey 
Date: Sun, 24 Nov 2024 14:55:13 +0100
Subject: [PATCH v5 4/4] Modify the ExecInitPartitionInfo function to consider 
 partitioned indexes that are potentially processed by REINDEX CONCURRENTLY as
   arbiters as well.

This is necessary to ensure that all concurrent transactions use the same set 
of arbiter indexes.

The patch resolves the issues in the following specs:
* reindex_concurrently_upsert_partitioned
---
 src/backend/executor/execPartition.c | 119 ---
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 76518862291..a41d5f1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -483,6 +483,48 @@ ExecFindPartition(ModifyTableState *mtstate,
return rri;
 }
 
+/*
+ * IsIndexCompatibleAsArbiter
+ * Checks if the indexes are identical in terms of being used
+ * as arbiters for the INSERT ON CONFLICT operation by comparing
+ * them to the provided arbiter index.
+ *
+ * Returns the true if indexes are compatible.
+ */
+static bool
+IsIndexCompatibleAsArbiter(Relation arbiterIndexRelation,
+  IndexInfo  *arbiterIndexInfo,
+  Relation indexRelation,
+  IndexInfo  *indexInfo)
+{
+   int i;
+
+   if (arbiterIndexInfo->ii_Unique != indexInfo->ii_Unique)
+   return false;
+   /* it is not supported for cases of exclusion constraints. */
+   if (arbiterIndexInfo->ii_ExclusionOps != NULL || 
indexInfo->ii_ExclusionOps != NULL)
+   return false;
+   if (arbiterIndexRelation->rd_index->indnkeyatts != 
indexRelation->rd_index->indnkeyatts)
+   return false;
+
+   for (i = 0; i < indexRelation->rd_index->indnkeyatts; i++)
+   {
+   int arbiterAttoNo = 
arbiterIndexRelation->rd_index->indkey.values[i];
+   int attoNo = 
indexRelation->rd_index->indkey.values[i];
+   if (arbiterAttoNo != attoNo)
+   return false;
+   }
+
+   if (list_difference(RelationGetIndexExpressions(arbiterIndexRelation),
+   
RelationGetIndexExpressions(indexRelation)) != NIL)
+   return false;
+
+   if (list_difference(RelationGetIndexPredicate(arbiterIndexRelation),
+   
RelationGetIndexPredicate(indexRelation)) != NIL)
+   return false;
+   return true;
+}
+
 /*
  * ExecInitPartitionInfo
  * Lock the partition and initialize ResultRelInfo.  Also setup 
other
@@ -693,6 +735,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState 
*estate,
if (rootResultRelInfo->ri_onConflictArbiterIndexes != NIL)
{
List   *childIdxs;
+   List   *nonAncestorIdxs = NIL;
+   int i, j, additional_arbiters = 0;
 
childIdxs = 
RelationGetIndexList(leaf_part_rri->ri_RelationDesc);
 
@@ -703,23 +747,74 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState 
*estate,
ListCell   *lc2;
 
ancestors = get_partition_ancestors(childIdx);
-   foreach(lc2, 
rootResultRelInfo->ri_onConflictArbiterIndexes)
+   if (ancestors)
{
-   if (list_member_oid(ancestors, 
lfirst_oid(lc2)))
-   arbiterIndexes = 
lappend_oid(arbiterIndexes, childIdx);
+   foreach(lc2, 
rootResultRelInfo->ri_onConflictArbiterIndexes)
+   {
+   if (list_member_oid(ancestors, 
lfirst_oid(lc2)))
+   arbiterIndexes = 
lappend_oid(arbiterIndexes, childIdx);
+   }
}
+   else /* No ancestor was found for that index. 
Save it for rechecking later. */
+   nonAncestorIdxs = 
lappend_oid(nonAncestorIdxs, childIdx);
list_free(ancestors);
}
+
+   /*
+* If any non-ancestor indexes are found, we need to 
compare them with other
+* indexes of the relat