odd behaviour with serial, non null and partitioned table

2023-10-17 Thread Ashutosh Bapat
Hi Alvaro,

Problem 1

#create table tpart (a serial primary key, src varchar) partition by range(a);
CREATE TABLE
#create table t_p4 (a int primary key, src varchar);
CREATE TABLE
#\d tpart
                        Partitioned table "public.tpart"
 Column |       Type        | Collation | Nullable |             Default
--------+-------------------+-----------+----------+----------------------------------
 a      | integer           |           | not null | nextval('tpart_a_seq'::regclass)
 src    | character varying |           |          |
Partition key: RANGE (a)
Indexes:
    "tpart_pkey" PRIMARY KEY, btree (a)
Number of partitions: 0

#\d t_p4;
 Table "public.t_p4"
 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 a  | integer   |   | not null |
 src| character varying |   |  |
Indexes:
"t_p4_pkey" PRIMARY KEY, btree (a)

Notice that both tpart and t_p4 have their column 'a' marked NOT NULL.

#select conname, contype, conrelid::regclass from pg_constraint where
conrelid in ('tpart'::regclass, 't_p4'::regclass);
     conname      | contype | conrelid
------------------+---------+----------
 tpart_a_not_null | n       | tpart
 tpart_pkey       | p       | tpart
 t_p4_pkey        | p       | t_p4
(3 rows)

But tpart's NOT NULL constraint is recorded in pg_constraint, while
t_p4's is not. Is this expected?

Both of them have their column 'a' marked NOT NULL in pg_attribute:
#select attrelid::regclass, attname, attnotnull from pg_attribute
where attrelid in ('tpart'::regclass, 't_p4'::regclass) and attname =
'a';
 attrelid | attname | attnotnull
----------+---------+------------
 tpart    | a       | t
 t_p4     | a       | t
(2 rows)

From the next set of commands it can be inferred that the NOT NULL
constraint of tpart came from the serial column, whereas t_p4's column a
was marked NOT NULL because of the primary key. I didn't investigate the
source code.
#create table t_serial(a serial, src varchar);
CREATE TABLE
#select conname, contype, conrelid::regclass from pg_constraint where
conrelid in ('t_serial'::regclass);
       conname       | contype | conrelid
---------------------+---------+----------
 t_serial_a_not_null | n       | t_serial
(1 row)

#select attrelid::regclass, attname, attnotnull from pg_attribute
where attrelid in ('t_serial'::regclass) and attname = 'a';
 attrelid | attname | attnotnull
----------+---------+------------
 t_serial | a       | t
(1 row)

Here's what I was trying to do actually.
#alter table tpart attach partition t_p4 for values from (7) to (9);
ERROR:  column "a" in child table must be marked NOT NULL
This is a surprise since t_p4.a is marked as NOT NULL. That happens
because MergeConstraintsIntoExisting() only looks at pg_constraint and
not pg_attribute. Should this function look at pg_attribute as well?
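
For illustration, a minimal sketch of such a fallback (hypothetical; here
child_rel, attname, and found stand in for whatever the real function uses):

	/*
	 * No explicit NOT NULL row in pg_constraint; hypothetically fall
	 * back to the column's attnotnull flag in pg_attribute.
	 */
	HeapTuple	atttup = SearchSysCacheAttName(RelationGetRelid(child_rel),
											   attname);

	if (HeapTupleIsValid(atttup))
	{
		if (((Form_pg_attribute) GETSTRUCT(atttup))->attnotnull)
			found = true;
		ReleaseSysCache(atttup);
	}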

This behaviour is different from PG 14. I happened to have a PG 14
build and hence tried that. I haven't tried PG 15, though.
#select version();
                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

#create table tpart (a serial primary key, src varchar) partition by range(a);
CREATE TABLE
#create table t_p4 (a int primary key, src varchar);
CREATE TABLE
#\d tpart
                        Partitioned table "public.tpart"
 Column |       Type        | Collation | Nullable |             Default
--------+-------------------+-----------+----------+----------------------------------
 a      | integer           |           | not null | nextval('tpart_a_seq'::regclass)
 src    | character varying |           |          |
Partition key: RANGE (a)
Indexes:
    "tpart_pkey" PRIMARY KEY, btree (a)
Number of partitions: 0

#\d t_p4
 Table "public.t_p4"
 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 a  | integer   |   | not null |
 src| character varying |   |  |
Indexes:
"t_p4_pkey" PRIMARY KEY, btree (a)

#select conname, contype, conrelid::regclass from pg_constraint where
conrelid in ('tpart'::regclass, 't_p4'::regclass);
  conname   | contype | conrelid
------------+---------+----------
 tpart_pkey | p       | tpart
 t_p4_pkey  | p       | t_p4
(2 rows)
#select attrelid::regclass, attname, attnotnull from pg_attribute
where attrelid in ('tpart'::regclass, 't_p4'::regclass) and attname =
'a';
 attrelid | attname | attnotnull
----------+---------+------------
 tpart    | a       | t
 t_p4     | a       | t
(2 rows)

#alter table tpart attach partition t_p4 for values from (7) to (9);
ALTER TABLE
postgres@1073836=#\d tpart
   

Re: remaining sql/json patches

2023-10-17 Thread Anton A. Melnikov



On 17.10.2023 07:02, Amit Langote wrote:


One thing jian he missed during the debugging is that
ExecEvalJsonExprCoercion() receives the EMPTY ARRAY value via
*op->resvalue/resnull, set by ExecEvalJsonExprBehavior(), because
that's the ON EMPTY behavior specified in the constraint.  The bug was
that the code in ExecEvalJsonExprCoercion() failed to set val_string
to that value ("[]") before passing to InputFunctionCallSafe(), so the
latter would assume the input is NULL.


Thanks a lot for this remark!

I tried digging into transformJsonOutput() to fix this earlier, at the
analysis stage, but that looks like a rather hard way.

Maybe, in accordance with your note, simply remove the second condition
from this line: if (jb && JB_ROOT_IS_SCALAR(jb))?
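
In diff terms the proposal would be roughly (context approximate; the line
is in ExecEvalJsonExprCoercion()):

-	if (jb && JB_ROOT_IS_SCALAR(jb))
+	if (jb)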

Here is a simplified reproduction before such a fix:
postgres=# select JSON_QUERY(jsonb '[]', '$' RETURNING char(5) OMIT QUOTES 
EMPTY ON EMPTY);
server closed the connection unexpectedly
This probably means the server terminated abnormally

after:
postgres=# select JSON_QUERY(jsonb '[]', '$' RETURNING char(5) OMIT QUOTES 
EMPTY ON EMPTY);
 json_query
------------
 []
(1 row)

At the moment I haven't found any side effects of that fix.
Please point it out if I'm missing something.

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Synchronizing slots from primary to standby

2023-10-17 Thread Peter Smith
FYI - the latest patch failed to apply.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
error: patch failed: src/include/utils/guc_hooks.h:160
error: src/include/utils/guc_hooks.h: patch does not apply

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: remaining sql/json patches

2023-10-17 Thread Amit Langote
Hi Anton,

On Tue, Oct 17, 2023 at 4:11 PM Anton A. Melnikov
 wrote:
> On 17.10.2023 07:02, Amit Langote wrote:
>
> > One thing jian he missed during the debugging is that
> > ExecEvalJsonExprCoercion() receives the EMPTY ARRAY value via
> > *op->resvalue/resnull, set by ExecEvalJsonExprBehavior(), because
> > that's the ON EMPTY behavior specified in the constraint.  The bug was
> > that the code in ExecEvalJsonExprCoercion() failed to set val_string
> > to that value ("[]") before passing to InputFunctionCallSafe(), so the
> > latter would assume the input is NULL.
> >
> Thanks a lot for this remark!
>
> I tried digging into transformJsonOutput() to fix this earlier, at the
> analysis stage, but that looks like a rather hard way.

Indeed.  As I said, the problem was a bug in ExecEvalJsonExprCoercion().

>
> Maybe, in accordance with your note, simply remove the second condition
> from this line: if (jb && JB_ROOT_IS_SCALAR(jb))?

Yeah, that's how I would fix it.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




serial and partitioned table

2023-10-17 Thread Ashutosh Bapat
Hi All,
#create table tpart (a serial primary key, src varchar) partition by range(a);
CREATE TABLE
#create table t_p4 (a int primary key, src varchar);
CREATE TABLE
To appease the gods of surprises I need to add a NOT NULL constraint. See [1].
#alter table t_p4 alter column a set not null;
ALTER TABLE
#alter table tpart attach partition t_p4 for values from (7) to (9);
ALTER TABLE
#\d t_p4
 Table "public.t_p4"
 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 a  | integer   |   | not null |
 src| character varying |   |  |
Partition of: tpart FOR VALUES FROM (7) TO (9)
Indexes:
"t_p4_pkey" PRIMARY KEY, btree (a)

The partition was attached but the gods of surprises forgot to set the
default value for a, which gets set when we create a partition
directly.
#create table t_p3 partition of tpart for values from (5) to (7);
CREATE TABLE
#\d t_p3
 Table "public.t_p3"
 Column |   Type| Collation | Nullable |
Default
+---+---+--+--
 a  | integer   |   | not null |
nextval('tpart_a_seq'::regclass)
 src| character varying |   |  |
Partition of: tpart FOR VALUES FROM (5) TO (7)
Indexes:
"t_p3_pkey" PRIMARY KEY, btree (a)

Gods of surprises have another similar gift.
#create table t_p2(a serial primary key, src varchar);
CREATE TABLE
#alter table tpart attach partition t_p2 for values from (3) to (5);
ALTER TABLE
#\d t_p2
 Table "public.t_p2"
 Column |   Type| Collation | Nullable |
Default
+---+---+--+-
 a  | integer   |   | not null |
nextval('t_p2_a_seq'::regclass)
 src| character varying |   |  |
Partition of: tpart FOR VALUES FROM (3) TO (5)
Indexes:
"t_p2_pkey" PRIMARY KEY, btree (a)
Observe that t_p2 uses a different sequence, not the sequence used by
the partitioned table tpart.

I think this behaviour is an unexpected result of using inheritance for
partitioning. Partitions not getting default values from the partitioned
table may also be fine, except in the case of serial columns. Unlike an
inheritance hierarchy, a partitioned table is expected to act as a single
table, so a serial column is expected to have monotonically increasing
values across the partitions. Partitions should therefore use the same
sequence as the parent table, and if the new partition being attached
uses a different sequence than the partitioned table, we should prohibit
it from being attached.
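
As a manual workaround in the meantime, one could presumably repoint the
attached partition's default at the parent's sequence and drop the private
one (untested sketch using the names above):

#alter sequence t_p2_a_seq owned by none;
#alter table t_p2 alter column a set default nextval('tpart_a_seq'::regclass);
#drop sequence t_p2_a_seq;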

This raises the question of what the behaviour should be when detaching
partitions. I haven't studied the behaviour of inherited properties, but
it looks like the partition being detached should let go of the inherited
properties and keep the non-inherited ones (even those which were retained
after merging).

I found this behaviour when experimenting with serial columns while
reading [2]. The result of this discussion will have some impact on
how we deal with IDENTITY columns in partitioned tables.

[1] 
https://www.postgresql.org/message-id/CAExHW5uRUtDfU0R8zXofQxCV3S1B%2BPa%2BX%2BNrpMwzKraLc25%3DEg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/70be435b-05db-06f2-7c01-9bb8ee2fccce%40enterprisedb.com

--
Best Wishes,
Ashutosh Bapat




Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-17 Thread David Rowley
On Mon, 16 Oct 2023 at 05:56, Tom Lane  wrote:
> * in initStringInfoFromString, str->maxlen must be set to len+1 not len
>
> * comment in exec_bind_message doesn't look like pgindent will like it
>
> * same in record_recv, plus it has a misspelling "Initalize"
>
> * in stringinfo.c, inclusion of pg_bitutils.h seems no longer needed

Thank you for looking again. I've addressed all of these in the attached.

> I guess the next question is whether we want to stop here or
> try to relax the requirement about NUL-termination.  I'd be inclined
> to call that a separate issue deserving a separate commit, so maybe
> we should go ahead and commit this much anyway.

I am keen to see this relaxed. I agree that a separate effort is best.

David
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index d52c8963eb..ce9d5b4059 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
 	/* Read the data */
 	for (i = 0; i < natts; i++)
 	{
+		char	   *buff;
 		char		kind;
 		int			len;
 		StringInfo	value = &tuple->colvalues[i];
@@ -899,19 +900,16 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
 				len = pq_getmsgint(in, 4);	/* read length */
 
 				/* and data */
-				value->data = palloc(len + 1);
-				pq_copymsgbytes(in, value->data, len);
+				buff = palloc(len + 1);
+				pq_copymsgbytes(in, buff, len);
 
 				/*
 				 * Not strictly necessary for LOGICALREP_COLUMN_BINARY, but
 				 * per StringInfo practice.
 				 */
-				value->data[len] = '\0';
+				buff[len] = '\0';
 
-				/* make StringInfo fully valid */
-				value->len = len;
-				value->cursor = 0;
-				value->maxlen = len;
+				initStringInfoFromString(value, buff, len);
 				break;
 			default:
 				elog(ERROR, "unrecognized data representation type '%c'", kind);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 597947410f..b574188d70 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3582,10 +3582,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 				/* Ensure we are reading the data into our memory context. */
 				MemoryContextSwitchTo(ApplyMessageContext);
 
-				s.data = buf;
-				s.len = len;
-				s.cursor = 0;
-				s.maxlen = -1;
+				initReadOnlyStringInfo(&s, buf, len);
 
 				c = pq_getmsgbyte(&s);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c900427ecf..7c0355cb2d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1817,23 +1817,20 @@ exec_bind_message(StringInfo input_message)
 
 			if (!isNull)
 			{
-				const char *pvalue = pq_getmsgbytes(input_message, plength);
+				char	   *pvalue;
 
 				/*
-				 * Rather than copying data around, we just set up a phony
+				 * Rather than copying data around, we just initialize a
 				 * StringInfo pointing to the correct portion of the message
 				 * buffer.  We assume we can scribble on the message buffer so
 				 * as to maintain the convention that StringInfos have a
 				 * trailing null.  This is grotty but is a big win when
 				 * dealing with very large parameter strings.
 				 */
-				pbuf.data = unconstify(char *, pvalue);
-				pbuf.maxlen = plength + 1;
-				pbuf.len = plength;
-				pbuf.cursor = 0;
-
-				csave = pbuf.data[plength];
-				pbuf.data[plength] = '\0';
+				pvalue = unconstify(char *,

Re: pg_stat_statements and "IN" conditions

2023-10-17 Thread Dmitry Dolgov
> On Fri, Oct 13, 2023 at 03:35:19PM +0200, Dmitry Dolgov wrote:
> > On Fri, Oct 13, 2023 at 05:07:00PM +0900, Michael Paquier wrote:
> > Now, it doesn't mean that this approach with the "powers" will never
> > happen, but based on the set of opinions I am gathering on this thread
> > I would suggest to rework the patch as follows:
> > - First implement an on/off switch that reduces the lists in IN and/or
> > ANY to one parameter.  Simply.
> > - Second, refactor the powers routine.
> > - Third, extend the on/off switch, or just implement a threshold with
> > a second switch.
>
> Well, if it will help move this patch forward, why not. To clarify, I'm
> going to split the current implementation into three patches, one for
> each point you've mentioned.

Here is what I had in mind. The first patch implements the basic notion of
merging, and I guess everyone agrees on its usefulness. The second and
third implement merging into groups of powers of 10, which I still find
useful as well. The last one adds a lower threshold for merging on top of
that. My intention is to get the first one in; ideally I would love to see
the second and third applied as well.

> > When it comes to my opinion, I am not seeing any objections to the
> > feature as a whole, and I'm OK with the first step.  I'm also OK to
> > keep the door open for more improvements in controlling how these
> > IN/ANY lists show up, but there could be more than just the number of
> > items as parameter (say the query size, different behaviors depending
> > on the number of clauses in queries, subquery context or CTEs/WITH,
> > etc. just to name a few things coming in mind).
>
> Interesting point, but now it's my turn to have trouble imagining a
> case where the list representation could be controlled by something
> other than the number of elements in it. Do you have any specific
> example in mind?

In the current patch version I didn't add anything yet to address the
question of having more parameters to tune constant merging. The main
obstacle as I see it is that the information for that has to be
collected when jumbling various query nodes. Anything except information
about the ArrayExpr itself would have to be acquired when jumbling some
other parts of the query, not directly related to the ArrayExpr. It
seems to me this interdependency between otherwise unrelated nodes
outweighs the value it brings, and would require some more elaborate (and
more invasive for the purpose of this patch) mechanism to implement.
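
To illustrate the first patch's effect: with the patch's GUC enabled,
queries differing only in the length of the constant list should collapse
into a single pg_stat_statements entry, along these lines (the normalized
form shown is an assumption, not the patch's exact output):

#set query_id_const_merge = on;
#select * from t where col in (1, 2, 3);
#select * from t where col in (1, 2, 3, 4, 5);
-- both queries are expected to share one normalized entry, roughly:
-- select * from t where col in (...)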
From 4367571d3d886a14690ba31ad0163d0d309a52a3 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sat, 14 Oct 2023 15:00:48 +0200
Subject: [PATCH v15 1/4] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is jumbled. In certain situations it's undesirable, especially
if the list becomes too large.

Make an array of Const expressions contribute only the first/last
elements to the jumble hash. Allow enabling this behavior via the new
GUC query_id_const_merge, with the default value off.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane,
Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier
Tested-by: Chengxi Sun
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/merging.out   | 167 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  34 +++-
 contrib/pg_stat_statements/sql/merging.sql|  58 ++
 doc/src/sgml/config.sgml  |  28 +++
 doc/src/sgml/pgstatstatements.sgml|  25 ++-
 src/backend/nodes/gen_node_support.pl |   2 +-
 src/backend/nodes/queryjumblefuncs.c  |  86 -
 src/backend/utils/misc/guc_tables.c   |  10 ++
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/nodes/primnodes.h |   2 +
 src/include/nodes/queryjumble.h   |   9 +-
 13 files changed, 406 insertions(+), 20 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/merging.out
 create mode 100644 contrib/pg_stat_statements/sql/merging.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index eba4a95d91a..af731fc9a58 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -19,7 +19,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
-	user_activity wal cleanup oldextversions
+	user_activity wal cleanup oldextversions merging
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clien

Re: Asymmetric partition-wise JOIN

2023-10-17 Thread Andrei Lepikhov

On 16/10/2023 23:21, Ashutosh Bapat wrote:
> On Mon, Oct 16, 2023 at 10:24 AM Andrei Lepikhov
> Whenever I visited this idea, I hit one issue prominently - how would
> we differentiate different scans of the non-partitioned relation.
> Normally we do that using different Relids but in this case we
> wouldn't be able to know the number of such relations involved in the
> query unless we start planning such a join. It's late to add new base
> relations and assign them new Relids. Of course I haven't thought hard
> about it. I haven't looked at the patch to see whether this problem is
> solved and how.

I'm curious, what type of problems are you afraid of here? Why would we
need a range table entry for each scan of the non-partitioned relation?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-10-17 Thread Michael Paquier
On Mon, Oct 16, 2023 at 01:07:07PM +0300, Nazir Bilal Yavuz wrote:
> Yes, that could be a better solution. Also, having more detailed stats
> for shared and local buffers is helpful. I updated patches in line
> with that:
> 
> 0001: Counts extends the same way as a write.

It can change existing query results on an already-released branch,
but we already count the number of blocks when doing a relation
extension, so counting the write time is something I'd rather fix in
v16.  If you have any objections, let me know.

> 0002: Rename blk_{read|write}_time as shared_blk_{read|write}_time.

Note that `git diff --check` complains here.

--- a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
@@ -30,8 +30,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT local_blks_written int8,
 OUT temp_blks_read int8,
 OUT temp_blks_written int8,
-OUT blk_read_time float8,
-OUT blk_write_time float8
+OUT shared_blk_read_time float8,
+OUT shared_blk_write_time float8

Doing that in an extension upgrade script is incorrect.  These should
not be touched. 

- Total time the statement spent reading data file blocks, in milliseconds
+ Total time the statement spent reading shared data file blocks, in 
milliseconds

Or just shared blocks?  That's what we use elsewhere for
pg_stat_statements.  "shared data file blocks" sounds a bit confusing
for relation file blocks read/written from/to shared buffers.

> 0003: Add new local_blk_{read|write}_time variables.

 DATA = pg_stat_statements--1.4.sql \
+   pg_stat_statements--1.11--1.12.sql \
pg_stat_statements--1.10--1.11.sql \

There is no need to bump again pg_stat_statements, as it has already
been bumped to 1.11 on HEAD per the recent commit 5a3423ad8ee1 from
Daniel.  So the new changes can just be added to 1.11.
--
Michael




Re: Invalidate the subscription worker in cases where a user loses their superuser status

2023-10-17 Thread Amit Kapila
On Fri, Oct 13, 2023 at 11:08 AM Amit Kapila  wrote:
>
> On Fri, Oct 13, 2023 at 10:04 AM vignesh C  wrote:
> >
> > On Thu, 12 Oct 2023 at 11:10, Amit Kapila  wrote:
> > >
> > > On Sun, Oct 8, 2023 at 8:22 AM vignesh C  wrote:
> > > >
> > >
> > > --- a/src/include/catalog/pg_subscription.h
> > > +++ b/src/include/catalog/pg_subscription.h
> > > @@ -127,6 +127,7 @@ typedef struct Subscription
> > >   * skipped */
> > >   char*name; /* Name of the subscription */
> > >   Oid owner; /* Oid of the subscription owner */
> > > + bool ownersuperuser; /* Is the subscription owner a superuser? */
> > >   bool enabled; /* Indicates if the subscription is enabled */
> > >   bool binary; /* Indicates if the subscription wants data in
> > >   * binary format */
> > >
> > > We normally don't change the exposed structure in back branches as
> > > that poses a risk of breaking extensions. In this case, if we want, we
> > > can try to squeeze some padding space or we even can fix it without
> > > introducing a new member. OTOH, it is already debatable whether to fix
> > > it in back branches, so we can even commit this patch just in HEAD.
> >
> > I too feel we can commit this patch only in HEAD.
> >
>
> Fair enough. I'll wait till early next week (say till Monday EOD) to
> see if anyone thinks otherwise and push this patch to HEAD after some
> more testing and review.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-17 Thread Aleksander Alekseev
Hi,

> > LGTM, except for one small detail: I noticed that you misspelled
> > "translations" in the commit message.
>
> Oops. Fixed locally.

v11-0001 and v11-0002 LGTM too. IMO "to assign a XID" sounds better
than "to generate a XID".

-- 
Best regards,
Aleksander Alekseev




Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-17 Thread Quan Zongliang




On 2023/10/17 12:15, Pavel Stehule wrote:



On Tue, 17 Oct 2023 at 03:30, Quan Zongliang wrote:




On 2023/10/16 20:05, Pavel Stehule wrote:
 >
 >
 > On Mon, 16 Oct 2023 at 13:56, Daniel Gustafsson wrote:
 >
 >      > On 16 Oct 2023, at 12:15, Quan Zongliang wrote:
 >
 >      > Implement TODO item:
 >      > PL/pgSQL
 >      > Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
 >
 >     Cool!  While I haven't looked at the patch yet, I've wanted this
 >     myself many
 >     times in the past when working with large plpgsql codebases.
 >
 >      > As a first step, deal only with [], such as
 >      > xxx.yyy%TYPE[]
 >      > xxx%TYPE[]
 >      >
 >      > It can be extended to support multi-dimensional and complex
 >     syntax in the future.
 >
 >     Was this omitted due to complexity of implementation or for some
 >     other reason?
 >
Because of complexity.

 >
 > There is no reason for describing the enhancement. The size and
 > dimensions of postgresql arrays are dynamic; they depend on the value,
 > not on the declaration. Now, this information is ignored, and it can be
 > a compatibility break to check and enforce this info.
 >
Yes. I don't think it's necessary.
If anyone needs it, we can continue to enhance it in the future.


I don't think it is possible to do it.  But there is another missing piece
of functionality, if I remember well: there is no way to declare variables
for the elements of an array.

The way it's done now is more like laziness.

Is it possible to do that?
If the parser encounters %TYPE[][], it can be parsed; then let
parse_datatype do the rest.


For example, take partitioned_table.a%TYPE[][100][]. Parse the type
name (int4) of partitioned_table.a%TYPE and append the following [][100][].
Passing "int4[][100][]" to parse_datatype will give us the array
definition we want.


Isn't this code a little ugly?



I propose syntax xxx.yyy%ELEMENTTYPE and xxx%ELEMENTTYPE

What do you think about it?
No other relational database can be found with such an implementation,
but it seems like a good idea. It could bring more convenience when
writing stored procedures.
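
For readers following along, the two syntaxes under discussion would look
roughly like this in a PL/pgSQL declaration section (hypothetical example;
the %ELEMENTTYPE form is only a proposal at this point):

DECLARE
    v_array  mytab.mycol%TYPE[];        -- this patch: array of the column's type
    v_elem   mytab.myarr%ELEMENTTYPE;   -- proposed: element type of an array column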




Regards

Pavel









Re: Asymmetric partition-wise JOIN

2023-10-17 Thread Ashutosh Bapat
On Tue, Oct 17, 2023 at 2:05 PM Andrei Lepikhov
 wrote:
>
> On 16/10/2023 23:21, Ashutosh Bapat wrote:
> > On Mon, Oct 16, 2023 at 10:24 AM Andrei Lepikhov
> > Whenever I visited this idea, I hit one issue prominently - how would
> > we differentiate different scans of the non-partitioned relation.
> > Normally we do that using different Relids but in this case we
> > wouldn't be able to know the number of such relations involved in the
> > query unless we start planning such a join. It's late to add new base
> > relations and assign them new Relids. Of course I haven't thought hard
> > about it. I haven't looked at the patch to see whether this problem is
> > solved and how.
> >
> I'm curious, what type of problems are you afraid of here? Why would we
> need a range table entry for each scan of the non-partitioned relation?
>

Not RTE but RelOptInfo.

Using the same example as Alexander Korotkov, let's say A is the
nonpartitioned table and P is partitioned table with partitions P1,
P2, ... Pn. The partitionwise join would need to compute AP1, AP2, ...
APn. Each of these joins may have different properties and thus will
require creating paths. In order to save these paths, we need
RelOptInfos, which are identified by relids. Let's assume that the
relids of these join RelOptInfos are created by the union of the relid
of A and the relid of Px (the partition being joined). This is notionally
misleading but doable.

But the clauses of A parameterized by P will produce different
translations for each of the partitions. I think we will need
different RelOptInfos (for A) to store these translations.

The relid is also used to track the scans at executor level. Since we
have so many scans on A, each may be using different plan, we will
need different ids for those.

But if you have developed a way to use a single RelOptInfo for A that
does all this, maybe we don't need any of that. I will take a look at
your next version of the patch.


-- 
Best Wishes,
Ashutosh Bapat




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 03:23, Peter Geoghegan  wrote:
> My main objection to the new policy is that it's not quite clear what
> process I should go through in order to be 100% confident that koel
> won't start whining (short of waiting around for koel to whine). I
> know how to run pgindent, of course, and haven't had any problems so
> far...but it still seems quite haphazard. If we're going to make this
> a hard rule, enforced on every commit, it should be dead easy to
> comply with the rule.

I think *it is* dead easy to comply. If you run the following commands
before committing/after rebasing, then koel should always be happy:

src/tools/pgindent/pgindent src # works always but a bit slow
src/tools/pgindent/pgindent $(git diff --name-only --diff-filter=ACMR)
# much faster, but only works if you DID NOT change typedefs.list

If you have specific cases where it does not work. Then I think we
should talk about those/fix them. But looking at the last few commits
in .git-blame-ignore-revs I only see examples of people simply not
running pgindent before they commit.

I guess it's easy to forget, but that's why the wiki contains a
pre-commit hook[1] that you can use to remind yourself/run pgindent
automatically. The only annoying thing is that it does not trigger
when rebasing, but you can work around that by using rebase's -x
flag [2].

[1]: https://wiki.postgresql.org/wiki/Working_with_Git#Using_git_hooks
[2]: https://adamj.eu/tech/2022/11/07/pre-commit-run-hooks-rebase/
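
For reference, a minimal sketch of such a pre-commit hook (simplified; the
wiki's version [1] differs in its details):

#!/bin/sh
# .git/hooks/pre-commit: reindent staged C files, abort if anything changed
files=$(git diff --cached --name-only --diff-filter=ACMR -- '*.c' '*.h')
[ -z "$files" ] && exit 0
src/tools/pgindent/pgindent $files
if ! git diff --quiet -- $files; then
	echo "pgindent modified files; review and re-stage before committing." >&2
	exit 1
fi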




Re: wal recycling problem

2023-10-17 Thread Fabrice Chapuis
Thanks for your feedback.

> How would you know which part of WAL is needed for any specific
> replication slot?

Changes are captured for each published table and written twice: once in
the current WAL and once in the slot-specific WAL.

> How would you handle multiple replications for the same table

Information about which publication a table belongs to is entered in the
slot-specific WAL.

> be it logical or physical replication, retains WAL up to
> max_slot_wal_keep_size

OK, but if max_slot_wal_keep_size is exceeded the changes are lost and all
of the replicated tables must be resynchronized.
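
For monitoring, the slot state mentioned below is exposed in the
pg_replication_slots view (the wal_status and safe_wal_size columns exist
since PG13), e.g.:

#select slot_name, wal_status, safe_wal_size from pg_replication_slots;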

Regards

Fabrice

On Sun, Oct 8, 2023 at 3:57 PM Christoph Moench-Tegeder 
wrote:

> ## Fabrice Chapuis (fabrice636...@gmail.com):
>
> > From a conceptual point of view I think that specific wals per
> subscription
> > should be used and stored in the pg_replslot folder in order to avoid
> > working directly on the wals of the instance.
> > What do you think about this proposal?
>
> I think that would open a wholly new can of worms.
> The most obvious point here is: that WAL is primarily generated for
> the operation of the database itself - it's our kind of transaction
> log, or "Redo Log" in other systems' lingo. Replication (be it physical
> or logical) is a secondary purpose (an obvious and important one, but
> still secondary).
> How would you know which part of WAL is needed for any specific
> replication slot? You'd have to decode and filter it, and already
> you're back at square one. How would you handle multiple replications
> for the same table (in the same publication, or even over multiple
> (overlapping) publications) - do you multiply the WAL?
>
> For now, we have "any replication using replication slots, be it logical
> or physical replication, retains WAL up to max_slot_wal_keep_size
> (or "unlimited" if not set - and on PostgreSQL 12 and before); and you
> need to monitor the state of your replication slots", which is a
> totally usable rule, I think.
>
> Regards,
> Christoph
>
> --
> Spare Space
>


Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 04:57, Michael Paquier  wrote:
> I see an extra reason with not doing that: this increases the
> difficulty when it comes to send and maintain patches to the lists and
> newcomers would need to learn more tooling.  I don't think that we
> should make that more complicated for code-formatting reasons.

Honestly, I don't think it's a huge hurdle for newcomers. Most open
source projects have a CI job that runs automatic code formatting, so
it's a pretty common thing for contributors to deal with. And as long
as we keep it a separate CI job from the normal tests, committers can
even choose to commit the patch if the formatting job fails, after
running pgindent themselves.

And personally, as a contributor, it's a much nicer experience to see
quickly in CI that I messed up the code style than to hear it a
week/month later in an email when someone took the time to review and
mentions the styling is way off all over the place.




Re: remaining sql/json patches

2023-10-17 Thread Amit Langote
On Mon, Oct 16, 2023 at 5:21 PM Nikita Malakhov  wrote:
>
> Hi!
>
> With the latest set of patches we encountered failure with the following 
> query:
>
> postgres@postgres=# SELECT JSON_QUERY(jsonpath '"aaa"', '$' RETURNING text);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
> Time: 11.165 ms
>
> A colleague of mine, Anton Melnikov, proposed the following changes which 
> slightly
> alter coercion functions to process this kind of error correctly.
>
> Please check attached patch set.

Thanks for the patches.

I think I understand patch 1. It makes each of JSON_{QUERY | VALUE |
EXISTS}() use FORMAT JSON for the context item by default, which I
think is the correct behavior.

As for patch 2, maybe the executor part is fine, but I'm not so sure
about the parser part.  Could you please explain why you think the
parser must check error-safety of the target type for allowing IO
coercion for non-ERROR behaviors?

Even if we consider that that's what should be done, it doesn't seem
like a good idea for the parser to implement its own logic for
determining error-safety.  IOW, the parser should really be using some
type cache API.  I thought there might have been a flag in pg_proc
(prosafe) or pg_type (typinsafe), but apparently there isn't.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for reviewing! PSA new version.

> 
> Yeah, I think introducing additional complexity unless it is really
> required sounds a bit scary to me as well. BTW, please find attached
> some cosmetic changes.

Basically LGTM, but the part below conflicted with Bharath's comment [1].

```
@@ -1607,7 +1605,7 @@ check_old_cluster_for_valid_slots(bool live_check)
fclose(script);
 
pg_log(PG_REPORT, "fatal");
-   pg_fatal("Your installation contains logical replication slots 
that cannot be upgraded.\n"
+   pg_fatal("Your installation contains invalid logical 
replication slots.\n"
```

How about " Your installation contains logical replication slots that can't be 
upgraded."?

> One minor additional comment:
> +# Initialize subscriber cluster
> +my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
> +$subscriber->init(allows_streaming => 'logical');
> 
> Why do we need to set wal_level as logical for subscribers?

It is not mandatory; the line was copied from tests in src/test/subscription.
I removed the setting from my patch. I felt that it could be removed from
other patches too, so I will fork a new thread and post a patch.


Also, I did some improvements based on the v50, basically for tests.

1. The test file was refactored. pg_upgrade was executed many times in the
   test, so the test time was increasing. The refactorings below were done.

===
a. Checks for both transactional and non-transactional changes were done at
   the same time.
b. Removed the dry-run test; it did not improve the coverage.
c. Removed the wal_level test. Other tests like subscription and test_decoding
   do not contain tests for GUCs, so I thought that was acceptable. Removing
   all the GUC tests (for max_replication_slots) might be risky, so the latter
   was retained.
===

2. Supported cross-version checks. If the environment variable "oldinstall"
   is set, its binaries are used as the old cluster. If the specified version
   is PG16 or earlier, the test verifies that logical replication slots are
   not migrated. 002_pg_upgrade.pl requires that $ENV{olddump} also be
   defined, but that is not needed for our test. I tried to support versions
   back to PG9.2, which is the oldest version for the cross-version upgrade
   test [2]; see the 0002 patch for it. IIUC pg_create_logical_replication_slot()
   has been available since PG9.4, so the tests will be skipped if older
   executables are specified, like:

```
$ oldinstall=/home/hayato/older/pg92/ make check 
PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. skipped: Logical replication 
slots can be available since PG9.4
Files=1, Tests=0,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.08 cusr  0.02 
csys =  0.13 CPU)
Result: NOTESTS
```

[1]: 
https://www.postgresql.org/message-id/CALj2ACXp%2BLXioY_%3D9mboEbLD--4c4nnpJCZ%2Bj4fckBdSOQhENA%40mail.gmail.com
[2]: 
https://github.com/PGBuildFarm/client-code/releases#:~:text=support%20for%20testing%20cross%20version%20upgrade%20extended%20back%20to%209.2

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v51-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v51-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


v51-0002-support-cross-version-upgrade.patch
Description: v51-0002-support-cross-version-upgrade.patch


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thank you for reviewing! New version can be available in [1].

> 1) Should this:
> +# Copyright (c) 2023, PostgreSQL Global Development Group
> +
> +# Tests for upgrading replication slots
> +
> be:
> "Tests for upgrading logical replication slots"

Fixed.

> 2)  This statement is not entirely true:
> + 
> +  
> +   The old cluster has replicated all the changes to subscribers.
> +  
> 
> If we have some changes like shutdown_checkpoint the upgrade passes,
> if we have some changes like create view whose changes will not be
> replicated the upgrade fails.

Hmm, I felt the current description was sufficient, but how about the below?
"The old cluster has replicated all the transactions and logical decoding
 messages to subscribers."

> 3) All these includes are not required except for "logical.h"
> --- a/src/backend/utils/adt/pg_upgrade_support.c
> +++ b/src/backend/utils/adt/pg_upgrade_support.c
> @@ -11,14 +11,20 @@
> 
>  #include "postgres.h"
> 
> +#include "access/xlogutils.h"
> +#include "access/xlog_internal.h"
>  #include "catalog/binary_upgrade.h"
>  #include "catalog/heap.h"
>  #include "catalog/namespace.h"
>  #include "catalog/pg_type.h"
>  #include "commands/extension.h"
> +#include "funcapi.h"
>  #include "miscadmin.h"
> +#include "replication/logical.h"
> +#include "replication/slot.h"
>  #include "utils/array.h"
>  #include "utils/builtins.h"
> +#include "utils/pg_lsn.h"

I preferred to include all the needed headers in each C file, but I have removed them.

> 4) We could print two_phase as true/false instead of 0/1:
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> +   /* Quick return if there are no logical slots. */
> +   if (slot_arr->nslots == 0)
> +   return;
> +
> +   pg_log(PG_VERBOSE, "Logical replication slots within the database:");
> +
> +   for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> +   {
> +   LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
> +
> +   pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
> two_phase: %d",
> +  slot_info->slotname,
> +  slot_info->plugin,
> +  slot_info->two_phase);
> +   }
> +}

Fixed.
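
Presumably along these lines (illustrative):

	pg_log(PG_VERBOSE, "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
		   slot_info->slotname,
		   slot_info->plugin,
		   slot_info->two_phase ? "true" : "false");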

> 5) test passes without the below, maybe this is not required:
> +# 2. Consume WAL records to avoid another type of upgrade failure. It will be
> +#   tested in subsequent cases.
> +$old_publisher->safe_psql('postgres',
> +   "SELECT count(*) FROM
> pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
> +);

This part is removed because of the refactoring.

> 6) This message "run of pg_upgrade of old cluster with idle
> replication slots" seems wrong:
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_checks_all(
> +   [
> +   'pg_upgrade', '--no-sync',
> +   '-d', $old_publisher->data_dir,
> +   '-D', $new_publisher->data_dir,
> +   '-b', $bindir,
> +   '-B', $bindir,
> +   '-s', $new_publisher->host,
> +   '-p', $old_publisher->port,
> +   '-P', $new_publisher->port,
> +   $mode,
> +   ],
> +   1,
> +   [
> +   qr/Your installation contains invalid logical
> replication slots./
> +   ],
> +   [qr//],
> +   'run of pg_upgrade of old cluster with idle replication slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +   "pg_upgrade_output.d/ not removed after pg_upgrade failure");

Rephrased.

> 7) You could run pgindent and pgperltidy; they show there are a few
> issues present with the patch.

I ran both.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866AC8A7694113BCBE0A71EF5D6A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Is this a problem in GenericXLogFinish()?

2023-10-17 Thread Robert Haas
On Mon, Oct 16, 2023 at 7:31 PM Jeff Davis  wrote:
> Another option might be to just change the hash indexing code to follow
> the correct protocol, locking and calling MarkBufferDirty() in those 3
> call sites. Marking the buffer dirty is easy, but making sure that it's
> locked might require some refactoring. Robert, would following the
> right protocol here affect performance?

Sorry, I'm not sure I understand the question. Are you asking whether
dirtying buffers unnecessarily might be slower than not doing that?
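
For context, the protocol in question is roughly the following generic
sequence (a sketch, not the hash-index call sites themselves):

	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
	START_CRIT_SECTION();
	/* ... modify the page ... */
	MarkBufferDirty(buf);
	/* ... emit WAL for the change ... */
	END_CRIT_SECTION();
	LockBuffer(buf, BUFFER_LOCK_UNLOCK);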

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Sun, Oct 15, 2023 at 8:52 PM Peter Geoghegan  wrote:
> On Sat, Aug 12, 2023 at 5:53 PM Peter Geoghegan  wrote:
> > Maybe I'm wrong -- maybe the new policy is practicable. It might even
> > turn out to be worth the bother. Time will tell.
>
> (Two months pass.)
>
> There were two independent fixup commits to address complaints from
> koel just today (from two different committers). Plus there was a
> third issue (involving a third committer) this past Wednesday.
>
> This policy isn't working.

+1. I think this is more annoying than the status quo ante.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix output of zero privileges in psql

2023-10-17 Thread Laurenz Albe
On Mon, 2023-10-16 at 19:05 -0700, David G. Johnston wrote:
> Reading both threads I'm not seeing any specific rejection of the
> solution that we simply represent empty privileges as "(none)".
> 
> I see an apparent consensus that if we do continue to represent it
> as NULL that the printout should respect \pset null.
> 
> Tom commented in favor of (none) on the other thread with some
> commentary regarding how extremely rare it is; then turns around
> and is "fairly concerned" about changing the current blank presentation
> of its value which is going to happen for some people regardless
> of which approach is chosen.
> 
> Stuart, the original user complainant, seems to favor (none)
> 
> Erik seems to favors (none)
> 
> I favor (none)
> 
> It's unclear to me whether you Laurenz are against (none) or just
> trying to go with the group consensus that doesn't actually seem to
> be against (none).
> 
> I'm in favor of iterating on v1 on this thread (haven't read it yet)
> and presenting it as the proposed solution.  If that ends up getting
> shot down we can revive v5 (still need to review as well).

Thanks for that summary.  I prefer my version (simply display NULLs
as NULLs), but I am not wedded to it.  I had the impression that Tom
would prefer that too, but is worried that the incompatibility introduced
would outweigh the small benefit (of either patch).

So it is clear that we don't have a consensus.

I don't think I want to go ahead with my version of the patch unless
there is more support for it.  I can review Erik's original code, if
that design meets with more favor.

> We should probably post on that thread that this one exists and post a link 
> to it.

Perhaps a good idea, yes.

Yours,
Laurenz Albe




Re: odd behaviour with serial, non null and partitioned table

2023-10-17 Thread Alvaro Herrera
Hello,

On 2023-Oct-17, Ashutosh Bapat wrote:

> Problem 1
> 
> #create table tpart (a serial primary key, src varchar) partition by range(a);
> CREATE TABLE
> #create table t_p4 (a int primary key, src varchar);
> CREATE TABLE

> But tparts NOT NULL constraint is recorded in pg_constraint but not
> t_p4's. Is this expected?

Yes.  tpart gets it from SERIAL, which implicitly requires a NOT NULL
marker.  If you just say PRIMARY KEY as you did for t_p4, the column
gets marked attnotnull, but there's no explicit NOT NULL constraint.


> Here's what I was trying to do actually.
> #alter table tpart attach partition t_p4 for values from (7) to (9);
> ERROR:  column "a" in child table must be marked NOT NULL
> This is a surprise since t_p4.a is marked as NOT NULL. That happens
> because MergeConstraintsIntoExisting() only looks at pg_constraint and
> not pg_attribute. Should this function look at pg_attribute as well?

Hmm ... well, not that way.  Maybe attaching a partition should cause a
NOT NULL constraint to spawn automatically (we do this in other cases).
There's no need to verify the existing rows for it, since attnotnull is
already checked; but it would mean that if you DETACH the partition, the
constraint would remain, so the table would dump slightly differently
than if you hadn't ATTACHed and DETACHed it.  But that sounds OK to me.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)




Re: False "pg_serial": apparent wraparound” in logs

2023-10-17 Thread Imseih (AWS), Sami
> So, I've spent more time on that and applied the simplification today,
> doing as you have suggested to use the head page rather than the tail
> page when the tail XID is ahead of the head XID, but without disabling
> the whole. I've simplified a bit the code and the comments, though,
> while on it (some renames and a slight refactoring of tailPage, for
> example).
> --
> Michael

Thank you!

Regards,

Sami





Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-10-17 Thread Nazir Bilal Yavuz
Hi,

Thanks for the review!

On Tue, 17 Oct 2023 at 11:40, Michael Paquier  wrote:
>
> On Mon, Oct 16, 2023 at 01:07:07PM +0300, Nazir Bilal Yavuz wrote:
> > Yes, that could be a better solution. Also, having more detailed stats
> > for shared and local buffers is helpful. I updated patches in line
> > with that:
> >
> > 0001: Counts extends the same way as a write.
>
> It can change existing query results on an already-released branch,
> but we already count the number of blocks when doing a relation
> extension, so counting the write time is something I'd rather fix in
> v16.  If you have any objections, let me know.

I agree.

>
> > 0002: Rename blk_{read|write}_time as shared_blk_{read|write}_time.
>
> Note that `git diff --check` complains here.
>
> --- a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
> +++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
> @@ -30,8 +30,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
>  OUT local_blks_written int8,
>  OUT temp_blks_read int8,
>  OUT temp_blks_written int8,
> -OUT blk_read_time float8,
> -OUT blk_write_time float8
> +OUT shared_blk_read_time float8,
> +OUT shared_blk_write_time float8
>
> Doing that in an extension upgrade script is incorrect.  These should
> not be touched.
>
> - Total time the statement spent reading data file blocks, in milliseconds
> + Total time the statement spent reading shared data file blocks, in 
> milliseconds
>
> Or just shared blocks?  That's what we use elsewhere for
> pg_stat_statements.  "shared data file blocks" sounds a bit confusing
> for relation file blocks read/written from/to shared buffers.
>
> > 0003: Add new local_blk_{read|write}_time variables.
>
>  DATA = pg_stat_statements--1.4.sql \
> +   pg_stat_statements--1.11--1.12.sql \
> pg_stat_statements--1.10--1.11.sql \
>
> There is no need to bump again pg_stat_statements, as it has already
> been bumped to 1.11 on HEAD per the recent commit 5a3423ad8ee1 from
> Daniel.  So the new changes can just be added to 1.11.

I updated patches based on your comments. v4 is attached.

Regards,
Nazir Bilal Yavuz
Microsoft
From 4b744c0b4e2cfdee4e95779ee19ac214b85aa150 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 13 Oct 2023 17:35:00 +0300
Subject: [PATCH v4 1/3] Include IOOp IOOP_EXTENDs while calculating block
 write times

Extends are counted as writes in block context, so include IOOP_EXTENDs
while calculating block write times.
---
 src/backend/utils/activity/pgstat_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index eb7d35d422..8ec8670199 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -119,7 +119,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_SET_CURRENT(io_time);
 		INSTR_TIME_SUBTRACT(io_time, start_time);
 
-		if (io_op == IOOP_WRITE)
+		if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND)
 		{
 			pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
 			if (io_object == IOOBJECT_RELATION)
-- 
2.42.0

From 5f299ee59d443a53bf6f3a8e18467270f5ee8318 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 13 Oct 2023 16:28:03 +0300
Subject: [PATCH v4 2/3] Rename I/O timing statistics columns to
 shared_blk_{read|write}_time

Since only shared blocks' timings are counted in blk_{read|write}_time,
rename them as shared_blk_{read|write}_time
---
 .../expected/oldextversions.out   |   4 +-
 .../pg_stat_statements--1.10--1.11.sql|   4 +-
 .../pg_stat_statements/pg_stat_statements.c   |  14 ++-
 doc/src/sgml/pgstatstatements.sgml|   8 +-
 src/backend/commands/explain.c|  29 ++---
 src/backend/executor/instrument.c |  12 +-
 src/backend/utils/activity/pgstat_io.c|   4 +-
 src/include/executor/instrument.h |   4 +-
 src/test/regress/expected/explain.out | 108 +-
 9 files changed, 95 insertions(+), 92 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index 64982aad60..c2e0140a47 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -284,8 +284,8 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.11';
  local_blks_written | bigint   |   |  | 
  temp_blks_read | bigint   |   |  | 
  temp_blks_written  | bigint   |   |  | 
- blk_read_time  | double precision |   |  | 
- blk_write_time | double precision |   |  | 
+ shared_blk_read_time   | double precision |   |  | 
+ shared_blk_write_time  | double precision |   |  | 
  temp_blk_read

Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 6:34 AM Jelte Fennema  wrote:
> I think *it is* dead easy to comply. If you run the following commands
> before committing/after rebasing, then koel should always be happy:
>
> src/tools/pgindent/pgindent src # works always but a bit slow
> src/tools/pgindent/pgindent $(git diff --name-only --diff-filter=ACMR)
> # much faster, but only works if you DID NOT change typedefs.list

In isolation, that's true, but the list of mistakes that you can make
while committing which will inconvenience everyone working on the
project is very long. Another one that comes up frequently is
forgetting to bump CATALOG_VERSION_NO, but you also need a good commit
message, and good comments, and a good Discussion link in the commit
message, and the right list of authors and reviewers, and to update
the docs (with spaces, not tabs) and the Makefiles (with tabs, not
spaces) and the meson stuff and, as if that weren't enough already,
you actually need the code to work! And that includes not only working
regularly but also with CLOBBER_CACHE_ALWAYS and debug_parallel_query
and so on. It's very easy to miss something somewhere. I put a LOT of
work into polishing my commits before I push them, and it's still not
that uncommon that I screw something up.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 8:45 AM Robert Haas  wrote:
> > This policy isn't working.
>
> +1. I think this is more annoying than the status quo ante.

Although ... I do think it's spared me some rebasing pain, and that
does have some real value. I wonder if we could think of other
alternatives. For example, maybe we could have a bot. If you push a
commit that's not indented properly, the bot reindents the tree,
updates git-blame-ignore-revs, and sends you an email admonishing you
for your error. Or we could have a server-side hook that will refuse
the misindented commit, with some kind of override for emergency
situations. What I really dislike about the current situation is that
it's doubling down on the idea that committers have to be perfect and
get everything right every time. Turns out, that's hard to do. If not,
why do people keep screwing things up? Somebody could theorize - and
this seems to be Tom and Jelte's theory, though perhaps I'm
misinterpreting their comments - that the people who have made
mistakes here are just lazy, and what they need to do is up their
game.

But I don't buy that. First, I think that most of our committers are
pretty intelligent and hard-working people who are trying to do the
right thing. We can't all be Tom Lane, no matter how hard we may try.
Second, even if it were true that the offending committers are "just
lazy," all of our contributors and many senior non-committer
contributors are people who have put thousands, if not tens of
thousands, of hours into the project. Making them feel bad serves us
poorly. At the end of the day, it doesn't matter whether it's too much
of a pain for the perfect committers we'd like to have. It matters
whether it's too much of a pain for the human committers that we do
have.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Tom Lane
Robert Haas  writes:
> On Tue, Oct 17, 2023 at 8:45 AM Robert Haas  wrote:
>> +1. I think this is more annoying than the status quo ante.

> Although ... I do think it's spared me some rebasing pain, and that
> does have some real value. I wonder if we could think of other
> alternatives.

An alternative I was thinking about after reading your earlier email
was going back to the status quo ante, but doing the manual tree-wide
reindents significantly more often than once a year.  Adding one at
the conclusion of each commitfest would be a natural thing to do,
for instance.  It's hard to say what frequency would lead to the
least rebasing pain, but we know once-a-year isn't ideal.

> For example, maybe we could have a bot. If you push a
> commit that's not indented properly, the bot reindents the tree,
> updates git-blame-ignore-revs, and sends you an email admonishing you
> for your error.

I'm absolutely not in favor of completely-automated reindents.
pgindent is a pretty stupid tool and it will sometimes do stupid
things, which you have to correct for by tweaking the input
formatting.  The combination of the tool and human supervision
generally produces pretty good results, but the tool alone
not so much.

> Or we could have a server-side hook that will refuse
> the misindented commit, with some kind of override for emergency
> situations.

Even though I'm in the camp that would like the tree correctly
indented at all times, I remain very much against a commit hook.
I think that'd be significantly more annoying than the current
situation, which you're already unhappy about the annoying-ness of.

The bottom line here, I think, is that there's a subset of committers
that would like perfectly mechanically-indented code at all times,
and there's another subset that just doesn't care that much.
We don't (and shouldn't IMO) have a mechanism to let one set force
their views on the other set.  The current approach is clearly
insufficient for that, and I don't think trying to institute stronger
enforcement is going to make anybody happy.

regards, tom lane




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 4:57 AM Aleksander Alekseev
 wrote:
> v11-0001 and v11-0002 LGTM too.

Cool. Seems we are all in agreement, so committed these. Thanks!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Invalidate the subscription worker in cases where a user loses their superuser status

2023-10-17 Thread vignesh C
On Tue, 17 Oct 2023 at 14:17, Amit Kapila  wrote:
>
> On Fri, Oct 13, 2023 at 11:08 AM Amit Kapila  wrote:
> >
> > On Fri, Oct 13, 2023 at 10:04 AM vignesh C  wrote:
> > >
> > > On Thu, 12 Oct 2023 at 11:10, Amit Kapila  wrote:
> > > >
> > > > On Sun, Oct 8, 2023 at 8:22 AM vignesh C  wrote:
> > > > >
> > > >
> > > > --- a/src/include/catalog/pg_subscription.h
> > > > +++ b/src/include/catalog/pg_subscription.h
> > > > @@ -127,6 +127,7 @@ typedef struct Subscription
> > > >   * skipped */
> > > >   char*name; /* Name of the subscription */
> > > >   Oid owner; /* Oid of the subscription owner */
> > > > + bool ownersuperuser; /* Is the subscription owner a superuser? */
> > > >   bool enabled; /* Indicates if the subscription is enabled */
> > > >   bool binary; /* Indicates if the subscription wants data in
> > > >   * binary format */
> > > >
> > > > We normally don't change the exposed structure in back branches as
> > > > that poses a risk of breaking extensions. In this case, if we want, we
> > > > can try to squeeze some padding space or we even can fix it without
> > > > introducing a new member. OTOH, it is already debatable whether to fix
> > > > it in back branches, so we can even commit this patch just in HEAD.
> > >
> > > I too feel we can commit this patch only in HEAD.
> > >
> >
> > Fair enough. I'll wait till early next week (say till Monday EOD) to
> > see if anyone thinks otherwise and push this patch to HEAD after some
> > more testing and review.
> >
>
> Pushed.

Thanks for committing this.

Regards,
Vignesh




Force the old transactions logs cleanup even if checkpoint is skipped

2023-10-17 Thread Zakhlystov, Daniil (Nebius)

Hi, hackers! 

I've stumbled into an interesting problem. Currently, if Postgres has nothing
to write, it skips the checkpoint creation that checkpoint_timeout would
otherwise trigger. However, a temporary archiving problem (for example, some
network issue) can leave a pile of WAL files stuck in pg_wal. After the
temporary issue is gone, those already-archived files still never get removed,
since we keep skipping the checkpoint because we have nothing to write.

That might lead to a problem - suppose you've run out of disk space because of
a temporary archiver failure. Even after that failure is resolved, Postgres
cannot recover from it automatically and requires human attention to issue a
manual CHECKPOINT call.

I suggest changing this behavior by trying to clean up the old WAL even if we 
skip the main checkpoint routine. I've attached the patch that does exactly 
that.

What do you think?

To reproduce the issue, you might repeat the following steps:

1. Init Postgres:
pg_ctl initdb -D /Users/usernamedt/test_archiver

2. Add the archiver script to simulate failure:
➜  ~ cat /Users/usernamedt/command.sh
#!/bin/bash

false

3. Then alter the PostgreSQL conf:

archive_mode = on
checkpoint_timeout = 30s
archive_command = /Users/usernamedt/command.sh
log_min_messages = debug1

4. Then start Postgres:
/usr/local/pgsql/bin/pg_ctl -D /Users/usernamedt/test_archiver -l logfile start

5. Insert some data:
pgbench -i -s 30 -d postgres

6. Trigger checkpoint to flush all data:
psql -c "checkpoint;"

7. Alter the archiver script to simulate the end of archiver issues:
➜  ~ cat /Users/usernamedt/command.sh
#!/bin/bash

true

8. Check that the WAL files are actually archived but not removed:
➜  ~ ls -lha /Users/usernamedt/test_archiver/pg_wal/archive_status | head
total 0
drwx------@ 48 usernamedt  LD\Domain Users   1.5K Oct 17 17:44 .
drwx------@ 50 usernamedt  LD\Domain Users   1.6K Oct 17 17:43 ..
-rw-------@  1 usernamedt  LD\Domain Users     0B Oct 17 17:42 000000010000000000000040.done
...
-rw-------@  1 usernamedt  LD\Domain Users     0B Oct 17 17:43 00000001000000000000006D.done

2023-10-17 18:03:44.621 +04 [71737] DEBUG:  checkpoint skipped because system is idle
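
For completeness, the human intervention mentioned above does clear the
backlog; a quick sketch using the paths from the steps above (the .done
entries disappear once the corresponding segments are recycled):

psql -c "CHECKPOINT;"
ls /Users/usernamedt/test_archiver/pg_wal/archive_status | wc -l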

Thanks,

Daniil Zakhlystov

0001-Cleanup-old-files-if-checkpoint-is-skipped.patch
Description: 0001-Cleanup-old-files-if-checkpoint-is-skipped.patch


Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Peter Geoghegan
On Tue, Oct 17, 2023 at 7:23 AM Tom Lane  wrote:
> Robert Haas  writes:
> > On Tue, Oct 17, 2023 at 8:45 AM Robert Haas  wrote:
> >> +1. I think this is more annoying than the status quo ante.
>
> > Although ... I do think it's spared me some rebasing pain, and that
> > does have some real value. I wonder if we could think of other
> > alternatives.
>
> An alternative I was thinking about after reading your earlier email
> was going back to the status quo ante, but doing the manual tree-wide
> reindents significantly more often than once a year.  Adding one at
> the conclusion of each commitfest would be a natural thing to do,
> for instance.  It's hard to say what frequency would lead to the
> least rebasing pain, but we know once-a-year isn't ideal.

That seems like the best alternative we have. The old status quo did
occasionally allow code with indentation that *clearly* wasn't up to
project standards to slip in. It could stay that way for quite a few
months at a time. That wasn't great either.

-- 
Peter Geoghegan




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 16:23, Tom Lane  wrote:
> > Or we could have a server-side hook that will refuse
> > the misindented commit, with some kind of override for emergency
> > situations.
>
> Even though I'm in the camp that would like the tree correctly
> indented at all times, I remain very much against a commit hook.
> I think that'd be significantly more annoying than the current
> situation, which you're already unhappy about the annoying-ness of.

Why do you think that would be significantly more annoying than the
current situation? Instead of getting delayed feedback you get instant
feedback when you push.




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 10:23 AM Tom Lane  wrote:
> An alternative I was thinking about after reading your earlier email
> was going back to the status quo ante, but doing the manual tree-wide
> reindents significantly more often than once a year.  Adding one at
> the conclusion of each commitfest would be a natural thing to do,
> for instance.  It's hard to say what frequency would lead to the
> least rebasing pain, but we know once-a-year isn't ideal.

Yes. I suspect once a commitfest still wouldn't be often enough. Maybe
once a month or something would be. But I'm not sure. You might rebase
once over the misindented commit and then have to rebase again over
the indent that fixed it. There's not really anything that quite
substitutes for doing it right on every commit.

> The bottom line here, I think, is that there's a subset of committers
> that would like perfectly mechanically-indented code at all times,
> and there's another subset that just doesn't care that much.
> We don't (and shouldn't IMO) have a mechanism to let one set force
> their views on the other set.  The current approach is clearly
> insufficient for that, and I don't think trying to institute stronger
> enforcement is going to make anybody happy.

I mean, I think we DO have such a mechanism. Everyone agrees that the
buildfarm has to stay green, and we have a buildfarm member that
checks pgindent, so that means everyone has to pgindent. We could
decide to kill that buildfarm member, in which case we go back to
people not having to pgindent, but right now they do.

And if it's going to remain the policy, it's better to enforce that
policy earlier rather than later. I mean, what is the point of having
a system where we let people do the wrong thing and then publicly
embarrass them afterwards? How is that better than preventing them
from doing the wrong thing in the first place? Even if they don't
subjectively feel embarrassed, nobody likes having to log back on in
the evening or the weekend and clean up after something they thought
they were done with.

In fact, that particular experience is one of the worst things about
being a committer. It actively discourages me, at least, from trying
to get other people's patches committed. This particular problem is
minor, but the overall experience of trying to get things committed is
that you have to check 300 things for every patch and if you get every
one of them right then nothing happens and if you get one of them
wrong then you get a bunch of irritated emails criticizing your
laziness, sloppiness, or whatever, and you have to drop everything to
go fix it immediately. What a deal! I'm sure this isn't the only
reason why we have such a huge backlog of patches needing committer
attention, but it sure doesn't help. And there is absolutely zero need
for this to be yet another thing that you can find out you did wrong
in the 1-24 hour period AFTER you type 'git push'.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-17 Thread Daniel Verite
Jeff Davis wrote:

> I believe the patch has utility as-is, but I've been brainstorming a
> few more ideas that could build on it:
> 
> * Add a per-database option to enforce only storing assigned unicode
> code points.

There's a problem in the fact that the set of assigned code points is
expanding with every Unicode release, which happens about every year.

If we had this option in Postgres 11 released in 2018 it would use
Unicode 11, and in 2023 this feature would reject thousands of code
points that have been assigned since then.

Aside from that, aborting a transaction because there's an
unassigned code point in a string feels like doing too much,
too late.
The programs that want to filter out unwanted code points
do it before they hit the database, client-side.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Pre-proposal: unicode normalized text

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 11:07 AM Daniel Verite  wrote:
> There's a problem in the fact that the set of assigned code points is
> expanding with every Unicode release, which happens about every year.
>
> If we had this option in Postgres 11 released in 2018 it would use
> Unicode 11, and in 2023 this feature would reject thousands of code
> points that have been assigned since then.

Are code points assigned from a gapless sequence? That is, is the
implementation of codepoint_is_assigned(char) just 'codepoint <
SOME_VALUE' and SOME_VALUE increases over time?

If so, we could consider having a function that lets you specify the
bound as an input parameter. But whether anyone would use it, or know
how to set that input parameter, is questionable. The real issue here
is whether you can figure out which of the code points that you could
put into the database already have collation definitions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Peter Geoghegan
On Tue, Oct 17, 2023 at 8:01 AM Robert Haas  wrote:
> In fact, that particular experience is one of the worst things about
> being a committer. It actively discourages me, at least, from trying
> to get other people's patches committed. This particular problem is
> minor, but the overall experience of trying to get things committed is
> that you have to check 300 things for every patch and if you get every
> one of them right then nothing happens and if you get one of them
> wrong then you get a bunch of irritated emails criticizing your
> laziness, sloppiness, or whatever, and you have to drop everything to
> go fix it immediately. What a deal!

Yep. Enforcing perfect indentation on koel necessitates rechecking
indentation after each and every last-minute fixup affecting C code --
the interaction makes it quite a bit harder to get everything right
on the first push. For example, if I spot an issue with a comment
during final pre-commit review and fix up that commit, I have to run
pgindent again. On a long enough timeline, I'm going to forget to do
that.

-- 
Peter Geoghegan




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 16:04, Robert Haas  wrote:
> What I really dislike about the current situation is that
> it's doubling down on the idea that committers have to be perfect and
> get everything right every time. Turns out, that's hard to do. If not,
> why do people keep screwing things up? Somebody could theorize - and
> this seems to be Tom and Jelte's theory, though perhaps I'm
> misinterpreting their comments - that the people who have made
> mistakes here are just lazy, and what they need to do is up their
> game.

To clarify, I did not intend to imply that people who commit unindented
code are lazy. It's expected that humans will forget to run pgindent before
committing from time to time (I do too). That's why I proposed a
server side git hook to reject badly indented commits very early in
this thread. But some others said that buildfarm animals were the way
to go for Postgres development flow. And since I'm not a committer I
left it at that. I was already happy enough that there was consensus
on indenting continuously, so that the semi-regular rebases for the
few open CF entries that I have are a lot less annoying.

But based on the current feedback I think we should seriously consider
a server-side "update" git hook again. People are obviously not
perfect machines. And for whatever reason not everyone installs the
pre-commit hook from the wiki. So the koel keeps complaining. A
server-side hook would solve all of this IMHO.
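
To make the idea concrete, here is a minimal sketch of such an "update"
hook. It is not a drop-in: it assumes pgindent, pg_bsd_indent, and the
in-tree typedefs.list are usable on the server, and that pgindent's
--silent-diff mode (exit nonzero if any file would change) is available:

#!/bin/sh
# update hook: called once per ref as <refname> <old-sha> <new-sha>
refname="$1"
newrev="$3"

# Only police master; an emergency override could use a push option.
[ "$refname" = "refs/heads/master" ] || exit 0

tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

# The server repo is bare, so materialize the pushed tree first.
git archive "$newrev" | tar -x -C "$tmpdir"

if ! (cd "$tmpdir" && src/tools/pgindent/pgindent --silent-diff .); then
    echo "rejected: $newrev is not pgindent-clean" >&2
    exit 1
fi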




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 11:16 AM Peter Geoghegan  wrote:
> Yep. Enforcing perfect indentation on koel necessitates rechecking
> indentation after each and every last-minute fixup affecting C code --
> the interactions makes it quite a bit harder to get everything right
> on the first push. For example, if I spot an issue with a comment
> during final pre-commit review, and fixup that commit, I have to run
> pgindent again. On a long enough timeline, I'm going to forget to do
> that.

I also just discovered that my pre-commit hook doesn't work if I pull
commits into master by cherry-picking. I had thought that I could have
my hook just check my commits to master and not all of my local dev
branches where I really don't want to mess with this when I'm just
banging out a rough draft of something. But now I see that I'm going
to need to work harder on this if I actually want it to catch all the
ways I might screw this up.
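
For what it's worth, a client-side pre-push hook sidesteps that particular
hole, since it fires at push time regardless of how the commits were created
(cherry-pick, rebase, or plain commit). A rough sketch, again assuming
pgindent's --silent-diff mode:

#!/bin/sh
# .git/hooks/pre-push -- stdin carries one line per ref being pushed:
#   <local-ref> <local-sha> <remote-ref> <remote-sha>
# (This checks the current checkout, which is assumed to match what is
# being pushed.)
while read -r local_ref local_sha remote_ref remote_sha; do
    [ "$remote_ref" = "refs/heads/master" ] || continue
    if ! src/tools/pgindent/pgindent --silent-diff .; then
        echo "push aborted: tree is not pgindent-clean" >&2
        exit 1
    fi
done
exit 0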

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add support for AT LOCAL

2023-10-17 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Oct 17, 2023 at 01:40:18AM -0400, Tom Lane wrote:
>> makes the failure go away.  Unfortunately, I've not yet found another
>> way to make it go away :-(.  My upthread idea of using a local variable
>> instead of result->time is no help, and some other random code
>> alterations didn't change the results either.

> That may be a long shot, but even a modulo?

Yeah, the same thing occurred to me in the shower this morning, and it
does seem to work!  We can replace both loops with a %= operator, at
least if we're willing to assume C99 division semantics, which seems
pretty safe in 2023.  Your idea of doing a range check to skip the
division in typical cases is a refinement I'd not thought of, but
it seems like a good idea for performance.

(I see that the negative-starting-point case isn't covered in the
current regression tests, so maybe we better add a test for that.)

regards, tom lane




Re: Synchronizing slots from primary to standby

2023-10-17 Thread Drouvot, Bertrand

Hi,

On 10/13/23 10:35 AM, shveta malik wrote:

On Thu, Oct 12, 2023 at 9:18 AM shveta malik  wrote:




PFA v24 patch set which has below changes:

1) 'enable_failover' displayed in pg_replication_slots.
2) Support for 'enable_failover' in
pg_create_logical_replication_slot(). It is an optional argument with
default value false.
3) Addressed pending comments (1-30) from Peter in [1].
4) Fixed an issue in patch002 due to which even slots with
enable_failover=false were getting synced.

The changes for 1 and 2 are in patch0001, while 3 and 4 are in patch0002.

Thanks Ajin, for working on 1 and 3.


Thanks for the hard work!

+   if (RecoveryInProgress())
+   wrconn = slotsync_remote_connect(NULL);

does produce at compilation time:

launcher.c:1916:40: warning: too many arguments in call to 'slotsync_remote_connect'
wrconn = slotsync_remote_connect(NULL);

Looking at 0001:

commit message:

"is added at the slot level which
will be persistent information"

what about "which is persistent information" ?

Code:

+   True if this logical slot is enabled to be synced to the physical standbys
+   so that logical replication is not blocked after failover. Always false
+   for physical slots.

Not sure "not blocked" is the right wording. "can be resumed from the new 
primary" maybe?

+static void
+ProcessRepliesAndTimeOut(void)
+{
+   CHECK_FOR_INTERRUPTS();
+
+   /* Process any requests or signals received recently */
+   if (ConfigReloadPending)
+   {
+   ConfigReloadPending = false;
+   ProcessConfigFile(PGC_SIGHUP);
+   SyncRepInitConfig();
+   SlotSyncInitConfig();
+   }

Do we want to do this at each place ProcessRepliesAndTimeOut() is being
called? I mean before this change it was not done in ProcessPendingWrites().

+ * Wait for physical standby to confirm receiving give lsn.

typo? s/give/given/


diff --git a/src/test/recovery/t/050_verify_slot_order.pl 
b/src/test/recovery/t/050_verify_slot_order.pl
new file mode 100644
index 00..25b3d5aac2
--- /dev/null
+++ b/src/test/recovery/t/050_verify_slot_order.pl
@@ -0,0 +1,145 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+

Regarding the TAP tests, should we also add some testing related to
enable_failover being set in pg_create_logical_replication_slot(), and to
pg_logical_slot_get_changes() behavior?

Please note that the comments above come from a "quick" pass through 0001.

I'm planning to have a closer look at 0001 and 0002 too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Pre-proposal: unicode normalized text

2023-10-17 Thread Isaac Morland
On Tue, 17 Oct 2023 at 11:15, Robert Haas  wrote:


> Are code points assigned from a gapless sequence? That is, is the
> implementation of codepoint_is_assigned(char) just 'codepoint <
> SOME_VALUE' and SOME_VALUE increases over time?
>

Not even close. Code points are organized in blocks, e.g. for mathematical
symbols or Ethiopic script. Sometimes new blocks are added, sometimes new
characters are added to existing blocks. Where they go is a combination of
convenience, history, and planning.


Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 11:23 AM Jelte Fennema  wrote:
> To clarify, I did not intend to imply that people who commit unindented
> code are lazy. It's expected that humans will forget to run pgindent before
> committing from time to time (I do too). That's why I proposed a
> server side git hook to reject badly indented commits very early in
> this thread. But some others said that buildfarm animals were the way
> to go for Postgres development flow. And since I'm not a committer I
> left it at that. I was already happy enough that there was consensus
> on indenting continuously, so that the semi-regular rebases for the
> few open CF entries that I have are a lot less annoying.

Thanks for clarifying. I didn't really think you were trying to be
accusatory, but I didn't really understand what else to think either,
so this is helpful context.

> But based on the current feedback I think we should seriously consider
> a server-side "update" git hook again. People are obviously not
> perfect machines. And for whatever reason not everyone installs the
> pre-commit hook from the wiki. So the koel keeps complaining. A
> server-side hook would solve all of this IMHO.

One potential problem with a server-side hook is that if you back-port
a commit to older branches and then push the commits all together
(which is my workflow) then you might get failure to push on some
branches but not others. I don't know if there's any way to avoid
that, but it seems not great. You could think of enforcing the policy
only on master to try to avoid this, but that still leaves a risk that
you manage to push to all the back-branches and not to master.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 11:38 AM Isaac Morland  wrote:
> On Tue, 17 Oct 2023 at 11:15, Robert Haas  wrote:
>> Are code points assigned from a gapless sequence? That is, is the
>> implementation of codepoint_is_assigned(char) just 'codepoint <
>> SOME_VALUE' and SOME_VALUE increases over time?
>
> Not even close. Code points are organized in blocks, e.g. for mathematical 
> symbols or Ethiopic script. Sometimes new blocks are added, sometimes new 
> characters are added to existing blocks. Where they go is a combination of 
> convenience, history, and planning.

Ah. Good to know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: stopgap fix for signal handling during restore_command

2023-10-17 Thread Nathan Bossart
Committed and back-patched.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Peter Geoghegan
On Tue, Oct 17, 2023 at 8:24 AM Robert Haas  wrote:
> I also just discovered that my pre-commit hook doesn't work if I pull
> commits into master by cherry-picking. I had thought that I could have
> my hook just check my commits to master and not all of my local dev
> branches where I really don't want to mess with this when I'm just
> banging out a rough draft of something. But now I see that I'm going
> to need to work harder on this if I actually want it to catch all the
> ways I might screw this up.

Once you figure all that out, you're still obligated to hand-polish
typedefs.list to be consistent with whatever Bruce's machine's copy of
objdump does (or is it Tom's?). You need to sort the entries so they
kinda look like they originated from the same source as existing
entries, since my Debian machine seems to produce somewhat different
results to RHEL, for whatever reason. It's hard to imagine a worse use
of committer time.

I think that something like this new policy could work if the
underlying tooling was very easy to use and gave perfectly consistent
results on everybody's development machine. Obviously, that just isn't
the case.

-- 
Peter Geoghegan




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 17:47, Peter Geoghegan  wrote:
> Once you figure all that out, you're still obligated to hand-polish
> typedefs.list to be consistent with whatever Bruce's machine's copy of
> objdump does (or is it Tom's?). You need to sort the entries so they
> kinda look like they originated from the same source as existing
> entries, since my Debian machine seems to produce somewhat different
> results to RHEL, for whatever reason. It's hard to imagine a worse use
> of committer time.
>
> I think that something like this new policy could work if the
> underlying tooling was very easy to use and gave perfectly consistent
> results on everybody's development machine. Obviously, that just isn't
> the case.

To make koel pass you don't need to worry about hand-polishing
typedefs.list. koel uses the typedefs.list that's committed into the
repo, just like when you run pgindent yourself. If you forget to
update the typedefs.list with new types, then worst case the pgindent
output will look weird. But it will look weird both on your own
machine and on koel. So afaik the current tooling should give
perfectly consistent results on everybody's development machine. If
you have an example of where it doesn't then we should fix that
problem.





Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Peter Geoghegan
On Tue, Oct 17, 2023 at 9:03 AM Jelte Fennema  wrote:
> To make koel pass you don't need to worry about hand-polishing
> typedefs.list. koel uses the typedefs.list that's committed into the
> repo, just like when you run pgindent yourself. If you forget to
> update the typedefs.list with new types, then worst case the pgindent
> output will look weird. But it will look weird both on your own
> machine and on koel.

That's beside the point. The point is that I'm obligated to keep
typedefs.list up to date in general, a task that is made significantly
harder by random objdump implementation details. And I probably need
to do this not just once per commit, but several times, since in
practice I need to defensively run and rerun pgindent as the patch is
tweaked.

-- 
Peter Geoghegan




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Tom Lane
Peter Geoghegan  writes:
> That's beside the point. The point is that I'm obligated to keep
> typedef.list up to date in general, a task that is made significantly
> harder by random objdump implementation details. And I probably need
> to do this not just once per commit, but several times, since in
> practice I need to defensively run and rerun pgindent as the patch is
> tweaked.

Hmm, I've not found it that hard to manage the typedefs list.
If I run pgindent and it adds weird spacing around uses of a new
typedef name, I go "oh, I better add that to the list" and do so.
End of problem.  There's not a requirement that you remove disused
typedef names, nor that you alphabetize perfectly.  I'm content
to update those sorts of details from the buildfarm's list once a
year or so.

This does assume that you inspect pgindent's changes rather than
just accepting them blindly --- but as I commented upthread, the
tool really requires that anyway.
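
Concretely, the loop is something like this (the file and typedef names
here are hypothetical):

src/tools/pgindent/pgindent src/backend/commands/foo.c
git diff      # weird spacing around a name usually means a missing typedef
echo MyNewNodeType >> src/tools/pgindent/typedefs.list
src/tools/pgindent/pgindent src/backend/commands/foo.c   # re-run and recheck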

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Tom Lane
Robert Haas  writes:
> One potential problem with a server-side hook is that if you back-port
> a commit to older branches and then push the commits all together
> (which is my workflow) then you might get failure to push on some
> branches but not others. I don't know if there's any way to avoid
> that, but it seems not great. You could think of enforcing the policy
> only on master to try to avoid this, but that still leaves a risk that
> you manage to push to all the back-branches and not to master.

Is that actually possible?  I had the idea that "git push" is an
atomic operation, ie 100% or nothing.  Is it only atomic per-branch?

regards, tom lane




Re: Restoring default privileges on objects

2023-10-17 Thread David G. Johnston
On Fri, Oct 6, 2023 at 1:29 PM Laurenz Albe 
wrote:

> On Fri, 2023-10-06 at 22:18 +0200, Laurenz Albe wrote:
> > On Fri, 2023-10-06 at 22:16 +0200, Laurenz Albe wrote:
> > > Here is a patch that does away with the special handling of NULL values
> > > in psql backslash commands.
> >
> > Erm, I forgot to attach the patch.
>
> I just realize that there is a conflicting proposal.  I'll reply to that
> thread.
> Sorry for the noise.
>
>
This thread seems officially closed and the discussion moved to [1].

Over there, after reading both threads, I am seeing enough agreement on
changing these queries to always print "(none)" (translating the word none)
where today they output null that I plan to move forward with the v1 patch
on that thread, which proposes to do just that.  Please chime in over there
on this specific option - whether you wish to support or reject it.  Should
it be rejected, the plan is to have these queries respect the user's
preference in \pset null.  Please make it clear if you would rather
maintain the status quo than accept either of those two options.

Thanks!

David J.

[1]
https://www.postgresql.org/message-id/ab67c99bfb5dea7bae18c77f96442820d19b5448.camel%40cybertec.at


Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 12:17 PM Tom Lane  wrote:
> Hmm, I've not found it that hard to manage the typedefs list.
> If I run pgindent and it adds weird spacing around uses of a new
> typedef name, I go "oh, I better add that to the list" and do so.
> End of problem.  There's not a requirement that you remove disused
> typedef names, nor that you alphabetize perfectly.  I'm content
> to update those sorts of details from the buildfarm's list once a
> year or so.

+1 to all of that. At least for me, managing typedefs.list isn't the
problem. The problem is remembering to actually do it, and keeping it
updated as the patch set is adjusted and rebased.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 12:18 PM Tom Lane  wrote:
> Robert Haas  writes:
> > One potential problem with a server-side hook is that if you back-port
> > a commit to older branches and then push the commits all together
> > (which is my workflow) then you might get failure to push on some
> > branches but not others. I don't know if there's any way to avoid
> > that, but it seems not great. You could think of enforcing the policy
> > only on master to try to avoid this, but that still leaves a risk that
> > you manage to push to all the back-branches and not to master.
>
> Is that actually possible?  I had the idea that "git push" is an
> atomic operation, ie 100% or nothing.  Is it only atomic per-branch?

I believe so. For instance:

[rhaas pgsql]$ git push rhaas
Enumerating objects: 2980, done.
Counting objects: 100% (2980/2980), done.
Delta compression using up to 16 threads
Compressing objects: 100% (940/940), done.
Writing objects: 100% (2382/2382), 454.52 KiB | 7.70 MiB/s, done.
Total 2382 (delta 2024), reused 1652 (delta 1429), pack-reused 0
remote: Resolving deltas: 100% (2024/2024), completed with 579 local objects.
To ssh://git.postgresql.org/users/rhaas/postgres.git
   e434e21e11..2406c4e34c  master -> master
 ! [rejected]  walsummarizer2 -> walsummarizer2 (non-fast-forward)
error: failed to push some refs to
'ssh://git.postgresql.org/users/rhaas/postgres.git'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-17 Thread Jeff Davis
On Tue, 2023-10-17 at 17:07 +0200, Daniel Verite wrote:
> There's a problem in the fact that the set of assigned code points is
> expanding with every Unicode release, which happens about every year.
> 
> If we had this option in Postgres 11 released in 2018 it would use
> Unicode 11, and in 2023 this feature would reject thousands of code
> points that have been assigned since then.

That wouldn't be good for everyone, but might it be good for some
users?

We already expose normalization functions. If users are depending on
normalization, and they have unassigned code points in their system,
that will break when we update Unicode. By restricting themselves to
assigned code points, normalization is guaranteed to be forward-
compatible.
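
For example, both properties can already be checked from SQL (assuming a
UTF8 database; U&'\0061\0308' is 'a' plus a combining diaeresis):

psql -c "SELECT normalize(U&'\0061\0308', NFC) = U&'\00E4' AS nfc_equal;"
# returns t: NFC folds the combining sequence into precomposed U+00E4
psql -c "SELECT U&'\0061\0308' IS NFC NORMALIZED AS already_nfc;"
# returns f: the decomposed form is not NFC-normalized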

Regards,
Jeff Davis





Re: Fix output of zero privileges in psql

2023-10-17 Thread Tom Lane
Laurenz Albe  writes:
> On Mon, 2023-10-16 at 19:05 -0700, David G. Johnston wrote:
>> Reading both threads I'm not seeing any specific rejection of the
>> solution that we simply represent empty privileges as "(none)".

> Thanks for that summary.  I prefer my version (simply display NULLs
> as NULLs), but I am not wedded to it.  I had the impression that Tom
> would prefer that too, but is worried that the incompatibility introduced
> would outweigh the small benefit (of either patch).

> So it is clear that we don't have a consensus.

FWIW, my druthers are to make the describe.c queries honor \pset null
(not only for privileges, but anywhere else they fail to) and do
nothing beyond that.  I think that'll generally reduce the surprise
factor, while anything else we might opt to do will increase it.

If that fails to garner a consensus, my second choice would be to
do that plus translate empty-but-not-null ACLs to "(none)".
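
For illustration, the \pset-honoring behavior would look like this in a
session (table name hypothetical; today the privilege columns simply show
blank for NULL):

\pset null '(null)'
-- with the proposed change, \dp would then show NULL ACLs as (null)
\dp mytable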

regards, tom lane




Re: Is this a problem in GenericXLogFinish()?

2023-10-17 Thread Jeff Davis
On Tue, 2023-10-17 at 08:41 -0400, Robert Haas wrote:
> Sorry, I'm not sure I understand the question. Are you asking whether
> dirtying buffers unnecessarily might be slower than not doing that?

I meant: are those cleanup operations frequent enough that dirtying
those buffers in that case would matter?

Regards,
Jeff Davis





Re: Fix a wrong comment in setrefs.c

2023-10-17 Thread shihao zhong
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

That looks correct to me.

The new status of this patch is: Ready for Committer


Re: stopgap fix for signal handling during restore_command

2023-10-17 Thread Nathan Bossart
On Tue, Oct 17, 2023 at 10:46:47AM -0500, Nathan Bossart wrote:
> Committed and back-patched.

... and it looks like some of the back-branches are failing for Windows.
I'm assuming this is because c290e79 was only back-patched to v15.  My
first instinct is just to back-patch that one all the way to v11, but maybe
there's an alternative involving #ifdef WIN32.  Are there any concerns with
back-patching c290e79?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add support for AT LOCAL

2023-10-17 Thread Tom Lane
I wrote:
> Yeah, the same thing occurred to me in the shower this morning, and it
> does seem to work!  We can replace both loops with a %= operator, at
> least if we're willing to assume C99 division semantics, which seems
> pretty safe in 2023.

Whoops, no: for negative starting values we'd need truncate-towards-
minus-infinity division whereas C99 specifies truncate-towards-zero.
However, the attached does pass for me on cfarm111 as well as my
usual dev machine.
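
(A quick way to see the C truncation rule in action -- bash's $(( ))
arithmetic follows C99 here:

$ echo $(( -5 % 3 ))
-2
$ echo $(( 5 % 3 ))
2

so a bare %= would leave a negative remainder for negative inputs, which is
why the "result->time < 0" fix-up loop has to stay.)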

Presumably this is a pre-existing bug that also appears in back
branches.  But in the interests of science I propose that we
back-patch only the test case and see which machine(s) fail it
before back-patching the code change.

regards, tom lane

diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index c4da10d47a..56c7746c11 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -3083,10 +3083,11 @@ timetz_zone(PG_FUNCTION_ARGS)
 	result = (TimeTzADT *) palloc(sizeof(TimeTzADT));
 
 	result->time = t->time + (t->zone - tz) * USECS_PER_SEC;
+	/* C99 modulo has the wrong sign convention for negative input */
 	while (result->time < INT64CONST(0))
 		result->time += USECS_PER_DAY;
-	while (result->time >= USECS_PER_DAY)
-		result->time -= USECS_PER_DAY;
+	if (result->time >= USECS_PER_DAY)
+		result->time %= USECS_PER_DAY;
 
 	result->zone = tz;
 
@@ -3116,10 +3117,11 @@ timetz_izone(PG_FUNCTION_ARGS)
 	result = (TimeTzADT *) palloc(sizeof(TimeTzADT));
 
 	result->time = time->time + (time->zone - tz) * USECS_PER_SEC;
+	/* C99 modulo has the wrong sign convention for negative input */
 	while (result->time < INT64CONST(0))
 		result->time += USECS_PER_DAY;
-	while (result->time >= USECS_PER_DAY)
-		result->time -= USECS_PER_DAY;
+	if (result->time >= USECS_PER_DAY)
+		result->time %= USECS_PER_DAY;
 
 	result->zone = tz;
 
diff --git a/src/test/regress/expected/timetz.out b/src/test/regress/expected/timetz.out
index 3f8e005ce7..cbab6cfe5d 100644
--- a/src/test/regress/expected/timetz.out
+++ b/src/test/regress/expected/timetz.out
@@ -304,4 +304,25 @@ TABLE timetz_local_view;
  23:59:59.99-07 | 06:59:59.99+00 | 06:59:59.99+00 | 06:59:59.99+00 | 06:59:59.99+00
 (12 rows)
 
+SELECT f1 AS dat,
+   f1 AT TIME ZONE 'UTC+10' AS dat_at_tz,
+   f1 AT TIME ZONE INTERVAL '-10:00' AS dat_at_int
+  FROM TIMETZ_TBL
+  ORDER BY f1;
+      dat       |   dat_at_tz    |   dat_at_int   
+----------------+----------------+----------------
+ 00:01:00-07    | 21:01:00-10    | 21:01:00-10
+ 01:00:00-07    | 22:00:00-10    | 22:00:00-10
+ 02:03:00-07    | 23:03:00-10    | 23:03:00-10
+ 08:08:00-04    | 02:08:00-10    | 02:08:00-10
+ 07:07:00-08    | 05:07:00-10    | 05:07:00-10
+ 11:59:00-07    | 08:59:00-10    | 08:59:00-10
+ 12:00:00-07    | 09:00:00-10    | 09:00:00-10
+ 12:01:00-07    | 09:01:00-10    | 09:01:00-10
+ 15:36:39-04    | 09:36:39-10    | 09:36:39-10
+ 15:36:39-05    | 10:36:39-10    | 10:36:39-10
+ 23:59:00-07    | 20:59:00-10    | 20:59:00-10
+ 23:59:59.99-07 | 20:59:59.99-10 | 20:59:59.99-10
+(12 rows)
+
 ROLLBACK;
diff --git a/src/test/regress/sql/timetz.sql b/src/test/regress/sql/timetz.sql
index 33f7f8aafb..d797f478f0 100644
--- a/src/test/regress/sql/timetz.sql
+++ b/src/test/regress/sql/timetz.sql
@@ -100,4 +100,9 @@ CREATE VIEW timetz_local_view AS
   ORDER BY f1;
 SELECT pg_get_viewdef('timetz_local_view', true);
 TABLE timetz_local_view;
+SELECT f1 AS dat,
+   f1 AT TIME ZONE 'UTC+10' AS dat_at_tz,
+   f1 AT TIME ZONE INTERVAL '-10:00' AS dat_at_int
+  FROM TIMETZ_TBL
+  ORDER BY f1;
 ROLLBACK;


Re: New WAL record to detect the checkpoint redo location

2023-10-17 Thread Robert Haas
On Fri, Oct 13, 2023 at 3:29 AM Michael Paquier  wrote:
> Now looking at 0002, where you should be careful about the code
> indentation or koel will complain.

Fixed in the attached version.

> This makes the new code call LocalSetXLogInsertAllowed() and what we
> set for checkPoint.PrevTimeLineID after taking the insertion locks,
> which should be OK.

Cool.

> I've mentioned as well a test in pg_walinspect after one of the
> checkpoints generated there, but what you do here is enough for the
> online case.

I don't quite understand what you're saying here. If you're suggesting
a potential improvement, can you be a bit more clear and explicit
about what the suggestion is?

> +   /*
> +* XLogInsertRecord will have updated RedoRecPtr, but we need to copy
> +* that into the record that will be inserted when the checkpoint is
> +* complete.
> +*/
> +   checkPoint.redo = RedoRecPtr;
>
> For online checkpoints, a very important point is that
> XLogCtl->Insert.RedoRecPtr is also updated in XLogInsertRecord().
> Perhaps that's worth an addition?  I was a bit confused first that we
> do the following for shutdown checkpoints:
> RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
>
> Then repeat this pattern for non-shutdown checkpoints a few lines down
> without touching the copy of the redo LSN in XLogCtl->Insert, because
> of course we don't hold the WAL insert locks in an exclusive fashion
> here:
> checkPoint.redo = RedoRecPtr;
>
> My point is that this is not only about RedoRecPtr, but also about
> XLogCtl->Insert.RedoRecPtr here.  The comment in ReserveXLogSwitch()
> says that.

I have adjusted the comment in CreateCheckPoint to hopefully address
this concern.  I don't understand what you mean about
ReserveXLogSwitch(), though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v9-0001-During-online-checkpoints-insert-XLOG_CHECKPOINT_.patch
Description: Binary data


Re: stopgap fix for signal handling during restore_command

2023-10-17 Thread Tom Lane
Nathan Bossart  writes:
> ... and it looks like some of the back-branches are failing for Windows.
> I'm assuming this is because c290e79 was only back-patched to v15.  My
> first instinct is just to back-patch that one all the way to v11, but maybe
> there's an alternative involving #ifdef WIN32.  Are there any concerns with
> back-patching c290e79?

Sounds fine to me.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Maciek Sakrejda
On Tue, Oct 17, 2023, 09:22 Robert Haas  wrote:

> On Tue, Oct 17, 2023 at 12:18 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > Is that actually possible?  I had the idea that "git push" is an
> > atomic operation, ie 100% or nothing.  Is it only atomic per-branch?
>
> I believe so.


Git push does have an --atomic flag to treat the entire push as a single
operation.
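
For example (branch names illustrative):

git push --atomic origin master REL_16_STABLE REL_15_STABLE

If any one ref is rejected, none of them are updated.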


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-10-17 Thread Robert Haas
On Mon, Oct 16, 2023 at 6:48 PM Thomas Munro  wrote:
> I pushed the retry-loop-in-frontend-executables patch and the
> missing-locking-in-SQL-functions patch yesterday.  That leaves the
> backup ones, which I've rebased and attached, no change.  It sounds
> like we need some more healthy debate about that backup label idea
> that would mean we don't need these (two birds with one stone), so
> I'll just leave these here to keep the cfbot happy in the meantime.

0002 has no comments at all, and the commit message is not specific
enough for me to understand what problem it fixes. I suggest adding
some comments and fixing the commit message. I'm also not very sure
whether the change to the signature of sendFileWithContent is really
the best way to deal with the control file maybe containing a zero
byte ... but I'm quite sure that if we're going to do it that way, it
needs a comment. But maybe we should do something else that would
require less explanation, like having the caller always pass the
length.

Regarding 0001, the way you've hacked up update_controlfile() doesn't
fill me with joy. It's nice if code that is common to the frontend and
the backend does the same thing in both cases rather than, say, having
an extra argument that only works in one case but not the other. I bet
this could be refactored to make it nicer, e.g. have one function that
takes an exact pathname at which the control file is to be written and
then other functions that use it as a subroutine.

Personally, I think the general idea of 0001 is better than any
competing proposal on the table. In the case of pg_basebackup, we
could fix the server to perform appropriate locking around reading the
control file, so that the version sent to the client doesn't get torn.
But if a backup is made by calling pg_backup_start() and copying
$PGDATA, that isn't going to work. To fix that, we need to either make
the backup procedure more complicated and essentially treat the
control file as a special case, or we need to do something like this.
I think this is better; but as you mention, opinions vary on that.

Life would be a lot easier here if we could get rid of the low-level
backup API and just have pg_basebackup DTWT, but that seems like a
completely non-viable proposal.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2023-10-17 Thread Robert Haas
On Mon, Oct 16, 2023 at 12:31 PM Michael Christofides
 wrote:
> According to the docs[1]: "In a parallel bitmap heap scan, one process is 
> chosen as the leader. That process performs a scan of one or more indexes and 
> builds a bitmap indicating which table blocks need to be visited. These 
> blocks are then divided among the cooperating processes as in a parallel 
> sequential scan."
>
> My understanding is that the "Heap Blocks" statistic is only reporting blocks 
> for the bitmap (i.e. not the subsequent scan). As such, I think it is correct 
> that the workers do not report additional exact heap blocks.

I think you're wrong about that. The bitmap index scans are what scan
the indexes and build the bitmap. The bitmap heap scan node is what
scans the heap i.e. the table, and that is what is divided across the
workers.
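
For anyone wanting to see the accounting in question, a sketch (table,
column, and settings hypothetical):

psql <<'SQL'
SET max_parallel_workers_per_gather = 2;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM big_table WHERE a BETWEEN 0 AND 100000;
SQL

The plan should show Gather -> Parallel Bitmap Heap Scan -> Bitmap Index
Scan, and the "Heap Blocks: exact=..." line is where the per-worker counts
discussed here do or don't show up.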

On the patch itself, snapshot_and_stats doesn't strike me as a great
name. If we added three more variable-length things would we call the
member snapshot_and_stats_and_pink_and_orange_and_blue? Probably
better to pick a name that is somehow more generic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-17 Thread Pavel Stehule
Hi


> Isn't this code a little ugly?
>
> >
> > I propose syntax xxx.yyy%ELEMENTTYPE and xxx%ELEMENTTYPE
> >
> > What do you think about it?
> No other relational database can be found with such an implementation.
> But it seems like a good idea. It can bring more convenience to write
> stored procedure.
>

No other databases support arrays :-)

Regards

Pavel




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-10-17 Thread David G. Johnston
On Tue, Oct 17, 2023 at 10:50 AM Robert Haas  wrote:

> Life would be a lot easier here if we could get rid of the low-level
> backup API and just have pg_basebackup DTWT, but that seems like a
> completely non-viable proposal.
>

Yeah, my contribution to this area [1] focuses on the API, because I figure
that since we provide it, we should do our best to have it do as much as
possible for the DBA or third parties that build tooling on top of it.

I kinda think that adding a pg_backup_metadata directory that
pg_backup_start|stop can use may help here.  I'm wondering whether those
functions provide enough control guarantees that pg_control's
"in_backup=true|false" flag proposed in that thread is reliable enough when
copied to the root directory in the backup.  I kinda feel that so long as
the flag is reliable it should be possible for the signal file processing
code to implement whatever protocol we need.

David J.

[1]
https://www.postgresql.org/message-id/CAKFQuwbpz4s8XP_%2BKhsif2eFaC78wpTbNbevUYBmjq-UCeNL7Q%40mail.gmail.com


Re: Improving Physical Backup/Restore within the Low Level API

2023-10-17 Thread Robert Haas
On Mon, Oct 16, 2023 at 5:21 PM David G. Johnston
 wrote:
> But no, by default, and probably so far as pg_basebackup is concerned, a 
> server crash during backup results in requiring outside intervention in order 
> to get the server to restart.

Others may differ, but I think such a proposal is dead on arrival. As
Laurenz says, that's just reinventing one of the main problems with
exclusive backup mode.

The underlying issue here is that, fundamentally, there's no way for
postgres itself to tell the difference between the backup directory on
the primary and an exact copy of it on a standby. There has to be some
mechanism by which the user tells us whether this is the original
directory or a clone of it -- and that's what backup_label,
recovery.signal, and standby.signal are for. Your proposal rejiggers
the details of how we distinguish primary from standby, but it
doesn't, and can't, avoid the need for users to actually follow the
directions, and I don't see why they'd be any more likely to follow
the directions that this proposal would require than the directions
we're giving them now.

I wish I had a better idea here, because the status quo is definitely
not great. The only thought that really occurs to me is that we might
do better if PostgreSQL did more of the work itself and left fewer
steps to the user to perform. If you could click the "take a backup
here" button and the "restore a backup there" button and not think
about what was actually happening, you'd not have the opportunity to
mess up. But, as I understand it, the main motivation for the
continued existence of the low-level API is that the data directory
might be really big, and you might need to clone it using some kind of
special magic that your system has available instead of copying all
the bytes. And that makes it hard to move more of the responsibility
into PostgreSQL itself, because we don't know how that special magic
works.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Track Oldest Initialized WAL Buffer Page

2023-10-17 Thread Bharath Rupireddy
On Mon, Jul 3, 2023 at 6:57 PM Heikki Linnakangas  wrote:
>

Thanks a lot for responding. Sorry for being late.

> On 07/02/2023 16:00, Bharath Rupireddy wrote:
> > Hi,
> >
> > While working on [1], I was looking for a quick way to tell if a WAL
> > record is present in the WAL buffers array without scanning but I
> > couldn't find one.
>
> /* The end-ptr of the page that contains the record */
> expectedEndPtr += XLOG_BLCKSZ - recptr % XLOG_BLCKSZ;
>
> /* get the buffer where the record is, if it's in WAL buffers at all */
> idx = XLogRecPtrToBufIdx(recptr);
>
> /* prevent the WAL buffer from being evicted while we look at it */
> LWLockAcquire(WALBufMappingLock, LW_SHARED);
>
> /* Check if the page we're interested in is in the buffer */
> found = XLogCtl->xlblocks[idx] == expectedEndPtr;
>
> LWLockRelease(WALBufMappingLock, LW_SHARED);

This is exactly what I'm doing in the 0001 patch here
https://www.postgresql.org/message-id/calj2acu3zyzjov4vztr+lfk5pl4ndunbls6e1vg2dhdbjqg...@mail.gmail.com.

My bad! I should have mentioned the requirement properly - I want to
avoid taking WALBufMappingLock to peek into wal_buffers to determine
if the WAL buffer page containing the required WAL record exists.

> You actually hint at the above solution here, so I'm confused. If you're
> OK with slightly stale results, you can skip the WALBufferMappingLock
> above too, and perform an atomic read of xlblocks[idx] instead.

I get that, and I see that GetXLogBuffer first reads xlblocks without the
lock and then, to confirm it, takes the lock again in
AdvanceXLInsertBuffer.

 * However, we don't hold a lock while we read the value. If someone has
 * just initialized the page, it's possible that we get a "torn read" of
 * the XLogRecPtr if 64-bit fetches are not atomic on this platform. In
 * that case we will see a bogus value. That's ok, we'll grab the mapping
 * lock (in AdvanceXLInsertBuffer) and retry if we see anything else than
 * the page we're looking for. But it means that when we do this unlocked
 * read, we might see a value that appears to be ahead of the page we're
 * looking for. Don't PANIC on that, until we've verified the value while
 * holding the lock.
 */

The 0001 patch at
https://www.postgresql.org/message-id/calj2acu3zyzjov4vztr+lfk5pl4ndunbls6e1vg2dhdbjqg...@mail.gmail.com
reads the WAL buffer page with WALBufMappingLock. So, the patch can
avoid WALBufMappingLock and do something like [1]:

[1]
{
idx = XLogRecPtrToBufIdx(ptr);
expectedEndPtr = ptr;
expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;

/*
 * Do a stale read of xlblocks without WALBufMappingLock. All the callers
 * of this function are expected to read WAL that's already flushed to disk
 * from WAL buffers. If this stale read says the requested WAL buffer page
 * doesn't exist, it means that the WAL buffer page either is being or has
 * already been replaced for reuse. If this stale read says the requested
 * WAL buffer page exists, we then take WALBufMappingLock and re-read the
 * xlblocks to ensure the WAL buffer page really exists and nobody is
 * replacing it meanwhile.
 */
endptr = XLogCtl->xlblocks[idx];

/* Requested WAL isn't available in WAL buffers. */
if (expectedEndPtr != endptr)
break;

/*
 * Requested WAL is available in WAL buffers, so recheck the existence
 * under the WALBufMappingLock and read if the page still exists, otherwise
 * return.
 */
LWLockAcquire(WALBufMappingLock, LW_SHARED);

endptr = XLogCtl->xlblocks[idx];

/* Requested WAL isn't available in WAL buffers. */
if (expectedEndPtr != endptr)
break;

/*
 * We found the WAL buffer page containing the given XLogRecPtr. Get the
 * starting address of the page and a pointer to the location of the
 * given XLogRecPtr within that page.
 */
page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
data = page + ptr % XLOG_BLCKSZ;

return data;
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: The danger of deleting backup_label

2023-10-17 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 10/16/23 15:06, Robert Haas wrote:
> > On Mon, Oct 16, 2023 at 1:00 PM David Steele  wrote:
> > > After some agonizing (we hope) they decide to delete backup_label and,
> > > wow, it just works! So now they merrily go on their way with a corrupted
> > > cluster. They also remember for the next time that deleting backup_label
> > > is definitely a good procedure.
> > > 
> > > The idea behind this patch is that deleting backup_label would produce a
> > > hard error because pg_control would be missing as well (if the backup
> > > software did its job). If both pg_control and backup_label are present
> > > (but pg_control has not been loaded with the contents of backup_label,
> > > i.e. it is the running copy from the backup cluster) we can also error.
> > 
> > I mean, I think we're just going in circles, here. I did and do
> > understand, but I didn't and don't agree. You're hypothesizing a user
> > who is willing to do ONE thing that they shouldn't do during backup
> > restoration (namely, remove backup_label) but who won't be willing to
> > do a SECOND thing that they shouldn't do during backup restoration
> > (namely, run pg_resetwal).
> 
> In my experience the first case is much more likely than the second. Your
> experience may vary.

My experience (though perhaps not a surprise) mirrors David's.

> Anyway, I think they are pretty different. Deleting backup label appears to
> give a perfectly valid restore. Running pg_resetwal is more clearly (I
> think) the nuclear solution.

Right, and deleting backup_label is just an 'rm'; folks may think
"oh, this is just some leftover thing that isn't actually needed".

OTOH, pg_resetwal has an online documentation page and a man page that's
very clear that it's only to be used as a last resort (perhaps we should
pull that into the --help output too..?).  It's also pretty clear that
pg_resetwal is actually changing things about the cluster while nuking
backup_label doesn't *seem* to be in that same category, even though we
all know it is because it's needed once recovery begins.

I'd also put out there that while people don't do restore testing
nearly as much as they should, they tend to at _least_ try to do it once
after taking their first backup and if that fails then they try to figure
out why and what they're not doing right.

Thanks,

Stephen




Re: Improving Physical Backup/Restore within the Low Level API

2023-10-17 Thread David Steele

On 10/17/23 14:28, Robert Haas wrote:
> On Mon, Oct 16, 2023 at 5:21 PM David G. Johnston wrote:
>> But no, by default, and probably so far as pg_basebackup is concerned, a
>> server crash during backup results in requiring outside intervention in
>> order to get the server to restart.
>
> Others may differ, but I think such a proposal is dead on arrival. As
> Laurenz says, that's just reinventing one of the main problems with
> exclusive backup mode.

I concur -- this proposal resurrects the issues we had with exclusive 
backups without solving the issues being debated elsewhere, e.g. torn 
reads of pg_control or users removing backup_label when they should not.


Regards,
-David




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-10-17 Thread Bharath Rupireddy
On Thu, Oct 12, 2023 at 4:13 AM Andres Freund  wrote:
>
> On 2023-10-03 16:05:32 -0700, Jeff Davis wrote:
> >
> > Does this patch still look like a good fit for your (or someone else's)
> > plans for direct IO here? If so, would committing this soon make it
> > easier to make progress on that, or should we wait until it's actually
> > needed?
>
> I think it'd be quite useful to have. Even with the code as of 16, I see
> better performance in some workloads with debug_io_direct=wal,
> wal_sync_method=open_datasync compared to any other configuration. Except of
> course that it makes walsenders more problematic, as they suddenly require
> read IO. Thus having support for walsenders to send directly from wal buffers
> would be beneficial, even without further AIO infrastructure.

Right. Tests show the benefit with WAL DIO + this patch -
https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com.

Also, irrespective of WAL DIO, the WAL buffers hit ratio with the
patch stood at 95% for 1 primary, 1 sync standby, 1 async standby,
pgbench --scale=300 --client=32 --time=900. In other words, the
walsenders avoided reading from the file (i.e. avoided pread system
calls) 95% of the time - 
https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com.

> I also think there are other quite desirable features that are made easier by
> this patch. One of the primary problems with using synchronous replication is
> the latency increase, obviously. We can't send out WAL before it has locally
> been written out and flushed to disk. For some workloads, we could
> substantially lower synchronous commit latency if we were able to send WAL to
> remote nodes *before* WAL has been made durable locally, even if the receiving
> systems wouldn't be allowed to write that data to disk yet: It takes less time
> to send just "write LSN: %X/%X, flush LSN: %X/%X" than also having to send
> all the not-yet-durable WAL.
>
> In many OLTP workloads there won't be WAL flushes between generating WAL for
> DML and commit, which means that the amount of WAL that needs to be sent out
> at commit can be of nontrivial size.
>
> E.g. for pgbench, normally a transaction is about ~550 bytes (fitting in a
> single tcp/ip packet), but a pgbench transaction that needs to emit FPIs for
> everything is a lot larger: ~45kB (not fitting in a single packet). Obviously
> many real world workloads OLTP workloads actually do more writes than
> pgbench. Making the commit latency of the latter be closer to the commit
> latency of the former when using syncrep would obviously be great.
>
> Of course this patch is just a relatively small step towards that: We'd also
> need in-memory buffering on the receiving side, the replication protocol would
> need to be improved, we'd likely need an option to explicitly opt into
> receiving unflushed data. But it's still a pretty much required step.

Yes, this patch can pave the way for all of the above features in the
future. However, I'm looking forward to getting this in for now.
Later, I'll come up with more concrete thoughts on the above.

Having said that, the latest v10 patch after addressing some of the
review comments is at
https://www.postgresql.org/message-id/CALj2ACU3ZYzjOv4vZTR%2BLFk5PL4ndUnbLS6E1vG2dhDBjQGy2A%40mail.gmail.com.
Any further thoughts on the patch are welcome.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Improving Physical Backup/Restore within the Low Level API

2023-10-17 Thread David G. Johnston
On Tue, Oct 17, 2023 at 12:30 PM David Steele  wrote:

> On 10/17/23 14:28, Robert Haas wrote:
> > On Mon, Oct 16, 2023 at 5:21 PM David G. Johnston
> >  wrote:
> >> But no, by default, and probably so far as pg_basebackup is concerned,
> a server crash during backup results in requiring outside intervention in
> order to get the server to restart.
> >
> > Others may differ, but I think such a proposal is dead on arrival. As
> > Laurenz says, that's just reinventing one of the main problems with
> > exclusive backup mode.
>
> I concur -- this proposal resurrects the issues we had with exclusive
> backups without solving the issues being debated elsewhere, e.g. torn
> reads of pg_control or users removing backup_label when they should not.
>
>
Thank you all for the feedback.

Admittedly I don't understand the problem of torn reads well enough to
solve it here but I figured by moving the "must not remove" stuff out of
backup_label and into pg_control the odds of it being removed from the
backup and the backup still booting basically go to zero.  I do agree that
renaming backup_label to something like "recovery_stuff_do_not_delete.conf"
probably does that just as well without the downside.

Placing a copy of all relevant files into pg_backup_metadata seems like a
decent shield against accidents and a way to reliably self-document the
backup even if the behavioral changes are not desired.  Though doing that
and handling multiple concurrent backups probably makes the cost too high
to move away from relying just on documentation.

I suppose I'd consider having to add one file to the data directory to be
an improvement over having to remove two of them - in terms of what it
takes to recover from system failure during a backup.

David J


Re: The danger of deleting backup_label

2023-10-17 Thread Robert Haas
On Tue, Oct 17, 2023 at 3:17 PM Stephen Frost  wrote:
> I'd also put out there that while people don't do restore testing
> nearly as much as they should, they tend to at _least_ try to do it once
> after taking their first backup and if that fails then they try to figure
> out why and what they're not doing right.

Well, I agree with you on that point, but a lot of people only seem to
realize that after it's too late.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-10-17 Thread Tom Lane
Jacob Champion  writes:
> v3 fixes a doc comment I forgot to fill in; there are no other code
> changes. To try to further reduce the activation energy, I've also
> attached an attempt at a backport to 11. The main difference is the
> absence of catalogIdHash, which showed up in 15, so we don't get the
> benefit of that deduplication.

So ... I still do not like anything about this patch.  Putting
has_policies into CatalogIdMapEntry isn't a wart, it's more
nearly a tumor.  Running getTablesWithPolicies before we can
acquire locks is horrid from the standpoint of minimizing the
window between our transaction snapshot and successful acquisition
of all needed locks.  (It might be all right in databases with
few pg_policy entries, but I don't think we can assume that that
holds everywhere.)  And the whole thing is just ugly and solves
the problem only partially.

What I am wondering about is whether we shouldn't just undo what
checkExtensionMembership does, specifically:

/*
 * In 9.6 and above, mark the member object to have any non-initial ACL,
 * policies, and security labels dumped.
 *
 * Note that any initial ACLs (see pg_init_privs) will be removed when we
 * extract the information about the object.  We don't provide support for
 * initial policies and security labels and it seems unlikely for those to
 * ever exist, but we may have to revisit this later.
 *
 * ...
 */

dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL |
DUMP_COMPONENT_SECLABEL |
DUMP_COMPONENT_POLICY);

Why are we marking extension member objects as being subject to SECLABEL
or POLICY dumping?  As the comment notes, that isn't really sensible
unless what we are dumping is a delta from the extension's initial
assignments.  But we have no infrastructure for that, and none seems
likely to appear in the near future.

Could we not fix this by just reducing the above to

dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL);

When and if someone comes along and implements storage of extensions'
initial policies, they can figure out how to avoid fetching policies for
not-to-be-dumped tables.  (My thoughts would run towards adding a column
to pg_class to help detect whether such work is needed without doing
expensive additional queries.)  But I don't see why we require a solution
to that problem as things stand.
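
(For context: whether a database has any extension-owned tables with
policies at all can be checked with a catalog query along these lines
-- a sketch, not anything pg_dump does today:

    SELECT p.polrelid::regclass AS table_name, p.polname
      FROM pg_policy p
      JOIN pg_depend d ON d.classid = 'pg_class'::regclass
           AND d.objid = p.polrelid
           AND d.refclassid = 'pg_extension'::regclass
           AND d.deptype = 'e';

In most databases this returns nothing, which is rather the point.)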

regards, tom lane




Re: The danger of deleting backup_label

2023-10-17 Thread David Steele

On 10/14/23 11:30, David Steele wrote:

On 10/12/23 10:19, David Steele wrote:

On 10/11/23 18:10, Thomas Munro wrote:


As Stephen mentioned[1], we could perhaps also complain if both backup
label and control file exist, and then hint that the user should
remove the *control file* (not the backup label!).  I had originally
suggested we would just overwrite the control file, but by explicitly
complaining about it we would also bring the matter to tool/script
authors' attention, ie that they shouldn't be backing that file up, or
should be removing it in a later step if they copy everything.  He
also mentions that there doesn't seem to be anything stopping us from
back-patching changes to the backup label contents if we go this way.
I don't have a strong opinion on that and we could leave the question
for later.


I'm worried about the possibility of back patching this unless the 
solution comes out to be simpler than I think, and that rarely comes to 
pass. Surely throwing errors on something that is currently valid 
(i.e. backup_label and pg_control both present) is not back patch 
material.


But perhaps there is a simpler, acceptable solution we could back 
patch (transparent to all parties except Postgres) and then a more 
advanced solution we could go forward with.


I guess I had better get busy on this.


Attached is a very POC attempt at something along these lines that could 
be back patched. I stopped when I realized excluding pg_control from the 
backup is a necessity to make this work (else we still could end up with 
a torn copy of pg_control even if there is a good copy elsewhere). To 
enumerate the back patch issues as I see them:


Given that the above can't be back patched, I'm thinking we don't need 
backup_label at all going forward. We just write the values we need for 
recovery into pg_control and return *that* from pg_backup_stop() and 
tell the user to store it with their backup. We already have "These 
files are vital to the backup working and must be written byte for byte 
without modification, which may require opening the file in binary 
mode." in the documentation so dealing with pg_control should not be a 
problem. pg_control also has a CRC so we will know if it gets munged.
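
To make the shape of this concrete, a minimal sketch of driving the 
existing (PG 15+) low-level API, both calls in one session -- only what 
pg_backup_stop() hands back would change under this proposal:

    SELECT pg_backup_start(label => 'nightly', fast => false);
    -- ... copy the data directory with the backup tool ...
    SELECT lsn, labelfile, spcmapfile
      FROM pg_backup_stop(wait_for_archive => true);

Today the labelfile column holds the backup_label contents to store 
with the backup; here it would carry the pg_control contents instead.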


It doesn't really matter where/how they store pg_control as long as it 
is written back into PGDATA before the cluster starts. If 
backupEndRequired, etc., are set appropriately then recovery will do the 
right thing when it starts, just as now if PG crashes after it has 
renamed backup_label but before recovery to consistency has completed.


We can still enforce the presence of recovery.signal by checking 
backupEndRequired if that's something we want, but it seems like 
backupEndRequired alone would be enough. I'm fine either way.


Another thing we can do here is make backup from standby easier. The 
user won't need to worry about *when* pg_control is copied. We can just 
write the ideal min recovery point into pg_control.
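
For instance, the recovery-related fields in play are already visible 
via pg_control_recovery() -- a sketch using the existing function:

    SELECT min_recovery_end_lsn, min_recovery_end_timeline,
           backup_start_lsn, backup_end_lsn,
           end_of_backup_record_required
      FROM pg_control_recovery();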


Any informational data currently in backup_label can be returned as 
columns (like the start/end lsn is now).


This makes the patch much less invasive and while it definitely, 
absolutely cannot be back patched, it seems like a good way forward.


This is the direction I'm planning to work on patch-wise but I'd like to 
hear people's thoughts.


Regards,
-David




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-17 Thread Stephen Frost
Greetings,

* Nathan Bossart (nathandboss...@gmail.com) wrote:
> On Wed, Oct 04, 2023 at 10:41:15PM -0700, Gurjeet Singh wrote:
> > The patches posted in this thread so far attempt to add the ability to
> > allow the user to have an arbitrary number of passwords. I believe
> > that allowing an arbitrary number of passwords is not only unnecessary,
> > but the need to name passwords, the need to store them in a shared
> > catalog, etc. may actually create problems in the field. The
> > users/admins will have to choose names for passwords, which they
> > didn't have to previously. The need to name them may also lead to
> > users storing password-hints in the password names (e.g. 'mom''s
> > birthday', 'ex''s phone number', 'third password'), rendering the
> > passwords weak.
> > 
> > Moreover, allowing arbitrarily many passwords will
> > require us to provide additional infrastructure to solve problems like
> > observability (which passwords are currently in use, and which ones
> > have been effectively forgotten by applications), or create a nuisance
> > for admins that can create more problems than it solves.
> 
> IMHO neither of these problems seems insurmountable.  Besides advising
> against using hints as names, we could also automatically generate safe
> names, or even disallow user-provided names entirely.  And adding
> observability for passwords seems worthwhile anyway.

Agreed, particularly on adding observability for password use.
Regardless of what we do, I feel pretty strongly that we need that.
That said, having this handled in a separate catalog feels like just a
generally better idea than shoving it all into pg_authid as we extend
things to include information like "last used date", "last used source
IP", etc.

Providing this observability purely through logging strikes me as a
terrible idea.

I don't find the concern about names as 'hints' to be at all compelling.
Having a way to avoid having names may have some value, but only if we
can come up with something reasonable.

> > So I propose that the feature should allow no more than 2 passwords
> > for a role, each with their own validity periods. This eliminates the
> > need to name passwords, because at any given time there are no more
> > than 2 passwords; current one, and old one. This also eliminates the
> > need for a shared catalog to hold passwords, because with the limit of
> > 2 imposed, we can store the old password and its validity period in
> > additional columns in the pg_authid table.
> 
> Another approach could be to allow an arbitrary number of passwords but to
> also allow administrators to limit how many passwords can be associated to
> each role.  That way, we needn't restrict this feature to 2 passwords for
> everyone.  Perhaps 2 should be the default, but in any case, IMO we
> shouldn't design to only support 2.

Agreed that it's a bad idea to design to support 2 and only 2.  If
nothing else, there's the very simple case that the user needs to do
another password rotation ... and they look and discover that the old
password is still being used and that if they took it away, things would
break, but they need time to run down which system is still using it
while still needing to perform the migration for the other systems that
are correctly being updated- boom, need 3 for that case.  There's other
use-cases that could be interesting though- presumably we'd log which
password is used to authenticate and then users could have a fleet of
web servers which each have their own password but log into the same PG
user and they could happily rotate the passwords independently for all
of those systems.

I don't propose this as some use-case just for the purpose of argument-
not sharing passwords across a bunch of systems is absolutely a good
stance when it comes to security, and due to the way permissions and
roles work in PG, being able to have both distinct passwords with
explicitly logged indication of which system used what password to log
in while not having to deal with possible permission differences due to
using actually independent roles is valuable.  That is, each server
using a separate role to log in could pretty easily lead to some servers
having access to something or other while others don't- if they're all
logging in with the same role and just a different password, that's not
going to happen.

* Jeff Davis (pg...@j-davis.com) wrote:
> Using a number seems weird to me because either:
> 
>  (a) if the number is always increasing you'd have to look to find the
> number of the new slot to add and the old slot to remove; or
>  (b) if switched between two numbers (say 0 and 1), it would be error
> prone because you'd have to remember which is the old one that can be
> safely replaced

Yeah, a number doesn't strike me as very good either.

> Maybe a password is best described by its validity period rather than a
> name? But what about passwords that don't expire?

The validity idea is interesting but falls down w

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-17 Thread Tomas Vondra
Hi,

Here's a couple of cleaned-up patches fixing the various issues discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

In particular, this deals with these issues:

1) overflows in distance calculation for large timestamp values (0002)

2) incorrect subtraction in distance for date values (0003)

3) incorrect distance for infinite date/timestamp values (0005)

4) failing distance for extreme interval values (0007)

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the most distant
values first, and thus are "guaranteed" to build a very inefficient
summary. People with minmax-multi indexes on "date" columns
probably will need to reindex.
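
For anyone wondering whether they are affected: a sketch of a catalog
query listing BRIN indexes that use a minmax-multi opclass (such as
date_minmax_multi_ops):

    SELECT DISTINCT i.indexrelid::regclass AS index_name,
                    i.indrelid::regclass AS table_name
      FROM pg_index i
      JOIN pg_class c ON c.oid = i.indexrelid
      JOIN pg_am a ON a.oid = c.relam AND a.amname = 'brin'
      JOIN pg_opclass o ON o.oid = ANY (i.indclass::oid[])
     WHERE o.opcname LIKE '%minmax_multi_ops';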


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From e23ca4e53d352e05951fb314dea347682794c25b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 17 Oct 2023 18:18:53 +0200
Subject: [PATCH 1/8] Tests for overflows with dates and timestamps in BRIN

When calculating distances for date and timestamp values for BRIN
minmax-multi indexes, we need to be careful about overflows for extreme
values. In that case the distance is negative, resulting in building of
inefficient summaries.

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
minmax-multi indexes on them.
---
 src/test/regress/expected/brin_multi.out | 61 ++
 src/test/regress/sql/brin_multi.sql  | 65 
 2 files changed, 126 insertions(+)

diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index 9f46934c9be..d5bd600f8fd 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -823,3 +823,64 @@ SELECT COUNT(*) FROM brin_test_multi_2 WHERE a = 'aab32389-22bc-c25a-6f60-6eb525
 
 DROP TABLE brin_test_multi_2;
 RESET enable_seqscan;
+-- test overflows during CREATE INDEX with extreme timestamp values
+CREATE TABLE brin_timestamp_test(a TIMESTAMPTZ);
+SET datestyle TO iso;
+INSERT INTO brin_timestamp_test VALUES
+('4713-01-01 00:00:01 BC'), ('4713-01-01 00:00:02 BC'), ('4713-01-01 00:00:03 BC'),
+('4713-01-01 00:00:04 BC'), ('4713-01-01 00:00:05 BC'), ('4713-01-01 00:00:06 BC'),
+('4713-01-01 00:00:07 BC'), ('4713-01-01 00:00:08 BC'), ('4713-01-01 00:00:09 BC'),
+('4713-01-01 00:00:10 BC'), ('4713-01-01 00:00:11 BC'), ('4713-01-01 00:00:12 BC'),
+('4713-01-01 00:00:13 BC'), ('4713-01-01 00:00:14 BC'), ('4713-01-01 00:00:15 BC'),
+('4713-01-01 00:00:16 BC'), ('4713-01-01 00:00:17 BC'), ('4713-01-01 00:00:18 BC'),
+('4713-01-01 00:00:19 BC'), ('4713-01-01 00:00:20 BC'), ('4713-01-01 00:00:21 BC'),
+('4713-01-01 00:00:22 BC'), ('4713-01-01 00:00:23 BC'), ('4713-01-01 00:00:24 BC'),
+('4713-01-01 00:00:25 BC'), ('4713-01-01 00:00:26 BC'), ('4713-01-01 00:00:27 BC'),
+('4713-01-01 00:00:28 BC'), ('4713-01-01 00:00:29 BC'), ('4713-01-01 00:00:30 BC'),
+('294276-12-01 00:00:01'), ('294276-12-01 00:00:02'), ('294276-12-01 00:00:03'),
+('294276-12-01 00:00:04'), ('294276-12-01 00:00:05'), ('294276-12-01 00:00:06'),
+('294276-12-01 00:00:07'), ('294276-12-01 00:00:08'), ('294276-12-01 00:00:09'),
+('294276-12-01 00:00:10'), ('294276-12-01 00:00:11'), ('294276-12-01 00:00:12'),
+('294276-12-01 00:00:13'), ('294276-12-01 00:00:14'), ('294276-12-01 00:00:15'),
+('294276-12-01 00:00:16'), ('294276-12-01 00:00:17'), ('294276-12-01 00:00:18'),
+('294276-12-01 00:00:19'), ('294276-12-01 00:00:20'), ('294276-12-01 00:00:21'),
+('294276-12-01 00:00:22'), ('294276-12-01 00:00:23'), ('294276-12-01 00:00:24'),
+('294276-12-01 00:00:25'), ('294276-12-01 00:00:26'), ('294276-12-01 00:00:27'),
+('294276-12-01 00:00:28'), ('294276-12-01 00:00:29'), ('294276-12-01 00:00:30');
+CREATE INDEX ON brin_timestamp_test USING brin (a timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
+DROP TABLE brin_timestamp_test;
+-- test overflows during CREATE INDEX with extreme date values
+CREATE TABLE brin_date_test(a DATE);
+INSERT INTO brin_date_test VALUES
+('4713-01-01 BC'), ('4713-01-02 BC'), ('4713-01-03 BC'), ('4713-01-04 BC'),
+('4713-01-05 BC'), ('4713-01-06 BC'), ('4713-01-07 BC'), ('4713-01-08 BC'),
+('4713-01-09 BC'), ('4713-01-10 BC'), ('4713-01-11 BC'), ('4713-01-12 BC'),
+('4713-01-13 BC'), ('4713-01-14 BC'), ('4713-01-15 BC'), ('4713-01-16 BC'),
+('4713-01-17 BC'), ('4713-01-18 BC'), ('4713-01-19 BC'), ('4713-01-20 BC'),
+('4713-01-21 BC'), ('4713-01-22 BC'), ('4713-01-23 BC'), ('4713-01-24 BC'),
+('4713-01-25 BC'), ('4713-01-26 BC'), ('4713-01-27 BC'), ('4713-01-28 BC'),
+('4713-01-29 BC'), ('4713-01-30 BC'), ('4713-01-31 BC'),
+('5874897-12-01'), ('5874897-12-02'), ('5874897-12-

Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 18:53, Maciek Sakrejda  wrote:
> Git push does have an --atomic flag to treat the entire push as a single 
> operation.

I decided to play around a bit with server hooks. Attached is a git
"update" hook that rejects pushes to the master branch when the new
HEAD of master does not pass pgindent. It tries to do the minimal
amount of work necessary. Together with the --atomic flag of git push
I think this would work quite well.

Note: It does require that pg_bsd_indent is in PATH. While not perfect,
that seems like it would be acceptable in practice to me. Its version is
not updated very frequently, so manually updating it on the git server
when we do does not seem like a huge issue to me.

The easiest way to try it out is by cloning the postgres repo in two
different local directories, let's call them A and B. And then
configure directory B to be the origin remote of A. By placing the
update script in B/.git/hooks/ it will execute whenever you push
master from A to B.




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Magnus Hagander
On Tue, Oct 17, 2023 at 10:43 PM Jelte Fennema  wrote:
>
> On Tue, 17 Oct 2023 at 18:53, Maciek Sakrejda  wrote:
> > Git push does have an --atomic flag to treat the entire push as a single 
> > operation.
>
> I decided to play around a bit with server hooks. Attached is a git
> "update" hook that rejects pushes to the master branch when the new
> HEAD of master does not pass pgindent. It tries to do the minimal
> amount of work necessary. Together with the --atomic flag of git push
> I think this would work quite well.
>
> Note: It does require that pg_bsd_indent is in PATH. While not perfect,
> that seems like it would be acceptable in practice to me. Its version is
> not updated very frequently, so manually updating it on the git server
> when we do does not seem like a huge issue to me.

If it doesn't know how to rebuild it, aren't we going to be stuck in a
catch-22 if we need to change it in certain ways, since an old version
of pg_bsd_indent would reject the very patch that updates
it? (And when it does, one should expect the push to take quite a long
time, but given the infrequency I agree that part is probably not an
issue)

And unless we're only enforcing it on master, we'd also need to make
provisions for different versions of it on different branches, I
think?

Other than that, I agree it's fairly simple. It does need a lot more
sandboxing than what's in there now, but that's not too hard of a
problem to solve, *if* this is what we want.

(And of course needs to be integrated with the existing script since
AFAIK you can't chain git hooks unless you do it manually - but that's
mostly mechanical)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: run pgindent on a regular basis / scripted manner

2023-10-17 Thread Tom Lane
Magnus Hagander  writes:
> If it doesn't know how to rebuild it, aren't we going to be stuck in a
> catch-22 if we need to change it in certain ways? Since an old version
> of pg_bsd_indent would reject the patch that might include updating
> it. (And when it does, one should expect the push to take quite a long
> time, but given the infrequency I agree that part is probably not an
> issue)

Everyone who has proposed this has included a caveat that there must
be a way to override the check.  Given that minimal expectation, it
shouldn't be too hard to deal with pg_bsd_indent-updating commits.

regards, tom lane




Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-10-17 Thread Tom Lane
I wrote:
> Why are we marking extension member objects as being subject to SECLABEL
> or POLICY dumping?  As the comment notes, that isn't really sensible
> unless what we are dumping is a delta from the extension's initial
> assignments.  But we have no infrastructure for that, and none seems
> likely to appear in the near future.

Here's a quick patch that does it that way.  The test changes
are identical to Jacob's v3-0001.

regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 83aeef2751..fd6d8bb5dd 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1680,13 +1680,10 @@ checkExtensionMembership(DumpableObject *dobj, Archive *fout)
 	addObjectDependency(dobj, ext->dobj.dumpId);
 
 	/*
-	 * In 9.6 and above, mark the member object to have any non-initial ACL,
-	 * policies, and security labels dumped.
-	 *
-	 * Note that any initial ACLs (see pg_init_privs) will be removed when we
-	 * extract the information about the object.  We don't provide support for
-	 * initial policies and security labels and it seems unlikely for those to
-	 * ever exist, but we may have to revisit this later.
+	 * In 9.6 and above, mark the member object to have any non-initial ACLs
+	 * dumped.  (Any initial ACLs will be removed later, using data from
+	 * pg_init_privs, so that we'll dump only the delta from the extension's
+	 * initial setup.)
 	 *
 	 * Prior to 9.6, we do not include any extension member components.
 	 *
@@ -1694,6 +1691,13 @@ checkExtensionMembership(DumpableObject *dobj, Archive *fout)
 	 * individually, since the idea is to exactly reproduce the database
 	 * contents rather than replace the extension contents with something
 	 * different.
+	 *
+	 * Note: it might be interesting someday to implement storage and delta
+	 * dumping of extension members' RLS policies and/or security labels.
+	 * However there is a pitfall for RLS policies: trying to dump them
+	 * requires getting a lock on their tables, and the calling user might not
+	 * have privileges for that.  We need no lock to examine a table's ACLs,
+	 * so the current feature doesn't have a problem of that sort.
 	 */
 	if (fout->dopt->binary_upgrade)
 		dobj->dump = ext->dobj.dump;
@@ -1702,9 +1706,7 @@ checkExtensionMembership(DumpableObject *dobj, Archive *fout)
 		if (fout->remoteVersion < 90600)
 			dobj->dump = DUMP_COMPONENT_NONE;
 		else
-			dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL |
-	DUMP_COMPONENT_SECLABEL |
-	DUMP_COMPONENT_POLICY);
+			dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL);
 	}
 
 	return true;
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index d00c3544e9..68a767d2f5 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -175,6 +175,19 @@ my %pgdump_runs = (
 			'postgres',
 		],
 	},
+
+	# regress_dump_login_role shouldn't need SELECT rights on internal
+	# (undumped) extension tables
+	privileged_internals => {
+		dump_cmd => [
+			'pg_dump', '--no-sync', "--file=$tempdir/privileged_internals.sql",
+			# these two tables are irrelevant to the test case
+			'--exclude-table=regress_pg_dump_schema.external_tab',
+			'--exclude-table=regress_pg_dump_schema.extdependtab',
+			'--username=regress_dump_login_role', 'postgres',
+		],
+	},
+
 	schema_only => {
 		dump_cmd => [
 			'pg_dump', '--no-sync', "--file=$tempdir/schema_only.sql",
@@ -284,6 +297,7 @@ my %full_runs = (
 	exclude_table => 1,
 	no_privs => 1,
 	no_owner => 1,
+	privileged_internals => 1,
 	with_extension => 1,
 	without_extension => 1);
 
@@ -321,6 +335,16 @@ my %tests = (
 		like => { pg_dumpall_globals => 1, },
 	},
 
+	'CREATE ROLE regress_dump_login_role' => {
+		create_order => 1,
+		create_sql => 'CREATE ROLE regress_dump_login_role LOGIN;',
+		regexp => qr/^
+			\QCREATE ROLE regress_dump_login_role;\E
+			\n\QALTER ROLE regress_dump_login_role WITH \E.*\Q LOGIN \E.*;
+			\n/xm,
+		like => { pg_dumpall_globals => 1, },
+	},
+
 	'GRANT ALTER SYSTEM ON PARAMETER full_page_writes TO regress_dump_test_role'
 	  => {
 		create_order => 2,
@@ -704,6 +728,7 @@ my %tests = (
 			data_only => 1,
 			extension_schema => 1,
 			pg_dumpall_globals => 1,
+			privileged_internals => 1,
 			section_data => 1,
 			section_pre_data => 1,
 			# Excludes this schema as extension is not listed.
@@ -720,6 +745,7 @@ my %tests = (
 			data_only => 1,
 			extension_schema => 1,
 			pg_dumpall_globals => 1,
+			privileged_internals => 1,
 			section_data => 1,
 			section_pre_data => 1,
 			# Excludes this schema as extension is not listed.
@@ -743,6 +769,7 @@ my %tests = (
 			# Excludes the extension and keeps the schema's data.
 			without_extension_internal_schema => 1,
 		},
+		unlike => { privileged_internals => 1 },
 	},);
 
 #
diff --git a/src/test/modules/test_pg_dump/test_pg_dum

Re: stopgap fix for signal handling during restore_command

2023-10-17 Thread Nathan Bossart
On Tue, Oct 17, 2023 at 12:47:29PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> ... and it looks like some of the back-branches are failing for Windows.
>> I'm assuming this is because c290e79 was only back-patched to v15.  My
>> first instinct is just to back-patch that one all the way to v11, but maybe
>> there's an alternative involving #ifdef WIN32.  Are there any concerns with
>> back-patching c290e79?
> 
> Sounds fine to me.

Thanks, done.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add support for AT LOCAL

2023-10-17 Thread Thomas Munro
Hmm, I guess I must have missed some important flag or environment
variable when trying to reproduce it, sorry.

Given that IBM describes xlc as "legacy" (replaced by xlclang, but
still supported for some unspecified period of time for the benefit of
people who need C++ ABI compatibility with old code), I wonder how
long we plan to support it...  Anecdotally, from a time 1-2 decades
ago when I used AIX daily, I can report that vast amounts of open
source stuff couldn't build with xlc, so gcc was used for pretty much
anything that didn't have a C++ ABI requirement.  I kinda wonder if a
single person in the entire world appreciates that we support this.




Re: Add support for AT LOCAL

2023-10-17 Thread Tom Lane
Thomas Munro  writes:

> Given that IBM describes xlc as "legacy" (replaced by xlclang, but
> still supported for some unspecified period of time for the benefit of
> people who need C++ ABI compatibility with old code), I wonder how
> long we plan to support it...

Should we be testing against xlclang instead?

regards, tom lane




Re: Add support for AT LOCAL

2023-10-17 Thread Thomas Munro
On Wed, Oct 18, 2023 at 11:54 AM Tom Lane  wrote:
> Thomas Munro  writes:
>
> > Given that IBM describes xlc as "legacy" (replaced by xlclang, but
> > still supported for some unspecified period of time for the benefit of
> > people who need C++ ABI compatibility with old code), I wonder how
> > long we plan to support it...
>
> Should we be testing against xlclang instead?

I hesitated to suggest it because it's not my animal/time we're
talking about but it seems to make more sense.  It appears to be IBM's
answer to the nothing-builds-with-this-thing phenomenon, since it
accepts a lot of GCCisms via Clang's adoption of them.  From a quick
glance at [1], it lacks the atomics builtins but we have our own
assembler magic for POWER.  So maybe it'd all just work™.

[1] 
https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=migration-checklist-when-moving-from-xl-based-front-end-clang-based-front-end




Re: Add support for AT LOCAL

2023-10-17 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Oct 18, 2023 at 11:54 AM Tom Lane  wrote:
>> Should we be testing against xlclang instead?

> I hesitated to suggest it because it's not my animal/time we're
> talking about but it seems to make more sense.  It appears to be IBM's
> answer to the nothing-builds-with-this-thing phenomenon, since it
> accepts a lot of GCCisms via Clang's adoption of them.  From a quick
> glance at [1], it lacks the atomics builtins but we have our own
> assembler magic for POWER.  So maybe it'd all just work™.

Discounting the Windows animals, it looks like the xlc animals are
our only remaining ones that use anything except gcc or clang.
That feels uncomfortably like a compiler monoculture to me, so
I can understand the reasoning for keeping hornet/mandrill going.
Still, maybe we should just accept the fact that gcc/clang have
outcompeted everything else in the C compiler universe.  It's
getting hard to imagine that anyone would bring out some new product
that didn't try to be bug-compatible with gcc, for precisely the
reason you mention.

regards, tom lane




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-17 Thread Jeff Davis
On Tue, 2023-10-17 at 16:20 -0400, Stephen Frost wrote:
> Agreed that it's a bad idea to design to support 2 and only 2.

I don't disagree, but it's difficult to come up with syntax that:

 (a) supports N passwords
 (b) makes the ordinary cases simple and documentable
 (c) helps users avoid mistakes (at least in the simple cases)
 (d) makes sense for passwords with and without validity periods
 (e) handles the challenging cases

One challenging case is that we cannot allow the mixing of password
protocols (e.g. MD5 & SCRAM), because the authentication exchange only
gets one chance at success. If someone ends up with 7 MD5 passwords,
and they'd like to migrate to SCRAM, then we can't allow them to
migrate one password at a time (because then the other passwords would
break). I'd like to see what the SQL for doing this should look like.
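
For illustration only -- this syntax is purely hypothetical and not
implemented anywhere -- a one-at-a-time migration might look like:

    ALTER ROLE webapp ADD PASSWORD 'secret-1' NAME 'oct2023';  -- SCRAM
    ALTER ROLE webapp DROP PASSWORD 'jan2023';                 -- MD5

But because the exchange only gets one chance at success, the server
cannot fall back to an MD5 verifier after offering SCRAM, so the six
remaining MD5 passwords would break the moment the first SCRAM one is
added.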

>   If
> nothing else, there's the very simple case that the user needs to do
> another password rotation ... and they look and discover that the old
> password is still being used and that if they took it away, things
> would
> break, but they need time to run down which system is still using it
> while still needing to perform the migration for the other systems
> that
> are correctly being updated- boom, need 3 for that case.

That sounds like a reasonable use case. I don't know if we should make
it a requirement, but if we come up with something reasonable that
supports this case I'm fine with it. Ideally, it would still be easy to
see when you are making a mistake (e.g. forgetting to ever remove
passwords).

>   There's other
> use-cases that could be interesting though- presumably we'd log which
> password is used to authenticate and then users could have a fleet of
> web servers which each have their own password but log into the same
> PG
> user and they could happily rotate the passwords independently for
> all
> of those systems.
> 
> if they're all
> logging in with the same role and just a different password, that's
> not
> going to happen.

I'm not sure this is a great idea. Can you point to some precedent
here?


> Giving users the option of not having to specify a name and letting
> the
> system come up with one (similar to what we do for indexes and such)
> could work out pretty decently, imv.  I'd have that be optional
> though-
> if the user wants to specify a name, then they should be allowed to
> do
> so.

Can you describe how some basic use cases should work with example SQL?

> 
> > * the identifier of the current password always changing (perhaps
> > fine
> > if it'a a "created at" ID?)
> 
> I'm not following what the issue is you're getting at here.

I just meant that when rotating passwords, you have to keep coming up
with new names, so the "current" or "primary" one would not have a
consistent name.

Regards,
Jeff Davis





Re: Add support for AT LOCAL

2023-10-17 Thread Michael Paquier
On Tue, Oct 17, 2023 at 12:45:28PM -0400, Tom Lane wrote:
> Whoops, no: for negative starting values we'd need truncate-towards-
> minus-infinity division whereas C99 specifies truncate-towards-zero.
> However, the attached does pass for me on cfarm111 as well as my
> usual dev machine.

I guess that the following trick could be used for the negative case,
with one modulo followed by one extra addition: 
    if (result->time < INT64CONST(0))
    {
        result->time %= USECS_PER_DAY;
        result->time += USECS_PER_DAY;
    }
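
The semantics are easy to demonstrate from SQL, whose integer % has the
same C99 truncate-towards-zero behaviour (a quick illustration, not part
of the patch):

    SELECT (-7) % 3;        -- -1: the remainder keeps the dividend's sign
    SELECT (-7) % 3 + 3;    --  2: the extra addition maps a negative
                            --     remainder back into [0, modulus)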

> Presumably this is a pre-existing bug that also appears in back
> branches.  But in the interests of science I propose that we
> back-patch only the test case and see which machine(s) fail it
> before back-patching the code change.

Sure, as you see fit.
--
Michael




Re: New WAL record to detect the checkpoint redo location

2023-10-17 Thread Michael Paquier
On Tue, Oct 17, 2023 at 12:45:52PM -0400, Robert Haas wrote:
> On Fri, Oct 13, 2023 at 3:29 AM Michael Paquier  wrote:
>> I've mentioned as well a test in pg_walinspect after one of the
>> checkpoints generated there, but what you do here is enough for the
>> online case.
> 
> I don't quite understand what you're saying here. If you're suggesting
> a potential improvement, can you be a bit more clear and explicit
> about what the suggestion is?

Suggestion is from here, with a test for pg_walinspect after it runs
its online checkpoint (see the full-page case):
https://www.postgresql.org/message-id/ZOvf1tu6rfL/b...@paquier.xyz

+-- Check presence of REDO record.
+SELECT redo_lsn FROM pg_control_checkpoint() \gset
+SELECT start_lsn = :'redo_lsn'::pg_lsn AS same_lsn, record_type
+  FROM pg_get_wal_record_info(:'redo_lsn');

>> Then repeat this pattern for non-shutdown checkpoints a few lines down
>> without touching the copy of the redo LSN in XLogCtl->Insert, because
>> of course we don't hold the WAL insert locks in an exclusive fashion
>> here:
>> checkPoint.redo = RedoRecPtr;
>>
>> My point is that this is not only about RedoRecPtr, but also about
>> XLogCtl->Insert.RedoRecPtr here.  The comment in ReserveXLogSwitch()
>> says that.
> 
> I have adjusted the comment in CreateCheckPoint to hopefully address
> this concern.

-* XLogInsertRecord will have updated RedoRecPtr, but we need to copy
-* that into the record that will be inserted when the checkpoint is
-* complete.
+* XLogInsertRecord will have updated XLogCtl->Insert.RedoRecPtr in
+* shared memory and RedoRecPtr in backend-local memory, but we need
+* to copy that into the record that will be inserted when the
+* checkpoint is complete. 

This comment diff between v8 and v9 looks OK to me.  Thanks.

> I don't understand what you mean about
> ReserveXLogSwitch(), though.

I am not sure either, looking back at that :p
--
Michael




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Peter Smith
Here are some review comments for v51-0001

==
src/bin/pg_upgrade/check.c

0.
+check_old_cluster_for_valid_slots(bool live_check)
+{
+ char output_path[MAXPGPATH];
+ FILE*script = NULL;
+
+ prep_status("Checking for valid logical replication slots");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "invalid_logical_relication_slots.txt");

0a
typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

~

0b.
Since the non-upgradable slots are not strictly "invalid", is this an
appropriate filename for the bad ones?

But I don't have very good alternatives. Maybe:
- non_upgradable_logical_replication_slots.txt
- problem_logical_replication_slots.txt

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# --
+# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
+#
+# There are two requirements for GUCs - wal_level and max_replication_slots,
+# but only max_replication_slots will be tested here. This is because to
+# reduce the execution time of the test.


SUGGESTION
# TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
#
# Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
# reduce the test execution time, only 'max_replication_slots' is tested here.

~~~

2.
+# Preparations for the subsequent test:
+# 1. Create two slots on the old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1',
'test_decoding', false, true);"
+);
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot2',
'test_decoding', false, true);"
+);


Can't you combine those SQL in the same $old_publisher->safe_psql.

~~~

3.
+# Clean up
+rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
+# Set max_replication_slots to the same value as the number of slots. Both of
+# slots will be used for subsequent tests.
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");

The code doesn't seem to match the comment - is this correct? The
old_publisher created 2 slots, so why are you setting new_publisher
"max_replication_slots = 1" again?

~~~

4.
+# Preparations for the subsequent test:
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+# 2. Advance the slot test_slot2 up to the current WAL location
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+# 3. Emit a non-transactional message. test_slot2 detects the message so that
+# this slot will be also reported by upcoming pg_upgrade.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+);


I felt this test would be clearer if you emphasised the state of the
test_slot1 also. e.g.

4a.
BEFORE
+# 1. Generate extra WAL records. Because these WAL records do not get consumed
+# it will cause the upcoming pg_upgrade test to fail.

SUGGESTION
# 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
#has consumed them.

~

4b.
BEFORE
+# 2. Advance the slot test_slot2 up to the current WAL location

SUGGESTION
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
#still has unconsumed WAL records.

~~~

5.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(

/because the slot still has/because there are slots still having/

~~~

6.
+ [qr//],
+ 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
+);

/slot/slots/

~~~

7.
+# And check the content. Both of slots must be reported that they have
+# unconsumed WALs after confirmed_flush_lsn.

SUGGESTION
# Check the file content. Both slots should be reporting that they have
# unconsumed WAL records.


~~~

8.
+# Preparations for the subsequent test:
+# 1. Setup logical replication
+my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
+
+$old_publisher->start;
+
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');");
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot2');");
+
+$old_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub FOR ALL TABLES;");


8a.
/Setup logical replication/Setup logical replication (first, cleanup
slots from the previous tests)/

~

8b.
Can't you combine all those SQL in the same $old_publisher->safe_psql.

~~~

9.
+
+# Actual run, successful upgrade is expected
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_pub

Re: The danger of deleting backup_label

2023-10-17 Thread Kyotaro Horiguchi
At Tue, 17 Oct 2023 16:16:42 -0400, David Steele  wrote in 
> Given that the above can't be back patched, I'm thinking we don't need
> backup_label at all going forward. We just write the values we need
> for recovery into pg_control and return *that* from pg_backup_stop()
> and tell the user to store it with their backup. We already have
> "These files are vital to the backup working and must be written byte
> for byte without modification, which may require opening the file in
> binary mode." in the documentation so dealing with pg_control should
> not be a problem. pg_control also has a CRC so we will know if it gets
> munged.

I'm somewhat perplexed regarding the objective of this thread.

This thread began with the intent of preventing users from removing
the backup_label from a backup. At the beginning, the proposal aimed
to achieve this by injecting an invalid value into the pg_control file
located in the generated backup. However, this (and the previous) proposal
seems to deviate from that initial objective. It now eliminates the
need to be concerned about the pg_control version that is copied into
the generated backup. However, if someone removes the backup_label
from a backup, the initial concerns could still surface.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: remaining sql/json patches

2023-10-17 Thread jian he
Hi.
based on v22.

I added some tests against json_value for the sake of test coverage.

A previous email thread mentioned needing to check *empty in ExecEvalJsonExpr.
Since JSON_VALUE_OP, JSON_QUERY_OP, and JSON_EXISTS_OP all need to
handle the *empty case, I refactored a little bit.
It might be helpful. Maybe we can also refactor the *error cases.

The following part is not easy to understand.
res = ExecPrepareJsonItemCoercion(jbv,
+  jsestate->item_jcstates,
+  &post_eval->jcstate);
+ if (post_eval->jcstate &&
+ post_eval->jcstate->coercion &&
+ (post_eval->jcstate->coercion->via_io ||
+ post_eval->jcstate->coercion->via_populate))
From 439f98ae55e6f67ca6c5bf023c6e9ab2796f4c49 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Wed, 18 Oct 2023 10:12:33 +0800
Subject: [PATCH v22 1/1] add some test, refactor ExecEvalJsonExpr

add some tests against json_value for code coverage.
refactor ExecEvalJsonExpr, let JSON_VALUE_OP, JSON_QUERY_OP,
JSON_EXISTS_OP handle *empty in a common way.
---
 src/backend/executor/execExprInterp.c   | 41 +
 src/test/regress/expected/jsonb_sqljson.out | 36 ++
 src/test/regress/sql/jsonb_sqljson.sql  |  6 +++
 3 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 123121a9..1a1f2089 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -4218,7 +4218,8 @@ ExecEvalJsonExpr(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
 	*op->resvalue = (Datum) 0;
 	return;
 }
-
+if(*empty)
+	goto return_empty;
 resnull = false;
 res = BoolGetDatum(exists);
 break;
@@ -4236,7 +4237,10 @@ ExecEvalJsonExpr(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
 *op->resvalue = (Datum) 0;
 return;
 			}
+			if (*empty)
+goto return_empty;
 			resnull = !DatumGetPointer(res);
+
 			break;
 
 		case JSON_VALUE_OP:
@@ -4253,15 +4257,14 @@ ExecEvalJsonExpr(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
 	*op->resvalue = (Datum) 0;
 	return;
 }
-
-if (!jbv)		/* NULL or empty */
+if (*empty)
+	goto return_empty;
+if (!jbv)		/* NULL */
 {
 	resnull = true;
+	res		= (Datum) 0;
 	break;
 }
-
-Assert(!*empty);
-
 resnull = false;
 
 /* Coerce scalar item to the output type */
@@ -4322,31 +4325,23 @@ ExecEvalJsonExpr(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
 			return;
 	}
 
+	*op->resvalue = res;
+	*op->resnull = resnull;
+	return;
 	/*
 	 * If the ON EMPTY behavior is to cause an error, do so here.  Other
 	 * behaviors will be handled in ExecEvalJsonExprBehavior().
 	 */
-	if (*empty)
-	{
-		Assert(jexpr->on_empty);	/* it is not JSON_EXISTS */
-
-		if (jexpr->on_empty->btype == JSON_BEHAVIOR_ERROR)
+return_empty:
+		Assert(jexpr->on_empty);
+		*op->resnull = true;
+		*op->resvalue = (Datum) 0;
+		if (jexpr->on_empty->btype == JSON_BEHAVIOR_ERROR && throw_error)
 		{
-			if (!throw_error)
-			{
-*op->resnull = true;
-*op->resvalue = (Datum) 0;
-return;
-			}
-
 			ereport(ERROR,
 	(errcode(ERRCODE_NO_SQL_JSON_ITEM),
 	 errmsg("no SQL/JSON item")));
 		}
-	}
-
-	*op->resvalue = res;
-	*op->resnull = resnull;
 }
 
 /*
@@ -4467,7 +4462,7 @@ ExecEvalJsonExprCoercion(ExprState *state, ExprEvalStep *op,
 			{
 Jsonb	   *jb = resnull ? NULL : DatumGetJsonbP(res);
 
-if (jb && JB_ROOT_IS_SCALAR(jb))
+if (jb)
 {
 	omit_quotes = true;
 	val_string = JsonbUnquote(jb);
diff --git a/src/test/regress/expected/jsonb_sqljson.out b/src/test/regress/expected/jsonb_sqljson.out
index a3ba44cf..0526ff00 100644
--- a/src/test/regress/expected/jsonb_sqljson.out
+++ b/src/test/regress/expected/jsonb_sqljson.out
@@ -512,6 +512,42 @@ SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timestamptz '2018-02-21 12:34:56 +
  Tue Feb 20 18:34:56 2018 PST
 (1 row)
 
+SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timestamp '2018-02-21 12:34:56 +10' AS ts);
+json_value
+--
+ Wed Feb 21 12:34:56 2018
+(1 row)
+
+SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timestamp(3) '2018-02-21 12:34:56.123456 +10' AS ts);
+  json_value  
+--
+ Wed Feb 21 12:34:56.123 2018
+(1 row)
+
+SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timestamptz(3) '2018-02-21 12:34:56.123456 +10' AS ts);
+json_value
+--
+ Tue Feb 20 18:34:56.123 2018 PST
+(1 row)
+
+SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING date '2018-02-21 12:34:56 +10' AS ts);
+ json_value 
+
+ 02-21-2018
+(1 row)
+
+SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING time '2018-02-21 12:34:56 +10' AS ts);
+ json_value 
+
+ 12:34:56
+(1 row)
+
+SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timetz '2018-02-21 12:34:56 +10' AS ts);
+ json_value  
+-
+ 12:34:56+10
+(1 row)
+
 SELECT JSON_VALUE(jso

A trouble about meson on Windows

2023-10-17 Thread Kyotaro Horiguchi
Hello.

I've been unable to build PostgreSQL using Meson on Windows. As I'm
unsure of the cause, I'm providing this as a report.

In brief, the ninja command fails with the following error message on
my Windows environment.

>ninja -v
ninja: error: 'src/backend/postgres_lib.a.p/meson_pch-c.obj', needed by 
'src/backend/postgres.exe', missing and no known rule to make it

I'd like to note that I haven't made any modifications to meson.build;
therefore, b_pch should be false. (I didn't confirm that because the
python scripts are executables in my environment.)
build.ninja actually contains several entries referencing
meson_pch-c.obj, despite lacking corresponding entries to build them:

> build src/backend/postgres.exe | src/backend/postgres.pdb: c_LINKER_RSP 
> src/backend/postgres.exe.p/win32ver.res 
> src/backend/postgres_lib.a.p/meson_pch-c.obj ..

An excerpted output from the command "ninja".

>ninja
Version: 1.2.1
...
Build type: native build
Project name: postgresql
Project version: 17devel
C compiler for the host machine: cl (msvc 19.34.31935 "Microsoft(R) C/C++ 
Optimizing Compiler Version 19.34.31935 for x64")
C linker for the host machine: link link 14.34.31935.0
Host machine cpu family: x86_64
Host machine cpu: x86_64
...
Program python found: YES 
(C:\Users\horiguti\AppData\Local\Programs\Python\Python310\python.EXE)
...
Program C:\Users\horiguti\AppData\Local\Programs\Python\Python310\Scripts\meson 
found: YES 
(C:\Users\horiguti\AppData\Local\Programs\Python\Python310\Scripts\meson.exe)
...
Found ninja-1.11.1.git.kitware.jobserver-1 at 
C:\Users\horiguti\AppData\Local\Programs\Python\Python310\Scripts\ninja.EXE
Cleaning... 0 files.
ninja: error: 'src/backend/postgres_lib.a.p/meson_pch-c.obj', needed by 
'src/backend/postgres.exe', missing and no known rule to make it

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-17 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Tue, 2023-10-17 at 16:20 -0400, Stephen Frost wrote:
> > Agreed that it's a bad idea to design to support 2 and only 2.
> 
> I don't disagree, but it's difficult to come up with syntax that:
> 
>  (a) supports N passwords
>  (b) makes the ordinary cases simple and documentable
>  (c) helps users avoid mistakes (at least in the simple cases)
>  (d) makes sense passwords with and without validity period
>  (e) handles the challenging cases

Undoubtedly biased ... but I don't agree that this is so difficult.
What points have been raised as a downside of the originally proposed
approach, specifically?

Reading back through the thread, from a user perspective, the primary
one seems to be that passwords are expected to be named.  I'm surprised
this is being brought up as such a serious concern.  Lots and lots and
lots of things in the system require naming, after all, and the idea
that this is somehow harder or more of an issue is quite odd to me.

> One challenging case is that we cannot allow the mixing of password
> protocols (e.g. MD5 & SCRAM), because the authentication exchange only
> gets one chance at success. If someone ends up with 7 MD5 passwords,
> and they'd like to migrate to SCRAM, then we can't allow them to
> migrate one password at a time (because then the other passwords would
> break). I'd like to see what the SQL for doing this should look like.

I've got absolutely no interest in supporting md5- it's high time to rip
that out and be happy that it's gone.  We nearly did it last year and
I'm really hoping we accomplish that this year.

I'm open to the idea that we may need to support some new SCRAM version
or an alternative mechanism in the future, but it's long past time to
spend any time worrying about md5.  As for how difficult it is to deal
with supporting an alternative in the future- that's going to depend a
great deal on what that alternative is and I don't know that we can
really code to handle that as part of this effort in a sensible way, and
trying to code for "anything" is going to likely make this much more
complicated, and probably rife with security issues too.

> >   If
> > nothing else, there's the very simple case that the user needs to do
> > another password rotation ... and they look and discover that the old
> > password is still being used and that if they took it away, things
> > would
> > break, but they need time to run down which system is still using it
> > while still needing to perform the migration for the other systems
> > that
> > are correctly being updated- boom, need 3 for that case.
> 
> That sounds like a reasonable use case. I don't know if we should make
> it a requirement, but if we come up with something reasonable that
> supports this case I'm fine with it. Ideally, it would still be easy to
> see when you are making a mistake (e.g. forgetting to ever remove
> passwords).

We have monitoring for many, many parts of the system and this would be
a good thing to monitor also- not just at a per-password level but also
at an overall role/user level, as you have a similar issue there and we
don't properly provide users with any way to check reasonably "hey, when
was the last time this user logged in?".  No, trawling through ancient
logs, if you even have them, isn't a proper solution to this problem.
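To make that concrete, the sort of query I'd want admins to be able to run
is something like the following (pg_password_usage and its columns are
entirely hypothetical- nothing like it exists today):

    -- hypothetical view, for illustration of the monitoring gap
    SELECT rolname, password_name, last_used
      FROM pg_password_usage
     WHERE last_used IS NULL
        OR last_used < now() - interval '90 days';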

> >   There's other
> > use-cases that could be interesting though- presumably we'd log which
> > password is used to authenticate and then users could have a fleet of
> > web servers which each have their own password but log into the same
> > PG
> > user and they could happily rotate the passwords independently for
> > all
> > of those systems.
> > 
> > if they're all
> > logging in with the same role and just a different password, that's
> > not
> > going to happen.
> 
> I'm not sure this is a great idea. Can you point to some precedent
> here?

It's already the case that lots and lots and lots of systems out there
log into PG using the same username/password.  With this, we're at least
offering them the ability to vary the password and keep the user the
same.  We've even seen this be asked for in other ways- the ability to
use distinct Kerberos or LDAP identities to log into the same user in
the database, see pg_ident.conf and various suggestions for how to bring
that to LDAP too.  Other systems also support the ability to have a
group of users in LDAP, or such, be allowed to log into a specific
database user.  One big reason for this is that then you know everyone
logging into that account has exactly the same access to things- because
that's the lowest level at which access can be granted.  Having a way to
support this similar capability but for passwords is certainly useful.
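For reference, pg_ident.conf already expresses exactly this shape of
mapping for external authentication (the names below are invented for
illustration):

    # MAPNAME   SYSTEM-USERNAME       PG-USERNAME
    webfleet    alice@EXAMPLE.COM     webuser
    webfleet    bob@EXAMPLE.COM       webuser

Per-client passwords would give the password method the same kind of
distinction without multiplying database roles.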

> > Giving users the option of not having to specify a name and letting
> > the
> > system come up with one (similar to what we do for indexes and such)
> > could work out pretty decently, imv.  I'd have that be optional
> >

Remove wal_level settings for subscribers in tap tests

2023-10-17 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

While discussing [1], I found that in the tap tests, wal_level was set to
logical for subscribers too. The setting is not needed on the subscriber
side, and it may confuse newcomers. Therefore, I would like to propose a
patch that removes the unnecessary "allows_streaming => 'logical'" settings.
I grepped for the string and checked the necessity of each occurrence one
by one.
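For example, a typical subscriber setup in the tests changes from
(a minimal sketch; the node name is illustrative):

    my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
    $node_subscriber->init(allows_streaming => 'logical');
    $node_subscriber->start;

to simply:

    my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
    $node_subscriber->init;
    $node_subscriber->start;

since only the publisher needs wal_level = logical.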

What do you think?

[1]: https://commitfest.postgresql.org/45/4273/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



0001-remove-unnecessary-wal_level-settings.patch
Description: 0001-remove-unnecessary-wal_level-settings.patch


Re: Add support for AT LOCAL

2023-10-17 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Oct 18, 2023 at 11:54 AM Tom Lane  wrote:
>> Should we be testing against xlclang instead?

> I hesitated to suggest it because it's not my animal/time we're
> talking about but it seems to make more sense.  It appears to be IBM's
> answer to the nothing-builds-with-this-thing phenomenon, since it
> accepts a lot of GCCisms via Clang's adoption of them.  From a quick
> glance at [1], it lacks the atomics builtins but we have our own
> assembler magic for POWER.  So maybe it'd all just work™.

FWIW, I tried a test build with xlclang 16.1 on cfarm111, and
it does seem like it Just Works, modulo a couple of oddities:

* <netinet/tcp.h> fails to compile, due to references to struct
in6_addr, unless <netinet/in.h> is included first.  Most of our
references to tcp.h already do that, but not libpq-be.h and
fe-protocol3.c.  I'm a bit at a loss why we've not seen this
with the existing BF animals on this machine, because AFAICS
they're all using the same /usr/include tree.
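For the record, the fix is just include ordering, i.e. the pattern the
other tcp.h call sites already use:

    #include <netinet/in.h>     /* declares struct in6_addr; must come first */
    #include <netinet/tcp.h>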

* configure recognizes this as gcc but not Clang, which may or may
not be fine:
...
checking whether we are using the GNU C compiler... yes
...
checking whether xlclang is Clang... no
...
This doesn't seem to break anything, but it struck me as odd.
configure seems to pick a sane set of compiler options anyway.

Interestingly, xlclang shows the same failure with the pre-19fa97731
versions of timetz_zone/timetz_izone as plain xlc does.  I guess
this is not so astonishing since they presumably share the same
codegen backend.  But maybe somebody ought to file a bug with IBM?

regards, tom lane




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-17 Thread Amit Kapila
On Wed, Oct 18, 2023 at 7:31 AM Peter Smith  wrote:
>
> ==
> src/bin/pg_upgrade/check.c
>
> 0.
> +check_old_cluster_for_valid_slots(bool live_check)
> +{
> + char output_path[MAXPGPATH];
> + FILE*script = NULL;
> +
> + prep_status("Checking for valid logical replication slots");
> +
> + snprintf(output_path, sizeof(output_path), "%s/%s",
> + log_opts.basedir,
> + "invalid_logical_relication_slots.txt");
>
> 0a
> typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> ~
>
> 0b.
> Since the non-upgradable slots are not strictly "invalid", is this an
> appropriate filename for the bad ones?
>
> But I don't have very good alternatives. Maybe:
> - non_upgradable_logical_replication_slots.txt
> - problem_logical_replication_slots.txt
>

I prefer the current naming. I think 'invalid' here indicates both
types of slots that are invalidated by the checkpointer and those that
have pending WAL to be consumed.

> ==
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
>
> 1.
> +# --
> +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> +#
> +# There are two requirements for GUCs - wal_level and max_replication_slots,
> +# but only max_replication_slots will be tested here. This is because to
> +# reduce the execution time of the test.
>
>
> SUGGESTION
> # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> #
> # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> # reduce the test execution time, only 'max_replication_slots' is tested here.
>

I think we don't need the second part of the comment: "Two GUCs ...".
Ideally, we should test each parameter's invalid value but that could
be costly, so I think it is okay to test a few of them.
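The shape of the check is roughly the following (a sketch, not the patch's
exact code; the node and command names are placeholders):

    # Set a too-low value on the new cluster and expect pg_upgrade to fail
    $new_node->append_conf('postgresql.conf', 'max_replication_slots = 0');
    command_fails([@pg_upgrade_cmd],
        'pg_upgrade fails for wrong max_replication_slots');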

-- 
With Regards,
Amit Kapila.




Re: Some performance degradation in REL_16 vs REL_15

2023-10-17 Thread 邱宇航
I wrote a script and ran the test on branches REL_[10-16]_STABLE, and I do
see a performance drop in REL_13_STABLE, of about 1~2%.

scale   round   PG10    PG11    PG12    PG13    PG14    PG15    PG16    (tps)
1   1   7922.2  8018.3  8102.8  7838.3  7829.2  7870.0  7846.1
2   7922.4  7923.5  8090.3  7887.7  7912.4  7815.2  7865.6
3   7937.6  7964.9  8012.8  7918.5  7879.4  7786.4  7981.1
4   8000.4  7959.5  8141.1  7886.3  7840.9  7863.5  8022.4
5   7921.8  7945.5  8005.2  7993.7  7957.0  7803.8  7899.8
6   7893.8  7895.1  8017.2  7879.8  7880.9  7911.4  7909.2
7   7879.3  7853.5  8071.7  7956.2  7876.7  7863.3  7986.3
8   7980.5  7964.1  8119.2  8015.2  7877.6  7784.9  7923.6
9   8083.9  7946.4  7960.3  7913.9  7924.6  7867.7  7928.6
10  7971.2  7991.8  7999.5  7812.4  7824.3  7831.0  7953.4
AVG 7951.3  7946.3  8052.0  7910.2  7880.3  7839.7  7931.6
MED 7930.0  7952.9  8044.5  7900.8  7878.5  7847.1  7926.1
10  1   41221.5 41394.8 40926.8 40566.6 41661.3 40511.9 40961.8
2   40974.0 40697.9 40842.4 40269.2 41127.7 40795.5 40814.9
3   41453.5 41426.4 41066.2 40890.9 41018.6 40897.3 40891.7
4   41691.9 40294.9 41189.8 40873.8 41539.7 40943.2 40643.8
5   40843.4 40855.5 41243.8 40351.3 40863.2 40839.6 40795.5
6   40969.3 40897.9 41380.8 40734.7 41269.3 41301.0 41061.0
7   40981.1 41119.5 41158.0 40834.6 40967.1 40790.6 41061.6
8   41006.4 41205.9 40740.3 40978.7 40742.4 40951.6 41242.1
9   41089.9 41129.7 40648.3 40622.1 40782.0 40460.5 40877.9
10  41280.3 41462.7 41316.4 40728.0 40983.9 40747.0 40964.6
AVG 41151.1 41048.5 41051.3 40685.0 41095.5 40823.8 40931.5
MED 41048.2 41124.6 41112.1 40731.3 41001.3 40817.6 40926.7
100 1   43429.0 43190.2 44099.3 43941.5 43883.3 44215.0 44604.9
2   43281.7 43795.2 44963.6 44331.5 43559.7 43571.5 43403.9
3   43749.0 43614.1 44616.7 43759.5 43617.8 43530.3 43362.4
4   43362.0 43197.3 44296.7 43692.4 42020.5 43607.3 43081.8
5   43373.4 43288.0 44240.9 43795.0 43630.6 43576.7 43512.0
6   43637.0 43385.2 45130.1 43792.5 43635.4 43905.2 43371.2
7   43621.2 43474.2 43735.0 43592.2 43889.7 43947.7 43369.8
8   43351.0 43937.5 44285.6 43877.2 43771.1 43879.1 43680.4
9   43481.3 43700.5 44119.9 43786.9 43440.8 44083.1 43563.2
10  43238.7 43559.5 44310.8 43406.0 44306.6 43376.3 43242.7
AVG 43452.4 43514.2 44379.9 43797.5 43575.6 43769.2 43519.2
MED 43401.2 43516.8 44291.2 43789.7 43633.0 43743.2 43387.5

The script looks like:
initdb data >/dev/null 2>&1                      # initdb on every round
pg_ctl -D data -l logfile start >/dev/null 2>&1  # start without changing any setting
pgbench -i -s $scale postgres >/dev/null 2>&1
sleep 1
pgbench -c20 -T10 -j8 postgres

And here is the pg_config output:
...
CONFIGURE =  '--enable-debug' '--prefix=/home/postgres/base' '--enable-depend' 
'PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig::/usr/lib/pkgconfig'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
-Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g 
-O2
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed 
-Wl,-rpath,'/home/postgres/base/lib',--enable-new-dtags
LDFLAGS_EX = 
LDFLAGS_SL = 
LIBS = -lpgcommon -lpgport -lz -lreadline -lpthread -lrt -ldl -lm 
VERSION = PostgreSQL 16.0

--
Yuhang Qiu

Re: pg_logical_emit_message() misses a XLogFlush()

2023-10-17 Thread Michael Paquier
On Tue, Oct 17, 2023 at 11:57:33AM +0530, Amit Kapila wrote:
> LGTM.

Thanks, I've applied that, then.
--
Michael


signature.asc
Description: PGP signature


Re: Some performance degradation in REL_16 vs REL_15

2023-10-17 Thread Andres Freund
Hi,

On 2023-10-16 11:04:25 +0300, Anton A. Melnikov wrote:
> On 13.10.2023 05:05, Andres Freund wrote:
> > Could you provide a bit more details about how you ran the benchmark?  The
> > reason I am asking is that ~330 TPS is pretty slow for -c20.  Even on 
> > spinning
> > rust and using the default settings, I get considerably higher results.
> > 
> > Oh - I do get results closer to yours if I use pgbench scale 1, causing a 
> > lot
> > of row level contention. What scale did you use?
> 
> 
> I use default scale of 1.

That means you're largely going to be bottlenecked due to row level
contention. For read/write pgbench you normally want to use a scale that's
bigger than the client count, best by at least 2x.
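For -c20 that means initializing with something like (illustrative
numbers):

    pgbench -i -s 40 bench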

Have you built postgres with assertions enabled or such?

What is the server configuration for both versions?


> And run the command sequence:
> $pgbench -i bench
> $sleep 1
> $pgbench -c20 -T10 -j8

I assume you also specify the database name here, given you specified it for
pgbench -i?

As you're not doing a new initdb here, the state of the cluster will
substantially depend on what has run before. This can matter substantially
because a cluster with prior substantial write activity will already have
initialized WAL files and can reuse them cheaply, whereas one without that
activity needs to initialize new files.  Although that matters a bit less with
scale 1, because there's just not a whole lot of writes.

At the very least you should trigger a checkpoint before or after pgbench
-i. The performance between having a checkpoint during the pgbench run or not
is substantially different, and if you're not triggering one explicitly, it'll
be up to random chance whether it happens during the run or not. It's less
important if you run pgbench for an extended time, but if you do it just for
10s...
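Concretely, the stable way to compare runs is to force the checkpoint out
of the measured window, e.g. (a sketch; adjust scale and duration):

    pgbench -i -s 40 bench
    psql -c CHECKPOINT bench
    pgbench -c20 -j8 -T60 bench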

E.g. on my workstation, if there's no checkpoint, I get around 633 TPS across
repeated runs, but if there's a checkpoint between pgbench -i and the pgbench
run, it's around 615 TPS.

Greetings,

Andres Freund



