Re: Add new error_action COPY ON_ERROR "log"
On Tue, Mar 12, 2024 at 12:22 PM Michael Paquier wrote: > > On Mon, Mar 11, 2024 at 06:00:00PM +0530, Bharath Rupireddy wrote: > > Please see the attached v6 patch set. > > I am tempted to tweak a few things in the docs, the comments and the > tests (particularly adding more patterns for tuples that fail on > conversion while it's clear that there are not enough attributes after > the incorrect values), but that looks roughly OK. +1. But, do you want to add them now or later as a separate patch/discussion altogether? > Wouldn't it be better to squash the patches together, by the way? I guess not. They are two different things IMV. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila wrote: > > > > You might want to consider its interaction with sync slots on standby. > > Say, there is no activity on slots in terms of processing the changes > > for slots. Now, we won't perform sync of such slots on standby showing > > them inactive as per your new criteria, whereas the same slots could still > > be valid on the primary as the walsender is still active. This may be more > > of a theoretical point as in a running system there will probably be > > some activity, but I think this needs some thoughts. > > I believe the xmin and catalog_xmin of the sync slots on the standby > keep advancing depending on the slots on the primary, no? If yes, the > XID age based invalidation shouldn't be a problem. > > I believe there are no walsenders started for the sync slots on the > standbys, right? If yes, the inactive timeout based invalidation also > shouldn't be a problem. Because, the inactive timeouts for a slot are > tracked only for walsenders because they are the ones that typically > hold replication slots for longer durations and for real replication > use. We did a similar thing in a recent commit [1]. > > Is my understanding right? Do you still see any problems with it? Would it make sense to "simply" discard/prevent those kinds of invalidations for "synced" slots on standby? I mean, do they make sense given the fact that those slots are not usable until the standby is promoted? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: DROP DATABASE is interruptible
Hi, 13.07.2023 23:52, Andres Freund wrote: Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst part. But also a lot of other conflicts in tests... Took me 5-6 hours or so. But I now finally pushed the fixes. Hope the buildfarm agrees with it... Thanks for the review! I've discovered that the test 037_invalid_database, introduced with c66a7d75e, hangs when a server built with -DCLOBBER_CACHE_ALWAYS or with debug_discard_caches = 1 set via TEMP_CONFIG: echo "debug_discard_caches = 1" >/tmp/extra.config TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ PROVE_TESTS="t/037*" # +++ tap check in src/test/recovery +++ [09:05:48] t/037_invalid_database.pl .. 6/? regress_log_037_invalid_database ends with: [09:05:51.622](0.021s) # issuing query via background psql: # CREATE DATABASE regression_invalid_interrupt; # BEGIN; # LOCK pg_tablespace; # PREPARE TRANSACTION 'lock_tblspc'; [09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion I see two backends waiting: law 2420132 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting law 2420135 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] startup waiting and the latter's stack trace: #0 0x7f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 #1 0x563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80, nevents=1) at latch.c:1570 #2 0x563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1, wait_event_info=50331648) at latch.c:1516 #3 0x563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at latch.c:538 #4 0x563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 ) at proc.c:1339 #5 0x563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, owner=0x563c40ea5af8) at lock.c:1816 #6 0x563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080 #7 0x563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116 #8 0x563c3f537aff in relation_open (relationId=1213, lockmode=1) at relation.c:55 #9 0x563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44 #10 0x563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at catcache.c:980 #11 0x563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, touch_index=true) at catcache.c:1083 #12 0x563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184 #13 0x563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317 #14 0x563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law", useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177 #15 0x563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", username=0x563c40e53fe8 "law") at postgres.c:4229 #16 0x563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475 It looks like no new backend can be started due to the pg_tablespace lock, when a new relcache file is needed during the backend initialization. Best regards, Alexander
Re: Built-in CTYPE provider
On 08.03.24 02:00, Jeff Davis wrote: And here's v22 (I didn't post v21). I committed Unicode property tables and functions, and the simple case mapping. I separated out the full case mapping changes (based on SpecialCasing.txt) into patch 0006. 0002: Basic builtin collation provider that only supports "C". Overall, this patch looks sound. In the documentation, let's make the list of locale providers an actual list instead of a sequence of <para>s. We had some discussion on the initdb option --builtin-locale and whether it should be something more general. I'm ok with leaving it like this for now and maybe considering it as an "open item" for PG17. In errmsg("parameter \"locale\" must be specified"), make "locale" a placeholder. (See commit 36a14afc076.) It seems the builtin provider accepts both "C" and "POSIX" as locale names, but the documentation says it must be "C". Maybe we don't need to accept "POSIX"? (Seeing that there are no plans for "POSIX.UTF-8", maybe we just ignore the "POSIX" spelling altogether?) Speaking of which, the code in postinit.c is inconsistent in that respect with builtin_validate_locale(). Shouldn't postinit.c use builtin_validate_locale(), to keep it consistent? Or, there could be a general function that accepts a locale provider and a locale string and validates everything together? In initdb.c, this message printf(_("The database cluster will be initialized with no locale.\n")); sounds a bit confusing. I think it's ok to show "C" as a locale. I'm not sure we need to change the logic here. Also in initdb.c, this message pg_fatal("locale must be specified unless provider is libc"); should be flipped around, like locale must be specified if provider is %s In pg_dump.c, dumpDatabase(), there are some new warning messages that are not specifically about the builtin provider. Are those existing deficiencies? It's not clear to me. What are the changes in the pg_upgrade test about? Maybe explain the scenario it is trying to test briefly? 0004: Inline some UTF-8 functions to improve performance Makes sense that inlining can be effective here. But why aren't you just inlining the existing function pg_utf_mblen()? Now we have two functions that do the same thing. And the comment at pg_utf_mblen() is removed completely, so it's not clear anymore why it exists. 0005: Add a unicode_strtitle() function and move the implementation for the builtin provider out of formatting.c. In the recent discussion you had expressed some uncertainty about the detailed semantics of this. INITCAP() was copied from Oracle, so we could check there for reference, too. Or we go with full Unicode semantics. I'm not clear on all the differences and tradeoffs, if there are any. In any case, it would be good if there were documentation or a comment that somehow wrote down the resolution of this.
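Regarding the inlining question for 0004: for illustration only, a header-level inline variant would presumably mirror the first-byte checks of the existing pg_utf_mblen(). This is a sketch to make the review comment concrete, not code taken from the patch, and the function name is made up:

/*
 * Sketch of an inlined UTF-8 sequence-length check, mirroring the logic of
 * the existing pg_utf_mblen().  Illustrative only; not the patch's code.
 */
static inline int
utf8_mblen_inline(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;
	else if ((*s & 0xe0) == 0xc0)
		return 2;
	else if ((*s & 0xf0) == 0xe0)
		return 3;
	else if ((*s & 0xf8) == 0xf0)
		return 4;
	return 1;					/* invalid first byte: treat as length 1 */
}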
Re: Transaction timeout
> On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > > I think if checking psql stderr is problematic, checking just logs is > fine. Could we wait for the relevant log messages one by one with > $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? PFA version with $node->wait_for_log() Best regards, Andrey Borodin. v4-0001-Add-timeouts-TAP-tests.patch Description: Binary data
Re: Disconnect if socket cannot be put into non-blocking mode
On 11/03/2024 16:44, Heikki Linnakangas wrote: While self-reviewing my "Refactoring backend fork+exec code" patches, I noticed this in pq_init(): /* * In backends (as soon as forked) we operate the underlying socket in * nonblocking mode and use latches to implement blocking semantics if * needed. That allows us to provide safely interruptible reads and * writes. * * Use COMMERROR on failure, because ERROR would try to send the error to * the client, which might require changing the mode again, leading to * infinite recursion. */ #ifndef WIN32 if (!pg_set_noblock(MyProcPort->sock)) ereport(COMMERROR, (errmsg("could not set socket to nonblocking mode: %m"))); #endif #ifndef WIN32 /* Don't give the socket to any subprograms we execute. */ if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0) elog(FATAL, "fcntl(F_SETFD) failed on socket: %m"); #endif Using COMMERROR here seems bogus. Firstly, if there was a problem with recursion, surely the elog(FATAL) that follows would also be wrong. But more seriously, it's not cool to continue using the connection as if everything is OK, if we fail to put it into non-blocking mode. We should disconnect. (COMMERROR merely logs the error, it does not bail out like ERROR does) Barring objections, I'll commit and backpatch the attached to fix that. Committed. -- Heikki Linnakangas Neon (https://neon.tech)
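The committed diff is not reproduced above, but following the reasoning in the message, a minimal sketch of the fix is simply to fail hard when the socket cannot be switched to non-blocking mode, matching the fcntl() check just below it (a sketch only, not necessarily the exact committed change):

#ifndef WIN32
	/*
	 * Give up on the connection instead of limping along with a blocking
	 * socket; the recursion concern does not apply, as argued above.
	 */
	if (!pg_set_noblock(MyProcPort->sock))
		ereport(FATAL,
				(errmsg("could not set socket to nonblocking mode: %m")));
#endif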
Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
On Sat, 9 Mar 2024 08:50:28 -0600 Nathan Bossart wrote: > On Sat, Mar 09, 2024 at 11:57:18AM +0900, Yugo NAGATA wrote: > > On Fri, 8 Mar 2024 16:17:58 -0600 > > Nathan Bossart wrote: > >> Is this guaranteed to be TOASTed for all possible page sizes? > > > > Should we use block_size? > > > > SHOW block_size \gset > > INSERT INTO test_chunk_id(v1,v2) > > VALUES (repeat('x', 1), repeat('x', (:block_size / 4))); > > > > I think this will work in various page sizes. > > WFM > > > +SHOW block_size; \gset > > + block_size > > + > > + 8192 > > +(1 row) > > I think we need to remove the ';' so that the output of the query is not > saved in the ".out" file. With that change, this test passes when Postgres > is built with --with-blocksize=32. However, many other unrelated tests > begin failing, so I guess this fix isn't tremendously important. I rewrote the patch to use current_setting('block_size') instead of SHOW and \gset as other tests do. Although some tests are failing with block_size=32, I wonder it is a bit better to use "block_size" instead of the constant to make the test more general to some extent. Regards, Yugo Nagata > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com -- Yugo NAGATA >From 5b3be1ca6f8d8bafc1d9ce7bea252f364c9c09a9 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Wed, 29 Mar 2023 09:59:25 +0900 Subject: [PATCH v9] Add pg_column_toast_chunk_id function This function returns the chunk_id of an on-disk TOASTed value, or NULL if the value is un-TOASTed or not on disk. This enables users to know which columns are actually TOASTed. This function is also useful to identify a problematic row when an error like "ERROR: unexpected chunk number ... (expected ...) for toast value" occurs. --- doc/src/sgml/func.sgml | 17 src/backend/utils/adt/varlena.c | 41 src/include/catalog/pg_proc.dat | 3 ++ src/test/regress/expected/misc_functions.out | 21 ++ src/test/regress/sql/misc_functions.sql | 23 +++ 5 files changed, 105 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0bb7aeb40e..3169e2aeb7 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28552,6 +28552,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset + + + + pg_column_toast_chunk_id + +pg_column_toast_chunk_id ( "any" ) +oid + + +Shows the chunk_id of an on-disk +TOASTed value. Returns NULL +if the value is un-TOASTed or not on-disk. +See for details about +TOAST. + + + diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 543afb66e5..84d36781a4 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(cstring_to_text(result)); } +/* + * Return the chunk_id of the on-disk TOASTed value. + * Return NULL if the value is unTOASTed or not on disk. 
+ */ +Datum +pg_column_toast_chunk_id(PG_FUNCTION_ARGS) +{ + int typlen; + struct varlena *attr; + struct varatt_external toast_pointer; + + /* On first call, get the input type's typlen, and save at *fn_extra */ + if (fcinfo->flinfo->fn_extra == NULL) + { + /* Lookup the datatype of the supplied argument */ + Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0); + + typlen = get_typlen(argtypeid); + if (typlen == 0) /* should not happen */ + elog(ERROR, "cache lookup failed for type %u", argtypeid); + + fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, + sizeof(int)); + *((int *) fcinfo->flinfo->fn_extra) = typlen; + } + else + typlen = *((int *) fcinfo->flinfo->fn_extra); + + if (typlen != -1) + PG_RETURN_NULL(); + + attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0)); + + if (!VARATT_IS_EXTERNAL_ONDISK(attr)) + PG_RETURN_NULL(); + + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); + + PG_RETURN_OID(toast_pointer.va_valueid); +} + /* * string_agg - Concatenates values and returns string. * diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 291ed876fc..443ca854a6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7447,6 +7447,9 @@ { oid => '2121', descr => 'compression method for the compressed datum', proname => 'pg_column_compression', provolatile => 's', prorettype => 'text', proargtypes => 'any', prosrc => 'pg_column_compression' }, +{ oid => '8393', descr => 'chunk ID of on-disk TOASTed value', + proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid', + proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' }, { oid => '2322', descr => 'total disk space usage for the specified ta
Re: Statistics Import and Export
Greetings, * Corey Huinker (corey.huin...@gmail.com) wrote: > > > One persistent problem is that there is no _safe equivalent to ARRAY_IN, > > so > > > that can always fail on us, though it should only do so if the string > > > passed in wasn't a valid array input format, or the values in the array > > > can't coerce to the attribute's basetype. > > > > That would happen before we even get to being called and there's not > > much to do about it anyway. > > Not sure I follow you here. the ARRAY_IN function calls happen once for > every non-null stavaluesN parameter, and it's done inside the function > because the result type could be the base type for a domain/array type, or > could be the type itself. I suppose we could move that determination to the > caller, but then we'd need to call get_base_element_type() inside a client, > and that seems wrong if it's even possible. Ah, yeah, ok, I see what you're saying here and sure, there's a risk those might ERROR too, but that's outright invalid data then as opposed to a NULL getting passed in. > > > Or one compound command > > > > > > SELECT pg_set_relation_stats(t.oid, ...) > > > pg_set_attribute_stats(t.oid, 'id'::name, ...), > > > pg_set_attribute_stats(t.oid, 'last_name'::name, ...), > > > ... > > > FROM (VALUES('foo.bar'::regclass)) AS t(oid); > > > > > > The second one has the feature that if any one attribute fails, then the > > > whole update fails, except, of course, for the in-place update of > > pg_class. > > > This avoids having an explicit transaction block, but we could get that > > > back by having restore wrap the list of commands in a transaction block > > > (and adding the explicit lock commands) when it is safe to do so. > > > > Hm, I like this approach as it should essentially give us the > > transaction block we had been talking about wanting but without needing > > to explicitly do a begin/commit, which would add in some annoying > > complications. This would hopefully also reduce the locking concern > > mentioned previously, since we'd get the lock needed in the first > > function call and then the others would be able to just see that we've > > already got the lock pretty quickly. > > True, we'd get the lock needed in the first function call, but wouldn't we > also release that very lock before the subsequent call? Obviously we'd be > shrinking the window in which another process could get in line and take a > superior lock, and the universe of other processes that would even want a > lock that blocks us is nil in the case of an upgrade, identical to existing > behavior in the case of an FDW ANALYZE, and perfectly fine in the case of > someone tinkering with stats. No, we should be keeping the lock until the end of the transaction (which in this case would be just the one statement, but it would be the whole statement and all of the calls in it). See analyze.c:268 or so, where we call relation_close(onerel, NoLock); meaning we're closing the relation but we're *not* releasing the lock on it- it'll get released at the end of the transaction. Thanks! Stephen signature.asc Description: PGP signature
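A minimal sketch of the locking pattern described above, assuming the stats-setting functions take the same lock level ANALYZE uses (ShareUpdateExclusiveLock); the helper name is hypothetical and this is not the patch's actual code:

#include "postgres.h"
#include "access/relation.h"
#include "storage/lockdefs.h"

/* Hypothetical helper illustrating "keep the lock until transaction end". */
static void
set_stats_for_relation(Oid relid)
{
	Relation	rel;

	/* The first call in the statement acquires the lock ... */
	rel = relation_open(relid, ShareUpdateExclusiveLock);

	/* ... perform the relation/attribute stats updates here ... */

	/*
	 * Close the relcache entry but pass NoLock: the lock itself is only
	 * released at the end of the transaction, so later calls in the same
	 * statement find it already held (cf. analyze.c's
	 * relation_close(onerel, NoLock)).
	 */
	relation_close(rel, NoLock);
}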
Re: Spurious pgstat_drop_replslot() call
Hello Bertrand and Michael, 12.03.2024 09:17, Bertrand Drouvot wrote: May I suggest the attached additions with LWLockHeldByMeInMode to make sure that the stats are dropped and created while we hold ReplicationSlotAllocationLock? Yeah, that fully makes sense and looks good to me. Sorry for going a bit off-topic, but I remembered an anomaly with a similar assert: https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org Maybe you would find it worth considering while working in this area... (I've just run that reproducer on b36fbd9f8 and confirmed that the assertion failure is still there.) Best regards, Alexander
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Here's a last one for the cfbot. I have a question about this one int PQcancelStart(PGcancelConn *cancelConn) { [...] if (cancelConn->conn.status != CONNECTION_ALLOCATED) { libpq_append_conn_error(&cancelConn->conn, "cancel request is already being sent on this connection"); cancelConn->conn.status = CONNECTION_BAD; return 0; } If we do this and we see conn.status is not ALLOCATED, meaning a cancel is already ongoing, shouldn't we leave conn.status alone instead of changing to CONNECTION_BAD? I mean, we shouldn't be juggling the elbow of whoever's doing that, should we? Maybe just add the error message and return 0? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations) >From fc0cbf0a6184d374e12e051f88f9f8eef7cc30d9 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 12 Mar 2024 10:09:25 +0100 Subject: [PATCH v34 1/2] libpq: Add encrypted and non-blocking query cancellation routines The existing PQcancel API uses blocking IO, which makes PQcancel impossible to use in an event loop based codebase without blocking the event loop until the call returns. It also doesn't encrypt the connection over which the cancel request is sent, even when the original connection required encryption. This commit adds a PQcancelConn struct and assorted functions, which provide a better mechanism of sending cancel requests; in particular all the encryption used in the original connection are also used in the cancel connection. The main entry points are: - PQcancelCreate creates the PQcancelConn based on the original connection (but does not establish an actual connection). - PQcancelStart can be used to initiate non-blocking cancel requests, using encryption if the original connection did so, which must be pumped using - PQcancelPoll. - PQcancelReset puts a PQcancelConn back in state so that it can be reused to send a new cancel request to the same connection. - PQcancelBlocking is a simpler-to-use blocking API that still uses encryption. Additional functions are - PQcancelStatus, mimicks PQstatus; - PQcancelSocket, mimicks PQcancelSocket; - PQcancelErrorMessage, mimicks PQerrorMessage; - PQcancelFinish, mimicks PQfinish. Author: Jelte Fennema-Nio Reviewed-by: Denis Laxalde Discussion: https://postgr.es/m/am5pr83mb0178d3b31ca1b6ec4a8ecc42f7...@am5pr83mb0178.eurprd83.prod.outlook.com --- doc/src/sgml/libpq.sgml | 465 -- src/interfaces/libpq/exports.txt | 9 + src/interfaces/libpq/fe-cancel.c | 297 ++- src/interfaces/libpq/fe-connect.c | 129 - src/interfaces/libpq/libpq-fe.h | 31 +- src/interfaces/libpq/libpq-int.h | 5 + .../modules/libpq_pipeline/libpq_pipeline.c | 125 + src/tools/pgindent/typedefs.list | 1 + 8 files changed, 1015 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index a2bbf33d02..373d0dc322 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -265,7 +265,7 @@ PGconn *PQsetdb(char *pghost, PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart - PQconnectPollPQconnectPoll + PQconnectPollPQconnectPoll nonblocking connection @@ -5287,7 +5287,7 @@ int PQisBusy(PGconn *conn); / can also attempt to cancel a command that is still being processed by the server; see . But regardless of - the return value of , the application + the return value of , the application must continue with the normal result-reading sequence using . 
A successful cancellation will simply cause the command to terminate sooner than it would have @@ -6034,10 +6034,402 @@ int PQsetSingleRowMode(PGconn *conn); SQL command - - A client application can request cancellation of a command that is - still being processed by the server, using the functions described in - this section. + + Functions for Sending Cancel Requests + + + PQcancelCreatePQcancelCreate + + + + Prepares a connection over which a cancel request can be sent. + +PGcancelConn *PQcancelCreate(PGconn *conn); + + + + +creates a + PGcancelConnPGcancelConn + object, but it won't instantly start sending a cancel request over this + connection. A cancel request can be sent over this connection in a + blocking manner using and in a + non-blocking manner using . + The return value can be passed to + to check if the PGcancelConn object was + created successfully. The PGcancelConn object + is an opaque structure that is not meant to be accessed directly by the + application. This PGcancelConn object can be + used to cancel the que
Re: remaining sql/json patches
Hi, I wanted to share the case below: postgres[146443]=# SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000, "department_id":1}', '$.* ? (@== $dept_id && @ == $sal)' PASSING 1000 AS sal, 1 as dept_id); json_exists - f (1 row) Isn't it supposed to return "true", since the input JSON matches both the dept_id and salary conditions? -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > If we do this and we see conn.status is not ALLOCATED, meaning a cancel > is already ongoing, shouldn't we leave conn.status alone instead of > changing to CONNECTION_BAD? I mean, we shouldn't be juggling the elbow > of whoever's doing that, should we? Maybe just add the error message > and return 0? I'd rather fail as hard as possible when someone is using the API wrongly. Not doing so is bound to cause confusion, IMHO. E.g., if the state is still CONNECTION_OK because the user forgot to call PQcancelReset, then keeping the connection status "as is" might make it seem as if the cancel request succeeded even though nothing happened. So if the user uses the API incorrectly, I'd rather use all the avenues possible to indicate that there was an error. Especially since, in all other cases where PQcancelStart returns false, CONNECTION_BAD is the status, and this in turn means that PQconnectPoll will return PGRES_POLLING_FAILED. So I doubt people will always check the actual return value of the function to check whether an error happened. They might check PQcancelStatus or PQconnectPoll instead, because that integrates more easily with the rest of their code.
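For illustration, a rough sketch of how a caller might drive the proposed non-blocking API from its own event loop, assuming the PQcancel* functions behave like their PQconnect* counterparts as described in the patch (this is not code from the patch, and error handling is simplified):

#include <stdio.h>
#include <sys/select.h>
#include "libpq-fe.h"

/* Returns 1 if the cancel request was delivered, 0 otherwise. */
static int
cancel_query_nonblocking(PGconn *conn)
{
	PGcancelConn *cancelConn = PQcancelCreate(conn);
	PostgresPollingStatusType st = PGRES_POLLING_WRITING;

	if (cancelConn == NULL)
	{
		fprintf(stderr, "cancel failed: out of memory\n");
		return 0;
	}
	if (!PQcancelStart(cancelConn))
	{
		fprintf(stderr, "cancel failed: %s\n", PQcancelErrorMessage(cancelConn));
		PQcancelFinish(cancelConn);
		return 0;
	}

	/* Pump the cancel connection, waiting for socket readiness in between. */
	while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
	{
		int			sock = PQcancelSocket(cancelConn);
		fd_set		rfds,
					wfds;

		FD_ZERO(&rfds);
		FD_ZERO(&wfds);
		if (st == PGRES_POLLING_READING)
			FD_SET(sock, &rfds);
		else
			FD_SET(sock, &wfds);
		if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
			break;				/* select() failure: give up */
		st = PQcancelPoll(cancelConn);
	}

	if (PQcancelStatus(cancelConn) != CONNECTION_OK)
	{
		fprintf(stderr, "cancel failed: %s\n", PQcancelErrorMessage(cancelConn));
		PQcancelFinish(cancelConn);
		return 0;
	}

	PQcancelFinish(cancelConn);
	return 1;
}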
Re: Add bump memory context type and use it for tuplesorts
On Tue, Mar 12, 2024 at 6:41 AM David Rowley wrote: > Thanks for trying this out. I didn't check if the performance was > susceptible to the memory size before the reset. It certainly would > be once the allocation crosses some critical threshold of CPU cache > size, but probably it will also be to some extent regarding the number > of actual mallocs that are required underneath. I neglected to mention it, but the numbers I chose did have the L2/L3 cache in mind, but the reset frequency didn't seem to make much difference. > I see there's some discussion of bump in [1]. Do you still have a > valid use case for bump for performance/memory usage reasons? Yeah, that was part of my motivation for helping test, although my interest is in saving memory in cases of lots of small allocations. It might help if I make this a little more concrete, so I wrote a quick-and-dirty function to measure the bytes used by the proposed TID store and the vacuum's current array. Using bitmaps really shines with a high number of offsets per block, e.g. with about a million sequential blocks, and 49 offsets per block (last parameter is a bound): select * from tidstore_memory(0,1*1001*1000, 1,50); array_mem | ts_mem ---+-- 294294000 | 42008576 The break-even point with this scenario is around 7 offsets per block: select * from tidstore_memory(0,1*1001*1000, 1,8); array_mem | ts_mem ---+-- 42042000 | 42008576 Below that, the array gets smaller, but the bitmap just has more empty space. Here, 8 million bytes are used by the chunk header in bitmap allocations, so the bump context would help there (I haven't actually tried). Of course, the best allocation is no allocation at all, and I have a draft patch to store up to 3 offsets in the last-level node's pointer array, so for 2 or 3 offsets per block we're smaller than the array again: select * from tidstore_memory(0,1*1001*1000, 1,4); array_mem | ts_mem ---+- 18018000 | 8462336 Sequential blocks are not the worst case scenario for memory use, but this gives an idea of what's involved. So, with aset, on average I still expect to use quite a bit less memory, with some corner cases that use more. The bump context would be some extra insurance to reduce those corner cases, where there are a large number of blocks in play.
Re: Reports on obsolete Postgres versions
> On 12 Mar 2024, at 02:37, Nathan Bossart wrote: > > On Mon, Mar 11, 2024 at 05:17:13PM -0400, Bruce Momjian wrote: >> On Mon, Mar 11, 2024 at 04:12:04PM -0500, Nathan Bossart wrote: >>> I've read that the use of the term "minor release" can be confusing. While >>> the versioning page clearly describes what is eligible for a minor release, >>> not everyone reads it, so I suspect that many folks think there are new >>> features, etc. in minor releases. I think a "minor release" of Postgres is >>> more similar to what other projects would call a "patch version." >> >> Well, we do say: >> >> While upgrading will always contain some level of risk, PostgreSQL >> minor releases fix only frequently-encountered bugs, security issues, >> and data corruption problems to reduce the risk associated with >> upgrading. For minor releases, the community considers not upgrading to >> be riskier than upgrading. >> >> but that is far down the page. Do we need to improve this? > > I think making that note more visible would certainly be an improvement. We have this almost at the top of the page, which IMHO isn't a very good description of what a minor version is: Each major version receives bug fixes and, if need be, security fixes that are released at least once every three months in what we call a "minor release." Maybe we can rewrite that sentence to properly document what a minor release is (bug fixes *and* security fixes), with a small blurb about the upgrade risk? (Adding Jonathan in CC, who is good at website copy.) -- Daniel Gustafsson
Re: Reports on obsolete Postgres versions
Hi, On Mon, Mar 11, 2024 at 05:17:13PM -0400, Bruce Momjian wrote: > On Mon, Mar 11, 2024 at 04:12:04PM -0500, Nathan Bossart wrote: > > On Mon, Mar 11, 2024 at 04:37:49PM -0400, Bruce Momjian wrote: > > > https://www.postgresql.org/support/versioning/ > > > > > > This web page should correct the idea that "upgrades are more risky than > > > staying with existing versions". Is there more we can do? Should we > > > have a more consistent response for such reporters? > > > > I've read that the use of the term "minor release" can be confusing. While > > the versioning page clearly describes what is eligible for a minor release, > > not everyone reads it, so I suspect that many folks think there are new > > features, etc. in minor releases. I think a "minor release" of Postgres is > > more similar to what other projects would call a "patch version." > > Well, we do say: > > While upgrading will always contain some level of risk, PostgreSQL > minor releases fix only frequently-encountered bugs, security issues, > and data corruption problems to reduce the risk associated with > upgrading. For minor releases, the community considers not upgrading to > be riskier than upgrading. > > but that is far down the page. Do we need to improve this? I liked the statement from Laurenz a while ago on his blog (paraphrased): "Upgrading to the latest patch release does not require application testing or recertification". I am not sure we want to put that into the official page (or maybe tone down/qualify it a bit), but I think a lot of users stay on older minor versions because they dread their internal testing policies. The other thing that could maybe be made a bit better is the fantastic patch release schedule, which however is buried in the "developer roadmap". I can see how this was useful years ago, but I think this page should be moved to the end-user part of the website, and maybe (also) integrated into the support/versioning page? Michael
Re: speed up a logical replica setup
On Thu, Mar 7, 2024 at 10:44 PM Tomas Vondra wrote: > > 4) Is FOR ALL TABLES a good idea? > > I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm > sure it won't work for a number of use cases. I know large databases > it's common to create "work tables" (not necessarily temporary) as part > of a batch job, but there's no need to replicate those tables. > > AFAIK that'd break this FOR ALL TABLES publication, because the tables > will qualify for replication, but won't be present on the subscriber. Or > did I miss something? > As the subscriber is created from standby, all the tables should be present at least initially during and after creating the subscriber. Users are later free to modify the publications/subscriptions. > I do understand that FOR ALL TABLES is the simplest approach, and for v1 > it may be an acceptable limitation, but maybe it'd be good to also > support restricting which tables should be replicated (e.g. blacklist or > whitelist based on table/schema name?). > This would be useful, but OTOH could also be enhanced in a later version unless we think it is a must for the first version. -- With Regards, Amit Kapila.
Re: Properly pathify the union planner
On Mon, 11 Mar 2024 at 19:56, Richard Guo wrote: > * There are cases where the setop_pathkeys of a subquery does not match > the union_pathkeys generated in generate_union_paths() for sorting by > the whole target list. In such cases, even if we have explicitly sorted > the paths of the subquery to meet the ordering required for its > setop_pathkeys, we cannot find appropriate ordered paths for > MergeAppend. For instance, > > explain (costs off) > (select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b); > QUERY PLAN > --- > Unique >-> Sort > Sort Key: t1.a, t1.b > -> Append >-> Index Only Scan using t_a_b_idx on t t1 > Filter: (a = b) >-> Index Only Scan using t_a_b_idx on t t2 > Filter: (a = b) > (8 rows) > > (Assume t_a_b_idx is a btree index on t(a, b)) > > In this query, the setop_pathkeys of the subqueries includes only one > PathKey because 'a' and 'b' are in the same EC inside the subqueries, > while the union_pathkeys of the whole query includes two PathKeys, one > for each target entry. After we convert the setop_pathkeys to outer > representation, we'd notice that it does not match union_pathkeys. > Consequently, we are unable to recognize that the index scan paths are > already appropriately sorted, leading us to miss the opportunity to > utilize MergeAppend. > > Not sure if this case is common enough to be worth paying attention to. I've spent almost all day looking into this, which is just enough work to satisfy myself this *is* future work rather than v17 work. The reason I feel this is future work rather than work for this patch is that this is already a limitation of subqueries in general and it's not unique to union child queries. For example: create table ab(a int, b int, primary key(a,b)); insert into ab select x,y from generate_series(1,100)x,generate_series(1,100)y; vacuum analyze ab; explain select * from (select a,b from ab where a = 1 order by a,b limit 10) order by a,b; The current plan for this is: QUERY PLAN - Sort Sort Key: ab.a, ab.b -> Limit -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (5 rows) The additional sort isn't required but is added because the outer query requires the pathkeys {a,b} and the inner query only has the pathkey {b}. {a} is removed due to it being redundant because of the const member. The outer query does not know about the redundant pathkeys so think the subquery is only sorted by {b} therefore adds the sort on "a", "b". The attached 0001 patch (renamed as .txt so it's ignored by the CFbot) adjusts convert_subquery_pathkeys() to have it look a bit deeper and try harder to match the path to the outer query's query_pathkeys. After patching with that, the plan becomes: QUERY PLAN --- Limit -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (3 rows) The patch is still incomplete as the matching is quite complex for the case you mentioned with a=b. It's a bit late here to start making that work, but I think the key to make that work is to give subquery_matches_pathkeys() an extra parameter or 2 for where to start working on each list and recursively call itself where I've left the TODO comment in the function and on the recursive call, try the next query_pathkeys and the same sub_pathkey. If the recursive call function returns false, continue on trying to match the normal way. If it returns true, return true. There'd be a bit more work elsewhere to do to make this work for the general case. 
For example: explain (costs off) select * from (select a,b from ab where a = 1 offset 0) order by a,b; still produces the following plan with the patched version: QUERY PLAN --- Sort Sort Key: ab.a, ab.b -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (4 rows) the reason for this is that the subquery does not know the outer query would like the index path to have the pathkeys {a,b}. I've not looked at this issue in detail but I suspect we could do something somewhere like set_subquery_pathlist() to check if the outer query's query_pathkeys are all Vars from the subquery. I think we'd need to trawl through the eclasses to ensure that each query_pathkeys eclasses contains a member mentioning only a Var from the subquery and if so, get the subquery to set those pathkeys. Anyway, this is all at least PG18 work so I'll raise a thread about it around June. The patch is included in case you want to mess around with it. I'd be happy if you want to look into the subquery pathkey issue portion of the work. I won't be giving this much more focus until after the freeze. > * In build_se
Re: a wrong index choose when statistics is out of date
On 8/3/2024 18:53, Andy Fan wrote: I just reviewed the bad queries plan for the past half years internally, I found many queries used the Nested loop which is the direct cause. now I think I find out a new reason for this, because the missed optimizer statistics cause the rows in outer relation to be 1, which make the Nest loop is choosed. I'm not sure your idea could help on this or can help on this than mine at this aspect. Having had the same problem for a long time, I've made an attempt and invented a patch that probes an index to determine whether the estimated constant is within statistics' scope. I remember David's remark on the overhead problem, but I don't argue it here. This patch is on the table to have one more solution sketch for further discussion. Also, Andy, if you have a specific problem with index choosing, you can try a tiny option that makes the index-picking technique less dependent on the ordering of index lists [1]. [1] https://www.postgresql.org/message-id/9b3dbf6d-776a-4953-a5a4-609929393...@postgrespro.ru -- regards, Andrei Lepikhov Postgres Professional diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c index 2635050861..bc8e59bbf3 100644 --- a/src/backend/utils/adt/like_support.c +++ b/src/backend/utils/adt/like_support.c @@ -643,7 +643,7 @@ patternsel_common(PlannerInfo *root, /* * Pattern specifies an exact match, so estimate as for '=' */ - result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue, + result = var_eq_const(root, &vardata, eqopr, collation, prefix->constvalue, false, true, false); } else @@ -1294,7 +1294,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata, * probably off the end of the histogram, and thus we probably got a very * small estimate from the >= condition; so we still need to clamp. */ - eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue, + eq_sel = var_eq_const(root, vardata, eqopr, collation, prefixcon->constvalue, false, true, false); prefixsel = Max(prefixsel, eq_sel); diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index cea777e9d4..be6213046e 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -273,7 +273,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate) * in the query.) */ if (IsA(other, Const)) - selec = var_eq_const(&vardata, operator, collation, + selec = var_eq_const(root, &vardata, operator, collation, ((Const *) other)->constvalue, ((Const *) other)->constisnull, varonleft, negate); @@ -286,13 +286,133 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate) return selec; } +/* + * Lookup the value in the index and try to estimate number of the tuples with + * the same value. 
+ */ +static Selectivity +estimate_ntuples_by_index(PlannerInfo *root, VariableStatData *vardata, + Datum constval, + Oid collation, Oid regprocoid, int32 stawidth) +{ + RelOptInfo *rel = vardata->rel; + RangeTblEntry *rte = root->simple_rte_array[rel->relid]; + ListCell *lc; + int ntuples = 0; + Selectivity selec = -1.; + + foreach(lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + ScanKeyData scankeys[1]; + SnapshotDataSnapshotNonVacuumable; + IndexScanDesc index_scan; + RelationheapRel; + RelationindexRel; + + if (index->relam != BTREE_AM_OID) + continue; + if (index->indpred != NIL) + continue; + if (index->hypothetical) + continue; + if (!match_index_to_operand(vardata->var, 0, index)) + continue; + + heapRel = table_open(rte->relid, NoLock); + indexRel = index_open(index->indexoid, NoLock); + + ScanKeyEntryInitialize(&scankeys[0], + 0, + 1, + BTEqualStrategyNumber, + vardata->atttype, +
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Mar 11, 2024 at 3:13 PM Masahiko Sawada wrote: > > On Mon, Mar 11, 2024 at 12:20 PM John Naylor wrote: > > > > On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada > > wrote: > > + ts->context = CurrentMemoryContext; > > > > As far as I can tell, this member is never accessed again -- am I > > missing something? > > You're right. It was used to re-create the tidstore in the same > context again while resetting it, but we no longer support the reset > API. Considering it again, would it be better to allocate the iterator > struct in the same context as we store the tidstore struct? That makes sense. > > + /* DSA for tidstore will be detached at the end of session */ > > > > No other test module pins the mapping, but that doesn't necessarily > > mean it's wrong. Is there some advantage over explicitly detaching? > > One small benefit of not explicitly detaching dsa_area in > tidstore_destroy() would be simplicity; IIUC if we want to do that, we > need to remember the dsa_area using (for example) a static variable, > and free it if it's non-NULL. I've implemented this idea in the > attached patch. Okay, I don't have a strong preference at this point. > > +-- Add tids in random order. > > > > I don't see any randomization here. I do remember adding row_number to > > remove whitespace in the output, but I don't remember a random order. > > On that subject, the row_number was an easy trick to avoid extra > > whitespace, but maybe we should just teach the setting function to > > return blocknumber rather than null? > > Good idea, fixed. + test_set_block_offsets + + 2147483647 + 0 + 4294967294 + 1 + 4294967295 Hmm, was the earlier comment about randomness referring to this? I'm not sure what other regression tests do in these cases, or how relibale this is. If this is a problem we could simply insert this result into a temp table so it's not output. > > +Datum > > +tidstore_create(PG_FUNCTION_ARGS) > > +{ > > ... > > + tidstore = TidStoreCreate(max_bytes, dsa); > > > > +Datum > > +tidstore_set_block_offsets(PG_FUNCTION_ARGS) > > +{ > > > > + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs); > > > > These names are too similar. Maybe the test module should do > > s/tidstore_/test_/ or similar. > > Agreed. Mostly okay, although a couple look a bit generic now. I'll leave it up to you if you want to tweak things. > > In general, the .sql file is still very hard-coded. Functions are > > created that contain a VALUES statement. Maybe it's okay for now, but > > wanted to mention it. Ideally, we'd have some randomized tests, > > without having to display it. That could be in addition to (not > > replacing) the small tests we have that display input. (see below) > > > > Agreed to add randomized tests in addition to the existing tests. I'll try something tomorrow. > Sounds a good idea. In fact, if there are some bugs in tidstore, it's > likely that even initdb would fail in practice. However, it's a very > good idea that we can test the tidstore anyway with such a check > without a debug-build array. > > Or as another idea, I wonder if we could keep the debug-build array in > some form. For example, we use the array with the particular build > flag and set a BF animal for that. That way, we can test the tidstore > in more real cases. I think the purpose of a debug flag is to help developers catch mistakes. I don't think it's quite useful enough for that. For one, it has the same 1GB limitation as vacuum's current array. 
For another, it'd be a terrible way to debug moving tidbitmap.c from its hash table to use TID store -- AND/OR operations and lossy pages are pretty much undoable with a copy of vacuum's array. Last year, when I insisted on trying a long term realistic load that compares the result with the array, the encoding scheme was much harder to understand in code. I think it's now easier, and there are better tests. > In the latest (v69) patch: > > - squashed v68-0005 and v68-0006 patches. > - removed most of the changes in v68-0007 patch. > - addressed above review comments in v69-0002 patch. > - v69-0003, 0004, and 0005 are miscellaneous updates. > > As for renaming TidStore to TIDStore, I dropped the patch for now > since it seems we're using "Tid" in some function names and variable > names. If we want to update it, we can do that later. I think we're not consistent across the codebase, and it's fine to drop that patch. v70-0008: @@ -489,7 +489,7 @@ parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs) /* * Free the current tidstore and return allocated DSA segments to the * operating system. Then we recreate the tidstore with the same max_bytes - * limitation. + * limitation we just used. Nowadays, max_bytes is now more like a hint for tidstore, and not a limitation, right? Vacuum has the limitation. Maybe instead of "with", we should say "passing the same limitatio
Re: Streaming I/O, vectored I/O (WIP)
On Tue, Mar 12, 2024 at 12:10 PM Thomas Munro wrote: > > I think you'd be right if StartReadBuffers() were capable of > processing a sequence consisting of a hit followed by misses, but > currently it always gives up after the first hit. That is, it always > processes some number of misses (0-16) and then at most one hit. So > for now the variable would always turn out to be the same as blockNum. > Okay, then shouldn't this "if (found)" block immediately break the loop so that when we hit the block we just return that block? So it makes sense what you explained but with the current code if there are the first few hits followed by misses then we will issue the smgrprefetch() for the initial hit blocks as well. + if (found) + { + /* + * Terminate the read as soon as we get a hit. It could be a + * single buffer hit, or it could be a hit that follows a readable + * range. We don't want to create more than one readable range, + * so we stop here. + */ + actual_nblocks = operation->nblocks = *nblocks = i + 1;(Dilip: I think we should break after this?) + } + else + { + /* Extend the readable range to cover this block. */ + operation->io_buffers_len++; + } + } > The reason is that I wanted to allows "full sized" read system calls > to form. If you said "hey please read these 16 blocks" (I'm calling > that "full sized", AKA MAX_BUFFERS_PER_TRANSFER), and it found 2 hits, > then it could only form a read of 14 blocks, but there might be more > blocks that could be read after those. We would have some arbitrary > shorter read system calls, when we wanted to make them all as big as > possible. So in the current patch you say "hey please read these 16 > blocks" and it returns saying "only read 1", you call again with 15 > and it says "only read 1", and you call again and says "read 16!" > (assuming 2 more were readable after the original range we started > with). Then physical reads are maximised. Maybe there is some nice > way to solve that, but I thought this way was the simplest (and if > there is some instruction-cache-locality/tight-loop/perf reason why we > should work harder to find ranges of hits, it could be for later). > Does that make sense? Understood, I think this makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: speed up a logical replica setup
On Mon, Mar 11, 2024 at 9:42 AM vignesh C wrote: > > On Sat, 9 Mar 2024 at 00:56, Tomas Vondra > wrote: > > > > >> 5) slot / publication / subscription name > > >> > > >> I find it somewhat annoying it's not possible to specify names for > > >> objects created by the tool - replication slots, publication and > > >> subscriptions. If this is meant to be a replica running for a while, > > >> after a while I'll have no idea what pg_createsubscriber_569853 or > > >> pg_createsubscriber_459548_2348239 was meant for. > > >> > > >> This is particularly annoying because renaming these objects later is > > >> either not supported at all (e.g. for replication slots), or may be > > >> quite difficult (e.g. publications). > > >> > > >> I do realize there are challenges with custom names (say, if there are > > >> multiple databases to replicate), but can't we support some simple > > >> formatting with basic placeholders? So we could specify > > >> > > >> --slot-name "myslot_%d_%p" > > >> > > >> or something like that? > > > > > > Not sure we can do in the first version, but looks nice. One concern is > > > that I > > > cannot find applications which accepts escape strings like > > > log_line_prefix. > > > (It may just because we do not have use-case.) Do you know examples? > > > > > > > I can't think of a tool already doing that, but I think that's simply > > because it was not needed. Why should we be concerned about this? > > > > +1 to handle this. > Currently, a) Publication name = pg_createsubscriber_%u, where %u is > database oid, b) Replication slot name = pg_createsubscriber_%u_%d, > Where %u is database oid and %d is the pid and c) Subscription name = > pg_createsubscriber_%u_%d, Where %u is database oid and %d is the pid > How about we have a non mandatory option like > --prefix_object_name=mysetup1, which will create a) Publication name = > mysetup1_pg_createsubscriber_%u, and b) Replication slot name = > mysetup1_pg_createsubscriber_%u (here pid is also removed) c) > Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also > removed). > > In the default case where the user does not specify > --prefix_object_name the object names will be created without any > prefix names. > Tomas's idea is better in terms of useability. So, we should instead have three switches --slot-name, --publication-name, and --subscriber-name with some provision of appending dbid's and some unique identifier for standby. The unique identifier can help in creating multiple subscribers from different standbys. -- With Regards, Amit Kapila.
Re: POC: Extension for adding distributed tracing - pg_tracing
Hi all! Thanks for the reviews and comments. > - pg_tracing.c should include postgres.h as the first thing Will do. > afaict none of the use of volatile is required, spinlocks have been barriers > for a long time now Got it, I will remove them. I've been mimicking what was done in pg_stat_statements and all spinlocks are done on volatile variables [1]. > - I don't think it's a good idea to do memory allocations in the middle of a > PG_CATCH. If the error was due to out-of-memory, you'll throw another error. Good point. I was wondering what were the risks of generating spans for errors. I will try to find a better way to handle this. To give a quick update: following Heikki's suggestion, I'm currently working on getting pg_tracing as a separate extension on github. I will send an update when it's ready. [1]: https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/pg_stat_statements.c#L2000 On Fri, Feb 9, 2024 at 7:50 PM Andres Freund wrote: > > Hi, > > On 2024-02-09 19:46:58 +0300, Nikita Malakhov wrote: > > I agree with Heikki on most topics and especially the one he recommended > > to publish your extension on GitHub, this tool is very promising for highly > > loaded > > systems, you will get a lot of feedback very soon. > > > > I'm curious about SpinLock - it is recommended for very short operations, > > taking up to several instructions, and docs say that for longer ones it > > will be > > too expensive, and recommends using LWLock. Why have you chosen SpinLock? > > Does it have some benefits here? > > Indeed - e.g. end_tracing() looks to hold the spinlock for far too long for > spinlocks to be appropriate. Use an lwlock. > > Random stuff I noticed while skimming: > - pg_tracing.c should include postgres.h as the first thing > > - afaict none of the use of volatile is required, spinlocks have been barriers > for a long time now > > - you acquire the spinlock for single increments of n_writers, perhaps that > could become an atomic, to reduce contention? > > - I don't think it's a good idea to do memory allocations in the middle of a > PG_CATCH. If the error was due to out-of-memory, you'll throw another error. > > > Greetings, > > Andres Freund
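To make the two suggestions above concrete, here is a hypothetical sketch (not code from the pg_tracing patch; the struct and function names are made up, and initialization of the shared state is omitted) of replacing the spinlock-protected counter with an atomic and using an LWLock for the longer critical section:

#include "postgres.h"
#include "port/atomics.h"
#include "storage/lwlock.h"

typedef struct pgTracingSharedState
{
	LWLock	   *lock;			/* protects the shared span buffer */
	pg_atomic_uint32 n_writers; /* plain counter: no spinlock needed */
	/* ... span buffer, offsets, etc. ... */
} pgTracingSharedState;

static pgTracingSharedState *pg_tracing_state = NULL;

static void
begin_span_write(void)
{
	/* A single increment can just be an atomic operation. */
	pg_atomic_fetch_add_u32(&pg_tracing_state->n_writers, 1);
}

static void
end_tracing_sketch(void)
{
	/*
	 * The longer section (copying span data into shared memory) uses an
	 * LWLock rather than holding a spinlock for its whole duration.
	 */
	LWLockAcquire(pg_tracing_state->lock, LW_EXCLUSIVE);
	/* ... copy spans into the shared buffer here ... */
	LWLockRelease(pg_tracing_state->lock);

	pg_atomic_fetch_sub_u32(&pg_tracing_state->n_writers, 1);
}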
Re: Reports on obsolete Postgres versions
>> but that is far down the page. Do we need to improve this? > I liked the statement from Laurenz a while ago on his blog > (paraphrased): "Upgrading to the latest patch release does not require > application testing or recertification". I am not sure we want to put > that into the official page (or maybe tone down/qualify it a bit), but I > think a lot of users stay on older minor versions because they dread > their internal testing policies. I think we need more conservative language, since a minor release might fix a planner bug that someone's app relied on, leaving their plans worse after upgrading. While rare, it can certainly happen, so the official wording should probably avoid such bold claims. > The other thing that could maybe be made a bit better is the fantastic > patch release schedule, which however is buried in the "developer > roadmap". I can see how this was useful years ago, but I think this page > should be moved to the end-user part of the website, and maybe (also) > integrated into the support/versioning page? Fair point. -- Daniel Gustafsson
Re: Disconnect if socket cannot be put into non-blocking mode
> On 12 Mar 2024, at 09:49, Heikki Linnakangas wrote: >> Barring objections, I'll commit and backpatch the attached to fix that. > > Committed. Sorry for being slow to review, but +1 on this change. -- Daniel Gustafsson
Re: Regarding the order of the header file includes
On Fri, Mar 8, 2024 at 6:58 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Thu, Mar 7, 2024 at 12:39 PM Richard Guo > wrote: > > > > While rebasing one of my patches I noticed that the header file includes > > in relnode.c are not sorted in order. So I wrote a naive script to see > > if any other C files have the same issue. The script is: > > > > #!/bin/bash > > > > find . -name "*.c" | while read -r file; do > > headers=$(grep -o '#include "[^>]*"' "$file" | > > grep -v "postgres.h" | grep -v "postgres_fe.h" | > > sed 's/\.h"//g') > > > > sorted_headers=$(echo "$headers" | sort) > > > > results=$(diff <(echo "$headers") <(echo "$sorted_headers")) > > > > if [[ $? != 0 ]]; then > > echo "Headers in '$file' are out of order" > > echo $results > > echo > > fi > > done > > Cool. Isn't it a better idea to improve this script to auto-order the > header files and land it under src/tools/pginclude/headerssort? It can > then be reusable and be another code beautification weapon one can use > before the code release. Yeah, perhaps. However the current script is quite unrefined and would require a lot of effort to make it a reusable tool. I will add it to my to-do list and hopefully one day I can get back to it. Feel free to mess around with it if someone is interested. > FWIW, I'm getting the syntax error when ran the above shell script: > > headerssort.sh: 10: Syntax error: "(" unexpected I think the error is due to line 10 containing bash-style syntax. Hmm, have you tried to use 'bash' instead of 'sh' to run this script? Thanks Richard
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:53, Jelte Fennema-Nio wrote: > I'd rather fail as hard as possible when someone is using the API > wrongly. To be clear, this is my way of looking at it. If you feel strongly about that we should not change conn.status, I'm fine with making that change to the patchset.
Re: Have pg_basebackup write "dbname" in "primary_conninfo"?
On Mon, 11 Mar 2024 at 17:16, Amit Kapila wrote: > > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) > wrote: > > > > > PSA the patch for implementing it. It is basically same as Ian's one. > > > However, this patch still cannot satisfy the condition 3). > > > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > > -> dbname would not be appeared in primary_conninfo. > > > > > > This is because `if (connection_string)` case in GetConnection() explicy > > > override > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", > > > NULL} pair > > > before the overriding, but it is no-op. Because The replacement of the > > > dbname in > > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > > connection options. > > > > Oh, this patch missed the straightforward case: > > > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > > -> dbname would not be appeared in primary_conninfo. > > > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > > > > Can you please share the patch that can be considered for review? Here is a patch with few changes: a) Removed the version check based on discussion from [1] b) updated the documentation. I have tested various scenarios with the attached patch, the dbname that is written in postgresql.auto.conf for each of the cases with pg_basebackup is given below: 1) ./pg_basebackup -U vignesh -R -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo will have dbname as username specified) 2) ./pg_basebackup -D data -R -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo will have the dbname as username (which is the same as the OS user while setting defaults)) 3) ./pg_basebackup -d "user=vignesh" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) 4) ./pg_basebackup -d "user=vignesh dbname=postgres" -D data -R -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo will have the dbname as the dbname specified) 5) ./pg_basebackup -d "" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) I have mentioned the reasons as to why dbname is being set to replication or username in each of the cases. How about replacing these values in postgresql.auto.conf manually. [1] - https://www.postgresql.org/message-id/CAGECzQTh9oB3nu98DsHMpRaVaqXPDRgTDEikY82OAKYF0%3DhVMA%40mail.gmail.com Regards, Vignesh diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 88c689e725..1a6980d1b0 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -243,7 +243,10 @@ PostgreSQL documentation The postgresql.auto.conf file will record the connection settings and, if specified, the replication slot that pg_basebackup is using, so that -streaming replication will use the same settings later on. +a) synchronization of logical replication slots on the primary to the +hot_standby from +pg_sync_replication_slots or slot sync worker +and b) streaming replication will use the same settings later on. 
diff --git a/src/fe_utils/recovery_gen.c b/src/fe_utils/recovery_gen.c index 2585f11939..cb37703ab9 100644 --- a/src/fe_utils/recovery_gen.c +++ b/src/fe_utils/recovery_gen.c @@ -49,7 +49,6 @@ GenerateRecoveryConfig(PGconn *pgconn, char *replication_slot) { /* Omit empty settings and those libpqwalreceiver overrides. */ if (strcmp(opt->keyword, "replication") == 0 || - strcmp(opt->keyword, "dbname") == 0 || strcmp(opt->keyword, "fallback_application_name") == 0 || (opt->val == NULL) || (opt->val != NULL && opt->val[0] == '\0'))
Re: Regarding the order of the header file includes
On Wed, Mar 6, 2024 at 5:32 PM Richard Guo wrote: > Please note that this patch only addresses the order of header file > includes in backend modules (and might not be thorough). It is possible > that other modules may have a similar issue, but I have not evaluated > them yet. > Attached is v2, which also includes the 0002 patch that addresses the order of header file includes in non-backend modules. Thanks Richard v2-0001-Make-the-order-of-the-header-file-includes-consistent-in-backend-modules.patch Description: Binary data v2-0002-Make-the-order-of-the-header-file-includes-consistent-in-non-backend-modules.patch Description: Binary data
Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Thu, 7 Mar 2024 at 19:37, Michail Nikolaev wrote: > > Hello! > > > I'm not a fan of this approach. Changing visibility and cleanup > > semantics to only benefit R/CIC sounds like a pain to work with in > > essentially all visibility-related code. I'd much rather have to deal > > with another index AM, even if it takes more time: the changes in > > semantics will be limited to a new plug in the index AM system and a > > behaviour change in R/CIC, rather than behaviour that changes in all > > visibility-checking code. > > Technically, this does not affect the visibility logic, only the > clearing semantics. > All visibility related code remains untouched. Yeah, correct. But it still needs to update the table relations' information after finishing creating the indexes, which I'd rather not have to do. > But yes, still an inelegant and a little strange-looking option. > > At the same time, perhaps it can be dressed in luxury > somehow - for example, add as a first class citizen in > ComputeXidHorizonsResult > a list of blocks to clear some relations. Not sure what you mean here, but I don't think ComputeXidHorizonsResult should have anything to do with actual relations. > > But regardless of second scan snapshots, I think we can worry about > > that part at a later moment: The first scan phase is usually the most > > expensive and takes the most time of all phases that hold snapshots, > > and in the above discussion we agreed that we can already reduce the > > time that a snapshot is held during that phase significantly. Sure, it > > isn't great that we have to scan the table again with only a single > > snapshot, but generally phase 2 doesn't have that much to do (except > > when BRIN indexes are involved) so this is likely less of an issue. > > And even if it is, we would still have reduced the number of > > long-lived snapshots by half. > > Hmm, but it looks like we don't have the infrastructure to "update" xmin > propagating to the horizon after the first snapshot in a transaction is taken. We can just release the current snapshot, and get a new one, right? I mean, we don't actually use the transaction for much else than visibility during the first scan, and I don't think there is a need for an actual transaction ID until we're ready to mark the index entry with indisready. > One option I know of is to reuse the > d9d076222f5b94a85e0e318339cfc44b8f26022d (1) approach. > But if this is the case, then there is no point in re-taking the > snapshot again during the first > phase - just apply this "if" only for the first phase - and you're done. Not a fan of that, as it is too sensitive to abuse. Note that extensions will also have access to these tools, and I think we should build a system here that's not easy to break, rather than one that is. > Do you know any less-hacky way? Or is it a nice way to go? I suppose we could be resetting the snapshot every so often? Or use multiple successive TID range scans with a new snapshot each? Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: remaining sql/json patches
Hi Himanshu, On Tue, Mar 12, 2024 at 6:42 PM Himanshu Upadhyaya wrote: > > Hi, > > wanted to share the below case: > > ‘postgres[146443]=#’SELECT JSON_EXISTS(jsonb '{"customer_name": "test", > "salary":1000, "department_id":1}', '$.* ? (@== $dept_id && @ == $sal)' > PASSING 1000 AS sal, 1 as dept_id); > json_exists > - > f > (1 row) > > isn't it supposed to return "true" as json in input is matching with both the > condition dept_id and salary? I think you meant to use || in your condition, not &&, because 1000 != 1. See: SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000, "department_id":1}', '$.* ? (@ == $dept_id || @ == $sal)' PASSING 1000 AS sal, 1 as dept_id); json_exists - t (1 row) Or you could've written the query as: SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000, "department_id":1}', '$ ? (@.department_id == $dept_id && @.salary == $sal)' PASSING 1000 AS sal, 1 as dept_id); json_exists - t (1 row) Does that make sense? In any case, JSON_EXISTS() added by the patch here returns whatever the jsonpath executor returns. The latter is not touched by this patch. PASSING args, which this patch adds, seem to be working correctly too. -- Thanks, Amit Langote
Re: collect_corrupt_items_vacuum.patch
On Sun, Jan 14, 2024 at 4:35 AM Alexander Korotkov wrote: > On Sat, Jan 13, 2024 at 7:33 PM Dmitry Koval wrote: > > Thank you, there is one small point left (in the comment): can you > > replace "guarantteed to be to be newer" to "guaranteed to be newer", > > file src/backend/storage/ipc/procarray.c? > > Fixed. Thank you for catching this. I made the following improvements to the patch. 1. I found a way to implement this with fewer changes to the core code. The GetRunningTransactionData() function allows us to get the least running xid; all I need is to add database-aware values. 2. I added a TAP test reproducing the original issue. I'm going to push this if there are no objections. -- Regards, Alexander Korotkov v6-0001-Fix-false-reports-in-pg_visibility.patch Description: Binary data
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot wrote: > > On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote: > > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila wrote: > > > > > > You might want to consider its interaction with sync slots on standby. > > > Say, there is no activity on slots in terms of processing the changes > > > for slots. Now, we won't perform sync of such slots on standby showing > > > them inactive as per your new criteria where as same slots could still > > > be valid on primary as the walsender is still active. This may be more > > > of a theoretical point as in running system there will probably be > > > some activity but I think this needs some thougths. > > > > I believe the xmin and catalog_xmin of the sync slots on the standby > > keep advancing depending on the slots on the primary, no? If yes, the > > XID age based invalidation shouldn't be a problem. > > > > I believe there are no walsenders started for the sync slots on the > > standbys, right? If yes, the inactive timeout based invalidation also > > shouldn't be a problem. Because, the inactive timeouts for a slot are > > tracked only for walsenders because they are the ones that typically > > hold replication slots for longer durations and for real replication > > use. We did a similar thing in a recent commit [1]. > > > > Is my understanding right? Do you still see any problems with it? > > Would that make sense to "simply" discard/prevent those kind of invalidations > for "synced" slot on standby? I mean, do they make sense given the fact that > those slots are not usable until the standby is promoted? > AFAIR, we don't prevent similar invalidations due to 'max_slot_wal_keep_size' for sync slots, so why to prevent it for these new parameters? This will unnecessarily create inconsistency in the invalidation behavior. -- With Regards, Amit Kapila.
Re: Detoasting optionally to make Explain-Analyze less misleading
On Mon, 26 Feb 2024 at 21:54, stepan rutz wrote: > > Hi Matthias, thanks for picking it up. I still believe this is valuable > to a lot of people out there. Thanks for dealing with my proposal. > Matthias, Tom, Tomas everyone. > > Two (more or less) controversial remarks from side. > > 1. Actually serialization should be the default for "analyze" in > explain, as current analyze doesn't detoast and thus distorts the result > in extreme (but common) cases easily by many order of magnitude (see my > original video on that one). So current "explain analyze" only works for > some queries and since detoasting is really transparent, it is quite > something to leave detoasting out of explain analyze. This surprises > people all the time, since explain analyze suggests it "runs" the query > more realistically. I'm not sure about this, but it could easily be a mid-beta decision (if this is introduced before the feature freeze of 17, whenever that is). > 2. The bandwidth I computed in one of the previous versions of the patch > was certainly cluttering up the explain output and it is misleading yes, > but then again it performs a calculation people will now do in their > head. The "bandwidth" here is how much data your query gets out of > backend by means of the query and the deserialization. So of course if > you do id-lookups you get a single row and such querries do have a lower > data-retrieval bandwidth compared to bulk querries. I think that's a job for post-processing the EXPLAIN output by the user. If we don't give users the raw data, they won't be able to do their own cross-referenced processing: "5MB/sec" doesn't tell you how much time or data was actually spent. > However having some > measure of how fast data is delivered from the backend especially on > larger joins is still a good indicator of one aspect of a query. I'm not sure about that. Network speed is a big limiting factor that we can't measure here, and the size on disk is often going to be smaller than the data size when transfered across the network. > Sorry for the remarks. Both are not really important, just restating my > points here. I understand the objections and reasons that speak against > both points and believe the current scope is just right. No problem. Remarks from users (when built on solid arguments) are always welcome, even if we may not always agree on the specifics. -->8-- Attached is v9, which is rebased on master to handle 24eebc65's changed signature of pq_sendcountedtext. It now also includes autocompletion, and a second patch which adds documentation to give users insights into this new addition to EXPLAIN. Kind regards, Matthias van de Meent v9-0002-Add-EXPLAIN-SERIALIZE-docs.patch Description: Binary data v9-0001-Explain-Add-SERIALIZE-option.patch Description: Binary data
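For readers who want to try the option from the attached patches, a minimal sketch of the intended usage is below; the exact option spelling and the extra output lines are whatever the committed patch ends up with, and the table name is invented for illustration:

```sql
-- Hypothetical table; SERIALIZE asks EXPLAIN ANALYZE to also detoast and
-- convert the result rows to wire format, so the reported timing includes
-- that work (per the patch under discussion, not yet committed behavior).
EXPLAIN (ANALYZE, SERIALIZE) SELECT * FROM big_toasted_table;
```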
Re: A failure in t/038_save_logical_slots_shutdown.pl
On Thu, Jan 25, 2024 at 5:07 PM Amit Kapila wrote: > > On Thu, Jan 25, 2024 at 4:27 PM Bharath Rupireddy > wrote: > > > > Thanks. I'll wait a while and then add it to CF to not lose it in the wild. > > > > Feel free to add it to CF. However, I do plan to look at it in the > next few days. > The patch looks mostly good to me. I have changed the comments and commit message in the attached. I am planning to push this tomorrow unless you or others have any comments on it. -- With Regards, Amit Kapila. v2-0001-Fix-a-random-failure-in-038_save_logical_slots_sh.patch Description: Binary data
confusing `case when` column name
Hi hackers, Below is a `case when` demo, ```sql create table foo(a int, b int); insert into foo values (1, 2); select case 1 when 1 then a else b end from foo; ``` Currently, the psql output is, ```text b --- 1 (1 row) ``` At first glance at the output column title, I assumed the result of the SQL was wrong; it should be `a`. After some investigation, I discovered that the result's value is accurate. However, PostgreSQL uses b as the title for the output column. Need we change the title of the case-when output column? If you hackers think it's worth the effort, I'm willing to invest time in working on it. Best Regards, Winter Loo
Re: Make attstattarget nullable
On 06.03.24 22:34, Tomas Vondra wrote: > 0001 > 1) I think this bit in ALTER STATISTICS docs is wrong: > - new_target > + SET STATISTICS { integer | DEFAULT } > because it means we now have list entries for name, ..., new_name, new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". That's a bit weird. Ok, how would you change it? List out the full clauses of the other variants under Parameters as well? We have similar inconsistencies on other ALTER reference pages, so I'm not sure what the preferred approach is. > 2) The newtarget handling in AlterStatistics seems rather confusing. Why does it get set to -1 just to ignore the value later? For a while I was 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field to -1. Maybe ditching the first if block and directly checking stmt->stxstattarget before setting repl_val/repl_null would be better? But we also need to continue accepting -1 for default on input. The current code achieves that, the proposed variant would not. Note that this patch matches the equivalent patch for attstattarget (4f622503d6d), which uses the same logic. We could change it if we have a better idea, but then we should change both. > 0002 > 1) I think InsertPgAttributeTuples comment probably needs to document what the new tupdesc_extra parameter does. Yes, I'll update that comment.
Re: [PATCHES] Post-special page storage TDE support
Hi David, > I have finished the reworking of this particular patch series, and have tried > to > organize this in such a way that it will be easily reviewable. It is > constructed progressively to be able to follow what is happening here. As > such, > each individual commit is not guaranteed to compile on its own, so the whole > series would need to be applied before it works. (It does pass CI tests.) > > Here is a brief roadmap of the patches; some of them have additional details > in > the commit message describing a little more about them. > > These two patches do some refactoring of existing code to make a common place > to > modify the definitions: > > v3-0001-refactor-Create-PageUsableSpace-to-represent-spac.patch > v3-0002-refactor-Make-PageGetUsablePageSize-routine.patch > > These two patches add the ReservedPageSize variable and teach PageInit to use > to > adjust sizing accordingly: > > v3-0003-feature-Add-ReservedPageSize-variable.patch > v3-0004-feature-Adjust-page-sizes-at-PageInit.patch > > This patch modifies the definitions of 4 symbols to be computed based on > PageUsableSpace: > > v3-0005-feature-Create-Calc-Limit-and-Dynamic-forms-for-f.patch > > These following 4 patches are mechanical replacements of all existing uses of > these symbols; this provides both visibility into where the existing symbol is > used as well as distinguishing between parts that care about static allocation > vs dynamic usage. The only non-mechanical change is to remove the definition > of > the old symbol so we can be guaranteed that all uses have been considered: > > v3-0006-chore-Split-MaxHeapTuplesPerPage-into-Limit-and-D.patch > v3-0007-chore-Split-MaxIndexTuplesPerPage-into-Limit-and-.patch > v3-0008-chore-Split-MaxHeapTupleSize-into-Limit-and-Dynam.patch > v3-0009-chore-Split-MaxTIDsPerBTreePage-into-Limit-and-Dy.patch > > The following patches are related to required changes to support dynamic toast > limits: > > v3-0010-feature-Add-hook-for-setting-reloptions-defaults-.patch > v3-0011-feature-Dynamically-calculate-toast_tuple_target.patch > v3-0012-feature-Add-Calc-options-for-toast-related-pieces.patch > v3-0013-chore-Replace-TOAST_MAX_CHUNK_SIZE-with-ClusterTo.patch > v3-0014-chore-Translation-updates-for-TOAST_MAX_CHUNK_SIZ.patch > > In order to calculate some of the sizes, we need to include nbtree.h > internals, > but we can't use in front-end apps, so we separate out the pieces we care > about > into a separate include and use that: > > v3-0015-chore-Split-nbtree.h-structure-defs-into-an-inter.patch > > This is the meat of the patch; provide a common location for these > block-size-related constants to be computed using the infra that has been set > up > so far. Also ensure that we are properly initializing this in front end and > back end code. 
A tricky piece here is we have two separate include files for > blocksize.h; one which exposes externs as consts for optimizations, and one > that > blocksize.c itself uses without consts, which it uses to create/initialized > the > vars: > > v3-0016-feature-Calculate-all-blocksize-constants-in-a-co.patch > > Add ControlFile and GUC support for reserved_page_size: > > v3-0017-feature-ControlFile-GUC-support-for-reserved_page.patch > > Add initdb support for reserving page space: > > v3-0018-feature-Add-reserved_page_size-to-initdb-bootstra.patch > > Fixes for pg_resetwal: > > v3-0019-feature-Updates-for-pg_resetwal.patch > > The following 4 patches mechanically replace the Dynamic form to use the new > Cluster variables: > > v3-0020-chore-Rename-MaxHeapTupleSizeDynamic-to-ClusterMa.patch > v3-0021-chore-Rename-MaxHeapTuplesPerPageDynamic-to-Clust.patch > v3-0022-chore-Rename-MaxIndexTuplesPerPageDynamic-to-Clus.patch > v3-0023-chore-Rename-MaxTIDsPerBTreePageDynamic-to-Cluste.patch > > Two pieces of optimization required for visibility map: > > v3-0024-optimization-Add-support-for-fast-non-division-ba.patch > v3-0025-optimization-Use-fastdiv-code-in-visibility-map.patch > > Update bufpage.h comments: > > v3-0026-doc-update-bufpage-docs-w-reserved-space-data.patch > > Fixes for bloom to use runtime size: > > v3-0027-feature-Teach-bloom-about-PageUsableSpace.patch > > Fixes for FSM to use runtime size: > > v3-0028-feature-teach-FSM-about-reserved-page-space.patch > > I hope this makes sense for reviewing, I know it's a big job, so breaking > things up a little more and organizing will hopefully help. Just wanted to let you know that the patchset seems to need a rebase, according to cfbot. Best regards, Aleksander Alekseev (wearing a co-CFM hat)
Re: confusing `case when` column name
On Tuesday, March 12, 2024, adjk...@126.com wrote: > > Need we change the title of the case-when output column? > > Choosing either a or b as the label seems wrong and is probably worth changing to something that has no meaning and encourages the application of a column alias. David J.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, > I took a closer look at this patch over the last couple days, and I did > a bunch of benchmarks to see how expensive the accounting is. The good > news is that I haven't observed any overhead - I did two simple tests, > that I think would be particularly sensitive to this: > > [...] Just wanted to let you know that v20231226 doesn't apply. The patch needs love from somebody interested in it. Best regards, Aleksander Alekseev (wearing a co-CFM hat)
Re: Make attstattarget nullable
On 3/12/24 13:47, Peter Eisentraut wrote: > On 06.03.24 22:34, Tomas Vondra wrote: >> 0001 >> >> >> 1) I think this bit in ALTER STATISTICS docs is wrong: >> >> - new_target >> + SET STATISTICS { integer | DEFAULT } >> >> because it means we now have list entries for name, ..., new_name, >> new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". >> That's a bit weird. > > Ok, how would you change it? List out the full clauses of the other > variants under Parameters as well? I'd go with a parameter, essentially exactly as it used to be, except for adding the DEFAULT option. So the list would define new_target, and mention DEFAULT as a special value. > We have similar inconsistencies on other ALTER reference pages, so I'm > not sure what the preferred approach is. > Yeah, the other reference pages may have some inconsistencies too, but let's not add more. >> 2) The newtarget handling in AlterStatistics seems rather confusing. Why >> does it get set to -1 just to ignore the value later? For a while I was >> 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field >> to -1. Maybe ditching the first if block and directly checking >> stmt->stxstattarget before setting repl_val/repl_null would be better? > > But we also need to continue accepting -1 for default on input. The > current code achieves that, the proposed variant would not. > OK, I did not realize that. But then maybe this should be explained in a comment before the new "if" block, because people won't realize why it needs to be this way. > Note that this patch matches the equivalent patch for attstattarget > (4f622503d6d), which uses the same logic. We could change it if we have > a better idea, but then we should change both. > >> 0002 >> >> >> 1) I think InsertPgAttributeTuples comment probably needs to document >> what the new tupdesc_extra parameter does. > > Yes, I'll update that comment. > OK. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
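To make the syntax under discussion concrete, here is a small sketch; the statistics object and table are invented, and the DEFAULT spelling is what the patch proposes rather than something already released:

```sql
CREATE TABLE t (a int, b int);
CREATE STATISTICS stts (dependencies) ON a, b FROM t;

ALTER STATISTICS stts SET STATISTICS 500;      -- explicit target
ALTER STATISTICS stts SET STATISTICS DEFAULT;  -- proposed: reset to the default
ALTER STATISTICS stts SET STATISTICS -1;       -- still accepted as "default" on input
```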
Re: CF entries for 17 to be reviewed
Hi, > > On 6 Mar 2024, at 18:49, Andrey M. Borodin wrote: > > > > Here are statuses for "Refactoring" section: > > [...] > Aleksander, I would greatly appreciate if you join me in managing CF. > Together we can move more stuff :) > Currently, I'm going through "SQL Commands". And so far I had not come to > "Performance" and "Server Features" at all... So if you can handle updating > statuses of that sections - that would be great. Server Features: * Fix partitionwise join with partially-redundant join clauses I see there is a good discussion in progress here. It doesn't seem to need more reviewers at the moment. * ALTER TABLE SET ACCESS METHOD on partitioned tables Ditto. * Patch to implement missing join selectivity estimation for range types The patch doesn't apply and has been "Waiting on Author" for a few months now. Could be a candidate for closing with RwF. * logical decoding and replication of sequences, take 2 According to Tomas Vondra the patch is not ready for PG17. The patch is marked as "Waiting on Author". Although it could be withdrawn for now, personally I see no problem with keeping it WoA until the PG18 cycle begins. * Add the ability to limit the amount of memory that can be allocated to backends v20231226 doesn't apply. The patch needs love from somebody interested in it. * Multi-version ICU From a quick look, the patch doesn't apply and has been moved between several commitfests, most recently in the "Waiting on Author" state. Could be a candidate for closing with RwF. * Post-special Page Storage TDE support A large patch divided into 28 (!) parts. It currently needs a rebase, which shouldn't necessarily stop a reviewer looking for a challenging task. * Built-in collation provider for "C" and "C.UTF-8" Peter E left some feedback today, so I changed the status to "Waiting on Author". * ltree hash functions Marked as RfC, and cfbot seems to be happy with the patch. Could use some attention from a committer? * UUID v7 The patch is in good shape. Michael argued that the patch should be merged only once the RFC is approved. No action seems to be needed until then. * (to be continued) -- Best regards, Aleksander Alekseev
Re: [PATCH] LockAcquireExtended improvement
On Mon, Mar 11, 2024 at 11:11 PM Jingxian Li wrote: > Your version changes less code than mine by pushing the nowait flag down > into ProcSleep(). This looks fine in general, except for a little advice, > which I don't think there is necessary to add 'waiting' suffix to the > process name in function WaitOnLock with dontwait being true, as follows: That could be done, but in my opinion it's not necessary. The waiting suffix will appear only very briefly, and probably only in relatively rare cases. It doesn't seem worth adding code to avoid it. -- Robert Haas EDB: http://www.enterprisedb.com
Disable LLVM bitcode generation with pgxs.mk framework.
Hi hackers, When the PostgreSQL server is configured with --with-llvm, the pgxs.mk framework will generate LLVM bitcode for extensions automatically. Sometimes, I don't want to generate bitcode for some extensions. I can turn off this feature by specifying with_llvm=0 in the make command. ``` make with_llvm=0 ``` Would it be possible to add a new switch in the pgxs.mk framework to allow users to disable this feature? E.g., the Makefile looks like: ``` WITH_LLVM=no PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) ``` Best Regards, Xing
Re: Commitfest Manager for March
Hi Andrey, > > If you need any help please let me know. > > Aleksander, I would greatly appreciate if you join me in managing CF. > Together we can move more stuff :) > Currently, I'm going through "SQL Commands". And so far I had not come to > "Performance" and "Server Features" at all... So if you can handle updating > statuses of that sections - that would be great. OK, I'll take care of the "Performance" and "Server Features" sections. I submitted my summaries of the entries triaged so far to the corresponding thread [1]. [1]: https://www.postgresql.org/message-id/CAJ7c6TN9SnYdq%3DkfP-txgo5AaT%2Bt9YU%2BvQHfLBZqOBiHwoipAg%40mail.gmail.com -- Best regards, Aleksander Alekseev
Re: CF entries for 17 to be reviewed
Hi, > > Aleksander, I would greatly appreciate if you join me in managing CF. > > Together we can move more stuff :) > > Currently, I'm going through "SQL Commands". And so far I had not come to > > "Performance" and "Server Features" at all... So if you can handle updating > > statuses of that sections - that would be great. > > Server Features: > > [...] I noticed that "Avoid mixing custom and OpenSSL BIO functions" had two entries [1][2]. I closed [1] and marked it as "Withdrawn" due to lack of a better status. Maybe we need an additional "Duplicate" status. [1]: https://commitfest.postgresql.org/47/4834/ [2]: https://commitfest.postgresql.org/47/4835/ -- Best regards, Aleksander Alekseev
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > Here's a last one for the cfbot. Thanks for committing the first 3 patches btw. Attached a tiny change to 0001, which adds "(backing struct for PGcancelConn)" to the comment on pg_cancel_conn. From d340fde6883a249fd7c1a90033675a3b5edb603e Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Thu, 14 Dec 2023 13:39:09 +0100 Subject: [PATCH v35 2/2] Start using new libpq cancel APIs A previous commit introduced new APIs to libpq for cancelling queries. This replaces the usage of the old APIs in most of the codebase with these newer ones. This specifically leaves out changes to psql and pgbench as those would need a much larger refactor to be able to call them, due to the new functions not being signal-safe. --- contrib/dblink/dblink.c | 30 +++-- contrib/postgres_fdw/connection.c | 105 +++--- .../postgres_fdw/expected/postgres_fdw.out| 15 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 7 ++ src/fe_utils/connect_utils.c | 11 +- src/test/isolation/isolationtester.c | 29 ++--- 6 files changed, 145 insertions(+), 52 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 19a362526d2..98dcca3e6fd 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1346,22 +1346,32 @@ PG_FUNCTION_INFO_V1(dblink_cancel_query); Datum dblink_cancel_query(PG_FUNCTION_ARGS) { - int res; PGconn *conn; - PGcancel *cancel; - char errbuf[256]; + PGcancelConn *cancelConn; + char *msg; dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancel = PQgetCancel(conn); + cancelConn = PQcancelCreate(conn); - res = PQcancel(cancel, errbuf, 256); - PQfreeCancel(cancel); + PG_TRY(); + { + if (!PQcancelBlocking(cancelConn)) + { + msg = pchomp(PQcancelErrorMessage(cancelConn)); + } + else + { + msg = "OK"; + } + } + PG_FINALLY(); + { + PQcancelFinish(cancelConn); + } + PG_END_TRY(); - if (res == 1) - PG_RETURN_TEXT_P(cstring_to_text("OK")); - else - PG_RETURN_TEXT_P(cstring_to_text(errbuf)); + PG_RETURN_TEXT_P(cstring_to_text(msg)); } diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 4931ebf5915..dcc13dc3b24 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -133,7 +133,7 @@ static void pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue); static void pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry); static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel); static bool pgfdw_cancel_query(PGconn *conn); -static bool pgfdw_cancel_query_begin(PGconn *conn); +static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime); static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input); static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query, @@ -1315,36 +1315,104 @@ pgfdw_cancel_query(PGconn *conn) endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), CONNECTION_CLEANUP_TIMEOUT); - if (!pgfdw_cancel_query_begin(conn)) + if (!pgfdw_cancel_query_begin(conn, endtime)) return false; return pgfdw_cancel_query_end(conn, endtime, false); } static bool -pgfdw_cancel_query_begin(PGconn *conn) +pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime) { - PGcancel *cancel; - char errbuf[256]; + bool timed_out = false; + bool failed = false; + PGcancelConn *cancel_conn = PQcancelCreate(conn); - /* - * Issue cancel request. Unfortunately, there's no good way to limit the - * amount of time that we might block inside PQgetCancel(). 
- */ - if ((cancel = PQgetCancel(conn))) + + if (!PQcancelStart(cancel_conn)) { - if (!PQcancel(cancel, errbuf, sizeof(errbuf))) + PG_TRY(); { ereport(WARNING, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not send cancel request: %s", - errbuf))); - PQfreeCancel(cancel); - return false; + pchomp(PQcancelErrorMessage(cancel_conn); } - PQfreeCancel(cancel); + PG_FINALLY(); + { + PQcancelFinish(cancel_conn); + } + PG_END_TRY(); + return false; } - return true; + /* In what follows, do not leak any PGcancelConn on an error. */ + PG_TRY(); + { + while (true) + { + TimestampTz now = GetCurrentTimestamp(); + long cur_timeout; + PostgresPollingStatusType pollres = PQcancelPoll(cancel_conn); + int waitEvents = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH; + + if (pollres == PGRES_POLLING_OK) + { +break; + } + + /* If timeout has expired, give up, else get sleep time. */ + cur_timeout = TimestampDifferenceMilliseconds(now, endtime); + if (cur_timeout <= 0) + { +timed_out = true; +failed = true; +goto exit; + } + + switch (pollres) + { +case PGRES_POLLING_READING: + waitEvents |= WL_SOCKET_READABLE; + break; +case PGRES_POL
Re: confusing `case when` column name
"David G. Johnston" writes: > On Tuesday, March 12, 2024, adjk...@126.com wrote: >> Nee we change the title of the case-when output column? > Choosing either a or b as the label seems wrong and probably worth changing > to something that has no meaning and encourages the application of a column > alias. Yeah, we won't get any kudos for changing a rule that's stood for 25 years, even if it's not very good. This is one of the places where it's just hard to make a great choice automatically, and users are probably going to end up applying an AS clause most of the time if they care about the column name at all. regards, tom lane
Re: On disable_cost
On Thu, Aug 3, 2023 at 5:22 AM Jian Guo wrote: > I have write an initial patch to retire the disable_cost GUC, which labeled a > flag on the Path struct instead of adding up a big cost which is hard to > estimate. Though it involved in tons of plan changes in regression tests, I > have tested on some simple test cases such as eagerly generate a two-stage > agg plans and it worked. Could someone help to review? I took a look at this patch today. I believe that overall this may well be an approach worth pursuing. However, more work is going to be needed. Here are some comments: 1. You stated that it changes lots of plans in the regression tests, but you haven't provided any sort of analysis of why those plans changed. I'm kind of surprised that there would be "tons" of plan changes. You (or someone) should look into why that's happening. 2. The change to compare_path_costs_fuzzily() seems incorrect to me. When path1->is_disabled && path2->is_disabled, costs should be compared just as they are when neither path is disabled. Instead, the patch treats any two such paths as having equal cost. That seems catastrophically bad. Maybe it accounts for some of those plan changes, although that would only be true if those plans were created while using some disabling GUC. 3. Instead of adding is_disabled at the end of the Path structure, I suggest adding it between param_info and parallel_aware. I think if you do that, the space used by the new field will use up padding bytes that are currently included in the struct, instead of making it longer. 4. A critical issue for any patch of this type is performance. This concern was raised earlier on this thread, but your path doesn't address it. There's no performance analysis or benchmarking included in your email. One idea that I have is to write the cost-comparison test like this: if (unlikely(path1->is_disabled || path2->is_disabled)) { if (!path1->is_disabled) return COSTS_BETTER1; if (!path2->is_disabled) return COSTS_BETTER2; /* if both disabled, fall through */ } I'm not sure that would be enough to prevent the patch from adding noticeably to the cost of path comparison, but maybe it would help. 5. The patch changes only compare_path_costs_fuzzily() but I wonder whether compare_path_costs() and compare_fractional_path_costs() need similar surgery. Whether they do or don't, there should likely be some comments explaining the situation. 6. In fact, the patch changes no comments at all, anywhere. I'm not sure how many comment changes a patch like this needs to make, but the answer definitely isn't "none". 7. The patch doesn't actually remove disable_cost. I guess it should. 8. When you submit a patch, it's a good idea to also add it on commitfest.postgresql.org. It doesn't look like that was done in this case. -- Robert Haas EDB: http://www.enterprisedb.com
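For context, the "disabling" being discussed is what the planner does when one of the enable_* GUCs is turned off; a quick way to observe the current disable_cost behaviour that the proposed is_disabled flag is meant to replace (the table name is invented for illustration):

```sql
SET enable_seqscan = off;
-- If no index is usable, a sequential scan is still chosen, but its
-- estimated cost is inflated by disable_cost (a very large constant),
-- which is what distorts cost comparisons today.
EXPLAIN SELECT * FROM some_table_without_indexes;
RESET enable_seqscan;
```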
Re: confusing `case when` column name
> On 12 Mar 2024, at 15:19, Tom Lane wrote: > users are probably going to end up applying an AS clause most of > the time if they care about the column name at all. If anything, we could perhaps add a short note in the CASE documentation about column naming; the way it's phrased now, a new user could very easily assume the column would be named "case". -- Daniel Gustafsson
Re: Disable LLVM bitcode generation with pgxs.mk framework.
> On 12 Mar 2024, at 14:38, Xing Guo wrote: > Would it be possible to add a new switch in the pgxs.mk framework to > allow users to disable this feature? Something like that doesn't seem unreasonable I think. -- Daniel Gustafsson
Re: A failure in t/038_save_logical_slots_shutdown.pl
On Tue, Mar 12, 2024 at 6:05 PM Amit Kapila wrote: > > The patch looks mostly good to me. I have changed the comments and > commit message in the attached. I am planning to push this tomorrow > unless you or others have any comments on it. LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: POC, WIP: OR-clause support for indexes
On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov wrote: > On 11/3/2024 18:31, Alexander Korotkov wrote: > >> I'm not convinced about this limit. The initial reason was to combine > >> long lists of ORs into the array because such a transformation made at > >> an early stage increases efficiency. > >> I understand the necessity of this limit in the array decomposition > >> routine but not in the creation one. > > > > The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because > > N^2 algorithms could be applied to arrays. Are you sure that's not > > true for our case? > When you operate an array, indeed. But when we transform ORs to an > array, not. Just check all the places in the optimizer and even the > executor where we would pass along the list of ORs. This is why I think > we should use this optimization even more intensively for huge numbers > of ORs in an attempt to speed up the overall query. Ok. > >>> 3) Better save the original order of clauses by putting hash entries and > >>> untransformable clauses to the same list. A lot of differences in > >>> regression tests output have gone. > >> I agree that reducing the number of changes in regression tests looks > >> better. But to achieve this, you introduced a hack that increases the > >> complexity of the code. Is it worth it? Maybe it would be better to make > >> one-time changes in tests instead of getting this burden on board. Or > >> have you meant something more introducing the node type? > > > > For me the reason is not just a regression test. The current code > > keeps the original order of quals as much as possible. The OR > > transformation code reorders quals even in cases when it doesn't > > eventually apply any optimization. I don't think that's acceptable. > > However, less hackery ways for this is welcome for sure. > Why is it unacceptable? Can the user implement some order-dependent > logic with clauses, and will it be correct? > Otherwise, it is a matter of taste, and generally, this decision is up > to you. I think this is an important property that the user sees the quals in the plan in the same order as they were in the query. And if some transformations are applied, then the order is saved as much as possible. I don't think we should sacrifice this property without strong reasons. A bit of code complexity is definitely not that reason for me. > >>> We don't make array values unique. That might make query execution > >>> performance somewhat worse, and also makes selectivity estimation > >>> worse. I suggest Andrei and/or Alena should implement making array > >>> values unique. > >> The fix Alena has made looks correct. But I urge you to think twice: > >> The optimizer doesn't care about duplicates, so why do we do it? > >> What's more, this optimization is intended to speed up queries with long > >> OR lists. Using the list_append_unique() comparator on such lists could > >> impact performance. I suggest sticking to the common rule and leaving > >> the responsibility on the user's shoulders. > > > > I don't see why the optimizer doesn't care about duplicates for OR > > lists. As I showed before in an example, it successfully removes the > > duplicate. So, currently OR transformation clearly introduces a > > regression in terms of selectivity estimation. I think we should > > evade that. > I think you are right. It is probably a better place than any other to > remove duplicates in an array. I just think we should sort and remove > duplicates from entry->consts in one pass. 
Thus, this optimisation > should be applied to sortable constants. Ok. > >> At least, we should do this optimization later, in one pass, with > >> sorting elements before building the array. But what if we don't have a > >> sort operator for the type? > > > > It was probably discussed before, but can we do our work later? There > > is a canonicalize_qual() which calls find_duplicate_ors(). This is > > the place where currently duplicate OR clauses are removed. Could our > > OR-to-ANY transformation be just another call from > > canonicalize_qual()? > Hmm, we already tried to do it at that point. I vaguely recall some > issues caused by this approach. Anyway, it should be done as quickly as > possible to increase the effect of the optimization. I think there were provided quite strong reasons why this shouldn't be implemented at the parse analysis stage [1], [2], [3]. The canonicalize_qual() looks quite appropriate place for that since it does similar transformations. Links. 1. https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAH2-Wzm2%3Dnf_JhiM3A2yetxRs8Nd2NuN3JqH%3Dfm_YWYd1oYoPg%40mail.gmail.com 3. https://www.postgresql.org/message-id/CA%2BTgmoaOiwMXBBTYknczepoZzKTp-Zgk5ss1%2BCuVQE-eFTqBmA%40mail.gmail.com -- Regards, Alexander Korotkov
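To make the transformation and the duplicate-constant point concrete, here is the hand-written equivalent of what the OR-to-ANY rewrite produces internally (table and column are invented):

```sql
-- Original qual: note the duplicated constant 2.
SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 2 OR x = 3;

-- Array form of the same qual. If duplicates are not removed from the
-- array, selectivity estimation sees four values instead of three, which
-- is the estimation regression discussed above.
SELECT * FROM t WHERE x = ANY (ARRAY[1, 2, 2, 3]);
```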
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 11:26 AM Amit Kapila wrote: > > > Hm. I get the concern. Are you okay with having invalidation_reason > > separately for both logical and physical slots? In such a case, > > logical slots that got invalidated on the standby will have duplicate > > info in conflict_reason and invalidation_reason, is this fine? > > > > If we have duplicate information in two columns that could be > confusing for users. BTW, isn't the recovery conflict occur only > because of rows_removed and wal_level_insufficient reasons? The > wal_removed or the new reasons you are proposing can't happen because > of recovery conflict. Am, I missing something here? My understanding aligns with yours that the rows_removed and wal_level_insufficient invalidations can occur only upon recovery conflict. FWIW, a test named 'synchronized slot has been invalidated' in 040_standby_failover_slots_sync.pl inappropriately uses a logical slot with conflict_reason = 'wal_removed' on the standby. As per the above understanding, it's inappropriate to use conflict_reason here because wal_removed invalidation doesn't conflict with recovery. > > Another idea is to make 'conflict_reason text' as a 'conflicting > > boolean' again (revert 007693f2a3), and have 'invalidation_reason > > text' for both logical and physical slots. So, whenever 'conflicting' > > is true, one can look at invalidation_reason for the reason for > > conflict. How does this sound? > > > > So, does this mean that conflicting will only be true for some of the > reasons (say wal_level_insufficient, rows_removed, wal_removed) and > logical slots but not for others? I think that will also not eliminate > the duplicate information as user could have deduced that from single > column. So, how about we make conflict_reason report only the reasons that actually cause a conflict with recovery for logical slots, something like below, and then have invalidation_cause as a generic column for all sorts of invalidation reasons for both logical and physical slots? ReplicationSlotInvalidationCause cause = slot_contents.data.invalidated; if (slot_contents.data.database == InvalidOid || cause == RS_INVAL_NONE || (cause != RS_INVAL_HORIZON && cause != RS_INVAL_WAL_LEVEL)) { nulls[i++] = true; } else { Assert(cause == RS_INVAL_HORIZON || cause == RS_INVAL_WAL_LEVEL); values[i++] = CStringGetTextDatum(SlotInvalidationCauses[cause]); } -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
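Under the proposal above, the view would be read roughly as follows; the column names 'conflicting' and 'invalidation_reason' are the ones being debated in this thread, not committed ones, so this is only a sketch of the intended shape:

```sql
-- Purely illustrative of the proposed column split: 'conflicting' says
-- whether the invalidation conflicts with recovery, while the generic
-- reason column covers all invalidation causes for all slot types.
SELECT slot_name, conflicting, invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;
```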
Re: UUID v7
On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin wrote: > Sorry for this long and vague explanation, if it still seems too uncertain we > can have a chat or something like that. I don't think this number picking > stuff deserve to be commented, because it still is quite close to random. RFC > gives us too much freedom of choice. I thought your explanation was quite clear and I agree that this approach makes the most sense. I sent an email to the RFC authors to ask for their feedback with you (Andrey) in the CC, because even though it makes the most sense it does not comply with the either of method 1 or 2 as described in the RFC.
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 12, 2024 at 05:51:43PM +0530, Amit Kapila wrote: > On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote: > > > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila > > > wrote: > > > > > > > > You might want to consider its interaction with sync slots on standby. > > > > Say, there is no activity on slots in terms of processing the changes > > > > for slots. Now, we won't perform sync of such slots on standby showing > > > > them inactive as per your new criteria where as same slots could still > > > > be valid on primary as the walsender is still active. This may be more > > > > of a theoretical point as in running system there will probably be > > > > some activity but I think this needs some thougths. > > > > > > I believe the xmin and catalog_xmin of the sync slots on the standby > > > keep advancing depending on the slots on the primary, no? If yes, the > > > XID age based invalidation shouldn't be a problem. > > > > > > I believe there are no walsenders started for the sync slots on the > > > standbys, right? If yes, the inactive timeout based invalidation also > > > shouldn't be a problem. Because, the inactive timeouts for a slot are > > > tracked only for walsenders because they are the ones that typically > > > hold replication slots for longer durations and for real replication > > > use. We did a similar thing in a recent commit [1]. > > > > > > Is my understanding right? Do you still see any problems with it? > > > > Would that make sense to "simply" discard/prevent those kind of > > invalidations > > for "synced" slot on standby? I mean, do they make sense given the fact that > > those slots are not usable until the standby is promoted? > > > > AFAIR, we don't prevent similar invalidations due to > 'max_slot_wal_keep_size' for sync slots, Right, we'd invalidate them on the standby should the standby sync slot restart_lsn exceeds the limit. > so why to prevent it for > these new parameters? This will unnecessarily create inconsistency in > the invalidation behavior. Yeah, but I think wal removal has a direct impact on the slot usuability which is probably not the case with the new XID and Timeout ones. That's why I thought about handling them differently (but I'm also fine if that's not the case). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: UUID v7
On Tue, 12 Mar 2024 at 07:32, Michael Paquier wrote: > Sure, there is no problem in discussing a patch to implement a > behavior. But I disagree about taking a risk in merging something > that could become non-compliant with the approved RFC, if the draft is > approved at the end, of course. This just strikes me as a bad idea. I agree that we shouldn't release UUIDv7 support if the RFC describing that is not yet approved. But I do think it would be a shame if e.g. the RFC got approved 2 weeks after Postgres its feature freeze. Which would then mean we'd have to wait another 1.5 years before actually using uuidv7. Would it be a reasonable compromise to still merge the patch for PG17 (assuming the code is good to merge with regards to the current draft RFC), but revert the commit if the RFC is not approved before some deadline before the release date (e.g. before the first release candidate)?
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 12, 2024 at 5:51 PM Amit Kapila wrote: > > > Would that make sense to "simply" discard/prevent those kind of > > invalidations > > for "synced" slot on standby? I mean, do they make sense given the fact that > > those slots are not usable until the standby is promoted? > > AFAIR, we don't prevent similar invalidations due to > 'max_slot_wal_keep_size' for sync slots, so why to prevent it for > these new parameters? This will unnecessarily create inconsistency in > the invalidation behavior. Right. +1 to keep the behaviour consistent for all invalidations. However, an assertion that inactive_timeout isn't set for synced slots on the standby isn't a bad idea because we rely on the fact that walsenders aren't started for synced slots. Again, I think it misses the consistency in the invalidation behaviour. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 15:04, Jelte Fennema-Nio wrote: > Attached a tiny change to 0001 One more tiny comment change, stating that pg_cancel is used by the deprecated PQcancel function. v36-0001-libpq-Add-encrypted-and-non-blocking-query-cance.patch Description: Binary data v36-0002-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 12, 2024 at 9:11 PM Bertrand Drouvot wrote: > > > AFAIR, we don't prevent similar invalidations due to > > 'max_slot_wal_keep_size' for sync slots, > > Right, we'd invalidate them on the standby should the standby sync slot > restart_lsn > exceeds the limit. Right. Help me understand this a bit - is the wal_removed invalidation going to conflict with recovery on the standby? Per the discussion upthread, I'm trying to understand what invalidation reasons will exactly cause conflict with recovery? Is it just rows_removed and wal_level_insufficient invalidations? My understanding on the conflict with recovery and invalidation reason has been a bit off track. Perhaps, we need to clarify these two things in the docs for the end users as well? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: CI speed improvements for FreeBSD
Hi! I looked at the changes and I liked them. Here are my thoughts: 0001: 1. I think using RAM is a good idea. Since it's still UFS, we lose nothing in terms of testing but win significantly in speed. 2. The change from "swapoff -a || true" to "swapoff -a" is legitimate in my view, since it's better to fail explicitly than to silence any possible problem. 3. The man page says that lowercase suffixes should be used for mdconfig, but in fact you can use either lowercase or uppercase. Yep, it's in mdconfig.c: "else if (*p == 'g' || *p == 'G')". 0002: 1. The resource usage will be a bit higher, for sure, but, if I'm not missing something, not drastically. Anyway, I do not know how to measure this increase to get concrete values. 2. And think of the potential benefits of increasing the number of test jobs: more concurrent processes, more interactions, better test coverage. Here are my runs: FreeBSD @master https://cirrus-ci.com/task/4934701194936320 Run test_world 05:56 FreeBSD @master + 0001 https://cirrus-ci.com/task/5921385306914816 Run test_world 05:06 FreeBSD @master + 0001, + 0002 https://cirrus-ci.com/task/5635288945393664 Run test_world 02:20 For comparison, Debian @master https://cirrus-ci.com/task/5143705577848832 Run test_world 02:38 Overall, I consider these changes useful. CI runs faster, with better test coverage, in exchange for a presumably slight increase in resource usage, which I don't think will be significant. -- Best regards, Maxim Orlov.
Re: type cache cleanup improvements
Got it. Here is an updated patch where I added a corresponding comment. Thank you! Playing around I found one more place which could easily modified with hash_seq_init_with_hash_value() call. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/diff --git a/src/backend/utils/cache/attoptcache.c b/src/backend/utils/cache/attoptcache.c index af978ccd4b1..28980620662 100644 --- a/src/backend/utils/cache/attoptcache.c +++ b/src/backend/utils/cache/attoptcache.c @@ -44,12 +44,10 @@ typedef struct /* * InvalidateAttoptCacheCallback - * Flush all cache entries when pg_attribute is updated. + * Flush cache entry (or entries) when pg_attribute is updated. * * When pg_attribute is updated, we must flush the cache entry at least - * for that attribute. Currently, we just flush them all. Since attribute - * options are not currently used in performance-critical paths (such as - * query execution), this seems OK. + * for that attribute. */ static void InvalidateAttoptCacheCallback(Datum arg, int cacheid, uint32 hashvalue) @@ -57,7 +55,16 @@ InvalidateAttoptCacheCallback(Datum arg, int cacheid, uint32 hashvalue) HASH_SEQ_STATUS status; AttoptCacheEntry *attopt; - hash_seq_init(&status, AttoptCacheHash); + /* + * By convection, zero hash value is passed to the callback as a sign + * that it's time to invalidate the cache. See sinval.c, inval.c and + * InvalidateSystemCachesExtended(). + */ + if (hashvalue == 0) + hash_seq_init(&status, AttoptCacheHash); + else + hash_seq_init_with_hash_value(&status, AttoptCacheHash, hashvalue); + while ((attopt = (AttoptCacheEntry *) hash_seq_search(&status)) != NULL) { if (attopt->opts) @@ -70,6 +77,17 @@ InvalidateAttoptCacheCallback(Datum arg, int cacheid, uint32 hashvalue) } } +/* + * Hash function compatible with two-arg system cache hash function. + */ +static uint32 +relatt_cache_syshash(const void *key, Size keysize) +{ + const AttoptCacheKey* ckey = key; + + return GetSysCacheHashValue2(ATTNUM, ckey->attrelid, ckey->attnum); +} + /* * InitializeAttoptCache * Initialize the attribute options cache. @@ -82,9 +100,17 @@ InitializeAttoptCache(void) /* Initialize the hash table. */ ctl.keysize = sizeof(AttoptCacheKey); ctl.entrysize = sizeof(AttoptCacheEntry); + + /* + * AttoptCacheEntry takes hash value from the system cache. For + * AttoptCacheHash we use the same hash in order to speedup search by hash + * value. This is used by hash_seq_init_with_hash_value(). + */ + ctl.hash = relatt_cache_syshash; + AttoptCacheHash = hash_create("Attopt cache", 256, &ctl, - HASH_ELEM | HASH_BLOBS); + HASH_ELEM | HASH_FUNCTION); /* Make sure we've initialized CacheMemoryContext. */ if (!CacheMemoryContext)
Re: Statistics Import and Export
> > No, we should be keeping the lock until the end of the transaction > (which in this case would be just the one statement, but it would be the > whole statement and all of the calls in it). See analyze.c:268 or > so, where we call relation_close(onerel, NoLock); meaning we're closing > the relation but we're *not* releasing the lock on it- it'll get > released at the end of the transaction. > > If that's the case, then changing the two table_close() statements to NoLock should resolve any remaining concern.
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 3:44 PM Amit Kapila wrote: > > Yes, your understanding is correct. I wanted us to consider having new > parameters like 'inactive_replication_slot_timeout' to be at > slot-level instead of GUC. I think this new parameter doesn't seem to > be the similar as 'max_slot_wal_keep_size' which leads to truncation > of WAL at global and then invalidates the appropriate slots. OTOH, the > 'inactive_replication_slot_timeout' doesn't appear to have a similar > global effect. last_inactive_at is tracked for each slot, and a slot gets invalidated once the time elapsed since its last_inactive_at exceeds inactive_replication_slot_timeout. That is like max_slot_wal_keep_size invalidating slots based on restart_lsn. In a way, both are similar, right? > The other thing we should consider is what if the > checkpoint happens at a timeout greater than > 'inactive_replication_slot_timeout'? In such a case, the slots get invalidated at the next checkpoint, since by then the time elapsed since last_inactive_at will be greater than inactive_replication_slot_timeout. > Shall, we consider doing it via > some other background process or do we think checkpointer is the best > we can have? The same problem exists if we do it in some other background process. I think the checkpointer is best because it already invalidates slots for the wal_removed cause and flushes all replication slots to disk. Moving this new invalidation functionality into some other background process such as autovacuum would not only burden that process's work but also mix up the unique functionality of that background process. Having said that, I'm open to ideas from others, as I'm not so sure whether there's any issue with the checkpointer invalidating the slots for new reasons. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
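A sketch of how the proposed setting might be used, assuming it stays a GUC taking a duration (the parameter name comes from this thread's proposal and could still change):

```sql
ALTER SYSTEM SET inactive_replication_slot_timeout = '1d';
SELECT pg_reload_conf();
-- Per the patch, slots whose last_inactive_at is older than this timeout
-- would then be invalidated by the checkpointer at the next checkpoint.
```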
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-12, Jelte Fennema-Nio wrote: > On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > > Here's a last one for the cfbot. > > Thanks for committing the first 3 patches btw. Attached a tiny change > to 0001, which adds "(backing struct for PGcancelConn)" to the comment > on pg_cancel_conn. Thanks, I included it. I hope there were no other changes, because I didn't verify :-) but if there were, please let me know to incorporate them. I made a number of other small changes, mostly to the documentation, nothing fundamental. (Someday we should stop using to document the libpq functions and use refentry's instead ... it'd be useful to have manpages for these functions.) One thing I don't like very much is release_conn_addrinfo(), which is called conditionally in two places but unconditionally in other places. Maybe it'd make more sense to put this conditionality inside the function itself, possibly with a "bool force" flag to suppress that in the cases where it is not desired. In pqConnectDBComplete, we cast the PGconn * to PGcancelConn * in order to call PQcancelPoll, which is a bit abusive, but I don't know how to do better. Maybe we just accept this ... but if PQcancelStart is the only way to have pqConnectDBStart called from a cancel connection, maybe it'd be saner to duplicate pqConnectDBStart for cancel conns. Thanks! -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: UUID v7
Hi Jelte, I am one of the contributors to this RFC. Andrey's patch corresponds exactly to Fixed-Length Dedicated Counter Bits (Method 1). Andrey and you simply did not read the RFC a little further down in the text: __ The following sub-topics cover topics related solely with creating reliable fixed-length dedicated counters: - Fixed-Length Dedicated Counter Seeding: - Implementations utilizing the fixed-length counter method randomly initialize the counter with each new timestamp tick. However, when the timestamp has not increased, the counter is instead incremented by the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random value MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bit is initialized as zero for the sole purpose of guarding against counter rollovers. - - Fixed-Length Dedicated Counter Length: - Select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision generally requires a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care must be taken to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch. - The following sub-topics cover rollover handling with either type of counter method: - ... - - Counter Rollover Handling: - Counter rollovers MUST be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability should freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter. Sergey prokhorenkosergeyprokhore...@yahoo.com.au On Tuesday, 12 March 2024 at 06:36:13 pm GMT+3, Jelte Fennema-Nio wrote: On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin wrote: > Sorry for this long and vague explanation, if it still seems too uncertain we > can have a chat or something like that. I don't think this number picking > stuff deserve to be commented, because it still is quite close to random. RFC > gives us too much freedom of choice. I thought your explanation was quite clear and I agree that this approach makes the most sense. I sent an email to the RFC authors to ask for their feedback with you (Andrey) in the CC, because even though it makes the most sense it does not comply with the either of method 1 or 2 as described in the RFC.
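To make the Method 1 mechanics quoted above concrete, here is a small illustrative C sketch of a fixed-length dedicated counter with per-tick random seeding and a zeroed guard bit. The 18-bit width, the plain random() source, and the rollover handling (borrowing the next millisecond) are illustrative assumptions and not what the patch does; a real implementation would also need a proper entropy source and locking.

#include <stdint.h>
#include <stdlib.h>

#define COUNTER_BITS   18
#define COUNTER_MAX    ((1u << COUNTER_BITS) - 1)

static uint64_t last_ms = 0;
static uint32_t counter = 0;

/* Returns the counter value to embed after the 48-bit millisecond timestamp. */
static uint32_t
next_counter(uint64_t now_ms)
{
    if (now_ms > last_ms)
    {
        last_ms = now_ms;
        /* Reseed, keeping the most significant counter bit zero as a rollover guard. */
        counter = (uint32_t) random() & (COUNTER_MAX >> 1);
    }
    else if (++counter > COUNTER_MAX)
    {
        /* Rollover: increment the timestamp ahead of real time and reseed. */
        last_ms++;
        counter = (uint32_t) random() & (COUNTER_MAX >> 1);
    }
    return counter;
}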
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 4:09 PM Amit Kapila wrote: > > I don't see how it will be easier for the user to choose the default > value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But, > I agree similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can be > another parameter to allow vacuum to proceed removing the rows which > otherwise it wouldn't have been as those would be required by some > slot. Now, if this understanding is correct, we should probably make > this invalidation happen by (auto)vacuum after computing the age based > on this new parameter. Currently, the patch computes the XID age in the checkpointer using the next XID (gets from ReadNextFullTransactionId()) and slot's xmin and catalog_xmin. I think the checkpointer is best because it already invalidates slots for wal_removed cause, and flushes all replication slots to disk. Moving this new invalidation functionality into some other background process such as autovacuum will not only burden that process' work but also mix up the unique functionality of that background process. Having said above, I'm open to ideas from others as I'm not so sure if there's any issue with checkpointer invalidating the slots for new reasons. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
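For illustration, the age computation being described might look like the sketch below; max_slot_xid_age is the GUC proposed by the patch (not committed API), and the signed modulo-2^32 subtraction is the usual XID-age arithmetic:

#include "postgres.h"
#include "access/transam.h"

extern int  max_slot_xid_age;   /* proposed GUC, not committed */

static bool
slot_xid_age_exceeded(TransactionId slot_xmin)
{
    FullTransactionId next_full_xid = ReadNextFullTransactionId();
    TransactionId next_xid = XidFromFullTransactionId(next_full_xid);
    int32       age;

    if (max_slot_xid_age <= 0 || !TransactionIdIsNormal(slot_xmin))
        return false;

    /* Signed difference handles the circular XID space. */
    age = (int32) (next_xid - slot_xmin);
    return age > max_slot_xid_age;
}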
perl: unsafe empty pattern behavior
Moved from discussion on -committers: https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.ca...@j-davis.com Summary: Do not use perl empty patterns like // or qr// or s//.../, the behavior is too surprising for perl non-experts. There are a few such uses in our tests; patch attached. Unfortunately, there is no obvious way to automatically detect them so I am just relying on grep. I'm sure there are others here who know more about perl than I do, so suggestions/corrections are welcome. Long version: Some may know this already, but we just discovered the dangers of using empty patterns in perl: "If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead... If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match)." https://perldoc.perl.org/perlop#The-empty-pattern-// In other words, if you have code like: if ('xyz' =~ //) { print "'xyz' matches //\n"; } The match will succeed and print, because there's no previous pattern, so // is a "genuine" empty pattern, which is treated like /.*/ (I think?). Then, if you add some other code before it: if ('abc' =~ /abc/) { print "'abc' matches /abc/\n"; } if ('xyz' =~ //) { print "'xyz' matches //\n"; } The first match will succeed, but the second match will fail, because // is treated like /abc/. On reflection, that does seem very perl-like. But it can cause surprising action-at-a-distance if not used carefully, especially for those who aren't experts in perl. It's much safer to just not use the empty pattern. If you use qr// instead: https://perldoc.perl.org/perlop#qr/STRING/msixpodualn like: if ('abc' =~ qr/abc/) { print "'abc' matches /abc/\n"; } if ('xyz' =~ qr//) { print "'xyz' matches //\n"; } Then the second match may succeed or may fail, and it's not clear from the documentation what precise circumstances matter. It seems to fail on older versions of perl (like 5.16.3) and succeed on newer versions (5.38.2). However, it may also depend on when the qr// is [re]compiled, or regex flags, or locale, or may just be undefined. Regards, Jeff Davis From be5aa677e37180a8c1b0faebcceab5506b1c8130 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Mon, 11 Mar 2024 16:44:56 -0700 Subject: [PATCH v1] perl: avoid empty regex patterns Empty patterns have special behavior that uses the last successful pattern match. This behavior can be surprising, so remove empty patterns and instead match against exactly what is intended (e.g. /^$/ or /.*/). Unfortunately there's not an easy way to check for this in an automated way, so it's likely that some cases have been missed and will be missed in the future. This commit just cleans up known instances. 
Discussion: https://postgr.es/m/1548559.1710188...@sss.pgh.pa.us --- src/bin/pg_upgrade/t/003_logical_slots.pl | 4 ++-- src/bin/pg_upgrade/t/004_subscription.pl| 2 +- src/bin/psql/t/001_basic.pl | 8 src/bin/psql/t/010_tab_completion.pl| 2 +- src/test/recovery/t/037_invalid_database.pl | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl index 83d71c3084..256dfd53b1 100644 --- a/src/bin/pg_upgrade/t/003_logical_slots.pl +++ b/src/bin/pg_upgrade/t/003_logical_slots.pl @@ -78,7 +78,7 @@ command_checks_all( [ qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/ ], - [qr//], + [qr/^$/], 'run of pg_upgrade where the new cluster has insufficient max_replication_slots' ); ok( -d $newpub->data_dir . "/pg_upgrade_output.d", @@ -118,7 +118,7 @@ command_checks_all( [ qr/Your installation contains logical replication slots that can't be upgraded./ ], - [qr//], + [qr/^$/], 'run of pg_upgrade of old cluster with slots having unconsumed WAL records' ); diff --git a/src/bin/pg_upgrade/t/004_subscription.pl b/src/bin/pg_upgrade/t/004_subscription.pl index df5d6dffbc..c8ee2390d1 100644 --- a/src/bin/pg_upgrade/t/004_subscription.pl +++ b/src/bin/pg_upgrade/t/004_subscription.pl @@ -68,7 +68,7 @@ command_checks_all( [ qr/max_replication_slots \(0\) must be greater than or equal to the number of subscriptions \(1\) on the old cluster/ ], - [qr//], + [qr/^$/], 'run of pg_upgrade where the new cluster has insufficient max_replication_slots' ); diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl index 9f0b6cf8ca..ce875ce316 100644 --- a/src/bin/psql/t/001_basic.pl +++ b/src/bin/psql/t/001_basic.pl @@ -412,23 +412,23 @@ my $perlbin = $^X; $perlbin =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os; my $pipe_cmd = "$perlbin -pe '' >$g_file"; -psql_like($node, "SELECT 'one' \\g | $pipe_cmd", qr//, "one command \\g"); +psql_like($node, "SELECT 'one' \\g
Re: UUID v7
On Tue, 12 Mar 2024 at 18:18, Sergey Prokhorenko wrote: > Andrey and you simply did not read the RFC a little further down in the text: You're totally right, sorry about that. Maybe it would be good to move those subsections around a bit in the RFC though, so that anything related to only one method is included in the section for that method.
Re: On disable_cost
Robert Haas writes: > On Thu, Aug 3, 2023 at 5:22 AM Jian Guo wrote: >> I have write an initial patch to retire the disable_cost GUC, which labeled >> a flag on the Path struct instead of adding up a big cost which is hard to >> estimate. Though it involved in tons of plan changes in regression tests, I >> have tested on some simple test cases such as eagerly generate a two-stage >> agg plans and it worked. Could someone help to review? > I took a look at this patch today. I believe that overall this may > well be an approach worth pursuing. However, more work is going to be > needed. Here are some comments: > 1. You stated that it changes lots of plans in the regression tests, > but you haven't provided any sort of analysis of why those plans > changed. I'm kind of surprised that there would be "tons" of plan > changes. You (or someone) should look into why that's happening. I've not read the patch, but given this description I would expect there to be *zero* regression changes --- I don't think we have any test cases that depend on disable_cost being finite. If there's more than zero changes, that very likely indicates a bug in the patch. Even if there are places where the output legitimately changes, you need to justify each one and make sure that you didn't invalidate the intent of that test case. BTW, having written that paragraph, I wonder if we couldn't get the same end result with a nearly one-line change that consists of making disable_cost be IEEE infinity. Years ago we didn't want to rely on IEEE float semantics in this area, but nowadays I don't see why we shouldn't. regards, tom lane
Re: perl: unsafe empty pattern behavior
On 2024-Mar-12, Jeff Davis wrote: > Do not use perl empty patterns like // or qr// or s//.../, the behavior > is too surprising for perl non-experts. Yeah, nasty. > There are a few such uses in > our tests; patch attached. Unfortunately, there is no obvious way to > automatically detect them so I am just relying on grep. I'm sure there > are others here who know more about perl than I do, so > suggestions/corrections are welcome. I suggest that pg_dump/t/002_pg_dump.pl could use a verification that the ->{regexp} thing is not empty. I also tried grepping (for things like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you have ... but I only looked for the "qr" literal, not other ways to get regexes. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hmm, buildfarm member kestrel (which uses -fsanitize=undefined,alignment) failed: # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc dbname='postgres' test cancellations... libpq_pipeline:260: query did not fail when it was expected https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-03-12%2016%3A41%3A27 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom." (Isaac Asimov)
Re: On disable_cost
On Tue, Mar 12, 2024 at 1:32 PM Tom Lane wrote: > BTW, having written that paragraph, I wonder if we couldn't get > the same end result with a nearly one-line change that consists of > making disable_cost be IEEE infinity. Years ago we didn't want > to rely on IEEE float semantics in this area, but nowadays I don't > see why we shouldn't. I don't think so, because I think that what will happen in that case is that we'll pick a completely random plan if we can't pick a plan that avoids incurring disable_cost. Every plan that contains one disabled node anywhere in the plan tree will look like it has exactly the same cost as any other such plan. IMHO, this is actually one of the problems with disable_cost as it works today. I think the semantics that we want are: if it's possible to pick a plan where nothing is disabled, then pick the cheapest such plan; if not, pick the cheapest plan overall. But treating disable_cost doesn't really do that. It does the first part -- picking the cheapest plan where nothing is disabled -- but it doesn't do the second part, because once you add disable_cost into the cost of some particular plan node, it screws up the rest of the planning, because the cost estimates for the disabled nodes have no bearing in reality. Fast-start plans, for example, will look insanely good compared to what would be the case in normal planning (and we lean too much toward fast-start plans even normally). (I don't think we should care how MANY disabled nodes appear in a plan, particularly. This is a more arguable point. Is a plan with 1 disabled node and 10% more cost better or worse than a plan with 2 disabled nodes and 10% less cost? I'd argue that counting the number of disabled nodes isn't particularly meaningful.) -- Robert Haas EDB: http://www.enterprisedb.com
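To spell out the semantics described here (prefer any plan with no disabled nodes; among equals, prefer the cheaper one), a comparison rule could be sketched as below. The PathStub struct, the is_disabled flag, and the compare function are all hypothetical; nothing like this exists in add_path() today.

#include <stdbool.h>

typedef struct PathStub
{
    bool        is_disabled;    /* does the path tree contain a disabled node? */
    double      total_cost;
} PathStub;

/* Returns < 0 if a is preferable, > 0 if b is preferable, 0 if tied. */
static int
compare_paths_sketch(const PathStub *a, const PathStub *b)
{
    if (a->is_disabled != b->is_disabled)
        return a->is_disabled ? 1 : -1;     /* a non-disabled path always wins */
    if (a->total_cost != b->total_cost)
        return (a->total_cost < b->total_cost) ? -1 : 1;
    return 0;
}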
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-12, Alvaro Herrera wrote: > Hmm, buildfarm member kestrel (which uses > -fsanitize=undefined,alignment) failed: > > # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc > dbname='postgres' > test cancellations... > libpq_pipeline:260: query did not fail when it was expected Hm, I tried using the same compile flags, couldn't reproduce. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
Support json_errdetail in FRONTEND builds
Hello, Both the incremental JSON [1] and OAuth [2] patchsets would be improved by json_errdetail(), which was removed from FRONTEND builds in b44669b2ca: >The routine providing a detailed error message based on the error code >is made backend-only, the existing code being unsafe to use in the >frontend as the error message may finish by being palloc'd or point to a >static string, so there is no way to know if the memory of the message >should be pfree'd or not. Attached is a patch to undo this, by attaching any necessary allocations to the JsonLexContext so the caller doesn't have to keep track of them. This is based on the first patch of the OAuth patchset, which additionally needs json_errdetail() to be safe to use from libpq itself. Alvaro pointed out offlist that we don't have to go that far to re-enable this function for the utilities, so this patch is a sort of middle ground between what we have now and what OAuth implements. (There is some additional minimization that could be done to this patch, but I'm hoping to keep the code structure consistent between the two, if the result is acceptable.) Two notes that I wanted to point out explicitly: - On the other thread, Daniel contributed a destroyStringInfo() counterpart for makeStringInfo(), which is optional but seemed useful to include here. - After this patch, if a frontend client calls json_errdetail() without calling freeJsonLexContext(), it will leak memory. Thanks, --Jacob [1] https://www.postgresql.org/message-id/682c8fff-355c-a04f-57ac-81055c4ccda8%40dunslane.net [2] https://www.postgresql.org/message-id/CAOYmi%2BkKNZCL7uz-LHyBggM%2BfEcf4285pFWwm7spkUb8irY7mQ%40mail.gmail.com 0001-common-jsonapi-support-json_errdetail-in-FRONTEND.patch Description: Binary data
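For reference, frontend usage with this patch applied might look like the sketch below. The signatures are as in the v17 development tree at the time of writing and have moved around between branches, so treat this as a sketch of the call shape rather than exact API; PG_UTF8 and nullSemAction are just the usual defaults.

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "mb/pg_wchar.h"

static bool
parse_json_or_complain(const char *json)
{
    JsonLexContext *lex;
    JsonParseErrorType res;

    /* Passing NULL lets the lexer allocate the context for us. */
    lex = makeJsonLexContextCstringLen(NULL, json, strlen(json),
                                       PG_UTF8, true /* need_escapes */ );
    res = pg_parse_json(lex, &nullSemAction);

    if (res != JSON_SUCCESS)
        fprintf(stderr, "could not parse JSON: %s\n",
                json_errdetail(res, lex));

    /* Releases lexer state, including any error-detail allocations. */
    freeJsonLexContext(lex);
    return res == JSON_SUCCESS;
}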
Broken EXPLAIN output for SubPlan in MERGE
While playing around with EXPLAIN and SubPlans, I noticed that there's a bug in how this is handled for MERGE. For example: drop table if exists src, tgt, ref; create table src (a int, b text); create table tgt (a int, b text); create table ref (a int); explain (verbose, costs off) merge into tgt t using (select (select r.a from ref r where r.a = s.a) a, b from src s) s on t.a = s.a when not matched then insert values (s.a, s.b); QUERY PLAN --- Merge on public.tgt t -> Merge Left Join Output: t.ctid, s.a, s.b, s.ctid Merge Cond: (((SubPlan 1)) = t.a) -> Sort Output: s.a, s.b, s.ctid, ((SubPlan 1)) Sort Key: ((SubPlan 1)) -> Seq Scan on public.src s Output: s.a, s.b, s.ctid, (SubPlan 1) SubPlan 1 -> Seq Scan on public.ref r Output: r.a Filter: (r.a = s.a) -> Sort Output: t.ctid, t.a Sort Key: t.a -> Seq Scan on public.tgt t Output: t.ctid, t.a SubPlan 2 -> Seq Scan on public.ref r_1 Output: r_1.a Filter: (r_1.a = t.ctid) The final filter condition "(r_1.a = t.ctid)" is incorrect, and should be "(r_1.a = s.a)". What's happening is that the right hand side of that filter expression is an input Param node which get_parameter() tries to display by calling find_param_referent() and then drilling down through the ancestor node (the ModifyTable node) to try to find the real name of the variable (s.a). However, that isn't working properly for MERGE because the inner_plan and inner_tlist of the corresponding deparse_namespace aren't set correctly. Actually the inner_tlist is correct, but the inner_plan is set to the ModifyTable node, whereas it needs to be the outer child node -- in a MERGE, any references to the source relation will be INNER_VAR references to the targetlist of the join node immediately under the ModifyTable node. So I think we want to do something like the attached. Regards, Dean diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c new file mode 100644 index 2a1ee69..2231752 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -4988,8 +4988,11 @@ set_deparse_plan(deparse_namespace *dpns * For a WorkTableScan, locate the parent RecursiveUnion plan node and use * that as INNER referent. * - * For MERGE, make the inner tlist point to the merge source tlist, which - * is same as the targetlist that the ModifyTable's source plan provides. + * For MERGE, pretend the ModifyTable's source plan (its outer plan) is + * INNER referent. This is the join from the target relation to the data + * source, and all INNER_VAR Vars in other parts of the query refer to its + * targetlist. + * * For ON CONFLICT .. UPDATE we just need the inner tlist to point to the * excluded expression's tlist. 
(Similar to the SubqueryScan we don't want * to reuse OUTER, it's used for RETURNING in some modify table cases, @@ -5004,17 +5007,17 @@ set_deparse_plan(deparse_namespace *dpns dpns->inner_plan = find_recursive_union(dpns, (WorkTableScan *) plan); else if (IsA(plan, ModifyTable)) - dpns->inner_plan = plan; - else - dpns->inner_plan = innerPlan(plan); - - if (IsA(plan, ModifyTable)) { if (((ModifyTable *) plan)->operation == CMD_MERGE) - dpns->inner_tlist = dpns->outer_tlist; + dpns->inner_plan = outerPlan(plan); else - dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist; + dpns->inner_plan = plan; } + else + dpns->inner_plan = innerPlan(plan); + + if (IsA(plan, ModifyTable) && ((ModifyTable *) plan)->operation == CMD_INSERT) + dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist; else if (dpns->inner_plan) dpns->inner_tlist = dpns->inner_plan->targetlist; else diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out new file mode 100644 index e3ebf46..1a6f6ad --- a/src/test/regress/expected/merge.out +++ b/src/test/regress/expected/merge.out @@ -1473,6 +1473,56 @@ WHEN MATCHED AND t.a < 10 THEN DROP TABLE ex_msource, ex_mtarget; DROP FUNCTION explain_merge(text); +-- EXPLAIN SubPlans and InitPlans +CREATE TABLE src (a int, b int, c int, d int); +CREATE TABLE tgt (a int, b int, c int, d int); +CREATE TABLE ref (ab int, cd int); +EXPLAIN (verbose, costs off) +MERGE INTO tgt t +USING (SELECT *, (SELECT count(*) FROM ref r + WHERE r.ab = s.a + s.b + AND r.cd = s.c - s.d) cnt + FROM src s) s +ON t.a = s.a AND t.b < s.cnt +WHEN MATCHED AND t.c > s.cnt THEN + UPDATE SET (b, c) = (SELECT s.b, s.cnt); + QUERY PLAN +-
typo in paths.h
Hello, I noticed that the comment for declaring create_tidscan_paths() in src/include/optimizer/paths.h has a typo. The function is implemented in tidpath.c, not in tidpath.h as the comment says; the latter file does not exist. I made a small patch to correct it. Thank you Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca correct_source_filename.patch Description: Binary data
Re: On disable_cost
Robert Haas writes: > On Tue, Mar 12, 2024 at 1:32 PM Tom Lane wrote: >> BTW, having written that paragraph, I wonder if we couldn't get >> the same end result with a nearly one-line change that consists of >> making disable_cost be IEEE infinity. > I don't think so, because I think that what will happen in that case > is that we'll pick a completely random plan if we can't pick a plan > that avoids incurring disable_cost. Every plan that contains one > disabled node anywhere in the plan tree will look like it has exactly > the same cost as any other such plan. Good point. > IMHO, this is actually one of the problems with disable_cost as it > works today. Yeah. I keep thinking that the right solution is to not generate disabled paths in the first place if there are any other ways to produce the same relation. That has obvious order-of-operations problems though, and I've not been able to make it work. regards, tom lane
Re: On disable_cost
On Tue, Mar 12, 2024 at 3:36 PM Tom Lane wrote: > Yeah. I keep thinking that the right solution is to not generate > disabled paths in the first place if there are any other ways to > produce the same relation. That has obvious order-of-operations > problems though, and I've not been able to make it work. I've expressed the same view in the past. It would be nice not to waste planner effort on paths that we're just going to throw away, but I'm not entirely sure what you mean by "obvious order-of-operations problems." To me, it seems like what we'd need is to be able to restart the whole planner process if we run out of steam before we get done. For example, suppose we're planning a 2-way join where index and index-only scans are disabled, sorts are disabled, and nested loops and hash joins are disabled. There's no problem generating just the non-disabled scan types at the baserel level, but when we reach the join, we're going to find that the only non-disabled join type is a merge join, and we're also going to find that we have no paths that provide pre-sorted input, so we need to sort, which we're also not allowed to do. If we could give up at that point and restart planning, disabling all of the plan-choice constraints and now creating all paths for each RelOptInfo, then everything would, I believe, be just fine. We'd end up needing neither disable_cost nor the mechanism proposed by this patch. But in the absence of that, we need some way to privilege the non-disabled paths over the disabled ones -- and I'd prefer to have something more principled than disable_cost, if we can work out the details. -- Robert Haas EDB: http://www.enterprisedb.com
Re: typo in paths.h
On Wed, 13 Mar 2024 at 07:58, Cary Huang wrote: > I noticed that the comment for declaring create_tidscan_paths() in > src/include/optimizer/paths.h has a typo. The function is implemented in > tidpath.c, not tidpath.h as stated, which does not exist. Thank you. Pushed. David
Re: remaining sql/json patches
About 0002: I think we should just drop it. Look at the changes it produces in the plans for aliases XMLTABLE: > @@ -1556,7 +1556,7 @@ SELECT f.* FROM xmldata, LATERAL > xmltable('/ROWS/ROW[COUNTRY_NAME="Japan" or COU > Output: f."COUNTRY_NAME", f."REGION_ID" > -> Seq Scan on public.xmldata > Output: xmldata.data > - -> Table Function Scan on "xmltable" f > + -> Table Function Scan on "XMLTABLE" f > Output: f."COUNTRY_NAME", f."REGION_ID" > Table Function Call: XMLTABLE(('/ROWS/ROW[COUNTRY_NAME="Japan" or > COUNTRY_NAME="India"]'::text) PASSING (xmldata.data) COLUMNS "COUNTRY_NAME" > text, "REGION_ID" integer) > Filter: (f."COUNTRY_NAME" = 'Japan'::text) Here in text-format EXPLAIN, we already have the alias next to the "xmltable" moniker, when an alias is present. This matches the query itself as well as the labels used in the "Output:" display. If an alias is not present, then this says just 'Table Function Scan on "xmltable"' and the rest of the plans refers to this as "xmltable", so it's also fine. > @@ -1591,7 +1591,7 @@ SELECT f.* FROM xmldata, LATERAL > xmltable('/ROWS/ROW[COUNTRY_NAME="Japan" or COU > "Parent Relationship": "Inner", > > + > "Parallel Aware": false, > > + > "Async Capable": false, > > + > - "Table Function Name": "xmltable", > > + > + "Table Function Name": "XMLTABLE", > > + > "Alias": "f", > > + > "Output": ["f.\"COUNTRY_NAME\"", "f.\"REGION_ID\""], > > + > "Table Function Call": > "XMLTABLE(('/ROWS/ROW[COUNTRY_NAME=\"Japan\" or > COUNTRY_NAME=\"India\"]'::text) PASSING (xmldata.data) COLUMNS > \"COUNTRY_NAME\" text, \"REGION_ID\" integer)",+ This is the JSON-format explain. Notice that the "Alias" member already shows the alias "f", so the only thing this change is doing is uppercasing the "xmltable" to "XMLTABLE". We're not really achieving anything here. I think the only salvageable piece from this, **if anything**, is making the "xmltable" literal string into uppercase. That might bring a little clarity to the fact that this is a keyword and not a user-introduced name. In your 0003 I think this would only have relevance in this query, +-- JSON_TABLE() with alias +EXPLAIN (COSTS OFF, VERBOSE) +SELECT * FROM + JSON_TABLE( + jsonb 'null', 'lax $[*]' PASSING 1 + 2 AS a, json '"foo"' AS "b c" + COLUMNS ( + id FOR ORDINALITY, + "int" int PATH '$', + "text" text PATH '$' + )) json_table_func; + QUERY PLAN +-- -- + Table Function Scan on "JSON_TABLE" json_table_func + Output: id, "int", text + Table Function Call: JSON_TABLE('null'::jsonb, '$[*]' AS json_table_path_0 PASSING 3 AS a, '"foo"'::jsonb AS "b c" COLUMNS (id FOR ORDINALITY, "int" integer PATH '$', text text PATH '$') PLAN (json_table_path_0)) +(3 rows) and I'm curious to see what this would output if this was to be run without the 0002 patch. If I understand things correctly, the alias would be displayed anyway, meaning 0002 doesn't get us anything. Please do add a test with EXPLAIN (FORMAT JSON) in 0003. Thanks -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La vida es para el que se aventura"
Re: On disable_cost
On Wed, 13 Mar 2024 at 08:55, Robert Haas wrote: > But in the absence of that, we need some way to privilege the > non-disabled paths over the disabled ones -- and I'd prefer to have > something more principled than disable_cost, if we can work out the > details. The primary place I see issues with disable_cost is caused by STD_FUZZ_FACTOR. When you add 1.0e10 to a couple of modestly costly paths, it makes them appear fuzzily the same in cases where one could be significantly cheaper than the other. If we were to bump up disable_cost, it would make this problem worse. I think we do still need some way to pick the cheapest disabled path when there are no other options. One way would be to set fuzz_factor to 1.0 when either of the path costs in compare_path_costs_fuzzily() is >= disable_cost. clamp_row_est() does cap row estimates at MAXIMUM_ROWCOUNT (1e100), so I think there is some value of disable_cost that could almost certainly ensure we don't reach it because the path is truly expensive rather than disabled. So maybe the fix could be to set disable_cost to something like 1.0e110 and adjust compare_path_costs_fuzzily to not apply the fuzz_factor for paths >= disable_cost. However, I wonder if that risks the costs going infinite after a couple of cartesian joins. David
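A sketch of that tweak, factored as a helper next to compare_path_costs_fuzzily(): the helper itself is hypothetical, while disable_cost is the existing planner variable declared in optimizer/cost.h. The idea is only to stop applying the fuzz factor once a path carries the disable penalty, so disabled paths still compare by their real ordering.

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "optimizer/cost.h"     /* declares disable_cost */

static double
effective_fuzz(Path *path1, Path *path2, double fuzz_factor)
{
    if (path1->total_cost >= disable_cost ||
        path2->total_cost >= disable_cost)
        return 1.0;             /* compare disabled paths exactly */
    return fuzz_factor;
}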
Re: un-revert the MAINTAIN privilege and the pg_maintain predefined role
On Thu, Mar 07, 2024 at 10:50:00AM -0600, Nathan Bossart wrote: > Given all of this code was previously reviewed and committed, I am planning > to forge ahead and commit this early next week, provided no objections or > additional feedback materialize. Jeff Davis and I spent some additional time looking at this patch. There are existing inconsistencies among the privilege checks for the various maintenance commands, and the MAINTAIN privilege just builds on the status quo, with one exception. In the v1 patch, I proposed skipping privilege checks when VACUUM recurses to TOAST tables, which means that a user may be able to process a TOAST table for which they've concurrently lost privileges on the main relation (since each table is vacuumed in a separate transaction). It's easy enough to resolve this inconsistency by sending down the parent OID when recursing to a TOAST table and using that for the privilege checks. AFAICT this avoids any kind of cache lookup hazards because we hold a session lock on the main relation in this case. I've done this in the attached v2. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 1d58f64b1dcded53bd95c926c5476e129352c753 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 12 Mar 2024 10:55:33 -0500 Subject: [PATCH v2 1/1] Reintroduce MAINTAIN privilege and pg_maintain predefined role. Roles with MAINTAIN on a relation may run VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZED VIEW, CLUSTER, and LOCK TABLE on the relation. Roles with privileges of pg_maintain may run those same commands on all relations. This was previously committed for v16, but it was reverted in commit 151c22deee due to concerns about search_path tricks that could be used to escalate privileges to the table owner. Commits 2af07e2f74, 59825d1639, and c7ea3f4229 resolved these concerns by restricting search_path when running maintenance commands.
XXX: NEEDS CATVERSION BUMP Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/20240305161235.GA3478007%40nathanxps13 --- doc/src/sgml/ddl.sgml | 35 -- doc/src/sgml/func.sgml| 2 +- .../sgml/ref/alter_default_privileges.sgml| 4 +- doc/src/sgml/ref/analyze.sgml | 6 +- doc/src/sgml/ref/cluster.sgml | 10 +- doc/src/sgml/ref/grant.sgml | 3 +- doc/src/sgml/ref/lock.sgml| 4 +- .../sgml/ref/refresh_materialized_view.sgml | 5 +- doc/src/sgml/ref/reindex.sgml | 23 ++-- doc/src/sgml/ref/revoke.sgml | 2 +- doc/src/sgml/ref/vacuum.sgml | 6 +- doc/src/sgml/user-manag.sgml | 12 ++ src/backend/catalog/aclchk.c | 15 +++ src/backend/commands/analyze.c| 13 +- src/backend/commands/cluster.c| 43 +-- src/backend/commands/indexcmds.c | 34 ++--- src/backend/commands/lockcmds.c | 2 +- src/backend/commands/matview.c| 3 +- src/backend/commands/tablecmds.c | 18 +-- src/backend/commands/vacuum.c | 76 +++- src/backend/postmaster/autovacuum.c | 1 + src/backend/utils/adt/acl.c | 8 ++ src/bin/pg_dump/dumputils.c | 1 + src/bin/pg_dump/t/002_pg_dump.pl | 2 +- src/bin/psql/tab-complete.c | 6 +- src/include/catalog/pg_authid.dat | 5 + src/include/commands/tablecmds.h | 5 +- src/include/commands/vacuum.h | 5 +- src/include/nodes/parsenodes.h| 3 +- src/include/utils/acl.h | 5 +- .../expected/cluster-conflict-partition.out | 8 +- .../specs/cluster-conflict-partition.spec | 2 +- .../perl/PostgreSQL/Test/AdjustUpgrade.pm | 11 ++ src/test/regress/expected/cluster.out | 7 ++ src/test/regress/expected/create_index.out| 4 +- src/test/regress/expected/dependency.out | 22 ++-- src/test/regress/expected/privileges.out | 116 ++ src/test/regress/expected/rowsecurity.out | 34 ++--- src/test/regress/sql/cluster.sql | 5 + src/test/regress/sql/dependency.sql | 2 +- src/test/regress/sql/privileges.sql | 67 ++ 41 files changed, 456 insertions(+), 179 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 9d7e2c756b..8616a8e9cc 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1868,8 +1868,8 @@ ALTER TABLE products RENAME TO items; INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, - EXECUTE, USAGE, SET - and ALTER SYSTEM. + EXECUTE, USAGE, SET, + ALTER SYSTEM, and MAINTAIN. The privileges applicable to a particular object vary depending on the object's type (table, function, etc.). More detail about the meanings
Re: On disable_cost
David Rowley writes: > So maybe the fix could be to set disable_cost to something like > 1.0e110 and adjust compare_path_costs_fuzzily to not apply the > fuzz_factor for paths >= disable_cost. However, I wonder if that > risks the costs going infinite after a couple of cartesian joins. Perhaps. It still does nothing for Robert's point that once we're forced into using a "disabled" plan type, it'd be better if the disabled-ness didn't skew subsequent planning choices. On the whole I agree that getting rid of disable_cost entirely would be the way to go, if we can replace that with a separate boolean without driving up the cost of add_path too much. regards, tom lane
Re: Vectored I/O in bulk_write.c
One more observation while I'm thinking about bulk_write.c... hmm, it writes the data out and asks the checkpointer to fsync it, but doesn't call smgrwriteback(). I assume that means that on Linux the physical writeback sometimes won't happen until the checkpointer eventually calls fsync() sequentially, one segment file at a time. I see that it's difficult to decide how to do that though; unlike checkpoints, which have rate control/spreading, bulk writes could more easily flood the I/O subsystem in a burst. I expect most non-Linux systems' write-behind heuristics to fire up for bulk sequential writes, but on Linux where most users live, there is no write-behind heuristic AFAIK (on the most common file systems anyway), so you have to crank that handle if you want it to wake up and smell the coffee before it hits internal limits, but then you have to decide how fast to crank it. This problem will come into closer focus when we start talking about streaming writes. For the current non-streaming bulk_write.c coding, I don't have any particular idea of what to do about that, so I'm just noting the observation here. Sorry for the sudden wall of text/monologue; this is all a sort of reaction to bulk_write.c that I should perhaps have sent to the bulk_write.c thread, triggered by a couple of debugging sessions.
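Purely as an illustration of what "cranking that handle" could look like, bulk_write.c could issue writeback hints itself every so many blocks. The batching constant below is made up, and choosing it well (without flooding the I/O subsystem) is exactly the open question; the caller would advance first_unflushed after each hint.

#include "postgres.h"
#include "storage/smgr.h"

#define WRITEBACK_DISTANCE  128     /* blocks; arbitrary, for illustration only */

static void
maybe_issue_writeback(SMgrRelation smgr, ForkNumber forknum,
                      BlockNumber first_unflushed, BlockNumber next_blkno)
{
    /* Ask the kernel to start writing out the blocks buffered so far. */
    if (next_blkno - first_unflushed >= WRITEBACK_DISTANCE)
        smgrwriteback(smgr, forknum, first_unflushed,
                      next_blkno - first_unflushed);
}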
Recent 027_streaming_regress.pl hangs
Hi, Several animals are timing out while waiting for catchup, sporadically. I don't know why. The oldest example I have found so far by clicking around is: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-02-23%2015%3A44%3A35 So perhaps something was committed ~3 weeks ago triggered this. There are many examples since, showing as recoveryCheck failures. Apparently they are all on animals wrangled by Andres. Hmm. I think some/all share a physical host, they seem to have quite high run time variance, and they're using meson.
Re: Spurious pgstat_drop_replslot() call
On Tue, Mar 12, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote: > Sorry for a bit off-topic, but I've remembered an anomaly with a similar > assert: > https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org Thanks for the reminder. The invalidation path with the stats drop is only in 16~. > Maybe you would find it worth considering while working in this area... > (I've just run that reproducer on b36fbd9f8 and confirmed that the > assertion failure is still here.) Indeed, something needs to happen. I am not surprised that it still reproduces; nothing has changed with the locking of the stats entries. :/ -- Michael signature.asc Description: PGP signature
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 19:28, Alvaro Herrera wrote: > > On 2024-Mar-12, Alvaro Herrera wrote: > > > Hmm, buildfarm member kestrel (which uses > > -fsanitize=undefined,alignment) failed: > > > > # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc > > dbname='postgres' > > test cancellations... > > libpq_pipeline:260: query did not fail when it was expected > > Hm, I tried using the same compile flags, couldn't reproduce. Okay, it passed now it seems so I guess this test is flaky somehow. The error message and the timing difference between failed and succeeded buildfarm run clearly indicates that the pg_sleep ran its 180 seconds to completion (so cancel was never processed for some reason). **failed case** 282/285 postgresql:libpq_pipeline / libpq_pipeline/001_libpq_pipeline ERROR 191.56s exit status 1 **succeeded case** 252/285 postgresql:libpq_pipeline / libpq_pipeline/001_libpq_pipeline OK 10.01s 21 subtests passed I don't see any obvious reason for how this test can be flaky, but I'll think a bit more about it tomorrow.
Re: perl: unsafe empty pattern behavior
On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: > I suggest that pg_dump/t/002_pg_dump.pl could use a verification that > the ->{regexp} thing is not empty. I'm not sure how exactly to test for an empty pattern. The problem is, we don't really want to test for an empty pattern, because /(?^:)/ is fine. The problem is //, which gets turned into an actual pattern (perhaps empty or perhaps not), and by the time it's in the %tests hash, I think it's too late to distinguish. Again, I'm not a perl expert, so suggestions welcome. > I also tried grepping (for things > like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you > have ... but I only looked for the "qr" literal, not other ways to > get > regexes. I think that's fine. qr// seems the most dangerous, because it seems to behave differently in different versions of perl. Grepping for regexes in perl code is an "interesting" exercise. Regards, Jeff Davis
Re: perl: unsafe empty pattern behavior
Jeff Davis writes: > On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: >> I also tried grepping (for things >> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you >> have ... but I only looked for the "qr" literal, not other ways to >> get regexes. > I think that's fine. qr// seems the most dangerous, because it seems to > behave differently in different versions of perl. I wonder whether perlcritic has sufficiently deep understanding of Perl code that it could find these hazards. I already checked, and found that there's no built-in filter for this (at least not in the perlcritic version I have), but maybe we could write one? The rules seem to be plug-in modules, so you can make your own in principle. regards, tom lane
Re: Add new error_action COPY ON_ERROR "log"
On Tue, Mar 12, 2024 at 12:54:29PM +0530, Bharath Rupireddy wrote: > +1. But, do you want to add them now or later as a separate > patch/discussion altogether? The attached 0003 is what I had in mind: - Simplification of the LOG generated with quotes applied around the column name, don't see much need to add the relation name, either, for consistency and because the knowledge is known in the query. - A few more tests. - Some doc changes. >> Wouldn't it be better to squash the patches together, by the way? > > I guess not. They are two different things IMV. Well, 0001 is sitting doing nothing because the COPY code does not make use of it internally. -- Michael From c45474726e084faf876a319485995ce84eef8293 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 11 Mar 2024 11:37:49 + Subject: [PATCH v6 1/3] Add LOG_VERBOSITY option to COPY command This commit adds a new option LOG_VERBOSITY to set the verbosity of logged messages by COPY command. A value of 'verbose' can be used to emit more informative messages by the command, while the value of 'default (which is the default) can be used to not log any additional messages. More values such as 'terse', 'row_details' etc. can be added based on the need to the LOG_VERBOSITY option. An upcoming commit for emitting more info on soft errors by COPY FROM command with ON_ERROR 'ignore' uses this. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Masahiko Sawada Reviewed-by: Atsushi Torikoshi Discussion: https://www.postgresql.org/message-id/CALj2ACXNA0focNeriYRvQQaCGc4CsTuOnFbzF9LqTKNWxuJdhA%40mail.gmail.com --- src/include/commands/copy.h | 10 src/backend/commands/copy.c | 38 + src/bin/psql/tab-complete.c | 6 - src/test/regress/expected/copy2.out | 8 ++ src/test/regress/sql/copy2.sql | 2 ++ doc/src/sgml/ref/copy.sgml | 14 +++ src/tools/pgindent/typedefs.list| 1 + 7 files changed, 78 insertions(+), 1 deletion(-) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b3da3cb0be..99d183fa4d 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -40,6 +40,15 @@ typedef enum CopyOnErrorChoice COPY_ON_ERROR_IGNORE, /* ignore errors */ } CopyOnErrorChoice; +/* + * Represents verbosity of logged messages by COPY command. + */ +typedef enum CopyLogVerbosityChoice +{ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ +} CopyLogVerbosityChoice; + /* * A struct to hold COPY options, in a parsed form. All of these are related * to formatting, except for 'freeze', which doesn't really belong here, but @@ -73,6 +82,7 @@ typedef struct CopyFormatOptions bool *force_null_flags; /* per-column CSV FN flags */ bool convert_selectively; /* do selective binary conversion? */ CopyOnErrorChoice on_error; /* what to do when error happened */ + CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ List *convert_select; /* list of column names (can be NIL) */ } CopyFormatOptions; diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 056b6733c8..23eb8c9c79 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -428,6 +428,36 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* keep compiler quiet */ } +/* + * Extract a CopyLogVerbosityChoice value from a DefElem. 
+ */ +static CopyLogVerbosityChoice +defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) +{ + char *sval; + + /* + * If no parameter value given, assume the default value. + */ + if (def->arg == NULL) + return COPY_LOG_VERBOSITY_DEFAULT; + + /* + * Allow "default", or "verbose" values. + */ + sval = defGetString(def); + if (pg_strcasecmp(sval, "default") == 0) + return COPY_LOG_VERBOSITY_DEFAULT; + if (pg_strcasecmp(sval, "verbose") == 0) + return COPY_LOG_VERBOSITY_VERBOSE; + + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY LOG_VERBOSITY \"%s\" not recognized", sval), + parser_errposition(pstate, def->location))); + return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */ +} + /* * Process the statement option list for COPY. * @@ -454,6 +484,7 @@ ProcessCopyOptions(ParseState *pstate, bool freeze_specified = false; bool header_specified = false; bool on_error_specified = false; + bool log_verbosity_specified = false; ListCell *option; /* Support external use for option sanity checking */ @@ -613,6 +644,13 @@ ProcessCopyOptions(ParseState *pstate, on_error_specified = true; opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from); } + else if (strcmp(defel->defname, "log_verbosity") == 0) + { + if (log_verbosity_specified) +errorConflictingDefElem(defel, pstate); + log_verbosity_specified = true; + opts_out->
Re: meson: Specify -Wformat as a common warning flag for extensions
Hi, In "Re: meson: Specify -Wformat as a common warning flag for extensions" on Fri, 08 Mar 2024 10:05:27 -0600, "Tristan Partin" wrote: > Ok, I figured this out. -Wall implies -Wformat=1. We set warning_level > to 1 in the Meson project() call, which implies -Wall, and set -Wall > in CFLAGS for autoconf. That's the reason we don't get issues building > Postgres. A user making use of the pg_config --cflags option, as Sutou > is, *will* run into the aforementioned issues, since we don't > propogate -Wall into pg_config. > > $ gcc $(pg_config --cflags) -E - < /dev/null > /dev/null > cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’ > [-Wformat-security] > $ gcc -Wall $(pg_config --cflags) -E - < /dev/null > /dev/null > (nothing printed) Thanks for explaining this. You're right. This is the reason why we don't need this for PostgreSQL itself but we need this for PostgreSQL extensions. Sorry. I should have explained this in the first e-mail... What should we do to proceed this patch? Thanks, -- kou
Re: Disable LLVM bitcode generation with pgxs.mk framework.
> On Tue, Mar 12, 2024 at 10:40 PM Daniel Gustafsson wrote: > > > On 12 Mar 2024, at 14:38, Xing Guo wrote: > > > Would it be possible to add a new switch in the pgxs.mk framework to > > allow users to disable this feature? > > Something like that doesn't seem unreasonable I think. Thanks. I added a new option NO_LLVM_BITCODE to pgxs. I'm not sure if the name is appropriate. > -- > Daniel Gustafsson > From e19a724fad4949bef9bc4d0f8e58719607d979be Mon Sep 17 00:00:00 2001 From: Xing Guo Date: Wed, 13 Mar 2024 07:56:46 +0800 Subject: [PATCH v1] Add NO_LLVM_BITCODE option to pgxs. This patch adds a new option NO_LLVM_BITCODE to pgxs to allow user to disable LLVM bitcode generation completely even if the PostgreSQL installation is configured with --with-llvm. --- doc/src/sgml/extend.sgml | 9 + src/makefiles/pgxs.mk| 6 ++ 2 files changed, 15 insertions(+) diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml index 218940ee5c..6fe69746c2 100644 --- a/doc/src/sgml/extend.sgml +++ b/doc/src/sgml/extend.sgml @@ -1719,6 +1719,15 @@ include $(PGXS) + + NO_LLVM_BITCODE + + +don't generate LLVM bitcode even if the current PostgreSQL installation is configured with --with-llvm + + + + EXTRA_CLEAN diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk index 0de3737e78..ec6a3c1f09 100644 --- a/src/makefiles/pgxs.mk +++ b/src/makefiles/pgxs.mk @@ -53,6 +53,8 @@ # that don't need their build products to be installed # NO_INSTALLCHECK -- don't define an installcheck target, useful e.g. if # tests require special configuration, or don't use pg_regress +# NO_LLVM_BITCODE -- don't generate LLVM bitcode even if the current +# PostgreSQL installation is configured with --with-llvm # EXTRA_CLEAN -- extra files to remove in 'make clean' # PG_CPPFLAGS -- will be prepended to CPPFLAGS # PG_CFLAGS -- will be appended to CFLAGS @@ -218,6 +220,10 @@ endef all: $(PROGRAM) $(DATA_built) $(HEADER_allbuilt) $(SCRIPTS_built) $(addsuffix $(DLSUFFIX), $(MODULES)) $(addsuffix .control, $(EXTENSION)) +ifdef NO_LLVM_BITCODE +with_llvm := no +endif # NO_LLVM_BITCODE + ifeq ($(with_llvm), yes) all: $(addsuffix .bc, $(MODULES)) $(patsubst %.o,%.bc, $(OBJS)) endif -- 2.44.0
Re: Add basic tests for the low-level backup method.
On 2/29/24 16:55, Michael Paquier wrote: On Thu, Feb 29, 2024 at 10:30:52AM +1300, David Steele wrote: On 11/12/23 08:21, David Steele wrote: As promised in [1], attached are some basic tests for the low-level backup method. Added to the 2024-03 CF. There is already 040_standby_failover_slots_sync.pl in recovery/ that uses the number of your test script. You may want to bump it, that's a nit. Renamed to 042_low_level_backup.pl. +unlink("$backup_dir/postmaster.pid") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.pid"); +unlink("$backup_dir/postmaster.opts") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.opts"); +unlink("$backup_dir/global/pg_control") + or BAIL_OUT("unable to unlink $backup_dir/global/pg_control"); RELCACHE_INIT_FILENAME as well? I'm not trying to implement the full exclusion list here, just enough to get the test working. Since exclusions are optional according to the docs I don't think we need them for a valid tests. +# Rather than writing out backup_label, try to recover the backup without +# backup_label to demonstrate that recovery will not work correctly without it, +# i.e. the canary table will be missing and the cluster will be corrupt. Provide +# only the WAL segment that recovery will think it needs. Okay, why not. No objections to this addition. I am a bit surprised that this is not scanning some of the logs produced by the startup process for particular patterns. Not sure what to look for here. There are no distinct messages for crash recovery. Perhaps there should be? +# Save backup_label into the backup directory and recover using the primary's +# archive. This time recovery will succeed and the canary table will be +# present. Here are well, I think that we should add some log_contains() with pre-defined patterns to show that recovery has completed the way we want it with a backup_label up to the end-of-backup record. Sure, I added a check for the new log message when recovering with a backup_label. Regards, -DavidFrom 076429a578ece3c213525220a4961cb5b56c190a Mon Sep 17 00:00:00 2001 From: David Steele Date: Wed, 13 Mar 2024 00:01:55 + Subject: Add basic tests for the low-level backup method. There are currently no tests for the low-level backup method. pg_backup_start() and pg_backup_stop() are called but not exercised in any real fashion. There is a lot more that can be done, but this at least provides some basic tests and provides a place for future improvement. --- src/test/recovery/t/042_low_level_backup.pl | 108 1 file changed, 108 insertions(+) create mode 100644 src/test/recovery/t/042_low_level_backup.pl diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl new file mode 100644 index 00..22668bdc0d --- /dev/null +++ b/src/test/recovery/t/042_low_level_backup.pl @@ -0,0 +1,108 @@ + +# Copyright (c) 2023, PostgreSQL Global Development Group + +# Test low-level backup method by using pg_backup_start() and pg_backup_stop() +# to create backups. 
+ +use strict; +use warnings; + +use File::Copy qw(copy); +use File::Path qw(remove_tree); +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# Start primary node with archiving +my $node_primary = PostgreSQL::Test::Cluster->new('primary'); +$node_primary->init(has_archiving => 1, allows_streaming => 1); +$node_primary->start; + +# Start backup +my $backup_name = 'backup1'; +my $psql = $node_primary->background_psql('postgres'); + +$psql->query_safe("SET client_min_messages TO WARNING"); +$psql->set_query_timer_restart(); +$psql->query_safe("select pg_backup_start('test label')"); + +# Copy files +my $backup_dir = $node_primary->backup_dir() . '/' . $backup_name; + +PostgreSQL::Test::RecursiveCopy::copypath( + $node_primary->data_dir(), $backup_dir); + +# Cleanup some files/paths that should not be in the backup. There is no +# attempt to all the exclusions done by pg_basebackup here, in part because +# they are not required, but also to keep the test simple. +# +# Also remove pg_control because it needs to be copied later and it will be +# obvious if the copy fails. +unlink("$backup_dir/postmaster.pid") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.pid"); +unlink("$backup_dir/postmaster.opts") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.opts"); +unlink("$backup_dir/global/pg_control") + or BAIL_OUT("unable to unlink $backup_dir/global/pg_control"); + +remove_tree("$backup_dir/pg_wal", {keep_root => 1}) + or BAIL_OUT("unable to unlink contents of $backup_dir/pg_wal"); + +# Create a table that will be used to verify that recovery started at the +# correct location, rather than the location recorded in pg_control +$psql->query_safe("create table canary (id int)"); + +# Advance the checkpoint location in pg_control past the location where the +# backup started. Switch the WAL to make it really ob
Re: Transaction timeout
On Tue, Mar 12, 2024 at 10:28 AM Andrey M. Borodin wrote: > > On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > > > > I think if checking psql stderr is problematic, checking just logs is > > fine. Could we wait for the relevant log messages one by one with > > $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? > > PFA version with $node->wait_for_log() I've slightly revised the patch. I'm going to push it if no objections. -- Regards, Alexander Korotkov v5-0001-Add-TAP-tests-for-timeouts.patch Description: Binary data
Re: [PROPOSAL] Skip test citext_utf8 on Windows
On 2024-03-11 Mo 22:50, Thomas Munro wrote: On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan wrote: On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote: Greetings, everyone! While running "installchecks" on databases with UTF-8 encoding the test citext_utf8 fails because of Turkish dotted I like this: SELECT 'i'::citext = 'İ'::citext AS t; t --- - t + f (1 row) I tried to replicate the test's results by hand and with any collation that I tried (including --locale="Turkish") this test failed Also an interesing result of my tesing. If you initialize you DB with -E utf-8 --locale="Turkish" and then run select LOWER('İ'); the output will be this: lower --- İ (1 row) Which I find strange since lower() uses collation that was passed (default in this case but still) Wouldn't we be better off finding a Windows fix for this, instead of sweeping it under the rug? Given the sorry state of our Windows locale support, I've started wondering about deleting it and telling users to adopt our nascent built-in support or ICU[1]. This other thread [2] says the sorting is intransitive so I don't think it really meets our needs anyway. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e [2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org Makes more sense than just hacking the tests to avoid running them on Windows. (I also didn't much like doing it by parsing the version string, although I know there's at least one precedent for doing that.) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Support json_errdetail in FRONTEND builds
On 2024-03-12 Tu 14:43, Jacob Champion wrote: Hello, Both the incremental JSON [1] and OAuth [2] patchsets would be improved by json_errdetail(), which was removed from FRONTEND builds in b44669b2ca: The routine providing a detailed error message based on the error code is made backend-only, the existing code being unsafe to use in the frontend as the error message may finish by being palloc'd or point to a static string, so there is no way to know if the memory of the message should be pfree'd or not. Attached is a patch to undo this, by attaching any necessary allocations to the JsonLexContext so the caller doesn't have to keep track of them. This is based on the first patch of the OAuth patchset, which additionally needs json_errdetail() to be safe to use from libpq itself. Alvaro pointed out offlist that we don't have to go that far to re-enable this function for the utilities, so this patch is a sort of middle ground between what we have now and what OAuth implements. (There is some additional minimization that could be done to this patch, but I'm hoping to keep the code structure consistent between the two, if the result is acceptable.) Seems reasonable. Two notes that I wanted to point out explicitly: - On the other thread, Daniel contributed a destroyStringInfo() counterpart for makeStringInfo(), which is optional but seemed useful to include here. yeah, although maybe worth a different patch. - After this patch, if a frontend client calls json_errdetail() without calling freeJsonLexContext(), it will leak memory. Not too concerned about this: 1. we tend to be a bit more relaxed about that in frontend programs, especially those not expected to run for long times and especially on error paths, where in many cases the program will just exit anyway. 2. the fix is simple where it's needed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Improve eviction algorithm in ReorderBuffer
On Tue, Mar 12, 2024 at 4:23 PM Masahiko Sawada wrote:
>
> On Fri, Mar 8, 2024 at 12:58 PM Peter Smith wrote:
> > ...
> > > > 5.
> > > > + *
> > > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > > + * index in the heap, enabling to perform some operations such as removing
> > > > + * the node from the heap.
> > > > */
> > > > binaryheap *
> > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > > + bool indexed, void *arg)
> > > >
> > > > BEFORE
> > > > ... enabling to perform some operations such as removing the node from the heap.
> > > >
> > > > SUGGESTION
> > > > ... to help make operations such as removing nodes more efficient.
> > > >
> > >
> > > But these operations literally require the indexed binary heap as we have an assertion:
> > >
> > > void
> > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > > {
> > >     bh_nodeidx_entry *ent;
> > >
> > >     Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > >     Assert(heap->bh_indexed);
> > >
> >
> > I didn’t quite understand -- the operations mentioned are "operations such as removing the node", but binaryheap_remove_node() also removes a node from the heap. So I still felt the comment wording of the patch is not quite correct.
>
> Now I understand your point. That's a valid point.
>
> > Now, if the removal of a node from an indexed heap can *only* be done using binaryheap_remove_node_ptr() then:
> > - the other removal functions (binaryheap_remove_*) probably need some comments to make sure nobody is tempted to call them directly for an indexed heap.
> > - maybe some refactoring and assertions are needed to ensure those *cannot* be called directly for an indexed heap.
> >
>
> If the 'index' is true, the caller can not only use the existing functions but also newly added functions such as binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about something like below?
>

You said: "can not only use the existing functions but also..."

Hmm. Is that right? IIUC those existing "remove" functions should NOT be called directly if the heap was "indexed" because they'll delete the node from the heap OK, but any corresponding index for that deleted node will be left lying around -- i.e. everything gets out of sync. This was the reason for my original concern.

> * If 'indexed' is true, we create a hash table to track each node's
> * index in the heap, enabling to perform some operations such as
> * binaryheap_remove_node_ptr() etc.
>

Yeah, something like that... I'll wait for the next patch version before commenting further.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Mar 12, 2024 at 7:34 PM John Naylor wrote:
>
> On Mon, Mar 11, 2024 at 3:13 PM Masahiko Sawada wrote:
> >
> > On Mon, Mar 11, 2024 at 12:20 PM John Naylor wrote:
> > >
> > > On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada wrote:
> > > > + ts->context = CurrentMemoryContext;
> > >
> > > As far as I can tell, this member is never accessed again -- am I missing something?
> >
> > You're right. It was used to re-create the tidstore in the same context again while resetting it, but we no longer support the reset API. Considering it again, would it be better to allocate the iterator struct in the same context as we store the tidstore struct?
>
> That makes sense.
>
> > > > + /* DSA for tidstore will be detached at the end of session */
> > >
> > > No other test module pins the mapping, but that doesn't necessarily mean it's wrong. Is there some advantage over explicitly detaching?
> >
> > One small benefit of not explicitly detaching dsa_area in tidstore_destroy() would be simplicity; IIUC if we want to do that, we need to remember the dsa_area using (for example) a static variable, and free it if it's non-NULL. I've implemented this idea in the attached patch.
>
> Okay, I don't have a strong preference at this point.

I'd keep the update on that.

> > > > +-- Add tids in random order.
> > >
> > > I don't see any randomization here. I do remember adding row_number to remove whitespace in the output, but I don't remember a random order. On that subject, the row_number was an easy trick to avoid extra whitespace, but maybe we should just teach the setting function to return blocknumber rather than null?
> >
> > Good idea, fixed.
>
> + test_set_block_offsets
> +
> + 2147483647
> + 0
> + 4294967294
> + 1
> + 4294967295
>
> Hmm, was the earlier comment about randomness referring to this? I'm not sure what other regression tests do in these cases, or how reliable this is. If this is a problem we could simply insert this result into a temp table so it's not output.

I didn't address the comment about randomness. I think that we will have both random TIDs tests and fixed TIDs tests in test_tidstore as we discussed, and probably we can do both tests with similar steps; insert TIDs into both a temp table and tidstore and check if the tidstore returned the results as expected by comparing the results to the temp table. Probably we can have a common pl/pgsql function that checks that and raises a WARNING or an ERROR.

Given that this is very similar to what we did in test_radixtree, why do we really want to implement it using a pl/pgsql function? When we discussed it before, I found the current way makes sense. But given that we're adding more tests and will add more tests in the future, doing the tests in C will be more maintainable and faster. Also, I think we can do the debug-build array stuff in the test_tidstore code instead.

> > > +Datum
> > > +tidstore_create(PG_FUNCTION_ARGS)
> > > +{
> > > ...
> > > + tidstore = TidStoreCreate(max_bytes, dsa);
> > >
> > > +Datum
> > > +tidstore_set_block_offsets(PG_FUNCTION_ARGS)
> > > +{
> > >
> > > + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs);
> > >
> > > These names are too similar. Maybe the test module should do s/tidstore_/test_/ or similar.
> >
> > Agreed.
>
> Mostly okay, although a couple look a bit generic now. I'll leave it up to you if you want to tweak things.
>
> > > In general, the .sql file is still very hard-coded. Functions are created that contain a VALUES statement. Maybe it's okay for now, but wanted to mention it. Ideally, we'd have some randomized tests, without having to display it. That could be in addition to (not replacing) the small tests we have that display input. (see below)
> >
> > Agreed to add randomized tests in addition to the existing tests.
>
> I'll try something tomorrow.
>
> > Sounds a good idea. In fact, if there are some bugs in tidstore, it's likely that even initdb would fail in practice. However, it's a very good idea that we can test the tidstore anyway with such a check without a debug-build array.
> >
> > Or as another idea, I wonder if we could keep the debug-build array in some form. For example, we use the array with the particular build flag and set a BF animal for that. That way, we can test the tidstore in more real cases.
>
> I think the purpose of a debug flag is to help developers catch mistakes. I don't think it's quite useful enough for that. For one, it has the same 1GB limitation as vacuum's current array. For another, it'd be a terrible way to debug moving tidbitmap.c from its hash table to use TID store -- AND/OR operations and lossy pages are pretty much undoable with a copy of vacuum's array.

Valid points. As I mentioned above, if we im
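To make the temp-table comparison idea above concrete, a rough pl/pgsql sketch; the table name, the test_dump_tids() set-returning function, and the column names are hypothetical placeholders, not functions from the actual test module:

-- Hypothetical sketch: compare the TIDs recorded in a temp table against
-- what the tidstore reports back. test_dump_tids() is a placeholder name.
CREATE TEMP TABLE expected_tids (blk bigint, off int);

CREATE OR REPLACE FUNCTION check_tidstore_matches() RETURNS void AS $$
BEGIN
    IF EXISTS (
        (SELECT blk, off FROM expected_tids
         EXCEPT
         SELECT blk, off FROM test_dump_tids())
        UNION ALL
        (SELECT blk, off FROM test_dump_tids()
         EXCEPT
         SELECT blk, off FROM expected_tids))
    THEN
        RAISE WARNING 'tidstore contents do not match the expected TIDs';
    END IF;
END;
$$ LANGUAGE plpgsql;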
Re: Improve eviction algorithm in ReorderBuffer
On Wed, Mar 13, 2024 at 10:15 AM Peter Smith wrote:
>
> On Tue, Mar 12, 2024 at 4:23 PM Masahiko Sawada wrote:
> >
> > On Fri, Mar 8, 2024 at 12:58 PM Peter Smith wrote:
> > > ...
> > > > > 5.
> > > > > + *
> > > > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > > > + * index in the heap, enabling to perform some operations such as removing
> > > > > + * the node from the heap.
> > > > > */
> > > > > binaryheap *
> > > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > > > + bool indexed, void *arg)
> > > > >
> > > > > BEFORE
> > > > > ... enabling to perform some operations such as removing the node from the heap.
> > > > >
> > > > > SUGGESTION
> > > > > ... to help make operations such as removing nodes more efficient.
> > > > >
> > > >
> > > > But these operations literally require the indexed binary heap as we have an assertion:
> > > >
> > > > void
> > > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > > > {
> > > >     bh_nodeidx_entry *ent;
> > > >
> > > >     Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > > >     Assert(heap->bh_indexed);
> > > >
> > >
> > > I didn’t quite understand -- the operations mentioned are "operations such as removing the node", but binaryheap_remove_node() also removes a node from the heap. So I still felt the comment wording of the patch is not quite correct.
> >
> > Now I understand your point. That's a valid point.
> >
> > > Now, if the removal of a node from an indexed heap can *only* be done using binaryheap_remove_node_ptr() then:
> > > - the other removal functions (binaryheap_remove_*) probably need some comments to make sure nobody is tempted to call them directly for an indexed heap.
> > > - maybe some refactoring and assertions are needed to ensure those *cannot* be called directly for an indexed heap.
> > >
> >
> > If the 'index' is true, the caller can not only use the existing functions but also newly added functions such as binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about something like below?
> >
>
> You said: "can not only use the existing functions but also..."
>
> Hmm. Is that right? IIUC those existing "remove" functions should NOT be called directly if the heap was "indexed" because they'll delete the node from the heap OK, but any corresponding index for that deleted node will be left lying around -- i.e. everything gets out of sync. This was the reason for my original concern.
>

All existing binaryheap functions should be available even if the binaryheap is 'indexed'. For instance, with the patch, binaryheap_remove_node() is:

void
binaryheap_remove_node(binaryheap *heap, int n)
{
    int         cmp;

    Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
    Assert(n >= 0 && n < heap->bh_size);

    /* compare last node to the one that is being removed */
    cmp = heap->bh_compare(heap->bh_nodes[--heap->bh_size],
                           heap->bh_nodes[n],
                           heap->bh_arg);

    /* remove the last node, placing it in the vacated entry */
    replace_node(heap, n, heap->bh_nodes[heap->bh_size]);

    /* sift as needed to preserve the heap property */
    if (cmp > 0)
        sift_up(heap, n);
    else if (cmp < 0)
        sift_down(heap, n);
}

The replace_node(), sift_up() and sift_down() update the node's index as well if the binaryheap is indexed.
When a node is deleted from the binaryheap, its index is also deleted from the hash table.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
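As an aside, a minimal sketch of how the indexed API discussed here is meant to be used; the 'indexed' argument and binaryheap_remove_node_ptr() come from the patch under discussion rather than the committed binaryheap API, and txn_compare and MAX_HEAP_TXNS are made-up names:

/* Sketch only: signatures follow the patch, not committed code. */
binaryheap *txn_heap;

txn_heap = binaryheap_allocate(MAX_HEAP_TXNS, txn_compare,
                               true /* indexed */, NULL);

/* bh_node_type is Datum in the backend, so entries go in as pointers */
binaryheap_add(txn_heap, PointerGetDatum(txn));

/*
 * Because the heap maintains a node-index hash table, a specific entry
 * can later be removed by value without scanning the node array, and the
 * hash table entry is dropped along with the node.
 */
binaryheap_remove_node_ptr(txn_heap, PointerGetDatum(txn));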
Re: CI speed improvements for FreeBSD
On Wed, Mar 13, 2024 at 4:50 AM Maxim Orlov wrote:
> I looked at the changes and I liked them. Here are my thoughts:

Thanks for looking! Pushed.