Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-13 Thread Heikki Linnakangas

On 11/04/2024 16:37, Ranier Vilela wrote:
On Thu, 11 Apr 2024 at 09:54, Heikki Linnakangas <hlinn...@iki.fi> wrote:


On 11/04/2024 15:03, Ranier Vilela wrote:
 > On Wed, 10 Apr 2024 at 18:28, Heikki Linnakangas
 > <hlinn...@iki.fi> wrote:
 >
 >     On 10/04/2024 21:07, Ranier Vilela wrote:
 >      > Hi,
 >      >
 >      > Per Coverity.
 >      >
 >      > The function ReorderBufferTXNByXid,
 >      > can return NULL when the parameter *create* is false.
 >      >
 >      > In the functions ReorderBufferSetBaseSnapshot
 >      > and ReorderBufferXidHasBaseSnapshot,
 >      > the second call to ReorderBufferTXNByXid,
 >      > pass false to *create* argument.
 >      >
 >      > In the function ReorderBufferSetBaseSnapshot,
 >      > fixed passing true as argument to always return
 >      > a valid ReorderBufferTXN pointer.
 >      >
 >      > In the function ReorderBufferXidHasBaseSnapshot,
 >      > fixed by checking if the pointer is NULL.
 >
 >     If it's a "known subxid", the top-level XID should already
have its
 >     ReorderBufferTXN entry, so ReorderBufferTXN() should never
return NULL.
 >
 > There are several conditions to not return NULL,
 > I think trusting never is insecure.

Well, you could make it an elog(ERROR, ..) instead. But the point is
that it should not happen, and if it does for some reason, that's very
surprising and there is a bug somewhere. In that case, we should *not*
just blindly create it and proceed as if everything was OK.

Thanks for the clarification.
I will then suggest improving robustness,
but avoiding hiding any possible errors that may occur.


I don't much like adding extra runtime checks for "can't happen" 
scenarios either. Assertions would be more clear, but in this case the 
code would just segfault trying to dereference the NULL pointer, which 
is fine for a "can't happen" scenario.
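For concreteness, the two options discussed would look roughly like this
(a sketch only; the variable naming is illustrative, not from any patch):

	ReorderBufferTXN *toptxn;

	toptxn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
								   NULL, InvalidXLogRecPtr, false);

	/* Option 1: report a hard error instead of segfaulting later ... */
	if (toptxn == NULL)
		elog(ERROR, "top-level transaction %u not found", txn->toplevel_xid);

	/* Option 2: ... or just assert the "can't happen" expectation */
	Assert(toptxn != NULL);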


Looking closer, when we identify an XID as a subtransaction, we:
- assign toplevel_xid,
- set RBTXN_IS_SUBXACT, and
- assign toptxn pointer.

ISTM the 'toplevel_xid' and RBTXN_IS_SUBXACT are redundant. 
'toplevel_xid' is only used in those two calls that do 
ReorderBufferTXNByXid(rb, txn->toplevel_xid,...), and you could replace 
those by following the 'toptxn' pointer directly. And RBTXN_IS_SUBXACT 
is redundant with (toptxn != NULL). So here's a patch to remove 
'toplevel_xid' and RBTXN_IS_SUBXACT altogether.
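Concretely, the two call sites change along these lines (sketch; Ranier
quotes the actual hunk downthread):

	/* before: look the top-level transaction up by its XID again */
	if (rbtxn_is_known_subxact(txn))
		txn = ReorderBufferTXNByXid(rb, txn->toplevel_xid, false,
									NULL, InvalidXLogRecPtr, false);

	/* after: just follow the pointer we already maintain */
	if (rbtxn_is_subtxn(txn))
		txn = rbtxn_get_toptxn(txn);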


Amit, you added 'toptxn' in commit c55040ccd017; does this look right to 
you?


--
Heikki Linnakangas
Neon (https://neon.tech)
From e8452054a79034d070407775dfcd8b9754602cb9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sat, 13 Apr 2024 10:02:41 +0300
Subject: [PATCH 1/1] Remove redundant field and flag from ReorderBufferTXN

RBTXN_IS_SUBXACT and toplevel_xid are redundant with the
ReorderBufferTXN->toptxn field.
---
 .../replication/logical/reorderbuffer.c   | 28 ---
 src/include/replication/reorderbuffer.h   | 15 ++
 2 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e771..fb0dbec155c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -947,7 +947,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
 			Assert(prev_first_lsn < cur_txn->first_lsn);
 
 		/* known-as-subtxn txns must not be listed */
-		Assert(!rbtxn_is_known_subxact(cur_txn));
+		Assert(!rbtxn_is_subtxn(cur_txn));
 
 		prev_first_lsn = cur_txn->first_lsn;
 	}
@@ -967,7 +967,7 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
 			Assert(prev_base_snap_lsn < cur_txn->base_snapshot_lsn);
 
 		/* known-as-subtxn txns must not be listed */
-		Assert(!rbtxn_is_known_subxact(cur_txn));
+		Assert(!rbtxn_is_subtxn(cur_txn));
 
 		prev_base_snap_lsn = cur_txn->base_snapshot_lsn;
 	}
@@ -1022,7 +1022,7 @@ ReorderBufferGetOldestTXN(ReorderBuffer *rb)
 
 	txn = dlist_head_element(ReorderBufferTXN, node, &rb->toplevel_by_lsn);
 
-	Assert(!rbtxn_is_known_subxact(txn));
+	Assert(!rbtxn_is_subtxn(txn));
 	Assert(txn->first_lsn != InvalidXLogRecPtr);
 	return txn;
 }
@@ -1079,7 +1079,7 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 
 	if (!new_sub)
 	{
-		if (rbtxn_is_known_subxact(subtxn))
+		if (rbtxn_is_subtxn(subtxn))
 		{
 			/* already associated, nothing to do */
 			return;
@@ -1095,8 +1095,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 		}
 	}
 
-	subtxn->txn_flags |= RBTXN_IS_SUBXACT;
-	subtxn->toplevel_xid = xid;
 	Assert(subtxn->nsubtxns == 0);
 
 	/* set the reference to top-level transaction */
@@ -1135,8 +1133,6 @@ static void
 ReorderBufferTransferSnapToParent(ReorderBufferTXN *txn,
   ReorderBufferTXN *subtxn)
 {
-	Asser

In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Dmitry Koterov
Hi.

Preamble: this happens in MacOS only (in built-in Terminal and in iTerm2 at
least). In Linux, everything is as expected.

I almost lost my mind today trying to figure out why sending a SIGINT
precisely to a psql interactive process delivers this SIGINT not only to
that psql, but also to its parents.

Notice that it is NOT about Ctrl-C at all (which would've been delivered to
the whole process group; but again, it's not about it, there is no Ctrl-C;
I even used pliers to remove "Ctrl" and "C" keys from my keyboard). It is
about a plain old "kill -SIGINT psql_pid" executed manually in a separate
shell.

I tried to find a place in the psql source code where it fans out SIGINT to
its parent pids upon receiving it, but I did not find any. There is also no
shell involved in any of this (see logs below).

Any ideas who is doing this on MacOS? Who is sending 2 extra SIGINTs to 2
parents when a SIGINT is delivered to psql?

I also tried to use any other signal-trapping program instead of psql (like
vim or an artificial Perl script which ignores signals) and did not observe
the same behavior. It looks like something very specific to psql.

Steps to reproduce:

===

*Terminal 1:*

% psql --version
psql (PostgreSQL) 16.0

% cat /tmp/my1.pl
#!/usr/bin/perl -w
if (fork()) {
  $SIG{INT} = sub { print("int in my1\n") };
  while (1) { wait() and exit(1); }
} else {
  exec("/tmp/my2.pl");
}

% cat /tmp/my2.pl
#!/usr/bin/perl -w
if (fork()) {
  $SIG{INT} = sub { print("int in my2\n") };
  while (1) { wait() and exit(1); }
} else {
  exec("psql");
}

# PostgreSQL server is running in Docker, localhost remapped port 15432
% PGHOST=127.0.0.1 PGPORT=15432 PGUSER=postgres PGPASSWORD=postgres
PGDATABASE=postgres /tmp/my1.pl
psql (16.0, server 13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
postgres=#


*Terminal 2:*

% pstree | grep -E "bash |yarn|psql|vim|sleep|lldb|my.pl|perl" | grep
dmitry | grep -v grep

 |   |   \-+= 24550 dmitry /usr/bin/perl -w /tmp/my1.pl
 |   | \-+- 24551 dmitry /usr/bin/perl -w /tmp/my2.pl
 |   |   \--- 24552 dmitry psql

% pgrep psql
24552
% kill -SIGINT 24552


*And I see this in Terminal 1:*

postgres=#
int in my2
int in my1
postgres=#

===

I.e. we can see that not only did psql receive the SIGINT, but so did its
parents (the two artificial Perl processes).


Re: some LLVM function checks missing in meson

2024-04-13 Thread Heikki Linnakangas

On 11/04/2024 18:26, Peter Eisentraut wrote:

I have been checking the pg_config.h generated by configure and meson to
see if there is anything materially different.  I found that

HAVE_DECL_LLVMCREATEGDBREGISTRATIONLISTENER and
HAVE_DECL_LLVMCREATEPERFJITEVENTLISTENER

are missing on the meson side.

Something like the below would appear to fix that:

diff --git a/meson.build b/meson.build
index 43fad5323c0..cdfd31377d1 100644
--- a/meson.build
+++ b/meson.build
@@ -2301,6 +2301,14 @@ decl_checks += [
 ['pwritev', 'sys/uio.h'],
   ]

+# Check presence of some optional LLVM functions.
+if llvm.found()
+  decl_checks += [
+['LLVMCreateGDBRegistrationListener', 'llvm-c/ExecutionEngine.h'],
+['LLVMCreatePerfJITEventListener', 'llvm-c/ExecutionEngine.h'],
+  ]
+endif
+
   foreach c : decl_checks
 func = c.get(0)
 header = c.get(1)

I don't know what these functions do, but the symbols are used in the
source code.  Thoughts?


+1. I also don't know what they do, but clearly the configure and meson 
checks should be in sync.


There's also this in llvmjit.c:


if (llvm_opt3_orc)
{
#if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
if (jit_profiling_support)
LLVMOrcUnregisterPerf(llvm_opt3_orc);
#endif
LLVMOrcDisposeInstance(llvm_opt3_orc);
llvm_opt3_orc = NULL;
}

if (llvm_opt0_orc)
{
#if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
if (jit_profiling_support)
LLVMOrcUnregisterPerf(llvm_opt0_orc);
#endif
LLVMOrcDisposeInstance(llvm_opt0_orc);
llvm_opt0_orc = NULL;
}
}


The autoconf test that set HAVE_DECL_LLVMORCREGISTERPERF was removed in 
commit e9a9843e13. I believe that's a leftover that should also have 
been removed.
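The cleanup would then just drop the dead blocks, e.g. for the first hunk
quoted above (sketch):

	if (llvm_opt3_orc)
	{
		/*
		 * The HAVE_DECL_LLVMORCREGISTERPERF block above has been dead code
		 * since e9a9843e13 removed the configure check, so only the
		 * dispose call remains.
		 */
		LLVMOrcDisposeInstance(llvm_opt3_orc);
		llvm_opt3_orc = NULL;
	}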


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Parallel CREATE INDEX for BRIN indexes

2024-04-13 Thread Andres Freund
Hi,

While preparing a differential code coverage report between 16 and HEAD, one
thing that stands out is the parallel brin build code. Neither on
coverage.postgresql.org nor locally is that code reached during our tests.

https://coverage.postgresql.org/src/backend/access/brin/brin.c.gcov.html#2333

Greetings,

Andres Freund




Add missing ConditionVariableCancelSleep() in slot.c

2024-04-13 Thread Bharath Rupireddy
Hi,

It looks like there's a missing ConditionVariableCancelSleep() call in
InvalidatePossiblyObsoleteSlot() after waiting for the replication
slot to be released by another process. Although prepare to sleep
cancels the sleep if the previous one wasn't canceled, the calling
process still remains in the CV's wait queue for the last cycle after
the replication slot holding process releases the slot. I'm wondering
why there's no issue detected if the calling process isn't removed
from the CV's wait queue. It may be due to the cancel sleep call in
the sigsetjmp() path for almost every process. IMO, it's a clean
practice to cancel the sleep immediately after the sleep ends like
elsewhere.

Please find the attached patch to fix the issue. Alternatively, we can
just add ConditionVariableCancelSleep() outside of the for loop to
cancel the sleep of the last cycle, if any. This also looks correct
because every prepare-to-sleep ensures the previous sleep is
canceled, and if there was no sleep at all, the cancel sleep call exits.
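For reference, the canonical usage pattern documented in
condition_variable.c is roughly:

	ConditionVariablePrepareToSleep(cv);
	while (!condition_is_satisfied())	/* hypothetical predicate */
		ConditionVariableSleep(cv, wait_event_info);
	ConditionVariableCancelSleep();

so ending the wait loop without the final cancel call leaves the process
on the CV's wait queue, which is exactly the situation described above.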

PS: I've encountered an assertion failure [1] some time back
sporadically in t/001_rep_changes.pl which I've reported in
https://www.postgresql.org/message-id/CALj2ACXO6TJ4rhc3Uz3MWJGob9e4H1C71qufH-DGKJh8k4QGZA%40mail.gmail.com.
I'm not so sure if this patch fixes the assertion failure. I ran the
tests for 100 times [2] on my dev system, I didn't see any issue with
the patch.

[1]
t/001_rep_changes.pl

2024-01-31 12:24:38.474 UTC [840166]
pg_16435_sync_16393_7330237333761601891 STATEMENT:
DROP_REPLICATION_SLOT pg_16435_sync_16393_7330237333761601891 WAIT
TRAP: failed Assert("list->head != INVALID_PGPROCNO"), File:
"../../../../src/include/storage/proclist.h", Line: 101, PID: 840166
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ExceptionalCondition+0xbb)[0x55c8edf6b8f9]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x6637de)[0x55c8edd517de]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ConditionVariablePrepareToSleep+0x85)[0x55c8edd51b91]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ReplicationSlotAcquire+0x142)[0x55c8edcead6b]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ReplicationSlotDrop+0x51)[0x55c8edceb47f]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x60da71)[0x55c8edcfba71]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(exec_replication_command+0x47e)[0x55c8edcfc96a]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(PostgresMain+0x7df)[0x55c8edd7d644]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5ab50c)[0x55c8edc9950c]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5aab21)[0x55c8edc98b21]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5a70de)[0x55c8edc950de]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(PostmasterMain+0x1534)[0x55c8edc949db]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x459c47)[0x55c8edb47c47]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f19fe629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f19fe629e40]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(_start+0x25)[0x55c8ed7c4565]
2024-01-31 12:24:38.476 UTC [840168]
pg_16435_sync_16390_7330237333761601891 LOG:  statement: SELECT
a.attnum,   a.attname,   a.atttypid,   a.attnum =
ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN
pg_catalog.pg_index i   ON (i.indexrelid =
pg_get_replica_identity_index(16391)) WHERE a.attnum >
0::pg_catalog.int2   AND NOT a.attisdropped AND a.attgenerated = ''
AND a.attrelid = 16391 ORDER BY a.attnum

[2] for i in {1..100}; do make check
PROVE_TESTS="t/001_rep_changes.pl"; if [ $? -ne 0 ]; then echo "The
command failed on iteration $i"; break; fi; done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Add-missing-ConditionVariableCancelSleep-in-slot..patch
Description: Binary data


Remove unnecessary segment number calculation after wal_removed invalidation of replication slots

2024-04-13 Thread Bharath Rupireddy
Hi,

It looks like there's an unnecessary segment number calculation after
InvalidateObsoleteReplicationSlots in CreateCheckPoint and
CreateRestartPoint. Since none of RedoRecPtr, _logSegNo and
wal_segment_size are changed by the slot invalidation code [1], the
recalculation of _logSegNo with XLByteToSeg seems unnecessary.

I've attached a patch to fix this.

[1] Assertions like the following won't fail with make check-world
proving InvalidateObsoleteReplicationSlots doesn't change them at all.

XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
KeepLogSeg(recptr, &_logSegNo);
+   _logSegNo_saved = _logSegNo;
+   RedoRecPtr_saved = RedoRecPtr;
if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED,

_logSegNo, InvalidOid,

InvalidTransactionId))
{
+   Assert(_logSegNo_saved == _logSegNo);
+   Assert(RedoRecPtr_saved == RedoRecPtr);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Remove-unnecessary-seg-no-calculation-after-check.patch
Description: Binary data


Re: Parallel CREATE INDEX for BRIN indexes

2024-04-13 Thread Tomas Vondra
On 4/13/24 10:36, Andres Freund wrote:
> Hi,
> 
> While preparing a differential code coverage report between 16 and HEAD, one
> thing that stands out is the parallel brin build code. Neither on
> coverage.postgresql.org nor locally is that code reached during our tests.
>

Thanks for pointing this out, it's definitely something that I need to
improve (admittedly, should have been part of the patch). I'll also look
into eliminating the difference between BTREE and BRIN parallel builds,
mentioned in my last message in this thread.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Table AM Interface Enhancements

2024-04-13 Thread Alexander Korotkov
Hi, Melanie!

On Fri, Apr 12, 2024 at 8:48 PM Melanie Plageman
 wrote:
>
> On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov  
> wrote:
> >
> > On Fri, Apr 12, 2024 at 12:04 AM Andres Freund  wrote:
> > > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> > > > I hope this work is targeting pg18.
> > >
> > > I think anything of the scope discussed by Melanie would be very clearly
> > > targeting 18. For 17, I don't know yet whether we should revert the the
> > > ANALYZE streaming read user (041b96802ef), just do a bit of comment 
> > > polishing,
> > > or some other small change.
> > >
> > > One oddity is that before 041b96802ef, the opportunities for making the
> > > interface cleaner were less apparent, because c6fc50cb4028 increased the
> > > coupling between analyze.c and the way the table storage works.
> >
> > Thank you for pointing this out about c6fc50cb4028, I've missed this.
> >
> > > > Otherwise, do I get this right that this post feature-freeze works on
> > > > designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> > > > committed on Mar 30.
> > >
> > > Note that there were versions of the patch that were targeting the
> > > pre-27bc1772fc interface.
> >
> > Sure, I've checked this before writing.  It looks quite similar to the
> > result of applying my revert patch [1] to the head.
> >
> > Let me describe my view over the current situation.
> >
> > 1) If we just apply my revert patch and leave c6fc50cb4028 and
> > 041b96802ef in the tree, then we get our table AM API narrowed.  As
> > you expressed the current API requires block numbers to be 1:1 with
> > the actual physical on-disk location [2].  Not a secret I think the
> > current API is quite restrictive.  And we're getting the ANALYZE
> > interface narrower than it was since 737a292b5de.  Frankly speaking, I
> > don't think this is acceptable.
> >
> > 2) Pushing down the read stream and prefetch to heap am is related to
> > difficulties [3], [4].  That's quite a significant piece of work to be
> > done post FF.
>
> I had operated under the assumption that we needed to push the
> streaming read code into heap AM because that is what we did for
> sequential scan, but now that I think about it, I don't see why we
> would have to. Bilal's patch pre-27bc1772fc did not do this. But I
> think the code in acquire_sample_rows() isn't more tied to heap AM
> after 041b96802ef than it was before it. Are you of the opinion that
> the code with 041b96802ef ties acquire_sample_rows() more closely to
> heap format?

Yes, I think so.  The table AM API deals with TIDs and block numbers, but
doesn't dictate what they actually mean.  For example, in ZedStore
[1], data is stored in per-column B-trees, where the TID used in the table
AM is just a logical key into those B-trees.  Similarly, a blockNumber is
a range within the B-trees.

c6fc50cb4028 and 041b96802ef impose on acquire_sample_rows() the
assumption that we are sampling physical blocks as they are stored in
data files.  Those can no longer be some "logical" block numbers
whose meaning only the table AM implementation knows.  That was pointed
out by Andres [2].  I'm not sure if ZedStore is still alive, but there
could be other table AM implementations like this, or other
implementations in development, etc.  Anyway, I don't feel good about
narrowing an API that has been there since pg12.

Links.
1. 
https://www.pgcon.org/events/pgcon_2020/sessions/session/44/slides/13/Zedstore-PGCon2020-Virtual.pdf
2. 
https://www.postgresql.org/message-id/20240410212117.mxsldz2w6htrl36v%40awork3.anarazel.de

--
Regards,
Alexander Korotkov




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-13 Thread Alexander Korotkov
Hi, Dmitry!

On Fri, Apr 12, 2024 at 10:59 PM Dmitry Koval  wrote:
>
> Thanks, Alexander!
>
> > Still now we're able to create a partition in the pg_temp schema
> > explicitly.
>
> Attached patches with fix.

Please find my version of this fix attached.  I think we need to
check relpersistence in a similar way to what ATTACH PARTITION or CREATE
TABLE ... PARTITION OF do.  I'm going to polish this a little bit more.

--
Regards,
Alexander Korotkov


v6-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch
Description: Binary data


Re: post-freeze damage control

2024-04-13 Thread Tomas Vondra



On 4/13/24 01:03, David Steele wrote:
> On 4/12/24 22:12, Tomas Vondra wrote:
>> On 4/11/24 23:48, David Steele wrote:
>>> On 4/11/24 20:26, Tomas Vondra wrote:
>>>
 FWIW that discussion also mentions stuff that I think the feature
 should
 not do. In particular, I don't think the ambition was (or should be) to
 make pg_basebackup into a stand-alone tool. I always saw pg_basebackup
 more as an interface to "backup steps" correctly rather than a complete
 backup solution that'd manage backup registry, retention, etc.
>>>
>>> Right -- this is exactly my issue. pg_basebackup was never easy to use
>>> as a backup solution and this feature makes it significantly more
>>> complicated. Complicated enough that it would be extremely difficult for
>>> most users to utilize in a meaningful way.
>>
>> Perhaps, I agree we could/should try to do better job to do backups, no
>> argument there. But I still don't quite see why introducing such
>> infrastructure to "manage backups" should be up to the patch adding
>> incremental backups. I see it as something to build on top of
>> pg_basebackup/pg_combinebackup, not into those tools.
> 
> I'm not saying that managing backups needs to be part of pg_basebackup,
> but I am saying without that it is not a complete backup solution.
> Therefore introducing advanced features that the user then has to figure
> out how to manage puts a large burden on them. Implementing
> pg_combinebackup inefficiently out of the gate just makes their life
> harder.
> 

I agree with this in general, but I fail to see how it'd be the fault of
this patch. It merely extends what pg_basebackup did before, so if it's
not a complete solution now, it wasn't a complete solution before.

Sure, I 100% agree it'd be great to have a more efficient
pg_combinebackup with fewer restrictions. But if we make these
limitations clear, I think it's better to have this than having nothing.

>>> But they'll try because it is a new pg_basebackup feature and they'll
>>> assume it is there to be used. Maybe it would be a good idea to make it
>>> clear in the documentation that significant tooling will be required to
>>> make it work.
>>
>> Sure, I'm not against making it clearer pg_combinebackup is not a
>> complete backup solution, and documenting the existing restrictions.
> 
> Let's do that then. I think it would make sense to add caveats to the
> pg_combinebackup docs including space requirements, being explicit about
> the format required (e.g. plain), and also possible mitigation with COW
> filesystems.
> 

OK. I'll add this as an open item, for me and Robert.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: post-freeze damage control

2024-04-13 Thread Tomas Vondra



On 4/13/24 01:23, David Steele wrote:
> On 4/12/24 22:27, Tomas Vondra wrote:
>>
>>
>> On 4/12/24 08:42, David Steele wrote:
>>> On 4/11/24 20:26, Tomas Vondra wrote:
 On 4/11/24 03:52, David Steele wrote:
> On 4/11/24 10:23, Tom Kincaid wrote:
>>
>> The extensive Beta process we have can be used to build confidence we
>> need in a feature that has extensive review and currently has no
>> known
>> issues or outstanding objections.
>
> I did have objections, here [1] and here [2]. I think the complexity,
> space requirements, and likely performance issues involved in restores
> are going to be a real problem for users. Some of these can be
> addressed
> in future releases, but I can't escape the feeling that what we are
> releasing here is half-baked.
>
 I do not think it's half-baked. I certainly agree there are
 limitations,
 and there's all kinds of bells and whistles we could add, but I think
 the fundamental infrastructure is correct and a meaningful step
 forward.
 Would I wish it to handle .tar for example? Sure I would. But I think
 it's something we can add in the future - if we require all of this to
 happen in a single release, it'll never happen.
>>>
>>> I'm not sure that I really buy this argument, anyway. It is not uncommon
>>> for significant features to spend years in development before they are
>>> committed. This feature went from first introduction to commit in just
>>> over six months. Obviously Robert had been working on it for a while,
>>> but for a feature this large six months is a sprint.
>>>
>>
>> Sure, but it's also not uncommon for significant features to be
>> developed incrementally, over multiple releases, introducing the basic
>> infrastructure first, and then expanding the capabilities later. I'd
>> cite logical decoding/replication and parallel query as examples of this
>> approach.
>>
>> It's possible there's some fundamental flaw in the WAL summarization?
>> Sure, I can't rule that out, although I find it unlikely. Could there be
>> bugs? Sure, that's possible, but that applies to all code.
>>
>> But it seems to me all the comments are about the client side, not about
>> the infrastructure. Which is fair, I certainly agree it'd be nice to
>> handle more use cases with less effort, but I still think the patch is a
>> meaningful step forward.
> 
> Yes, my comments are all about the client code. I like the
> implementation of the WAL summarizer a lot. I don't think there is a
> fundamental flaw in the design, either, but I wouldn't be surprised if
> there are bugs. That's life in software development biz.
> 

Agreed.

> Even for the summarizer, though, I do worry about the complexity of
> maintaining it over time. It seems like it would be very easy to
> introduce a bug and have it go unnoticed until it causes problems in the
> field. A lot of testing was done outside of the test suite for this
> feature and I'm not sure if we can rely on that focus with every release.
> 

I'm not sure there's a simpler way to implement this. I haven't really
worked on that part (not until the CoW changes a couple weeks ago), but
I think Robert was very conscious of the complexity.

I don't expect this code to change very often, but I agree it's
not great to rely on testing outside the regular regression test suite.
But I'm not sure how much more we can do, really - for example my
testing was very much "randomized stress testing" with a lot of data and
long runs, looking for unexpected stuff. That's not something we could
do in the usual regression tests, I think.

But if you have suggestions how to extend the testing ...

> For me an incremental approach would be to introduce the WAL summarizer
> first. There are already plenty of projects that do page-level
> incremental (WAL-G, pg_probackup, pgBackRest) and could help shake out
> the bugs. Then introduce the client tools later when they are more
> robust. Or, release the client tools now but mark them as experimental
> or something so people know that changes are coming and they don't get
> blindsided by that in the next release. Or, at the very least, make the
> caveats very clear so users can make an informed choice.
> 

I don't think introducing just the summarizer, without any client tools,
would really work. How would we even test the summarizer, for example?
If the only users of that code are external tools, we'd do only some
very rudimentary tests. But the more complex tests would happen in the
external tools, which means it wouldn't be covered by cfbot, buildfarm
and so on. Considering the external tools are likely a bit behind, it's
not clear to me how I would do the stress testing, for example.

IMHO we should aim to have in-tree clients when possible, even if some
external tools can do more advanced stuff etc.

This however reminds me of my question whether the summarizer provides the
right interface(s) for the external tools. One optio

Re: Add missing ConditionVariableCancelSleep() in slot.c

2024-04-13 Thread Alvaro Herrera
On 2024-Apr-13, Bharath Rupireddy wrote:

> It looks like there's a missing ConditionVariableCancelSleep() call in
> InvalidatePossiblyObsoleteSlot() after waiting for the replication
> slot to be released by another process. Although prepare to sleep
> cancels the sleep if the previous one wasn't canceled, the calling
> process still remains in the CV's wait queue for the last cycle after
> the replication slot holding process releases the slot. I'm wondering
> why there's no issue detected if the calling process isn't removed
> from the CV's wait queue. It may be due to the cancel sleep call in
> the sigsetjmp() path for almost every process. IMO, it's a clean
> practice to cancel the sleep immediately after the sleep ends like
> elsewhere.

Hmm, but shouldn't we cancel the sleep after we have completed sleeping
altogether, that is, once we've determined that we're no longer going to
sleep waiting for this slot?  That would suggest putting the call to
cancel sleep after the for(;;) loop is complete, rather than immediately
after sleeping.  No?

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index cebf44bb0f..59e9bef642 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1756,6 +1756,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
}
}
 
+   ConditionVariableCancelSleep();
+
Assert(released_lock == !LWLockHeldByMe(ReplicationSlotControlLock));
 
return released_lock;


However, I noticed that ConditionVariablePrepareToSleep() does a
CancelSleep upon being called ... so what I suggest would not have any
effect whatsoever, because the sleep would be cancelled next time
through the loop anyway.  But shouldn't we also modify PrepareToSleep to
exit without doing anything if our current sleep CV is the same one
being newly installed?

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 112a518bae..65811ff989 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -57,6 +57,14 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
 {
int pgprocno = MyProcNumber;
 
+   /*
+* If we're preparing to sleep on the same CV we were already going to
+* sleep on, we don't actually need to do anything.  This may seem like
+* supporting sloppy coding, which is what it actually does, so ¯\_(ツ)_/¯
+*/
+   if (cv_sleep_target == cv)
+   return;
+
/*
 * If some other sleep is already prepared, cancel it; this is necessary
 * because we have just one static variable tracking the prepared sleep,

Alternatively, maybe we need to not prepare-to-sleep in
InvalidatePossiblyObsoleteSlot() if we have already prepared to sleep in
a previous iteration through the loop (and of course, don't cancel the
sleep until we're out of the loop).

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Tom: There seems to be something broken here.
Teodor: I'm in sackcloth and ashes...  Fixed.
   http://postgr.es/m/482d1632.8010...@sigaev.ru




Re: Why is parula failing?

2024-04-13 Thread Robins Tharakan
On Wed, 10 Apr 2024 at 10:24, David Rowley  wrote:
>
> Master failed today for the first time since the compiler upgrade.
> Again reltuples == 48.

Here's what I can add over the past few days:
- Almost all failures are either reltuples=48 or SIGABRTs
- Almost all SIGABRTs are DDLs - CREATE INDEX / CREATE AGGREGATEs / CTAS
  - A little too coincidental? Recent crashes have stack traces, if
anyone is interested.

Barring the initial failures (during move to gcc 13.2), in the past week:
- v15 somehow hasn't had a failure yet
- v14 / v16 have got only 1 failure each
- but v12 / v13 are lit up - failed multiple times.

-
robins


Re: altering a column's collation leaves an invalid foreign key

2024-04-13 Thread jian he
On Fri, Apr 12, 2024 at 5:06 PM jian he  wrote:
>
> On Tue, Mar 26, 2024 at 1:00 PM jian he  wrote:
> >
> > On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
> >  wrote:
> > >
> > > On 3/23/24 10:04, Paul Jungwirth wrote:
> > > > Perhaps if the previous collation was nondeterministic we should force 
> > > > a re-check.
> > >
> > > Here is a patch implementing this. It was a bit more fuss than I 
> > > expected, so maybe someone has a
> > > better way.
I think I found a simple way.

The logic is:
* ATExecAlterColumnType changes one column at a time.
* One column can only have one collation, so we don't need to store a
list of collation OIDs.
* In ATExecAlterColumnType we can get the new collation (targetcollid)
and the original collation info.
* RememberAllDependentForRebuilding will check the column's dependencies,
recording whether this column is referenced by a foreign key, so
AlteredTableInfo->changedConstraintOids holds the primary key and
foreign key OIDs.
* ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments
in ATRewriteCatalogs).
* Each entry in tab->changedConstraintOids (foreign key, primary key)
goes through ATPostAlterTypeParse, and for a foreign key
(con->contype == CONSTR_FOREIGN) that calls TryReuseForeignKey.
* In TryReuseForeignKey, we can pass to the foreign key constraint the
information that the primary key's old collation is nondeterministic
and that the old collation != new collation, so the foreign key can act
accordingly in ATAddForeignKeyConstraint (old_check_ok).


Based on the above logic, I added one bool to struct AlteredTableInfo and
one bool to struct Constraint.  The bool in AlteredTableInfo stores the
information, which is later passed on to struct Constraint; we need the
bool in Constraint because ATAddForeignKeyConstraint needs it.
From 0eb0846450cb33996d6fb35c19290d297c26fd3c Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 13 Apr 2024 20:28:25 +0800
Subject: [PATCH v3 1/1] ALTER COLUMN SET DATA TYPE Re-check foreign key ties
 under certain condition

With deterministic collations, we check ties by looking at binary
equality. But a nondeterministic collation can allow non-identical values
to be considered equal, e.g. 'a' and 'A' when case-insensitive.
ALTER COLUMN .. SET DATA TYPE may therefore make
some references that used to be valid no longer valid.

For `ALTER COLUMN .. SET DATA TYPE`:
if the changed key column is part of the primary key columns,
then under the following conditions
we need to revalidate the foreign key constraint:
* the primary key is referenced by a foreign key constraint
* the new collation is not the same as the old
* the old collation is nondeterministic.
discussion: https://postgr.es/m/78d824e0-b21e-480d-a252-e4b84bc2c24b%40illuminatedcomputing.com
---
 src/backend/commands/tablecmds.c  | 62 +++
 src/include/nodes/parsenodes.h|  3 +-
 .../regress/expected/collate.icu.utf8.out |  9 +++
 src/test/regress/sql/collate.icu.utf8.sql |  8 +++
 4 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 000212f2..d5bd61f1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -198,6 +198,7 @@ typedef struct AlteredTableInfo
 	/* Objects to rebuild after completing ALTER TYPE operations */
 	List	   *changedConstraintOids;	/* OIDs of constraints to rebuild */
 	List	   *changedConstraintDefs;	/* string definitions of same */
+	bool		verify_new_collation; /* T if we should recheck new collation for foreign key */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
 	char	   *replicaIdentityIndex;	/* index to reset as REPLICA IDENTITY */
@@ -583,12 +584,13 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
    LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
  char *cmd, List **wqueue, LOCKMODE lockmode,
- bool rewrite);
+ bool rewrite,
+ bool verify_new_collation);
 static void RebuildConstraintComment(AlteredTableInfo *tab, AlterTablePass pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
-static void TryReuseForeignKey(Oid oldId, Constraint *con);
+static void TryReuseForeignKey(Oid oldId, Constraint *con, bool verify_new_collation);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
 	 List *options, LOCKMODE lockmode);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -10256,6 +10258,15 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	old_pfeqop_item);
 		}
 		if (old_check_ok)
+		{
+		{
+			/* Also see TryReuseForeignKey.
+			 * collation_recheck being true means that we need to
+			 * recheck that the foreign key side values are ok with the
+			 * new primary key collat

Re: Why is parula failing?

2024-04-13 Thread Robins Tharakan
On Mon, 8 Apr 2024 at 21:25, Robins Tharakan  wrote:
>
>
> I'll keep an eye on this instance more often for the next few days.
> (Let me know if I could capture more if a run gets stuck again)


HEAD is stuck again on pg_sleep(), no CPU for the past hour or so.
Stack trace seems to be similar to last time.


$ pstack 24930
#0  0xb8280954 in epoll_pwait () from /lib64/libc.so.6
#1  0x00843408 in WaitEventSetWaitBlock (nevents=1,
occurred_events=, cur_timeout=60, set=0x3b38dac0) at
latch.c:1570
#2  WaitEventSetWait (set=0x3b38dac0, timeout=timeout@entry=60,
occurred_events=occurred_events@entry=0xfd1d66c8, nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=150994946) at latch.c:1516
#3  0x008437c4 in WaitLatch (latch=,
wakeEvents=wakeEvents@entry=41, timeout=60,
wait_event_info=wait_event_info@entry=150994946) at latch.c:538
#4  0x0090c384 in pg_sleep (fcinfo=) at misc.c:406
#5  0x00699350 in ExecInterpExpr (state=0x3b5a41a0,
econtext=0x3b5a3f98, isnull=) at execExprInterp.c:764
#6  0x006d1668 in ExecEvalExprSwitchContext (isNull=0xfd1d683f,
econtext=0x3b5a3f98, state=) at
../../../src/include/executor/executor.h:356
#7  ExecProject (projInfo=) at
../../../src/include/executor/executor.h:390
#8  ExecResult (pstate=) at nodeResult.c:135
#9  0x006ba26c in ExecProcNode (node=0x3b5a3e88) at
../../../src/include/executor/executor.h:274
#10 gather_getnext (gatherstate=0x3b5a3c98) at nodeGather.c:287
#11 ExecGather (pstate=0x3b5a3c98) at nodeGather.c:222
#12 0x0069d28c in ExecProcNode (node=0x3b5a3c98) at
../../../src/include/executor/executor.h:274
#13 ExecutePlan (execute_once=, dest=0x3b5ae8e0,
direction=, numberTuples=0, sendTuples=,
operation=CMD_SELECT, use_parallel_mode=,
planstate=0x3b5a3c98, estate=0x3b5a3a70) at execMain.c:1646
#14 standard_ExecutorRun (queryDesc=0x3b59c250, direction=,
count=0, execute_once=) at execMain.c:363
#15 0x008720e4 in PortalRunSelect (portal=portal@entry=0x3b410fb0,
forward=forward@entry=true, count=0, count@entry=9223372036854775807,
dest=dest@entry=0x3b5ae8e0) at pquery.c:924
#16 0x00873900 in PortalRun (portal=portal@entry=0x3b410fb0,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x3b5ae8e0,
altdest=altdest@entry=0x3b5ae8e0, qc=qc@entry=0xfd1d6bf0) at
pquery.c:768
#17 0x0086f5d4 in exec_simple_query
(query_string=query_string@entry=0x3b391c90
"SELECT pg_sleep(0.1);") at postgres.c:1274
#18 0x00870110 in PostgresMain (dbname=,
username=) at postgres.c:4680
#19 0x0086b6a0 in BackendMain (startup_data=,
startup_data_len=) at backend_startup.c:105
#20 0x007c6268 in postmaster_child_launch
(child_type=child_type@entry=B_BACKEND,
startup_data=startup_data@entry=0xfd1d70b8
"", startup_data_len=startup_data_len@entry=4,
client_sock=client_sock@entry=0xfd1d70c0)
at launch_backend.c:265
#21 0x007c9c50 in BackendStartup (client_sock=0xfd1d70c0) at
postmaster.c:3593
#22 ServerLoop () at postmaster.c:1674
#23 0x007cb8f8 in PostmasterMain (argc=argc@entry=8,
argv=argv@entry=0x3b38d320)
at postmaster.c:1372
#24 0x00496e18 in main (argc=8, argv=0x3b38d320) at main.c:197



 CPU% MEM%   TIME+  Command
.
.
  0.0  0.0  0:00.00 │ └─ /bin/sh -c cd /opt/postgres/build-farm-14 &&
PATH=/opt/gcc/home/ec2-user/proj/gcc/target/bin/
  0.0  0.1  0:00.07 │└─ /usr/bin/perl ./run_build.pl
--config=build-farm.conf HEAD --verbose
  0.0  0.0  0:00.00 │   └─ sh -c { cd pgsql.build/src/test/regress
&& make NO_LOCALE=1 check; echo $? > /opt/postg
  0.0  0.0  0:00.00 │  └─ make NO_LOCALE=1 check
  0.0  0.0  0:00.00 │ └─ /bin/sh -c echo "# +++ regress
check in src/test/regress +++" && PATH="/opt/postg
  0.0  0.0  0:00.10 │└─
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=.
  0.0  0.0  0:00.01 │   ├─ psql -X -a -q -d regression
-v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on
  0.0  0.1  0:02.64 │   └─ postgres -D
/opt/postgres/build-farm-14/buildroot/HEAD/pgsql.build/src/test
  0.0  0.2  0:00.05 │  ├─ postgres: postgres
regression [local] SELECT
  0.0  0.0  0:00.06 │  ├─ postgres: logical
replication launcher
  0.0  0.1  0:00.36 │  ├─ postgres: autovacuum
launcher
  0.0  0.1  0:00.34 │  ├─ postgres: walwriter
  0.0  0.0  0:00.32 │  ├─ postgres: background
writer
  0.0  0.3  0:00.05 │  └─ postgres: checkpointer

-
robins

>


Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-13 Thread Ranier Vilela
On Sat, 13 Apr 2024 at 04:16, Heikki Linnakangas 
wrote:

> On 11/04/2024 16:37, Ranier Vilela wrote:
> > On Thu, 11 Apr 2024 at 09:54, Heikki Linnakangas
> > <hlinn...@iki.fi> wrote:
> >
> > On 11/04/2024 15:03, Ranier Vilela wrote:
> >  > On Wed, 10 Apr 2024 at 18:28, Heikki Linnakangas
> >  > <hlinn...@iki.fi> wrote:
> >  >
> >  > On 10/04/2024 21:07, Ranier Vilela wrote:
> >  >  > Hi,
> >  >  >
> >  >  > Per Coverity.
> >  >  >
> >  >  > The function ReorderBufferTXNByXid,
> >  >  > can return NULL when the parameter *create* is false.
> >  >  >
> >  >  > In the functions ReorderBufferSetBaseSnapshot
> >  >  > and ReorderBufferXidHasBaseSnapshot,
> >  >  > the second call to ReorderBufferTXNByXid,
> >  >  > pass false to *create* argument.
> >  >  >
> >  >  > In the function ReorderBufferSetBaseSnapshot,
> >  >  > fixed passing true as argument to always return
> >  >  > a valid ReorderBufferTXN pointer.
> >  >  >
> >  >  > In the function ReorderBufferXidHasBaseSnapshot,
> >  >  > fixed by checking if the pointer is NULL.
> >  >
> >  > If it's a "known subxid", the top-level XID should already
> > have its
> >  > ReorderBufferTXN entry, so ReorderBufferTXN() should never
> > return NULL.
> >  >
> >  > There are several conditions to not return NULL,
> >  > I think trusting never is insecure.
> >
> > Well, you could make it an elog(ERROR, ..) instead. But the point is
> > that it should not happen, and if it does for some reason, that's very
> > surprising and there is a bug somewhere. In that case, we should *not*
> > just blindly create it and proceed as if everything was OK.
> >
> > Thanks for the clarification.
> > I will then suggest improving robustness,
> > but avoiding hiding any possible errors that may occur.
>
> I don't much like adding extra runtime checks for "can't happen"
> scenarios either. Assertions would be more clear, but in this case the
> code would just segfault trying to dereference the NULL pointer, which
> is fine for a "can't happen" scenario.
>
This sounds a little confusing to me.
Is the project policy to *tolerate* dereferencing NULL pointers?
If this is the case, no problem, using Assert would serve the purpose of
protecting against these errors well.


> Looking closer, when we identify an XID as a subtransaction, we:
> - assign toplevel_xid,
> - set RBTXN_IS_SUBXACT, and
> - assign toptxn pointer.
>
> ISTM the 'toplevel_xid' and RBTXN_IS_SUBXACT are redundant.
> 'toplevel_xid' is only used in those two calls that do
> ReorderBufferTXNByXid(rb, txn->toplevel_xid,...), and you could replace
> those by following the 'toptxn' pointer directly. And RBTXN_IS_SUBXACT
> is redundant with (toptxn != NULL). So here's a patch to remove
> 'toplevel_xid' and RBTXN_IS_SUBXACT altogether.
>
+ if (rbtxn_is_subtxn(txn))
+ txn = rbtxn_get_toptxn(txn);

rbtxn_get_toptxn already calls rbtxn_is_subtxn,
which adds a little unnecessary overhead.
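(For reference, rbtxn_get_toptxn is defined along these lines in
reorderbuffer.h (paraphrased), which is where the duplicate check comes
from:

	#define rbtxn_is_subtxn(txn) \
		((txn)->toptxn != NULL)

	#define rbtxn_get_toptxn(txn) \
		(rbtxn_is_subtxn(txn) ? (txn)->toptxn : (txn))
)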
I made a v1 of your patch, modifying this, please ignore it if you don't
agree.

best regards,
Ranier Vilela


v1-0001-Remove-redundant-field-and-flag-from-ReorderBufferTX.patch
Description: Binary data


Re: sql/json remaining issue

2024-04-13 Thread jian he
On Fri, Apr 12, 2024 at 5:44 PM Amit Langote  wrote:
>
> > elog(ERROR, "unrecognized json wrapper %d", wrapper);
> > should be
> > elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);
>
> Fixed in 0003.
>
the fix seems not in 0003?
other than that, everything looks fine.



SELECT * FROM JSON_TABLE (
'{"favorites":
{"movies":
  [{"name": "One", "director": "John Doe"},
   {"name": "Two", "director": "Don Joe"}],
 "books":
  [{"name": "Mystery", "authors": [{"name": "Brown Dan"}]},
   {"name": "Wonder", "authors": [{"name": "Jun Murakami"},
{"name":"Craig Doe"}]}]
}}'::json, '$.favs[*]'
COLUMNS (user_id FOR ORDINALITY,
  NESTED '$.movies[*]'
COLUMNS (
movie_id FOR ORDINALITY,
mname text PATH '$.name',
director text),
  NESTED '$.books[*]'
COLUMNS (
  book_id FOR ORDINALITY,
  bname text PATH '$.name',
  NESTED '$.authors[*]'
COLUMNS (
  author_id FOR ORDINALITY,
  author_name text PATH '$.name'))));


I actually did run the query; it returns null.
'$.favs[*]'
should be
'$.favorites[*]'



one more minor thing, I previously mentioned in getJsonPathVariable
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find jsonpath variable \"%s\"",
pnstrdup(varName, varNameLength))));

do we need to remove pnstrdup?




Re: Why is parula failing?

2024-04-13 Thread Tom Lane
Robins Tharakan  writes:
> HEAD is stuck again on pg_sleep(), no CPU for the past hour or so.
> Stack trace seems to be similar to last time.

> #3  0x008437c4 in WaitLatch (latch=,
> wakeEvents=wakeEvents@entry=41, timeout=60,
> wait_event_info=wait_event_info@entry=150994946) at latch.c:538
> #4  0x0090c384 in pg_sleep (fcinfo=) at misc.c:406
> ...
> #17 0x0086f5d4 in exec_simple_query
> (query_string=query_string@entry=0x3b391c90
> "SELECT pg_sleep(0.1);") at postgres.c:1274

If we were only supposed to sleep 0.1 seconds, how is it waiting
for 60 ms (and, presumably, repeating that)?  The logic in
pg_sleep is pretty simple, and it's hard to think of anything except
the system clock jumping (far) backwards that would make this
happen.  Any chance of extracting the local variables from the
pg_sleep stack frame?
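(For context, the body of pg_sleep in misc.c is roughly the following
paraphrased sketch; timeout=60 in the WaitLatch frame means that
"endtime - GetNowFloat()" evaluated to about 0.06 s on that iteration:

	float8		secs = PG_GETARG_FLOAT8(0);
	float8		endtime = GetNowFloat() + secs;

	for (;;)
	{
		float8		delay;
		long		delay_ms;

		CHECK_FOR_INTERRUPTS();

		delay = endtime - GetNowFloat();
		if (delay >= 600.0)
			delay_ms = 600000;		/* wait in chunks of at most 10 min */
		else if (delay > 0.0)
			delay_ms = (long) ceil(delay * 1000.0);
		else
			break;

		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 delay_ms,
						 WAIT_EVENT_PG_SLEEP);
		ResetLatch(MyLatch);
	}
)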

regards, tom lane




Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Tom Lane
Dmitry Koterov  writes:
> I almost lost my mind today trying to figure out why sending a SIGINT
> precisely to a psql interactive process delivers this SIGINT not only to
> that psql, but also to its parents.

Let me guess ... you're using zsh not bash?

I do not use zsh myself, but what I read in its man page suggests
that this is its designed behavior.  The kill command says its
arguments are "jobs", and elsewhere it says

   There are several ways to refer to jobs in the shell.  A job can be
   referred to by the process ID of any process of the job or by one of
   the following: ...

so I suspect zsh is treating that stack of processes as a "job" and
zapping all of it.  There is certainly nothing in psql that would
attempt to signal its parent process (let alone grandparent).

regards, tom lane




Identify huge pages accessibility using madvise

2024-04-13 Thread Dmitry Dolgov
Hi,

I would like to propose a small patch to address an annoying issue with
the way PostgreSQL falls back when "huge_pages = try" is
set. Here is what the problem looks like:

* PostgreSQL is starting on a machine with some huge pages available

* It tries to identify that fact and does mmap with MAP_HUGETLB, which
  succeeds

* But it happens to run inside a cgroup with a hugetlb
  controller and a limit set to 0 (or anything less than PostgreSQL
  needs)

* Under these circumstances PostgreSQL will proceed to allocate huge
  pages, but the first page fault will trigger SIGBUS

I've sketched out how to reproduce it with cgroup v1 and v2 in the
attached scripts.

This sounds like quite a rare combination of factors, but apparently
it's fairly easy to run into this on K8s/OpenShift. There was a bug
reported some time ago [1] about this behaviour, and back then I was under
the impression it was a solved matter with nothing left to do. Yet I still
observe this type of issue, the latest one no more than a week ago.

After some research I found what looks to me like a relatively simple
way to address the problem. In Linux kernel 5.14 a new flag to madvise
was introduced that might be just what we need here. It's called
MADV_POPULATE_READ [2] and it tells the kernel to populate page tables by
triggering read faults if required. One by-design feature of this flag
is to fail the madvise call in situations like the one above, giving an
opportunity to avoid SIGBUS.
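A minimal standalone illustration of this kind of probe (a sketch only,
assuming Linux >= 5.14 headers that define MADV_POPULATE_READ):

	#include <stdio.h>
	#include <sys/mman.h>

	int
	main(void)
	{
		size_t	sz = 2 * 1024 * 1024;	/* one 2MB huge page */
		void   *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
						 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

		if (p == MAP_FAILED)
			return 1;			/* no huge pages available at all */

	#ifdef MADV_POPULATE_READ
		if (madvise(p, sz, MADV_POPULATE_READ) != 0)
		{
			/*
			 * Huge pages exist but are not accessible (e.g. a cgroup
			 * hugetlb limit of 0): unmap and fall back instead of
			 * hitting SIGBUS on the first page fault.
			 */
			munmap(p, sz);
			return 2;
		}
	#endif

		puts("huge pages usable");
		return 0;
	}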

I've outlined a patch to implement this approach and tested it on a
newish Linux kernel I've got lying around (6.9.0-rc1) -- no SIGBUS,
PostgreSQL does fallback to not use huge pages. The resulting change
seems to be small enough to justify addressing this small but annoying
issue. Any thoughts or commentaries about the proposal?

[1]: 
https://www.postgresql.org/message-id/flat/HE1PR0701MB256920EEAA3B2A9C06249F339E110%40HE1PR0701MB2569.eurprd07.prod.outlook.com
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4ca9b3859dac14bbef0c27d00667bb5b10917adb
>From 0001d43117dc5cad08fb0908a3e50a00c56f88d3 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sat, 13 Apr 2024 11:31:46 +0200
Subject: [PATCH v1] Identify huge pages accessibility using madvise

Currently, PostgreSQL tries to figure out whether huge pages are
available, to fall back if "huge_pages = try" is set. There is an
annoying situation that this approach cannot handle: huge
pages are available, but they are restricted via cgroups. If this happens
and PostgreSQL is running inside a cgroup that limits huge pages to
0, the allocation part with mmap would work, but the very first page
fault will trigger SIGBUS.

To handle this situation more gracefully, add a madvise call with the
MADV_POPULATE_READ flag if available (it was introduced in Linux kernel
5.14). This flag tells the kernel to populate page tables by triggering read
faults if required, and in the situation described above it will fail,
giving PostgreSQL an opportunity to fall back and proceed without huge
pages. Note that it's not a side effect, but rather a designed behaviour [1].

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4ca9b3859dac14bbef0c27d00667bb5b10917adb
---
 src/backend/port/sysv_shmem.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 1a6d8fa0fb..cbacf62066 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -600,7 +600,7 @@ CreateAnonymousSegment(Size *size)
 {
Sizeallocsize = *size;
void   *ptr = MAP_FAILED;
-   int mmap_errno = 0;
+   int mmap_errno = 0, madv_errno = 0;
 
 #ifndef MAP_HUGETLB
/* PGSharedMemoryCreate should have dealt with this case */
@@ -625,6 +625,28 @@ CreateAnonymousSegment(Size *size)
	if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
		elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
			 allocsize);
+
+#ifdef MADV_POPULATE_READ
+	/*
+	 * Verifying if huge pages are available is done in two steps: first
+	 * mmap with MAP_HUGETLB, then madvise with MADV_POPULATE_READ. For the
+	 * latter the MADV_POPULATE_READ flag will tell kernel to populate page
+	 * tables by triggering read faults if required, revealing potential
+	 * access issues that otherwise would result in SIGBUS.
+	 *
+	 * If mmap fails, no huge pages are available; if it does not, there is
+	 * still possibility that huge pages are limited via cgroups. If
+	 * madvise fails, there are some huge pages, but we cannot access them
+	 * due to cgroup limi

Re: Why is parula failing?

2024-04-13 Thread Tomas Vondra



On 4/9/24 05:48, David Rowley wrote:
> On Mon, 8 Apr 2024 at 23:56, Robins Tharakan  wrote:
>> #3  0x0083ed84 in WaitLatch (latch=, 
>> wakeEvents=wakeEvents@entry=41, timeout=60, 
>> wait_event_info=wait_event_info@entry=150994946) at latch.c:538
>> #4  0x00907404 in pg_sleep (fcinfo=) at misc.c:406
> 
>> #17 0x0086a944 in exec_simple_query 
>> (query_string=query_string@entry=0x28171c90 "SELECT pg_sleep(0.1);") at 
>> postgres.c:1274
> 
> I have no idea why WaitLatch has timeout=60.  That should be no
> higher than timeout=100 for "SELECT pg_sleep(0.1);".  I have no
> theories aside from a failing RAM module, cosmic ray or a well-timed
> clock change between the first call to gettimeofday() in pg_sleep()
> and the next one.
> 
> I know this animal is running debug_parallel_query = regress, so that
> 0.1 Const did have to get serialized and copied to the worker, so
> there's another opportunity for the sleep duration to be stomped on,
> but that seems pretty unlikely.
> 

AFAIK that GUC is set only for HEAD, so it would not explain the
failures on the other branches.

> I can't think of a reason why the erroneous  reltuples=48 would be
> consistent over 2 failing runs if it were failing RAM or a cosmic ray.
> 

Yeah, that seems very unlikely.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Why is parula failing?

2024-04-13 Thread Tomas Vondra
On 4/13/24 15:02, Robins Tharakan wrote:
> On Wed, 10 Apr 2024 at 10:24, David Rowley  wrote:
>>
>> Master failed today for the first time since the compiler upgrade.
>> Again reltuples == 48.
> 
> Here's what I can add over the past few days:
> - Almost all failures are either reltuples=48 or SIGABRTs
> - Almost all SIGABRTs are DDLs - CREATE INDEX / CREATE AGGREGATEs / CTAS
>   - A little too coincidental? Recent crashes have stack-trace if
> interested.
> 
> Barring the initial failures (during move to gcc 13.2), in the past week:
> - v15 somehow hasn't had a failure yet
> - v14 / v16 have got only 1 failure each
> - but v12 / v13 are lit up - failed multiple times.
> 

I happened to have an unused rpi5, so I installed Armbian aarch64 with
gcc 13.2.0, built with exactly the same configure options as parula, and
did ~300 loops of "make check" without a single failure.

So either parula's packages differ in some way, or maybe it's more
of a timing issue and the rpi5 is way slower than graviton3.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: UUID v7

2024-04-13 Thread Sergey Prokhorenko
I think that for the sake of such an epoch-making thing as UUIDv7 it would be 
worth slightly unfreezing this feature freeze.

Best regards, 


Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Saturday, 13 April 2024 at 09:58:29 am GMT+3, Andrey M. Borodin
 wrote:


> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio  wrote:
> 
> if e.g.
> the RFC got approved 2 weeks after Postgres's feature freeze

Jelte, you seem to be the visionary! I would consider participating in
lotteries or betting.
The new UUID spec is assigned RFC number 9562; it was approved by the RFC
editors and is now in AUTH48 state. This means that after final approval by
the authors the RFC will imminently be publicised. Most probably, this will
happen circa 2 weeks after feature freeze :)


Best regards, Andrey Borodin.

[0] https://www.rfc-editor.org/auth48/rfc9562  

Re: allow changing autovacuum_max_workers without restarting

2024-04-13 Thread Nathan Bossart
On Fri, Apr 12, 2024 at 10:17:44PM +, Imseih (AWS), Sami wrote:
>>> Hm. Maybe the autovacuum launcher could do that.
> 
> Would it be better to use a GUC check_hook that compares the 
> new value with the max allowed values and emits a WARNING ?
> 
> autovacuum_max_workers already has a check_autovacuum_max_workers
> check_hook, which can be repurposed for this.
> 
> In the POC patch, this check_hook is kept as-is, which will no longer make 
> sense.

IIRC using GUC hooks to handle dependencies like this is generally frowned
upon because it tends to not work very well [0].  We could probably get it
to work for this particular case, but IMHO we should still try to avoid
this approach.  I didn't find any similar warnings for other GUCs like
max_parallel_workers_per_gather, so it might not be crucial to emit a
WARNING here.

[0] https://postgr.es/m/27574.1581015893%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Parallel CREATE INDEX for BRIN indexes

2024-04-13 Thread Tomas Vondra
On 4/13/24 11:19, Tomas Vondra wrote:
> On 4/13/24 10:36, Andres Freund wrote:
>> Hi,
>>
>> While preparing a differential code coverage report between 16 and HEAD, one
>> thing that stands out is the parallel brin build code. Neither on
>> coverage.postgresql.org nor locally is that code reached during our tests.
>>
> 
> Thanks for pointing this out, it's definitely something that I need to
> improve (admittedly, should have been part of the patch). I'll also look
> into eliminating the difference between BTREE and BRIN parallel builds,
> mentioned in my last message in this thread.
> 

Here are a couple of patches adding a test for parallel CREATE INDEX with
BRIN. The actual test is 0003/0004 - I added the test to pageinspect,
because that allows cross-checking the index against one built without
parallelism, which I think is better than just doing CREATE INDEX
without verifying that it produces correct results.

It's not entirely trivial, because for some opclasses (e.g. minmax-multi)
the results depend on the order in which values are added and the order in
which summaries from different workers are merged.

Funnily enough, while adding the test, I ran into two pre-existing bugs.
One is that brin_bloom_union forgot to update the number of bits set in
the bitmap; the other is that 6bcda4a721 changed PG_DETOAST_DATUM to
the _PACKED version, which does the wrong thing here. Both are mostly
harmless - they only affect the output function, which is unused outside
pageinspect. No impact on query correctness etc.

The test needs a bit more work to make sure it works on 32-bit machines
etc., which I think may affect the available space on a page, which in
turn might affect the minmax-multi summaries. But I'll take care of this
early next week.



-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 450f992c12d43c14aa1b4b72cfd9d3ce0251eed1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 13 Apr 2024 22:15:26 +0200
Subject: [PATCH v20240413 1/4] Update nbits_set in brin_bloom_union

When merging BRIN bloom summaries, it's not enough to merge the bitmaps;
we need to update the nbits_set counter too.

This is mostly harmless, as it does not affect correctness - the counter
is used only in the output function, and that's used only in pageinspect
to show basic info about the summary.

Backpatch-through: 14
---
 src/backend/access/brin/brin_bloom.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index ebf33016279..9e3bc567303 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -687,10 +687,20 @@ brin_bloom_union(PG_FUNCTION_ARGS)
 
 	nbytes = (filter_a->nbits) / 8;
 
+	filter_a->nbits_set = 0;
+
 	/* simply OR the bitmaps */
 	for (i = 0; i < nbytes; i++)
+	{
 		filter_a->data[i] |= filter_b->data[i];
 
+		for (int bit = 0; bit < 8; bit++)
+		{
+			if (filter_a->data[i] & (0x01 << bit))
+				filter_a->nbits_set++;
+		}
+	}
+
 	PG_RETURN_VOID();
 }
 
-- 
2.44.0
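
As an aside (not part of the patch above): the per-bit counting loop could
plausibly be replaced with pg_popcount() from port/pg_bitutils.h, available
since PG 12. A sketch only, assuming the BloomFilter layout private to
brin_bloom.c (a data[] bitmap plus an nbits_set counter):

#include "postgres.h"
#include "port/pg_bitutils.h"

/*
 * Recompute the population count of the merged bitmap in one call,
 * instead of testing each bit individually.
 */
static void
bloom_update_nbits_set(BloomFilter *filter, int nbytes)
{
	filter->nbits_set = (uint32) pg_popcount((const char *) filter->data, nbytes);
}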

From 07739713893bb0f6be5853324e8897159cf2ad07 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 13 Apr 2024 22:18:37 +0200
Subject: [PATCH v20240413 2/4] Use correct PG_DETOAST_DATUM macro in BRIN

Commit 6bcda4a721 replaced PG_DETOAST_DATUM with PG_DETOAST_DATUM_PACKED
in two places, in the output functions for minmax-multi and bloom. But
this is incorrect - the structs backing both summaries include a 4B
header, so we need to detoast them fully. With _PACKED we may end up with
a 1B header, so the fields will be shifted and have incorrect values.

Backpatch-through: 16
---
 src/backend/access/brin/brin_bloom.c| 2 +-
 src/backend/access/brin/brin_minmax_multi.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 9e3bc567303..1c2437cef74 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -807,7 +807,7 @@ brin_bloom_summary_out(PG_FUNCTION_ARGS)
 	StringInfoData str;
 
 	/* detoast the data to get value with a full 4B header */
-	filter = (BloomFilter *) PG_DETOAST_DATUM_PACKED(PG_GETARG_DATUM(0));
+	filter = (BloomFilter *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
 
 	initStringInfo(&str);
 	appendStringInfoChar(&str, '{');
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index c5962c00d64..e5d95de5d84 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -3023,7 +3023,7 @@ brin_minmax_multi_summary_out(PG_FUNCTION_ARGS)
 	 * Detoast to get value with full 4B header (can't be stored in a toast
 	 * table, but can use 1B header).
 	 */
-	ranges = (SerializedRanges *) PG_DETOAST_DATUM_PACKED(PG_GETARG_DATUM(0));
+	ranges = (SerializedRanges *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
 
 	/* lookup output func for the type */

Why should I use fileno(stderr) instead of STDERR_FILENO

2024-04-13 Thread Dima Rybakov (Tlt)
Dear pgsql hackers,

I found a mixture of fileno(stderr) and STDERR_FILENO in the PostgreSQL
source code, which in most cases mean the same thing. But I recently found
that Microsoft's fileno() (_fileno()) can sometimes return -2. After some
experiments I found that applications running as a Windows service have
problems with stderr: fileno(stderr) returns -2 (negative two) in Windows
service mode. That causes some issues with the logging collector.
Meanwhile, the value of STDERR_FILENO always equals 2 and does not depend
on the application mode, because it is a macro.

I wonder if there are hidden advantages to using fileno(stderr)? Should I
use only fileno(stderr), or is using STDERR_FILENO acceptable too? Are
there cases when I should not use STDERR_FILENO?

Sincerely,
Dmitry


Additional references
1. BUG #18400
2. https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fileno?view=msvc-170
Quote: "If stdout or stderr is not associated with an output stream (for
example, in a Windows application without a console window), the file
descriptor returned is -2. ..."
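
A minimal standalone sketch (not from the original mail) of the defensive
pattern this suggests, written for POSIX systems; on Windows there is no
unistd.h, so STDERR_FILENO would have to be defined as 2:

#include <stdio.h>
#include <unistd.h>

/*
 * Return a usable descriptor for stderr, falling back to the well-known
 * value when the runtime reports an invalid one (as Microsoft's _fileno()
 * does for a detached stderr, returning -2).
 */
static int
stderr_fd(void)
{
	int fd = fileno(stderr);

	return (fd >= 0) ? fd : STDERR_FILENO;
}

int
main(void)
{
	printf("using fd %d for stderr\n", stderr_fd());
	return 0;
}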


Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread David G. Johnston
On Saturday, April 13, 2024, Dmitry Koterov 
wrote:

>
>
> % psql --version
> psql (PostgreSQL) 16.0
>

How did you install this and can you install other, supported, versions?

David J.


Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Thomas Munro
On Sun, Apr 14, 2024 at 11:18 AM Dmitry Koterov
 wrote:
> Can it be e.g. readline? Or something related to tty or session settings 
> which psql could modify (I did not find any in the source code though).

I was wondering about that.  Are you using libedit or libreadline?
What happens if you build without readline/edit support?  From a quick
glance at libedit, it does a bunch of signal interception, but I
didn't check the details.  It is interested in stuff like SIGWINCH,
the window-resized-by-user signal.




Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Tom Lane
Dmitry Koterov  writes:
> I wish it was zsh... I tested it with zsh, but with bash (and with
> low-level kill syscall), I observed the same effect unfortunately.

> So it's still a puzzle.

> 1. Even more, when I send a kill() low-level syscall using e.g. Perl - perl
> -e 'kill("INT", 16107)' - it is the same.
> 2. If any other program but psql is used (e.g. vim or my custom Perl script
> which ignores SIGINT), the effect is not reproducible.

> Can it be e.g. readline? Or something related to tty or session settings
> which psql could modify (I did not find any in the source code though).

OK, I tried dtruss'ing psql on macOS.  What I see is that with
Apple's libedit, the response to SIGINT includes this:

kill(0, 2)   = 0 0

that is, "SIGINT my whole process group".  If I build with libreadline
from MacPorts, I just see

kill(30902, 2)   = 0 0

(30902 being the process's own PID).  I'm not real sure why either
library finds it necessary to re-signal the process --- maybe they
trap SIGINT and then try to hide that by reinstalling the app's
normal SIGINT handler and re-delivering the signal.  But anyway,
libedit seems to be vastly exceeding its authority here.  If
signaling the whole process group were wanted, it would have been
the responsibility of the original signaller to do that.
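
For reference, a small standalone demo of the difference between the two
calls (a sketch, not from the thread): per POSIX, kill(0, sig) signals
every process in the caller's process group, while kill(getpid(), sig)
signals only the caller:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	signal(SIGINT, SIG_IGN);	/* ignore SIGINT so the demo survives */

	kill(getpid(), SIGINT);		/* libreadline-style: re-signal self only */
	kill(0, SIGINT);		/* libedit-style: re-signal the whole group */

	printf("still alive, but the second kill() also hit our siblings\n");
	return 0;
}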

regards, tom lane




Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Dmitry Koterov
> OK, I tried dtruss'ing psql on macOS.  What I see is that with
> Apple's libedit, the response to SIGINT includes this:
> kill(0, 2)   = 0 0

OK, so it's libedit that does this. I should've tried dtruss instead of
the not-quite-working lldb. I'll try to dig further.

Thank you, Tom!



> How did you install this and can you install other, supported, versions?
...

> I was wondering about that.  Are you using libedit or libreadline?
>

I did not build psql; I use the version delivered by brew on macOS.

I think Tom already found out that libedit is the culprit. But for the
sake of history, I'm posting the following below.

I just wrote a small C wrapper which prints who is sending that SIGINT to
the parent process, and it was indeed the psql process:



bash-3.2$ ./0-build.sh && ./1-run.sh
+ gcc my0.c -o my0
+ ./my0
my0 pid is 45710
psql (16.0, server 13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
postgres=#
int in my2
int in my1
my0 got signal 2 from process *45723* running as user 501
postgres=#

bash-3.2$ ./2-watch.sh
Every 0.5s: pstree | grep -E "bash
|yarn|psql|vim|sleep|lldb|my.pl|perl|debugserver|my0|my1|my2"
| grep dmitry | grep -v grep
 |   | \-+= 36468 dmitry bash -rcfile .bashrc
 |   |   \-+= 45709 dmitry /bin/bash ./1-run.sh
 |   | \-+- 45710 dmitry ./my0
 |   |   \-+- 45711 dmitry /usr/bin/perl -w ./my1.pl
 |   | \-+- 45712 dmitry /usr/bin/perl -w /tmp/my2.pl
 |   |   \--- *45723* dmitry psql
 |   |   \-+= 44170 dmitry /bin/bash ./2-watch.sh

bash-3.2$ ./3-kill.sh
+ perl -e 'kill("INT", `pgrep psql`)'

bash-3.2$ cat ./my0.c
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

void handler(int signal, siginfo_t *info, void *data) {
  printf("my0 got signal %d from process %d running as user %d\n",
 signal, info->si_pid, info->si_uid);
}

int main(void) {
  printf("my0 pid is %d\n", getpid());
  if (fork()) {
    /* parent: install an SA_SIGINFO handler to learn the sender's PID */
    struct sigaction sa;
    sigset_t mask;
    sigemptyset(&mask);
    sa.sa_sigaction = &handler;
    sa.sa_mask = mask;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGINT, &sa, NULL);
    /* wait for the child, retrying when interrupted by the signal */
    while (wait(NULL) == -1);
  } else {
    /* child: exec the next wrapper in the chain */
    if (execl("./my1.pl", "my1", NULL) == -1) {
      perror("execl");
    }
  }
  return 0;
}







Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Thomas Munro
On Sun, Apr 14, 2024 at 11:49 AM Tom Lane  wrote:
> OK, I tried dtruss'ing psql on macOS.  What I see is that with
> Apple's libedit, the response to SIGINT includes this:
>
> kill(0, 2)   = 0 0
>
> that is, "SIGINT my whole process group". [...]
> I'm not real sure why either
> library finds it necessary to re-signal the process --- maybe they
> trap SIGINT and then try to hide that by reinstalling the app's
> normal SIGINT handler and re-delivering the signal.  But anyway,
> libedit seems to be vastly exceeding its authority here.  If
> signaling the whole process group were wanted, it would have been
> the responsibility of the original signaller to do that.

Probably to intercept SIGWINCH -- I assume it's trying to propagate
the signal to the "real" signal handler, but it's propagating it a
little too far.

https://github.com/NetBSD/src/blob/1de18f216411bce77e26740327b0782976a89965/lib/libedit/sig.c#L110




Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Apr 14, 2024 at 11:49 AM Tom Lane  wrote:
>> OK, I tried dtruss'ing psql on macOS.  What I see is that with
>> Apple's libedit, the response to SIGINT includes this:
>> kill(0, 2)   = 0 0

> https://github.com/NetBSD/src/blob/1de18f216411bce77e26740327b0782976a89965/lib/libedit/sig.c#L110

Ah, I was wondering if that was from upstream libedit or was an
Apple-ism.  Somebody should file a bug.

regards, tom lane




Re: allow changing autovacuum_max_workers without restarting

2024-04-13 Thread Imseih (AWS), Sami
> IIRC using GUC hooks to handle dependencies like this is generally frowned
> upon because it tends to not work very well [0]. We could probably get it
> to work for this particular case, but IMHO we should still try to avoid
> this approach. 

Thanks for pointing this out. I agree, this could lead to spurious log
messages being emitted.

> so it might not be crucial to emit a
> WARNING here.

As mentioned earlier in the thread, we can let the autovacuum launcher
emit the log, but it would need to be careful not to flood the logs while
this condition persists (i.e., log only the first time the condition is
detected, or log every once in a while).

The additional complexity is not worth it.

Regards,

Sami






Re: In MacOS, psql reacts on SIGINT in a strange fashion (Linux is fine)

2024-04-13 Thread Dmitry Koterov
Thanks to everyone!

I will file a bug.

Anyway, I just built a tool to work around it all. It allows running psql
from processes which don't handle SIGINT in a proper shell-like manner
(like yarn, for instance): https://github.com/dimikot/run-in-separate-pgrp

Basically, without this, running `yarn psql` and then pressing ^C kills
yarn (it kills it on Linux too, since ^C propagates SIGINT to all
processes in the foreground group), because yarn doesn't behave nicely
with the exit codes (and exit signals) of its child processes. With the
run-in-separate-pgrp wrapping, ^C is delivered only to psql.




SupportRequestRows support function for generate_series_timestamptz

2024-04-13 Thread David Rowley
When looking at [1], I noticed that we don't have a prosupport
function for the timestamp version of generate_series.

We have this for the integer versions of generate_series(), per:

postgres=# explain analyze select * from generate_series(1, 256, 2);
QUERY PLAN
--
 Function Scan on generate_series  (cost=0.00..1.28 rows=128 width=4)
(actual time=0.142..0.183 rows=128 loops=1)

The timestamp version just gives the default 1000 row estimate:

postgres=# explain analyze select * from generate_series('2024-01-01',
'2025-01-01', interval '1 day');
 QUERY PLAN

 Function Scan on generate_series  (cost=0.00..10.00 rows=1000
width=8) (actual time=0.604..0.718 rows=367 loops=1)

I had some spare time today, so I wrote a patch, which gives you:

postgres=# explain analyze select * from generate_series('2024-01-01',
'2025-01-01', interval '1 day');
QUERY PLAN
--
 Function Scan on generate_series  (cost=0.00..3.67 rows=367 width=8)
(actual time=0.258..0.291 rows=367 loops=1)

This required a bit of hackery to not have timestamp_mi() error out in
the planner when the timestamp difference calculation overflows.  I
considered adding ereturn support to fix that, but that felt like
opening Pandora's box.  Instead, I added some pre-checks similar to
what's in timestamp_mi() to have the support function fall back on the
1000 row estimate when there would be an overflow.

Also, there's no interval_div, so the patch has a macro that converts an
interval to microseconds and does floating-point division. I think that's
good enough for row estimation.
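
A rough sketch of what such a conversion might look like (the name is
illustrative, not necessarily what the patch uses), assuming the Interval
layout and the USECS_PER_DAY / DAYS_PER_MONTH constants from
src/include/datatype/timestamp.h; treating a month as 30 days is fine
here, since only a row estimate is needed, not exact calendar arithmetic:

/* approximate an Interval as microseconds, for estimation only */
#define INTERVAL_TO_USEC_APPROX(ivp) \
	((double) (ivp)->time + \
	 (double) (ivp)->day * USECS_PER_DAY + \
	 (double) (ivp)->month * (DAYS_PER_MONTH * USECS_PER_DAY))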

I'll park this here until July CF.

(I understand this doesn't help the case in [1] as the generate_series
inputs are not const there)

David

[1] 
https://www.postgresql.org/message-id/CAMPYKo0FouB-HZ1k-_Ur2v%2BkK71q0T5icQGrp%2BSPbQJGq0H2Rw%40mail.gmail.com
From ca0e982215d1335d015a2b515f038b7186935af0 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Sun, 14 Apr 2024 14:49:39 +1200
Subject: [PATCH v1] Add support function for generate_series for timestamps

This provides the planner with row estimates for
generate_series(TIMESTAMP, TIMESTAMP, INTERVAL) and
generate_series(TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL)
---
 src/backend/utils/adt/timestamp.c | 83 +++
 src/include/catalog/pg_proc.dat   |  9 +++-
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 963f2ec74a..25f2680243 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -27,6 +27,7 @@
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "optimizer/optimizer.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "parser/scansup.h"
@@ -6668,6 +6669,88 @@ generate_series_timestamptz_at_zone(PG_FUNCTION_ARGS)
 	return generate_series_timestamptz_internal(fcinfo);
 }
 
+/*
+ * Planner support function for generate_series(timestamp, timestamp, interval)
+ */
+Datum
+generate_series_timestamp_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1, *arg2, *arg3;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+			arg2 = estimate_expression_value(req->root, lsecond(args));
+			arg3 = estimate_expression_value(req->root, lthird(args));
+
+			/*
+			 * If any argument is constant NULL, we can safely assume that
+			 * zero rows are returned.  Otherwise, if they're all non-NULL
+			 * constants, we can calculate the number of rows that will be
+			 * returned.
+			 */
+			if ((IsA(arg1, Const) && ((Const *) arg1)->constisnull) ||
+				(IsA(arg2, Const) && ((Const *) arg2)->constisnull) ||
+				(IsA(arg3, Const) && ((Const *) arg3)->constisnull))
+

Re: Cleaning up threading code

2024-04-13 Thread Thomas Munro
On Mon, Jul 3, 2023 at 8:43 PM Heikki Linnakangas  wrote:
> On 10/06/2023 05:23, Thomas Munro wrote:
> > 2.  I don't like the way we have to deal with POSIX vs Windows at
> > every site where we use threads, and each place has a different style
> > of wrappers.  I considered a few different approaches to cleaning this
> > up:
> >
> > * provide centralised and thorough pthread emulation for Windows; I
> > don't like this, I don't even like all of pthreads and there are many
> > details to get lost in
> > * adopt C11 <threads.h>; unfortunately it is too early, so you'd need
> > to write/borrow replacements for at least 3 of our 11 target systems
> > * invent our own mini-abstraction for a carefully controlled subset of stuff
>
> Google search on "c11 threads on Windows" found some emulation wrappers:
> https://github.com/jtsiomb/c11threads and
> https://github.com/tinycthread/tinycthread, for example. Would either of
> those work for us?
>
> Even if we use an existing emulation wrapper, I wouldn't mind having our
> own pg_* abstration on top of it, to document which subset of the POSIX
> or C11 functions we actually use.

Yeah.  I am still interested in getting our thread API tidied up, and
I intend to do that for PG 18.  The patch I posted on that front,
which can be summarised as a very lightweight subset of standard
<threads.h> except with pg_ prefixes everywhere, mapping to Windows or
POSIX threads, still needs one tiny bit more work: figuring out how to
get the TLS destructors to run on Windows, using FLS or similar, or
implementing destructors myself (= a little destructor callback list
that a thread exit hook would run, work I was hoping to avoid by using
something from the OS libs... I will try again soon).  Andres also
opined upthread that we should think about offering a thread_local
storage class and getting away from TLS with keys.

One thing to note is that the ECPG code is using TLS with destructors
(in fact they are b0rked in master, see git grep "FIXME: destructor",
so ECPG leaks memory on Windows; the thing that I'm trying to fix in
pg_threads.h is actually fixing a long-standing bug). And although
thread_local has destructors in C++, it doesn't in C, so if we decided
to add the storage class but not bother with the tss_create feature,
then ECPG would need to do cleanup another way. I will look into that
option.
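
A minimal sketch of that fallback idea: a per-thread list of destructor
callbacks run by an explicit thread-exit hook, for platforms without
native TLS destructor support. All names here are invented (this is not
the pg_threads.h API), and it assumes a pg_thread_local storage-class
macro like the one sketched further below:

typedef struct pg_tls_cleanup
{
	void		(*func) (void *);
	void	   *arg;
	struct pg_tls_cleanup *next;
} pg_tls_cleanup;

static pg_thread_local pg_tls_cleanup *cleanup_list = NULL;

/* each thread calls this from its exit path, before terminating */
static void
pg_run_tls_destructors(void)
{
	for (pg_tls_cleanup *c = cleanup_list; c != NULL; c = c->next)
		c->func(c->arg);
}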

One new development since I last wrote the above is that the Microsoft
toolchain finally implemented the library components of C11 <threads.h>:

https://devblogs.microsoft.com/cppblog/c11-threads-in-visual-studio-2022-version-17-8-preview-2/

It seems like it would be a long time before we could contemplate using
that stuff, though, especially given our duty of keeping libpq and ECPG
requirements low and reasonable.  However, it seems fairly
straightforward to say that we require C99 plus some way to access a
C11-like thread-local storage class.  In practice that means a
pg_thread_local macro that points to MSVC __declspec(thread) (which
works since MSVC 2014?) or GCC/Clang __thread (which works since GCC
4.8 in 2013?) or whatever.  Of course every serious toolchain supports
this stuff somehow or other, having been required to for C11.
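
That macro might look something like this (a sketch under the assumptions
above; pg_thread_local is a hypothetical name, not a committed API):

#if defined(_MSC_VER)
#define pg_thread_local __declspec(thread)
#elif defined(__GNUC__)			/* clang also defines __GNUC__ */
#define pg_thread_local __thread
#else
#error "no known thread-local storage class for this compiler"
#endif

/* usage: each thread gets its own copy of this counter */
static pg_thread_local int my_counter = 0;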

I can't immediately see any build farm animals that would be affected.
Grison's compiler info is out of date; it's really running 8.something.
The old OpenBSD GCC 4.2 animal was upgraded, and antique AIX got the
boot: that's not even a coincidence, those retirements came about
because those systems didn't support arbitrary alignment, another C11
feature that we now effectively require.  (We could have worked around
it if we had to, but they just weren't reasonable targets.)

So I'll go ahead and add the storage class to the next version, and
contemplate a couple of different options for the tss stuff, including
perhaps leaving it out if that seems doable.

https://stackoverflow.com/questions/29399494/what-is-the-current-state-of-support-for-thread-local-across-platforms