Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-09 Thread Shubham Khanna
Hi all,

I am writing to propose adding a two_phase option to
pg_createsubscriber. As discussed in [1], supporting this feature was
planned for this version during the development of pg_createsubscriber.
Currently, pg_createsubscriber creates subscriptions with the
two_phase option set to false. Enabling the two_phase option after a
subscription has been created is not straightforward, as it requires
the subscription to be disabled first. This patch addresses that
limitation by adding a two_phase option to pg_createsubscriber, so
that subscriptions can be created with two_phase enabled from the
start and benefit from its advantages right away.
The attached patch implements this.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B7gRxiK3xNEH03CY3mMKxqi7BVz2k3VxxVZp8fgjHZxg%40mail.gmail.com

Thanks and regards,
Shubham Khanna.


v1-0001-Add-support-for-enabling-disabling-two-phase-comm.patch
Description: Binary data


Re: doc: Mention clock synchronization recommendation for hot_standby_feedback

2024-12-09 Thread Jakub Wartak
Hi Euler,

On Thu, Dec 5, 2024 at 4:07 PM Euler Taveira  wrote:

> On Thu, Dec 5, 2024, at 6:43 AM, Jakub Wartak wrote:
>
> One of our customers ran into a very odd case, where hot standby feedback
> backend_xmin propagation stopped working due to major (hours/days) clock
> time shifts on hypervisor-managed VMs. This happens (and is fully
> reproducible) e.g. in scenarios where standby connects and its own VM is
> having time from the future (relative to primary) and then that time goes
> back to "normal". In such situation "sends hot_standby_feedback xmin"
> timestamp messages are stopped being transferred, e.g.:
>
>
> Is it worth a WARNING message if there is a "huge" time difference
> between the servers? We already have the reply time in the message so
> it is a matter of defining the "huge" interval plus a roundtrip. We also
> need to avoid spamming the log.
>

I'm trying to stay consistent with what recovery_min_apply_delay did
(there is a warning in the docs, but no warning in code), and I just
wanted the documentation to have a pointer so that someone using
hot_standby_feedback would at least be warned beforehand. Given it is
very rare, I don't think we need additional code (plus what Andres has noted).


> Your patch looks good to me. Should it be converted into a
> ...? (See synchronous_standby_names [1] for an example.)
>

Fine for me, but we would have to also convert the recovery_min_apply_delay
to do the same, right?

-J.


Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-09 Thread John Naylor
On Sat, Dec 7, 2024 at 10:16 PM Devulapalli, Raghuveer
 wrote:

> > There is another technique that computes CRC on 3 separate chunks and
> > combines them at the end, so about 3x faster on large-enough chunks.
> > That's the way used for the Arm proposal [0], coincidentally also citing a
> > white paper from Intel, but as Dimitry pointed out in that thread, its link
> > has apparently disappeared. Raghuveer, do you know about this, and is there
> > another link available?
> >
> > http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
>
> I am not aware of this paper. Let me poke a few people internally and get
> back to you on this.

Thanks! I have a portable PoC of how this works, but I'll save that
for another thread, since it's not Intel (or Arm) specific.

> > The cut-off point in one implementation is only 144 bytes [1], which is
> > maybe not as small as we'd like, but is quite a bit smaller than 256. That
> > seems better suited to our workloads, and more portable. I have a
> > *brand-new* laptop with an Intel chip, and IIUC it doesn't support AVX-512
> > because it uses a big-little architecture. I also understand that Sierra
> > Forest (a server product line) will be all little cores with no AVX-512
> > support, so I'm not sure why the proposal here requires AVX-512.
>
> AVX-512 is present in all of Intel's main P-core based Xeons and AMD's Zen4
> and Zen5. Sierra Forest supports the SSE and AVX/AVX2 family ISAs, but AFAIK
> AVX/AVX2 does not contain any CRC32C-specific instructions. See:

CRC32C was added in SSE 4.2, so it's quite old. The AVX-512 intrinsics
used in the patch are not CRC-specific, if I understand correctly.

My point was, it seems Intel still considers AVX-512 as optional, so
we can't count on it being present even in future chips. That's why
I'm interested in alternatives, at least as a first step. If we can
get 3x throughput, the calculation might end up low enough in the
profile that going to 6x might not be noticeable (not sure).
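
For reference, the SSE 4.2 primitive can be exercised with a few lines of C.
The following is a standalone sketch, not PostgreSQL's COMP_CRC32C
implementation (compile with -msse4.2):

	#include <stddef.h>
	#include <stdint.h>
	#include <string.h>
	#include <nmmintrin.h>		/* SSE 4.2 intrinsics */

	/* Compute CRC32C over a buffer using the SSE 4.2 crc32 instruction. */
	static uint32_t
	crc32c_sse42(uint32_t crc, const void *data, size_t len)
	{
		const unsigned char *p = data;

		while (len >= 8)		/* 8 bytes at a time */
		{
			uint64_t	chunk;

			memcpy(&chunk, p, 8);
			crc = (uint32_t) _mm_crc32_u64(crc, chunk);
			p += 8;
			len -= 8;
		}
		while (len-- > 0)		/* byte-wise tail */
			crc = _mm_crc32_u8(crc, *p++);

		return crc;
	}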

> > > There is a very frequent call computing COMP_CRC32C over just 20 bytes,
> > > while holding a crucial lock.  If we were to introduce something like
> > > this AVX-512 algorithm, it'd probably be worth dispatching differently
> > > in case of compile-time-known small lengths.
> >
> > I know you've read an earlier version of the patch and realized that it
> > wouldn't help here, but we could probably dispatch differently regardless,
> > although it may only be worth it if we can inline the instructions. Since
> > we technically only need to wait for xl_prev, I believe we could push the
> > computation of the other 12 bytes to before acquiring the lock, then only
> > execute a single instruction on xl_prev to complete the CRC computation.
> > Is there any reason why we couldn't do that, assuming we have a clean way
> > to make that portable? That would mean that the CRCs between major versions
> > would be different, but I think we don't guarantee that anyway.
>
> Not sure about that. This is not my expertise and I might need a little time 
> to figure this out. Unfortunately, I am on travel with limited internet 
> connection for the next 6 weeks. I will only be able to address this when I 
> get back. Is this a blocker for the patch or is this something we can address 
> as a revision?

This is orthogonal and not a blocker for the patch, since the patch doesn't
affect the 8- and 20-byte paths, only 256 bytes and greater.

--
John Naylor
Amazon Web Services




Re: Track the amount of time waiting due to cost_delay

2024-12-09 Thread Dilip Kumar
On Mon, Dec 9, 2024 at 2:51 PM Masahiro Ikeda  wrote:
>
> On 2024-12-06 18:31, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Thu, Dec 05, 2024 at 10:43:51AM +, Bertrand Drouvot wrote:
> >> Yeah, people would likely use this new field to monitor long running
> >> vacuum.
> >> Long enough that this error should be acceptable. Do you agree?
> >
> > OTOH, adding the 100% accuracy looks as simple as v9-0002 attached
> > (0001 is
> > same as for v8), so I think we should provide it.
>
This idea looks good to me.  Here are some comments:

1.
+   Total amount of time spent in milliseconds waiting due to
+   or . In case of parallel
+   vacuum the reported time is across all the workers and the leader. The
+   workers update the column no more frequently than once per second, so it
+   could show slightly old values.
+  

I think this wait time is influenced by the cost limit GUC as well as the
cost delay, because we are counting total wait time, which depends on how
frequently we wait, and that is entirely driven by the cost limit. Isn't it?
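
For background, the cost-based delay machinery works roughly as follows.
This is a much-simplified sketch, not the actual vacuum_delay_point() code
(the real function also scales the sleep by the accumulated balance and caps
it):

	if (VacuumCostActive)
	{
		/* each page hit/miss/dirty has already added to the balance */
		if (VacuumCostBalance >= VacuumCostLimit)
		{
			/* nap for roughly vacuum_cost_delay milliseconds, then reset */
			pg_usleep((long) (VacuumCostDelay * 1000));
			VacuumCostBalance = 0;
		}
	}

So vacuum_cost_delay controls how long each nap lasts, while
vacuum_cost_limit controls how often naps happen; the reported total wait
time depends on both.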

2.
+ if (IsParallelWorker())
+ {
+     instr_time  time_since_last_report;
+
+     INSTR_TIME_SET_ZERO(time_since_last_report);
+     INSTR_TIME_ACCUM_DIFF(time_since_last_report, delay_end,
+                           last_report_time);
+     nap_time_since_last_report += INSTR_TIME_GET_MILLISEC(delayed_time);
+
+     if (INSTR_TIME_GET_MILLISEC(time_since_last_report) > WORKER_REPORT_DELAY_INTERVAL)
+     {
+         pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
+                                             nap_time_since_last_report);
+         nap_time_since_last_report = 0;
+         last_report_time = delay_end;
+     }
+ }

Does it make sense to track "nap_time_since_last_report" in a shared
variable instead of locally in individual workers? Whenever the shared
variable crosses WORKER_REPORT_DELAY_INTERVAL we could report it, which
would avoid individual reporting from different workers.  We are already
computing the cost balance in a shared variable (VacuumSharedCostBalance).
Or do you think it would complicate the code?
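
A very rough sketch of that shared-counter idea (hypothetical code, not from
the patch: VacuumSharedNapTime is an invented name, while the progress
constants are the ones used in the patch):

	static pg_atomic_uint64 *VacuumSharedNapTime;	/* lives in shared memory */

	static void
	accumulate_nap_time(uint64 nap_ms)
	{
		uint64		total = pg_atomic_add_fetch_u64(VacuumSharedNapTime, nap_ms);

		if (total >= WORKER_REPORT_DELAY_INTERVAL)
		{
			/* take everything accumulated so far and report it exactly once */
			uint64		to_report = pg_atomic_exchange_u64(VacuumSharedNapTime, 0);

			if (to_report > 0)
				pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
													(int64) to_report);
		}
	}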

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: WARNING: missing lock on database "postgres" (OID 5) @ TID (0,4)

2024-12-09 Thread Kirill Reshke
On Mon, 9 Dec 2024 at 15:27, Alexander Kukushkin  wrote:

> postgres=# reassign owned by test to postgres;
> WARNING:  missing lock on database "postgres" (OID 5) @ TID (0,4)
> REASSIGN OWNED

Hi!
Looks like PFA should fix that.


-- 
Best regards,
Kirill Reshke


v1-0001-Lock-tuple-for-inplace-update-when-moidfying.patch
Description: Binary data


Re: Why we need to check for local buffers in BufferIsExclusiveLocked and BufferIsDirty?

2024-12-09 Thread Srinath Reddy Sadipiralla
Hi,



added this bug to commitfest https://commitfest.postgresql.org/51/5428/ 



Regards,

Srinath Reddy Sadipiralla

Member of Technical Staff

Zoho

Re: Refactoring postmaster's code to cleanup after child exit

2024-12-09 Thread Heikki Linnakangas

On 09/12/2024 01:12, Tomas Vondra wrote:
> On 11/14/24 15:13, Heikki Linnakangas wrote:
>> On 09/10/2024 23:40, Heikki Linnakangas wrote:
>>> I pushed the first three patches, with the new test and one of the small
>>> refactoring patches. Thanks for all the comments so far! Here is a new
>>> version of the remaining patches.
>
> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
> seems to have problems with valgrind :-( I reliably get this failure:

How exactly do you run the test with valgrind? What platform?

It works for me, with this:

(cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
valgrind --leak-check=no --gen-suppressions=all
--suppressions=/home/heikki/git-sandbox/postgresql/src/tools/valgrind.supp
--time-stamp=yes
--error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END
--log-file=$HOME/pg-valgrind/%p.log --trace-children=yes meson test
--suite postmaster )

> t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
> declared and done_testing() was not seen.
> # Looks like your test exited with 29 just after 4.
> t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 4 subtests passed
>
> and tmp_check/log/regress_log_001_connection_limits says:
>
> [23:48:44.444](1.129s) ok 3 - reserved_connections limit
> [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
> process ended prematurely at
> /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> line 154.
> # Postmaster PID for node "primary" is 198592
>
> That BackgroundPsql.pm line is this in wait_connect()
>
>     $self->{run}->pump()
>       until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
>
> By trial and error I found that it fails on this line 70:
>
>     push(@sessions, background_psql_as_user('regress_superuser'));
>
> but I have no idea why. There are multiple similar calls a couple
> lines earlier, and those work fine. And various other TAP tests with
> background_sql() work fine too.
>
> So what's so special about this particular line?

Weird. Valgrind makes everything slow; is it a timeout? Any other clues
in the logs?


--
Heikki Linnakangas
Neon (https://neon.tech)




Re: Wrong results with right-semi-joins

2024-12-09 Thread Richard Guo
On Tue, Dec 3, 2024 at 5:56 PM Richard Guo  wrote:
> I've traced the root cause to ExecReScanHashJoin, where we neglect to
> reset the inner-tuple match flags in the hash table for right-semi
> joins when reusing the hash table.  It was my oversight in commit
> aa86129e1.  Attached is a patch to fix it.

Pushed.

Thanks
Richard




Re: An inefficient query caused by unnecessary PlaceHolderVar

2024-12-09 Thread Richard Guo
On Tue, Dec 3, 2024 at 5:33 PM Andrei Lepikhov  wrote:
> A couple of words about your patch. These few lines of code caused a lot
> of discoveries, but in my opinion, they look fine. But I didn't find
> negative tests, where we need to wrap a Var with PHV like the following:

Pushed after adding two negative tests.  Thank you for your review.

Thanks
Richard




Re: Replace current implementations in crypt() and gen_salt() to OpenSSL

2024-12-09 Thread Daniel Gustafsson
> On 4 Dec 2024, at 16:57, Joe Conway  wrote:

> I can send you the source RPM for openssl 1.1.1c which was an earlier FIPS 
> validated version, but the main FIPS patch contains:

AFAICT the forks of 1.1.1 which offer FIPS certification all patch the common
OpenSSL API FIPS_mode() rather than invent a new one, so the earlier approach
should work fine. PFA an updated version which I propose we go ahead with.

--
Daniel Gustafsson



v4-0001-pgcrypto-Make-it-possible-to-disable-built-in-cry.patch
Description: Binary data


Re: Refactoring postmaster's code to cleanup after child exit

2024-12-09 Thread Tomas Vondra
On 12/9/24 13:30, Heikki Linnakangas wrote:
> On 09/12/2024 01:12, Tomas Vondra wrote:
>> On 11/14/24 15:13, Heikki Linnakangas wrote:
>>> On 09/10/2024 23:40, Heikki Linnakangas wrote:
 I pushed the first three patches, with the new test and one of the
 small
 refactoring patches. Thanks for all the comments so far! Here is a new
 version of the remaining patches.

>> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
>> seems to have problems with valgrind :-( I reliably get this failure:
> 
> How exactly do you run the test with valgrind? What platform?
> 

It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).

> It works for me, with this:
> 
> (cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
> valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/
> heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes
> --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/
> pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster )
> 

I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
just do

./configure --enable-debug --prefix=/home/user/builds/master
--enable-depend --enable-cassert --enable-tap-tests CPPFLAGS="-O0 -ggdb3
-DUSE_VALGRIND"

and then the usual "make check" or whatever.

The patch has a hardcoded path to the .supp file, and places the
valgrind log into /tmp. It has worked for me fine up until that commit,
and it still seems to be working in every other test directory.

>> t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
>> declared and done_testing() was not seen.
>> # Looks like your test exited with 29 just after 4.
>> t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424,
>> 0x1d00)
>> All 4 subtests passed
>>
>>
>> and tmp_check/log/regress_log_001_connection_limits says:
>>
>>
>> [23:48:44.444](1.129s) ok 3 - reserved_connections limit
>> [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
>> process ended prematurely at
>> /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/
>> PostgreSQL/Test/BackgroundPsql.pm
>> line 154.
>> # Postmaster PID for node "primary" is 198592
>>
>>
>> That BackgroundPsql.pm line is this in wait_connect()
>>
>>    $self->{run}->pump()
>>  until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
>>
>> By trial and error I found that it fails on this line 70:
>>
>>    push(@sessions, background_psql_as_user('regress_superuser'));
>>
>> but I have no idea why. There are multiple similar calls a couple
>> lines earlier, and those work fine. And various other TAP tests with
>> background_sql() work fine too.
>>
>> So what's so special about this particular line?
> 
> Weird. Valgrind makes everything slow; is it a timeout? Any other clues
> in the logs?
> 

Yeah, weird.

Timeouts were the first thing I thought about, but it fails even if I
set PGCTLTIMEOUT/PG_TEST_TIMEOUT_DEFAULT to 3600. And it doesn't seem to
be waiting for anything for that long :-(


regards

-- 
Tomas Vondra
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index d6bb2c33119..83544aeb5e2 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -487,12 +487,12 @@ start_postmaster(void)
 	 * has the same PID as the current child process.
 	 */
 	if (log_file != NULL)
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
-	   exec_path, pgdata_opt, post_opts,
+		cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+	   getpid(), exec_path, pgdata_opt, post_opts,
 	   DEVNULL, log_file);
 	else
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
-	   exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("exec valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrind.supp --error-limit=no --log-file=/tmp/valgrind.%d.log \"%s\" %s%s < \"%s\" 2>&1",
+	   getpid(), exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 0e40ed32a21..97d880c3deb 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2478,10 +2478,10 @@ regression_main(int argc, char *argv[],
 		 * Start the temp postmaster
 		 */
 		snprintf(buf, sizeof(buf),
- "\"%s%spostgres\" -D \"%s/data\" -F%s "
+ "valgrind --quiet --trace-children=yes --track-origins=yes --read-var-info=yes --num-callers=20 --leak-check=no --gen-suppressions=all --suppressions=/home/user/work/postgres/src/tools/valgrin

Re: Sort functions with specialized comparators

2024-12-09 Thread Andrey M. Borodin


> On 6 Dec 2024, at 08:49, John Naylor  wrote:
> 
> https://www.postgresql.org/message-id/CA%2BhUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg%40mail.gmail.com

Wow, what a thread!
"Simpsons Already Did It"

> On Fri, Dec 6, 2024 at 1:32 AM Andrey M. Borodin  wrote:
>> 
>>> On 5 Dec 2024, at 15:16, John Naylor  wrote:
> 
>>> I believe src/port/qsort.c was meant to be just for the standard sort
>>> interface as found in a C library. We do have one globally visible
>>> special sort here:
>>> src/backend/utils/sort/qsort_interruptible.c
>>> ...so that directory seems a better fit.
>> OK. BTW do we need ST_CHECK_FOR_INTERRUPTS?
> 
> That's a good thing to raise right now -- intarray currently doesn't
> have one, and we haven't gotten complaints from people trying to sort
> large arrays and cancel the query. This extension is not commonly
> used, so that's not surprising. It could be that large arrays are even
> less common, or no one bothered to report it. What's the largest size
> that your customers use?
> 
> If we do need a check for interrupts, then this whole thing must
> remain private to intarray. From reading e64cdab0030 , it's not safe
> to interrupt in general.

I think the commit message states that it's better to opt in to interruptible
sort. So I do not think making the sort interruptible is a blocker for making
the specialized sorting routines global.
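
For context, the opt-in looks roughly like this when instantiating the sort
template. This is only an illustrative sketch using the ST_* parameter macros
from lib/sort_template.h; the function name is invented and this is not code
from the patch:

	#include "postgres.h"
	#include "miscadmin.h"			/* CHECK_FOR_INTERRUPTS() */

	#define ST_SORT sort_int32_asc
	#define ST_ELEMENT_TYPE int32
	#define ST_COMPARE(a, b) (*(a) < *(b) ? -1 : (*(a) > *(b) ? 1 : 0))
	#define ST_SCOPE static
	#define ST_CHECK_FOR_INTERRUPTS	/* this is the opt-in knob */
	#define ST_DEFINE
	#include "lib/sort_template.h"

	/* ... later, sort_int32_asc(values, nvalues) sorts the array in place ... */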

> 
>>> And one more bikeshedding bit that might get noticed: tuplesorts
>>> express their boolean as "reversed". We don't necessarily need to
>>> follow that, but consistency is good for readability.
>> 
>> I do not know if "reversed sorting order" is more idiomatic than "ascending 
>> sorting order". If you think it is - let's switch argument's name to 
>> "reversed".
> 
> After sleeping on it, I actually think it's mildly ridiculous for this
> module to force the comparator to know about the sort direction.
> Tuplesorts must do that because each sort key could have a different
> sort order. There is only one place in intarray that wants reversed
> order -- maybe that place should reverse elements itself? It's fine to
> keep things as they are if the sort function stays private to intarray,
> but this patch creates a global function, where the "ascending"
> parameter is just noise. And if we don't have large int32 sorts
> outside of intarray, then the path of least resistance may be to keep
> it private.

We could use a global function for OID lists, which may be arbitrarily large.
But if you think that a local intarray function is better, let's go that route.

> I had a look at the files touched by this patch and noticed that there
> is another sort used for making arrays unique. Were you going to look
> at that as well?

Done.


Best regards, Andrey Borodin.


v5-0001-Use-specialized-sort-facilities.patch
Description: Binary data


Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE

2024-12-09 Thread Ilia Evdokimov

Hi hackers,

The repeated use of the number 300 in the ANALYZE-related code creates 
redundancy and relies on scattered, sometimes unclear, comments to 
explain its purpose. This can make the code harder to understand, 
especially for new contributors who might not immediately understand its 
significance. To address this, I propose introducing a macro 
STATS_MIN_ROWS to represent this value and consolidating its explanation 
in a single place, making the code more consistent and readable.
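
For concreteness, the idea is roughly the following sketch; the real
definition, its location in src/include/statistics/statistics.h, and the
relocated comment are in the attached patch:

	/*
	 * Minimum number of sample rows per unit of statistics target, following
	 * Chaudhuri, Motwani and Narasayya, "Random sampling for histogram
	 * construction: how much is enough?" (SIGMOD 1998): r = 4 * k *
	 * ln(2*n/gamma) / f^2, which for f = 0.5, gamma = 0.01 and n = 10^6 gives
	 * roughly 300 * k sample rows.
	 */
	#define STATS_MIN_ROWS	300

with callers then computing, for example:

	stats->minrows = STATS_MIN_ROWS * stats->attstattarget;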


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From d3cbea3875fdc9866c86c64c0d0e0d838887040c Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Mon, 9 Dec 2024 15:50:40 +0300
Subject: [PATCH v1] Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE

This introduces a macro STATS_MIN_ROWS to represent
the default minimum number of rows (300) sampled in ANALYZE.
---
 src/backend/commands/analyze.c| 26 ++-
 src/backend/statistics/extended_stats.c   |  2 +-
 src/backend/tsearch/ts_typanalyze.c   |  4 +--
 src/backend/utils/adt/rangetypes_typanalyze.c |  7 +++--
 src/include/statistics/statistics.h   | 23 
 5 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9a56de2282..ff7b6ffe8f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1897,42 +1897,20 @@ std_typanalyze(VacAttrStats *stats)
 	{
 		/* Seems to be a scalar datatype */
 		stats->compute_stats = compute_scalar_stats;
-		/*
-		 * The following choice of minrows is based on the paper
-		 * "Random sampling for histogram construction: how much is enough?"
-		 * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
-		 * Proceedings of ACM SIGMOD International Conference on Management
-		 * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
-		 * says that for table size n, histogram size k, maximum relative
-		 * error in bin size f, and error probability gamma, the minimum
-		 * random sample size is
-		 *		r = 4 * k * ln(2*n/gamma) / f^2
-		 * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
-		 *		r = 305.82 * k
-		 * Note that because of the log function, the dependence on n is
-		 * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
-		 * bin size error with probability 0.99.  So there's no real need to
-		 * scale for n, which is a good thing because we don't necessarily
-		 * know it at this point.
-		 *
-		 */
-		stats->minrows = 300 * stats->attstattarget;
 	}
 	else if (OidIsValid(eqopr))
 	{
 		/* We can still recognize distinct values */
 		stats->compute_stats = compute_distinct_stats;
-		/* Might as well use the same minrows as above */
-		stats->minrows = 300 * stats->attstattarget;
 	}
 	else
 	{
 		/* Can't do much but the trivial stuff */
 		stats->compute_stats = compute_trivial_stats;
-		/* Might as well use the same minrows as above */
-		stats->minrows = 300 * stats->attstattarget;
 	}
 
+	stats->minrows = (STATS_MIN_ROWS * stats->attstattarget);
+
 	return true;
 }
 
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 99fdf208db..451e2d1e9c 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -320,7 +320,7 @@ ComputeExtStatisticsRows(Relation onerel,
 	MemoryContextDelete(cxt);
 
 	/* compute sample size based on the statistics target */
-	return (300 * result);
+	return (STATS_MIN_ROWS * result);
 }
 
 /*
diff --git a/src/backend/tsearch/ts_typanalyze.c b/src/backend/tsearch/ts_typanalyze.c
index ccafe42729..befd90c9d5 100644
--- a/src/backend/tsearch/ts_typanalyze.c
+++ b/src/backend/tsearch/ts_typanalyze.c
@@ -17,6 +17,7 @@
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
 #include "common/hashfn.h"
+#include "statistics/statistics.h"
 #include "tsearch/ts_type.h"
 #include "utils/builtins.h"
 #include "varatt.h"
@@ -64,8 +65,7 @@ ts_typanalyze(PG_FUNCTION_ARGS)
 		stats->attstattarget = default_statistics_target;
 
 	stats->compute_stats = compute_tsvector_stats;
-	/* see comment about the choice of minrows in commands/analyze.c */
-	stats->minrows = 300 * stats->attstattarget;
+	stats->minrows = (STATS_MIN_ROWS * stats->attstattarget);
 
 	PG_RETURN_BOOL(true);
 }
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 3773f98115..7fcac5621c 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -26,6 +26,7 @@
 
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
+#include "statistics/statistics.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
@@ -56,8 +57,7 @@ range_typanalyze(PG_FUNCTION_ARGS)
 
 	stats->compute_stats = compute_range_stats;
 	stats->extra_data = typcache;
-	/* same as in std_typanal

Subscription sometimes loses txns after initial table sync

2024-12-09 Thread Pritam Baral
This was discovered when testing the plan for a major version upgrade via
logical replication. Said plan requires that some tables be synced before
others. So I implemented it using ALTER PUBLICATION ... ADD TABLE ... followed
by ALTER SUBSCRIPTION ... REFRESH PUBLICATION. A test for correctness revealed
that sometimes, for some tables added this way, txns after the initial data copy
are lost by the subscription.

A reproducer script is attached. It has been tested with PG 17.2, 14.15, and
even 12.22 (on either side of the replication setup). The script runs at a
default scale of 100 tables with 10k inserts each. This scale is enough to
demonstrate a failure rate of 1% to 9% of tables on my modest laptop.

In attempts to analyse why this happens, it has been observed that the sender
sometimes does not pick up a published table, even when the receiver that
started the sender process has seen the table as available (as returned by
pg_get_publication_tables()) and has thus begun COPYing its data. When the COPY
finishes (and the tablesync worker is finished), the apply loop on the receiver
expects to receive (and apply) subsequent changes for such tables, but simply
isn't sent any. This was observed by dumping every CopyData message sent over
the wire.

The attached script (like the original migration plan) uses a single publication
and adds tables to it successively. Curiously, when the script was changed to
use a dedicated publication per table (and thus, ALTER SUBSCRIPTION ... ADD
PUBLICATION instead of ALTER SUBSCRIPTION ... REFRESH PUBLICATION), the no. of
tables with data loss jumped to 100%.

-- 
#!/usr/bin/env regards
Chhatoi Pritam Baral


sub-loss-repro.sh
Description: application/shellscript


Re: [PATCH] Add roman support for to_number function

2024-12-09 Thread Hunaid Sohail
Hi,

Thanks for reviewing this patch.

On Mon, Dec 9, 2024 at 3:07 AM Tomas Vondra  wrote:

> Thanks for a nice patch. I did a quick review today, seems almost there,
> I only have a couple minor comments:
>
> 1) Template Patterns for Numeric Formatting
>
> Why the wording change? "input between 1 and 3999" sounds fine to me.
>
"input ..." seemed to refer to the numeric input for to_char's Roman format.
But after adding Roman support to to_number, shouldn't we modify it? That is
the reason I changed it to "valid for numbers 1 to 3999".

> 2) The docs say "standard numbers" but I'm not sure what that is? We
> don't use that term anywhere else, I think. Same for "standard roman
> numeral".
>
I will change "standard numbers" to "numbers".
Note that we are following the standard form. There are other forms too.
For example, 4 can be represented as IV (standard) or IIII (non-standard).
https://en.wikipedia.org/wiki/Roman_numerals#Standard_form


> 3) A couple comments need a bit of formatting. Multi-line comments
> should start with an empty line, some lines are a bit too long.
>
I will fix the multi-line formatting.


> 4) I was wondering if the regression tests check all interesting cases,
> and it seems none of the tests hit these two cases:
>
> if (vCount > 1 || lCount > 1 || dCount > 1)
> return -1;
>
The SELECT to_number('viv', 'RN') test covers this if statement for the
subtraction part. But I noticed that no test covers simple counts of V, L,
or D.
I will add SELECT to_number('VV', 'RN') for that.


> and
>
> if (!IS_VALID_SUB_COMB(currChar, nextChar))
> return -1;
>
Noted. SELECT to_number('IL', 'RN') test can be added.

Regards,
Hunaid Sohail


Re: BitmapHeapScan streaming read user and prelim refactoring

2024-12-09 Thread Dilip Kumar
On Sat, Oct 19, 2024 at 2:18 AM Melanie Plageman
 wrote:
>
> On Wed, Oct 16, 2024 at 11:09 AM Tomas Vondra  wrote:
> >
>
> Thanks for the review!
>
> >
> > 1) 0001
> >
> > I find it a bit weird that we use ntuples to determine if it's exact or
> > lossy. Not an issue caused by this patch, of course, but maybe we could
> > improve that somehow? I mean this:
> >
> > if (tbmres->ntuples >= 0)
> > (*exact_pages)++;
> > else
> > (*lossy_pages)++;
>
> I agree. The most straightforward solution would be to add a boolean
> `lossy` to TBMIterateResult.
>
> In some ways, I think it would make it a bit more clear because the
> situation with TBMIterateResult->recheck is already a bit confusing.
> It recheck is true if either 1: the Bitmap is lossy and we cannot use
> TBMIterateResult->offsets for this page or 2: the original query has a
> qual requiring recheck
>
> In the body of BitmapHeapNext() it says "if we are using lossy info,
> we have to recheck the qual conditions at every tuple". And in
> heapam_bitmap_next_block(), when the bitmap is lossy it says "bitmap
> is lossy, so we must examine each line pointer on the page. But we can
> ignore HOT chains, since we'll check each tuple anyway"
>
> I must admit I don't quite understand the connection between a lossy
> bitmap and needing to recheck the qual condition. Once we've gone to
> the underlying data, why do we need to recheck each tuple unless the
> qual requires it? It seems like PageTableEntry->recheck is usually
> passed as true for hash indexes and false for btree indexes. So maybe
> it has something to do with the index type?
>
> Anyway, adding a `lossy` member can be another patch. I just wanted to
> check if that was the solution you were thinking of.
>
> On a somewhat related note, I included a patch to alter the test at
> the top of heapam_bitmap_next_tuple()
>
> if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples)
>
> I don't really see how rs_cindex could ever be < 0 for bitmap heap
> scans. But, I'm not confident enough about this to commit the patch
> really (obviously tests pass with the Assertion I added -- but still).
>
> > Also, maybe consider manually wrapping long lines - I haven't tried, but
> > I guess pgindent did not wrap this. I mean this line, for example:
> >
> > ... whitespace ... &node->stats.lossy_pages, &node->stats.exact_pages))
>
> Ah, yes. pgindent indeed does not help with this. I have now gone
> through every commit and used
>
> git diff origin/master | grep -E '^(\+|diff)' | sed 's/^+//' | expand
> -t4 | awk "length > 78 || /^diff/"
>
> from the committing wiki :)
>
> > 2) 0003
> >
> > The "tidr" name is not particularly clear, IMO. Maybe just "range" would
> > be better?
>
> I've updated this.
>
> > +  struct
> > +  {
> > +  ItemPointerData rs_mintid;
> > +  ItemPointerData rs_maxtid;
> > +  }   tidr;
> >
> > Isn't it a bit strange that this patch remove tmbres from
> > heapam_scan_bitmap_next_block, when it was already removed from
> > table_scan_bitmap_next_tuple in the previous commit? Does it make sense
> > to separate it like this? (Maybe it does, not sure.)
>
> I've combined them.
>
> > I find this not very clear:
> >
> > + *recheck do current page's tuples need recheck
> > + *blockno used to validate pf and current block in sync
> > + *pfblockno   used to validate pf stays ahead of current block
> >
> > The "blockno" sounds weird - shouldn't it say "are in sync"? Also, not
> > clear what "pf" stands for without context (sure, it's "prefetch"). But
> > Why not to not write "prefetch_blockno" or something like that?
>
> Fixed.
>
> > 3) 0006
> >
> > Seems unnecessary to keep this as separate commit, it's  just log
> > message improvement. I'd merge it to the earlier patch.
>
> Done.
>
> > 4) 0007
> >
> > Seems fine to me in principle. This adds "subclass" similar to what we
> > do for plan nodes, except that states are not derived from Node. But
> > it's the same approach.
> >
> > Why not to do
> >
> > scan = (HeapScanDesc *) bscan;
> >
> > instead of
> >
> > scan = &bscan->rs_heap_base;
> >
> > I think the simple cast is what we do for the plan nodes, and we even do
> > it this way in the opposite direction in a couple places:
> >
> > BitmapHeapScanDesc *bscan = (BitmapHeapScanDesc *) scan;
>
> Yea, in the reverse case, there is no way to refer to a specific
> member (because we are casting something of the parent type to the
> child type). But when we are casting from the child type to the parent
> type and can actually refer to a member, it seems preferable to
> explicitly refer to the member in case anyone ever inserts a member
> above rs_heap_base.  However, for consistency with other places in the
> code, I've changed it as you suggested.
>
> Additionally, I've gone through and made a number of stylistic changes
> to conform more closely to existing styles (adding back more rs_
> prefixes and typedef'ing BitmapHeapScanDescData * as
> Bit

Re: Removing unneeded self joins

2024-12-09 Thread Alexander Korotkov
On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov  wrote:
> We initially didn't use ChangeVarNodes() in SJE at all.  See the last
> patch version without it [1].  We're trying to address Tom Lane's
> proposal to re-use more of existing tree-manipulation infrastructure
> [2].  I agree with you that the case with ChangeVarNodes() looks
> questionable.  Do you have other ideas how we can re-use some more of
> existing tree-manipulation infrastructure in SJE?

I think there was one idea to be considered.  Given that this feature
has fragility to changes in PlannerInfo and underlying structures,
Andrei proposed to generate a walker over PlannerInfo with a Perl
script.  I spent some time analyzing this idea.  I came to the
following conclusions.

It seems feasible to generate a function that would walk over all
PlannerInfo fields recursively.  Perhaps some more meta-information is
needed, at least, to check whether we should visit a particular
pointer field.  But that could be provided by additional
pg_node_attr().  With this approach, we would need to sacrifice some
efficiency (e.g., processing all the EquivalenceClasses instead of
picking only the ones used in the relation to be removed).  Also, the
logic in remove_self_join_rel() is more complex than processing all
the Relids.  That is, we would still need a massive number of
if-branches specifying how we handle each node type.  It doesn't look
like we would end up with a simpler or less error-prone
implementation.

We may decide to generate not just a walker but most of the logic in
remove_self_join_rel().  This is probably possible by injecting way
more meta-information into definitions of structures.  That isn't
going to be simpler than the current approach.  But it is probably
less error-prone: one could realize that if you add a new field to the
structure, it should have a similar pg_node_attr() as surroundings.
But I am afraid that if we go this way, we may end up with an awkward
heap of pg_node_attr() to generate the functionality of
remove_self_join_rel().  Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?

Any thoughts?

Links
1. 
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru

--
Regards,
Alexander Korotkov
Supabase




Re: shared-memory based stats collector - v70

2024-12-09 Thread Michael Paquier
On Sat, Dec 07, 2024 at 12:31:46PM +0300, Anton A. Melnikov wrote:
> Completely agree that the original comment needs to be revised,
> since it implies that it is normal for deleted entries to be here,
> but it is not the case.

Yep, so applied v2-0001 to document that, and backpatched it as it is
kind of important to know about.

> Maybe it's worth adding a warning as well,
> similar to the one a few lines below in the code?

 Assert(!ps->dropped);
 if (ps->dropped)
+{
+    PgStat_HashKey key = ps->key;
+    elog(WARNING, "found non-deleted stats entry %u/%u/%llu "
+         "at server shutdown",
+         key.kind, key.dboid,
+         (unsigned long long) key.objid);
 continue;
+}
 
 /*
  * This discards data related to custom stats kinds that are unknown

Not sure how to feel about this suggestion, though.  This would
produce a warning when building without assertions, but the assertion
would likely give us more information with a backtrace during development,
so...
--
Michael


signature.asc
Description: PGP signature


Re: Missing LWLock protection in pgstat_reset_replslot()

2024-12-09 Thread Michael Paquier
On Thu, Dec 05, 2024 at 07:11:54AM +, Bertrand Drouvot wrote:
> Yeah that seems the right thing to do for me too.

Thanks for the review.  I've looked a bit more at the other paths
handling dropped entries and they should be fine, created a TAP test
that's able to check the scenario sent by Anton with a background psql
session and some shutdown checks (critical part is the generation of
the pgstats file), then applied the fix down to v15.
--
Michael


signature.asc
Description: PGP signature


Re: refactor AlterDomainAddConstraint (alter domain add constraint)

2024-12-09 Thread jian he
Hi.
The ALTER DOMAIN ADD CONSTRAINT syntax is simpler than CREATE DOMAIN's.
So in gram.y, I changed DomainConstraintElem to the following:

DomainConstraintElem:
    CHECK '(' a_expr ')'
        {
            Constraint *n = makeNode(Constraint);
            n->contype = CONSTR_CHECK;
            n->location = @1;
            n->raw_expr = $3;
            n->cooked_expr = NULL;
            n->initially_valid = true;
            $$ = (Node *) n;
        }
    | CHECK '(' a_expr ')' NOT VALID
        {
            Constraint *n = makeNode(Constraint);
            n->contype = CONSTR_CHECK;
            n->location = @1;
            n->raw_expr = $3;
            n->cooked_expr = NULL;
            n->initially_valid = false;
            n->skip_validation = true;
            $$ = (Node *) n;
        }
    | NOT NULL_P
        {
            Constraint *n = makeNode(Constraint);
            n->contype = CONSTR_NOTNULL;
            n->location = @1;
            n->keys = list_make1(makeString("value"));
            n->initially_valid = true;
            $$ = (Node *) n;
        }


Now everything matches the ALTER DOMAIN synopsis.
It all looks so simple.
From e3aa00904edb8481585f7d27b18213f30423696e Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 9 Dec 2024 16:39:59 +0800
Subject: [PATCH v3 1/1] refactor AlterDomainAddConstraint

In gram.y only CHECK and NOT NULL constraints are allowed for the ALTER DOMAIN
...  ADD CONSTRAINT statement, so we can safely remove the code that handles
errors for other constraint types.

Further simplify the DomainConstraintElem syntax in gram.y. Specifying constraint
properties ([ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE | NO INHERIT ])
will now be a syntax error at the parse stage.

in gram.y DomainConstraintElem will be:

  CHECK '(' a_expr ')'
| CHECK '(' a_expr ')' NOT VALID
| NOT NULL_P

discussion: https://postgr.es/m/cacjufxhitd5lglbssapshhtdwxt0vivkthinkyw-skbx93t...@mail.gmail.com
---
 src/backend/commands/tablecmds.c |  2 +-
 src/backend/commands/typecmds.c  | 56 +---
 src/backend/parser/gram.y| 25 --
 src/backend/tcop/utility.c   |  3 +-
 src/include/commands/typecmds.h  |  2 +-
 5 files changed, 28 insertions(+), 60 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6ccae4cb4a..2dd33c0435 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5390,7 +5390,7 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 			address =
 AlterDomainAddConstraint(((AlterDomainStmt *) cmd->def)->typeName,
 		 ((AlterDomainStmt *) cmd->def)->def,
-		 NULL);
+		 NULL, context->queryString);
 			break;
 		case AT_ReAddComment:	/* Re-add existing comment */
 			address = CommentObject((CommentStmt *) cmd->def);
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index da591c0922..71080f1598 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -2903,7 +2903,7 @@ AlterDomainDropConstraint(List *names, const char *constrName,
  */
 ObjectAddress
 AlterDomainAddConstraint(List *names, Node *newConstraint,
-		 ObjectAddress *constrAddr)
+		 ObjectAddress *constrAddr, const char *querystring)
 {
 	TypeName   *typename;
 	Oid			domainoid;
@@ -2913,10 +2913,13 @@ AlterDomainAddConstraint(List *names, Node *newConstraint,
 	Constraint *constr;
 	char	   *ccbin;
 	ObjectAddress address = InvalidObjectAddress;
+	ParseState 	*pstate;
+	pstate = make_parsestate(NULL);
+	pstate->p_sourcetext = querystring;
 
 	/* Make a TypeName so we can use standard type lookup machinery */
 	typename = makeTypeNameFromNameList(names);
-	domainoid = typenameTypeId(NULL, typename);
+	domainoid = typenameTypeId(pstate, typename);
 
 	/* Look up the domain in the type table */
 	typrel = table_open(TypeRelationId, RowExclusiveLock);
@@ -2935,51 +2938,10 @@ AlterDomainAddConstraint(List *names, Node *newConstraint,
 
 	constr = (Constraint *) newConstraint;
 
-	switch (constr->contype)
-	{
-		case CONSTR_CHECK:
-		case CONSTR_NOTNULL:
-			/* processed below */
-			break;
-
-		case CONSTR_UNIQUE:
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("unique constraints not possible for domains")));
-			break;
-
-		case CONSTR_PRIMARY:
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("primary key constraints not possible for domains")));
-			break;
-
-		case CONSTR_EXCLUSION:
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("exclusion constraints not possible for domains")));
-			break;
-
-		case CONSTR_FOREIGN:
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("foreign k

Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-09 Thread Michael Paquier
On Fri, Dec 06, 2024 at 08:23:00AM +, Zhijie Hou (Fujitsu) wrote:
> Thanks for the suggestions. They make sense to me.
> 
> Please see the updated version as attached.

It sounds to me that we are in agreement for HEAD, so I've moved ahead
and fixed this issue on HEAD using your patch that adds a dedicated
memory context in PGOutputData as that's the cleanest way to address
things in a single execution context of pgoutput.

For stable branches, let's see..  I need to reply to the latest
message.
--
Michael


signature.asc
Description: PGP signature


Re: Pass ParseState as down to utility functions.

2024-12-09 Thread Kirill Reshke
Thank you for reviewing this!

On Fri, 6 Dec 2024 at 19:01, Alvaro Herrera  wrote:
> I think it would make more sense to write the commit message in terms of
> the DDL commands that now report error position, than the C functions.
> Such a list of commands does not need to be exhaustive; a
> representative-enough sample probably suffices.

Hi! I fixed the commit message as suggested.

> Once upon a time, ereport() was a simpler macro that did not
> use variadic arguments.  Back then, the list of functions embedded in it
> (errcode, errmsg etc) were forced to be in an additional level of
> parentheses so that the macro would work at all (IIRC failure to do that
> resulted in strange compile-time problems).  This is why a majority of
> code is written in the style with those parens.  But commit e3a87b4991cc
> changed ereport to use __VA_ARGS__, so the auxiliary functions are
> actual arguments to errstart() -- which means that the parentheses
> you're adding here are unnecessary and discouraged.  Just add the
> parser_errposition() call and it'll be fine.


Should be fixed in v7.
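
For illustration, the difference looks roughly like this (a sketch with
invented variable names, not the exact code from the patch):

	/* older style: auxiliary calls wrapped in an extra set of parentheses */
	ereport(ERROR,
			(errcode(ERRCODE_UNDEFINED_OBJECT),
			 errmsg("type \"%s\" does not exist", name),
			 parser_errposition(pstate, typname->location)));

	/* since ereport() became variadic (e3a87b4991cc), the parentheses can go */
	ereport(ERROR,
			errcode(ERRCODE_UNDEFINED_OBJECT),
			errmsg("type \"%s\" does not exist", name),
			parser_errposition(pstate, typname->location));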



-- 
Best regards,
Kirill Reshke


v7-0001-Print-out-error-position-for-number-of-DDL-comman.patch
Description: Binary data


Re: Assert failure on running a completed portal again

2024-12-09 Thread Yugo NAGATA
On Sun, 08 Dec 2024 14:25:53 -0500
Tom Lane  wrote:

> I wrote:
> > After looking at this further, I think this whole "run_once"
> > business is badly designed, worse documented, and redundant
> > (as well as buggy).  It can be replaced with three self-contained
> > lines in ExecutePlan, as per the attached.
> 
> > (Obviously, the API changes in this wouldn't do for the back
> > branches.  But we could leave those arguments in place as
> > unused dummies.)
> 
> Here's a cut-down version that I think would be OK to use in the
> back branches.

I confirmed that both versions of the patch fix the issue using pgproto.
Also, I feel your fix makes the code easier to understand, at least for me.

I have a few minor comments:

  * moving in the specified direction.
  *
  * Runs to completion if numberTuples is 0
- *
- * Note: the ctid attribute is a 'junk' attribute that is removed before the
- * user can see it
  * 
  */

This comment removal seems unrelated to the ExecutePlan fix. Should it
actually have been done by 8a5849b7ff24c?

 static void
-ExecutePlan(EState *estate,
-   PlanState *planstate,
+ExecutePlan(QueryDesc *queryDesc,
bool use_parallel_mode,
CmdType operation,
bool sendTuples,
uint64 numberTuples,
ScanDirection direction,
-   DestReceiver *dest,
-   bool execute_once)
+   DestReceiver *dest)
 {
+   EState *estate = queryDesc->estate;
+   PlanState  *planstate = queryDesc->planstate;
TupleTableSlot *slot;
uint64  current_tuple_count

I guess planstate is removed because it is redundant with what is already
included in queryDesc. If so, I think we can also remove use_parallel_mode
for the same reason.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Fwd: Re: Why we need to check for local buffers in BufferIsExclusiveLocked and BufferIsDirty?

2024-12-09 Thread Srinath Reddy Sadipiralla
 Forwarded message 
From: Srinath Reddy Sadipiralla 
To: "Nazir Bilal Yavuz", 
"pgsql-hackers", "Andres 
Freund"
Date: Mon, 09 Dec 2024 10:14:06 +0530
Subject: Re: Why we need to check for local buffers in BufferIsExclusiveLocked 
and BufferIsDirty?
 Forwarded message 



Hi,



added this bug to commitfest https://commitfest.postgresql.org/51/5428/ 



Regards,

Srinath Reddy Sadipiralla

Member of Technical Staff

Zoho

Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

2024-12-09 Thread jian he
On Sun, Dec 8, 2024 at 10:58 PM Andrew Dunstan  wrote:
>
> So the actual sort order as implemented is, AIUI,
>
> Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array
>
> which is ugly, but fortunately not many apps rely on jsonb sort order.
>
> Nobody else has commented, so I propose to apply this patch documenting the 
> anomaly.
>

While at it, we can fix the appearance of jsonb null.

Since
select jsonb 'Null';
select jsonb 'NULL';
will fail,

maybe change "Null" in that section, and "NULL", to "null".




Incorrect EXPLAIN ANALYZE output in bloom index docs

2024-12-09 Thread David Rowley
I was fixing up the patch in [1] with the intention of committing it
when I noticed that there are a few outdated EXPLAIN ANALYZE examples
in the documents for the bloom contrib module.

The example outputs look like they've been created with a 100 thousand
row table, but the commands given are to insert 10 million rows into
that table. I suspect someone did a 100x on the example row count at
some point during development and forgot to update the EXPLAIN output.

The patch I want to commit adds buffer outputs to these EXPLAINs, so I
kinda need to fix these before adding that, otherwise what I want to
add does not make any sense.

Patch attached. I propose to backpatch this fix.

David


v1-0001-Doc-fix-incorrect-EXPLAIN-ANALYZE-output-for-bloo.patch
Description: Binary data


Re: Proper object locking for GRANT/REVOKE

2024-12-09 Thread Noah Misch
On Mon, Dec 02, 2024 at 12:13:56PM +0100, Peter Eisentraut wrote:
> On 25.11.24 02:24, Noah Misch wrote:
> > commit d31bbfb wrote:
> > > --- a/src/backend/catalog/aclchk.c
> > > +++ b/src/backend/catalog/aclchk.c
> > > @@ -659,147 +659,77 @@ ExecGrantStmt_oids(InternalGrant *istmt)
> > >* objectNamesToOids
> > >*
> > >* Turn a list of object names of a given type into an Oid list.
> > > - *
> > > - * XXX: This function doesn't take any sort of locks on the objects whose
> > > - * names it looks up.  In the face of concurrent DDL, we might easily 
> > > latch
> > > - * onto an old version of an object, causing the GRANT or REVOKE 
> > > statement
> > > - * to fail.
> > 
> > To prevent "latch onto an old version" and remove the last sentence of the
> > comment, we'd need two more things:
> > 
> > - Use a self-exclusive lock here, not AccessShareLock.  With two sessions
> >doing GRANT under AccessShareLock, one will "latch onto an old version".
> > 
> > - Use a self-exclusive lock before *every* CatalogTupleUpdate() of a row 
> > that
> >GRANT/REVOKE can affect.  For example, today's locking in ALTER FUNCTION 
> > is
> >the xmax stamped on the old tuple.  If GRANT switched to
> >ShareUpdateExclusiveLock, concurrent ALTER FUNCTION could still cause 
> > GRANT
> >to "latch onto an old version".
> 
> Ok, we should probably put that comment back in slightly altered form, like
> 
> "XXX This function intentionally takes only an AccessShareLock ... $REASON.
> In the face of concurrent DDL, we might easily latch
> onto an old version of an object, causing the GRANT or REVOKE statement
> to fail."

Yep.

> > I wouldn't do those, however.  It would make GRANT ALL TABLES IN SCHEMA
> > terminate every autovacuum running in the schema and consume a shared lock
> > table entry per table in the schema.  I think the user-visible benefit of
> > commit d31bbfb plus this additional work is just changing "ERROR:  tuple
> > concurrently updated" to blocking.  That's not nothing, but I don't see it
> > outweighing autovacuum termination and lock table consumption spikes.  What
> > other benefits and drawbacks should we weigh?
> 
> I think what you are describing is a reasonable tradeoff.  The user experience
> is tolerable: "tuple concurrently updated" is a mildly useful error message,
> and it's probably the table owner executing both commands.
> 
> The change to AccessShareLock at least prevents confusing "cache lookup
> failed" messages, and might alleviate some security concerns about swapping
> in a completely different object concurrently (even if we currently think
> this is not an actual problem).

Perhaps.  To me, the v17 behavior smells mildly superior to the v18 behavior.

> > > --- a/src/test/isolation/expected/intra-grant-inplace.out
> > > +++ b/src/test/isolation/expected/intra-grant-inplace.out
> > > @@ -248,6 +248,6 @@ relhasindex
> > >   ---
> > >   (0 rows)
> > > -s4: WARNING:  got: cache lookup failed for relation REDACTED
> > > +s4: WARNING:  got: relation "intra_grant_inplace" does not exist
> > 
> > The affected permutation existed to cover the first LockRelease() in
> > SearchSysCacheLocked1().  Since this commit, that line no longer has 
> > coverage.
> 
> Do you have an idea how such a test case could be constructed now?

A rough idea.  The test worked because REVOKE used only LOCKTAG_TUPLE, which
didn't mind the LOCKTAG_RELATION from DROP TABLE.

One route might be to find another SearchSysCacheLocked1() caller that takes
no locks beyond the LOCKTAG_TUPLE taken right in SearchSysCacheLocked1().  I'd
add a temporary elog to report if that's happening.
check_lock_if_inplace_updateable_rel() is an example of reporting the absence
of a lock.  If check-world w/ that elog finds some operation reaching that
circumstance, this test could replace REVOKE with that operation.

Another route would be to remove the catalog row without locking the
underlying object, e.g. by replacing DROP TABLE with DELETE FROM pg_class.
That's more artificial.




Re: Why we need to check for local buffers in BufferIsExclusiveLocked and BufferIsDirty?

2024-12-09 Thread Srinath Reddy Sadipiralla
 On Sun, 08 Dec 2024 18:23:21 +0530 Nazir Bilal Yavuz  
wrote ---



Hi Srinath, 
 
On Sat, 7 Dec 2024 at 11:17, Srinath Reddy Sadipiralla 
 wrote: 
> >  On Fri, 06 Dec 2024 16:40:51 +0530 Nazir Bilal Yavuz 
> >  wrote --- 
> > LGTM. 
> 
> sorry, I didn't get it. Do you mean that the assert failure I described
> is correct, and does my patch for this look good? 🤔
 
Sorry if I was not clear. Yes, I wanted to say what you said is 
correct and the patch looks good to me. 
 
-- 
Regards, 
Nazir Bilal Yavuz 
Microsoft

Re: generic plans and "initial" pruning

2024-12-09 Thread Amit Langote
On Fri, Dec 6, 2024 at 5:18 PM Amit Langote  wrote:
> On Thu, Dec 5, 2024 at 11:07 PM Tomas Vondra  wrote:
> > On 12/5/24 12:28, Amit Langote wrote:
> > > On Thu, Dec 5, 2024 at 3:53 PM Amit Langote  
> > > wrote:
> > >> On Thu, Dec 5, 2024 at 2:20 AM Tomas Vondra  wrote:
> > >>> Sure, changing the APIs is allowed, I'm just wondering if maybe there
> > >>> might be a way to not have this issue, or at least notice the missing
> > >>> call early.
> > >>>
> > >>> I haven't tried, wouldn't it be better to modify ExecutorStart() to do
> > >>> the retries internally? I mean, the extensions wouldn't need to check if
> > >>> the plan is still valid, ExecutorStart() would take care of that. Yeah,
> > >>> it might need some new arguments, but that's more obvious.
> > >>
> > >> One approach could be to move some code from standard_ExecutorStart()
> > >> into ExecutorStart(). Specifically, the code responsible for setting
> > >> up enough state in the EState to perform ExecDoInitialPruning(), which
> > >> takes locks that might invalidate the plan. If the plan does become
> > >> invalid, the hook and standard_ExecutorStart() are not called.
> > >> Instead, the caller, ExecutorStartExt() in this case, creates a new
> > >> plan.
> > >>
> > >> This avoids the need to add ExecPlanStillValid() checks anywhere,
> > >> whether in core or extension code. However, it does mean accessing the
> > >> PlannedStmt earlier than InitPlan(), but the current placement of the
> > >> code is not exactly set in stone.
> > >
> > > I tried this approach and found that it essentially disables testing
> > > of this patch using the delay_execution module, which relies on the
> > > ExecutorStart_hook(). The way the testing works is that the hook in
> > > delay_execution.c pauses the execution of a cached plan to allow a
> > > concurrent session to drop an index referenced in the plan. When
> > > unpaused, execution initialization resumes by calling
> > > standard_ExecutorStart(). At this point, obtaining the lock on the
> > > partition whose index has been dropped invalidates the plan, which the
> > > hook detects and reports. It then also reports the successful
> > > re-execution of an updated plan that no longer references the dropped
> > > index.  Hmm.
> > >
> >
> > It's not clear to me why the change disables this testing, and I can't
> > try without a patch. Could you explain?
>
> Sorry, PFA the delta patch for the change I described above.  It
> applies on top of v58 series of patches that I posted yesterday.
> You'll notice that delay_execution test fails if you apply and do
> check-world.
>
> As for how the change breaks the testing, here is a before and after
> of the flow of a isolation test in
> src/test/modules/delay_execution/specs/cached-plan-inval.spec (s1 is
> the session used to run a cached plan, s2 to perform concurrent DDL
> that invalidates the plan):
>
> * Before (working):
>
> 1. s2 takes advisory lock
> 2. s1 runs cached plan -> goes to ExecutorStart_hook -> waits for the
> advisory lock
> 3. s2 drops an index referenced in the plan
> 4. s2 unlocks advisory lock
> 5. s1 locks unpruned partitions -> detects plan invalidation due to
> dropped index.
>
> * After (stops working because initial pruning and locking are done
> before calling ExecutorStart_hook):
>
> 1. s2 takes advisory lock
> 2. s1 runs cached plan -> locks unpruned partitions -> goes to
> ExecutorStart_hook to get advisory lock -> waits for advisory lock
> 3. s2 drops an index referenced in the plan -> waits for lock on the
> unpruned partition -> deadlock!
>
> One idea I had after sending the email yesterday is to introduce
> ExecutorStartCachedPlan_hook for the advisory lock based waiting.
> ExecutorStartCachedPlan() is the new function that you will find in
> v58-0004 that wraps ExecutorStart() to handle plan invalidation.  This
> new hook would be called before ExecutorStartCachedPlan() calls
> ExecutorStart(), so the original testing flow can still work.

Here's a patch implementing that idea, which fixes the
delay_execution test breakage.  It applies on top of the v58 series of
patches.

However, as mentioned in my previous reply, since extensions might
need to adjust their ExecutorStart hook code to check if the RT index
is in EState.es_unpruned_relids when accessing child relations
directly via ExecGetRangeTableRelation(), I can accept them also
adding a check for ExecPlanStillValid() in their ExecutorStart hook.
So we may not want to add a new hook even if only for testing.
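
To make that concrete, here is a minimal, hypothetical sketch of what such a
check could look like on the extension side.  ExecPlanStillValid(),
es_unpruned_relids, and the overall shape are taken from the v58 patches under
discussion, so treat this as an illustration of the idea rather than the final
API:

#include "postgres.h"
#include "executor/executor.h"

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);

    /*
     * Initial pruning may have taken locks that invalidated the cached
     * plan; if so, give up early and let the caller retry with a new plan.
     */
    if (!ExecPlanStillValid(queryDesc->estate))
        return;

    /*
     * Only touch relations that survived initial pruning; pruned RT indexes
     * are absent from es_unpruned_relids and must not be opened.
     */
    for (Index rti = 1; rti <= queryDesc->estate->es_range_table_size; rti++)
    {
        if (!bms_is_member(rti, queryDesc->estate->es_unpruned_relids))
            continue;
        /* e.g. ExecGetRangeTableRelation(queryDesc->estate, rti) is safe here */
    }
}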

-- 
Thanks, Amit Langote


0005-Remove-the-need-to-check-if-plan-is-valid-from-Execu.patch
Description: Binary data


remove pgrminclude?

2024-12-09 Thread Peter Eisentraut
I propose to remove the pgrminclude scripts and annotations.  AFAICT, 
per git log, the last time someone tried to do something with it was 
around 2011.  Also, many (not all) of the "pgrminclude ignore" 
annotations are of a newer date but seem to have just been copied around 
during refactorings and file moves and don't seem to reflect an actual 
need anymore.


I think the include-what-you-use tool that I've been using lately is a 
better tool and can adequately replace pgrminclude.


I'm sending separately a patch to add some IWYU annotations, but these 
don't seem to correspond very strongly to pgrminclude annotations, so I 
don't see any value in keeping the old ones even for a transition.


Here are the scripts to remove:
1. pgcheckdefines -- This could still be useful???
2. pgcominclude -- redundant with headerscheck
3. pgdefine -- used internally only
4. pgfixinclude -- Could still be useful in principle?
5. pgrminclude -- obsolete
From c6378ad116950e9ae9b3abad6ca3dbe99f465f0c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 9 Dec 2024 08:08:02 +0100
Subject: [PATCH 1/2] Remove pgrminclude annotations

---
 src/backend/access/brin/brin.c| 2 +-
 src/backend/access/nbtree/nbtsort.c   | 2 +-
 src/backend/regex/regerror.c  | 2 +-
 src/backend/utils/adt/inet_net_pton.c | 3 +--
 src/bin/initdb/initdb.c   | 4 ++--
 src/bin/pg_amcheck/pg_amcheck.c   | 2 +-
 src/bin/scripts/common.h  | 4 ++--
 src/include/common/ip.h   | 2 +-
 src/include/common/relpath.h  | 6 +++---
 src/include/libpq/hba.h   | 2 +-
 src/include/libpq/ifaddr.h| 2 +-
 src/include/pg_trace.h| 2 +-
 src/include/snowball/header.h | 2 +-
 src/pl/plpgsql/src/pl_comp.c  | 2 +-
 src/pl/tcl/pltcl.c| 2 +-
 15 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3aedec882cd..9af445cdcdd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -33,7 +33,7 @@
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
-#include "tcop/tcopprot.h" /* pgrminclude ignore */
+#include "tcop/tcopprot.h"
 #include "utils/acl.h"
 #include "utils/datum.h"
 #include "utils/fmgrprotos.h"
diff --git a/src/backend/access/nbtree/nbtsort.c 
b/src/backend/access/nbtree/nbtsort.c
index 17a352d040c..28522c0ac1c 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -51,7 +51,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bulk_write.h"
-#include "tcop/tcopprot.h" /* pgrminclude ignore */
+#include "tcop/tcopprot.h"
 #include "utils/rel.h"
 #include "utils/sortsupport.h"
 #include "utils/tuplesort.h"
diff --git a/src/backend/regex/regerror.c b/src/backend/regex/regerror.c
index 4a27c2552cb..c69aaf27747 100644
--- a/src/backend/regex/regerror.c
+++ b/src/backend/regex/regerror.c
@@ -46,7 +46,7 @@ static const struct rerr
 
 {
/* the actual table is built from regex.h */
-#include "regex/regerrs.h" /* pgrminclude ignore */
+#include "regex/regerrs.h"
{
-1, "", "oops"
},  /* explanation 
special-cased in code */
diff --git a/src/backend/utils/adt/inet_net_pton.c 
b/src/backend/utils/adt/inet_net_pton.c
index d3221a13139..ef2236d9f04 100644
--- a/src/backend/utils/adt/inet_net_pton.c
+++ b/src/backend/utils/adt/inet_net_pton.c
@@ -29,8 +29,7 @@ static const char rcsid[] = "Id: inet_net_pton.c,v 1.4.2.3 
2004/03/17 00:40:11 m
 #include 
 #include 
 
-#include "utils/builtins.h" /* pgrminclude ignore */   /* needed on some
-   
 * platforms */
+#include "utils/builtins.h"/* needed on some platforms */
 #include "utils/inet.h"
 
 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 9a91830783e..b3c7da0e835 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -66,9 +66,9 @@
 
 #include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
-#include "catalog/pg_class_d.h" /* pgrminclude ignore */
+#include "catalog/pg_class_d.h"
 #include "catalog/pg_collation_d.h"
-#include "catalog/pg_database_d.h" /* pgrminclude ignore */
+#include "catalog/pg_database_d.h"
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 27a7d5e925e..a50a0268986 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -26,7 +26,7 @@
 #include "fe_utils/query_utils.h"
 #include "fe_utils/simple_list.h"
 #include "fe_utils/string_utils.h"
-#include "getopt_long.h"   /* pgrminclude ignore */
+#include "getopt_long.

Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

2024-12-09 Thread Amit Kapila
On Fri, Dec 6, 2024 at 11:10 AM Shlok Kyal  wrote:
>
> Thanks Peter, for pointing this out.
> I also feel that the error message suggested by Amit would be better.
> I have attached a patch for the same.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Refactoring postmaster's code to cleanup after child exit

2024-12-09 Thread Tomas Vondra
On 11/14/24 15:13, Heikki Linnakangas wrote:
> On 09/10/2024 23:40, Heikki Linnakangas wrote:
>> I pushed the first three patches, with the new test and one of the small
>> refactoring patches. Thanks for all the comments so far! Here is a new
>> version of the remaining patches.
>>
Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:


t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 4.
t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed


and tmp_check/log/regress_log_001_connection_limits says:


[23:48:44.444](1.129s) ok 3 - reserved_connections limit
[23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
process ended prematurely at
/home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 154.
# Postmaster PID for node "primary" is 198592


That BackgroundPsql.pm line is this in wait_connect()

  $self->{run}->pump()
until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

By trial and error I found that it fails on this line 70:

  push(@sessions, background_psql_as_user('regress_superuser'));

but I have no idea why. There are multiple similar calls a couple
lines earlier, and those work fine. And various other TAP tests with
background_psql() work fine too.

So what's so special about this particular line?


regards

-- 
Tomas Vondra





Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-12-09 Thread Alexander Korotkov
Hi!

On Fri, Aug 30, 2024 at 11:43 AM Dmitry Koval  wrote:
> I plan to prepare fixes for issues from email [1] as separate commits
> (for better code readability). Attachment in this email is a variant of
> fix for the issue:
>
>  > 1. Function createPartitionTable() should be rewritten using
>  > partitioned table OID (not name) and without using ProcessUtility().
>
> Patch "Refactor createPartitionTable to remove ProcessUtility call"
> contains code changes + test (see file
> v33-0003-Refactor-createPartitionTable-to-remove-ProcessU.patch).
>
> But I'm not sure that refactoring createPartitionTable is the best
> solution. PostgreSQL code has issue CVE-2014-0062 (commit 5f17304) - see
> relation_openrv() call in expandTableLikeClause() function [2] (opening
> relation by name after we got relation Oid).
> Example for reproduce relation_openrv() call:
>
> CREATE TABLE t (b bigint, i int DEFAULT 100);
> CREATE TABLE t1 (LIKE t_bigint INCLUDING ALL);
>
> Commit 04158e7fa3 [3] (by Alexander Korotkov) might be a good fix for
> this issue. But if we keep commit 04158e7fa3, do we need to refactor the
> createPartitionTable function (for removing ProcessUtility)?
> Perhaps the existing code
> 1) v33-0002-Implement-ALTER-TABLE-.-SPLIT-PARTITION-.-comman.patch
> 2) v33-0003-Refactor-createPartitionTable-to-remove-ProcessU.patch +
> with patch 04158e7fa3 will look better.
>
>
> I would be very grateful for comments and suggestions.

Thank you for continuing your work on the subject.  The patches
currently don't apply cleanly.  Please rebase.

I think getting away from expandTableLikeClause() is the right
direction to resolve the security problems.  That looks great, it's
finally not as complex as I thought.  I think the code requires some
polishing: you need to revise the comments given it's not part of LIKE
clause handling anymore.

I see fixes for the issues mentioned in [1] and [2] are still not
implemented.  Do you plan to do this in this release cycle?

Links.
1. 
https://www.postgresql.org/message-id/CA%2BTgmoY0%3DbT_xBP8csR%3DMFE%3DFxGE2n2-me2-31jBOgEcLvW7ug%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/859476bf-3cb0-455e-b093-b8ab5ef17f0e%40postgrespro.ru

--
Regards,
Alexander Korotkov
Supabase




Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andrea Gelmini
Il Lun 9 Dic 2024, 10:19 Michael Harris  ha scritto:

>
> Is this already being looked into?
>

Funny, I guess it's the same reason I randomly see the WhatsApp web interface,
on Chrome, complain since I switched to XFS. It says something like "no
more space on disk" and logs me out, with more than 300GB available.

Anyway, just a stupid hint, but I would try writing to the XFS mailing list. There
you can reach the XFS maintainers at Red Hat and the usual historical
developers, of course!!!


Re: proposal: schema variables

2024-12-09 Thread jian he
On Mon, Dec 9, 2024 at 2:33 AM Pavel Stehule  wrote:
>
> Hi
>
Again, I only applied
v20241208-0001-Enhancing-catalog-for-support-session-variables-and-.patch.

/* we want SessionVariableCreatePostprocess to see the catalog changes. */
0001 doesn't have SessionVariableCreatePostprocess,
so this comment is wrong in the context of 0001.

typo:
As above, but if the variable isn't found and is_mussing is not NULL
is_mussing should be is_missing.

--
issues with grant.sgml and revoke.sgml.

* There are no regress tests for WITH GRANT OPTION, but it seems to work;
there are no REVOKE CASCADE tests either. The following tests show that
REVOKE CASCADE works.

create variable v1 as int;
GRANT select on VARIABLE v1 to alice with grant option;
set session authorization alice;
GRANT select on VARIABLE v1 to bob with grant option;
reset session authorization;

select varacl from pg_variable where varname  = 'v1';
--output
{jian=rw/jian,alice=r*/jian,bob=r*/alice}
revoke all privileges on variable v1 from alice cascade;
select varacl from pg_variable where varname  = 'v1';
--output
 {jian=rw/jian}

revoke GRANT OPTION FOR all privileges on variable v1 from alice cascade;
also works.

* in revoke.sgml and grant.sgml.
if you look closely, " | ALL VARIABLES IN SCHEMA schema_name [, ...] }" is wrong
because there is no left-curly-bracket, but there is a right-curly-bracket.

* in revoke.sgml.
where role_specification can be:
[ GROUP ] role_name
  | PUBLIC
  | CURRENT_ROLE
  | CURRENT_USER
  | SESSION_USER
should this be at the end of the synopsis section?
Otherwise it looks weird; maybe we can move the REVOKE VARIABLE code higher up.
The position of the grant.sgml changes looks fine to me.


*  
   The GRANT command has two basic variants: one
   that grants privileges on a database object (table, column, view,
   foreign table, sequence, database, foreign-data wrapper, foreign server,
   function, procedure, procedural language, large object, configuration
   parameter, schema, tablespace, or type), and one that grants
   membership in a role.  These variants are similar in many ways, but
   they are different enough to be described separately.
  
this  in grant.sgml needs to also mention "variable"?

*   
Privileges on databases, tablespaces, schemas, languages, and
configuration parameters are
PostgreSQL extensions.
   
this  in grant.sgml needs to also mention "variable"?

--
*
+   
+Inside a query or an expression, a session variable can be
+shadowed by a column with the same name.  Similarly, the
+name of a function or procedure argument or a PL/pgSQL variable (see
PL/pgSQL should be decorated as PL/pgSQL

* we already support \dV and \dV+ in
v20241208-0001-Enhancing-catalog-for-support-session-variables-and-.patch
so we should update doc/src/sgml/ref/psql-ref.sgml also.
I briefly searched \dV in
v20241208-0002-Storage-for-session-variables-and-SQL-interface.patch,
no entry.


* in doc/src/sgml/ddl.sgml
 and  need to change for variable?
,  need to change for variable?

it's kind of tricky. if we only apply
v20241208-0001-Enhancing-catalog-for-support-session-variables-and-.patch
we can not  SELECT or  UPDATE variable.
so how are we going to mention these privileges for variable?


* we can add some tests for EVENT TRIGGER,
since event triggers support CREATE|DROP variable. Actually, I think
there is a bug.

create variable v1 as int;
CREATE OR REPLACE FUNCTION event_trigger_report_dropped()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE r record;
BEGIN
FOR r IN SELECT * from pg_event_trigger_dropped_objects()
LOOP
IF NOT r.normal AND NOT r.original THEN
CONTINUE;
END IF;
RAISE NOTICE 'NORMAL: orig=% normal=% istemp=% type=% identity=% name=% args=%',
r.original, r.normal, r.is_temporary, r.object_type,
r.object_identity, r.address_names, r.address_args;
END LOOP;
END; $$;

CREATE EVENT TRIGGER regress_event_trigger_report_dropped ON sql_drop
WHEN TAG IN ('DROP VARIABLE')
EXECUTE PROCEDURE event_trigger_report_dropped();

--output:
src9=# drop variable v1;
NOTICE:  test_event_trigger: ddl_command_start DROP VARIABLE
NOTICE:  NORMAL: orig=t normal=f istemp=f type=session variable
identity=public.v1 name={public,$} args={}
DROP VARIABLE

should i expect
NOTICE:  NORMAL: orig=t normal=f istemp=f type=session variable
identity=public.v1 name={public,$} args={}
to be
NOTICE:  NORMAL: orig=t normal=f istemp=f type=session variable
identity=public.v1 name={public,v1} args={}
?




Re: Add a warning message when using unencrypted passwords

2024-12-09 Thread Andrei Lepikhov

On 7/12/2024 21:39, Guillaume Lelarge wrote:
I'm interested in any comments about this. I didn't create a commitfest 
entry yet, I'm mostly waiting on your comments.
My first impression is that such a GUC may be generalised - something
like a 'security_warnings' parameter.
Also, you can check the value of log_statement and not report at all
if it is set to off.


--
regards, Andrei Lepikhov




Re: Track the amount of time waiting due to cost_delay

2024-12-09 Thread Masahiro Ikeda

On 2024-12-06 18:31, Bertrand Drouvot wrote:

Hi,

On Thu, Dec 05, 2024 at 10:43:51AM +, Bertrand Drouvot wrote:
Yeah, people would likely use this new field to monitor long running 
vacuum.

Long enough that this error should be acceptable. Do you agree?


OTOH, adding the 100% accuracy looks as simple as v9-0002 attached (0001 is
same as for v8), so I think we should provide it.


Thanks! The patch looks good to me.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

2024-12-09 Thread Andrei Lepikhov

On 8/12/2024 23:37, Andres Freund wrote:

On 2024-12-08 15:44:23 +0700, Andrei Lepikhov wrote:

On 8/12/2024 09:52, Andres Freund wrote:

I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.


How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?



In my mind, this trick can be designed for specific cases like sales tables,
as illustrated before and used by well-rounded developers.


I'm not saying that the optimization isn't useful, just that I don't think it
makes much sense for mergejoins.

Besides, at least as far as I can see, fundamentally not optimizing mergejoins
in any substantial manner, it also just doesn't seem likely that queries where
this optimization would come up are likely to be planned as mergejoins. If you
have a leftjoin to a category-style table, you're very rarely going to scan
the "main" table with an ordered index scan on the category key, which would
be required for a merge join. And sorting the main table once for each
to-be-joined-category-table isn't a desirable path most of the time either.

I don't know what it means that it's to be "used by well-rounded
developers". We have to write the implementation in way it works regardless of
what kind of developer is using postgres.
I got it, thanks. I agree that the profit for MJ & HJ cases looks 
modest. When designing the prototype, I realised that it seemed more 
"symmetrical" and logical to separate such 'one-sided' clauses during 
the planning for all types of join.
But I don't insist on this statement. It would be OK even in the case of 
NestLoop (remember, we should change the NL cost model too).



I'm not sure that such optimisation would be profitable in general.


Are you suggesting it would only be enabled by a GUC?
No, such a feature probably won't generate much overhead; I don't see 
any reason to cover it under a GUC.


--
regards, Andrei Lepikhov




WARNING: missing lock on database "postgres" (OID 5) @ TID (0,4)

2024-12-09 Thread Alexander Kukushkin
Hello Hacker,

When Postgres is compiled with --enable-cassert I get subj when doing the
following:
postgres=# create user test;
CREATE ROLE
postgres=# alter database postgres owner to test;
ALTER DATABASE
postgres=# reassign owned by test to postgres;
WARNING:  missing lock on database "postgres" (OID 5) @ TID (0,4)
REASSIGN OWNED

It was discovered on 16.6, however the master shows the same behaviour.
I suspect it to be due to aac2c9b4fde889d13f859c233c2523345e72d32b.

Regards,
--
Alexander Kukushkin


Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Jakub Wartak
On Mon, Dec 9, 2024 at 10:19 AM Michael Harris  wrote:

Hi Michael,

We found this thread describing similar issues:
>
>
> https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
>

We've had some cases in the past here at EDB where an OS vendor blamed
XFS AG fragmentation (too many AGs, and if one AG does not have enough
space -> error). Could you perhaps show us the output of the following on that LUN:
1. xfs_info
2. run that script from https://www.suse.com/support/kb/doc/?id=18219
for your AG range

-J.


Re: Conflict detection for update_deleted in logical replication

2024-12-09 Thread Amit Kapila
On Mon, Dec 9, 2024 at 3:20 PM Nisha Moond  wrote:
>
> Here is a summary of tests targeted to the Publisher node in a
> Publisher-Subscriber setup.
> (All tests done with v14 patch-set)
>
> 
> Performance Tests:
> 
> Test machine details:
> Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz CPU(s) :120 - 800GB RAM
>
> Setup:
> - Created two nodes ( 'Pub' and 'Sub'), with logical replication.
> - Configurations for Both Nodes:
>
> shared_buffers = 40GB
> max_worker_processes = 32
> max_parallel_maintenance_workers = 24
> max_parallel_workers = 32
> checkpoint_timeout = 1d
> max_wal_size = 24GB
> min_wal_size = 15GB
> autovacuum = off
>
> - Additional setting on Sub: 'track_commit_timestamp = on' (required
> for the feature).
> - Initial data insertion via 'pgbench' with scale factor 100 on both nodes.
>
> Workload:
> - Ran pgbench with 60 clients for the publisher.
> - The duration was 120s, and the measurement was repeated 10 times.
>

You didn't mention whether these are READONLY or READWRITE tests, but I think
it is the latter. I feel it is better to run these tests for 15 minutes, repeat
them 3 times, and get the median data for those. Also, try to run them
for lower client counts like 2, 16, 32. Overall, the conclusion may be the
same but it will rule out the possibility of any anomaly.

With Regards,
Amit Kapila.




Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-09 Thread Amit Kapila
On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada  wrote:
>
> >
> > I realized that this patch cannot be backpatched because it introduces a new
> > field into the public PGOutputData structure. Therefore, I think we may 
> > need to
> > use Alvaro's version [1] for the back branches.
>
> FWIW for back branches, I prefer using the foreach-pfree pattern
> Michael first proposed, just in case. It's not elegant but it can
> solve the problem while there is no risk of breaking non-core
> extensions.
>

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Tomas Vondra



On 12/9/24 08:34, Michael Harris wrote:
> Hello PG Hackers
> 
> Our application has recently migrated to PG16, and we have experienced
> some failed upgrades. The upgrades are performed using pg_upgrade and
> have failed during the phase where the schema is restored into the new
> cluster, with the following error:
> 
> pg_restore: error: could not execute query: ERROR:  could not extend
> file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with
> FileFallocate(): No space left on device
> HINT:  Check free disk space.
> 
> This has happened multiple times on different servers, and in each
> case there was plenty of free space available.
> 
> We found this thread describing similar issues:
> 
> https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> 
> As is the case in that thread, all of the affected databases are using XFS.
> 
> One of my colleagues built postgres from source with
> HAVE_POSIX_FALLOCATE not defined, and using that build he was able to
> complete the pg_upgrade, and then switched to a stock postgres build
> after the upgrade. However, as you might expect, after the upgrade we
> have experienced similar errors during regular operation. We make
> heavy use of COPY, which is mentioned in the above discussion as
> pre-allocating files.
> 
> We have seen this on both Rocky Linux 8 (kernel 4.18.0) and Rocky
> Linux 9 (Kernel 5.14.0).
> 
> I am wondering if this bug might be related:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323
> 
>> When given an offset of 0 and a length, fallocate (man 2 fallocate) reports 
>> ENOSPC if the size of the file + the length to be allocated is greater than 
>> the available space.
> 
> There is a reproduction procedure at the bottom of the above ubuntu
> thread, and using that procedure I get the same results on both kernel
> 4.18.0 and 5.14.0.
> When calling fallocate with offset zero on an existing file, I get
> enospc even if I am only requesting the same amount of space as the
> file already has.
> If I repeat the experiment with ext4 I don't get that behaviour.
> 
> On a surface examination of the code paths leading to the
> FileFallocate call, it does not look like it should be trying to
> allocate already allocated space, but I might have missed something
> there.
> 
> Is this already being looked into?
> 

Sounds more like an XFS bug/behavior, so it's not clear to me what we
could do about it. I mean, if the filesystem reports bogus out-of-space,
is there even something we can do?

What is not clear to me is why would this affect pg_upgrade at all. We
have the data files split into 1GB segments, and the copy/clone/... goes
one by one. So there shouldn't be more than 1GB "extra" space needed.
Surely you have more free space on the system?


regards

-- 
Tomas Vondra





Re: Parallel heap vacuum

2024-12-09 Thread Peter Smith
Hi Sawada-San.

FWIW, here is the remainder of my review comments for the patch v4-0001

==
src/backend/access/heap/vacuumlazy.c

lazy_scan_heap:

2.1.

+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum heap with parallel workers.
+ */

/with/using/

~~~

2.2.
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);

The do_lazy_scan_heap() returns a boolean and according to that
function comment it should always be true if it is not using the
parallel heap scan. So should we get the function return value here
and Assert that it is true?

~~~

2.3.

Start with an uppercase letter even for the single-line comments, for
consistency with existing code.

e.g.
+ /* report that everything is now scanned */

e.g
+ /* now we can compute the new value for pg_class.reltuples */

e.g.
+ /* report all blocks vacuumed */

~~~

heap_vac_scan_next_block_parallel:

2.4.
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+   bool *all_visible_according_to_vm)

The function comment should explain the return value.

~~~

2.5.
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+
+ if (vacrel->aggressive)
+ break;

Unnecessary whitespace.

~~~

dead_items_alloc:

2.6.
+ /*
+ * We initialize parallel heap scan/vacuuming or index vacuuming
+ * or both based on the table size and the number of indexes. Note
+ * that only one worker can be used for an index, we invoke
+ * parallelism for index vacuuming only if there are at least two
+ * indexes on a table.
+ */
  vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
 vacrel->nindexes, nworkers,
 vac_work_mem,
 vacrel->verbose ? INFO : DEBUG2,
-vacrel->bstrategy);
+vacrel->bstrategy, (void *) vacrel);

Is this information misplaced? Why describe "only one worker" and
"at least two indexes on a table" here? I don't see anything here checking
those conditions.

~~~

heap_parallel_vacuum_compute_workers:

2.7.
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation.  This probably needs to be a good deal more
+ * sophisticated, but we need something here for now.  Note that the
+ * upper limit of the min_parallel_table_scan_size GUC is chosen to
+ * prevent overflow here.
+ */

The "This probably needs to..." part maybe should have an "XXX" marker
in the comment which AFAIK is used to highlight current decisions and
potential for future changes.

~~~

heap_parallel_vacuum_initialize:

2.8.
There is inconsistent capitalization of the single-line comments in
this function. The same occurs in many functions in this file, but it
is just a bit more obvious in this one. Please see all the others too.

~~~

parallel_heap_complete_unfinised_scan:

2.9.
+static void
+parallel_heap_complete_unfinised_scan(LVRelState *vacrel)

TYPO in function name /unfinised/unfinished/

~~~

2.10.
+ if (!wstate->maybe_have_blocks)
+
+ continue;

Unnecessary blank line.

~~~

2.11.
+
+ /* Attache the worker's scan state and do heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);

/Attache/Attach/

~~~

2.12.
+ /*
+ * We don't need to gather the scan statistics here because statistics
+ * have already been accumulated the leaders statistics directly.
+ */

"have already been accumulated the leaders" -- missing word there somewhere?

~~~

do_parallel_lazy_scan_heap:

2.13.
+ /*
+ * If the heap scan paused in the middle of the table due to full of
+ * dead_items TIDs, perform a round of index and heap vacuuming.
+ */
+ if (!scan_done)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_blkno should have already been updated when gathering
+ * statistics
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_blkno;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launcher workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reach the end of the table */
+ break;

Instead of:

if (!scan_done)
{
   
   continue;
}

break;

Won't it be better to refactor like:

SUGGESTION
if (scan_done)
  break;



~~~

2.14.
+ /*
+ * The parallel heap vacuum finished, but it's possible that some workers
+ * have allocated blocks but not processed yet. This can happen for
+ * example when worke

Re: fixing tsearch locale support

2024-12-09 Thread Peter Eisentraut
I have expanded this patch set.  The first three patches are the same as 
before.  I have added a new patch that gets rid of lowerstr() from 
ts_locale.c and replaces it with the standard str_tolower() that 
everyone else is using.


lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as 
str_tolower(), except that the former don't use the common locale 
provider framework but instead use the global libc locale settings.


This patch replaces uses of lowerstr*() with str_tolower(..., 
DEFAULT_COLLATION_OID).  For instances that use a libc locale globally, 
this will result in exactly the same behavior.  For instances that use 
other locale providers, you now get consistent behavior and are no 
longer dependent on the libc locale settings.
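
As a rough illustration of the replacement pattern (not code from the patch;
the function and variable names below are made up for the example, while
str_tolower() and DEFAULT_COLLATION_OID are the existing backend facilities
mentioned above):

#include "postgres.h"
#include "catalog/pg_collation.h"
#include "utils/formatting.h"

/* Before: return lowerstr(term); which used the global libc locale settings. */
static char *
lowercase_term(const char *term)
{
    return str_tolower(term, strlen(term), DEFAULT_COLLATION_OID);
}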


Most uses of these functions are for processing dictionary and 
configuration files.  In those cases, using the default collation seems 
appropriate.  At least we don't have a more specific collation 
available.  But the code in contrib/pg_trgm should really depend on the 
collation of the columns being processed.  This is not done here; it
can be done in a separate patch.


(You can probably construct some edge cases where this change would 
create some locale-related upgrade incompatibility, for example if 
before you used a combination of ICU and a differently-behaving libc 
locale.  We can document this in the release notes, but I don't think 
there is anything more we can do about this.)


After these patches, the only problematic things left in ts_locale.{c,h} are

extern int  t_isalpha(const char *ptr);
extern int  t_isalnum(const char *ptr);

My current assessment is that these are best addressed after patch [0], 
because we need locale-provider-aware character classification functions.


[0]: 
https://www.postgresql.org/message-id/flat/2830211e1b6e6a2e26d845780b03e125281ea17b.ca...@j-davis.com



On 02.12.24 11:57, Peter Eisentraut wrote:
Infamously, the tsearch locale support in src/backend/tsearch/ 
ts_locale.c still depends on libc environment variable locale settings 
and is not caught up with pg_locale_t, collations, ICU, and all that 
newer stuff.  This is used in the tsearch facilities themselves, but 
also in other modules such as ltree, pg_trgm, and unaccent.


Several of the functions are wrappers around <ctype.h> functions, like

int
t_isalpha(const char *ptr)
{
     int clen = pg_mblen(ptr);
     wchar_t character[WC_BUF_LEN];
     pg_locale_t mylocale = 0;   /* TODO */

     if (clen == 1 || database_ctype_is_c)
     return isalpha(TOUCHAR(ptr));

     char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);

     return iswalpha((wint_t) character[0]);
}

So this has multibyte and encoding awareness, but does not observe 
locale provider or collation settings.


As an easy start toward fixing this, I think several of these functions 
we don't even need.


t_isdigit() and t_isspace() are just used to parse various configuration
and data files, and surely we don't need encoding-dependent
multibyte support for parsing ASCII digits and ASCII spaces.  At least,
I didn't find any indications in the documentation of these file formats 
that they are supposed to support that kind of thing.  So these can be 
replaced by the normal isdigit() and isspace().


There is one call to t_isprint(), which is similarly used only to parse 
some flags in a configuration file.  From the surrounding code you can 
deduce that it's only called on single-byte characters, so it can
similarly be replaced by plain isprint().


Note, pg_trgm has some compile-time options with macros such as 
KEEPONLYALNUM and IGNORECASE.  AFAICT, these are not documented, and the 
non-default variant is not supported by any test cases.  So as part of 
this undertaking, I'm going to remove the non-default variants if they 
are in the way of cleanup.
From 29800e82d24e70bd7e127ea1e6574dbbf89684a6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 2 Dec 2024 11:34:17 +0100
Subject: [PATCH v2 1/4] Remove t_isdigit()

---
 contrib/ltree/ltree_io.c|  8 
 src/backend/tsearch/spell.c |  4 ++--
 src/backend/tsearch/ts_locale.c | 15 ---
 src/backend/utils/adt/tsquery.c |  2 +-
 src/backend/utils/adt/tsvector_parser.c |  4 ++--
 src/include/tsearch/ts_locale.h |  1 -
 6 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 11eefc809b2..b54a15d6c68 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -411,7 +411,7 @@ parse_lquery(const char *buf, struct Node *escontext)
case LQPRS_WAITFNUM:
if (t_iseq(ptr, ','))
state = LQPRS_WAITSNUM;
-   else if (t_isdigit(ptr))
+   else if (isdigit((unsigned char) *ptr))
{

FileFallocate misbehaving on XFS

2024-12-09 Thread Michael Harris
Hello PG Hackers

Our application has recently migrated to PG16, and we have experienced
some failed upgrades. The upgrades are performed using pg_upgrade and
have failed during the phase where the schema is restored into the new
cluster, with the following error:

pg_restore: error: could not execute query: ERROR:  could not extend
file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with
FileFallocate(): No space left on device
HINT:  Check free disk space.

This has happened multiple times on different servers, and in each
case there was plenty of free space available.

We found this thread describing similar issues:

https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com

As is the case in that thread, all of the affected databases are using XFS.

One of my colleagues built postgres from source with
HAVE_POSIX_FALLOCATE not defined, and using that build he was able to
complete the pg_upgrade, and then switched to a stock postgres build
after the upgrade. However, as you might expect, after the upgrade we
have experienced similar errors during regular operation. We make
heavy use of COPY, which is mentioned in the above discussion as
pre-allocating files.

We have seen this on both Rocky Linux 8 (kernel 4.18.0) and Rocky
Linux 9 (Kernel 5.14.0).

I am wondering if this bug might be related:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323

> When given an offset of 0 and a length, fallocate (man 2 fallocate) reports 
> ENOSPC if the size of the file + the length to be allocated is greater than 
> the available space.

There is a reproduction procedure at the bottom of the above ubuntu
thread, and using that procedure I get the same results on both kernel
4.18.0 and 5.14.0.
When calling fallocate with offset zero on an existing file, I get
enospc even if I am only requesting the same amount of space as the
file already has.
If I repeat the experiment with ext4 I don't get that behaviour.
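
For reference, the behaviour can be sketched as a small standalone C program
(loosely based on the reproduction procedure from the Ubuntu bug; the mount
point and sizes are placeholders, and it only triggers when free space on the
filesystem is smaller than the existing file size plus the requested length):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
    off_t   len = 1024L * 1024L * 1024L;    /* 1 GB */
    int     fd = open("/mnt/xfs/testfile", O_RDWR | O_CREAT, 0600);
    int     rc;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* The first call grows the file to 1 GB. */
    rc = posix_fallocate(fd, 0, len);
    if (rc != 0)
        fprintf(stderr, "first fallocate: %s\n", strerror(rc));

    /*
     * The second call re-requests the same range from offset 0.  No new
     * blocks are needed, yet on the affected XFS setups it reportedly fails
     * with ENOSPC when free space < existing file size + requested length.
     */
    rc = posix_fallocate(fd, 0, len);
    if (rc != 0)
        fprintf(stderr, "second fallocate: %s\n", strerror(rc));

    close(fd);
    return 0;
}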

On a surface examination of the code paths leading to the
FileFallocate call, it does not look like it should be trying to
allocate already allocated space, but I might have missed something
there.

Is this already being looked into?

Thanks in advance,

Cheers
Mike




Re: Giving the shared catalogues a defined encoding

2024-12-09 Thread Thomas Munro
On Sat, Dec 7, 2024 at 7:51 AM Tom Lane  wrote:
> Over in the discussion of bug #18735, I've come to the realization
> that these problems apply equally to the filesystem path names that
> the server deals with: not only the data directory path, but the
> path to the installation files [1].  Can we apply the same sort of
> restrictions to those?  I'm envisioning that initdb would check
> either encoding-validity or all-ASCII-ness of those path names
> depending on which mode it's setting the server up in.

Rabbit hole engaged.  I am working on a generalisation, renamed to
just CLUSTER ENCODING, covering all text that is not in a database but
might be visible through database glasses (and probably more).  Looking
into pathnames and GUCs now.  More soon.




Re: Popcount optimization using SVE for ARM

2024-12-09 Thread devanga.susmi...@fujitsu.com
Hi Nathan,

Thanks for your suggestion. I’ll move this discussion to the existing thread so 
we can all work together toward a more efficient solution.

Regards,
Susmitha Devanga.


From: Nathan Bossart 
Sent: Friday, December 6, 2024 21:09
To: Kirill Reshke 
Cc: Susmitha, Devanga ; pgsql-hackers 
; Hajela, Ragesh ; 
Bhattacharya, Chiranmoy ; M A, Rajat 

Subject: Re: Popcount optimization using SVE for ARM

I suggest we move this discussion to the existing thread on this subject:


https://www.postgresql.org/message-id/flat/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-00%40us-west-2.amazonses.com

--
nathan


Re: Add a warning message when using unencrypted passwords

2024-12-09 Thread Jelte Fennema-Nio
On Sat, 7 Dec 2024 at 15:40, Guillaume Lelarge  wrote:
> > Whenever log_statement is set to all (which I understand should be done for 
> > a short period of time for troubleshooting purposes only), if we change the 
> > password for a user, or create a new user, the passwords would be logged in 
> > plain text. From a security point of view, this should not be allowed. 
> > Ideally, It should error out (or at least throw a warning) saying “while 
> > log_statement is set to ‘all’, you shouldn’t change passwords/create new 
> > user with passwords”.
>
> While I dislike the idea of throwing an error, I found the idea of a warning 
> message really great. So kudos to her for the idea!

+1 for more clearly letting people know that what they're doing is not
recommended from a security standpoint.

Regarding warning vs error, I agree that a WARNING is probably the
right choice generally. But I think that Divya is correct: When
log_statement = 'all', an error should be thrown instead. Because when
that is the case, we know for sure that the password will be leaked to
the logs. And that error should contain something like: You should
consider this password compromised.
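
Roughly, the shape I have in mind is something like the following (purely
illustrative, not taken from the proposed patch; get_password_type() and the
log_statement / LOGSTMT_ALL symbols exist in the tree today, but exactly where
such a check would live, and the exact spelling of these names, is up to the
patch):

    /* hypothetical placement, e.g. where CREATE/ALTER ROLE processes the password */
    if (password != NULL &&
        get_password_type(password) == PASSWORD_TYPE_PLAINTEXT)
        ereport(log_statement == LOGSTMT_ALL ? ERROR : WARNING,
                (errmsg("using a plaintext password in a query"),
                 errdetail("plaintext password may be logged."),
                 errhint("Consider this password compromised.")));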

Throwing an error always actually has an interesting downside: We then
automatically log the statement, and thus the password to the log.
When I change the level to ERROR in your code, I get the following
(but with WARNING the STATEMENT line is not there):

2024-12-08 22:59:50.967 CET [104900] ERROR:  using a plaintext
password in a query
2024-12-08 22:59:50.967 CET [104900] DETAIL:  plaintext password may be logged.
2024-12-08 22:59:50.967 CET [104900] HINT:  Refer to the PostgreSQL
documentation for details about using encrypted password in queries.
2024-12-08 22:59:50.967 CET [104900] STATEMENT:  ALTER ROLE jelte
PASSWORD 'abc';

PS. I created a commit fest entry here:
https://commitfest.postgresql.org/51/5426/




Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Tomas Vondra


On 12/9/24 10:47, Andrea Gelmini wrote:
> 
> 
> Il Lun 9 Dic 2024, 10:19 Michael Harris  > ha scritto:
> 
> 
> Is this already being looked into?
> 
> 
> Funny, i guess it's the same reason I see randomly complain of WhatsApp
> web interface, on Chrome, since I switched to XFS. It says something
> like "no more space on disk" and logout, with more than 300GB available.
> 

If I understand the fallocate issue correctly, it essentially ignores
the offset, so "fallocate -o 0 -l LENGTH" fails if

LENGTH + CURRENT_LENGTH > FREE_SPACE

But if you have 300GB available, that'd mean you have a file that's
close to that size already. But is that likely for WhatsApp?

> Anyway, just a stupid hint, I would try to write to XFS mailing list.
> There you can reach XFS maintainers of Red Hat and the usual historical
> developers, of course!!!
> 

Yes, I think that's a better place to report this. I don't think we're
doing anything particularly weird / wrong with fallocate().


regards

-- 
Tomas Vondra





Re: shared-memory based stats collector - v70

2024-12-09 Thread Bertrand Drouvot
Hi,

On Mon, Dec 09, 2024 at 02:39:58PM +0900, Michael Paquier wrote:
> On Sat, Dec 07, 2024 at 12:31:46PM +0300, Anton A. Melnikov wrote:
> > Completely agree that the original comment needs to be revised,
> > since it implies that it is normal for deleted entries to be here,
> > but it is not the case.
> 
> Yep, so applied v2-0001 to document that, and backpatched it as it is
> kind of important to know about.
> 
> > Maybe it's worth adding a warning as well,
> > similar to the one a few lines below in the code?
> 
>  Assert(!ps->dropped);
>  if (ps->dropped)
> +{
> +PgStat_HashKey key = ps->key;
> +elog(WARNING, "found non-deleted stats entry %u/%u/%llu"
> +  "at server shutdown",

There is a missing space. I think that should be " at server..." or "...%llu ".

> +   key.kind, key.dboid,
> +   (unsigned long long) key.objid);
>  continue;
> +}
>  
>  /*
>   * This discards data related to custom stats kinds that are unknown
> 
> Not sure how to feel about this suggestion, though.  This would
> produce a warning when building without assertions, but the assertion
> would likely let us get more information with a trace during development,
> so..

Right. OTOH I think that could help the tap test added in da99fedf8c to not
rely on an assert-enabled build (the tap test could "simply" check for the
WARNING in the logfile instead).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Track the amount of time waiting due to cost_delay

2024-12-09 Thread Bertrand Drouvot
Hi,

On Mon, Dec 09, 2024 at 05:18:30PM +0530, Dilip Kumar wrote:
> On Mon, Dec 9, 2024 at 2:51 PM Masahiro Ikeda  
> wrote:
> >
> > On 2024-12-06 18:31, Bertrand Drouvot wrote:
> > > Hi,
> > >
> > > On Thu, Dec 05, 2024 at 10:43:51AM +, Bertrand Drouvot wrote:
> > >> Yeah, people would likely use this new field to monitor long running
> > >> vacuum.
> > >> Long enough that this error should be acceptable. Do you agree?
> > >
> > > OTOH, adding the 100% accuracy looks as simple as v9-0002 attached
> > > (0001 is
> > > same as for v8), so I think we should provide it.
> >
> This Idea looks good to me.

Thanks for looking at it!

> 1.
> +   Total amount of time spent in milliseconds waiting due to
> 
> +   or . In case
> of parallel
> +   vacuum the reported time is across all the workers and the leader. The
> +   workers update the column no more frequently than once per second, so 
> it
> +   could show slightly old values.
> +  
> 
> I think this waiting is influenced due to cost delay as well as cost
> limit GUCs because here we are counting total wait time and that very
> much depends upon how frequently we are waiting and that's completely
> driven by cost limit. Isn't it?

Yeah. I think we could change the wording that way:

s/waiting due to/waiting during/

Does that make sense? I don't think we need to mention cost limit here.

> 2.
> + if (IsParallelWorker())
> + {
> + instr_time time_since_last_report;
> +
> + INSTR_TIME_SET_ZERO(time_since_last_report);
> + INSTR_TIME_ACCUM_DIFF(time_since_last_report, delay_end,
> +   last_report_time);
> + nap_time_since_last_report += INSTR_TIME_GET_MILLISEC(delayed_time);
> +
> + if (INSTR_TIME_GET_MILLISEC(time_since_last_report) >
> WORKER_REPORT_DELAY_INTERVAL)
> + {
> + pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
> + nap_time_since_last_report);
> + nap_time_since_last_report = 0;
> + last_report_time = delay_end;
> + }
> + }
> 
> Does it make sense to track this "nap_time_since_last_report" in a
> shared variable instead of local in individual workers and whenever
> the shared variable crosses WORKER_REPORT_DELAY_INTERVAL we can report
> this would avoid individual reporting from different workers.  Since
> we are already computing the cost balance in shared variable i.e.
> VacuumSharedCostBalance, or do you think it will complicate the code?
> 

I'm not sure I follow. nap_time_since_last_report is the time a worker had to
wait since its last report. There is no direct relationship with
WORKER_REPORT_DELAY_INTERVAL (which is compared to time_since_last_report and
not nap_time_since_last_report).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

2024-12-09 Thread Yan Chengpeng
If many people are already using this ‘wrong’ behavior, I agree to change the
doc. I also think using ‘null’ may be a better choice. Thanks for your comments.

From: jian he 
Date: Monday, December 9, 2024 at 16:56
To: Andrew Dunstan 
Cc: Yan Chengpeng , pgsql-hack...@postgresql.org 

Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
On Sun, Dec 8, 2024 at 10:58 PM Andrew Dunstan  wrote:
>
> So the actual sort order as implemented is, AIUI,
>
> Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array
>
> which is ugly, but fortunately not many apps rely on jsonb sort order.
>
> Nobody else has commented, so I propose to apply this patch documenting the 
> anomaly.
>

while at it. we can fix the appearance of jsonb null.

since
select jsonb 'Null';
select jsonb 'NULL';
will fail.

so maybe change
Null in  section and
NULL
to
null


v1-0001-fix-jsonb-compare.patch
Description: v1-0001-fix-jsonb-compare.patch


Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

2024-12-09 Thread Yan Chengpeng
Sorry, I uploaded the wrong file. I have now uploaded a new patch with the modified
document. Please take a look. Thanks!


From: Yan Chengpeng 
Date: Monday, December 9, 2024 at 21:22
To: jian he , Andrew Dunstan 
Cc: pgsql-hack...@postgresql.org 
Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
If many people are already using this ‘wrong’ behavior. I agree to change the 
doc. I also think using ‘null’ may be a better choice. Thanks for your comments.

From: jian he 
Date: Monday, December 9, 2024 at 16:56
To: Andrew Dunstan 
Cc: Yan Chengpeng , pgsql-hack...@postgresql.org 

Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
On Sun, Dec 8, 2024 at 10:58 PM Andrew Dunstan  wrote:
>
> So the actual sort order as implemented is, AIUI,
>
> Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array
>
> which is ugly, but fortunately not many apps rely on jsonb sort order.
>
> Nobody else has commented, so I propose to apply this patch documenting the 
> anomaly.
>

while at it. we can fix the appearance of jsonb null.

since
select jsonb 'Null';
select jsonb 'NULL';
will fail.

so maybe change
Null in  section and
NULL
to
null


json-doc-update.patch
Description: json-doc-update.patch


Re: Giving the shared catalogues a defined encoding

2024-12-09 Thread Thomas Munro
On Sat, Dec 7, 2024 at 7:51 AM Tom Lane  wrote:
> Over in the discussion of bug #18735, I've come to the realization
> that these problems apply equally to the filesystem path names that
> the server deals with: not only the data directory path, but the
> path to the installation files [1].  Can we apply the same sort of
> restrictions to those?  I'm envisioning that initdb would check
> either encoding-validity or all-ASCII-ness of those path names
> depending on which mode it's setting the server up in.

Here are some things I have learned about pathname encoding:

* Some systems enforce an encoding: macOS always requires UTF-8, ext4
does too if you turn on case insensitivity, zfs has a utf8only option,
and a few less interesting-to-us ones have relevant mount options.  On
the first three at least: open("cafe\xe9", ...) -> EILSEQ, independent
of user space notions like locales.

* Implications of such a system with non-ASCII data directory:
 * there is only one valid configuration at initdb time:
--encoding=UTF8 --cluster-encoding=DATABASE, which is probably the
default anyway, so doesn't need to be spelled out
 * --cluster-encoding=ASCII would fail with the attached patch
 * --encoding=EUCXXX/MULE --encoding=DATABASE might fail if those
encodings have picky verification
 * --encoding=LATIN1 --cluster-encoding=DATABASE would be wrong but
bogusly pass verification (LATIN1 and other single-byte encodings have
no invalid sequences), and SHOW data_directory would expose the
familiar UTF8-through-LATIN1 glasses distortion "café"→"cafÃ©".
All of that is perfectly reasonable I think, I just want to highlight
the cascading effect of the new constraint: Apple's file system
restricts your *database* encoding, with this design, unless you stick
to plain ASCII pathnames.  It is an interesting case to compare with
when untangling the Windows mess, see below...

* Traditional Unix filesystems eg ext4/xfs/ufs/... just don't care:
beyond '/' being special, the encoding is in the eye of the beholder.
(According to POSIX 2024 you shouldn't have  in a path
component either, see Austin Group issue #251 and others about
attempts to screw things down a bit and document EILSEQ in various
interfaces).  That's cool, just make sure --encoding matches what
you're actually using, accept default --cluster-encoding=DATABASE, and
everything should be OK.

* Windows has a completely different model.  Pathnames are really
UTF-16 in the kernel and on disk.  All char strings exchanged with the
system have a defined encoding, but it was non-obvious to this humble
Unix hacker what it is in each case.  I don't have Windows, so I spent
the afternoon firing test code at CI[1][2] to figure some of it out.
Some cases relevant to initdb: environ[] and argv[] are in ACP
encoding, even if the parent process used a different encoding by
calling setlocale() before putenv() or system() etc, or used the
UTF-16 variants _wputenv() or _wsystem().  You can also access them as
UTF-16 if you want.  So that's how the data directory pathname arrives
into initdb/postgres.  Then to use it, the filesystem functions seem
to be in two classes: the POSIXoid ones in the C runtime with
lowercase names like mkdir() are affected by setlocale() and use its
encoding, while the NT native ones with CamelCase like CreateFile()
don't care about locales and keep using the ACP.  That sounded like a
problem because we mix them: our open() is really a wrapper on
CreateFile() and yet elsewhere we also use unlink() etc, but
fortunately we keep the server locked in "C" locale and then the
lowercase ones appear to use the ACP in that case anyway, so the
difference is not revealed (it might upset frontend programs though?).

* Consequence:  It is senseless to check if getenv("PGDATA") or
argv[]-supplied paths can be validated with the cluster encoding, on
Windows.  The thing to do instead would be to check if they can be
converted from ACP to the cluster encoding, and then store the
converted version for display as SHOW data_directory, but keep the ACP
versions we need for filesystem APIs, and probably likewise for lots
of other things, and then plug all the bugs and edge cases that fall
out of that, for the rest of time... or adopt wchar_t interfaces
everywhere perhaps via wrappers that use database encoding... or other
variations which all sound completely out of scope...

* What I'm wondering is whether we can instead achieve coherence along
the lines of the Apple UTF-8 case I described above, but with an extra
step: if you want to use non-ASCII paths *you have to make your ACP
match the database and cluster encoding*.  So either you go all-in on
your favourite 80s encoding like WIN1252 that matches your ACP
(impossible for 932 AKA SJIS), or you switch your system's ACP to
UTF-8.  Alternatively, I believe postgres.exe could even be built in a
way that makes its ACP always UTF-8[3] (I guess the loader has to help
with that as it affects the way it sets up environ[] and argv[] befor

Re: Add a warning message when using unencrypted passwords

2024-12-09 Thread Greg Sabino Mullane
Overall +1 to the idea of a warning.

Regarding warning vs error, I agree that a WARNING is probably the right
> choice generally. But I think that Divya is correct: When
> log_statement = 'all', an error should be thrown instead.


First, it should be for 'all' AND 'ddl'. And obviously glossing over
log_min_duration_statement entirely. But -1 to throwing an ERROR - that's
not really an error, and not our call to make, so a WARNING is sufficient.

Cheers,
Greg


Re: Proposal to add a new URL data type.

2024-12-09 Thread Alexander Borisov

06.12.2024 21:04, Matthias van de Meent:

On Thu, 5 Dec 2024 at 15:02, Alexander Borisov  wrote:

[..]


I'd be extremely annoyed if URLs I wrote into the database didn't
return in identical manner when fetched from the database. See also
how numeric has different representations of the same value: 2.0 and
2.00 are equivalent for sorting purposes, they aren't the same and
cannot just truncate those zeroes. Note that a path of "/%2e/" could
well be interpreted differently from "/./" or "/" by a server.


That's why data types are invented.  Most likely, you will not be able
to write a bad UTF-8 byte sequence into a field of type text, because
the incoming data will not pass validation.  The user chooses
the data type for his needs, knowing how it works.
I mean that the data in the database should be stored validated, and
someone choosing the URL type to store URLs should not be surprised that
the incoming URL will be parsed and will pass validation.

Also, no one is stopping you from storing the URL in text format and
using the new type on the fly.



I don't think choosing to defer to a living standard is a good idea
for contrib extensions, which are expected to be supported and stable
with the major PostgreSQL release they're bundled with. If (when) that
living standard gets updated, as tends to happen to such standards,
we'd suddenly lose compatibility with the standard we said we
supported, which isn't a nice outlook. Compare that to RFCs, which
AFAIK don't change in specification once released.


WHATWG:
"The standard can generally not be changed in backwards-incompatible
ways without extreme care, and with implementer commitments leading
the way."

You can read more about what it means Living Standard
https://whatwg.org/faq#living-standard.


--
Alexander Borisov





Re: Add a warning message when using unencrypted passwords

2024-12-09 Thread Daniel Gustafsson
> On 9 Dec 2024, at 14:26, Greg Sabino Mullane  wrote:

> -1 to throwing an ERROR - that's not really an error, and not our call to 
> make, so a WARNING is sufficient.

Agreed, regardless of how bad it's considered, it's not an error.  There are
many ways sensitive data can end up in the logs and offering the impression
there is a safety switch offers a false sense of security.

--
Daniel Gustafsson





Re: Wrong results with right-semi-joins

2024-12-09 Thread Melanie Plageman
On Tue, Dec 3, 2024 at 3:56 AM Richard Guo  wrote:
>
> I ran into $subject and it can be reproduced with the query below.
>
> create temp table tbl_rs(a int, b int);
> insert into tbl_rs select i, i from generate_series(1,10)i;
> analyze tbl_rs;
>
> set enable_nestloop to off;
> set enable_hashagg to off;
>
> select * from tbl_rs t1
> where (select a from tbl_rs t2
>where exists (select 1 from
>  (select (b in (select b from tbl_rs t3)) as c
>   from tbl_rs t4 where t4.a = 1) s
>  where c in
>   (select t1.a = 1 from tbl_rs t5 union all select true))
>order by a limit 1) >= 0;
>  a | b
> ---+---
>  1 | 1
> (1 row)
>
> The expected output should be 10 rows, not 1.

Thanks for finding and fixing this. Just for my own benefit, could you
explain more about the minimal repro? Specifically, if you just need a
subplan in the hash side of a right semi-join, why do you also need
the outer part of the query that produces the initplan?

 Seq Scan on tbl_rs t1
   Filter: ((SubPlan 3) >= 0)
   SubPlan 3
 ->  Limit
   InitPlan 2
 ->  Hash Right Semi Join

- Melanie




Re: Replace current implementations in crypt() and gen_salt() to OpenSSL

2024-12-09 Thread Joe Conway

On 12/9/24 07:23, Daniel Gustafsson wrote:

On 4 Dec 2024, at 16:57, Joe Conway  wrote:



I can send you the source RPM for openssl 1.1.1c which was an earlier FIPS 
validated version, but the main FIPS patch contains:


AFAICT the forks of 1.1.1 which offer FIPS certification all patch the common
OpenSSL API FIPS_mode() rather than invent a new one, so the earlier approach
should work fine. PFA an updated version which I propose we go ahead with.


That sounds correct from my memory of it.

I have not done any actual testing (yet), but on quick scan this part 
looks suspicious:

8<---
+_PG_init(void)
+{
+	DefineCustomEnumVariable("pgcrypto.legacy_crypto_enabled",
+							 "Sets if builtin crypto functions are enabled.",
+							 "Yes enables builtin crypto, No unconditionally disables and OpenSSL "
+							 "will disable if OpenSSL is in FIPS mode",
+							 &legacy_crypto_enabled,
8<---

Rather than:
 "Yes enables builtin crypto, No unconditionally disables and OpenSSL "
  ^^^
 "will disable if OpenSSL is in FIPS mode"

I think that should say:
 "Yes enables builtin crypto, No unconditionally disables and fips "
  
 "will disable if OpenSSL is in FIPS mode"

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-09 Thread vignesh C
On Mon, 9 Dec 2024 at 16:25, Shubham Khanna  wrote:
>
> Hi all,
>
> I am writing to propose the addition of the two_phase option in
> pg_createsubscriber. As discussed in [1], supporting this feature
> during the development of pg_createsubscriber was planned for this
> version.

Yes this will be useful.

> Currently, pg_createsubscriber creates subscriptions with the
> two_phase option set to false. Enabling the two_phase option after a
> subscription has been created is not straightforward, as it requires
> the subscription to be disabled first. This patch aims to address this
> limitation by incorporating the two_phase option into
> pg_createsubscriber which will help create subscription with two_phase
> option and make use of the advantages of creating subscription with
> two_phase option.
> The attached patch has the changes for the same.

Few comments:
1) You can make this a no-argument option, similar to how dry-run is handled:
@@ -1872,6 +1881,7 @@ main(int argc, char **argv)
{"publisher-server", required_argument, NULL, 'P'},
{"socketdir", required_argument, NULL, 's'},
{"recovery-timeout", required_argument, NULL, 't'},
+   {"two_phase", optional_argument, NULL, 'T'},
{"subscriber-username", required_argument, NULL, 'U'},
{"verbose", no_argument, NULL, 'v'},
{"version", no_argument, NULL, 'V'},

2) After making it a no-argument option, if the option is set, just
set the value; the strcmp's are not required:
+			case 'T':
+				if (optarg != NULL)
+				{
+					if (strcmp(optarg, "on") == 0)
+						opt.two_phase = true;
+					else if (strcmp(optarg, "off") == 0)
+						opt.two_phase = false;
+					else
+						pg_fatal("invalid value for --two-phase: must be 'on' or 'off'");
+				}
+				else
+					opt.two_phase = true;	/* Default to true if no argument
+											 * is provided */
+				break;
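
For illustration, a stripped-down standalone sketch of the no-argument
behaviour (the option letter and variable names here are placeholders,
not the patch's code):

#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
    static struct option long_options[] = {
        {"two-phase", no_argument, NULL, 'T'},
        {NULL, 0, NULL, 0}
    };
    bool        two_phase = false;  /* default */
    int         c;

    while ((c = getopt_long(argc, argv, "T", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 'T':
                two_phase = true;   /* no on/off parsing needed */
                break;
        }
    }
    printf("two_phase = %d\n", two_phase);
    return 0;
}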

3) You can add a link to create subscription documentation page of
two_phase option here:
+   Enables or disables two-phase commit for the subscription.
+   When the option is provided without a value, it defaults to
+   on. Specify on to enable or
+   off to disable.
+   Two-phase commit ensures atomicity in logical replication for prepared
+   transactions. By default, this option is enabled unless explicitly set
+   to off.

4) Instead of adding new tests, can we add the prepared-transaction checks
and their verification to the existing tests?
+# Set up node A as primary
+my $node_a = PostgreSQL::Test::Cluster->new('node_a');
+my $aconnstr = $node_a->connstr;
+$node_a->init(allows_streaming => 'logical');
+$node_a->append_conf(
+   'postgresql.conf', qq[
+   autovacuum = off
+   wal_level = logical
+   max_wal_senders = 10
+   max_worker_processes = 8
+   max_prepared_transactions = 10
+]);
+
+$node_a->start;

Regards,
Vignesh




Re: Fix tiny memory leaks

2024-12-09 Thread Masahiko Sawada
On Fri, Dec 6, 2024 at 12:49 AM Daniel Gustafsson  wrote:
>
> > On 6 Dec 2024, at 09:38, Tofig Aliev  wrote:
>
> > There is a memory leak in functions check_application_name() and 
> > check_cluster_name().
> > Functions are located in src/backend/commands/variable.c
>
> LGTM.

LGTM. It seems commit a9d58bfe8a3a missed fixing this memory leak.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: WARNING: missing lock on database "postgres" (OID 5) @ TID (0,4)

2024-12-09 Thread Noah Misch
On Mon, Dec 09, 2024 at 04:50:16PM +0500, Kirill Reshke wrote:
> On Mon, 9 Dec 2024 at 15:27, Alexander Kukushkin  wrote:
> > postgres=# reassign owned by test to postgres;
> > WARNING:  missing lock on database "postgres" (OID 5) @ TID (0,4)
> > REASSIGN OWNED

Thanks for the report.

> --- a/src/backend/commands/alter.c
> +++ b/src/backend/commands/alter.c
> @@ -991,6 +991,8 @@ AlterObjectOwner_internal(Oid classId, Oid objectId, Oid 
> new_ownerId)
>   }
>   }
>  
> + LockTuple(rel, &oldtup->t_self, InplaceUpdateTupleLock);
> +
>   /* Build a modified tuple */

This silences the warning, but it doesn't generally satisfy the locking
protocol at README.tuplock section "Locking to write inplace-updated tables".
The specific problem is that this locks the tuple after copying it from shared
buffers.  If getting oldtup from a syscache, use SearchSysCacheLocked1().
Otherwise, lock before systable_endscan() and before any copying, like
AlterDatabaseOwner() does.
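
As a rough sketch of that ordering (illustrative names only, not the
actual AlterObjectOwner_internal() code):

#include "postgres.h"

#include "access/genam.h"
#include "access/htup_details.h"
#include "storage/lmgr.h"
#include "utils/rel.h"

/*
 * Sketch: take the tuple lock while the old tuple still points into the
 * buffer, then copy it, then end the scan.  The caller would release the
 * lock with UnlockTuple() after the update.
 */
static HeapTuple
fetch_tuple_locked(Relation rel, SysScanDesc scan)
{
    HeapTuple   oldtup;
    HeapTuple   copy;

    oldtup = systable_getnext(scan);
    if (!HeapTupleIsValid(oldtup))
        return NULL;

    LockTuple(rel, &oldtup->t_self, InplaceUpdateTupleLock);   /* lock first */
    copy = heap_copytuple(oldtup);                             /* then copy */
    systable_endscan(scan);

    return copy;
}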

A fix for $SUBJECT also warrants a test in database.sql.




Re: [PATCH] Add sortsupport for range types and btree_gist

2024-12-09 Thread Andrey M. Borodin



> On 9 Dec 2024, at 22:10, Bernd Helmle  wrote:
> 
> So here's a version with the original, unchanged regression tests and
> injection point removed (i hope i forgot nothing to revert).

Besides unnecessary indentation changes in contrib/btree_gist/Makefile, the 
patch seems good to me.


Best regards, Andrey Borodin.



Re: Assert failure on running a completed portal again

2024-12-09 Thread Tom Lane
I wrote:
> Yugo NAGATA  writes:
>> I guess planstate is removed due to the redundancy that this is included
>> in queryDesc. If so, I think we can also remove use_parallel_mode for the
>> same reason.  

> Certainly there's a bit of aesthetic judgment involved here.

After thinking awhile longer, I'm coming around to the idea that
you're right and it'd be better to get rid of ExecutePlan's
use_parallel_mode argument.  We clearly have to pass down count and
direction, and it's sensible to pass down operation, sendTuples,
and dest because ExecutorRun is involved with those to perform
dest-receiver startup and shutdown.  But ExecutorRun has no real
concern with the parallelism decision, so it seems better to unify
that logic in one place.

I'll make that change before pushing.

regards, tom lane




Re: Fix tiny memory leaks

2024-12-09 Thread Daniel Gustafsson
> On 9 Dec 2024, at 19:52, Masahiko Sawada  wrote:
> 
> On Fri, Dec 6, 2024 at 12:49 AM Daniel Gustafsson  wrote:
>> 
>>> On 6 Dec 2024, at 09:38, Tofig Aliev  wrote:
>> 
>>> There is a memory leak in functions check_application_name() and 
>>> check_cluster_name().
>>> Functions are located in src/backend/commands/variable.c
>> 
>> LGTM.
> 
> LGTM. It seems commit a9d58bfe8a3a missed fixing this memory leak.

Yeah.  While fairly insignificant in nature, and there have been no complaints
until now, I'm preparing a backpatch down to REL_15_STABLE to keep the code in
sync so that other backpatches apply cleanly.

--
Daniel Gustafsson





unused-but-set-variable warning on REL_13_STABLE

2024-12-09 Thread Nathan Bossart
My compiler emits the following warning on REL_13_STABLE when assertions
are disabled:

reorderbuffer.c:2471:8: warning: variable 'spilled' set but not used 
[-Wunused-but-set-variable]
 2471 |         Size            spilled = 0;
  | ^

Adding PG_USED_FOR_ASSERTS_ONLY (as in the attached patch) seems sufficient
to suppress this warning.  On newer versions, this variable is used for
more than assertions, so the patch only needs to be applied to v13.  Since
this is trivial, I plan to commit it shortly.

-- 
nathan
diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index c7f8fa6216..56c25e3a6d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2468,7 +2468,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
dlist_mutable_iter change_i;
int fd = -1;
XLogSegNo   curOpenSegNo = 0;
-	Size		spilled = 0;
+	Size		spilled PG_USED_FOR_ASSERTS_ONLY = 0;
 
elog(DEBUG2, "spill %u changes in XID %u to disk",
 (uint32) txn->nentries_mem, txn->xid);


Re: unused-but-set-variable warning on REL_13_STABLE

2024-12-09 Thread Tom Lane
Nathan Bossart  writes:
> Adding PG_USED_FOR_ASSERTS_ONLY (as in the attached patch) seems sufficient
> to suppress this warning.  On newer versions, this variable is used for
> more than assertions, so the patch only needs to be applied to v13.  Since
> this is trivial, I plan to commit it shortly.

LGTM

regards, tom lane




Re: Fix tiny memory leaks

2024-12-09 Thread Daniel Gustafsson
> On 9 Dec 2024, at 20:03, Daniel Gustafsson  wrote:
> 
>> On 9 Dec 2024, at 19:52, Masahiko Sawada  wrote:
>> 
>> On Fri, Dec 6, 2024 at 12:49 AM Daniel Gustafsson  wrote:
>>> 
 On 6 Dec 2024, at 09:38, Tofig Aliev  wrote:
>>> 
 There is a memory leak in functions check_application_name() and 
 check_cluster_name().
 Functions are located in src/backend/commands/variable.c
>>> 
>>> LGTM.
>> 
>> LGTM. It seems commit a9d58bfe8a3a missed fixing this memory leak.
> 
> Yeah.  While fairly insignificant in nature, and there have been no complaints
> until now, I'm preparing a backpatch down to REL_15_STABLE to keep the code in
> sync so that other backpatches apply cleanly.

Done, but to 16 and not 15 which was a typo in the above email.

--
Daniel Gustafsson





Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

2024-12-09 Thread jian he
On Mon, Dec 9, 2024 at 9:27 PM Yan Chengpeng  wrote:
>
> Sorry, I uploaded the wrong file. I uploaded a new patch with the modified 
> document. Please take a review. Thanks!
>
>
sorry. maybe i didn't mention it explicitly.
i mean something like:

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 54648c459c..d9b24e413e 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -584,12 +584,13 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
 The btree ordering for jsonb datums is seldom
 of great interest, but for completeness it is:
 
-Object > Array > Boolean > Number > String > Null
+Object > Array > Boolean > Number > String > null
 
 Object with n pairs > object with n - 1 pairs
 
 Array with n elements > array with n - 1 elements
 
+  with the exception that (for historical reasons) an empty array sorts less than null.
   Objects with equal numbers of pairs are compared in the order:
 
 key-1, value-1, key-2 ...


Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andres Freund
Hi,

On 2024-12-09 18:34:22 +1100, Michael Harris wrote:
> Our application has recently migrated to PG16, and we have experienced
> some failed upgrades. The upgrades are performed using pg_upgrade and
> have failed during the phase where the schema is restored into the new
> cluster, with the following error:
>
> pg_restore: error: could not execute query: ERROR:  could not extend
> file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with
> FileFallocate(): No space left on device
> HINT:  Check free disk space.

Were those pg_upgrades done with pg_upgrade --clone? Or have been, on the same
filesystem, in the past?

The reflink stuff in xfs (which is used to implement copy-on-write for files)
is somewhat newer and you're using somewhat old kernels:


> We have seen this on both Rocky Linux 8 (kernel 4.18.0) and Rocky
> Linux 9 (Kernel 5.14.0).

I found some references for bugs that were fixed in 5.13. But I think at least
some of this would persist if the filesystem ran into the issue with a kernel
before those fixes. Did you upgrade "in-place" from Rocky Linux 8?


> I am wondering if this bug might be related:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323

Doubt it, we never do this as far as I am aware.

Greetings,

Andres Freund




Re: Document NULL

2024-12-09 Thread David G. Johnston
On Thu, Nov 21, 2024 at 8:03 AM Marcos Pegoraro  wrote:

> Em qui., 21 de nov. de 2024 às 11:42, David G. Johnston <
> david.g.johns...@gmail.com> escreveu:
>
>>
>> I'm not following your train of thought here.  Since null == null in
>> json-land there isn't a need for or concept of "is distinct from".  We tend
>> to not expend space on pointing out things that don't exist.
>>
>
> But you said previously in this document about IS DISTINCT, so it's
> related to NULL. I thought it would be better to mention that here, for
> JSON PATH, that way doesn't exist.
>
> "JSON null value is considered equal to other JSON null values, so here we
> don't have the IS DISTINCT operator"
>
>
I added this to "Overview"

 
  Throughout this section the discussion of null values will be limited to
  the SQL language unless otherwise noted.  The JSON-related data types,
and the
  non-SQL procedural languages, have their own behaviors documented in their
  respective areas.
 

And added this to "Distinctness..."

  
   On the other hand, the SQL specification is largely alone in taking this
approach to comparing
   values to the null value.  Specifically, when working within the JSON
data types the use of equals
   produces true or false and so the concept of distinctness is neither
present nor required.  Please
   consult the documentation for the non-SQL procedural language of choice
to learn about its behavior.
  

I'm OK with adding more cross-references (links or just brief comparative
verbiage like the above) to non-SQL null value treatment but this document,
for this patch, is going to focus solely on SQL NULL.

David J.


Re: [PATCH] SVE popcount support

2024-12-09 Thread Malladi, Rama


On 12/9/24 12:21 AM, devanga.susmi...@fujitsu.com wrote:

Hello,

We are sharing our patch for pg_popcount with SVE support as a 
contribution from our side in this thread. We hope this contribution 
will help in exploring and refining the popcount implementation further.
Our patch uses the existing infrastructure, i.e. the 
"choose_popcount_functions" method, to determine the correct popcount 
implementation based on the architecture, thereby requiring fewer code 
changes. The patch also includes implementations for popcount and 
popcount masked.
We can reference both solutions and work together toward achieving the 
most efficient and effective implementation for PostgreSQL.


Thanks for the patch and it looks good. I will review the full patch in 
the next couple of days. One observation was that the patch has `xsave` 
flags added. This isn't needed.



`pgport_cflags = {'crc': cflags_crc, 'popcnt': cflags_popcnt + 
cflags_popcnt_arm, 'xsave': cflags_xsave}`




*Algorithm Overview:*
1. For larger inputs, align the buffers to avoid double loads. For 
smaller inputs alignment is not necessary and might even degrade the 
performance.
2. Process the aligned buffer chunk by chunk till the last incomplete chunk.
3. Process the last incomplete chunk.
*Our setup:*
Machine: AWS EC2 c7g.8xlarge - 32vcpu, 64gb RAM
OS : Ubuntu 22.04.5 LTS
GCC: 11.4
*Benchmark and Result:*
We have used PostgreSQL community recommended 
popcount-test-module[0] for benchmarking and observed a speed-up of 
more than 4x for larger buffers. Even for smaller inputs of size 8 and 
16 bytes there aren't any performance degradations observed.

Looking forward to your thoughts!

I tested the patch and here attached is the performance I see on a 
`c7g.xlarge`. The perf data doesn't quite match what you observe
(especially for 256B). In the chart, I have comparison of baseline, AWS 
SVE (what I had implemented) and Fujitsu SVE popcount implementations. 
Can you confirm the command-line you had used for the benchmark run?



I had used the below command-line:

`sudo su postgres -c "/usr/local/pgsql/bin/psql -c 'EXPLAIN ANALYZE 
SELECT drive_popcount(10, 16);'"`






*From:* Nathan Bossart 
*Sent:* Wednesday, December 4, 2024 21:37
*To:* Malladi, Rama 
*Cc:* Kirill Reshke ; pgsql-hackers 


*Subject:* Re: [PATCH] SVE popcount support
On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote:
> Thank you, Kirill, for the review and the feedback. Please find inline my
> reply and an updated patch.

Thanks for the updated patch.  I have a couple of high-level comments.
Would you mind adding this to the commitfest system
(https://commitfest.postgresql.org/) so that it is picked up by our
automated patch testing tools?

> +# Check for ARMv8 SVE popcount intrinsics
> +#
> +CFLAGS_POPCNT=""
> +PG_POPCNT_OBJS=""
> +PGAC_SVE_POPCNT_INTRINSICS([])
> +if test x"$pgac_sve_popcnt_intrinsics" != x"yes"; then
> +  PGAC_SVE_POPCNT_INTRINSICS([-march=armv8-a+sve])
> +fi
> +if test x"$pgac_sve_popcnt_intrinsics" = x"yes"; then
> +  PG_POPCNT_OBJS="pg_popcount_sve.o"
> +  AC_DEFINE(USE_SVE_POPCNT_WITH_RUNTIME_CHECK, 1, [Define to 1 to use SVE popcount instructions with a runtime check.])

> +fi
> +AC_SUBST(CFLAGS_POPCNT)
> +AC_SUBST(PG_POPCNT_OBJS)

We recently switched some intrinsics support in PostgreSQL to use
__attribute__((target("..."))) instead of applying special compiler flags
to specific files (e.g., commits f78667b and 4b03a27).  The hope is that
this approach will be a little more sustainable as we add more
architecture-specific code.  IMHO we should do something similar here.
While this means that older versions of clang might not pick up this
optimization (see the commit message for 4b03a27 for details), I think
that's okay because 1) this patch is intended for the next major version of
Postgres, which will take some time for significant adoption, and 2) this
is brand new code, so we aren't introducing any regressions for current
users.

> +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
> +extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (const char *buf, int bytes);


Could we combine this with the existing copy above this line? I'm thinking
of something like

    #if defined(TRY_POPCNT_FAST) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)

    extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (...)
    #endif

    #ifdef TRY_POPCNT_FAST
    ...

> +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
> +extern uint64 pg_popcount_sve(const char *buf, int bytes);
> +extern int check_sve_support(void);
> +#endif

Are we able to use SVE instructions for pg_popcount32(), pg_popcount64(),
and pg_popcount_masked(), too?

> +static inline uint64
> +pg_popcount_choose(const char *buf, int bytes)
> +{
> + if (check_sve_support())
> + pg_popcount_optimized = pg_popcount_sve;
> + else
> + pg_popcount_opt

Re: Assert failure on running a completed portal again

2024-12-09 Thread Tom Lane
Yugo NAGATA  writes:
> On Sun, 08 Dec 2024 14:25:53 -0500
> Tom Lane  wrote:
> - *
> - * Note: the ctid attribute is a 'junk' attribute that is removed before the
> - * user can see it
>   * 
>   */

> This comment remove seems not related to the fix of ExecutePlan. Should this
> actually have been done by 8a5849b7ff24c?

Yeah, I'm sure it should have been removed a long time ago.
Presumably it's there because ctid was once an explicit part
of ExecutePlan's API; but that's surely not been so for a long
time, and I don't see that this sentence has any remaining
relevance where it is.

You could say that getting rid of that dead comment should be
a separate commit, but I thought that was a bit too pedantic.

>  static void
> -ExecutePlan(EState *estate,
> - PlanState *planstate,
> +ExecutePlan(QueryDesc *queryDesc,
>   bool use_parallel_mode,
>   CmdType operation,
>   bool sendTuples,
>   uint64 numberTuples,
>   ScanDirection direction,
> - DestReceiver *dest,
> - bool execute_once)
> + DestReceiver *dest)
>  {
> + EState *estate = queryDesc->estate;
> + PlanState  *planstate = queryDesc->planstate;
>   TupleTableSlot *slot;
>   uint64  current_tuple_count

> I guess planstate is removed due to the redundancy that this is included
> in queryDesc. If so, I think we can also remove use_parallel_mode for the
> same reason.  

Certainly there's a bit of aesthetic judgment involved here.  In
principle, now that we're passing down the QueryDesc, we could remove
the operation and dest arguments too.  I chose not to do that on
the grounds that ExecutorRun is deciding what ExecutePlan should do,
and at least in principle it could decide to pass something else
for these arguments than what it passes today.  But it's hard to
see how it would pass a different EState or PlanState tree than
what is in the QueryDesc, so I thought it was okay to remove those
arguments.  Somebody else writing this patch might have done that
differently, but I think that's fine.

A different line of thought is that the reason we need this change
is that the already_executed flag is in the wrong place.  If we
moved it into EState, which feels like a more natural place, we
could avoid refactoring ExecutePlan's API.  But that seemed more
invasive in the big picture, and it'd involve more risk in the
back branches, so I didn't go that way.

Thanks for reviewing and testing the patch!

regards, tom lane




Re: Document NULL

2024-12-09 Thread Marcos Pegoraro
Em seg., 9 de dez. de 2024 às 13:31, David G. Johnston <
david.g.johns...@gmail.com> escreveu:

> I'm OK with adding more cross-references (links or just brief comparative
> verbiage like the above) to non-SQL null value treatment but this document,
> for this patch, is going to focus solely on SQL NULL.
>

If you briefly mention JSON NULLs, it might be interesting to also
briefly mention that in other places NULL can be seen differently,
including changing its name. A SQL NULL passed as an argument to a PL/Python
function will be None, and in PL/Perl it will be undef.

regards,
Marcos


Re: Remove dependence on integer wrapping

2024-12-09 Thread Nathan Bossart
On Fri, Dec 06, 2024 at 02:12:51PM -0600, Nathan Bossart wrote:
> I seem to have a knack for picking patches that take an entire afternoon to
> back-patch...  Here's what I have staged for commit early next week.

Committed.  I chickened out of back-patching this because 1) it adds new
routines to int.h that could theoretically conflict with third-party code
and 2) more importantly, the risk/reward ratio didn't feel particularly
favorable.

-- 
nathan




Re: NOT ENFORCED constraint feature

2024-12-09 Thread jian he
hi.
only applied v7-0001.

alter_table.sgml says we can specify enforceability
for  ALTER TABLE ADD column_constraint
and ALTER TABLE ADD column_constraint table_constraint.
but we didn't have a test for column_constraint in alter_table.sql

so a segmentation fault happens again:

create table tx(a int);
alter table tx add column b text collate "C" constraint cc check (a >
1) not enforced;
alter table tx add column b text collate "C" constraint cc check (b <>
'h') not enforced;

errmsg("multiple ENFORCED/NOT ENFORCED clauses not allowed") is
never tested.
here are tests that exercise it:
CREATE TABLE t5(x int CHECK (x > 3) NOT ENFORCED enforced , b int);
CREATE TABLE t5(x int CHECK (x > 3) ENFORCED not enforced , b int);


creating a foreign table with a column_constraint also gives a segmentation fault

reproduce:
DO $d$
BEGIN
EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname '$$||current_database()||$$',
 port '$$||current_setting('port')||$$'
)$$;
EXECUTE $$CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname '$$||current_database()||$$',
 port '$$||current_setting('port')||$$'
)$$;
EXECUTE $$CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname '$$||current_database()||$$',
 port '$$||current_setting('port')||$$'
)$$;
END;
$d$;
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE FOREIGN TABLE ft1 (c0 int constraint cc check (c0 > 1) not
enforced) SERVER loopback;




Re: Document NULL

2024-12-09 Thread David G. Johnston
On Mon, Dec 2, 2024 at 12:18 AM jian he  wrote:

> >>
> >> typo:
> >> There are none. During initializion all settings are assigned a
> non-null value.
> >> 5.2.16. Null Values in Partiton Keys
> >> As noted in the synatx chapter, a null value literal is written using
> >> the NULL keyword. Its type is the pseudo-type unknown but can be cast
> >> to any concrete data type.
> >
> > Sorry, not seeing the typo.  Can you point it out or supply the fix?
> >
>
> typo and other whitespace changes that i think make sense.
> please check attached.
>

I fixed the 3 spelling typos.  I left the full-stop-double-space alone
consistent with elsewhere on the page.  The indentation changes I've set
aside until later, though they are incomplete as provided.  0002 has the
correct indentation for the v4 patch series.  My plan is to put forth a
v5-0001 patch this week and once a review is willing to indicate
ready-to-commit I'll put out a v6 with the indenting applied.

David J.


Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andres Freund
Hi,

On 2024-12-09 15:47:55 +0100, Tomas Vondra wrote:
> On 12/9/24 11:27, Jakub Wartak wrote:
> > On Mon, Dec 9, 2024 at 10:19 AM Michael Harris  > > wrote:
> > 
> > Hi Michael,
> > 
> > We found this thread describing similar issues:
> > 
> > https://www.postgresql.org/message-id/flat/
> > 
> > AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> >  
> > 
> > 
> > 
> > We've got some case in the past here in EDB, where an OS vendor has
> > blamed XFS AG fragmentation (too many AGs, and if one AG is not having
> > enough space -> error). Could You perhaps show us output of on that LUN:
> > 1. xfs_info
> > 2. run that script from https://www.suse.com/support/kb/doc/?
> > id=18219  for
> > Your AG range
> > 
> 
> But this can be reproduced on a brand new filesystem - I just tried
> creating a 1GB image, create XFS on it, mount it, and fallocate a 600MB
> file twice. Which that fails, and there can't be any real fragmentation.

If I understand correctly xfs, before even looking at the file's current
layout, checks if there's enough free space for the fallocate() to
succeed.  Here's an explanation for why:
https://www.spinics.net/lists/linux-xfs/msg55429.html

  The real problem with preallocation failing part way through due to
  overcommit of space is that we can't go back an undo the
  allocation(s) made by fallocate because when we get ENOSPC we have
  lost all the state of the previous allocations made. If fallocate is
  filling holes between unwritten extents already in the file, then we
  have no way of knowing where the holes we filled were and hence
  cannot reliably free the space we've allocated before ENOSPC was
  hit.

I.e. reserving space as you go would leave you open to ending up with some,
but not all, of those allocations having been made. Whereas pre-reserving the
worst case space needed, ahead of time, ensures that you have enough space to
go through it all.

You can't just go through the file [range] and compute how much free space you
will need to allocate and then do a second pass through the file, because the
file layout might have changed concurrently...
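
For what it's worth, that behaviour is easy to see with a tiny standalone
program along these lines (mount point and sizes are made up; on a ~1GB XFS
filesystem that is mostly full, the second call can fail with ENOSPC even
though no new blocks are needed):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
    const off_t sz = 600L * 1024 * 1024;    /* 600 MB on a ~1 GB filesystem */
    int         fd = open("/mnt/xfstest/testfile", O_CREAT | O_RDWR, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* posix_fallocate() returns the error number directly, not via errno */
    printf("first  fallocate: %s\n", strerror(posix_fallocate(fd, 0, sz)));
    printf("second fallocate: %s\n", strerror(posix_fallocate(fd, 0, sz)));
    close(fd);
    return 0;
}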


This issue seems independent of the issue Michael is having though. Postgres,
afaik, won't fallocate huge ranges with already allocated space.

Greetings,

Andres Freund




Re: remove pgrminclude?

2024-12-09 Thread Tom Lane
Peter Eisentraut  writes:
> I propose to remove the pgrminclude scripts and annotations.  AFAICT, 
> per git log, the last time someone tried to do something with it was 
> around 2011.  Also, many (not all) of the "pgrminclude ignore" 
> annotations are of a newer date but seem to have just been copied around 
> during refactorings and file moves and don't seem to reflect an actual 
> need anymore.

I agree with dropping pgrminclude --- as you say, it's not been used
since 2011 and there seems little appetite for ever using it again.
But I think there might be some lessons for us now in why it ended up
in such bad odor.  The relevant git history is at 1609797c2:

Author: Tom Lane 
Branch: master Release: REL9_2_BR [1609797c2] 2011-09-04 01:13:16 -0400

Clean up the #include mess a little.

walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.

In short, pgrminclude turned a small human error into a giant mess
that required half a day's work to clean up, because it had no idea
which of some redundant-looking includes were reasonable to get
rid of and which weren't.

I am worried that IWYU might be prone to similar mess-amplification.
Perhaps it has better heuristics as a result of doing more thorough
semantic analysis, but heuristics are still only heuristics.

One thing that I would look favorably on, given the mistakes we made
in 2011, is automatic detection of circular #include's.  Do you happen
to know whether IWYU complains about that?

regards, tom lane




Re: Proposal: Role Sandboxing for Secure Impersonation

2024-12-09 Thread Eric Hanson
On Thu, Dec 5, 2024 at 4:29 PM Jelte Fennema-Nio  wrote:
> When PgBouncer is in transaction mode, the server connection will only
> be unlinked, when PgBouncer receives a ReadyForQuery with the "idle"
> flag from the server **and** there are no messages from the client in
> flight anymore. It's totally valid for a client to send multiple
> transactions in a single pipeline without waiting for their result.
>
> > So
> > guarded/unresettable transactions are not at all helpful for security
> > in pgbouncer?
>
> Correct.
>
> > Is this generally true for others?
>
> I'm not sure whether all poolers implement this correctly (pgbouncer
> definitely had some recent bugs in this area), but none that I know
> parse the COMMIT message. So they will happily forward commands to the
> server after the COMMIT is sent if they haven't received a
> ReadyForQuery with "idle" back yet.

Got it.

Would you agree that pipelines and connection pooling are somewhat
orthogonal anyway?  At least in the abstract.  One can pool
connections without pipelining and still at least avoid the
max_connections bottleneck.  I would think that, had guarded
sessions/transactions existed when PgBouncer was invented, they might
have added another config mode that pooled and reused the
authenticator role for multiple client roles, but somehow some way
made sure that client requests couldn't spill over after the COMMIT.
Reasonable?

> > > P.S. If we're going to move forward in this direction, then SET
> > >
> > > SESSION AUTHORIZATION should have the same functionality. Poolers
> > > probably would want to lock that instead of ROLE, that way users can
> > > still use SET ROLE to change the role that the current SESSION
> > > AUTHORIZATION is allowed to change to.
> >
> > "should have" for consistency and general usefulness?  At least for
> > poolers, this would require the authenticator role to be a superuser,
> > which is scary to me but maybe people do it.  But as far as bringing
> > sandboxing to PostgreSQL in general, I see the point.
>
> Hmm, I didn't realize that SET SESSION AUTHORIZATION required
> superuser. I had expected you could set it to any roles that you are
> part of. That seems like a fixable problem at least, we could add some
> new role that would allow that, like pg_session_authorization.

I don't even know why SET SESSION AUTH exists.  To me, session auth is
the role that started the session, which is immutable.  The docs say

> Using this command, it is possible, for example, to temporarily become
> an unprivileged user and later switch back to being a superuser.

Sort of?  It's not really "unprivileged" when the client can just
reset it back to privileged.  There must be good reasons, but it seems
like another "we don't have a sandbox" seam.

Regards,
Eric




Re: NOT ENFORCED constraint feature

2024-12-09 Thread Amul Sul
On Mon, Dec 9, 2024 at 9:40 PM jian he  wrote:
>
> hi.
> only applied v7-0001.
>
> alter_table.sgml says we can specify enforceability
> for  ALTER TABLE ADD column_constraint
> and ALTER TABLE ADD column_constraint table_constraint.
> but we didn't have a test for column_constraint in alter_table.sql
>
> so segmental fault happened again:
>

This is an assertion failure introduced by the patch to ensure that a
NOT ENFORCED constraint is marked as invalid. The failure occurs
because skip_validation and initially_valid were not set inside
transformConstraintAttrs(). I will post an updated version of the
patch tomorrow after conducting some additional testing. Thanks for
the test.

Regards,
Amul




Re: Add a warning message when using unencrypted passwords

2024-12-09 Thread Guillaume Lelarge
Hi,

Le lun. 9 déc. 2024 à 14:40, Daniel Gustafsson  a écrit :

> > On 9 Dec 2024, at 14:26, Greg Sabino Mullane  wrote:
>
> > -1 to throwing an ERROR - that's not really an error, and not our call
> to make, so a WARNING is sufficient.
>
> Agreed, regardless of how bad it's considered, it's not an error.  There
> are
> many ways sensitive data can end up in the logs and offering the impression
> there is a safety switch offers a false sense of security.
>
>
I'm fine with adding a test on whether or not we log statements. But that
completely hides the fact that people listening on the network could also
get to the password if the server doesn't use SSL. Isn't it weird to warn
about one potential leak and not the other one?


-- 
Guillaume.


Re: Track the amount of time waiting due to cost_delay

2024-12-09 Thread Dilip Kumar
On Mon, Dec 9, 2024 at 6:55 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Mon, Dec 09, 2024 at 05:18:30PM +0530, Dilip Kumar wrote:
> > On Mon, Dec 9, 2024 at 2:51 PM Masahiro Ikeda  
> > wrote:
> > >
> > > On 2024-12-06 18:31, Bertrand Drouvot wrote:
> > > > Hi,
> > > >
> > > > On Thu, Dec 05, 2024 at 10:43:51AM +, Bertrand Drouvot wrote:
> > > >> Yeah, people would likely use this new field to monitor long running
> > > >> vacuum.
> > > >> Long enough that this error should be acceptable. Do you agree?
> > > >
> > > > OTOH, adding the 100% accuracy looks as simple as v9-0002 attached
> > > > (0001 is
> > > > same as for v8), so I think we should provide it.
> > >
> > This Idea looks good to me.
>
> Thanks for looking at it!
>
> > 1.
> > +   Total amount of time spent in milliseconds waiting due to
> > 
> > +   or . In case
> > of parallel
> > +   vacuum the reported time is across all the workers and the leader. 
> > The
> > +   workers update the column no more frequently than once per second, 
> > so it
> > +   could show slightly old values.
> > +  
> >
> > I think this waiting is influenced due to cost delay as well as cost
> > limit GUCs because here we are counting total wait time and that very
> > much depends upon how frequently we are waiting and that's completely
> > driven by cost limit. Isn't it?
>
> Yeah. I think we could change the wording that way:
>
> s/waiting due to/waiting during/
>
> Does that make sense? I don't think we need to mention cost limit here.

Yeah that should be fine.

> > 2.
> > + if (IsParallelWorker())
> > + {
> > + instr_time time_since_last_report;
> > +
> > + INSTR_TIME_SET_ZERO(time_since_last_report);
> > + INSTR_TIME_ACCUM_DIFF(time_since_last_report, delay_end,
> > +   last_report_time);
> > + nap_time_since_last_report += INSTR_TIME_GET_MILLISEC(delayed_time);
> > +
> > + if (INSTR_TIME_GET_MILLISEC(time_since_last_report) >
> > WORKER_REPORT_DELAY_INTERVAL)
> > + {
> > + pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
> > + nap_time_since_last_report);
> > + nap_time_since_last_report = 0;
> > + last_report_time = delay_end;
> > + }
> > + }
> >
> > Does it make sense to track this "nap_time_since_last_report" in a
> > shared variable instead of local in individual workers and whenever
> > the shared variable crosses WORKER_REPORT_DELAY_INTERVAL we can report
> > this would avoid individual reporting from different workers.  Since
> > we are already computing the cost balance in shared variable i.e.
> > VacuumSharedCostBalance, or do you think it will complicate the code?
> >
>
> I'm not sure I follow. nap_time_since_last_report is the time a worker had to
> wait since its last report. There is no direct relationship with
> WORKER_REPORT_DELAY_INTERVAL (which is compared to time_since_last_report and
> not nap_time_since_last_report).

I mean currently we are tracking "time_since_last_report" and
accumulating the delayed_time in "nap_time_since_last_report" for each
worker.  So my question was does it make sense to do this in a shared
variable across workers so that we can reduce the number of reports to the
leader?
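
Something along these lines is what I had in mind (just a sketch; the struct
and field names are invented rather than taken from the patch, and the
counter would be initialized with pg_atomic_init_u64() when the shared state
is set up):

#include "postgres.h"

#include "commands/progress.h"      /* PROGRESS_VACUUM_TIME_DELAYED, from the patch */
#include "port/atomics.h"
#include "utils/backend_progress.h"

/* hypothetical shared state, living in the parallel vacuum's DSM segment */
typedef struct VacDelayShared
{
    pg_atomic_uint64 delay_ms_since_report;
} VacDelayShared;

/*
 * Each worker adds its nap time to the shared counter; whichever worker
 * pushes the total past the threshold reports the accumulated amount to the
 * leader and resets the counter, so the leader is interrupted less often.
 */
static void
accumulate_delay(VacDelayShared *shared, uint64 delay_ms, uint64 report_threshold_ms)
{
    uint64      total;

    total = pg_atomic_add_fetch_u64(&shared->delay_ms_since_report, delay_ms);

    if (total >= report_threshold_ms)
    {
        uint64      to_report;

        to_report = pg_atomic_exchange_u64(&shared->delay_ms_since_report, 0);
        if (to_report > 0)
            pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
                                                (int64) to_report);
    }
}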


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: meson missing test dependencies

2024-12-09 Thread Peter Eisentraut

On 07.12.24 21:45, Andres Freund wrote:

If a test target does not have any dependency 'meson test' treats it as having
a dependency on the project default test. Which in turn currently includes
everything that will be installed, i.e. everything that tmp_install might
need. But that's not something I think we should rely on going forward.


I don't understand this.  What is the project default test?  I don't 
find any documentation about this default dependency.  (Not everything 
like that tends to be documented, but do you have a source code 
reference or a GH issue?)






Re: meson missing test dependencies

2024-12-09 Thread Andres Freund
Hi,

On 2024-12-09 16:06:18 +0100, Peter Eisentraut wrote:
> On 07.12.24 21:45, Andres Freund wrote:
> > If a test target does not have any dependency 'meson test' treats it as having
> > a dependency on the project default test. Which in turn currently includes
> > everything that will be installed, i.e. everything that tmp_install might
> > need. But that's not something I think we should rely on going forward.
> 
> I don't understand this.  What is the project default test?

Oops, default target, not test.


> I don't find any documentation about this default dependency.  (Not
> everything like that tends to be documented, but do you have a source code
> reference or a GH issue?)

https://github.com/mesonbuild/meson/blob/83253cdbaa8afd268286ca06520ca1cf2095dd28/mesonbuild/mtest.py#L2164C1-L2175C74

Greetings,

Andres Freund




Re: Track the amount of time waiting due to cost_delay

2024-12-09 Thread Bertrand Drouvot
Hi,

On Mon, Dec 09, 2024 at 08:34:13PM +0530, Dilip Kumar wrote:
> On Mon, Dec 9, 2024 at 6:55 PM Bertrand Drouvot
> > Yeah. I think we could change the wording that way:
> >
> > s/waiting due to/waiting during/
> >
> > Does that make sense? I don't think we need to mention cost limit here.
> 
> Yeah that should be fine.

Thanks! Done in v10 attached. BTW, 0001 and 0002 have been merged (thanks
Masahiro-san for having confirmed that v9-0002 made sense to you too!).

> 
> I mean currently we are tracking "time_since_last_report" and
> accumulating the delayed_time in "nap_time_since_last_report" for each
> worker.  So my question was does it make sense to do this in a shared
> variable across workers so that we can reduce the number of reports to the
> leader?

I see. We've seen up-thread that the more we interrupt the leader the faster the
vacuum is (because the leader could be interrupted while waiting).

OTOH if we make use of a shared variable then we'd need to add some
"synchronization" (pg_atomic_xxx) overhead. So we'd reduce the number of
reports but add overhead.

So I think that we might see performance degradation in some cases,
and so I think it's safer to keep the "per worker" implementation.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From da69d66e20fbf00e92f57595b725e84cd1276fc3 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 24 Jun 2024 08:43:26 +
Subject: [PATCH v10] Report the total amount of time that vacuum has been
 delayed due to cost delay

This commit adds one column: time_delayed to the pg_stat_progress_vacuum system
view to show the total amount of time in milliseconds that vacuum has been
delayed.

This uses the new parallel message type for progress reporting added
by f1889729dd.

In the case of a parallel worker, to avoid the leader being interrupted too
frequently (while it might be sleeping for cost delay), a report is made only
if the last report was done more than 1 second ago.

Having a time based only approach to throttle the reporting of the parallel
workers sounds reasonable.

Indeed when deciding about the throttling:

1. The number of parallel workers should not come into play:

 1.1) the more parallel workers are used, the smaller the impact of the leader
 on the vacuum index phase duration/workload (because the work is spread
 across more processes).

 1.2) the fewer parallel workers there are, the less the leader will be
 interrupted (fewer parallel workers report their delayed time).

2. The cost limit should not come into play as that value is distributed
proportionally among the parallel workers (so we're back to the previous point).

3. The cost delay does not come into play as the leader could be interrupted at
the beginning, the middle or any other part of the wait, and we are more
interested in the frequency of the interrupts.

4. A 1 second reporting "throttling" looks like a reasonable threshold as:

 4.1 the idea is for it to matter when the leader could otherwise have been
interrupted, say, hundreds or thousands of times per second.

 4.2 it does not make that much sense for tools to sample pg_stat_progress_vacuum
multiple times per second (so a one second reporting granularity seems ok).

Would need to bump catversion because this changes the definition of
pg_stat_progress_vacuum.
---
 doc/src/sgml/monitoring.sgml  | 13 +++
 src/backend/catalog/system_views.sql  |  2 +-
 src/backend/commands/vacuum.c | 49 +++
 src/backend/commands/vacuumparallel.c |  7 
 src/include/commands/progress.h   |  1 +
 src/include/commands/vacuum.h |  1 +
 src/test/regress/expected/rules.out   |  3 +-
 7 files changed, 74 insertions(+), 2 deletions(-)
  22.1% doc/src/sgml/
   3.8% src/backend/catalog/
  66.6% src/backend/commands/
   3.5% src/include/commands/
   3.7% src/test/regress/expected/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..f2aab9974c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6428,6 +6428,19 @@ FROM pg_stat_get_backend_idset() AS backendid;
cleaning up indexes.
   
  
+
+ 
+  
+   time_delayed bigint
+  
+  
+   Total amount of time spent in milliseconds waiting during 
+   or . In case of parallel
+   vacuum the reported time is across all the workers and the leader. The
+   workers update the column no more frequently than once per second, so it
+   could show slightly old values.
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index da9a8fe99f..013bd06222 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1222,7 +1222,7 @@ CREATE VIEW pg_stat_progress_vacuum AS
 S.param4 AS heap_blks_vacuumed, S.param5 AS index_vacuum_count,
 S

Re: IWYU annotations

2024-12-09 Thread Tom Lane
Peter Eisentraut  writes:
> I have done a pass over much of the source code with 
> include-what-you-use (IWYU) to remove superfluous includes (commits 
> dbbca2cf299, 9be4e5d293b, ecb5af77987).  Along the way I have collected 
> some pragma annotations to deal with exceptions and special cases and 
> peculiarities of the PostgreSQL source code header structures (see [0] 
> for description).  Here I'm proposing a set of patches to add such 
> annotations in commonly useful cases that should deal with most of the 
> noise.

This seems to be going in the direction that there will be Yet Another
tool that committers have to know everything about in order to not
commit bad code.  I'm feeling resistant to that, mainly because I'm
far from convinced that IWYU brings us enough value to justify
everybody having to learn about it.  (The fact that the predecessor
tool pgrminclude hasn't been used in a dozen years, and nobody seems
to care, speaks volumes here.)

In particular, this patchset introduces what seem like very
error-prone setups, such as in rmgrdesc.c where there's now one
group of #include's with "pragma: begin_keep/pragma: end_keep"
around it and another group without.  Most of us are likely
to just blindly stick a new #include into alphabetical order
somewhere in there and not notice that there's now an additional
concern.  The fact that that you've added precisely zero
documentation about what these pragmas are doesn't help.

regards, tom lane




Re: proposal: schema variables

2024-12-09 Thread Pavel Stehule
Hi

st 20. 11. 2024 v 21:14 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
napsal:

> > On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> > Hi
> >
> > I wrote POC of VARIABLE(varname) syntax support
>
> Thanks, the results look good. I'm still opposed the idea of having a
> warning, and think it has to be an error -- but it's my subjective
> opinion. Lets see if there are more votes on that topic.
>

Maybe the warning about usage of unfenced variables can be changed (enhanced)
into some different mechanism that is more restrictive (and safer), but I
think it can still be user friendly.

My idea is based on the assumption that users with knowledge of stored
procedures know variables and the related risks (and know the tools to
minimize those risks), while for other people the risk is higher and we
should force usage of variable fences. I think we can force usage of
variable fences at query runtime, when the query is not executed from the
SPI environment. This behaviour can be enabled by default, but can be
optionally disabled.

CREATE VARIABLE s.x AS int; -- allowed when user has create right on schema
s
CREATE VIEW v1 AS SELECT x; -- no problem, the check is dynamic
(execution), not static
CREATE VIEW v2 AS SELECT VARIABLE(x); -- no problem

SELECT x; -- fails on access to unfenced variable
SELECT * FROM v1; -- fails on access to unfenced variable
SELECT * FROM v2; -- ok

but inside PL, this check will not be active, and then with the default setting
I can write code like

LET var = 10; -- fencing is not allowed there, and there is not any
collision
DO $$
BEGIN
  RAISE NOTICE 'var=%', var;
  RAISE NOTICE 'var=%', (SELECT * FROM v1); --is ok here too
END;
$$;

Outside PL the fencing can be required, inside PL the fencing can be
optional. With this proposal we limit the possible risky usage of
unfenced variables to the PL context, and the behaviour can be very
similar to PL/SQL or SQL/PSM. This check is only a runtime check, so it has
no impact on any DDL statement. It doesn't change the syntax or behavior,
so it can be implemented later - it is just a safeguard against
unwanted usage of variables in an environment where users have no
possibility to use variables already. I can imagine that this check
"allow_unfenced_variables" could have three values (everywhere, pl, nowhere)
with pl as the default. The results of queries don't depend on the
current setting of this check: for all values, for all possible queries and
situations, the result is the same (when it finishes). But sometimes
the check can break the execution - similar in spirit to access rights.
All previously proposed warnings can stay unchanged.

Comments, notes?

Regards

Pavel


IWYU annotations

2024-12-09 Thread Peter Eisentraut
I have done a pass over much of the source code with 
include-what-you-use (IWYU) to remove superfluous includes (commits 
dbbca2cf299, 9be4e5d293b, ecb5af77987).  Along the way I have collected 
some pragma annotations to deal with exceptions and special cases and 
peculiarities of the PostgreSQL source code header structures (see [0] 
for description).  Here I'm proposing a set of patches to add such 
annotations in commonly useful cases that should deal with most of the 
noise.


[0]: 
https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.mdFrom 0ee1c6c9f620ca8e1e3f9c601e5aa335a95c91d3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 9 Dec 2024 12:25:10 +0100
Subject: [PATCH 1/2] IWYU widely useful pragmas

Add various widely useful "IWYU pragma" annotations, such as

- Common header files such as c.h, postgres.h should be "always_keep".

- System headers included in c.h, postgres.h etc. should be considered
  "export".

- Some portability headers such as getopt_long.h should be
  "always_keep", so they are not considered superfluous on some
  platforms.

- Certain system headers included from portability headers should be
  considered "export" because the purpose of the portability header is
  to wrap them.

- Superfluous includes marked as "for backward compatibility" get a
  formal IWYU annotation.

- Special handling of access/rmgrlist.h in
  src/bin/pg_waldump/rmgrdesc.c, similar to what was already done in
  src/backend/access/transam/rmgr.c.

- Generated header included in utils/syscache.h is marked exported.
  This is a very commonly used include and this avoids lots of
  complaints.
---
 src/bin/pg_waldump/rmgrdesc.c| 10 +++---
 src/include/c.h  |  5 +
 src/include/getopt_long.h|  3 ++-
 src/include/pg_getopt.h  |  5 +++--
 src/include/pg_trace.h   |  2 +-
 src/include/pgstat.h |  6 +++---
 src/include/port/pg_iovec.h  |  2 +-
 src/include/port/pg_pthread.h|  2 +-
 src/include/postgres.h   |  5 +
 src/include/postgres_ext.h   |  1 +
 src/include/postgres_fe.h|  5 +
 src/include/utils/syscache.h |  2 +-
 src/interfaces/libpq/libpq-int.h |  2 ++
 13 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 6b8c17bb4c4..f5f53ceda32 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -8,6 +8,12 @@
 #define FRONTEND 1
 #include "postgres.h"
 
+#include "access/rmgr.h"
+#include "access/xlog_internal.h"
+#include "rmgrdesc.h"
+
+/* includes needed for "access/rmgrlist.h" */
+/* IWYU pragma: begin_keep */
 #include "access/brin_xlog.h"
 #include "access/clog.h"
 #include "access/commit_ts.h"
@@ -18,19 +24,17 @@
 #include "access/heapam_xlog.h"
 #include "access/multixact.h"
 #include "access/nbtxlog.h"
-#include "access/rmgr.h"
 #include "access/spgxlog.h"
 #include "access/xact.h"
-#include "access/xlog_internal.h"
 #include "catalog/storage_xlog.h"
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
 #include "replication/message.h"
 #include "replication/origin.h"
-#include "rmgrdesc.h"
 #include "storage/standbydefs.h"
 #include "utils/relmapper.h"
+/* IWYU pragma: end_keep */
 
 #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
{ name, desc, identify},
diff --git a/src/include/c.h b/src/include/c.h
index 13bb39fdef3..884128e781d 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -16,6 +16,7 @@
  *
  *-
  */
+/* IWYU pragma: always_keep */
 /*
  *
  *  TABLE OF CONTENTS
@@ -46,6 +47,8 @@
 #ifndef C_H
 #define C_H
 
+/* IWYU pragma: begin_exports */
+
 /*
  * These headers must be included before any system headers, because on some
  * platforms they affect the behavior of the system headers (for example, by
@@ -1327,4 +1330,6 @@ typedef intptr_t sigjmp_buf[5];
 /* /port compatibility functions */
 #include "port.h"
 
+/* IWYU pragma: end_exports */
+
 #endif /* C_H */
diff --git a/src/include/getopt_long.h b/src/include/getopt_long.h
index 0fdbefae7fb..a15ebbc0037 100644
--- a/src/include/getopt_long.h
+++ b/src/include/getopt_long.h
@@ -6,10 +6,11 @@
  *
  * src/include/getopt_long.h
  */
+/* IWYU pragma: always_keep */
 #ifndef GETOPT_LONG_H
 #define GETOPT_LONG_H
 
-#include "pg_getopt.h"
+#include "pg_getopt.h" /* IWYU pragma: export */
 
 #ifndef HAVE_STRUCT_OPTION
 
diff --git a/src/include/pg_getopt.h b/src/include/pg_getopt.h
index c87ea20b14f..698cc61ce83 100644
--- a/src/include/pg_getopt.h
+++ b/src/include/pg_getopt.h
@@ -15,15 +15,16 @@
  *
  * src/include/pg_getopt.h
  */
+/* IWYU pragma: always_keep */
 #ifndef PG_GETOPT_H
 #define PG_GETOPT_

some Page/PageData const stuff

2024-12-09 Thread Peter Eisentraut

In [0] I wrote:

"""
I was fiddling a bit with making some Page-related APIs const-proof, 
which might involve changing something like "Page p" to "const PageData 
*p", but I was surprised that a type PageData exists but it's an 
unrelated type local to generic_xlog.c.


This patch renames that type to a more specific name 
[GenericXLogPageData].  This makes room for possibly adding another 
PageData type with the earlier meaning, but that's not done here.


"""

[0]: 
https://www.postgresql.org/message-id/flat/001d457e-c118-4219-8132-e1846c2ae3c9%40eisentraut.org


This is now the follow-up that adds back PageData with the earlier 
meaning and updates a few of the Page-related APIs to be const-proof. 
That is all pretty straightforward, except one inline function that had 
to be changed back to a macro, because it is used in a way that 
sometimes it takes const and returns const and sometimes takes non-const 
and returns non-const.  (We might be able to do that kind of thing 
better with C23 in N years. ;-) )


Just a thought, I've been thinking it might be neat if PageData were 
actually defined something like this:


typedef struct PageData
{
union
{
PageHeaderData phdr;
PGAlignedBlock data;
};
} PageData;
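
For illustration, header access with that union might look like this (a
sketch with illustrative names; it assumes the union above, not the plain
char typedef in the attached patches):

/* assumes storage/bufpage.h with the union-based PageData sketched above */
static uint16
page_lower(PageData *page)
{
    /* today, with Page being char *, header access needs a cast: */
    uint16      via_cast = ((PageHeader) page)->pd_lower;

    /* with the union, the header is simply a member: */
    uint16      via_member = page->phdr.pd_lower;

    Assert(via_cast == via_member);
    return via_member;
}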

Then you could write all those (PageHeader) casts in a more elegant way, 
and you don't get to randomly mix char * and Page, which has very weak 
type safety.  But this currently totally breaks, because many places 
assume you can do char-based pointer arithmetic with Page values.  So 
this would need further analysis.

From c2173c42c43488b9da56df114b9ff3032f7e9222 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 9 Dec 2024 16:35:23 +0100
Subject: [PATCH 1/3] Add PageData

---
 src/include/storage/bufpage.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 6222d46e535..9f3ed976e43 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -78,7 +78,8 @@ extern PGDLLIMPORT bool ignore_checksum_failure;
  * fields.
  */
 
-typedef Pointer Page;
+typedef char PageData;
+typedef PageData *Page;
 
 
 /*

base-commit: 001a537b83ec6e2ab8fa8af44458b0502c94dd5e
-- 
2.47.1

From 0864d770d44d15dda5f8a506bead12f8da5a5a26 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 9 Dec 2024 16:35:23 +0100
Subject: [PATCH 2/3] Add const qualifiers to bufpage.h

---
 src/backend/storage/page/bufpage.c | 32 ++---
 src/include/storage/bufpage.h  | 75 +++---
 src/tools/pgindent/typedefs.list   |  1 +
 3 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c 
b/src/backend/storage/page/bufpage.c
index aa264f61b9c..b32c5c0de45 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -85,9 +85,9 @@ PageInit(Page page, Size pageSize, Size specialSize)
  * to pgstat.
  */
 bool
-PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
+PageIsVerifiedExtended(const PageData *page, BlockNumber blkno, int flags)
 {
-   PageHeader  p = (PageHeader) page;
+   const PageHeaderData *p = (const PageHeaderData *) page;
size_t *pagebytes;
boolchecksum_failure = false;
boolheader_sane = false;
@@ -351,7 +351,7 @@ PageAddItemExtended(Page page,
  * The returned page is not initialized at all; caller must do 
that.
  */
 Page
-PageGetTempPage(Page page)
+PageGetTempPage(const PageData *page)
 {
SizepageSize;
Pagetemp;
@@ -368,7 +368,7 @@ PageGetTempPage(Page page)
  * The page is initialized by copying the contents of the given 
page.
  */
 Page
-PageGetTempPageCopy(Page page)
+PageGetTempPageCopy(const PageData *page)
 {
SizepageSize;
Pagetemp;
@@ -388,7 +388,7 @@ PageGetTempPageCopy(Page page)
  * given page, and the special space is copied from the given page.
  */
 Page
-PageGetTempPageCopySpecial(Page page)
+PageGetTempPageCopySpecial(const PageData *page)
 {
SizepageSize;
Pagetemp;
@@ -893,16 +893,16 @@ PageTruncateLinePointerArray(Page page)
  * PageGetHeapFreeSpace on heap pages.
  */
 Size
-PageGetFreeSpace(Page page)
+PageGetFreeSpace(const PageData *page)
 {
+   const PageHeaderData *phdr = (const PageHeaderData *) page;
int space;
 
/*
 * Use signed arithmetic here so that we behave sensibly if pd_lower >
 * pd_upper.
 */
-   space = (int) ((PageHeader) page)->pd_upper -
-   (int) ((PageHeader) page)->pd_lower;
+   space = (int) phdr->pd_upper - (int) phdr->pd_lower;
 
if (space < (int) sizeof(ItemIdData))
return 0;
@@ -920,16 +920,16 @@ PageGetFreeSpace(Page page)
  * PageGetHeapFreeSpace on heap pages.
  */
 Size
-PageGetFreeSpaceForMultipl

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Tomas Vondra



On 12/9/24 11:27, Jakub Wartak wrote:
> On Mon, Dec 9, 2024 at 10:19 AM Michael Harris wrote:
> 
> Hi Michael,
> 
> We found this thread describing similar issues:
> 
> https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> 
> We've got some cases in the past here in EDB, where an OS vendor has
> blamed XFS AG fragmentation (too many AGs, and if one AG does not have
> enough space -> error). Could you perhaps show us the output of the
> following on that LUN:
> 1. xfs_info
> 2. run the script from https://www.suse.com/support/kb/doc/?id=18219 for
> your AG range
> 

But this can be reproduced on a brand new filesystem - I just tried
creating a 1GB image, creating XFS on it, mounting it, and fallocating a
600MB file twice. That fails, and there can't be any real fragmentation.

regards

-- 
Tomas Vondra





Re: Document NULL

2024-12-09 Thread David G. Johnston
On Sat, Nov 23, 2024 at 5:30 AM Marcos Pegoraro  wrote:

> Em qui., 21 de nov. de 2024 às 12:04, David G. Johnston <
> david.g.johns...@gmail.com> escreveu:
>
> Typo in NuLl, mixed upper and lower case.
>
> SELECT
> NULL AS "Literal Null Value",
> pg_typeof(null) AS "Type of Null",
> pg_typeof(NuLl::text) AS "Type of Cast null",
> cast(null as text) AS "Cast null value";
>
> should be
> pg_typeof(null::text) AS "Type of Cast Null",
>

That was not a typo.  I'm intentionally showing an example demonstrating
that the SQL NULL is fully case-insensitive.  But I suppose showing both
all-upper and all-lower cases fulfills that goal sufficiently.  Changed.
Went with NULL so there are two of each.  Made all of the column headers
Title Case.

Thanks!

David J.


Re: [PATCH] Add sortsupport for range types and btree_gist

2024-12-09 Thread Bernd Helmle
Am Samstag, dem 30.11.2024 um 22:18 +0500 schrieb Andrey Borodin:
> We tried to go that route, random DEBUG1 lines were breaking tests
> (autovacuum or something like that, there was something non-
> deterministic). I think we can dig exact reason from 2021 thread why
> we abandoned that idea...
> 
> I think we have two options:
> 1. Just do not commit tests. We ran it manually, saw that paths are
> taken, we are happy.

So here's a version with the original, unchanged regression tests and
the injection point removed (I hope I didn't forget anything to revert).

This version also fixes cidr sortsupport (thanks again Andrey for
spotting this), where I accidentally used inet instead of the correct
cidr datatype to pass down to the sortsupport function.
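
For readers not familiar with the sortsupport hook: the pattern each
btree_gist type gains here looks roughly like the sketch below (names and
the comparator body are simplified and hypothetical, not copied from the
patch); such a handler is then registered as the opclass's GiST sortsupport
support function (number 11) in the extension upgrade script.

/* stand-in for the per-type key struct stored in btree_gist leaf keys */
typedef struct
{
	int32		lower;
	int32		upper;
} gbt_example_key;

static int
gbt_example_ssup_cmp(Datum a, Datum b, SortSupport ssup)
{
	gbt_example_key *ka = (gbt_example_key *) DatumGetPointer(a);
	gbt_example_key *kb = (gbt_example_key *) DatumGetPointer(b);

	/* ordering by the lower bound is enough for a sorted index build */
	if (ka->lower != kb->lower)
		return (ka->lower > kb->lower) ? 1 : -1;
	return 0;
}

PG_FUNCTION_INFO_V1(gbt_example_sortsupport);

Datum
gbt_example_sortsupport(PG_FUNCTION_ARGS)
{
	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);

	ssup->comparator = gbt_example_ssup_cmp;
	PG_RETURN_VOID();
}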


Thanks,
Bernd

From b38c1181d38526d0ccd948f3446ca31f6953815d Mon Sep 17 00:00:00 2001
From: Bernd Helmle 
Date: Mon, 9 Dec 2024 18:02:31 +0100
Subject: [PATCH] Add support for sorted gist index builds to btree_gist

This enables sortsupport in the btree_gist extension for faster
builds of gist indexes. Additionally, sortsupport is added for range types
to the backend so that gist indexes on range types also benefit from the
speed up.

The sorted gist index build strategy is now the default. Regression tests are
unchanged and now exercise the sorted build strategy.
---
 contrib/btree_gist/Makefile |  14 +-
 contrib/btree_gist/btree_bit.c  |  46 +
 contrib/btree_gist/btree_bool.c |  22 +++
 contrib/btree_gist/btree_bytea.c|  35 
 contrib/btree_gist/btree_cash.c |  27 +++
 contrib/btree_gist/btree_date.c |  25 +++
 contrib/btree_gist/btree_enum.c |  33 
 contrib/btree_gist/btree_float4.c   |  28 +++
 contrib/btree_gist/btree_float8.c   |  27 +++
 contrib/btree_gist/btree_gist--1.8--1.9.sql | 192 
 contrib/btree_gist/btree_gist.control   |   2 +-
 contrib/btree_gist/btree_inet.c |  27 +++
 contrib/btree_gist/btree_int2.c |  26 +++
 contrib/btree_gist/btree_int4.c |  33 +++-
 contrib/btree_gist/btree_int8.c |  26 +++
 contrib/btree_gist/btree_interval.c |  25 +++
 contrib/btree_gist/btree_macaddr.c  |  29 +++
 contrib/btree_gist/btree_macaddr8.c |  24 +++
 contrib/btree_gist/btree_numeric.c  |  33 
 contrib/btree_gist/btree_oid.c  |  28 +++
 contrib/btree_gist/btree_text.c |  34 
 contrib/btree_gist/btree_time.c |  26 +++
 contrib/btree_gist/btree_ts.c   |  25 +++
 contrib/btree_gist/btree_utils_var.h|  12 +-
 contrib/btree_gist/btree_uuid.c |  25 +++
 contrib/btree_gist/meson.build  |   1 +
 doc/src/sgml/btree-gist.sgml|   7 +
 src/backend/utils/adt/rangetypes_gist.c |  70 +++
 src/include/catalog/pg_amproc.dat   |   3 +
 src/include/catalog/pg_proc.dat |   3 +
 30 files changed, 894 insertions(+), 14 deletions(-)
 create mode 100644 contrib/btree_gist/btree_gist--1.8--1.9.sql

diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index 7ac2df26c10..90d36a659e4 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -31,16 +31,16 @@ OBJS =  \
 
 EXTENSION = btree_gist
 DATA = btree_gist--1.0--1.1.sql \
-   btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
-   btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \
-   btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \
-   btree_gist--1.7--1.8.sql
+		btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
+		btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \
+		btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \
+		btree_gist--1.7--1.8.sql btree_gist--1.8--1.9.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
-time timetz date interval macaddr macaddr8 inet cidr text varchar char \
-bytea bit varbit numeric uuid not_equal enum bool partitions \
-stratnum without_overlaps
+		 time timetz date interval macaddr macaddr8 inet cidr text varchar char \
+		 bytea bit varbit numeric uuid not_equal enum bool partitions \
+		 stratnum without_overlaps
 
 SHLIB_LINK += $(filter -lm, $(LIBS))
 
diff --git a/contrib/btree_gist/btree_bit.c b/contrib/btree_gist/btree_bit.c
index f346b956fa9..35aa5578f83 100644
--- a/contrib/btree_gist/btree_bit.c
+++ b/contrib/btree_gist/btree_bit.c
@@ -6,6 +6,7 @@
 #include "btree_gist.h"
 #include "btree_utils_var.h"
 #include "utils/fmgrprotos.h"
+#include "utils/sortsupport.h"
 #include "utils/varbit.h"
 
 
@@ -18,10 +19,33 @@ PG_FUNCTION_INFO_V1(gbt_bit_picksplit);
 PG_FUNCTION_INFO_V1(gbt_bit_consistent);
 PG_FUNCTION_INFO_V1(gbt_bit_penalty);
 PG_FUNCTION_INFO_V1(gbt_bit_same);
+PG_FUNCTION_INFO_V1(gbt_bit_sortsupport);
+PG_F

Re: Subscription sometimes loses txns after initial table sync

2024-12-09 Thread Pritam Baral

On 09/12/24 18:50, Pritam Baral wrote:
> A reproducer script is attached. 

Apologies. The aforementioned script is broken. It was a poor port from an 
internal application.

A corrected reproducer script is attached.

-- 
#!/usr/bin/env regards
Chhatoi Pritam Baral


sub-loss-repro2.sh
Description: application/shellscript


Re: SCRAM pass-through authentication for postgres_fdw

2024-12-09 Thread Matheus Alcantara

Thanks for the comments!

On 04/12/24 19:11, Jacob Champion wrote:

This is achieved by storing the SCRAM ClientKey and ServerKey obtained
during client authentication with the backend. These keys are then
used to complete the SCRAM exchange between the backend and the fdw
server, eliminating the need to derive them from a stored plain-text
password.

What are the assumptions that have to be made for pass-through SCRAM
to succeed? Is it just "the two servers have identical verifiers for
the user," or are there others?
Yes, from the tests that I've performed I would say that this is the 
only requirement.
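
For context on why identical verifiers are sufficient: per RFC 5802, the
client-side proof can be computed from the stored ClientKey alone, with no
password involved. Below is a minimal sketch of that computation using the
src/common hash/HMAC helpers; it is only an illustration of the arithmetic,
not code from the patch, and error cleanup is omitted for brevity.

#include "postgres.h"

#include "common/cryptohash.h"
#include "common/hmac.h"
#include "common/sha2.h"

/*
 * RFC 5802:
 *   StoredKey       = H(ClientKey)
 *   ClientSignature = HMAC(StoredKey, AuthMessage)
 *   ClientProof     = ClientKey XOR ClientSignature
 */
static bool
build_client_proof(const uint8 client_key[PG_SHA256_DIGEST_LENGTH],
				   const char *auth_message,
				   uint8 proof[PG_SHA256_DIGEST_LENGTH])
{
	uint8		stored_key[PG_SHA256_DIGEST_LENGTH];
	uint8		signature[PG_SHA256_DIGEST_LENGTH];
	pg_cryptohash_ctx *hash = pg_cryptohash_create(PG_SHA256);
	pg_hmac_ctx *hmac = pg_hmac_create(PG_SHA256);

	if (pg_cryptohash_init(hash) < 0 ||
		pg_cryptohash_update(hash, client_key, PG_SHA256_DIGEST_LENGTH) < 0 ||
		pg_cryptohash_final(hash, stored_key, sizeof(stored_key)) < 0)
		return false;
	pg_cryptohash_free(hash);

	if (pg_hmac_init(hmac, stored_key, sizeof(stored_key)) < 0 ||
		pg_hmac_update(hmac, (const uint8 *) auth_message, strlen(auth_message)) < 0 ||
		pg_hmac_final(hmac, signature, sizeof(signature)) < 0)
		return false;
	pg_hmac_free(hmac);

	for (int i = 0; i < PG_SHA256_DIGEST_LENGTH; i++)
		proof[i] = client_key[i] ^ signature[i];

	return true;
}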



It looks like the patch is using the following property [1]:

If an attacker obtains the authentication information from the
authentication repository and either eavesdrops on one authentication
exchange or impersonates a server, the attacker gains the ability to
impersonate that user to all servers providing SCRAM access using the
same hash function, password, iteration count, and salt.  For this
reason, it is important to use randomly generated salt values.

It makes me a little uneasy to give users a reason to copy identical
salts/verifiers around... But for e.g. a loopback connection, it seems
like there'd be no additional risk. Is that the target use case?
I think both can be use cases. When connecting to a different server, it
can be another option over the plain-text password approach.


I'm attaching a v2 patch with a TAP test that validates both use cases.
For connections to a different server, an ALTER ROLE ... PASSWORD ... is
required, so that both servers end up with identical verifiers.




-pg_hmac_ctx *ctx = pg_hmac_create(state->hash_type);
+pg_hmac_ctx *ctx = pg_hmac_create(PG_SHA256);


Why was that change made?


Not needed, sorry. Fixed in v2.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

From 69f41167ab3cf42f5a262403145ae7b53c77a38c Mon Sep 17 00:00:00 2001
From: Matheus Alcantara 
Date: Tue, 19 Nov 2024 15:37:57 -0300
Subject: [PATCH v2 1/2] postgres_fdw: SCRAM authentication pass-through

This commit enables SCRAM authentication for postgres_fdw when connecting
to a foreign server, without having to store a plain-text password in user
mapping options.

This is done by saving the SCRAM ClientKey and ServerKey from the
client authentication and using those instead of the plain-text password
for the server-side SCRAM exchange.
---
 contrib/postgres_fdw/Makefile|   1 +
 contrib/postgres_fdw/connection.c|  67 ++-
 contrib/postgres_fdw/meson.build |   5 +
 contrib/postgres_fdw/option.c|   3 +
 contrib/postgres_fdw/t/001_auth_scram.pl | 137 +++
 src/backend/libpq/auth-scram.c   |  14 ++-
 src/include/libpq/libpq-be.h |   9 ++
 src/interfaces/libpq/fe-auth-scram.c |  29 -
 src/interfaces/libpq/fe-auth.c   |   2 +-
 src/interfaces/libpq/fe-connect.c|  31 +
 src/interfaces/libpq/libpq-int.h |   6 +
 11 files changed, 291 insertions(+), 13 deletions(-)
 create mode 100644 contrib/postgres_fdw/t/001_auth_scram.pl

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 88fdce40d6..6c12c8e925 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -8,6 +8,7 @@ OBJS = \
option.o \
postgres_fdw.o \
shippable.o
+TAP_TESTS = 1
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/connection.c 
b/contrib/postgres_fdw/connection.c
index 2326f391d3..e0e1ebe0d4 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -19,6 +19,7 @@
 #include "access/xact.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "common/base64.h"
 #include "funcapi.h"
 #include "libpq/libpq-be.h"
 #include "libpq/libpq-be-fe-helpers.h"
@@ -168,6 +169,7 @@ static void pgfdw_finish_abort_cleanup(List 
*pending_entries,
 static void pgfdw_security_check(const char **keywords, const char **values,
 UserMapping 
*user, PGconn *conn);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static bool UseScramPassthrough(ForeignServer *server, UserMapping *user);
 static bool disconnect_cached_connections(Oid serverid);
 static void postgres_fdw_get_connections_internal(FunctionCallInfo fcinfo,

  enum pgfdwVersion api_version);
@@ -476,7 +478,7 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 * for application_name, fallback_application_name, 
client_encoding,
 * end marker.
 */
-   n = list_length(server->options) + list_length(user->options) + 
4;
+   n = list_length(server->options) + list_length(user->opti

Re: SCRAM pass-through authentication for postgres_fdw

2024-12-09 Thread Matheus Alcantara

On 04/12/24 20:05, Jelte Fennema-Nio wrote:

On Wed, 4 Dec 2024 at 23:11, Jacob Champion wrote:

It makes me a little uneasy to give users a reason to copy identical
salts/verifiers around... But for e.g. a loopback connection, it seems
like there'd be no additional risk. Is that the target use case?


I don't think that necessarily has to be the usecase,
clustering/sharding setups could benefit from this too. PgBouncer
supports the same functionality[1]. I only see advantages over the
alternative, which is copying the plaintext password around. In case
of compromise of the server, only the salt+verifier has to be rotated,
not the actual user password.

The patch is very similar to what was implemented in PgBouncer [1].


Regarding the actual patch: This definitely needs a bunch of
documentation explaining how to use this and when not to use this.
I'm attaching a patch with initial documentation, so that we can get
initial thoughts (not sure if I should put the documentation in the
same patch as the code changes).


Thanks!

[1] 
https://github.com/pgbouncer/pgbouncer/commit/ba1abfe#diff-128a3f9ffa6a6f3863e843089ede6d07010215acf49c66b2d1f1d9baba2f49e7R1001


--
Matheus Alcantara
EDB: https://www.enterprisedb.com

From 7ec18a1553ddab0252d8b16262c5341014b7425c Mon Sep 17 00:00:00 2001
From: Matheus Alcantara 
Date: Mon, 9 Dec 2024 14:48:07 -0300
Subject: [PATCH v2 2/2] postgres_fdw: Add documentation for SCRAM auth

---
 doc/src/sgml/postgres-fdw.sgml | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 188e8f0b4d..da04e14a04 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -770,6 +770,25 @@ OPTIONS (ADD password_required 'false');
   
  
 
+ 
+  use_scram_passthrough 
(boolean)
+  
+   
+This option controls whether postgres_fdw will use
+SCRAM password authentication to connect to the foreign server.
+SCRAM secrets can only be used for logging into the foreign server if
+the client authentication also uses SCRAM.
+  
+  
+SCRAM authentication on the foreign server is only possible if
+both servers have identical SCRAM secrets (encrypted password) for the
+user that postgres_fdw uses to authenticate on
+the foreign server: the same salt and iterations, not merely the same
+password.
+  
+  
+ 
+
 

  
-- 
2.39.3 (Apple Git-146)



Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-09 Thread Masahiko Sawada
On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila  wrote:
>
> On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada  wrote:
> >
> > >
> > > I realized that this patch cannot be backpatched because it introduces a 
> > > new
> > > field into the public PGOutputData structure. Therefore, I think we may 
> > > need to
> > > use Alvaro's version [1] for the back branches.
> >
> > FWIW for back branches, I prefer using the foreach-pfree pattern
> > Michael first proposed, just in case. It's not elegant but it can
> > solve the problem while there is no risk of breaking non-core
> > extensions.
> >
>
> It couldn't solve the problem completely even in back-branches. The
> SQL API case I mentioned and tested by Hou-San in the email [1] won't
> be solved.

True. There seems to be another place where we possibly leak memory in
CacheMemoryContext when using pgoutput via SQL APIs:

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);

entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

MemoryContextSwitchTo(oldctx);
RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry, so
it remains allocated even after the logical decoding API calls finish.
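
One conceivable way to plug that particular leak (a sketch against the
snippet above, not a reviewed fix) would be to release any previously built
map before installing the new one:

/* sketch only: free the old map so repeated invalidations via the SQL
 * interface do not accumulate allocations in CacheMemoryContext */
if (entry->attrmap)
	free_attrmap(entry->attrmap);

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);
entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);
MemoryContextSwitchTo(oldctx);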


Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Refactoring postmaster's code to cleanup after child exit

2024-12-09 Thread Heikki Linnakangas

On 09/12/2024 14:47, Tomas Vondra wrote:

On 12/9/24 13:30, Heikki Linnakangas wrote:

On 09/12/2024 01:12, Tomas Vondra wrote:

On 11/14/24 15:13, Heikki Linnakangas wrote:

On 09/10/2024 23:40, Heikki Linnakangas wrote:

I pushed the first three patches, with the new test and one of the
small
refactoring patches. Thanks for all the comments so far! Here is a new
version of the remaining patches.


Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
seems to have problems with valgrind :-( I reliably get this failure:


How exactly do you run the test with valgrind? What platform?



It failed for me on both amd64 (Fedora 41) and rpi5 32/64-bit (Debian).


It works for me, with this:

(cd build && ninja && rm -rf tmp_install && meson test --suite setup &&
valgrind --leak-check=no --gen-suppressions=all --suppressions=/home/
heikki/git-sandbox/postgresql/src/tools/valgrind.supp --time-stamp=yes
--error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END --log-file=$HOME/
pg-valgrind/%p.log --trace-children=yes meson test --suite postmaster )



I have a patch that tweaks pg_ctl/pg_regress to execute valgrind, so I
just do

./configure --enable-debug --prefix=/home/user/builds/master
--enable-depend --enable-cassert --enable-tap-tests CPPFLAGS="-O0 -ggdb3
-DUSE_VALGRIND"

and then the usual "make check" or whatever.

The patch has a hardcoded path to the .supp file, and places the
valgrind log into /tmp. It has worked for me fine up until that commit,
and it still seems to be working in every other test directory.


Ok, I was able to reproduce this with that setup.

Unsurprisingly, it's a timing issue. It can be reproduced without 
valgrind by adding this delay:


diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 289059435a9..1eb6bad72ca 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -583,6 +583,7 @@ errfinish(const char *filename, int lineno, const 
char *funcname)

 * FATAL termination.  The postmaster may or may not consider 
this
 * worthy of panic, depending on which subprocess returns it.
 */
+   sleep(1);
proc_exit(1);
}

The test opens a connection that is expected to fail with the "remaining 
connection slots are reserved for roles with the SUPERUSER attribute" 
error. Right after that, it opens a new connection as superuser, and 
expects it to succeed. But if the previous backend hasn't exited yet, 
the new connection fails with "too many clients already".


Not sure how to fix this. A small sleep in the test would work, but in 
principle there's no delay that's guaranteed to be enough. A more robust 
solution would be to run a "select count(*) from pg_stat_activity" and 
wait until the number of connections is what's expected. I'll try that 
and see how complicated that gets...


--
Heikki Linnakangas
Neon (https://neon.tech)




bt_index_parent_check and concurrently build indexes

2024-12-09 Thread Michail Nikolaev
Hello, everyone!

While working on [0], I encountered an issue involving a missing tuple in
an index that was built concurrently. The problem only occurred once, but
it caused me a significant amount of frustration. :)

After some time, I managed to find a way to reproduce the issue. It turns
out that bt_index_parent_check is not suitable for validating indexes built
concurrently. The reason is that bt_index_parent_check uses SnapshotAny
during the heap scan, whereas an MVCC snapshot is used for the index build.

I’ve attached a patch that reproduces the issue (it incorrectly reports the
index as invalid, even though it is actually valid).

I’m unsure of the best way to address this issue, but here are some
possible options:
* Simply update the documentation.
* Issue a WARNING if !tupleIsAlive (a rough sketch follows below).
* Modify bt_index_parent_check to use an MVCC snapshot for the heap scan.
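
To make the second option concrete, here is a rough, hypothetical sketch of
what the heapallindexed heap-scan callback could do; the real callback in
verify_nbtree.c is more involved, so treat this only as an outline of the
idea.

if (!tupleIsAlive)
	ereport(WARNING,
			(errcode(ERRCODE_DATA_CORRUPTED),
			 errmsg("dead heap tuple lacks matching index tuple, which can be expected for a concurrently built index")));
else
	ereport(ERROR,
			(errcode(ERRCODE_DATA_CORRUPTED),
			 errmsg("heap tuple lacks matching index tuple")));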

Best regards,
Mikhail.

[0]: https://commitfest.postgresql.org/51/4971/
From 60d17152f47e8d85fd3bad961d1965c335493c94 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Sun, 8 Dec 2024 17:32:15 +0100
Subject: [PATCH v1] test to reproduce issue with bt_index_parent_check and
 CREATE INDEX CONCURRENTLY

---
 contrib/amcheck/meson.build   |  1 +
 .../t/006_cic_bt_index_parent_check.pl| 59 +++
 2 files changed, 60 insertions(+)
 create mode 100644 contrib/amcheck/t/006_cic_bt_index_parent_check.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index fc08e32539a..2eb7ff11bd7 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -45,6 +45,7 @@ tests += {
   't/003_cic_2pc.pl',
   't/004_verify_nbtree_unique.pl',
   't/005_pitr.pl',
+  't/006_cic_bt_index_parent_check.pl',
 ],
   },
 }
diff --git a/contrib/amcheck/t/006_cic_bt_index_parent_check.pl b/contrib/amcheck/t/006_cic_bt_index_parent_check.pl
new file mode 100644
index 000..d4bb9c2a99d
--- /dev/null
+++ b/contrib/amcheck/t/006_cic_bt_index_parent_check.pl
@@ -0,0 +1,59 @@
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test CREATE INDEX CONCURRENTLY with concurrent prepared-xact modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+Test::More->builder->todo_start('filesystem bug')
+  if PostgreSQL::Test::Utils::has_wal_read_bug;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_HOT_test');
+$node->init;
+$node->append_conf('postgresql.conf', 'fsync = off');
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql('postgres', q(CREATE TABLE tbl(i int primary key, updated_at timestamp)));
+$node->safe_psql('postgres', q(INSERT INTO tbl SELECT i,now() FROM generate_series(1, 1) s(i);));
+
+# Run pgbench.
+$node->pgbench(
+	'--no-vacuum --client=30 --jobs=5 --exit-on-abort --transactions=500',
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent UPDATES w/ and CIC',
+	{
+		'pgbench_concurrent_cic' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+CREATE INDEX CONCURRENTLY idx ON tbl(i, updated_at);
+SELECT bt_index_parent_check('idx', heapallindexed => true, rootdescend => true, checkunique => true);
+DROP INDEX CONCURRENTLY idx;
+SELECT pg_advisory_unlock(42);
+			\else
+	BEGIN;
+	UPDATE tbl SET updated_at = now() WHERE i = floor(random()*1);
+	UPDATE tbl SET updated_at = now() WHERE i = floor(random()*1);
+	UPDATE tbl SET updated_at = now() WHERE i = floor(random()*1);
+	UPDATE tbl SET updated_at = now() WHERE i = floor(random()*1);
+	UPDATE tbl SET updated_at = now() WHERE i = floor(random()*1);
+	UPDATE tbl SET updated_at = now() WHERE i = floor(random()*1);
+	COMMIT;
+			\endif
+		  )
+	});
+
+$node->stop;
+done_testing();
-- 
2.43.0


