Re: Add partial :-variable expansion to psql \copy

2025-04-01 Thread Fabien Coelho

Hello Corey,


If we could do this:

    COPY :"myschema".:"mytable" FROM STDIN \g < :"myfilename"

that would fit our patterns most cleanly, but we would probably create 
a parsing hassle for ourselves if we ever wanted to mix pipe-to with 
pipe-from. It would also require checking on every command, when 
uploaded \copy commands make up a very small percentage of commands 
issued. So I don't think there's a good way around the asymmetry of 
COPY TO being a regular \g-able command, whereas COPY FROM will always 
require some other send-command.


Perhaps we create a new command \copyfrom:

    COPY :"myschema".:"mytable" :options FROM STDIN \copyfrom 
:"myfilename"


    COPY :"myschema".:"mytable" :options FROM STDIN \copyfrom 
:"my_complex_command" |


If we had something like that we might be able to replace all existing 
uses of \copy.


Indeed, I like the idea of extending psql's handling of COPY rather than
trying to salvage \copy. What I like less is that it probably means more
work and a significantly larger patch.


There are four cases to address (input/output crossed with file/program), and
as you pointed out the output ones are already handled.


I'm hesitating about the right syntax, though, for an input backslash
command which in effect would really only apply to COPY. ISTM that \g*
is used for "go", i.e. a semicolon replacement which executes the SQL,
and we would want the same thing here, which suggests:


COPY "foo" FROM STDIN \gi filename

COPY "foo" FROM STDIN \gi command...|

Another drawback is that it creates an error path:

COPY "foo" FROM 'server-side-file' \gi 'client-side-file'

--

Fabien.


Re: add function argument name to substring and substr

2025-04-01 Thread David G. Johnston
On Tue, Apr 1, 2025 at 6:15 AM Marcos Pegoraro  wrote:

> Em ter., 1 de abr. de 2025 às 02:00, David G. Johnston <
> david.g.johns...@gmail.com> escreveu:
>
> Wouldn't it be good to add the use of parentheses using POSIX? It's
> useful and rarely documented
> substring('Thomas', '...$')
> +substring('Email: johnj...@mymail.com, Name: John' from
> '@(.*), Name')
>
>
Agreed. A second example using () would be good here.

I was pondering explaining the "no parentheses" case here, but for someone
familiar with POSIX REs the behavior is obvious, and everyone else has the link
needed to learn what is happening.
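
For completeness, a minimal illustration of the capturing behavior in
question (the address is made up for the example):

    SELECT substring('Email: john@example.org, Name: John' from '@(.*), Name');
    -- returns "example.org": once the pattern contains parentheses, only the
    -- first parenthesized subexpression is returned, not the whole match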

David J.


Re: Make query cancellation keys longer

2025-04-01 Thread Heikki Linnakangas

On 22/02/2025 16:58, Jelte Fennema-Nio wrote:

On Mon, 9 Sept 2024 at 17:58, Robert Haas  wrote:


On Fri, Aug 16, 2024 at 11:29 AM Robert Haas  wrote:

I'll split this patch like that, to make it easier to compare and merge
with Jelte's corresponding patches.


That sounds great. IMHO, comparing and merging the patches is the next
step here and would be great to see.


Heikki, do you have any update on this work?


My patchset in the other protocol thread needed a rebase. So I took
that as an opportunity to rebase this patchset on top of it, because
this seems to be the protocol change that we can most easily push over
the finish line.

1. I changed the last patch from Heikki to only contain the changes
for the cancel key lengthening. The general protocol-change-related things
I merged with mine (I only kept some error handling and docs).
2. I also removed the length field in the new BackendKey definition,
even though I asked for that addition previously. I agree with Heikki
that it's actually easier to parse and create without it, because you can
use the same code for both versions.
3. I made our timingsafe_bcmp implementation call into OpenSSL's CRYPTO_memcmp.


Great, thanks!

I spent some time going over this again, making tiny fixes here and 
there. A couple of somewhat notable fixes:


- fixed the "server only supports protocol version %d.%d, but 
min_protocol_version was set to %d.%d" message to print the server's 
protocol version correctly
- error out if the server sends two NegotiateProtocolVersion messages 
for some reason
- added a new "protocol versions" subsection to the docs, moving the 
existing paragraphs related to version negotiation there. I also added a 
table to list the different protocol versions. See the last patch in the 
series.


I'm a bit disappointed that we're not changing the default protocol 
version to 3.2 yet. It will receive very little usage in practice until 
we do. But that seems to be the least objectionable path forward.


I think this is pretty much ready for commit. I will go over it one more 
time, and plan to push tomorrow.



One open question on the last patch: should we document what the maximum size
of the cancel key is that the client can expect? I think Jacob might
have some ideas on that.


It's documented as 256 bytes now. But that still leaves one remaining 
question: the idea is that the maximum size is long enough that 
middleware can add their data to it, and still include the original 
server's key. As the patch stands, the server creates 32 byte keys, 
leaving 224 for middleware. But that's not documented anywhere. I'm 
inclined to document that the server only uses up to 32 bytes, and 
middleware should likewise try to "add" only 32 bytes or so to the key, 
so that you can stack multiple such middleware without exceeding the 
total length. And perhaps 256 bytes is still too small, if you consider 
such stacking.
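
To put rough numbers on that stacking concern: with 32 bytes from the server
and roughly 32 bytes added per middleware layer, 32 + 7 * 32 = 256, so only
about seven such layers fit under the current 256-byte ceiling.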


--
Heikki Linnakangas
Neon (https://neon.tech)

From 2efd6b839fb00e260d40da9b4e272db1e67685f7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 1 Apr 2025 16:28:11 +0300
Subject: [PATCH 1/7] Add timingsafe_bcmp(), for constant-time memory
 comparison

timingsafe_bcmp() should be used instead of memcmp() or a naive
for-loop, when comparing passwords or secret tokens, to avoid leaking
information about the secret token by timing. This commit just
introduces the function but does not change any existing code to use
it yet.

Co-authored-by: Jelte Fennema-Nio 
Discussion: https://www.postgresql.org/message-id/7b86da3b-9356-4e50-aa1b-56570825e...@iki.fi
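
For reference, a constant-time comparison in the usual OpenBSD style is
roughly the sketch below (illustrative only; the fallback actually added in
src/port/timingsafe_bcmp.c may differ in detail):

#include <stddef.h>

int
timingsafe_bcmp(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;
	int			ret = 0;

	/* accumulate differences without ever branching on the secret data */
	for (; n > 0; n--)
		ret |= *p1++ ^ *p2++;

	/* 0 iff equal; running time depends only on n, not on the contents */
	return (ret != 0);
}
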
---
 configure  | 23 
 configure.ac   |  3 ++-
 meson.build|  2 ++
 src/include/port.h |  4 
 src/port/meson.build   |  1 +
 src/port/timingsafe_bcmp.c | 43 ++
 6 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 src/port/timingsafe_bcmp.c

diff --git a/configure b/configure
index 30d949c3c46..1e18a5a22b0 100755
--- a/configure
+++ b/configure
@@ -15927,6 +15927,16 @@ fi
 cat >>confdefs.h <<_ACEOF
 #define HAVE_DECL_STRSEP $ac_have_decl
 _ACEOF
+ac_fn_c_check_decl "$LINENO" "timingsafe_bcmp" "ac_cv_have_decl_timingsafe_bcmp" "$ac_includes_default"
+if test "x$ac_cv_have_decl_timingsafe_bcmp" = xyes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_TIMINGSAFE_BCMP $ac_have_decl
+_ACEOF
 
 
 # We can't use AC_CHECK_FUNCS to detect these functions, because it
@@ -16087,6 +16097,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "timingsafe_bcmp" "ac_cv_func_timingsafe_bcmp"
+if test "x$ac_cv_func_timingsafe_bcmp" = xyes; then :
+  $as_echo "#define HAVE_TIMINGSAFE_BCMP 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" timingsafe_bcmp.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS timingsafe_bcmp.$ac_objext"
+ ;;
+esac
+
+fi
+
 
 
 ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait"
diff 

Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query).

2025-04-01 Thread David G. Johnston
On Tue, Apr 1, 2025 at 6:52 AM vignesh C  wrote:

>
> We are not changing the existing behavior. However, since copying data
> from large tables can take a significant amount of time, would it be
> helpful to add a cautionary note advising users to refresh the
> materialized view before running the COPY command to avoid stale data?
>
>
No, for the same reason the caveat about WITH NO DATA need not be mentioned
either.  These are the inherent properties/trade-offs of using a
materialized view versus a normal view.  They are discussed when talking
about the creation and usage of materialized views specifically.  Features
that interact with materialized views in a manner consistent with these two
documented properties do not need to point out that fact; i.e., the existing
non-documenting of those two points in SELECT is appropriate, and should be
emulated in COPY.  The output doesn't change if you write "copy table"
instead of "copy (select * from table)", so nothing needs to be pointed out.

David J.


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Peter Geoghegan
On Tue, Apr 1, 2025 at 9:26 AM Aleksander Alekseev
 wrote:
> Can you think of any tests specifically for 0003, or relying on the
> added Asserts() is best we can do? Same question for 0002.

There are many more tests than those that are included in the patch. I
wrote and debugged all patches using TDD. This is also how I wrote the
Postgres 17 SAOP patch. (My test suite is actually a greatly expanded
version of the earlier SAOP patch's test suite, since this project is
a direct follow-up.)

These tests are not nearly stable enough to commit; they have
something like a 5% chance of spuriously failing when I run them,
generally due to concurrent autoanalyze activity that prevents VACUUM
from setting all VM bits. Almost all of the tests expect specific
index page accesses, which are captured by "Buffers:" output. I find
that the full test suite takes over 10 seconds to run with a debug
build. So they're really not too maintainable.
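
To give a flavor of what those tests look like, here is a made-up sketch (not
one of the actual tests): each test pins down the exact index pages visited
via BUFFERS output, which is what makes them sensitive to VM bits, BLCKSZ,
and autoanalyze timing.

    -- illustrative only; table and index names are invented
    CREATE TABLE skip_t (a int, b int);
    INSERT INTO skip_t SELECT i % 10, i FROM generate_series(1, 100000) i;
    CREATE INDEX skip_t_a_b_idx ON skip_t (a, b);
    VACUUM (ANALYZE) skip_t;  -- sets the VM bits the expected output relies on
    EXPLAIN (ANALYZE, COSTS OFF, BUFFERS)
    SELECT * FROM skip_t WHERE b = 42;  -- skip scan over the omitted "a" prefix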

Every little behavioral detail has been memorialized by some test.
Many of the tests present some adversarial scenario where
_bt_advance_array_keys runs out of array keys before reaching the end
of a given page, where it must then decide how to get to the next leaf
page (the page whose key space matches the next distinct set of array
keys). We want to consistently make good decisions about whether we
should step to the next sibling page, or whether starting another
primscan to get to the next relevant leaf page makes more sense. That
comes up again and again (only a minority of these tests were written
to test general correctness).

I did write almost all of these tests before writing the code that
they test, but most of the tests never failed by giving wrong answers
to queries. They initially failed by not meeting some expectation that
I had in mind about how best to schedule primitive index scans.

What "the right decision" means when scheduling primitive index scans
is somewhat open to interpretation -- I do sometimes revise my opinion
in light of new information, or new requirements (SAOP scans have
basically the same issues as skip scans, but in practice skip scans
are more sensitive to these details). It's inherently necessary to
manage the uncertainty around which approach is best, in any given
situation, on any given page. Having a large and varied set of
scenarios seems to help to keep things in balance (it avoids unduly
favoring one type of scan or index over another).

> I can confirm that v33 applies and passes the test.
>
> 0002 adds _bt_set_startikey() to nbtutils.c but it is not well-covered
> by tests, many branches of the new code are never executed.

It's hard to get test coverage for these cases into the standard
regression tests, since the function is only called on the second
or subsequent page of a given primscan -- you need quite a few
relatively big indexes. There are also quite a few far-removed
implementation details that any test will inevitably have to rely on,
such as BLCKSZ and the influence of suffix truncation.

Just having test coverage of these branches wouldn't test much on its
own. In practice it's very likely that not testing the key directly
(incorrectly assuming that the _bt_keep_natts_fast precheck is enough and
just removing the uncovered branches) wouldn't break queries at all. The
_bt_keep_natts_fast precheck tells us which columns have no more than
one distinct value on the page. If there's only one value, and if we
just left a page that definitely satisfied its keys, it's quite likely
that those same keys will also be satisfied on this new page (it's
just not certain, which is why _bt_set_startikey can't simply rely on
the precheck; it has to test each key separately).

> ```
> @@ -2554,9 +2865,20 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection 
> dir,
>   */
>  if (requiredSameDir)
>  *continuescan = false;
> +else if (unlikely(key->sk_flags & SK_BT_SKIP))
> +{
> +/*
> + * If we're treating scan keys as nonrequired, and encounter 
> a
> + * skip array scan key whose current element is NULL, then it
> + * must be a non-range skip array
> + */
> +Assert(forcenonrequired && *ikey > 0);
> +return _bt_advance_array_keys(scan, NULL, tuple, tupnatts,
> +  tupdesc, *ikey, false);
> +}
> ```
>
> This branch is also never executed during the test run.

FWIW I have a test case that breaks when this particular code is
removed, giving a wrong answer.

This is just another way that _bt_check_compare can find that it needs
to advance the array keys in the context of forcenonrequired mode --
it's just like the other calls to _bt_advance_array_keys from
_bt_check_compare. This particular code path is only hit when a skip
array's current element is NULL, which is unlikely to coincide with
the use of forcenonrequired mode (NULL is ju

Re: add function argument name to substring and substr

2025-04-01 Thread Marcos Pegoraro
Em ter., 1 de abr. de 2025 às 02:00, David G. Johnston <
david.g.johns...@gmail.com> escreveu:

Wouldn't it be good to add the use of parentheses using POSIX? It's useful
and rarely documented
substring('Thomas', '...$')
+substring('Email: johnj...@mymail.com, Name: John' from
'@(.*), Name')

regards
Marcos


Re: Restrict publishing of partitioned table with a foreign table as partition

2025-04-01 Thread Shlok Kyal
On Fri, 28 Mar 2025 at 16:35, Álvaro Herrera  wrote:
>
> On 2025-Mar-28, Shlok Kyal wrote:
>
> > On Mon, 24 Mar 2025 at 21:17, Álvaro Herrera  
> > wrote:
>
> > > Also, surely we should document this restriction in the SGML docs
> > > somewhere.
> >
> > I have added comment in create_publication.sgml
>
> Hmm, okay, but "We cannot" is not the style used in the documentation.
> In addition, I think this mechanism should be mentioned in
> logical-replication.sgml; currently there's a note in the Restrictions
> section about foreign tables, which should be expanded to explain this.
>

I have modified the comment in create_publication.sgml and also added a
comment in the restrictions section of logical-replication.sgml.
I have also added a more detailed explanation in the comment of
'check_foreign_tables'.

I have attached the updated v11 patch.


Thanks and Regards,
Shlok Kyal


v11-0001-Restrict-publishing-of-partitioned-table-with-fo.patch
Description: Binary data


Re: AIO writes vs hint bits vs checksums

2025-04-01 Thread Andres Freund
Hi,

On 2025-04-01 13:34:53 +0300, Heikki Linnakangas wrote:
> Here's a rebase of these patches.

Thanks!

Curious what made you do this? Do you need any parts of this soon?

I've been hacking on this a bit more, btw, just haven't gotten around to the
necessary README rephrasings... :(


> I went ahead and committed the "heapam: Only set tuple's block once per page
> in pagemode" patch, because it was trivial and independent of the rest.

Thanks!

I think I should just commit the kill_prior_tuples test, it's better than
nothing.

Greetings,

Andres Freund




Re: AIO v2.5

2025-04-01 Thread Noah Misch
On Mon, Mar 31, 2025 at 08:41:39PM -0400, Andres Freund wrote:
> updated version

All non-write patches (1-7) are ready for commit, though I have some cosmetic
recommendations below.  I've marked the commitfest entry Ready for Committer.

> + # Check a page validity error in another block, to ensure we 
> report
> + # the correct block number
> + $psql_a->query_safe(
> + qq(
> +SELECT modify_rel_block('tbl_zero', 3, corrupt_header=>true);
> +));
> + psql_like(
> + $io_method,
> + $psql_a,
> + "$persistency: test zeroing of invalid block 3",
> + qq(SELECT read_rel_block_ll('tbl_zero', 3, 
> zero_on_error=>true);),
> + qr/^$/,
> + qr/^psql::\d+: WARNING:  invalid page in block 3 
> of relation base\/.*\/.*; zeroing out page$/
> + );
> +
> +
> + # Check a page validity error in another block, to ensure we 
> report
> + # the correct block number

This comment is a copy of the previous test's comment.  While the comment is
not false, consider changing it to:

# Check one read reporting multiple invalid blocks.

> + $psql_a->query_safe(
> + qq(
> +SELECT modify_rel_block('tbl_zero', 2, corrupt_header=>true);
> +SELECT modify_rel_block('tbl_zero', 3, corrupt_header=>true);
> +));
> + # First test error
> + psql_like(
> + $io_method,
> + $psql_a,
> + "$persistency: test reading of invalid block 2,3 in 
> larger read",
> + qq(SELECT read_rel_block_ll('tbl_zero', 1, nblocks=>4, 
> zero_on_error=>false)),
> + qr/^$/,
> + qr/^psql::\d+: ERROR:  2 invalid pages among 
> blocks 1..4 of relation base\/.*\/.*\nDETAIL:  Block 2 held first invalid 
> page\.\nHINT:[^\n]+$/
> + );
> +
> + # Then test zeroing via ZERO_ON_ERROR flag
> + psql_like(
> + $io_method,
> + $psql_a,
> + "$persistency: test zeroing of invalid block 2,3 in 
> larger read, ZERO_ON_ERROR",
> + qq(SELECT read_rel_block_ll('tbl_zero', 1, nblocks=>4, 
> zero_on_error=>true)),
> + qr/^$/,
> + qr/^psql::\d+: WARNING:  zeroing out 2 invalid 
> pages among blocks 1..4 of relation base\/.*\/.*\nDETAIL:  Block 2 held first 
> zeroed page\.\nHINT:[^\n]+$/
> + );
> +
> + # Then test zeroing vio zero_damaged_pages

s/vio/via/

> +# Verify checksum handling when creating database from an invalid database.
> +# This also serves as a minimal check that cross-database IO is handled
> +# reasonably.

To me, "invalid database" is a term of art from the message "cannot connect to
invalid database".  Hence, I would change "invalid database" to "database w/
invalid block" or similar, here and below.  (Alternatively, just delete "from
an invalid database".  It's clear from the context.)

> + if (corrupt_checksum)
> + {
> + boolsuccessfully_corrupted = 0;
> +
> + /*
> +  * Any single modification of the checksum could just end up 
> being
> +  * valid again. To be sure
> +  */

Unfinished sentence.  That said, I'm not following why we'd need this loop.
If this test code were changing the input to the checksum, it's true that an
input bit flip might reach the same pd_checksum.  The test case is changing
pd_checksum, not the input bits.  I don't see how changing pd_checksum could
leave the page still valid.  There's only one valid pd_checksum value for a
given input page.
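
In other words, a single deterministic change would already be enough, e.g.
(illustrative only):

	((PageHeader) page)->pd_checksum ^= 1;

since the recomputed checksum can match at most one pd_checksum value.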

> + /*
> +  * The underlying IO actually completed OK, and thus 
> the "invalid"
> +  * portion of the IOV actually contains valid data. 
> That can hide
> +  * a lot of problems, e.g. if we were to wrongly mark a 
> buffer,
> +  * that wasn't read according to the shortened-read, IO 
> as valid,
> +  * the contents would look valid and we might miss a 
> bug.

Minimally s/read, IO/read IO,/ but I'd edit a bit further:

 * a lot of problems, e.g. if we were to wrongly mark-valid a
 * buffer that wasn't read according to the shortened-read IO, the
 * contents would look valid and we might miss a bug.

> Subject: [PATCH v2.15 05/18] md: Add comment & assert to buffer-zeroing path
>  in md[start]readv()

> The zero_damaged_pages path is incomplete, as as missing segments are not

s/as as/as/

> For now, put an Assert(false) comments documenting this choice into mdreadv()

s/comments/and comments/

> +  * For PG 18, we are putting an

Re: Draft for basic NUMA observability

2025-04-01 Thread Bertrand Drouvot
Hi Jakub,

On Tue, Apr 01, 2025 at 12:56:06PM +0200, Jakub Wartak wrote:
> On Mon, Mar 31, 2025 at 4:59 PM Bertrand Drouvot
>  wrote:
> 
> > Hi,
> 
> Hi Bertrand, happy to see you back, thanks for review and here's v18
> attached (an ideal fit for PG18 ;))

Thanks for the new version!

=== About v18-0002

It looks to be in good shape to me. The helper's name might still be debatable,
though.

I just have 2 comments:

=== 1

+   if (bufRecord->blocknum == InvalidBlockNumber ||
+   bufRecord->isvalid == false)

It seems to me that this check could now fit in one line.

=== 2

+   {
SRF_RETURN_DONE(funcctx);
+   }

Extra parentheses are not needed.

=== About v18-0003

=== 3

I think that pg_buffercache--1.5--1.6.sql is not correct. It should contain
only the necessary changes when updating from 1.5. It means that it should 
"only"
create the new objects (views and functions in our case) that come in v18-0003 
and grant appropriate privs.

Also it should mention:

"
\echo Use "ALTER EXTENSION pg_buffercache UPDATE TO '1.6'" to load this file. 
\quit
"
and not:

"
+\echo Use "CREATE EXTENSION pg_buffercache" to load this file. \quit
"

The already existing pg_buffercache--1.N--1.(N+1).sql are good examples.
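
For concreteness, the expected shape is roughly the following (the object
definitions below are illustrative placeholders, not necessarily the patch's
exact ones):

\echo Use "ALTER EXTENSION pg_buffercache UPDATE TO '1.6'" to load this file. \quit

-- only the objects that are new in 1.6
CREATE FUNCTION pg_buffercache_numa_pages()
RETURNS SETOF RECORD
AS 'MODULE_PATHNAME', 'pg_buffercache_numa_pages'
LANGUAGE C PARALLEL SAFE;

CREATE VIEW pg_buffercache_numa AS
    SELECT P.* FROM pg_buffercache_numa_pages() AS P
        (bufferid integer, os_page_num bigint, numa_node integer);

REVOKE ALL ON FUNCTION pg_buffercache_numa_pages() FROM PUBLIC;
REVOKE ALL ON pg_buffercache_numa FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_buffercache_numa_pages() TO pg_monitor;
GRANT SELECT ON pg_buffercache_numa TO pg_monitor;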

=== 4

+   for (i = 0; i < NBuffers; i++)
+   {
+   int blk2page = (int) i * 
pages_per_blk;
+

I think that should be:

int blk2page = (int) (i * pages_per_blk);

=== About v18-0004

=== 5

When running:

select c.name, c.size as num_size, s.size as shmem_size
from (select n.name as name, sum(n.size) as size from pg_shmem_numa_allocations 
n group by n.name) c, pg_shmem_allocations s
where c.name = s.name;

I can see:

- pg_shmem_numa_allocations reporting a lot of times the same size
- pg_shmem_numa_allocations and pg_shmem_allocations not reporting the same size

Do you observe the same?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Fix build on MINGW on ARM64

2025-04-01 Thread Andrew Dunstan



On 2025-04-01 Tu 8:47 AM, vignesh C wrote:

On Tue, 1 Apr 2025 at 16:02, Andrew Dunstan  wrote:


On 2025-04-01 Tu 5:16 AM, vignesh C wrote:

On Sun, 2 Feb 2025 at 00:52, Lars Kanis  wrote:

This patch limits the workaround of using __builtin_setjmp on the
Windows MINGW platform. This workaround is only necessary for the legacy
MSVCRT-based toolchain, but not for UCRT-based ones. It is not available at
all on clang on ARM64, resulting in the following compiler error:

error: __builtin_longjmp is not supported for the current target

This patch has been used for years in MSYS2 packages:

https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-postgresql/postgresql-14.0-use-mingw-setjmp-on-ucrt.patch

It is also used in ruby-pg to allow compiling for
aarch64-w64-windows-gnu: https://github.com/ged/ruby-pg/pull/626/files

It would be nice if this patch could be merged upstream.

Are there any known issues with using __builtin_setjmp? I'm asking
because the comment mentions long-standing issues in its setjmp support:
"However, it seems that MinGW-64 has some longstanding issues
in its setjmp support, so on that toolchain we cheat and use gcc's
builtins." Also, a few users have reported segfaults when using setjmp
with MinGW, as in [1].
[1] - 
https://stackoverflow.com/questions/53709069/setjmp-longjmp-in-x86-64-w64-mingw32


That report is from quite a few years ago, so I'm not sure it really helps.

If one of you would add this to the next CF we could see how the CFbot
reacts to it. In general it looks sane.

There is an existing CF entry for this at [1]. If no one picks this
till the end of this CF, we can move it to next CF.
[1] - https://commitfest.postgresql.org/patch/5610/




Somehow I missed that. OK, looks good, will commit shortly.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add partial :-variable expansion to psql \copy

2025-04-01 Thread Fabien Coelho

On 31/03/2025 17:09, Tom Lane wrote:


Fabien COELHO writes:

[...] The attached patch allows \copy to use variable's values in place of 
table and file names:

Hm ... I'm on board with the general idea of the feature, but I find
this implementation quite horrid.


Indeed, I just added the stuff in the already quite peculiar manual
lexer/parser for \copy. I did not think of addressing the "why is it like
that" issue and trying to fix it :-)



   I would rather see us adjust the
lexing rules in psqlscanslash.l so that variable expansion happens
there when collecting \copy arguments.  This would eliminate at
least part of the distinction between OT_WHOLE_LINE and OT_NORMAL
modes, and we'd have to have some discussion about how far to go
there.  Or maybe just change exec_command_copy to use OT_NORMAL
not OT_WHOLE_LINE?  If we modify the behavior of OT_WHOLE_LINE
then the ability to expand variables would start to apply in the
other commands using that, notably \!.  I think that's at least
potentially good, but perhaps the blast radius of such a change
is too large.


I'm not sure that salvaging \copy by moving it onto the lexer is easy, because:

(1) it seems to be the only command which is really full SQL hidden in
a backslash command,


(2) on one line, without requiring a final ';',

(3) and the client needs to actually parse it and modify it to some degree
before sending it to the server,


so the chances of maintaining compatibility without adding
weirdness seem slim.



Anyway, my feeling about it is that \copy parsing is a huge hack
right now,

Yes.

  and I'd rather see it become less of a hack, that is
more like other psql commands, instead of getting even hackier.


I think that I'll have a try with Corey's suggestion to extend COPY rather
than change \copy.


--

Fabien.


Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-01 Thread Ilia Evdokimov

On 28.03.2025 15:20, Ilia Evdokimov wrote:

Then we need to decide clearly what exactly to display in EXPLAIN for
the Memoize node: absolute values (estimated distinct keys and
estimated cache capacity) or ratios (hit_ratio and evict_ratio).
Ratios have the advantage of quickly reflecting the overall
effectiveness of Memoize. However, absolute values have a significant
advantage as they explicitly reveal the reason for Memoize's poor
performance, making problem diagnosis simpler.


With absolute values, users can directly understand the underlying 
reason for poor performance. For example: insufficient memory 
(capacity < distinct keys), inaccurate planner statistics (distinct 
keys significantly different from actual values), poorly ordered keys 
(capacity ~ distinct keys, but frequent evictions as seen in the 
Evictions parameter), or Memoize simply not being beneficial (capacity 
~ distinct keys ~ calls). Ratios, by contrast, only reflect the final 
outcome without clearly indicating the cause or the specific steps 
needed to resolve the issue.
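
Purely as a hypothetical illustration (made-up numbers, not the patch's
actual output format), absolute values let a reader diagnose the cause at a
glance:

    Memoize
      Cache Key: t.a
      Estimated Capacity: 1000  Estimated Distinct Keys: 50000
      Hits: 120  Misses: 49880  Evictions: 48880

With the capacity far below the estimated distinct keys, the first remedy
(more work_mem) is obvious without computing any ratio.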


Thus, absolute values do more than just inform users that a problem 
exists; they provide actionable details that enable users to directly 
address the problem (increase work_mem, refresh statistics, create 
extended statistics, or disable Memoize entirely). Additionally, no 
other plan nodes in PostgreSQL currently use a similar ratio-based 
approach - everywhere else absolute values are consistently shown 
(e.g., number of rows, buckets, batches, memory used, etc.). Using 
absolute values in Memoize maintains consistency with existing practice.


I've updated the patch to v5, since the new parameter est_unique_keys 
in make_memoize() is now placed near est_entries, which is more 
logical and readable than putting it at the end.


Any thoughts?

--
Best Regards,
Ilia Evdokimov,
Tantor Labs LLC.



With the feature freeze coming up soon, I’d like to ask: do we plan to 
include this patch in v18?


Please let me know if there’s anything I can do to help move it forward.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Add partial :-variable expansion to psql \copy

2025-04-01 Thread Christoph Berg
Re: Fabien Coelho
> (1) it seems that is the only command which is really full SQL hidden in a
> backslash command

Perhaps this form could be improved by changing `\copy (select) to file`
to something like `select \gcopy (to file)`. That might make :expansion
in the "select" part easier to handle.

I've heard several complaints that `\copy (select)` can't be wrapped
over several lines, so offering the alternative syntax in parallel to
\copy would also solve another problem.

Christoph




Re: Add partial :-variable expansion to psql \copy

2025-04-01 Thread Pavel Stehule
Hi

út 1. 4. 2025 v 12:00 odesílatel Christoph Berg  napsal:

> Re: Fabien Coelho
> > (1) it seems that is the only command which is really full SQL hidden in
> a
> > backslash command
>
> Perhaps this form could be improved by changing `\copy (select) to file`
> to something like `select \gcopy (to file)`. That might make :expansion
> in the "select" part easier to handle.
>
> I've heard several complaints that `\copy (select)` can't be wrapped
> over several lines, so offering the alternative syntax in parallel to
> \copy would also solve another problem.
>

What is the reason to use \copy (select) to ?

psql (on the client side) supports the csv format pretty well, with a single
line switching to this format (I know it does not have the full functionality
of the COPY statement).

(2025-04-01 12:08:36) postgres=# select 1,2 \g (format=csv) output.csv
(2025-04-01 12:08:53) postgres=#

For me, the original proposal has interesting benefits (Tom wrote about
it). Inconsistency in where psql's variables can or cannot be used is
unfortunate. It is always an unpleasant surprise when you find such
inconsistencies.

Regards

Pavel


>
> Christoph
>
>
>


Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query).

2025-04-01 Thread Kirill Reshke
On Tue, 1 Apr 2025, 11:45 vignesh C,  wrote:

>
> One thing I noticed was that if the materialized view is not refreshed,
> the user will get stale data.
>
> Should we document this?
>


Does this patch alter this behaviour? The user will get stale data even on
HEAD, so why should we take care of it within this thread?

>


Re: Draft for basic NUMA observability

2025-04-01 Thread Jakub Wartak
On Mon, Mar 31, 2025 at 4:59 PM Bertrand Drouvot
 wrote:

> Hi,

Hi Bertrand, happy to see you back, thanks for review and here's v18
attached (an ideal fit for PG18 ;))

> On Mon, Mar 31, 2025 at 11:27:50AM +0200, Jakub Wartak wrote:
> > On Thu, Mar 27, 2025 at 2:40 PM Andres Freund  wrote:
> > >
> > > > +Size
> > > > +pg_numa_get_pagesize(void)
> > [..]
> > >
> > > Should this have a comment or an assertion that it can only be used after
> > > shared memory startup? Because before that huge_pages_status won't be
> > > meaningful?
> >
> > Added both.
>
> Thanks for the updated version!
>
> +   Assert(IsUnderPostmaster);
>
> I wonder if that would make more sense to add an assertion on 
> huge_pages_status
> and HUGE_PAGES_UNKNOWN instead (more or less as it is done in
> CreateSharedMemoryAndSemaphores()).

Ok, let's have both just in case (this status is initialized to _UNKNOWN
by default, so I hope you didn't have in mind using GetConfigOption(), as
that would need guc.h in port?)

> === About v17-0002-This-extracts-code-from-contrib-pg_buffercache-s.patch
[..]
> I think pg_buffercache_numa_pages() should not be mentioned before it's 
> actually
> implemented.

Right, fixed.

> === 1
>
[..]
> "i <= 9" will be correct only once v17-0003 is applied (when 
> NUM_BUFFERCACHE_PAGES_ELEM
> is increased to 10).
>
> In v17-0002 that should be i < 9 (even better i < NUM_BUFFERCACHE_PAGES_ELEM).
>
> That could also make sense to remove the loop and use memset() that way:
>
> "
> memset(&nulls[1], true, (NUM_BUFFERCACHE_PAGES_ELEM - 1) * sizeof(bool));
> "
>
> instead. It's done that way in some other places (hbafuncs.c for example).

Ouch, good catch.

> === 2
>
> -   if (expected_tupledesc->natts == NUM_BUFFERCACHE_PAGES_ELEM)
> -   TupleDescInitEntry(tupledesc, (AttrNumber) 9, 
> "pinning_backends",
> -  INT4OID, -1, 0);
>
> +   if (expected_tupledesc->natts >= NUM_BUFFERCACHE_PAGES_ELEM - 1)
> +   TupleDescInitEntry(tupledesc, (AttrNumber) 9, 
> "pinning_backends",
> +  INT4OID, -1, 0);
>
> I think we should not change the "expected_tupledesc->natts" check here until
> v17-0003 is applied.

Right, I've moved that into 0003 where it belongs, and now 0002 has not a
single NUMA reference. I've thrown 0001+0002 alone onto CI and it
passed too.

-J.


v18-0002-This-extracts-code-from-contrib-pg_buffercache-s.patch
Description: Binary data


v18-0001-Add-optional-dependency-to-libnuma-Linux-only-fo.patch
Description: Binary data


v18-0004-Add-pg_shmem_numa_allocations-to-show-NUMA-memor.patch
Description: Binary data


v18-0003-Extend-pg_buffercache-with-new-view-pg_buffercac.patch
Description: Binary data


RE: Fix slot synchronization with two_phase decoding enabled

2025-04-01 Thread Zhijie Hou (Fujitsu)
On Tue, Apr 1, 2025 at 2:09 PM Amit Kapila wrote:

> 
> On Mon, Mar 31, 2025 at 5:04 PM Zhijie Hou (Fujitsu) 
>  wrote:
> >
> > Thanks for the comments, they have been addressed in V2.
> >
> 
> In the PG17 patch, we only have a check preventing failover and 
> two_phase in CreateSubscription. Don't we need a similar check for 
> AlterSubscription?
> 
> Apart from the above, I have a few suggestions for changes in docs, 
> error messages, and comments for both versions. See attached.

Thanks for the comments.

Here is the V3 patch set which addressed all the comments.

I also added a testcase for ALTER SUB in 0002 as suggested by
Kuroda-san off-list.

Additionally, I found that a test failed in PG17 following this
patch because it involved creating a subscription with both failover and
two-phase enabled. Since that test was designed to test the upgrade of
replication slots and is not directly related to failover functionality, I
decided to disable the failover option for that test case.

Best Regards,
Hou zj


v3-0001-Fix-slot-synchronization-with-two_phase-decoding-.patch
Description:  v3-0001-Fix-slot-synchronization-with-two_phase-decoding-.patch
From 0c9a8086913ca2e13106df386544821f39c1d41c Mon Sep 17 00:00:00 2001
From: Hou Zhijie 
Date: Mon, 31 Mar 2025 16:28:57 +0800
Subject: [PATCH v3] Disallow enabling failover for a replication slot that
 enables two-phase decoding

This commit fixes a bug for slot synchronization with logical replication
slots that enabled two_phase decoding. As it stands, transactions prepared
before two-phase decoding is enabled may fail to replicate to the subscriber
after being committed on a promoted standby following a failover.

The issue arises because the two_phase_at field of a slot, which tracks the LSN
from which two-phase decoding starts, is not synchronized to standby servers.
Without this field, the logical decoding might incorrectly identify prepared
transactions as already replicated to the subscriber, causing them to be
skipped.

To address the issue on HEAD, this commit makes the two_phase_at field of the 
slot
visible in the pg_replication_slots view and enables the slot synchronization
to copy this value to the corresponding synced slot on the standby server.

The bug has been present since the introduction of slot synchronization in
PostgreSQL 17. However, due to the need for catalog changes, backpatching this
fix is not feasible. Instead, to prevent the risk of losing prepared
transactions in prior versions, we now disallow enabling failover and two-phase
decoding together for a replication slot.
---
 contrib/test_decoding/expected/slot.out   |  2 ++
 contrib/test_decoding/sql/slot.sql|  1 +
 src/backend/commands/subscriptioncmds.c   | 25 +
 src/backend/replication/slot.c| 28 +++
 src/bin/pg_upgrade/t/003_logical_slots.pl |  6 ++--
 .../t/040_standby_failover_slots_sync.pl  | 23 +++
 src/test/regress/expected/subscription.out|  3 ++
 src/test/regress/sql/subscription.sql |  4 +++
 8 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/slot.out 
b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..8fd762dea85 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -427,6 +427,8 @@ SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_default_slot', '
 
 SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', 
true, false, true);
 ERROR:  cannot enable failover for a temporary replication slot
+SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_twophase_true_slot', 
'test_decoding', false, true, true);
+ERROR:  "failover" and "two_phase" are mutually exclusive options
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
  ?column? 
 --
diff --git a/contrib/test_decoding/sql/slot.sql 
b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..a89fe712ff6 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -182,6 +182,7 @@ SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_true_slot', 'tes
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 
'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 
'test_decoding', false, false);
 SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', 
true, false, true);
+SELECT 'init' FROM 
pg_create_logical_replication_slot('failover_twophase_true_slot', 
'test_decoding', false, true, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 9467f58a23d..6db3c9347fa 100644
--- 

Re: AIO v2.5

2025-04-01 Thread Andres Freund
Hi,

On 2025-04-01 14:56:17 +0300, Aleksander Alekseev wrote:
> Hi Andres,
>
> > > I didn't yet push
> > >
> > > > > Subject: [PATCH v2.14 13/29] aio: Add README.md explaining higher 
> > > > > level design
>
> I have several notes about 0003 / README.md:
>
> 1. I noticed that the use of "Postgres" and "postgres" is inconsistent.

:/

It probably should be consistent, but I have no idea which of the spellings we
should go for. Either looks ugly in some contexts.


> 2.
>
> ```
> +pgaio_io_register_callbacks(ioh, PGAIO_HCB_SHARED_BUFFER_READV, 0);
> ```
>
> Perhaps I'm a bit late here, but the name of the function is weird. It
> registers a single callback, but the name is "_callbacks".

It registers a *set* of callbacks (stage, complete_shared, complete_local,
report_error) on the handle.



> 3. The use of "AIO Handle" and "AioHandle" is inconsistent.

This seems ok to me.


> 4.
>
> - pgaio_io_register_callbacks
> - pgaio_io_set_handle_data_32
>
> If I understand correctly one can register multiple callbacks per one
> AIO Handle (right? ...). However I don't see an obvious way to match
> handle data to the given callback. If all the callbacks get the same
> handle data... well it's weird IMO, but we should explicitly say that.

There is:

/*
 * Associate an array of data with the Handle. This is e.g. useful to the
 * transport knowledge about which buffers a multi-block IO affects to
 * completion callbacks.
 *
 * Right now this can be done only once for each IO, even though multiple
 * callbacks can be registered. There aren't any known usecases requiring more
 * and the required amount of shared memory does add up, so it doesn't seem
 * worth multiplying memory usage by PGAIO_HANDLE_MAX_CALLBACKS.
 */


> On top of that we should probably explain in which order the callbacks
> are going to be executed. If there are any guarantees in this respect
> of course.

Alongside PgAioHandleCallbacks:
 *
 * The latest registered callback is called first. This allows
 * higher-level code to register callbacks that can rely on callbacks
 * registered by lower-level code to already have been executed.
 *


> 5. pgaio_io_set_handle_data_32(ioh, (uint32 *) buffer, 1)
>
> Perhaps it's worth mentioning if `buffer` can be freed after the call
> i.e. if it's stored by value or by reference.

By value.


> It's also worth clarifying if the maximum number of buffers is limited or
> not.

It's limited to PG_IOV_MAX, fwiw.


> 6. It is worth clarifying if AIO allows reads and writes or only reads
> at the moment.

We have patches for writes, I just ran out of time for 18. I'm not particularly
excited about adding stuff that then needs to be removed in 19.


> Perhaps it's also worth explicitly saying that AIO is for disk IO only, not
> for network one.

I'd like to integrate network IO too.  I have a local prototype, fwiw.


> 7. It is worth clarifying how many times the callbacks are called when
> reading multiple buffers. Is it guaranteed that the callbacks are
> called ones, or if it somehow depends on the implementation, and also
> what happens in the case if I/O succeeds partially.

The aio subsystem doesn't know anything about buffers.  Callbacks are executed
exactly once, with the exception of the error reporting callback, which could
be called multiple times.


> 8. I believe we should tell a bit more about the context in which the
> callbacks are called. Particularly what happens to the memory contexts
> and if I can allocate/free memory, can I throw ERRORs, can I create
> new AIO Handles, is it expected that the callback should return
> quickly, are the signals masked while the callback is executed, can I
> use sockets, is it guaranteed that the callback is going to be called
> in the same process (I guess so, but the text doesn't explicitly
> promise that), etc.

There is the following above pgaio_io_register_callbacks()

 * Note that callbacks are executed in critical sections.  This is necessary
 * to be able to execute IO in critical sections (consider e.g. WAL
 * logging). To perform AIO we first need to acquire a handle, which, if there
 * are no free handles, requires waiting for IOs to complete and to execute
 * their completion callbacks.
 *
 * Callbacks may be executed in the issuing backend but also in another
 * backend (because that backend is waiting for the IO) or in IO workers (if
 * io_method=worker is used).

And also a bunch of detail along struct PgAioHandleCallbacks.


> 9.
>
> ```
> +Because acquisition of an IO handle
> +[must always succeed](#io-can-be-started-in-critical-sections)
> +and the number of AIO Handles
> +[has to be limited](#state-for-aio-needs-to-live-in-shared-memory)
> +AIO handles can be reused as soon as they have completed.
> ```
>
> What pgaio_io_acquire() does if we are out of AIO Handles? Since it
> always succeeds I guess it should block the caller in this case, but
> IMO we should say this explicitly.

That's documented above pgai

Re: pgsql: Add support for OAUTHBEARER SASL mechanism

2025-04-01 Thread Daniel Gustafsson
> On 1 Apr 2025, at 02:06, Jacob Champion  
> wrote:
> 
> On Mon, Mar 31, 2025 at 4:09 PM Jacob Champion
>  wrote:
>> I don't have hurd-amd64 to test, but I'm working on a patch that will
>> build and pass tests if I manually munge pg_config.h. We were skipping
>> the useless tests via a $windows_os check; I think I should use
>> check_pg_config() instead.
> 
> Proposed fix attached.

Thanks, I agree that this is the right fix.  While this is all subject to
change, I will go ahead with this patch in the meantime to make the tree
properly buildable.

--
Daniel Gustafsson





Re: Better HINT message for "unexpected data beyond EOF"

2025-04-01 Thread Robert Haas
On Tue, Apr 1, 2025 at 7:13 AM Jakub Wartak
 wrote:
> Thread bump. So we have the following candidates:
>
> 1. remove it as Andres stated:
> ERROR:  unexpected data beyond EOF in block 1472 of relation base/5/16387
>
> 2a. Robert's idea
> ERROR:  unexpected data beyond EOF in block 1472 of relation base/5/16387
> HINT:  This has been observed with PostgreSQL files being overwritten.
>
> 2b. Christoph's idea
> ERROR:  unexpected data beyond EOF in block 1472 of relation base/5/16387
> HINT:  Did anything besides PostgreSQL touch that file?

I don't think I proposed that exact phrasing - I prefer (2b) over
(2a), although I would replace "besides" with "other than".

> Another question is should we back-patch this? I believe we should (?)

I don't think this qualifies as a bug. The current wording isn't
factually wrong, just unhelpful. Even if it were wrong, we need a
pretty good reason to change message strings in a stable branch,
because that can break things for users who are grepping for the
current string (or a translation thereof). If an overwhelming
consensus in favor of back-patching emerges, fine, but my gut feeling
is that back-patching will make more people sad than it makes happy.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix slot synchronization with two_phase decoding enabled

2025-04-01 Thread Amit Kapila
On Mon, Mar 31, 2025 at 5:04 PM Zhijie Hou (Fujitsu)
 wrote:
>
> Thanks for the comments, they have been addressed in V2.
>

In the PG17 patch, we only have a check preventing failover and
two_phase in CreateSubscription. Don't we need a similar check for
AlterSubscription?

Apart from the above, I have a few suggestions for changes in docs,
error messages, and comments for both versions. See attached.

-- 
With Regards,
Amit Kapila.
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 141c140331d..bff0d143ac5 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2566,7 +2566,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
   
The address (LSN) from which the decoding of prepared
-   transactions is enabled. Always NULL for physical
+   transactions is enabled. NULL for physical
slots.
   
  
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl 
b/src/test/recovery/t/040_standby_failover_slots_sync.pl
index 67cc6374565..19273a49914 100644
--- a/src/test/recovery/t/040_standby_failover_slots_sync.pl
+++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl
@@ -896,9 +896,9 @@ is($result, 't',
 # Promote the standby1 to primary. Confirm that:
 # a) the slot 'lsub1_slot' and 'snap_test_slot' are retained on the new primary
 # b) logical replication for regress_mysub1 is resumed successfully after 
failover
-# c) changes from the transaction 'test_twophase_slotsync', which was prepared
-#on the old primary, can be consumed from the synced slot 'snap_test_slot'
-#once committed on the new primary.
+# c) changes from the transaction prepared 'test_twophase_slotsync' can be
+#consumed from the synced slot 'snap_test_slot' once committed on the new
+#primary.
 # d) changes can be consumed from the synced slot 'snap_test_slot'
 ##
 $primary->wait_for_replay_catchup($standby1);
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 8308ccaad5a..df7ab827f7d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -657,7 +657,7 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
if (opts.twophase && opts.failover)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-   errmsg("cannot enable failover option when 
two_phase option is enabled"));
+   errmsg("failover and two_phase are mutually 
exclusive options"));
 
/*
 * If built with appropriate switch, whine when regression-testing
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index c1bbd16254e..da510f60f9c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -357,7 +357,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
if (two_phase)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-   errmsg("cannot enable failover for a 
replication slot that enables two-phase decoding"));
+   errmsg("failover and two_phase are 
mutually exclusive options"));
}
 
/*
@@ -873,7 +873,7 @@ ReplicationSlotAlter(const char *name, bool failover)
if (failover && MyReplicationSlot->data.two_phase)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-   errmsg("cannot enable failover for a 
replication slot that enables two-phase decoding"));
+   errmsg("cannot enable failover for a two-phase 
enabled replication slot"));
 
if (MyReplicationSlot->data.failover != failover)
{


Re: Better HINT message for "unexpected data beyond EOF"

2025-04-01 Thread Robert Haas
On Tue, Apr 1, 2025 at 9:54 AM Christoph Berg  wrote:
> Re: Robert Haas
> > > Another question is should we back-patch this? I believe we should (?)
> > I don't think this qualifies as a bug. The current wording isn't
> > factually wrong, just unhelpful. Even if it were wrong, we need a
> > pretty good reason to change message strings in a stable branch,
> > because that can break things for users who are grepping for the
> > current string (or a translation thereof). If an overwhelming
> > consensus in favor of back-patching emerges, fine, but my gut feeling
> > is that back-patching will make more people sad than it makes happy.
>
> It's only the HINT part. If I were to grep/search for the message, I
> would definitely use the message part.

I'm sure you would, but you're very smart.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: RFC: Logging plan of the running query

2025-04-01 Thread torikoshia

Thanks for your comment.

On 2025-04-01 21:32, Robert Haas wrote:

On Mon, Mar 31, 2025 at 9:43 PM torikoshia  
wrote:
Previously, Rafael proposed a patch in this thread that added execution
progress tracking. However, I felt that expanding the scope could make
it harder to get the patch reviewed or committed. So, I suggested first
completing a feature that only retrieves the execution plan of a running
query, and then developing execution progress tracking afterward[3].


That's reasonable. Comparing the two patches, I think that you have
correctly solved a couple of key problems that Rafael did not handle
correctly. Specifically, I believe you have correctly implemented what
Andres described i.e. use CFI to get control to do the wrapping and
then from the ExecProcNode wrapper do the actual EXPLAIN, whereas I
believe Rafael was doing the wrapping directly from the signal
handler, which did not seem safe to me:

http://postgr.es/m/ca+tgmobrzedep+z1bpqqgnscqtq8m58wnnkjb_8lwpwbqbz...@mail.gmail.com
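
In outline, the safe sequence is roughly the following (identifiers here are
illustrative, not the patch's actual names; the usual postgres.h / executor /
latch headers are assumed):

/* The signal handler does nothing beyond setting a flag and a latch. */
static volatile sig_atomic_t LogQueryPlanPending = false;

static void
HandleLogQueryPlanInterrupt(SIGNAL_ARGS)
{
	int			save_errno = errno;

	LogQueryPlanPending = true;
	SetLatch(MyLatch);
	errno = save_errno;
}

/*
 * CHECK_FOR_INTERRUPTS() later notices the flag and merely swaps this wrapper
 * into the plan tree's ExecProcNode pointers; the EXPLAIN itself then runs
 * from inside the executor, at the next ExecProcNode call, where it is safe.
 */
static TupleTableSlot *
ExecProcNodeWithExplain(PlanState *node)
{
	LogCurrentQueryPlan(node);	/* hypothetical helper doing the EXPLAIN */
	node->ExecProcNode = node->ExecProcNodeReal;	/* unwrap */
	return node->ExecProcNode(node);
}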

I also like your version of (sub)transaction abort handling much
better than the memory-context cleanup that Rafael initially chose.
He's since revised that.

That said, I think the view that Rafael proposes, with a
periodically-updating version of the EXPLAIN ANALYZE output up to that
point, could be extremely useful in some situations.

+1.


To make that
work, he mostly just hacked InstrStopNode(), although as I'm thinking
about it, that probably doesn't handle all of the parallel query cases
correctly. I'm somewhat inclined to hope that we eventually end up
with both interfaces.


I was considering whether to introduce a GUC in this patch to allow for
prior setup before outputting the plan or to switch to Rafael's patch
after reviewing its details. However, since there isn’t much time left
before the feature freeze, if you have already reviewed Rafael's patch
and there is a chance it could be committed, it would be better to focus
on that.


I wasn't feeling very confident about my ability to get that patch
committed before feature freeze. I don't want to rush into something
that we might later regret. I'm going to spend a bit more time
studying your patch next.


That’s really appreciated!
I believe some of the comments in Rafael's thread should be reflected in 
this one as well, but I haven’t incorporated them yet. Apologies for 
that.



--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.




Re: Better HINT message for "unexpected data beyond EOF"

2025-04-01 Thread Christoph Berg
Re: Robert Haas
> > Another question is should we back-patch this? I believe we should (?)
> 
> I don't think this qualifies as a bug. The current wording isn't
> factually wrong, just unhelpful. Even if it were wrong, we need a
> pretty good reason to change message strings in a stable branch,
> because that can break things for users who are grepping for the
> current string (or a translation thereof). If an overwhelming
> consensus in favor of back-patching emerges, fine, but my gut feeling
> is that back-patching will make more people sad than it makes happy.

It's only the HINT part. If I were to grep/search for the message, I
would definitely use the message part.

Christoph




Re: [PATCH] Fix build on MINGW on ARM64

2025-04-01 Thread vignesh C
On Tue, 1 Apr 2025 at 16:02, Andrew Dunstan  wrote:
>
>
> On 2025-04-01 Tu 5:16 AM, vignesh C wrote:
> > On Sun, 2 Feb 2025 at 00:52, Lars Kanis  wrote:
> >> This patch limits the workaround of using __builtin_setjmp on the
> >> Windows MINGW platform. This workaround is only necessary for the legacy
> >> MSVCRT-based toolchain, but not for UCRT-based ones. It is not available at
> >> all on clang on ARM64, resulting in the following compiler error:
> >>
> >>error: __builtin_longjmp is not supported for the current target
> >>
> >> This patch has been used for years in MSYS2 packages:
> >>
> >> https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-postgresql/postgresql-14.0-use-mingw-setjmp-on-ucrt.patch
> >>
> >> It is also used in ruby-pg to allow compiling for
> >> aarch64-w64-windows-gnu: https://github.com/ged/ruby-pg/pull/626/files
> >>
> >> It would be nice if this patch could be merged upstream.
> > Are there any known issues with using __builtin_setjmp? I'm asking
> > because the comment mentions long-standing issues in its setjmp support:
> > "However, it seems that MinGW-64 has some longstanding issues
> > in its setjmp support, so on that toolchain we cheat and use gcc's
> > builtins." Also, a few users have reported segfaults when using setjmp
> > with MinGW, as in [1].
> > [1] - 
> > https://stackoverflow.com/questions/53709069/setjmp-longjmp-in-x86-64-w64-mingw32
> >
>
> That report is from quite a few years ago, so I'm not sure it really helps.
>
> If one of you would add this to the next CF we could see how the CFbot
> reacts to it. In general it looks sane.

There is an existing CF entry for this at [1]. If no one picks this
till the end of this CF, we can move it to next CF.
[1] - https://commitfest.postgresql.org/patch/5610/

Regards,
Vignesh




Re: Index AM API cleanup

2025-04-01 Thread Peter Eisentraut

On 20.03.25 12:59, Peter Eisentraut wrote:

v22-0006-Convert-from-StrategyNumber-to-CompareType.patch

This is all that remains now.  I think with a bit more polishing around 
the edges, some comment updates, etc., this is close to ready.


Here is an updated version of this patch.  I have left out all the extra 
tests and WIP patches etc. from the series for now so that the focus is 
clear.


This patch is mostly unchanged from the above, except some small amount 
of updating comments, as well as the following.


I've done a fair bit of performance testing to make sure there are no 
noticeable regressions from this patch.  I've found that the function 
get_mergejoin_opfamilies() is quite critical to the planning time of 
even simple queries (such as pgbench --select-only), so I played around 
with various caching schemes.  In the end, I just settled on hardcoding 
information about the built-in index AM types.  Which is of course ugly, 
but at least it's essentially the same as before.  If we find other 
affected hotspots, we could apply similar workarounds, but so far I 
haven't found any.
From ec6fa0fbcb1fda3c1d95e0cc22a126da3f78ee69 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 1 Apr 2025 14:41:44 +0200
Subject: [PATCH v23] Convert from StrategyNumber to CompareType

Reduce the number of places that hardcode a btree strategy number by
instead using comparison types.

Generalize predicate tests beyond btree strategies.  Extend routines
which prove logical implications between predicate expressions to be
usable by more than just B-tree index strategies.

Allow non-btree indexes to participate in mergejoin.  Rather than
hardcoding a requirement that the index be of type btree, just require
that it provide ordering support.

A number of places in the code still had references to btree
strategies where not appropriate, so fix those.  Leave untouched such
strategy numbers in nbtree, spgist, and brin indexes, and also in
table partitioning.  Hardcoded hash and btree references in table
partitioning are part of the design of table partitioning; their
replacement, if even a good idea, is beyond the scope of this patch
series.

Author: Mark Dilger 
Co-authored-by: Peter Eisentraut 
Discussion: 
https://www.postgresql.org/message-id/flat/e72eaa49-354d-4c2e-8eb9-255197f55...@enterprisedb.com
---
 contrib/postgres_fdw/deparse.c  |  16 +-
 contrib/postgres_fdw/postgres_fdw.c |   2 +-
 src/backend/executor/execExpr.c |   1 +
 src/backend/executor/nodeIndexscan.c|   3 +
 src/backend/executor/nodeMergejoin.c|   7 +-
 src/backend/optimizer/path/allpaths.c   |  28 ++--
 src/backend/optimizer/path/costsize.c   |   8 +-
 src/backend/optimizer/path/equivclass.c |   3 +-
 src/backend/optimizer/path/indxpath.c   |   9 +-
 src/backend/optimizer/path/pathkeys.c   |  45 +++---
 src/backend/optimizer/plan/createplan.c |  34 ++--
 src/backend/optimizer/util/plancat.c|  39 +++--
 src/backend/optimizer/util/predtest.c   | 119 +++---
 src/backend/parser/parse_clause.c   |   4 +-
 src/backend/parser/parse_expr.c |  34 ++--
 src/backend/partitioning/partprune.c|  13 +-
 src/backend/utils/adt/selfuncs.c| 133 
 src/backend/utils/cache/lsyscache.c | 196 
 src/backend/utils/sort/sortsupport.c|   6 +-
 src/include/nodes/pathnodes.h   |  12 +-
 src/include/optimizer/paths.h   |   2 +-
 src/include/utils/lsyscache.h   |  15 +-
 src/include/utils/selfuncs.h|   2 +-
 src/tools/pgindent/typedefs.list|   2 +-
 24 files changed, 399 insertions(+), 334 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6e7dc3d2df9..d9970dd6753 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3969,17 +3969,17 @@ appendOrderByClause(List *pathkeys, bool has_final_sort,
appendStringInfoString(buf, ", ");
 
/*
-* Lookup the operator corresponding to the strategy in the 
opclass.
-* The datatype used by the opfamily is not necessarily the 
same as
-* the expression type (for array types for example).
+* Lookup the operator corresponding to the compare type in the
+* opclass. The datatype used by the opfamily is not 
necessarily the
+* same as the expression type (for array types for example).
 */
-   oprid = get_opfamily_member(pathkey->pk_opfamily,
-   
em->em_datatype,
-   
em->em_datatype,
-   
pathkey->pk_strategy);
+   oprid = get_opfamily_member_for_cmptype(pathkey->pk_opfamily,
+

Re: Using read stream in autoprewarm

2025-04-01 Thread Nazir Bilal Yavuz
Hi,

On Tue, 1 Apr 2025 at 05:14, Melanie Plageman  wrote:
>
> On Mon, Mar 31, 2025 at 3:45 PM Melanie Plageman
>  wrote:
> >
> > Whoops, this isn't right. It does work. I'm going to draft a version
> > suggesting slightly different variable naming and a couple comments to
> > make this more clear.
>
> Okay, so after further study, I think there are multiple issues still
> with the code. We could end up comparing a blocknumber to nblocks
> calculated from a different fork. To address this, you'll need to keep
> track of the last fork_number. At that point, you kind of have to
> bring back old_blk -- because that is what we are recreating with
> multiple local variables.

I am attaching v8, which is an updated version of v7. I tried to
get rid of these local variables and refactored the code to make the logic
more straightforward instead of going back and forth.

0001 and 0002 are v8. 0003 is another refactoring attempt to make the code
more straightforward. I did not squash 0003 into the previous patches since
you might not like it.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


v8-0001-Optimize-autoprewarm-with-read-streams.patch
Description: Binary data


v8-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch
Description: Binary data


v8-0003-Refactor-code-more.patch
Description: Binary data


Re: pgsql: Add support for OAUTHBEARER SASL mechanism

2025-04-01 Thread Christoph Berg
Re: Jacob Champion
> (That means that Windows builds --with-libcurl are similarly broken, I
> think. Not that Windows packagers will want to use --with-libcurl --
> it doesn't do anything -- but it should build.)

Does --with-libcurl still do anything useful if this feature test
fails? From what you are saying, the answer is "no", and I can see
more "not on this platform" error messages in other callbacks.

This should be documented in doc/src/sgml/installation.sgml.

> We could change how this works a bit for the proposed libpq-oauth.so
> plugin, and only build it if we have a workable implementation. I do
> like having these other platforms compile the Curl code, though, since
> we'd prefer to keep the build clean for a future Windows
> implementation...

I would prefer to get an error from configure if the feature doesn't
do anything on my platform. The current way is confusing. If future
users of libcurl change that, the configure test can still be changed.

With the libpq-oauth split, this makes even more sense because
building a library that always throws an error isn't very useful.
(Don't build that file at all if the feature doesn't work.)

Since oauth/curl have some security implications, would it make more
sense to call the switch --enable-oauth (-Doauth) so users could
control better what features their libpq is going to have? Perhaps
some other feature (pg_service as URL?) is going to need libcurl as
well, but it should be configurable separately.

Christoph




Re: Test to dump and restore objects left behind by regression

2025-04-01 Thread Ashutosh Bapat
On Tue, Apr 1, 2025 at 11:52 AM Alvaro Herrera  wrote:
>
> On 2025-Mar-31, Daniel Gustafsson wrote:
>
> > Given where we are in the cycle, it seems to make sense to stick to using 
> > the
> > schedule we already have rather than invent a new process for generating it,
> > and work on that for 19?
>
> No objections to that.  I'll see about getting this committed during my
> morning today, so that I have plenty of time to watch the buildfarm.

Thanks Alvaro.
Just this morning, I found something that looks like another bug in
statistics dump/restore [1]. As Daniel has expressed upthread [2], we
should go ahead and commit the test even if the bug is not fixed. But
in case it creates a lot of noise and makes the buildfarm red, we
could suppress the failure by not dumping statistics for comparison
until the bug is fixed. PFA a patchset that reintroduces 0003, which
suppresses the statistics dump, in case we think it's needed. I have
made some minor cosmetic changes to 0001 and 0002 as well.

I will also watch the buildfarm once you commit the patch.

[1] 
https://www.postgresql.org/message-id/CAExHW5sFOgcUkVtZ8=QCAE+jv=sbndbkq0xzcnjth7019zm...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
From 28a146c7cdaf581d889cec90f541bb42046e7904 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 24 Mar 2025 11:21:12 +0530
Subject: [PATCH 2/3] Use only one format and make the test run default

According to Alvaro (and I agree with him), the test should be run by
default. Otherwise we get to know about a bug only after a buildfarm
animal where it's enabled reports a failure. Further, testing only one
format may suffice, since all the formats have shown the same bugs so
far.

If we use the --directory format we can use -j, which reduces the time taken
by the dump/restore test by about 12%.

This patch removes the PG_TEST_EXTRA option and runs the test only in
directory format with parallelism enabled.

Note for committer: If we decide to accept this change, it should be
merged with the previous commit.
---
 doc/src/sgml/regress.sgml  | 12 -
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 75 +-
 2 files changed, 24 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 237b974b3ab..0e5e8e8f309 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -357,18 +357,6 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
   
  
 
-
-
- regress_dump_test
- 
-  
-   When enabled, src/bin/pg_upgrade/t/002_pg_upgrade.pl
-   tests dump and restore of regression database left behind by the
-   regression run. Not enabled by default because it is time and resource
-   consuming.
-  
- 
-

 
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 8d22d538529..71dc25ca938 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -268,16 +268,9 @@ else
 	# should be done in a separate TAP test, but doing it here saves us one full
 	# regression run.
 	#
-	# This step takes several extra seconds and some extra disk space, so
-	# requires an opt-in with the PG_TEST_EXTRA environment variable.
-	#
 	# Do this while the old cluster is running before it is shut down by the
 	# upgrade test.
-	if (   $ENV{PG_TEST_EXTRA}
-		&& $ENV{PG_TEST_EXTRA} =~ /\bregress_dump_test\b/)
-	{
-		test_regression_dump_restore($oldnode, %node_params);
-	}
+	test_regression_dump_restore($oldnode, %node_params);
 }
 
 # Initialize a new node for the upgrade.
@@ -590,53 +583,33 @@ sub test_regression_dump_restore
 	$dst_node->append_conf('postgresql.conf', 'autovacuum = off');
 	$dst_node->start;
 
-	# Test all formats one by one.
-	for my $format ('plain', 'tar', 'directory', 'custom')
-	{
-		my $dump_file = "$tempdir/regression_dump.$format";
-		my $restored_db = 'regression_' . $format;
-
-		# Use --create in dump and restore commands so that the restored
-		# database has the same configurable variable settings as the original
-		# database and the plain dumps taken for comparsion do not differ
-		# because of locale changes. Additionally this provides test coverage
-		# for --create option.
-		$src_node->command_ok(
-			[
-'pg_dump', "-F$format", '--no-sync',
-'-d', $src_node->connstr('regression'),
-'--create', '-f', $dump_file
-			],
-			"pg_dump on source instance in $format format");
+	my $dump_file = "$tempdir/regression.dump";
 
-		my @restore_command;
-		if ($format eq 'plain')
-		{
-			# Restore dump in "plain" format using `psql`.
-			@restore_command = [ 'psql', '-d', 'postgres', '-f', $dump_file ];
-		}
-		else
-		{
-			@restore_command = [
-'pg_restore', '--create',
-'-d', 'postgres', $dump_file
-			];
-		}
-		$dst_node->command_ok(@restore_command,
-			"restored dump taken in $format format 

pg_recvlogical cannot create slots with failover=true

2025-04-01 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

While reviewing threads I found $SUBJECT.

I think it does not affect the output, but I could not find a strong reason not to
support it. Also, this could be used for testing purposes, i.e., DBAs could verify
that slot synchronization works on their system using only pg_recvlogical.

The attached patch implements it. Since the -f/-F options are already taken, -O was
chosen for the short form. Better ideas are very welcome.

How do you feel?

Best regards,
Hayato Kuroda
FUJITSU LIMITED



0001-Allow-pg_recvlogical-to-create-slots-with-failover-t.patch
Description:  0001-Allow-pg_recvlogical-to-create-slots-with-failover-t.patch


Re: [PATCH] Fix build on MINGW on ARM64

2025-04-01 Thread vignesh C
On Sun, 2 Feb 2025 at 00:52, Lars Kanis  wrote:
>
> This patch limits the workaround of using __buildin_setjmp on the
> Windows MINGW platform. This workaround is only necessary for legacy
> MSVCRT based toolchain, but not for UCRT based. It is not available at
> all on clang on ARM64 resulting in the following compiler error:
>
>   error: __builtin_longjmp is not supported for the current target
>
> This patch is used since years in MSYS2 packages:
>   
> https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-postgresql/postgresql-14.0-use-mingw-setjmp-on-ucrt.patch
>
> It is also used in ruby-pg to allow compiling for
> aarch64-w64-windows-gnu: https://github.com/ged/ruby-pg/pull/626/files
>
> It would be nice if this patch could be merged upstream.

Are there any known issues with using __builtin_setjmp? I'm asking
because the comment mentions long-standing issues in MinGW's setjmp:
"However, it seems that MinGW-64 has some longstanding issues
in its setjmp support, so on that toolchain we cheat and use gcc's
builtins."  Also, a few users have reported segfaults when using setjmp
with MinGW, as in [1].
[1] - 
https://stackoverflow.com/questions/53709069/setjmp-longjmp-in-x86-64-w64-mingw32

Regards,
Vignesh




Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query).

2025-04-01 Thread vignesh C
On Tue, 1 Apr 2025 at 08:43, jian he  wrote:
>
> On Mon, Mar 31, 2025 at 11:27 PM Fujii Masao
>  wrote:
> >
> > Regarding the patch, here are some review comments:
> >
> > +   errmsg("cannot copy from 
> > materialized view when the materialized view is not populated"),
> >
> > How about including the object name for consistency with
> > other error messages in BeginCopyTo(), like this?
> >
> > errmsg("cannot copy from unpopulated materialized view \"%s\"",
> >RelationGetRelationName(rel)),
> >
> >
> > +   errhint("Use the REFRESH 
> > MATERIALIZED VIEW command populate the materialized view first."));
> >
> > There seems to be a missing "to" just after "command".
> > Should it be "Use the REFRESH MATERIALIZED VIEW command to
> > populate the materialized view first."? Or we could simplify
> > the hint to match what SELECT on an unpopulated materialized
> > view logs: "Use the REFRESH MATERIALIZED VIEW command.".
> >
> based on your suggestion, i changed it to:
>
> if (!RelationIsPopulated(rel))
> ereport(ERROR,
> errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("cannot copy from unpopulated
> materialized view \"%s\"",
> RelationGetRelationName(rel)),
> errhint("Use the REFRESH MATERIALIZED VIEW
> command to populate the materialized view first."));
>
>
> >
> > The copy.sgml documentation should clarify that COPY TO can
> > be used with a materialized view only if it is populated.
> >
> "COPY TO can be used only with plain tables, not views, and does not
> copy rows from child tables or child partitions"
> i changed it to
> "COPY TO can be used with plain tables and materialized views, not
> regular views, and does not copy rows from child tables or child
> partitions"
>
> Another alternative wording I came up with:
> "COPY TO can only be used with plain tables and materialized views,
> not regular views. It also does not copy rows from child tables or
> child partitions."

One thing I noticed is that if the materialized view is not refreshed,
the user will get stale data:
postgres=# create table t1(c1 int);
CREATE TABLE
postgres=# create materialized view mv2 as select * from t1;
SELECT 0

postgres=# insert into t1 values(10);
INSERT 0 1
postgres=# select * from t1;
 c1

 10
(1 row)

-- Before refresh the data will not be selected
postgres=# copy mv2 to stdout with (header);
c1

-- After refresh the data will be available
postgres=# refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
postgres=# copy mv2 to stdout with (header);
c1
10

Should we document this?

The following can be changed to keep it consistent:
+copy matview1(id) TO stdout with (header);
+copy matview2 TO stdout with (header);
To:
COPY matview1(id) TO stdout with (header);
COPY matview2 TO stdout  with (header);

Regards,
Vignesh




Re: PRI?64 vs Visual Studio (2022)

2025-04-01 Thread Peter Eisentraut

On 31.03.25 08:28, Kyotaro Horiguchi wrote:

If you're already aware of this and have taken it into account, please
feel free to ignore this.

As described in the recent commit a0ed19e0a9e, many %ll? format
specifiers are being replaced with the PRId64 family of macros.

I hadn’t paid much attention to this before, but I happened to check
how this behaves on Windows, and it seems that with VS2022, PRId64
expands to "%lld". As a result, I suspect the gettext message catalog
won't match these messages correctly.


I think this is working correctly.  Gettext has a built-in mechanism to 
translate the %<PRId64> directive in the message catalog back to the 
appropriate %lld or %ld; see the gettext documentation on this.
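
As a concrete illustration, here is a standalone sketch (not PostgreSQL
code) showing how the same source-level format string becomes different
directives on different platforms:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int64_t		blocks = 1472;

	/*
	 * "%" PRId64 compiles to "%lld" with VS2022/UCRT and MinGW-UCRT, but to
	 * "%ld" on most LP64 Unix systems; gettext's format-directive handling
	 * is what keeps translated catalogs matching both.
	 */
	printf("processed %" PRId64 " blocks\n", blocks);
	return 0;
}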






Re: Better HINT message for "unexpected data beyond EOF"

2025-04-01 Thread Jakub Wartak
On Thu, Mar 27, 2025 at 4:00 PM Christoph Berg  wrote:
>
> Re: Robert Haas
> > I think that would be better than what we have now, but I still wonder
> > if we should give some kind of a hint that an external process may be
> > doing something to that file. Jakub and I may be biased by having just
> > seen a case of exactly that in the field, but I wonder now how many
> > 'data beyond EOF' messages are exactly that -- and it's not like the
> > user is going to guess that 'data beyond EOF' might mean that such a
> > thing occurred.
>
> HINT:  Did anything besides PostgreSQL touch that file?

Thread bump. So we have the following candidates:

1. remove it as Andres stated:
ERROR:  unexpected data beyond EOF in block 1472 of relation base/5/16387

2a. Robert's idea
ERROR:  unexpected data beyond EOF in block 1472 of relation base/5/16387
HINT:  This has been observed with PostgreSQL files being overwritten.

2b. Christoph's idea
ERROR:  unexpected data beyond EOF in block 1472 of relation base/5/16387
HINT:  Did anything besides PostgreSQL touch that file?

Anything else? #1 has one advantage: we don't need to provide 11
translations inside src/backend/po/*.po (I could use Google Translate
when proposing the patch, but I do not take any responsibility for what it
generates ;))

Another question: should we back-patch this? I believe we should (?)

-J.




Re: Statistics Import and Export

2025-04-01 Thread Jeff Davis
On Mon, 2025-03-31 at 13:39 -0400, Robert Haas wrote:
> +1. I think I said this before, but I don't think it's correct to
> regard the statistics as part of the database. It's great for
> pg_upgrade to preserve them, but I think doing so for a regular dump
> should be opt-in.

I'm confused about the timing of this message -- we already have an
Open Item for 18 to make this decision. After commit bde2fb797a,
changing the default is a one-line change, so there's no technical
problem.

I thought the general plan was to decide during beta. Would you like to
make the decision now for some reason?

Regards,
Jeff Davis





RE: Fix 035_standby_logical_decoding.pl race conditions

2025-04-01 Thread Hayato Kuroda (Fujitsu)
Dear Bertrand,

> > > s/to avoid the seeing a xl_running_xacts/to avoid seeing a 
> > > xl_running_xacts/?
> >
> > Fixed.

Sorry, I misunderstood your comment and fixed it incorrectly. I will address it
in the next version.

> === 1
> 
> +* XXX What value should we return here? Originally this
> returns the
> +* inserted location of RUNNING_XACT record. Based on that,
> here
> +* returns the latest insert location for now.
> +*/
> +   return GetInsertRecPtr();
> 
> Looking at the LogStandbySnapshot() that are using the output lsn, i.e:
> 
> pg_log_standby_snapshot()
> BackgroundWriterMain()
> ReplicationSlotReserveWal()
> 
> It looks ok to me to use GetInsertRecPtr().
> 
> But if we "really" want to produce a "new" WAL record, what about using
> LogLogicalMessage()? It could also be used for debugging purpose. Bonus point:
> it does not need wal_level to be set to logical. Thoughts?

Right. Similarly, the SQL function pg_logical_emit_message() is sometimes used
for testing purposes, as are advance_wal() and emit_wal() in Cluster.pm. Even
so, we have not found a use case yet, so I want to keep things as they are for
now and update them based on future needs.

I'll investigate another point [1] and then will post new version.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1%2Bx5-eOn5%2BMW6FiUjB_1bBCH8jCCARC1uMrx6erZ3J73w%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED 





Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Aleksander Alekseev
Hi Peter,

> I'm now very close to committing everything. Though I do still want
> another pair of eyes on the newer
> 0003-Improve-skip-scan-primitive-scan-scheduling.patch stuff before
> commiting (since I still intend to commit all the remaining patches
> together).

Can you think of any tests specifically for 0003, or is relying on the
added Assert()s the best we can do? Same question for 0002.

I can confirm that v33 applies and passes the tests.

0002 adds _bt_set_startikey() to nbtutils.c, but it is not well covered
by tests; many branches of the new code are never executed.

```
@@ -2554,9 +2865,20 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
  */
 if (requiredSameDir)
 *continuescan = false;
+else if (unlikely(key->sk_flags & SK_BT_SKIP))
+{
+/*
+ * If we're treating scan keys as nonrequired, and encounter a
+ * skip array scan key whose current element is NULL, then it
+ * must be a non-range skip array
+ */
+Assert(forcenonrequired && *ikey > 0);
+return _bt_advance_array_keys(scan, NULL, tuple, tupnatts,
+  tupdesc, *ikey, false);
+}
```

This branch is also never executed during the test run.

In 0003:

```
@@ -2006,6 +2008,10 @@ _bt_advance_array_keys(IndexScanDesc scan,
BTReadPageState *pstate,
 else if (has_required_opposite_direction_only && pstate->finaltup &&
  unlikely(!_bt_oppodir_checkkeys(scan, dir, pstate->finaltup)))
 {
+/*
+ * Make sure that any SAOP arrays that were not marked required by
+ * preprocessing are reset to their first element for this direction
+ */
 _bt_rewind_nonrequired_arrays(scan, dir);
 goto new_prim_scan;
 }
```

This branch is never executed either. That said, technically there
is no new code here.

For your convenience I uploaded a complete HTML code coverage report
(~36 Mb) [1].

[1]: 
https://drive.google.com/file/d/1breVpHapvJLtw8AlFAdXDAbK8ZZytY6v/view?usp=sharing

--
Best regards,
Aleksander Alekseev




Re: Statistics Import and Export

2025-04-01 Thread Robert Haas
On Mon, Mar 31, 2025 at 6:04 PM Jeff Davis  wrote:
> On Mon, 2025-03-31 at 13:39 -0400, Robert Haas wrote:
> > +1. I think I said this before, but I don't think it's correct to
> > regard the statistics as part of the database. It's great for
> > pg_upgrade to preserve them, but I think doing so for a regular dump
> > should be opt-in.
>
> I'm confused about the timing of this message -- we already have an
> Open Item for 18 to make this decision. After commit bde2fb797a,
> changing the default is a one-line change, so there's no technical
> problem.
>
> I thought the general plan was to decide during beta. Would you like to
> make the decision now for some reason?

I don't think I was aware of the open item; I was just catching up on
email. But I also don't really see the value of waiting until beta to
make this decision. I seriously doubt that my opinion is going to
change. Maybe other people's will, though: I can only speak for
myself.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: AIO writes vs hint bits vs checksums

2025-04-01 Thread Nico Williams
On Wed, Oct 30, 2024 at 02:16:51PM +0200, Heikki Linnakangas wrote:
> Acquiring the exclusive lock in step 4 is just a way to wait for all the
> existing share-lockers to release the lock. You wouldn't need to block new
> share-lockers. We have LW_WAIT_UNTIL_FREE, which is almost what we need, but
> it currently ignores share-lockers. So doing this "properly" would require
> more changes to LWLocks. Briefly acquiring an exclusive lock seems
> acceptable though.

The problem is starvation.  For this you really want something more like
rwlocks that do not have the writer starvation problem.  But rwlocks
have other problems too, like in this case forcing readers to wait.

What you want here is something more like a barrier: readers that did not
see BM_IO_IN_PROGRESS set on the page get to act as though it's not set,
readers that did see it set do not, and the process that set the bit waits
for the first group of readers without ever blocking the second group.
That's something akin to an rwlock, but better -- in fact, it resembles RCU.
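
A minimal standalone sketch of that shape (illustrative only, not PostgreSQL
code; the generation/parity scheme, the names, and the spin-wait are
assumptions, and it assumes only one process flips the generation at a time):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned generation;		/* flipped by the waiting process */
static _Atomic unsigned nreaders[2];	/* readers per generation parity */

static unsigned
reader_enter(void)
{
	for (;;)
	{
		unsigned	g = atomic_load(&generation) & 1;

		atomic_fetch_add(&nreaders[g], 1);
		if ((atomic_load(&generation) & 1) == g)
			return g;			/* safely registered under generation g */
		/* generation flipped underneath us; back out and retry */
		atomic_fetch_sub(&nreaders[g], 1);
	}
}

static void
reader_exit(unsigned g)
{
	atomic_fetch_sub(&nreaders[g], 1);
}

/*
 * The "writer": flip the generation, then wait only for readers that
 * registered under the old one.  New readers are never blocked.
 */
static void
wait_for_old_readers(void)
{
	unsigned	old = atomic_fetch_add(&generation, 1) & 1;

	while (atomic_load(&nreaders[old]) != 0)
		;						/* real code would sleep or use a condvar */
}

int
main(void)
{
	unsigned	g = reader_enter();

	reader_exit(g);
	wait_for_old_readers();
	printf("old readers drained\n");
	return 0;
}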

Nico
-- 




Re: Fix 035_standby_logical_decoding.pl race conditions

2025-04-01 Thread Bertrand Drouvot
Hi,

On Tue, Apr 01, 2025 at 04:55:06PM +0530, Amit Kapila wrote:
> On Tue, Apr 1, 2025 at 2:02 PM Bertrand Drouvot
>  wrote:
> 
> > But if we "really" want to produce a "new" WAL record, what about using
> > LogLogicalMessage()?
> >
> 
> We are using injection points for testing purposes, which means the
> caller is aware of skipping the running_xacts record during the test
> run. So, there doesn't seem to be any reason to do anything extra.

Agreed, the idea was to provide extra debugging info for the tests. We can just
keep it in mind should we need it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Memoize ANTI and SEMI JOIN inner

2025-04-01 Thread Richard Guo
On Mon, Mar 31, 2025 at 7:33 PM Andrei Lepikhov  wrote:
> and I don't get the case. As I see, ANTI/SEMI join just transforms to
> the regular join and it is still not the case. May you be more specific?

Upthread, you said that a qual contained in ppi_clauses will also be
included in extra->restrictlist.  I provided this example to show that
this is not true.  And it seems to me that this discrepancy makes the
check I mentioned earlier not reliable in all cases.  As I explained
earlier, this check could pass even if a restriction clause isn't
parameterized, as long as another join clause, which doesn't belong to
the current join, is included in ppi_clauses.  This is not correct.

This isn't about your patch, but about the master code.  However, if
this code is incorrect, your patch will also behave incorrectly, since
your patch relies on and extends this check.

Thanks
Richard




Re: TEMP_CONFIG vs test_aio

2025-04-01 Thread Noah Misch
On Tue, Apr 01, 2025 at 03:42:44PM -0400, Andres Freund wrote:
> The reason for the failure is simple, the buildfarm animal specifies
> io_method=io_uring (thanks to "cookt" for setting that up so quickly, whoever
> you are :)) and the test is assuming that the -c io_method=... it passes to
> initdb is actually going to be used, but it's overwritten by the TEMP_CONFIG.

> The reason that the test passes -c io_method= to initdb is that I want to
> ensure initdb passes with all the supported io_methods.  That still happens
> with TEMP_CONFIG specified, it's just afterwards over-written.
> 
> I could just append io_method=$io_method again after $node->init(), but then I

That would be the standard way.  Here's the Cluster.pm comment that tries to
articulate the vision:

# If a setting tends to affect whether tests pass or fail, print it 
after
# TEMP_CONFIG.  Otherwise, print it before TEMP_CONFIG, thereby 
permitting
# overrides.  Settings that merely improve performance or ease debugging
# belong before TEMP_CONFIG.

Since anything initdb adds is "before TEMP_CONFIG", we have this outcome.

> couldn't verify that initdb actually ran with the to-be-tested io method.
> 
> 
> Does anybody have a good suggestion for how to fix this?
> 
> 
> The least bad idea I can think of is for the test to check if
> PostgreSQL::Test::Utils::slurp_file($ENV{TEMP_CONFIG}) contains the string
> io_method and to append the io_method again to the config if it does.  But
> that seems rather ugly.
> 
> 
> Does anybody have a better idea?

Options:

1. Append, as you mention above

2. Slurp TEMP_CONFIG, as you mention above

3. Slurp postgresql.conf, and fail a test if it doesn't contain io_method.
   Then append.  If initdb fails to insert io_method, the test will catch it.

4. Run initdb outside of Cluster.pm control, then discard it

My preference order would be roughly 1,3,2,4.  The fact that initdb creates a
data directory configured for a particular io_method doesn't prove that initdb
ran with that method, so I don't see 2-4 as testing a lot beyond 1.




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-04-01 Thread Alvaro Herrera
On 2025-Mar-28, jian he wrote:

> ATPrepAddPrimaryKey
> + if (!conForm->convalidated)
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("not-null constraint \"%s\" of table \"%s\" has not been validated",
> +   NameStr(conForm->conname),
> +   RelationGetRelationName(rel)),
> + errhint("You will need to use ALTER TABLE ... VALIDATE CONSTRAINT to
> validate it."));
> 
> I think the error message is less helpful.
> Overall, I think we should say that:
> to add the primary key on column x requires a validated not-null
> constraint on column x.

I think you're right that this isn't saying what the problem is; we
should be saying something like

ERROR:  cannot add primary key because of invalid not-null constraint 
"the_constr"
HINT:  You will need to use ALTER TABLE .. VALIDATE CONSTRAINT to validate it.

> 
> i think your patch messed up with pg_constraint.conislocal.
> for example:
> 
> CREATE TABLE parted (id bigint default 1,id_abc bigint) PARTITION BY LIST 
> (id);
> alter TABLE parted add CONSTRAINT dummy_constr not null id not valid;
> CREATE TABLE parted_1 (id bigint default 1,id_abc bigint);
> alter TABLE parted_1 add CONSTRAINT dummy_constr not null id;
> ALTER TABLE parted ATTACH PARTITION parted_1 FOR VALUES IN ('1');

It's still not clear to me what to do to fix this problem.  But while
trying to understand it, I had the chance to rework the pg_dump code
somewhat, so here it is.  Feel free to propose fixes on top of this.
(BTW, I think the business of assigning to tbinfo->checkexprs both the
block for check constraints and the one for not-null constraints is
bogus.  I didn't find what this breaks, but it looks wrong.  We probably
need another struct _constraintInfo pointer in TableInfo.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"¿Cómo puedes confiar en algo que pagas y que no ves,
y no confiar en algo que te dan y te lo muestran?" (Germán Poo)
>From c0109712638be88ba94a12fbc6d20ecfd956f121 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Tue, 1 Apr 2025 22:13:29 +0200
Subject: [PATCH v6] NOT NULL NOT VALID

Author: Rushabh Lathia 
Author: Jian He 
Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-rt...@mail.gmail.com
---
 contrib/postgres_fdw/postgres_fdw.c   |  26 ++-
 doc/src/sgml/catalogs.sgml|  11 +-
 doc/src/sgml/ref/alter_table.sgml |   8 +-
 src/backend/access/common/tupdesc.c   |  11 +-
 src/backend/bootstrap/bootstrap.c |   3 +
 src/backend/catalog/genbki.pl |   3 +
 src/backend/catalog/heap.c|  19 +-
 src/backend/catalog/pg_constraint.c   |  52 +++--
 src/backend/commands/tablecmds.c  | 242 +++---
 src/backend/executor/execMain.c   |   1 +
 src/backend/jit/llvm/llvmjit_deform.c |  10 +-
 src/backend/optimizer/util/plancat.c  |   6 +-
 src/backend/parser/gram.y |   5 +-
 src/backend/parser/parse_utilcmd.c|  14 ++
 src/backend/utils/cache/catcache.c|   1 +
 src/backend/utils/cache/relcache.c|   3 +-
 src/bin/pg_dump/pg_dump.c | 190 +++--
 src/bin/pg_dump/pg_dump.h |   5 +-
 src/bin/psql/describe.c   |   9 +-
 src/include/access/tupdesc.h  |   9 +-
 src/include/catalog/pg_attribute.h|   5 +-
 src/include/catalog/pg_constraint.h   |   4 +-
 src/test/regress/expected/alter_table.out |  69 ++
 src/test/regress/expected/constraints.out | 197 ++
 src/test/regress/sql/alter_table.sql  |  14 ++
 src/test/regress/sql/constraints.sql  | 151 ++
 26 files changed, 970 insertions(+), 98 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index ac14c06c715..e9296e2d4bd 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5549,7 +5549,15 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 			   "SELECT relname, "
 			   "  attname, "
 			   "  format_type(atttypid, atttypmod), "
-			   "  attnotnull, "
+			   "  attnotnull, ");
+
+		/* NOT VALID NOT NULL columns are supported since Postgres 18 */
+		if (PQserverVersion(conn) >= 18)
+			appendStringInfoString(&buf, "attnotnullvalid, ");
+		else
+			appendStringInfoString(&buf, "attnotnull AS attnotnullvalid, ");
+
+		appendStringInfoString(&buf,
 			   "  pg_get_expr(adbin, adrelid), ");
 
 		/* Generated columns are supported since Postgres 12 */
@@ -5651,6 +5659,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 char	   *attname;
 char	   *typename;
 char	   *attnotnull;
+char	   *attnotnullvalid;
 char	   *attgenerated;
 char	   *attdefault;
 char	   *collname;
@@ -5663,14 +5672,15 @@ po

Re: [PATCH] Fix build on MINGW on ARM64

2025-04-01 Thread Andrew Dunstan



On 2025-04-01 Tu 11:15 AM, Andrew Dunstan wrote:


On 2025-04-01 Tu 8:47 AM, vignesh C wrote:

On Tue, 1 Apr 2025 at 16:02, Andrew Dunstan  wrote:


On 2025-04-01 Tu 5:16 AM, vignesh C wrote:
On Sun, 2 Feb 2025 at 00:52, Lars Kanis  
wrote:

This patch limits the workaround of using __buildin_setjmp on the
Windows MINGW platform. This workaround is only necessary for legacy
MSVCRT based toolchain, but not for UCRT based. It is not 
available at

all on clang on ARM64 resulting in the following compiler error:

    error: __builtin_longjmp is not supported for the current target

This patch is used since years in MSYS2 packages:
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-postgresql/postgresql-14.0-use-mingw-setjmp-on-ucrt.patch

It is also used in ruby-pg to allow compiling for
aarch64-w64-windows-gnu: 
https://github.com/ged/ruby-pg/pull/626/files


It would be nice if this patch could be merged upstream.

Are there any known issues with using __builtin_setjmp? I'm asking
because the comment mentions about the long standing issues in its
setjmp "However, it seems that MinGW-64 has some longstanding issues
in its setjmp support,  so on that toolchain we cheat and use gcc's
builtins. Also few users have reported segfaults when using setjmp
with MinGW as in [1].
[1] - 
https://stackoverflow.com/questions/53709069/setjmp-longjmp-in-x86-64-w64-mingw32


That report is from quite a few years ago, so I'm not sure it really 
helps.


If one of you would add this to the next CF we could see how the CFbot
reacts to it. In general it looks sane.

There is an existing CF entry for this at [1]. If no one picks this
till the end of this CF, we can move it to the next CF.
[1] - https://commitfest.postgresql.org/patch/5610/




Somehow I missed that. OK, looks good, will commit shortly.



Done


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: TEMP_CONFIG vs test_aio

2025-04-01 Thread Andres Freund
Hi,

On 2025-04-01 20:12:29 +0000, Todd Cook wrote:
> On 4/1/25, 3:42 PM, "Andres Freund"  wrote:
> > I just committed the tests for AIO, and unfortunately they (so far) fail on
> > one buildfarm animal:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bumblebee&dt=2025-04-01%2018%3A55%3A01
> > 
> >
> > The reason for the failure is simple, the buildfarm animal specifies
> > io_method=io_uring (thanks to "cookt" for setting that up so quickly, 
> > whoever
> > you are :)) and the test is assuming that the -c io_method=... it passes to
> > initdb is actually going to be used, but it's overwritten by the 
> > TEMP_CONFIG.
>
> You're welcome!
>
> Is there an alternate way I could use to configure the io_method on bumblebee?

You could use PG_TEST_INITDB_EXTRA_OPTS, but I think you did it the right
way.

For one, using PG_TEST_INITDB_EXTRA_OPTS would probably require changing the
buildfarm code, because the buildfarm code filters out environment variables
that aren't on an allowlist (I really dislike that).

IOW, I think we should figure out a way to deal with TEMP_CONFIG containing
io_method=io_uring, I just don't really know how yet.

Greetings,

Andres




Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-01 Thread David Rowley
On Tue, 1 Apr 2025 at 20:52, Ilia Evdokimov
 wrote:
> With the feature freeze coming up soon, I’d like to ask: do we plan to
> include this patch in v18?

I don't think there's a clear enough consensus yet on what EXPLAIN
should display. We'd need that beforehand. There are now less than 7
days to get that, so it might be unrealistic.

I tried to move things along to address Tom's concern about not
wanting to show this in EXPLAIN's standard output. I suggested in [1]
that we could use EXPLAIN ANALYZE, but nobody commented on that.
EXPLAIN ANALYZE is much more verbose than EXPLAIN already, and if we
put these in EXPLAIN ANALYZE then the viewer can more easily compare
planned vs actual. I had mentioned that Hash Join does something like
this for buckets.

David

[1] 
https://postgr.es/m/CAApHDvqPkWmz1Lq23c9CD+mAW=hgprd289tc30f3h1f6ng5...@mail.gmail.com




Re: Add partial :-variable expansion to psql \copy

2025-04-01 Thread Corey Huinker
>
> I'm hesitating about the right syntax, though, for an input backslash
> command which in effect would really only apply to COPY? ISTM that \g* is
> used for "go", i.e. a semi-colon replacement which executes the SQL, and we
> should want the same thing, which suggests:
>
making it a \g-variant does seem to be natural.


> COPY "foo" FROM STDIN \gi filename
>
> COPY "foo" FROM STDIN \gi command...|
>
> Another drawback is that it creates an error path:
>
> COPY "foo" FROM 'server-side-file' \gi 'client-side-file'
>
Consider the case:

 INSERT INTO mytable (x) VALUES(1) \gi '/path/to/local/file'

Is this an error because we'd be teeing up a file for a command that cannot
consume one? How much do we parse the buffer to learn whether we have a
query or a COPY command in the buffer? Maybe in the future other commands
will take uploaded files, but I would imagine those operations would just
leverage the existing COPY functionality inside whatever additional
benefits they provide. Until then, this command can only really be used for
single COPY foo FROM STDIN statements. With that in mind, I think the name
\copyfrom reflects the highly specific utility of the command, and sets
boundaries for what is reasonable to have in the query buffer (i.e. one
COPY statement, possibly multiline), leaving \gi open for later, more
flexible uses.

Looking at the code a bit, \copyfrom would use a variant of do_copy() with
a much-abbreviated parse_slash_copy(): no construction of the COPY statement
whatsoever, just use the query buffer and let the regular SendQuery() error
handling take over.
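
To make the mechanics concrete, here is a minimal standalone libpq sketch of
what such a command would do under the hood (the connection string, table
name, and file path are assumptions, and return values are mostly unchecked
for brevity; this is not psql code):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;
	FILE	   *fp;
	char		buf[8192];
	size_t		n;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	/* the "query buffer": a plain server-side COPY ... FROM STDIN */
	res = PQexec(conn, "COPY foo FROM STDIN");
	if (PQresultStatus(res) != PGRES_COPY_IN)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}
	PQclear(res);

	/* stream the client-side file as the COPY data */
	fp = fopen("client-side-file", "r");
	if (fp == NULL)
	{
		PQputCopyEnd(conn, "could not open client-side file");
		return 1;
	}
	while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
		PQputCopyData(conn, buf, (int) n);
	fclose(fp);
	PQputCopyEnd(conn, NULL);

	/* collect the command result, as SendQuery()'s machinery would */
	res = PQgetResult(conn);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));
	PQclear(res);
	PQfinish(conn);
	return 0;
}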


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Peter Geoghegan
On Tue, Apr 1, 2025 at 10:40 AM Matthias van de Meent
 wrote:
> The review below was started for v31, then adapted to v32 when that
> arrived. I haven't cross-checked this with v33.

There have been hardly any changes to 0001- in quite a while, so that's fine.

> > Teach nbtree composite index scans to opportunistically skip over
> > irrelevant sections of composite indexes given a query with an omitted
> > prefix column.
>
> AFAIK, this is only the second reference to "composite" indexes, after
> a single mention in a comment in nbtsplitloc.c's _bt_strategy. IIUC,
> this refers to multi-column indexes, which is more frequently and more
> consistently used across the code base.

I don't think it makes much difference, but sure, I'll use the
multi-column index terminology. In both code comments, and in commit
messages.

> > When nbtree is passed input scan keys derived from a
> > query predicate "WHERE b = 5", new nbtree preprocessing steps now output
> > "WHERE a = ANY() AND b = 5" scan keys.
>
> [...] new nbtree preprocessing steps now output +the equivalent of+ [...]
>
> > Preprocessing can freely add a skip array before or after any input
> > ScalarArrayOp arrays.
>
> This implies skip arrays won't be generated for WHERE b = 5 (a
> non-SAOP scankey) or WHERE b < 3 (not SAOP either), which is probably
> not intentional, so a rewording would be appreciated.

I don't agree. Yes, that sentence (taken in isolation) does not make
it 100% unambiguous. But, would anybody ever actually be misled? Even
once, ever? The misinterpretation of my words that you're concerned
about is directly contradicted by the whole opening paragraph. Plus,
it just doesn't make any sense. Obviously, you yourself never actually
interpreted it that way. Right?

The paragraph that this sentence appears in is all about the various
ways in which SAOP arrays and skip arrays are the same thing, except
at the very lowest level of the _bt_advance_array_keys code. I think
that that context makes it particularly unlikely that anybody would
ever think that I mean to imply something about the ways in which
non-array keys can be composed alongside skip arrays.

I'm pushing back here because I think that there's a real cost to
using overly defensive language, aimed at some imaginary person. The
problem with catering to such a person is that, overall, the
clarifications do more harm than good. What seems to end up happening
(I speak from experience with writing overly defensive comments) is
that the superfluous clarifications are read (by actual readers) the
wrong way -- they're read as if we must have meant quite a lot more
than what we actually intended.

More generally, I feel that it's a mistake to try to interpret your
words on behalf of the reader. While it's definitely useful to try to
anticipate the ways in which your reader might misunderstand, you
cannot reasonably do the work for them. It's usually (though not
always) best to deal with anticipated points of confusion by subtly
constructing examples that suggest that some plausible wrong
interpretation is in fact wrong, without drawing attention to it.
Coming right out and telling the reader what to not think *is* an
important tool, but it should be reserved for cases where it's truly
necessary.

> // nbtpreprocesskeys.c
>
> > +static bool _bt_s**parray_shrink
>
> I'd like to understand the "shrink" here, as it's not entirely clear to me.
> The functions are exclusively called through dispatch in
> _bt_compare_array_scankey_args, and I'd expected that naming to be
> extended to these functions.

I don't see the problem? As Alena pointed out, we're shrinking the
arrays here (or are likely to), meaning that we're usually going to
eliminate some subset of array elements. It's possible that this will
happen more than once for a given array (since there could be more
than one "contradictory" key on input). An array can only shrink
within _bt_compare_array_scankey_args -- it can never have new array
elements added.

> > + * This qual now becomes "WHERE x = ANY('{every possible x value}') and y 
> > = 4"
>
> I understand what you're going for, but a reference that indexed NULLs
> are still handled correctly (rather than removed by the filter) would
> be appreciated, as the indicated substitution would remove NULL
> values. (This doesn't have to be much more than a footnote.)

Why, though? Obviously, '{every possible x value}' isn't a valid array
literal. Doesn't that establish that this isn't a 100% literal
statement of fact?

There are a handful of places where I make a similar statement (the
commit message is another one, as is selfuncs.c). I do make this same
point about NULLs being just another value in selfuncs.c, though only
because it's relevant there. I don't want to talk about NULLs here,
because they just aren't relevant to this high-level overview at the
top of _bt_preprocess_keys. We do talk about the issue of skip arrays
and IS NOT NULL constraints elsewhere in nbtree: we talk about tho

Re: AIO v2.5

2025-04-01 Thread Andres Freund
Hi,

On 2025-04-01 11:55:20 -0400, Andres Freund wrote:
> I haven't yet pushed the changes, but will work on that in the afternoon.

There are three different types of failures in the test_aio test so far:

1) TEMP_CONFIG

See 
https://postgr.es/m/zh5u22wbpcyfw2ddl3lsvmsxf4yvsrvgxqwwmfjddc4c2khsgp%40gfysyjsaelr5


2) Failure on at least some windows BF machines:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-04-01%2020%3A15%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-04-01%2019%3A03%3A07

Afaict the error is independent of AIO; it's instead just related to CREATE DATABASE
... STRATEGY wal_log failing on Windows.  In contrast to dropdb(), which does

/*
 * Force a checkpoint to make sure the checkpointer has received the
 * message sent by ForgetDatabaseSyncRequests.
 */
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | 
CHECKPOINT_WAIT);

/* Close all smgr fds in all backends. */

WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

createdb_failure_callback() does no such thing.  But it's rather likely that
we, bgwriter, checkpointer (and now IO workers) have files open for the target
database.

Note that the test is failing even with "io_method=sync", which obviously
doesn't use IO workers, so it's not related to that.


It's probably not a good idea to blockingly request a checkpoint and a barrier
inside a PG_TRY/PG_ENSURE_ERROR_CLEANUP() though, so this would need a bit
more rearchitecting.


I think I'm just going to make the test more lenient by not insisting that the
error is the first thing on psql's stderr.


3) Some subtests fail if RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE are 
defined:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-04-01%2019%3A23%3A07

# +++ tap check in src/test/modules/test_aio +++

#   Failed test 'worker: batch_start() leak & cleanup in implicit xact: 
expected stderr'
#   at t/001_aio.pl line 318.
#   'psql::4: ERROR:  starting batch while batch already 
in progress'
# doesn't match '(?^:open AIO batch at end)'


The problem is basically that the test intentionally forgets to exit batchmode
- normally that would trigger an error at the end of the transaction, which
the test verifies.  However, with RELCACHE_FORCE_RELEASE and
CATCACHE_FORCE_RELEASE defined, we get other code entering batchmode and
erroring out because batchmode isn't allowed to be entered recursively.


#0  pgaio_enter_batchmode () at 
../../../../../home/andres/src/postgresql/src/backend/storage/aio/aio.c:997
#1  0x55ec847959bf in read_stream_look_ahead (stream=0x55ecbcfda098)
at 
../../../../../home/andres/src/postgresql/src/backend/storage/aio/read_stream.c:438
#2  0x55ec84796514 in read_stream_next_buffer (stream=0x55ecbcfda098, 
per_buffer_data=0x0)
at 
../../../../../home/andres/src/postgresql/src/backend/storage/aio/read_stream.c:890
#3  0x55ec8432520b in heap_fetch_next_buffer (scan=0x55ecbcfd1c00, 
dir=ForwardScanDirection)
at 
../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:679
#4  0x55ec843259a4 in heapgettup_pagemode (scan=0x55ecbcfd1c00, 
dir=ForwardScanDirection, nkeys=1, key=0x55ecbcfd1620)
at 
../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:1041
#5  0x55ec843263ba in heap_getnextslot (sscan=0x55ecbcfd1c00, 
direction=ForwardScanDirection, slot=0x55ecbcfd0e18)
at 
../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:1420
#6  0x55ec8434ebe5 in table_scan_getnextslot (sscan=0x55ecbcfd1c00, 
direction=ForwardScanDirection, slot=0x55ecbcfd0e18)
at 
../../../../../home/andres/src/postgresql/src/include/access/tableam.h:1041
#7  0x55ec8434f786 in systable_getnext (sysscan=0x55ecbcfd8088) at 
../../../../../home/andres/src/postgresql/src/backend/access/index/genam.c:541
#8  0x55ec849c784a in SearchCatCacheMiss (cache=0x55ecbcf81000, nkeys=1, 
hashValue=3830081846, hashIndex=2, v1=403, v2=0, v3=0, v4=0)
at 
../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1543
#9  0x55ec849c76f9 in SearchCatCacheInternal (cache=0x55ecbcf81000, 
nkeys=1, v1=403, v2=0, v3=0, v4=0)
at 
../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1464
#10 0x55ec849c73ec in SearchCatCache1 (cache=0x55ecbcf81000, v1=403) at 
../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1332
#11 0x55ec849e5ae3 in SearchSysCache1 (cacheId=2, key1=403) at 
../../../../../home/andres/src/postgresql/src/backend/utils/cache/syscache.c:228
#12 0x55ec849d8c78 in RelationInitIndexAccessInfo (relation=0x7f6a85901c20)
at 
../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1456
#13 0x55ec849d8471 in RelationBuildDesc (targetRelId=2703, insertIt=true)
at 
../../../../../home/andres/src/postgresql/src/backend/utils/cach

Re: Making sslrootcert=system work on Windows psql

2025-04-01 Thread Jacob Champion
On Tue, Apr 1, 2025 at 2:05 PM George MacKerron  wrote:
>
> I was very pleased to see the sslrootcert=system connection option added in 
> Postgres 16 (I even blogged about it: 
> https://neon.tech/blog/avoid-mitm-attacks-with-psql-postgres-16). But 
> sslrootcert=system has not been widely supported by psql installations, 
> perhaps because people compiling Postgres haven’t always been aware of the 
> requirement to point OpenSSL in the direction of the system’s root CA 
> certificates.
>
> I’ve recently been trying to get it more widely supported, with some success 
> (details at end of this message).

(Thank you!)

> However, psql via the EnterpriseDB Windows installer still doesn’t support 
> sslrootcert=system,

Hm. I've been in contact with Kritika recently for the EDB macOS
fixes; hopefully we can get something figured out for Windows too.

> and I think a tiny patch is needed. The diff is attached, and can be seen in 
> context here: 
> https://github.com/postgres/postgres/compare/master...jawj:postgres:jawj-sslrootcert-system-windows
>
> Essentially, on Windows with OpenSSL 3.2+, it replaces 
> SSL_CTX_set_default_verify_paths(SSL_context) with 
> SSL_CTX_load_verify_store(SSL_context, "org.openssl.winstore:").
>
> I’m not a Windows or OpenSSL expert, but so far the patched code seems to 
> work in theory and in practice (sources below, and I’ve compiled and tested 
> it working on Windows 11 x64).

While this will get things working -- if you plan to use the Windows
store! -- I worry that it's an incompatible change, and anyone who is
actually happy with the way things currently work (i.e. not using the
EDB installers) will be broken. The meaning of `sslrootcert=system` is
"do whatever OpenSSL wants to do by default." That includes
modification by the OpenSSL environment variables, which (I think)
this patch disables.
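
For reference, the substance of the proposed change is roughly the following
(a standalone sketch assuming OpenSSL 3.2+ for the winstore URI; the helper
name and the little test program are illustrative, not the exact patch):

#include <openssl/ssl.h>
#include <stdio.h>

static int
load_default_roots(SSL_CTX *ctx)
{
#if defined(_WIN32) && OPENSSL_VERSION_NUMBER >= 0x30200000L
	/* use the Windows certificate store via OpenSSL's winstore provider */
	return SSL_CTX_load_verify_store(ctx, "org.openssl.winstore:");
#else
	/* honor OpenSSL's defaults, including SSL_CERT_FILE / SSL_CERT_DIR */
	return SSL_CTX_set_default_verify_paths(ctx);
#endif
}

int
main(void)
{
	SSL_CTX    *ctx = SSL_CTX_new(TLS_client_method());

	if (ctx == NULL)
		return 1;
	printf("default roots loaded: %s\n",
		   load_default_roots(ctx) == 1 ? "yes" : "no");
	SSL_CTX_free(ctx);
	return 0;
}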

The winstore is new to me. Is there no way to get OpenSSL to switch
its default store without code changes?

--Jacob




Re: System views for versions reporting

2025-04-01 Thread Dmitry Dolgov
> On Sun, Mar 23, 2025 at 06:21:33PM GMT, Tom Lane wrote:
>
> FWIW, I think the 0004 patch is about to be mostly obsoleted by
> Andrei's proposal at [1].  To the extent that it's not obsoleted,
> I question whether it's something we want at all, given the ground
> rule that unprivileged users are not supposed to have access to info
> about the server's filesystem.

To be clear -- I don't have a case for 0004 myself, except some vague
expectation that in certain situations it could be useful to know which
shared objects are loaded, even if they are not Postgres modules. Based
on the feedback from the original thread [2], there were a couple of similar
opinions; maybe folks could reply here on whether [1] would be sufficient
for them.

I agree with the argument about the privileges. If the 0004 patch is
found useful, it would make sense to allow only superusers to access
this view. I assume "revoke all on pg_system_libraries from public"
should be enough; would this address the concern?

[2]: 
https://www.postgresql.org/message-id/flat/znc72ymyoelvk5rjk5ub254v3qvcczfrk6autygjdobfvx2e7p%40s3dssvf34twa




Re: RFC: Logging plan of the running query

2025-04-01 Thread Robert Haas
On Tue, Apr 1, 2025 at 3:05 PM Sami Imseih  wrote:
> > Looking at ExplainAssembleLogOutput() is making me realize that
> > auto_explain is in serious need of some cleanup. That's not really the
> > fault of this patch, but the hack whereby we overwrite the [] that
> > would have surrounded the JSON output with {} is not very nice. I also
> > think that the auto_explain GUCs need rethinking. In theory, new
> > EXPLAIN options should be mirrored into auto_explain, but if you
> > compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
> > that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
> > that are known to regular EXPLAIN aren't known to auto_explain, and
> > any customizable options that use explain_per_plan_hook won't be able
> > to work with auto_explain, either. Changing this is non-trivial
> > because SERIALIZE, for example, can't work the same way for
> > auto_explain as it does for EXPLAIN, and a full solution might also
> > require user-visible changes like replacing
> > auto_explain.log_ with auto_explain.explain, so I don't
> > really know. Maybe we should just live with it the way it is for now,
> > but it doesn't look very nice.
>
> FWIW, I have been thinking about auto_explain for another task,
> remote plans for fdw [0], and perhaps there are now other good
> reasons, some that you mention, that can be simplified if "auto_explain"
> becomes a core feature. This could be a proposal taken up in 19.

For what we're talking about here, I don't think we would need to go
that far -- maybe put a few functions in core but no real need to move
the whole module into core. However, I don't rule out that there are
other reasons to do as you suggest.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward

2025-04-01 Thread Dimitrios Apostolou

On Sat, 29 Mar 2025, Dimitrios Apostolou wrote:


P.S.  What is the recommended way to test a change, besides a generic make
check? And how do I run selectively only the pg_dump/restore tests, in order
to speed up my development routine?


I have tested it with:

  make  -C src/bin/pg_dump  check

It didn't break any test, but I also don't see any difference; the
performance boost is noticeable only when restoring a huge archive that is
missing offsets.

Any volunteer to review this one-line patch?

Thanks,
Dimitris





TEMP_CONFIG vs test_aio

2025-04-01 Thread Andres Freund
Hi,

I just committed the tests for AIO, and unfortunately they (so far) fail on
one buildfarm animal:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bumblebee&dt=2025-04-01%2018%3A55%3A01


The reason for the failure is simple, the buildfarm animal specifies
io_method=io_uring (thanks to "cookt" for setting that up so quickly, whoever
you are :)) and the test is assuming that the -c io_method=... it passes to
initdb is actually going to be used, but it's overwritten by the TEMP_CONFIG.


I had hardened the test, with some pain, against PG_TEST_INITDB_EXTRA_OPTS
containing -c io_method=...:
# Want to test initdb for each IO method, otherwise we could just reuse
# the cluster.
#
# Unfortunately Cluster::init() puts PG_TEST_INITDB_EXTRA_OPTS after the
# options specified by ->extra, if somebody puts -c io_method=xyz in
# PG_TEST_INITDB_EXTRA_OPTS it would break this test. Fix that up if we
# detect it.
local $ENV{PG_TEST_INITDB_EXTRA_OPTS} = $ENV{PG_TEST_INITDB_EXTRA_OPTS};
if (defined $ENV{PG_TEST_INITDB_EXTRA_OPTS}
&& $ENV{PG_TEST_INITDB_EXTRA_OPTS} =~ m/io_method=/)
{
$ENV{PG_TEST_INITDB_EXTRA_OPTS} .= " -c io_method=$io_method";
}

$node->init(extra => [ '-c', "io_method=$io_method" ]);

But somehow I didn't think about TEMP_CONFIG.

The reason that the test passes -c io_method= to initdb is that I want to
ensure initdb passes with all the supported io_methods.  That still happens
with TEMP_CONFIG specified, it's just afterwards over-written.

I could just append io_method=$io_method again after $node->init(), but then I
couldn't verify that initdb actually ran with the to-be-tested io method.


Does anybody have a good suggestion for how to fix this?


The least bad idea I can think of is for the test to check if
PostgreSQL::Test::Utils::slurp_file($ENV{TEMP_CONFIG}) contains the string
io_method and to append the io_method again to the config if it does.  But
that seems rather ugly.


Does anybody have a better idea?

Greetings,

Andres Freund




Re: macOS 15.4 versus strchrnul()

2025-04-01 Thread Tom Lane
Peter Eisentraut  writes:
> On 01.04.25 17:57, Tom Lane wrote:
>> I speculate that the meson test for preadv/pwritev has never worked
>> for macOS either, and we haven't noticed because nobody has tried to
>> build with meson on a machine with low enough default deployment
>> target to not have preadv/pwritev.

> Agreed.  Attached is a patch that implements the test more along the 
> lines of how Autoconf does it.  This gives correct results for me for 
> strchrnul() in various configurations.

Cool.  Let me try it here, and I'll push if no problems arise.

> Btw., I see on the buildfarm that strchrnul() is also available on 
> FreeBSD, DragonFly BSD, NetBSD, and musl (Alpine Linux).  So perhaps 
> some of the comments ought to be rewritten away from that it's a 
> glibc-specific extension.

OK, will do something about that --- maybe like "originally glibc
specific, but later adopted by other platforms"?

regards, tom lane




Re: Small memory fixes for pg_createsubcriber

2025-04-01 Thread Ranier Vilela
Em ter., 1 de abr. de 2025 às 15:39, Noah Misch 
escreveu:

> On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> > Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier <
> mich...@paquier.xyz>
> > escreveu:
> >
> > > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> > > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> > > >   locale->db_locale,
> > > >   strlen(locale->db_locale));
> > > >   else
> > > > - datlocale_literal = pg_strdup("NULL");
> > > > + datlocale_literal = PQescapeLiteral(conn_new_template1,
> > > > + "NULL",
> > > > + strlen("NULL"));
> > >
> > > Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
> > > as well.  Perhaps it's slightly cleaner to use an intermediate
> > > variable given then as an argument of PQescapeLiteral()?
> > >
> > Yeah, I also think it would look good like this.
>
> This became commit 2a083ab "pg_upgrade: Fix inconsistency in memory
> freeing".
> PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale
> to
> contain datlocale='NULL'::text instead of datlocale IS NULL.
>
I believe the intention of the query is:
For example:
UPDATE pg_catalog.pg_database
SET encoding = 6,
datlocprovider = 'c',
datlocale = NULL
WHERE datname = 'template0';

Where datlocale with NULL is correct, no?

Because:
UPDATE pg_catalog.pg_database
SET encoding = 6,
datlocprovider = 'c',
datlocale IS NULL
WHERE datname = 'template0';

ERROR:  syntax error at or near "IS"
LINE 4: datlocale IS NULL

best regards,
Ranier Vilela


Re: Improving tracking/processing of buildfarm test failures

2025-04-01 Thread Alexander Lakhin

Hello hackers,

Please take a look at the March report on buildfarm failures:
# SELECT br, count(*) FROM failures WHERE dt >= '2025-03-01' AND
 dt < '2025-04-01' GROUP BY br;
REL_13_STABLE: 5
REL_14_STABLE: 10
REL_15_STABLE: 7
REL_16_STABLE: 5
REL_17_STABLE: 12
master: 204
-- Total: 243
(Counting test failures only, excluding indent-check, Configure, Build
errors.)

# SELECT COUNT(*) FROM (SELECT DISTINCT issue_link FROM failures WHERE
 dt >= '2025-03-01' AND dt < '2025-04-01');
36

 # SELECT issue_link, count(*) FROM failures WHERE dt >= '2025-03-01' AND
 dt < '2025-04-01' GROUP BY issue_link ORDER BY 2 DESC LIMIT 6;
https://www.postgresql.org/message-id/1322262.1741643606%40sss.pgh.pa.us : 27
-- Fixed

https://www.postgresql.org/message-id/CAH2-WznHJZKUGL5T96TOcqxYHGXg8MnBwrYSs%2Bj4-PBoRRtWEw%40mail.gmail.com
 : 23
-- Fixed

https://www.postgresql.org/message-id/657815a2-5a89-fcc1-1c9d-d77a6986b...@gmail.com
 : 23

https://www.postgresql.org/message-id/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s%40thmcxl7i6dpw
 : 21
-- Fixed

https://www.postgresql.org/message-id/CAD21AoBM6vAcPGR-ng0nqGG0yemR_ufdg3%2Bv3gkPa6Nc2ntnrA%40mail.gmail.com
 : 10
-- Fixed

https://www.postgresql.org/message-id/m2gyjjk6hazud7hezz25t7aw7rjv73fthkct4jqbvrnu3ezqz3%40nx3m53r7scce
 : 10
-- Fixed

SELECT count(*) FROM failures WHERE dt >= '2025-03-01' AND
 dt < '2025-04-01' AND issue_link IS NULL; -- Unsorted/unhelpful failures
38

Short-lived failures: 141

Best regards,
Alexander Lakhin
Neon (https://neon.tech)




Re: Test to dump and restore objects left behind by regression

2025-04-01 Thread Daniel Gustafsson
> On 1 Apr 2025, at 19:01, Alvaro Herrera  wrote:

> Thanks!

Thanks for taking this one across the finishing line!

--
Daniel Gustafsson





Re: macOS 15.4 versus strchrnul()

2025-04-01 Thread Peter Eisentraut

On 01.04.25 17:57, Tom Lane wrote:

That is, the function exists now in macOS' libc, and so configure's
does-it-link test for HAVE_STRCHRNUL finds it, but <string.h>
will not let you use it unless you monkey around with

export MACOSX_DEPLOYMENT_TARGET=15.4

or similar.  I don't think we want to require people to do that,
so we need to fix things so that the code works with or without
a deployment target that satisfies <string.h>.  This is pretty
reminiscent of a problem that we faced a couple years ago with
preadv and pwritev, and solved in commit f014b1b9b by depending
on AC_CHECK_DECLS instead of AC_CHECK_FUNCS.  I made a patch
(attached) to solve this similarly.  Interestingly, this actually
makes the one usage in snprintf.c simpler, since we no longer
need to special-case the situation where GNU <string.h> doesn't
agree with the does-it-link test.


Agreed, this matches my research.


However ... testing this here shows that it fixes the autoconf
build as desired, with or without MACOSX_DEPLOYMENT_TARGET.
But the meson version *does not work*: it will set
HAVE_DECL_STRCHRNUL to 1 with or without MACOSX_DEPLOYMENT_TARGET,
and in the "without" case the build then blows up.

I speculate that the meson test for preadv/pwritev has never worked
for macOS either, and we haven't noticed because nobody has tried to
build with meson on a machine with low enough default deployment
target to not have preadv/pwritev.


Agreed.  Attached is a patch that implements the test more along the 
lines of how Autoconf does it.  This gives correct results for me for 
strchrnul() in various configurations.


Btw., I see on the buildfarm that strchrnul() is also available on 
FreeBSD, DragonFly BSD, NetBSD, and musl (Alpine Linux).  So perhaps 
some of the comments ought to be rewritten away from that it's a 
glibc-specific extension.
From e12c5d224a195b893f79c470bc4f21de1585c808 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 1 Apr 2025 19:00:39 +0200
Subject: [PATCH] WIP: Fix AC_CHECK_DECLS equivalent in meson

---
 meson.build | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 6932a0f00f7..ba7916d1493 100644
--- a/meson.build
+++ b/meson.build
@@ -2594,8 +2594,23 @@ foreach c : decl_checks
   args = c.get(2, {})
   varname = 'HAVE_DECL_' + func.underscorify().to_upper()
 
-  found = cc.has_header_symbol(header, func,
-args: test_c_args, include_directories: postgres_inc,
+  found = cc.compiles('''
+#include <@0@>
+
+int main()
+{
+#ifndef @1@
+(void) @1@;
+#endif
+
+return 0;
+}
+'''.format(header, func),
+name: 'test whether @0@ is declared'.format(func),
+# need to add cflags_warn to get at least
+# -Werror=unguarded-availability-new if applicable
+args: test_c_args + cflags_warn,
+include_directories: postgres_inc,
 kwargs: args)
   cdata.set10(varname, found, description:
 '''Define to 1 if you have the declaration of `@0@', and to 0 if you
-- 
2.49.0



Re: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward

2025-04-01 Thread Nathan Bossart
On Tue, Apr 01, 2025 at 09:33:32PM +0200, Dimitrios Apostolou wrote:
> It didn't break any test, but I also don't see any difference, the
> performance boost is noticeable only when restoring a huge archive that is
> missing offsets.

This seems generally reasonable to me, but how did you decide on 1MB as the
threshold?  Have you tested other values?  Could the best threshold vary
based on the workload and hardware?

-- 
nathan




Re: Small memory fixes for pg_createsubcriber

2025-04-01 Thread Noah Misch
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier 
> escreveu:
> 
> > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> > >   locale->db_locale,
> > >   strlen(locale->db_locale));
> > >   else
> > > - datlocale_literal = pg_strdup("NULL");
> > > + datlocale_literal = PQescapeLiteral(conn_new_template1,
> > > + "NULL",
> > > + strlen("NULL"));
> >
> > Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
> > as well.  Perhaps it's slightly cleaner to use an intermediate
> > variable given then as an argument of PQescapeLiteral()?
> >
> Yeah, I also think it would look good like this.

This became commit 2a083ab "pg_upgrade: Fix inconsistency in memory freeing".
PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale to
contain datlocale='NULL'::text instead of datlocale IS NULL.
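
To see the difference from SQL (a minimal illustration; the string literal
'NULL' is an ordinary one-word text value, not the SQL null):

    SELECT 'NULL' IS NULL;   -- f: it is a text value spelled N-U-L-L
    SELECT NULL IS NULL;     -- t: the SQL null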




Re: Statistics Import and Export

2025-04-01 Thread Nathan Bossart
On Tue, Apr 01, 2025 at 01:20:30PM -0500, Nathan Bossart wrote:
> On Mon, Mar 31, 2025 at 09:33:15PM -0500, Nathan Bossart wrote:
>> My goal is to commit the attached patches on Friday morning, but of course
>> that is subject to change based on any feedback or objections that emerge
>> in the meantime.
> 
> I spent some more time polishing these patches this morning.  There should
> be no functional differences, but I did restructure 0003 to make it even
> simpler.

Apologies for the noise.  I noticed one more way to simplify 0002.  As
before, there should be no functional differences.

-- 
nathan
>From 85e7b507629b096275d3a7386149876af7466f74 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 31 Mar 2025 10:44:26 -0500
Subject: [PATCH v12n3 1/3] Skip second WriteToc() for custom-format dumps
 without data.

Presently, "pg_dump --format=custom" calls WriteToc() twice.  The
second call is intended to update the data offset information,
which allegedly makes parallel pg_restore significantly faster.
However, if we aren't dumping any data, this step accomplishes
nothing and can be skipped.  This is a preparatory optimization for
a follow-up commit that will move the queries for attribute
statistics to WriteToc() to save memory.

Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
---
 src/bin/pg_dump/pg_backup_custom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index e44b887eb29..b971e3bc16e 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -755,9 +755,11 @@ _CloseArchive(ArchiveHandle *AH)
 * If possible, re-write the TOC in order to update the data offset
 * information.  This is not essential, as pg_restore can cope in most
 * cases without it; but it can make pg_restore significantly faster
-* in some situations (especially parallel restore).
+* in some situations (especially parallel restore).  We can skip this
+* step if we're not dumping any data.
 */
-   if (ctx->hasSeek &&
+   if (AH->public.dopt->dumpData &&
+   ctx->hasSeek &&
fseeko(AH->FH, tpos, SEEK_SET) == 0)
WriteToc(AH);
}
-- 
2.39.5 (Apple Git-154)

>From 3804a6e489601bb3ac6770c413c7e9d3865ce524 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 31 Mar 2025 14:53:11 -0500
Subject: [PATCH v12n3 2/3] pg_dump: Reduce memory usage of dumps with
 statistics.

Right now, pg_dump stores all generated commands for statistics in
memory.  These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint.  To fix, wait
until we are about to write out the commands before generating
them, and be sure to free the commands after writing.  This is
implemented via a new defnDumper callback that works much like the
dataDumper one but is specially designed for TOC entries.

One drawback of this change is that custom dumps that include data
will run the attribute statistics queries twice.  However, a
follow-up commit will add batching for these queries that our
testing indicates should improve dump speed (even when compared to
pg_dump before this commit).

Author: Corey Huinker 
Discussion: 
https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_backup.h  |  1 +
 src/bin/pg_dump/pg_backup_archiver.c | 28 --
 src/bin/pg_dump/pg_backup_archiver.h |  5 
 src/bin/pg_dump/pg_dump.c| 44 
 4 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 658986de6f8..781f8fa1cc9 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -285,6 +285,7 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
+typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);
 typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1d131e5a57d..bb5a02415f2 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1266,6 +1266,9 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
newToc->dataDumperArg = opts->dumpArg;
newToc->hadDumper = opts->dumpFn ? true : false;
 
+   newToc->defnDumper = opts->defnFn;
+   newToc->defnDumperArg = opts->defnArg;
+
newToc->formatData = NULL;
newToc->dataLength = 0;
 
@@ -2621,7 +2624,17 @@ WriteToc(ArchiveHandle *AH)
WriteStr(AH, te->tag);
WriteStr(AH, te->desc);
WriteInt(AH, te->se

Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2025-04-01 Thread Sami Imseih
I went back to look at this patch and noticed a few things. It did not have
correct indentation, so I ran pgindent. I also removed some extra lines that
had been added and made some slight adjustments to the docs. I've attached my
edited patch as a .txt. If you agree, please revise it into a v14.

I also noticed that between v12 and v13, the GUC use_invisible_index
was removed,
but I don't see a discussion as to why. I feel it's a good GUC to have
[0], and we should
at least have it as a separate patch as part of this set.

I will continue reviewing the patch, but I feel this may be close to being
marked RFC, although I'm not sure it will get a committer review before code
freeze.

[0] 
https://www.postgresql.org/message-id/flat/CAA5RZ0udzydObMDi65C59-oq54B9ZmjSZ1wVH3h%2Bv4XiVm6QDA%40mail.gmail.com#cfea240ffd73e947f9edd1ef1c762dae


--
Sami Imseih
Amazon Web Services (AWS)
From c8886d5ac14f31897a7f8058a1caab3d9213993e Mon Sep 17 00:00:00 2001
From: Sami Imseih 
Date: Tue, 1 Apr 2025 12:27:23 -0500
Subject: [PATCH 1/1] Introduce the ability to enable/disable indexes using
 ALTER INDEX

---
 doc/src/sgml/catalogs.sgml |  11 +
 doc/src/sgml/ref/alter_index.sgml  |  43 ++
 doc/src/sgml/ref/create_index.sgml |  29 ++
 src/backend/bootstrap/bootparse.y  |   2 +
 src/backend/catalog/index.c|  31 +-
 src/backend/catalog/toasting.c |   2 +-
 src/backend/commands/indexcmds.c   |   4 +
 src/backend/commands/tablecmds.c   |  66 +++
 src/backend/optimizer/util/plancat.c   |  14 +
 src/backend/parser/gram.y  |  48 +-
 src/backend/parser/parse_utilcmd.c |   3 +
 src/backend/utils/adt/ruleutils.c  |   4 +
 src/backend/utils/cache/relcache.c |   1 +
 src/include/catalog/index.h|   1 +
 src/include/catalog/pg_index.h |   1 +
 src/include/nodes/parsenodes.h |   3 +
 src/include/nodes/pathnodes.h  |   2 +
 src/test/regress/expected/create_index.out | 505 +
 src/test/regress/sql/create_index.sql  | 192 
 19 files changed, 954 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index fb050635551..123fcb04892 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4618,6 +4618,17 @@ SCRAM-SHA-256$:&l
partial index.
   
  
+
+ 
+  
+   indisenabled bool
+  
+  
+   If true, the index is currently enabled and should be used for queries.
+   If false, the index is disabled and should not be used for queries,
+   but is still maintained when the table is modified. Default is true.
+  
+ 
 

   
diff --git a/doc/src/sgml/ref/alter_index.sgml 
b/doc/src/sgml/ref/alter_index.sgml
index 1d42d05d858..04b3b0b9bfe 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -31,6 +31,8 @@ ALTER INDEX [ IF EXISTS ] name ALTE
 SET STATISTICS integer
 ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
 SET TABLESPACE new_tablespace 
[ NOWAIT ]
+ALTER INDEX [ IF EXISTS ] name 
ENABLE
+ALTER INDEX [ IF EXISTS ] name 
DISABLE
 
  
 
@@ -159,6 +161,33 @@ ALTER INDEX ALL IN TABLESPACE name
 

 
+   
+ENABLE
+
+ 
+  Enable the specified index. The index will be used by the query planner
+  for query optimization. This is the default state for newly created 
indexes.
+ 
+
+   
+
+   
+DISABLE
+
+ 
+  Disable the specified index. A disabled index is not used by the query 
planner
+  for query optimization, but it is still maintained when the underlying 
table
+  data changes. This can be useful for testing query performance with and 
without
+  specific indexes, temporarily reducing the overhead of index maintenance
+  during bulk data loading operations, or verifying an index is not being 
used
+  before dropping it. If performance degrades after disabling an index, it 
can be
+  easily re-enabled. Before disabling, it's recommended to check
+  
pg_stat_user_indexes.idx_scan
+  to identify potentially unused indexes.
+ 
+
+   
+
   
   
 
@@ -301,6 +330,20 @@ CREATE INDEX coord_idx ON measured (x, y, (z + t));
 ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000;
 
 
+  
+   To enable an index:
+
+ALTER INDEX idx_name ENABLE;
+
+  
+
+  
+   To disable an index:
+
+ALTER INDEX idx_name DISABLE;
+
+  
+
  
 
  
diff --git a/doc/src/sgml/ref/create_index.sgml 
b/doc/src/sgml/ref/create_index.sgml
index 208389e8006..d7a2f7df852 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -28,6 +28,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
storage_parameter [= 
value] [, ... ] ) ]
 [ TABLESPACE tablespace_name ]
 [ WHERE predicate ]
+[ DISABLE ]
 
  
 
@@ -380,6 +381,19 @@ CREATE [ UNIQUE ] INDEX [ CONCURREN

Re: RFC: Logging plan of the running query

2025-04-01 Thread Robert Haas
On Fri, Mar 21, 2025 at 8:40 AM torikoshia  wrote:
> Rebased it again.
>
> On 2025-03-10 14:10, torikoshia wrote:
> > BTW the patch adds about 400 lines to explain.c and it may be better
> > to split the file as well as 9173e8b6046, but I leave it as it is for
> > now.
>
> This part remains unchanged.

Looking at ExplainAssembleLogOutput() is making me realize that
auto_explain is in serious need of some cleanup. That's not really the
fault of this patch, but the hack whereby we overwrite the [] that
would have surrounded the JSON output with {} is not very nice. I also
think that the auto_explain GUCs need rethinking. In theory, new
EXPLAIN options should be mirrored into auto_explain, but if you
compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
that are known to regular EXPLAIN aren't known to auto_explain, and
any customizable options that use explain_per_plan_hook won't be able
to work with auto_explain, either. Changing this is non-trivial
because SERIALIZE, for example, can't work the same way for
auto_explain as it does for EXPLAIN, and a full solution might also
require user-visible changes like replacing
auto_explain.log_ with auto_explain.explain, so I don't
really know. Maybe we should just live with it the way it is for now,
but it doesn't look very nice.

Rafael and I were just discussing off-list to what extent the parallel
query problems with his patch also apply to yours. His design makes
the problem easier to hit, I think, because setting
progressive_explain = on makes every backend attempt to dump a query
plan, whereas you'd have to hit the worker process with a signal at
just the right time. But the fundamental problem appears to be the
same. Another interesting wrinkle is that he settled on showing the
outermost running query whereas you settled on the innermost. If you
were showing the outermost, perhaps you could just prohibit
dump-the-query-plan on a parallel worker, but you don't really want to
prohibit it categorically, because the worker could be running an
inner query that could be dumped. So maybe you want to avoid setting
ActiveQueryDesc at the toplevel and then set/clear it for inner
queries. However, that's a bit weird, too. If we wanted to undertake a
bigger redesign here, we could try pushing the entire query plan
(including all subqueries) down to the worker, and just tell it which
part it's actually supposed to execute. However, that would have some
overhead and would arguably open up some risk of future bugs, since
passing subqueries as NULL is, I think, intended partly as a guard
against accidentally executing the wrong subqueries.

I don't think it's a good idea to put the logic to update
ActiveQueryDesc into ExecutorRun. I think that function should really
just call the hook, or standard_ExecutorRun. I don't know for sure
whether this work should be moved down into ExecutorRun or up into the
caller, but I don't think it should stay where it is.

My comment about the naming of WrapMultiExecProcNodesWithExplain() on
the other thread also applies here: MultiExecProcNode() is an
unrelated function. Likewise, WrapExecProcNodeWithExplain() misses
walking the node's subplan list. Also, I don't think an
ereport(DEBUG1, ...) is appropriate here.

Do we really need es->signaled? Couldn't we test es->analyze instead?

Do we really need ExecProcNodeOriginal? Can we find some way to reuse
ExecProcNodeReal instead of making the structure bigger?

I do think we should try to move some of this new code out into a
separate source file, but I'm not yet sure what we should call it. We
might want to share infrastructure with Rafael's patch,
which he called progressive EXPLAIN but which is really closer to a
general query progress facility, albeit perhaps too expensive to have
enabled by default.

Does the documentation typically mention when a function is
superuser-only? If so, it should do so here, too, using similar
wording.

It seems unnecessary to remark that you can't log a query plan after a
subtransaction abort, because the query wouldn't be running any more.
It's also true that you can't log a query after a toplevel abort, even
if you're still waiting for the aborted transaction to roll back. But
it also seems like once the aborted subtransaction is popped off the
stack and we return to the parent transaction, we should be able to
log any query running at the outer level. If this is meant to imply
that something doesn't work in this scenario, perhaps there is
something about the design that needs fixing.

Does ActiveQueryDesc really need to be exposed to the whole system?
Could it be a file-level variable?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Small memory fixes for pg_createsubcriber

2025-04-01 Thread Ranier Vilela
Em ter., 1 de abr. de 2025 às 15:39, Noah Misch 
escreveu:

> On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> > Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier <
> mich...@paquier.xyz>
> > escreveu:
> >
> > > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> > > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> > > >   locale->db_locale,
> > > >   strlen(locale->db_locale));
> > > >   else
> > > > - datlocale_literal = pg_strdup("NULL");
> > > > + datlocale_literal = PQescapeLiteral(conn_new_template1,
> > > > + "NULL",
> > > > + strlen("NULL"));
> > >
> > > Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
> > > as well.  Perhaps it's slightly cleaner to use an intermediate
> > > variable given then as an argument of PQescapeLiteral()?
> > >
> > Yeah, I also think it would look good like this.
>
> This became commit 2a083ab "pg_upgrade: Fix inconsistency in memory
> freeing".
> PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale
> to
> contain datlocale='NULL'::text instead of datlocale IS NULL.
>
Yeah, thanks for pointing this out.
The patch maintained the previous behavior.
I'll take a look.

best regards,
Ranier Vilela


Re: Proposal: Progressive explain

2025-04-01 Thread Rafael Thofehrn Castro
My bad, I mistakenly did the tests without assertions enabled in the last 2
days, so I couldn't catch that assertion failure. I was able to reproduce it,
thanks.

I guess when the code was designed we were not expecting to be doing
explains
in parallel workers.

One comment is that this has nothing to do with instrumentation. So the
hacks
done with instrument objects are not to blame here.

What is interesting is that removing that Assert (not saying we should do
that) fixes it, and there don't seem to be any other Asserts complaining
anywhere. The feature works as expected and there are no crashes.

> What I think this means is that this patch needs significant
> rethinking to cope with parallel query. I don't think users are going
> to be happy with a progressive EXPLAIN that just ignores what the
> workers are doing, and I don't think they're going to be happy with N
> separate progressive explains that they have to merge together to get
> an overall picture of what the query is doing, and I'm absolutely
> positive that they're not going to be happy with something that
> crashes. I think there may be a way to come up with a good design here
> that avoids these problems, but we definitely don't have time before
> feature freeze (not that we were necessarily in a great place to think
> of committing this before feature freeze anyway, but it definitely
> doesn't make sense now that I understand this problem).

Yeah, that is a fair point. But I actually envisioned this feature to also
target parallel workers, from the start. I see a lot of value in being able
to debug what each parallel worker is doing inside its black box. It could be
that a worker is lagging behind the others because it is dealing with
non-cached data blocks, for example.

According to my tests the parallel workers can actually push instrumentation
to the parent while still running. It all depends on the types of operations
being performed in the plan.

For example, if the parallel worker is doing a parallel seq scan, then it
will continuously send chunks of rows to the parent, and instrumentation goes
with the data. So we don't actually need to wait for the workers to finish
before instrumentation on the parent gets updated. Here is what I got from
Torikoshi's
test after removing that Assert and enabling instrumented progressive
explain (progressive_explain_interval = '10ms'):

-[ RECORD 1
]-
datid   | 5
datname | postgres
pid | 169555
last_update | 2025-04-01 12:44:07.36775-03
query_plan  | Gather  (cost=0.02..137998.03 rows=1002 width=8) (actual
time=0.715..868.619 rows=9232106.00 loops=1) (current)
+
|   Workers Planned: 2

   +
|   Workers Launched: 2

  +
|   InitPlan 1

   +
| ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
time=0.002..0.002 rows=1.00 loops=1)
+
|   InitPlan 2

   +
| ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual
time=0.000..0.000 rows=1.00 loops=1)
+
|   ->  Parallel Append  (cost=0.00..137998.01 rows=419
width=8) (actual time=0.364..264.804 rows=1533011.00 loops=1)
  +
| ->  Seq Scan on ab_a2_b1 ab_2  (cost=0.00..0.00
rows=1 width=8) (never executed)
 +
|   Filter: ((b = 1) AND ((a = (InitPlan 1).col1)
OR (a = (InitPlan 2).col1)))
 +
| ->  Seq Scan on ab_a3_b1 ab_3  (cost=0.00..0.00
rows=1 width=8) (never executed)
 +
|   Filter: ((b = 1) AND ((a = (InitPlan 1).col1)
OR (a = (InitPlan 2).col1)))
 +
| ->  Parallel Seq Scan on ab_a1_b1 ab_1
 (cost=0.00..117164.67 rows=417 width=8) (actual time=0.361..192.896
rows=1533011.00 loops=1)+
|   Filter: ((b = 1) AND ((a = (InitPlan 1).col1)
OR (a = (InitPlan 2).col1)))
 +
|
-[ RECORD 2
]-
datid   | 5
datname | postgres
pid | 169596
last_update | 2025-04-01 12:44:07.361846-03
query_plan  | Parallel Append  (cost=0.00..137998.01 rows=419 width=8)
(actual time=0.990..666.845 rows=3839515.00 loops=1) (current)
+
|   ->  Seq Scan on ab_a2_b1 ab_2  (cost=0.00..0.00 rows=1
width=8) (never executed)
   +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))

   +
|   ->  Seq Scan on ab_a3_b1 ab_3  (cost=0.00..0.00 rows=1
width=8) (never executed)
   +
| Filter: ((b = 1) AND ((a = $0) OR (a = $1)))

   +
|   ->  Parallel Seq Scan on ab_a1_b1 ab_1
 (cost=0.00..117164.67 rows=417 width=8) (actual time=0.985..480.531
rows=3839

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Alena Rybakina

On 01.04.2025 17:39, Matthias van de Meent wrote:

On Fri, 28 Mar 2025 at 22:59, Peter Geoghegan  wrote:

On Tue, Mar 25, 2025 at 7:45 PM Peter Geoghegan  wrote:

Attached is v31, which has a much-improved _bt_skip_ikeyprefix (which
I've once again renamed, this time to _bt_set_startikey).

Attached is v32
+static bool _bt_s**parray_shrink

I'd like to understand the "shrink" here, as it's not entirely clear to me.
The functions are exclusively called through dispatch in
_bt_compare_array_scankey_args, and I'd expected that naming to be
extended to these functions.
I understood _bt_skiparray_shrink() as the function that refines the 
range of values that a skip array will consider,
by interpreting existing scalar inequality conditions and applying them 
to limit the bounds of the skip scan.
I understood "shrink" to mean narrowing the range of values that the 
skip array will consider during the index scan.


--
Regards,
Alena Rybakina
Postgres Professional


Re: Extend ALTER DEFAULT PRIVILEGES for large objects

2025-04-01 Thread Fujii Masao




On 2025/01/23 19:22, Yugo NAGATA wrote:

On Wed, 22 Jan 2025 13:30:17 +0100
Laurenz Albe  wrote:


On Fri, 2024-09-13 at 16:18 +0900, Yugo Nagata wrote:

I've attached a updated patch. The test is rewritten using 
has_largeobject_privilege()
function instead of calling loread & lowrite, which makes the test a bit 
simpler.
Thare are no other changes.


When I tried to apply this patch, I found that it doesn't apply any
more since commit f391d9dc93 renamed tab-complete.c to tab-complete.in.c.

Attached is a rebased patch.


Thank you for updating the patch!


I agree that large objects are a feature that should fade out (alas,
the JDBC driver still uses it for BLOBs).  But this patch is not big
or complicated and is unlikely to create a big maintenance burden.

So I am somewhat for committing it.  It works as advertised.
If you are fine with my rebased patch, I can mark it as "ready for
committer".  If it actually gets committed depends on whether there
is a committer who thinks it worth the effort or not.


I confirmed the patch and I am fine with it.


I've started reviewing this patch since it's marked as "ready for committer".

I know of several systems that use large objects, and I believe
this feature would be beneficial for them. Overall, I like the idea.


The latest patch looks good to me. I just have one minor comment:


  only the privileges for schemas, tables (including views and foreign
  tables), sequences, functions, and types (including domains) can be
  altered.


In alter_default_privileges.sgml, this part should also mention large objects?
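
For example (just a sketch; I'm assuming the ON LARGE OBJECTS spelling that
the patch adds), the docs could show something like:

    -- assumed syntax, per the patch
    ALTER DEFAULT PRIVILEGES FOR ROLE appowner
        GRANT SELECT, UPDATE ON LARGE OBJECTS TO appreader;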

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Document SQL functions behavior regarding custom/generic plan

2025-04-01 Thread Renan Alves Fonseca
Hi,

In [1], we discussed adding some lines regarding the fact that SQL
functions always use generic plan. Here is the suggested patch. I've
tested on V17 and V12. It does not concern V18.
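
As a minimal illustration (my own sketch, not part of the patch; the table
and function names are made up), consider a SQL function whose body is a
parameterized query. When the function is not inlined, that inner query is
planned with a generic plan, so a very selective parameter value cannot lead
to a different plan than a common one:

    CREATE TABLE orders (status text, payload text);
    CREATE INDEX ON orders (status);

    CREATE FUNCTION count_status(p_status text) RETURNS bigint
    LANGUAGE sql
    AS $$ SELECT count(*) FROM orders WHERE status = p_status $$;

    -- both calls execute the same generic plan inside the function
    SELECT count_status('common');
    SELECT count_status('rare');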

Regards,
Renan

[1] 
https://www.postgresql.org/message-id/CAApHDvp3EDrVhGjmb21bceJ5-Y7iXKOn2UGG3-ngp_9ob_mpLw%40mail.gmail.com
From ea969fa15edec0bb9e8726860be5660b129ed993 Mon Sep 17 00:00:00 2001
From: "Renan A. Fonseca" 
Date: Tue, 1 Apr 2025 15:53:34 +0200
Subject: [PATCH] Document SQL functions' behavior regarding custom/generic
 plan

---
 doc/src/sgml/xfunc.sgml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index f3a3e4e2f8f..f8058d78fe6 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -249,6 +249,13 @@ CALL clean_emp();
  

 
+
+ 
+   Except when inlined, an SQL function is always executed with a
+   generic plan. This behavior may not be desired in some situations.
+ 
+
+

 The syntax of the CREATE FUNCTION command requires
 the function body to be written as a string constant.  It is usually
-- 
2.47.0



Re: TEMP_CONFIG vs test_aio

2025-04-01 Thread Andres Freund
Hi,

On 2025-04-01 17:08:49 -0400, Andrew Dunstan wrote:
> On 2025-04-01 Tu 4:17 PM, Andres Freund wrote:
> > For one using PG_TEST_INITDB_EXTRA_OPTS would probably require changing the
> > buildfarm code, because the buildfarm code filters out environment variables
> > that aren't on an allowlist (I really dislike that).
> 
> 
> Uh, not quite. Anything in the config's build_env is not filtered out. That
> change was made a year ago.

Huh. Just recently I tried to configure PG_TEST_TIMEOUT_DEFAULT in build_env
and it did not take effect until I explicitly added it to the @safe_set.

Greetings,

Andres Freund




Re: Fix 035_standby_logical_decoding.pl race conditions

2025-04-01 Thread Amit Kapila
On Tue, Apr 1, 2025 at 2:02 PM Bertrand Drouvot
 wrote:
>
> Hi Kuroda-san,
>
> On Tue, Apr 01, 2025 at 01:22:49AM +, Hayato Kuroda (Fujitsu) wrote:
> > Dear Bertrand,
> >
>
> Thanks for the updated patch!
>
> > > s/to avoid the seeing a xl_running_xacts/to avoid seeing a 
> > > xl_running_xacts/?
> >
> > Fixed.
>
> hmm, not sure as I still can see:
>
> +# Note that injection_point is used to avoid the seeing the xl_running_xacts
>
> === 1
>
> +* XXX What value should we return here? Originally this 
> returns the
> +* inserted location of RUNNING_XACT record. Based on that, 
> here
> +* returns the latest insert location for now.
> +*/
> +   return GetInsertRecPtr();
>
> Looking at the LogStandbySnapshot() that are using the output lsn, i.e:
>
> pg_log_standby_snapshot()
> BackgroundWriterMain()
> ReplicationSlotReserveWal()
>
> It looks ok to me to use GetInsertRecPtr().
>

+1.

> But if we "really" want to produce a "new" WAL record, what about using
> LogLogicalMessage()?
>

We are using injection points for testing purposes, which means the
caller is aware of skipping the running_xacts record during the test
run. So, there doesn't seem to be any reason to do anything extra.

-- 
With Regards,
Amit Kapila.




Making sslrootcert=system work on Windows psql

2025-04-01 Thread George MacKerron
I was very pleased to see the sslrootcert=system connection option added in 
Postgres 16 (I even blogged about it: 
https://neon.tech/blog/avoid-mitm-attacks-with-psql-postgres-16). But 
sslrootcert=system has not been widely supported by psql installations, perhaps 
because people compiling Postgres haven’t always been aware of the requirement 
to point OpenSSL in the direction of the system’s root CA certificates.

I’ve recently been trying to get it more widely supported, with some success 
(details at end of this message).

However, psql via the EnterpriseDB Windows installer still doesn’t support 
sslrootcert=system, and I think a tiny patch is needed. The diff is attached, 
and can be seen in context here: 
https://github.com/postgres/postgres/compare/master...jawj:postgres:jawj-sslrootcert-system-windows

Essentially, on Windows with OpenSSL 3.2+, it replaces 
SSL_CTX_set_default_verify_paths(SSL_context) with 
SSL_CTX_load_verify_store(SSL_context, "org.openssl.winstore:").

I’m not a Windows or OpenSSL expert, but so far the patched code seems to work 
in theory and in practice (sources below, and I’ve compiled and tested it 
working on Windows 11 x64).


# Sources

https://stackoverflow.com/a/79461864/338196
https://docs.openssl.org/master/man7/OSSL_STORE-winstore/
https://docs.openssl.org/master/man3/SSL_CTX_load_verify_locations/


# Status of sslrootcert=system in various packages providing psql

## Mac
Postgres.app — now fixed (https://github.com/PostgresApp/PostgresApp/issues/801)
MacPorts — now fixed (https://trac.macports.org/ticket/72080)
EDB installer — now fixed 
(https://github.com/EnterpriseDB/edb-installers/issues/264)
homebrew — was working already

## Linux
Debian/Ubuntu — now Recommends ca-certificates 
(https://salsa.debian.org/postgresql/postgresql/-/commit/96077ad61c36386646cdd9b5ce0e423a357ce73b)

## Windows
EDB installer — in progress
WSL1, WSL2 (Ubuntu, openSUSE) — was working already



sslrootcert-system-win.diff
Description: Binary data


Re: Statistics Import and Export

2025-04-01 Thread Jeff Davis
On Tue, 2025-04-01 at 13:44 -0500, Nathan Bossart wrote:
> Apologies for the noise.  I noticed one more way to simplify 0002.  As
> before, there should be no functional differences.


To restate the problem: one of the problems being solved here is that
the existing code for custom-format dumps calls WriteToc twice. That
was not a big problem before this patch, when the contents of the
entries were easily accessible in memory. But the point of 0002 is to
avoid keeping all of the stats in memory at once, because that causes
bloat; and instead to query it on demand.

In theory, we could fix the pre-existing code by making the second pass
able to jump over the other contents of the entry and just update the
data offsets. But that seems invasive, at least to do it properly.

0001 sidesteps the problem by skipping the second pass if data's not
being dumped (because there are no offsets that need updating). The
worst case is when there are a lot of objects with a small amount of
data. But that's a worst case for stats in general, so I don't think
that needs to be solved here.

Issuing the stats queries twice is not great, though. If there's any
non-deterministic output in the query, that could lead to strangeness.
How bad can that be? If the results change in some way that looks
benign, but changes the length of the definition string, can it lead to
corruption of a ToC entry? I'm not saying there's a problem, but trying
to understand the risk of future problems.

For 0003, it makes an assumption about the way the scan happens in
WriteToc(). Can you add some additional sanity checks to verify that
something doesn't happen in a different order than we expect?

Also, why do we need the clause "WHERE s.tablename = ANY($2)"? Isn't
that already implied by "JOIN unnest($1, $2) ... s.tablename =
u.tablename"?

Regards,
Jeff Davis





Re: general purpose array_sort

2025-04-01 Thread Tom Lane
Junwang Zhao  writes:
> On Tue, Apr 1, 2025 at 1:11 AM Tom Lane  wrote:
>> Attached 0001 is the same as v18, and then 0002 is the proposed
>> addition to typcache.

> I've applied the patches to master and regression passed.
> 0002 is neat, I am +1 for this improvement.

Hearing no further comments, pushed.

regards, tom lane




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-01 Thread Jacob Champion
On Mon, Mar 31, 2025 at 7:06 AM Christoph Berg  wrote:
> Perhaps we could do the same with libldap and libgssapi? (Though
> admittedly I have never seen any complaints or nagging questions from
> security people about these.)

If we end up happy with how the Curl indirection works, that seems
like it'd be kind of nice in theory. I'm not sure how many people
would notice, though.

On Wed, Mar 26, 2025 at 12:09 PM Jacob Champion
 wrote:
> Right
> now we have an SO version of 1; maybe we want to remove the SO version
> entirely to better indicate that it shouldn't be linked?

Maybe a better idea would be to ship an SONAME of
`libpq-oauth.so.0.`, without any symlinks, so that there's
never any ambiguity about which module belongs with which libpq.

--Jacob




Re: bug when apply fast default mechanism for adding new column over domain with default value

2025-04-01 Thread Tom Lane
Alexander Lakhin  writes:
> I've discovered that 95f650674 introduced a defect similar to bug #18297,
> but this time with DEFAULT. Namely, the following script:
> CREATE TABLE a (aa text);
> CREATE TABLE c (cc text) INHERITS (a);
> CREATE TABLE d (dd text) INHERITS (c, a);
> ALTER TABLE a ADD COLUMN i int DEFAULT 1;

> fails with:
> ERROR:  XX000: tuple already updated by self
> LOCATION:  simple_heap_update, heapam.c:4421

Hmm, yeah.  The failing call is here:

/* Bump the existing child att's inhcount */
...
CatalogTupleUpdate(attrdesc, &tuple->t_self, tuple);

so I think you're right that that code path is now short a
CommandCounterIncrement() somewhere.  I'll look tomorrow if
nobody beats me to it.

regards, tom lane




Re: Add mention in docs about locking all partitions for generic plans

2025-04-01 Thread David Rowley
On Mon, 31 Mar 2025 at 12:19, David Rowley  wrote:
> I'll push these in the next few days unless anyone else wants to voice
> their opinions.

Thanks for the review. Pushed.

David




Re: Reduce "Var IS [NOT] NULL" quals during constant folding

2025-04-01 Thread Richard Guo
On Wed, Apr 2, 2025 at 4:34 AM Robert Haas  wrote:
> On Tue, Apr 1, 2025 at 2:34 AM Richard Guo  wrote:
> > However, I gave up this idea because I realized it would require
> > retrieving a whole bundle of catalog information that isn't needed
> > until after the RelOptInfos are built, such as max_attr, pages,
> > tuples, reltablespace, parallel_workers, extended statistics, etc.

> Why is that bad? I mean, if we're going to need that information
> anyway, then gathering it at earlier stage doesn't hurt. Of course, if
> we move it too early, say before partition pruning, then we might
> gather information we don't really need and hurt performance. But
> otherwise it doesn't seem to hurt anything.

The attnotnull catalog information being discussed here is intended
for use during constant folding (and possibly sublink pull-up), which
occurs long before partition pruning.  Am I missing something?
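
For reference, the reduction in question is of this shape (a minimal
illustration, not taken from the patch):

    CREATE TABLE t (a int NOT NULL, b int);
    -- with the patch, constant folding can use attnotnull to simplify
    -- "a IS NOT NULL" to true (and "a IS NULL" to false) here
    EXPLAIN SELECT * FROM t WHERE a IS NOT NULL;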

Additionally, I'm doubtful that the collection of relhassubclass can
be moved after partition pruning.  How can we determine whether a
relation is inheritable without retrieving its relhassubclass
information?

As for attgenerated, I also doubt that it can be retrieved after
partition pruning.  It is used to expand virtual generated columns,
which can result in new PlaceHolderVars.  This means it must be done
before deconstruct_jointree, as make_outerjoininfo requires all active
PlaceHolderVars to be present in root->placeholder_list.

If these pieces of information cannot be retrieved after partition
pruning, and for performance reasons we don't want to move the
gathering of the majority of the catalog information before partition
pruning, then it seems to me that moving get_relation_info() to an
earlier stage might not be very meaningful.  What do you think?

> > It could be argued that the separation of catalog information
> > collection isn't very great, but it seems this isn't something new in
> > this patch.  So I respectfully disagree with your statement that "all
> > the information that the planner needs about a particular relation is
> > retrieved by get_relation_info()", and that "there's now just exactly
> > this 1 thing that is done in a different place".  For instance,
> > relhassubclass and attgenerated are retrieved before expression
> > preprocessing, a relation's constraint expressions are retrieved when
> > setting the relation's size estimates, and more.

> Nonetheless I think we ought to be trying to consolidate more things
> into get_relation_info(), not disperse some of the things that are
> there to other places.

I agree with the general idea of more consolidation and less
dispersion, but squashing all information collection into
get_relation_info() seems quite challenging.  However, I think we can
make at least one optimization, as I mentioned upthread — retrieving
the small portion of catalog information needed at an early stage in
one place instead of three.  Perhaps we could start with moving the
retrieval of relhassubclass into collect_relation_attrs() and rename
this function to better reflect this change.

Thanks
Richard




Re: Statistics Import and Export

2025-04-01 Thread Robert Haas
On Tue, Apr 1, 2025 at 4:24 PM Jeff Davis  wrote:
> On Tue, 2025-04-01 at 09:37 -0400, Robert Haas wrote:
> > I don't think I was aware of the open item; I was just catching up on
> > email.
>
> I lean towards making it opt-in for pg_dump and opt-out for pg_upgrade.

Big +1.

> But I think we should leave open the possibility for changing the
> default to opt-out for pg_dump in the future.

We can always decide to change things, but this is different from some
other cases. Sometimes we're waiting to enable a feature by default
until, say, we think it's stable. This case is more about user
expectations, which might not be so prone to changing over time (but
you never know).

> Mid-beta might be too long, but let's wait for the final CF to settle
> and give people the chance to respond to a top-level thread?

wfm, thanks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Small memory fixes for pg_createsubcriber

2025-04-01 Thread Michael Paquier
On Apr 2, 2025, at 5:14, Noah Misch  wrote:
> Yes, pg_upgrade should not do that.  I was trying to explain the difference
> between NULL and 'NULL'.  I didn't mean pg_upgrade should emit "IS NULL".

(Apologies for the probably-weirdly-formatted message)

This issue has been already mentioned around here, with patches exchanged:
Re: pgsql: pg_upgrade: Fix inconsistency in memory freeing
postgresql.org

--
Michael

Covering the comparison between date and timestamp, tz, type

2025-04-01 Thread Kwangwon Seo
Hi,

Using the PostgreSQL code coverage report, I found that tests for
comparisons between date and timestamp[tz] are missing. Some of them have
only partial coverage.

Attached patch will cover following functions:

date_eq_timestamp
date_ne_timestamp
date_lt_timestamp
date_gt_timestamp // already covered
date_le_timestamp
date_ge_timestamp

date_eq_timestamptz
date_ne_timestamptz
date_lt_timestamptz // already covered
date_gt_timestamptz // already covered
date_le_timestamptz
date_ge_timestamptz

timestamp_eq_date
timestamp_ne_date
timestamp_lt_date
timestamp_gt_date // already covered
timestamp_le_date
timestamp_ge_date

timestamptz_eq_date
timestamptz_ne_date
timestamptz_lt_date
timestamptz_gt_date // already covered
timestamptz_le_date
timestamptz_ge_date // already covered

This is a minor patch, but I’d like to submit it to enhance test coverage.

Best regards,
Kwangwon Seo
From 6031a064cbfb5581a159d160f3dbb1d8a3d81b6b Mon Sep 17 00:00:00 2001
From: Kwangwon Seo 
Date: Fri, 28 Mar 2025 17:58:49 +0900
Subject: [PATCH v1] Add test for extra comparison between date and
 timestamp(tz)

1. Add missing regression tests for comparison between date and timestamp
---
 src/test/regress/expected/horology.out | 160 +
 src/test/regress/sql/horology.sql  |  52 
 2 files changed, 212 insertions(+)

diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 12eefb09d4d..03ad3a75084 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2500,6 +2500,166 @@ SELECT '2020-10-05'::timestamptz >= '4714-11-24 BC'::timestamp as t;
  t
 (1 row)
 
+RESET TimeZone;
+--
+-- Comparisons between date and timestamp types
+---
+SELECT '2025-03-28'::date = '2025-03-28 00:00:00'::timestamp as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date <> '2025-03-28 00:00:01'::timestamp as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date < '2025-03-28 00:00:01'::timestamp as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date <= '2025-03-28 00:00:00'::timestamp as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date > '2025-03-27 23:59:59'::timestamp as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date >= '2025-03-28 00:00:00'::timestamp as t;
+ t 
+---
+ t
+(1 row)
+
+--
+-- Comparisons between timestamp and date types
+---
+SELECT '2025-03-28 00:00:00'::timestamp = '2025-03-28'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamp <> '2025-03-27'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamp < '2025-03-29'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamp <= '2025-03-28'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamp > '2025-03-27'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamp >= '2025-03-28'::date as t;
+ t 
+---
+ t
+(1 row)
+
+--
+-- Comparisons between date and timestamptz types
+---
+SET TimeZone to 'UTC';
+SELECT '2025-03-28'::date = '2025-03-28 00:00:00'::timestamptz as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date <> '2025-03-28 00:00:01'::timestamptz as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date < '2025-03-28 00:00:01'::timestamptz as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date <= '2025-03-28 00:00:00'::timestamptz as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date > '2025-03-27 23:59:59'::timestamptz as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28'::date >= '2025-03-28 00:00:00'::timestamptz as t;
+ t 
+---
+ t
+(1 row)
+
+RESET TimeZone;
+--
+-- Comparisons between timestamptz and date types
+---
+SET TimeZone to 'UTC';
+SELECT '2025-03-28 00:00:00'::timestamptz = '2025-03-28'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamptz <> '2025-03-27'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamptz < '2025-03-29'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamptz <= '2025-03-28'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamptz > '2025-03-27'::date as t;
+ t 
+---
+ t
+(1 row)
+
+SELECT '2025-03-28 00:00:00'::timestamptz >= '2025-03-28'::date as t;
+ t 
+---
+ t
+(1 row)
+
 RESET TimeZone;
 --
 -- Tests for BETWEEN
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 86481637223..e85ba6e1b4c 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -376,6 +376,58 @@ SELECT '2020-10-05'::timestamptz >= '4714-11-24 BC'::timestamp as t;
 
 RESET TimeZone;
 
+--
+-- Comparisons between date and timestamp types
+---
+
+SELECT '2025-03-28'::date = '2025-03-28 00:00:00'::timestamp as t;
+SELECT '2025-03-28'::date <> '2025-03-28 00:00:01'::timestamp as t;
+SELECT '2025-03-28'::date < '2025-03-28 00:00:01'::timestamp as t;
+SELECT '2025-03-28'::date <= '2025-03-28 00:00:00'::timestamp as t;
+SELECT '2025-03-28'::date > '2025-03-27 23:59

Re: AIO v2.5

2025-04-01 Thread Noah Misch
On Tue, Apr 01, 2025 at 06:25:28PM -0400, Andres Freund wrote:
> On 2025-04-01 17:47:51 -0400, Andres Freund wrote:
> > 3) Some subtests fail if RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE 
> > are defined:
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-04-01%2019%3A23%3A07
> > 
> > # +++ tap check in src/test/modules/test_aio +++
> > 
> > #   Failed test 'worker: batch_start() leak & cleanup in implicit xact: 
> > expected stderr'
> > #   at t/001_aio.pl line 318.
> > #   'psql:<stdin>:4: ERROR:  starting batch while batch 
> > already in progress'
> > # doesn't match '(?^:open AIO batch at end)'
> > 
> > 
> > The problem is basically that the test intentionally forgets to exit 
> > batchmode
> > - normally that would trigger an error at the end of the transaction, which
> > the test verifies.  However, with RELCACHE_FORCE_RELEASE and
> > CATCACHE_FORCE_RELEASE defined, we get other code entering batchmode and
> > erroring out because batchmode isn't allowed to be entered recursively.

> > I don't really have a good idea how to deal with that yet.
> 
> Hm. Making the query something like
> 
> SELECT * FROM (VALUES (NULL), (batch_start()));
> 
> avoids the wrong output, because the type lookup happens for the first row
> already. But that's pretty magical and probably fragile.

Hmm.  Some options:

a. VALUES() trick above.  For test code, it's hard to argue with something
   that seems to solve it in practice.

b. Encapsulate the test in a PROCEDURE, so perhaps less happens between the
   batch_start() and the procedure-managed COMMIT.  Maybe less fragile than
   (a), maybe more fragile.  (A rough sketch of this follows below.)

c. Move RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE to be
   GUC-controlled, like how CLOBBER_CACHE_ALWAYS changed into the
   debug_discard_caches GUC.  Then disable them for relevant parts of
   test_aio.  This feels best long-term, but it's bigger.  I also wanted this
   in syscache-update-pruned.spec[1].

d. Have test_aio deduce whether these are set, probably by observing memory
   contexts or DEBUG messages.  Maybe have every postmaster startup print a
   DEBUG message about these settings being enabled.  Skip relevant parts of
   test_aio.  This sounds messy.

Each of those feels defensible to me.  I'd probably do (a) or (b) to start.
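
For (b), a rough sketch (assuming the test module's batch_start() SQL
function; the procedure name is made up) could look like:

    CREATE PROCEDURE leak_open_batch() LANGUAGE plpgsql AS $$
    BEGIN
        PERFORM batch_start();  -- intentionally never paired with a batch end
        COMMIT;                 -- procedure-managed commit
    END;
    $$;
    CALL leak_open_batch();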


[1] For that spec, an alternative expected output sufficed.  Incidentally,
I'll soon fix that spec flaking on valgrind/skink.




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Matthias van de Meent
On Tue, 1 Apr 2025 at 23:56, Matthias van de Meent
 wrote:
>
> On Tue, 1 Apr 2025 at 04:02, Peter Geoghegan  wrote:
> >
> > On Fri, Mar 28, 2025 at 5:59 PM Peter Geoghegan  wrote:
> > > Attached is v32, which has very few changes, but does add a new patch:
> > > a patch that adds skip-array-specific primitive index scan heuristics
> > > to _bt_advance_array_keys (this is
> > > v32-0003-Improve-skip-scan-primitive-scan-scheduling.patch).
> >
> > Attached is v33
>
> 0001:
>
> I just realised we never check whether skip keys' high/low_compare
> values generate valid quals, like what you'd see generated for WHERE a
> > 5 AND a < 3 AND b = 2;
>
> This causes a performance regression in the patched version:

Apparently it's not a regression, as we don't have this check in place
in the master branch. So, that's an optimization for PG19.

Sorry for the noise.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Matthias van de Meent
On Fri, 28 Mar 2025 at 22:59, Peter Geoghegan  wrote:
>
> On Tue, Mar 25, 2025 at 7:45 PM Peter Geoghegan  wrote:
> > Attached is v31, which has a much-improved _bt_skip_ikeyprefix (which
> > I've once again renamed, this time to _bt_set_startikey).
>
> Attached is v32

Thanks!

The review below was started for v31, then adapted to v32 when that
arrived. I haven't cross-checked this with v33.

Review for 0001:

I have some comments on the commit message, following below.  _Note:
For smaller patches I would let small things like this go, but given
the complexity of the feature I think it is important to make sure
there can be no misunderstanding about how it works and why it's
correct. Hence, these comments._

> Teach nbtree composite index scans to opportunistically skip over
> irrelevant sections of composite indexes given a query with an omitted
> prefix column.

AFAIK, this is only the second reference to "composite" indexes, after
a single mention in a comment in nbtsplitloc.c's _bt_strategy. IIUC,
this refers to multi-column indexes, which is more frequently and more
consistently used across the code base.

FYI, I think "composite index" would be more likely to refer to GIN,
given its double-btree structure. If we are about to use this term
more often, an entry in the glossary would be in place.

> When nbtree is passed input scan keys derived from a
> query predicate "WHERE b = 5", new nbtree preprocessing steps now output
> "WHERE a = ANY() AND b = 5" scan keys.

[...] new nbtree preprocessing steps now output +the equivalent of+ [...]

> Preprocessing can freely add a skip array before or after any input
> ScalarArrayOp arrays.

This implies skip arrays won't be generated for WHERE b = 5 (a
non-SAOP scankey) or WHERE b < 3 (not SAOP either), which is probably
not intentional, so a rewording would be appreciated.

// nbtpreprocesskeys.c

> +static bool _bt_s**parray_shrink

I'd like to understand the "shrink" here, as it's not entirely clear to me.
The functions are exclusively called through dispatch in
_bt_compare_array_scankey_args, and I'd expected that naming to be
extended to these functions.

> + * This qual now becomes "WHERE x = ANY('{every possible x value}') and y = 
> 4"

I understand what you're going for, but a reference that indexed NULLs
are still handled correctly (rather than removed by the filter) would
be appreciated, as the indicated substitution would remove NULL
values. (This doesn't have to be much more than a footnote.)

> + * Also sets *numSkipArrayKeys to # of skip arrays _bt_preprocess_array_keys
> + * caller must add to the scan keys it'll output.  Caller must add this many
> + * skip arrays to each of the most significant attributes lacking any keys

I don't think that's a good description; as any value other than 0 or
1 would mean more than one skip array per such attribute, which is
obviously incorrect. I think I'd word it like:

+ * Also sets *numSkipArrayKeys to # of skip arrays _bt_preprocess_array_keys
+ * caller must add to the scan keys it'll output.  Caller will have to add
+ * this many skip arrays: one for each of the most significant attributes
+ * lacking any keys that use the = strategy [...]


> +#if 0
> +/* Could be a redundant input scan key, so can't do this: */
> +Assert(inkey->sk_strategy == BTEqualStrategyNumber ||
> +   (inkey->sk_flags & SK_SEARCHNULL));
> +#endif

I think this should be removed?


> +#ifdef DEBUG_DISABLE_SKIP_SCAN

I noticed this one and only reference to DEBUG_DISABLE_SKIP_SCAN. Are
you planning on keeping that around, or is this a leftover?

> +ScanKey    inkey = scan->keyData + i;
> +
> +/*
> + * Backfill skip arrays for any wholly omitted attributes prior to
> + * attno_inkey
> + */
> +while (attno_skip < attno_inkey)

I don't understand why we're adding skip keys before looking at the
contents of this scankey, nor why we're backfilling skip keys based on
a now old value of attno_inkey. Please
add some comments on how/why this is the right approach.

> prev_numSkipArrayKeys, *numSkipArrayKeys

I'm a bit confused why we primarily operate on *numSkipArrayKeys,
rather than a local variable that we store in *numSkipArrayKeys once
we know we can generate skip keys. I.e., I'd expected something more
in line with the following snippet (incomplete code blocks):

+if (!OidIsValid(skip_eq_ops[attno_skip - 1]))
+{
+    /*
+     * Cannot generate a skip array for this or later attributes
+     * (input opclass lacks an equality strategy operator)
+     */
+    return numArrayKeys + *numSkipArrayKeys;
+}
+
+/* plan on adding a backfill skip array for this attribute */
+loc_numSkipArrayKeys++;
+attno_skip++;
+}
+
+*numSkipArrayKeys = loc_numSkipArrayKeys;

I think this is easier for the compiler.

Re: Using read stream in autoprewarm

2025-04-01 Thread Melanie Plageman
On Tue, Apr 1, 2025 at 8:50 AM Nazir Bilal Yavuz  wrote:
>
> I am attaching v8, which is an updated version of the v7. I tried to
> get rid of these local variables and refactored code to make logic
> more straightforward instead of going back and forth.
>
> 0001 and 0002 are v8. 0003 is another refactoring attempt to make code
> more straightforward. I did not squash 0003 to previous patches as you
> might not like it.

I looked at the code on your github branch that has all three of these
squashed together.
I think our approaches are converging. I like that you are
fast-forwarding to the next filenumber or fork number explicitly when
there is a bad relation or fork. I've changed my version (see newest
one attached) to do the fast-forwarding inline instead of in a
separate function like yours (the function didn't save many LOC and
actually may have added to cognitive overhead).

Compared to my version, I think you avoided one level of loop nesting with your

if (!rel)
else if (smgrexists(RelationGetSmgr(rel), blk->forknum))
else

but for starters, I don't think you can do this:

else if (smgrexists(RelationGetSmgr(rel), blk->forknum))

because you didn't check if you have a legal forknum first

And, I actually kind of prefer the explicitly nested structure

loop through all relations
  loop through all forks
loop through all buffers

While in the old structure, I liked your
autoprewarm_prewarm_relation() function, but I think it is nicer
inlined like in my version. It makes the loop through all buffers
explicit too.

I know you mentioned off-list that you don't like the handling of
global objects in my version, but I prefer doing it this way (even
though we have to check for it in the loop condition) to having to set
the current database once we reach non-shared objects. It feels too
fiddly. This way seems less error prone. Looking at this version, what
do you think? Could we do it better?

Let me know what you think of this version. I think it is the best of
both our approaches. I've separated it into two commits -- the first
does all the refactoring without using the read stream API and the
second one uses the read stream API.

On another topic, what are the minimal places we need to call
have_free_buffers() (in this version)? I haven't even started looking
at the last patch you've been sending that is about checking the
freelist. I'll have to do that next.

- Melanie
From 2d14f37ea824a6ce3605f53b0a9c5622f08bf6e7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 31 Mar 2025 22:02:25 -0400
Subject: [PATCH 1/2] Refactor autoprewarm_database_main() in preparation for
 read stream

TODO: write a commit message

Co-authored-by: Nazir Bilal Yavuz 
Co-authored-by: Melanie Plageman 
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
---
 contrib/pg_prewarm/autoprewarm.c | 166 +--
 1 file changed, 89 insertions(+), 77 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 73485a2323c..84072587ea0 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -429,12 +429,11 @@ apw_load_buffers(void)
 void
 autoprewarm_database_main(Datum main_arg)
 {
-	int			pos;
 	BlockInfoRecord *block_info;
-	Relation	rel = NULL;
-	BlockNumber nblocks = 0;
-	BlockInfoRecord *old_blk = NULL;
+	int			i;
+	BlockInfoRecord *blk;
 	dsm_segment *seg;
+	Oid			database;
 
 	/* Establish signal handlers; once that's done, unblock signals. */
 	pqsignal(SIGTERM, die);
@@ -449,108 +448,121 @@ autoprewarm_database_main(Datum main_arg)
  errmsg("could not map dynamic shared memory segment")));
 	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid, 0);
 	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
-	pos = apw_state->prewarm_start_idx;
+
+	i = apw_state->prewarm_start_idx;
+	database = apw_state->database;
+
+	blk = &block_info[i];
 
 	/*
 	 * Loop until we run out of blocks to prewarm or until we run out of free
-	 * buffers.
+	 * buffers. We'll quit if we've reached records for another database,
+	 * however we do want to prewarm any global objects.
 	 */
-	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	while (i < apw_state->prewarm_stop_idx &&
+		   (blk->database == database || blk->database == 0) &&
+		   have_free_buffer())
 	{
-		BlockInfoRecord *blk = &block_info[pos++];
-		Buffer		buf;
+		RelFileNumber filenumber = blk->filenumber;
+		Oid			reloid;
+		Relation	rel;
 
 		CHECK_FOR_INTERRUPTS();
 
-		/*
-		 * Quit if we've reached records for another database. If previous
-		 * blocks are of some global objects, then continue pre-warming.
-		 */
-		if (old_blk != NULL && old_blk->database != blk->database &&
-			old_blk->database != 0)
-			break;
+		StartTransactionCommand();
 
-		/*
-		 * As soon as we encounter a block of a new relation, close the old
-		 * re

Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2025-04-01 Thread Shayon Mukherjee
On Tue, Apr 1, 2025 at 2:41 PM Sami Imseih  wrote:

> I went back to look at this patch and a few things. I noticed it did
> not have correct
> indentation, so I ran pgindent. I also removed some extra lines added and
> made
> some slight adjustments to the docs. Attached my edited patch as a txt. If
> you
> agree, please revise into a v14.
>
> I also noticed that between v12 and v13, the GUC use_invisible_index
> was removed,
> but I don't see a discussion as to why. I feel it's a good GUC to have
> [0], and we should
> at least have it as a separate patch as part of this set.
>
>
My apologies, I rebased off an old commit and that's how we lost the GUC
change and also the rename from ENABLE/DISABLE to VISIBLE/INVISIBLE. I have
brought it all back, in addition to the specs and other changes mentioned
in v12.  Let me know if you have any feedback; I'm more than happy to
incorporate it.

Also, thank you for letting me know of pgindent, very handy!

I will continue reviewing the patch, but i feel this may be close to
> be marked RFC, although
> not sure if it will get a committer review before code freeze.
>

I really appreciate the reviews and guidance, thank you! I would love for this
to become part of the v18 release; no worries if not, of course. I am
definitely around to help get this production-ready, and to help if any issues
arise afterwards.

Thank you
Shayon


v14-0001-Introduce-the-ability-to-set-index-visibility-us.patch
Description: Binary data


Re: bug when apply fast default mechanism for adding new column over domain with default value

2025-04-01 Thread Tender Wang
Tom Lane  于2025年4月2日周三 09:05写道:

> Alexander Lakhin  writes:
> > I've discovered that 95f650674 introduced a defect similar to bug #18297,
> > but this time with DEFAULT. Namely, the following script:
> > CREATE TABLE a (aa text);
> > CREATE TABLE c (cc text) INHERITS (a);
> > CREATE TABLE d (dd text) INHERITS (c, a);
> > ALTER TABLE a ADD COLUMN i int DEFAULT 1;
>
> > fails with:
> > ERROR:  XX000: tuple already updated by self
> > LOCATION:  simple_heap_update, heapam.c:4421
>
> Hmm, yeah.  The failing call is here:
>
> /* Bump the existing child att's inhcount */
> ...
> CatalogTupleUpdate(attrdesc, &tuple->t_self, tuple);
>
> so I think you're right that that code path is now short a
> CommandCounterIncrement() somewhere.  I'll look tomorrow if
> nobody beats me to it.
>

Yes. When table a processes its children, which are tables c and d, table c is
handled first. Table d is also a child of table c, so after updating its own
pg_attribute tuple, table c goes on to process its child table d, and table d's
pg_attribute tuple gets updated.

After table c is finished, the logic returns to processing table a's remaining
children, which this time means table d. Between the update of table d's
pg_attribute tuple as a child of table c and the second update of that same
tuple as a child of table a, there is no call to CommandCounterIncrement().

So let's add CommandCounterIncrement() after calling StoreAttrMissingVal().


-- 
Thanks, Tender Wang
From ee2bc290cab83daf0ebecf9f29cc23539ba19741 Mon Sep 17 00:00:00 2001
From: Tender Wang 
Date: Wed, 2 Apr 2025 09:52:22 +0800
Subject: [PATCH] Fix "tuple already updated by self" issue.

---
 src/backend/commands/tablecmds.c  |  2 ++
 src/test/regress/expected/inherit.out | 17 +
 src/test/regress/sql/inherit.sql  |  8 
 3 files changed, 27 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 10624353b0a..67e0aabf4a7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7518,6 +7518,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, 
Relation rel,
{
StoreAttrMissingVal(rel, 
attribute->attnum, missingval);
has_missing = true;
+   /* Make above changes visible */
+   CommandCounterIncrement();
}
FreeExecutorState(estate);
}
diff --git a/src/test/regress/expected/inherit.out 
b/src/test/regress/expected/inherit.out
index 4d07d0bd79b..ba4327e7acb 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1108,6 +1108,23 @@ Child tables: inhtb,
   inhtd
 
 DROP TABLE inhta, inhtb, inhtc, inhtd;
+-- Test for adding a column with DEFAULT value
+CREATE TABLE inhtda (aa text);
+CREATE TABLE inhtdc (cc text) INHERITS (inhtda);
+CREATE TABLE inhtdd (dd text) INHERITS (inhtdc, inhtda);
+NOTICE:  merging multiple inherited definitions of column "aa"
+ALTER TABLE inhtda ADD COLUMN i int DEFAULT 1;
+NOTICE:  merging definition of column "i" for child "inhtdd"
+\d+ inhtda
+   Table "public.inhtda"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | 
Description 
++-+---+--+-+--+--+-
+ aa | text|   |  | | extended |  | 
+ i  | integer |   |  | 1   | plain|  | 
+Child tables: inhtdc,
+  inhtdd
+
+DROP TABLE inhtda, inhtdc, inhtdd;
 -- Test for renaming in diamond inheritance
 CREATE TABLE inht2 (x int) INHERITS (inht1);
 CREATE TABLE inht3 (y int) INHERITS (inht1);
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 941189761fd..d9fd6340798 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -384,6 +384,14 @@ ALTER TABLE inhta ADD COLUMN i int;
 \d+ inhta
 DROP TABLE inhta, inhtb, inhtc, inhtd;
 
+-- Test for adding a column with DEFAULT value
+CREATE TABLE inhtda (aa text);
+CREATE TABLE inhtdc (cc text) INHERITS (inhtda);
+CREATE TABLE inhtdd (dd text) INHERITS (inhtdc, inhtda);
+ALTER TABLE inhtda ADD COLUMN i int DEFAULT 1;
+\d+ inhtda
+DROP TABLE inhtda, inhtdc, inhtdd;
+
 -- Test for renaming in diamond inheritance
 CREATE TABLE inht2 (x int) INHERITS (inht1);
 CREATE TABLE inht3 (y int) INHERITS (inht1);
-- 
2.34.1



Re: general purpose array_sort

2025-04-01 Thread Junwang Zhao
On Wed, Apr 2, 2025 at 6:05 AM Tom Lane  wrote:
>
> Junwang Zhao  writes:
> > On Tue, Apr 1, 2025 at 1:11 AM Tom Lane  wrote:
> >> Attached 0001 is the same as v18, and then 0002 is the proposed
> >> addition to typcache.
>
> > I've applied the patches to master and regression passed.
> > 0002 is neat, I am +1 for this improvement.
>
> Hearing no further comments, pushed.
>
> regards, tom lane

Thanks for pushing, and I noticed the corresponding CF entry has
been marked as committed, thanks for taking care of it.

-- 
Regards
Junwang Zhao




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Matthias van de Meent
On Tue, 1 Apr 2025 at 04:02, Peter Geoghegan  wrote:
>
> On Fri, Mar 28, 2025 at 5:59 PM Peter Geoghegan  wrote:
> > Attached is v32, which has very few changes, but does add a new patch:
> > a patch that adds skip-array-specific primitive index scan heuristics
> > to _bt_advance_array_keys (this is
> > v32-0003-Improve-skip-scan-primitive-scan-scheduling.patch).
>
> Attached is v33

0001:

I just realised we never check whether skip keys' high/low_compare
values generate valid quals, like what you'd see generated for WHERE a
> 5 AND a < 3 AND b = 2;

This causes a performance regression in the patched version:

   ->  Index Only Scan using test_a_b_idx on test  (cost=0.14..8.16
rows=1 width=0) (actual time=0.240..0.241 rows=0.00 loops=1)
 Index Cond: ((a > 5) AND (a < 3) AND (b = 2))
 Heap Fetches: 0
 Index Searches: 1
 Buffers: shared hit=1

As you can see in this EXPLAIN, we're doing an index search, while before
this patch the Index Searches count would have been 0 due to the
conflicting conditions.

(This came up while reviewing 0004, when I thought about doing this
key consistency check after the increment/decrement optimization of
that patch, and after looking I couldn't find the skip scan bounds
consistency check at all.)

0002:

// nbtutils.c

> + * (safe even when "key->sk_attno <= firstchangingattnum")

Typo: should be "key->sk_attno >= firstchangingattnum".

I'd also refactor the final segment to something like the following,
to remove a redundant compare when the attribute we're checking is
equal between firsttup and lasttup:

+firstdatum = index_getattr(firsttup, key->sk_attno, tupdesc,
+                           &firstnull);
+
+/* Test firsttup */
+_bt_binsrch_skiparray_skey(false, ForwardScanDirection,
+                           firstdatum, firstnull, array, key,
+                           &result);
+if (result != 0)
+    break;              /* unsafe */
+
+/* both attributes are equal, so no need to check lasttup */
+if (key->sk_attno < firstchangingattnum)
+    continue;
+
+lastdatum = index_getattr(lasttup, key->sk_attno, tupdesc, &lastnull);
+
+/* Test lasttup */
+_bt_binsrch_skiparray_skey(false, ForwardScanDirection,
+                           lastdatum, lastnull, array, key,
+                           &result);
+if (result != 0)
+    break;              /* unsafe */
+
+/* Safe, range skip array satisfied by every tuple */

0003: LGTM

0004: LGTM, but note the current bug in 0001, which is probably best
solved with a fix that keeps this optimization in mind, too.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: AIO v2.5

2025-04-01 Thread Andres Freund
Hi,

On 2025-04-01 17:47:51 -0400, Andres Freund wrote:
> 3) Some subtests fail if RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE 
> are defined:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-04-01%2019%3A23%3A07
> 
> # +++ tap check in src/test/modules/test_aio +++
> 
> #   Failed test 'worker: batch_start() leak & cleanup in implicit xact: 
> expected stderr'
> #   at t/001_aio.pl line 318.
> #   'psql::4: ERROR:  starting batch while batch 
> already in progress'
> # doesn't match '(?^:open AIO batch at end)'
> 
> 
> The problem is basically that the test intentionally forgets to exit batchmode
> - normally that would trigger an error at the end of the transaction, which
> the test verifies.  However, with RELCACHE_FORCE_RELEASE and
> CATCACHE_FORCE_RELEASE defined, we get other code entering batchmode and
> erroring out because batchmode isn't allowed to be entered recursively.
> 
> 
> #0  pgaio_enter_batchmode () at 
> ../../../../../home/andres/src/postgresql/src/backend/storage/aio/aio.c:997
> #1  0x55ec847959bf in read_stream_look_ahead (stream=0x55ecbcfda098)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/storage/aio/read_stream.c:438
> #2  0x55ec84796514 in read_stream_next_buffer (stream=0x55ecbcfda098, 
> per_buffer_data=0x0)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/storage/aio/read_stream.c:890
> #3  0x55ec8432520b in heap_fetch_next_buffer (scan=0x55ecbcfd1c00, 
> dir=ForwardScanDirection)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:679
> #4  0x55ec843259a4 in heapgettup_pagemode (scan=0x55ecbcfd1c00, 
> dir=ForwardScanDirection, nkeys=1, key=0x55ecbcfd1620)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:1041
> #5  0x55ec843263ba in heap_getnextslot (sscan=0x55ecbcfd1c00, 
> direction=ForwardScanDirection, slot=0x55ecbcfd0e18)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:1420
> #6  0x55ec8434ebe5 in table_scan_getnextslot (sscan=0x55ecbcfd1c00, 
> direction=ForwardScanDirection, slot=0x55ecbcfd0e18)
> at 
> ../../../../../home/andres/src/postgresql/src/include/access/tableam.h:1041
> #7  0x55ec8434f786 in systable_getnext (sysscan=0x55ecbcfd8088) at 
> ../../../../../home/andres/src/postgresql/src/backend/access/index/genam.c:541
> #8  0x55ec849c784a in SearchCatCacheMiss (cache=0x55ecbcf81000, nkeys=1, 
> hashValue=3830081846, hashIndex=2, v1=403, v2=0, v3=0, v4=0)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1543
> #9  0x55ec849c76f9 in SearchCatCacheInternal (cache=0x55ecbcf81000, 
> nkeys=1, v1=403, v2=0, v3=0, v4=0)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1464
> #10 0x55ec849c73ec in SearchCatCache1 (cache=0x55ecbcf81000, v1=403) at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1332
> #11 0x55ec849e5ae3 in SearchSysCache1 (cacheId=2, key1=403) at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/syscache.c:228
> #12 0x55ec849d8c78 in RelationInitIndexAccessInfo 
> (relation=0x7f6a85901c20)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1456
> #13 0x55ec849d8471 in RelationBuildDesc (targetRelId=2703, insertIt=true)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1201
> #14 0x55ec849d9e9c in RelationIdGetRelation (relationId=2703) at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2100
> #15 0x55ec842d219f in relation_open (relationId=2703, lockmode=1) at 
> ../../../../../home/andres/src/postgresql/src/backend/access/common/relation.c:58
> #16 0x55ec8435043c in index_open (relationId=2703, lockmode=1) at 
> ../../../../../home/andres/src/postgresql/src/backend/access/index/indexam.c:137
> #17 0x55ec8434f2f9 in systable_beginscan (heapRelation=0x7f6a859353a8, 
> indexId=2703, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffc11aa7c90)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/access/index/genam.c:400
> #18 0x55ec849c782c in SearchCatCacheMiss (cache=0x55ecbcfa0e80, nkeys=1, 
> hashValue=2659955452, hashIndex=60, v1=2278, v2=0, v3=0, v4=0)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1533
> #19 0x55ec849c76f9 in SearchCatCacheInternal (cache=0x55ecbcfa0e80, 
> nkeys=1, v1=2278, v2=0, v3=0, v4=0)
> at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1464
> #20 0x55ec849c73ec in SearchCatCache1 (cache=0x55ecbcfa0e80, v1=2278) at 
> ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1332
> #21 0x55ec849e5ae3 in SearchSysCache1 (cacheId=82, key1=2278) at 
> ../../../../../home/andres/src/postgresql/src/backend/

Re: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward

2025-04-01 Thread Dimitrios Apostolou
Thanks. This is the first value I tried and it works well. In the archive I
have, all blocks seem to be between 8 and 20KB, so the jump forward before the
change never even got close to 1MB. Could it be bigger in an uncompressed
archive? Or in a future pg_dump that raises the block size? I don't really
know, so it is difficult to test such a scenario, but it made sense to guard
against these cases too.

I chose 1MB by doing a very crude calculation in my head: when would it be
worth seeking forward instead of reading? On very slow drives, 60MB/s
sequential and 60 IOPS for random reads is a possible speed, so one random
seek costs about the same time (1/60th of a second) as reading 1MB
sequentially. In that worst case it would only be better to seek() forward
for gaps of over 1MB.

On 1 April 2025 22:04:00 CEST, Nathan Bossart  wrote:
>On Tue, Apr 01, 2025 at 09:33:32PM +0200, Dimitrios Apostolou wrote:
>> It didn't break any test, but I also don't see any difference, the
>> performance boost is noticeable only when restoring a huge archive that is
>> missing offsets.
>
>This seems generally reasonable to me, but how did you decide on 1MB as the
>threshold?  Have you tested other values?  Could the best threshold vary
>based on the workload and hardware?
>




Re: Hashed IN only applied to first encountered IN

2025-04-01 Thread David Rowley
On Wed, 2 Apr 2025 at 01:31, David Rowley  wrote:
>
> On Wed, 2 Apr 2025 at 00:51, David Geier  wrote:
> > The hashed IN optimization is only applied to the first encountered
> > ScalarArrayOpExpr during the expression tree traversal in
> > convert_saop_to_hashed_saop_walker(). Reason being that the walker
> > returns true which aborts the traversal.
>
> > I've also attached a patch with a fix.
>
> Thanks for the report and fix.  On first inspection your patch looks
> fine.  I'll have a closer look tomorrow.

I've now pushed and backpatched the relevant portion of this back to v14.

Thanks again.

David




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-04-01 Thread Peter Geoghegan
On Tue, Apr 1, 2025 at 5:56 PM Matthias van de Meent
 wrote:
> 0002:
>
> // nbtutils.c
>
> > + * (safe even when "key->sk_attno <= firstchangingattnum")
>
> Typo: should be "key->sk_attno >= firstchangingattnum".

Good catch!

Though I think it should be "" safe even when "key->sk_attno >
firstchangingattnum" "", to highlight that the rule here is
significantly more permissive than even the nearby range skip array
case, which is still safe when (key->sk_attno == firstchangingattnum).

As I'm sure you realize, SAOP = keys and regular = keys are only safe
when "key->sk_attno < firstchangingattnum". So there are a total of 3
distinct rules about how firstchangingattnum affects whether it's safe
to advance pstate.startikey past a scan key (which of the 3 rules we
apply depends solely on the type of scan key).

In summary, simple = keys have the strictest firstchangingattnum rule,
range skip arrays/scalar inequalities have a somewhat less restrictive
rule, and non-range skip arrays have the least restrictive/most
permissive rule. As I think you understand already, it is generally
safe to set pstate.startikey to an offset that's past several earlier
simple skip arrays (against several prefix columns, all omitted from
the query) -- even when firstchangingattnum is the lowest possible
value (which is attnum 1).

> I'd also refactor the final segment to something like the following,
> to remove a redundant compare when the attribute we're checking is
> equal between firsttup and lasttup:

At one point the code did look like that, but I concluded that the
extra code wasn't really worth it. We can only save cycles within
_bt_set_startikey itself this way, which doesn't add up to much.
_bt_set_startikey is only called once per page.

Besides, in general it's fairly likely that a range skip array that
_bt_set_startikey sees won't be against a column that we already know
(from the _bt_keep_natts_fast precheck, which returns
firstchangingattnum) to only have one distinct value among all tuples
on the page.

> 0003: LGTM
>
> 0004: LGTM

Great, thanks!

-- 
Peter Geoghegan




Re: TEMP_CONFIG vs test_aio

2025-04-01 Thread Andrew Dunstan


On 2025-04-01 Tu 4:17 PM, Andres Freund wrote:

Hi,

On 2025-04-01 20:12:29 +, Todd Cook wrote:

On 4/1/25, 3:42 PM, "Andres Freund"  wrote:

I just committed the tests for AIO, and unfortunately they (so far) fail on
one buildfarm animal:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bumblebee&dt=2025-04-01%2018%3A55%3A01
 


The reason for the failure is simple, the buildfarm animal specifies
io_method=io_uring (thanks to "cookt" for setting that up so quickly, whoever
you are :)) and the test is assuming that the -c io_method=... it passes to
initdb is actually going to be used, but it's overwritten by the TEMP_CONFIG.

You're welcome!

Is there an alternate way I could use to configure the io_method on bumblebee?

You could use PG_TEST_INITDB_EXTRA_OPTS, but I think you did it the right
way.

For one using PG_TEST_INITDB_EXTRA_OPTS would probably require changing the
buildfarm code, because the buildfarm code filters out environment variables
that aren't on an allowlist (I really dislike that).



Uh, not quite. Anything in the config's build_env is not filtered out. 
That change was made a year ago.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Improve CRC32C performance on SSE4.2

2025-04-01 Thread Nathan Bossart
On Tue, Apr 01, 2025 at 05:33:02PM +0700, John Naylor wrote:
> On Thu, Mar 27, 2025 at 2:55 AM Devulapalli, Raghuveer 
>  wrote:
>> (1) zmm_regs_available() in pg_crc32c_sse42_choose.c is a duplicate of
>> pg_popcount_avx512.c but perhaps that’s fine for now. I will propose a
>> consolidated SIMD runtime check in a follow up patch.
> 
> Yeah, I was thinking a small amount of duplication is tolerable.

+1.  This is easy enough to change in the future if/when we add some sort
of centralized CPU feature detection.

>> (2) Might be apt to rename pg_crc32c_sse42*.c to pg_crc32c_x86*.c  since
>> they contain both sse42 and avx512 versions.
> 
> The name is now not quite accurate, but it's not exactly misleading
> either. I'm leaning towards keeping it the same, so for now I've just
> updated the header comment.

I'm not too worried about this one either.  FWIW I'm likely going to look
into moving all the x86_64 popcount stuff into pg_popcount_avx512.c and
renaming it to pg_popcount_x86_64.c for v19.  This would parallel
pg_popcount_aarch64.c a bit better, and a file per architecture seems like
a logical way to neatly organize things.

> For v16, I made another pass through made some more mostly superficial
> adjustments:
> - copied rewritten Meson comment to configure.ac
> - added some more #include guards out of paranoia
> - added tests with longer lengths that exercise the new paths
> - adjusted configure / meson messages for consistency
> - changed not-quite-accurate wording about "AVX-512 CRC instructions"
> - used "PCLMUL" only when talking about specific intrinsics and prefer
> "AVX-512" elsewhere, to head off potential future confusion with Arm
> PMULL.

I read through the code a couple of times and nothing stood out to me.

-- 
nathan




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2025-04-01 Thread Ashutosh Bapat
On Tue, Apr 1, 2025 at 1:32 PM Amit Langote  wrote:
>
> I think David’s suggestion to use 64 as the fixed initial size is
> simpler and more predictable. Since list_length * 2 will always be >=
> 64 anyway, unless we expect clause counts to grow significantly right
> after the threshold, the fixed size avoids the need to reason about
> sizing heuristics and keeps the logic clearer.
>
> It also lines up well with David’s point -- 64 strikes a good balance
> for both memory usage and CPU efficiency. Unless there’s a strong
> reason to favor dynamic sizing, I’d prefer to stick with the fixed 64.

Consider the following table and query (based on a similar table in
join.sql), which exercises the find_derived_clause_for_ec_member() code
path.
#create table fkest (x integer, x10 integer, x10b integer, x100
integer, unique(x, x10, x100), foreign key (x, x10b, x100) references
fkest(x, x10, x100));
#select 'select * from fkest f1 join ' || string_agg(format('fkest f%s
on (f1.x = f%s.x and f1.x10 = f%s.x10b and f1.x100 = f%s.x100)', i, i,
i , i), ' join ') || ' where f1.x100 = 2' query from
generate_series(2, 100) i; \gset
#explain (summary) :query;

This is a 100-way self-join between the foreign key and the referenced key,
with one of the foreign key columns set to a constant. This exercises the
case of derived clauses with a constant EM.

When planning this query, all 100 derived clauses containing the
constant EM are created first and then they are searched one by one
for every EM. Thus when the hash table is created in
find_derived_clause_for_ec_member()->ec_search_derived_clause_for_ems(),
the length of ec_derives_list is already 100, so the hash table would
be expanded while it's being filled for the first time if we used a
constant 64 as the initial hash table size. This can be avoided if we
use list_length() * 2. The pattern for create_join_clause(), however,
is search-then-insert, so it will always create the hash table with
initial size 64 right when the list length is 32. Both these cases are
served well if we base the initial hash table size on a multiple of
list_length(ec_derives_list).

For the record, without the patch this query takes about 5800ms on
average to plan on my laptop. With the patch the planning time drops
to about 5400ms - an improvement of about 6-7%, if my approximate math
is correct.

>
> As David suggested off-list, it seems better to have a static inline
> function for the key canonicalization logic than to duplicate it in
> the insert and lookup paths, as done in your 0004.

A static inline function to fill the key in canonical form is a good
idea. Thanks for the patch.

>
> * Replaces literal values for threshold and initial size with macros,
> defined near the key and entry types.

I don't see this in your attached patches. Am I missing something?
It's still using list_length() for initial hash table size. But with
my explanation above, I believe we will keep it that way.

>
> I wasn’t sure why you removed this comment:
>
> - * We do not attempt a second lookup with EMs swapped when using the hash
> - * table; such clauses are inserted under both orderings at the time of
> - * insertion.

When I read it, I didn't understand why we mentioned a second lookup,
so I dropped it. But now I see that the comment is in the context of
two comparisons being done when using list. I have rephrased the
comment a bit to make this comparison clear.


>
> 0003’s commit message includes a blurb I plan to paste into the main
> patch’s message (with minor tweaks) when we squash the patches.

Slight correction in the first sentence of that blurb message.

Derived clauses are now stored in the ec_derives_hash using canonicalized
keys: the EquivalenceMember with lower memory address is always placed
in em1, and ...

+/*
+ * fill_ec_derives_key
+ * Compute a canonical key for ec_derives_hash lookup or insertion.
+ *
+ * Derived clauses are looked up using a pair of EquivalenceMembers and a
+ * parent EquivalenceClass. To avoid storing or searching for both EM
orderings,
+ * we canonicalize the key:
+ *
+ * - For clauses involving two non-constant EMs, we order the EMs by address
+ * and place the lower one first.
+ * - For clauses involving a constant EM, the caller must pass the non-constant
+ * EM as leftem; we then set em1 = NULL and em2 = leftem.
+ */
+static inline void
+fill_ec_derives_key(ECDerivesKey *key,
+ EquivalenceMember *leftem,
+ EquivalenceMember *rightem,
+ EquivalenceClass *parent_ec)
+{
+ Assert(leftem); /* Always required for lookup or insertion */
+
+ /*
+ * Clauses with a constant EM are always stored and looked up using only
+ * the non-constant EM, with the other key slot set to NULL.
+ */

This comment seems to overlap with what's already there in the
prologue. However, "Clauses with a constant EM are always stored and
looked up using the non-constant EM" is the non-overlapping part. That
part is also repeated in ec_add_clause_to_derives_hash(), which I
think is a better place for this comment. Changed in the attached
patch.

Re: Test to dump and restore objects left behind by regression

2025-04-01 Thread Alvaro Herrera
On 2025-Apr-01, Ashutosh Bapat wrote:

> Just today morning, I found something which looks like another bug in
> statistics dump/restore [1]. As Daniel has expressed upthread [2], we
> should go ahead and commit the test even if the bug is not fixed. But
> in case it creates a lot of noise and makes the build farm red, we
> could suppress the failure by not dumping statistics for comparison
> till the bug is fixed. PFA patchset which reintroduces 0003 which
> suppresses the statistics dump - in case we think it's needed. I have
> made some minor cosmetic changes to 0001 and 0002 as well.

I have made some changes of my own, and included --no-statistics.
But I had already started messing with your patch, so I didn't look at
the cosmetic changes you did here.  If they're still relevant, please
send them my way.

Hopefully it won't break, and if it does, it's likely the fault of the
changes I made.  I've run it through CI and all is well though, so
fingers crossed.
https://cirrus-ci.com/build/6327169669922816


I observe in the CI results that the pg_upgrade test is not necessarily
the last one to finish.  In one case it even finished in place 12!

[16:36:48.447]  12/332 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 
 112.16s   22 subtests passed
https://api.cirrus-ci.com/v1/task/5803071017582592/logs/test_world.log

... but if we still find that it's too slow, we can make it into a
PG_TEST_EXTRA test easily with a "skip" line.  (Or we can add
a new PG_TEST_EXCLUDE thingy for impatient people).

Thanks!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: pgsql: Add support for OAUTHBEARER SASL mechanism

2025-04-01 Thread Jacob Champion
On Tue, Apr 1, 2025 at 6:12 AM Daniel Gustafsson  wrote:
>
> > On 1 Apr 2025, at 15:03, Christoph Berg  wrote:
>
> > With the libpq-oauth split, this makes even more sense because
> > building a library that always throws an error isn't very useful.
> > (Don't build that file at all if the feature doesn't work.)
>
> After the split, configure/meson should fail if the libcurl dependency isn't
> satisfied or if the platform isn't supported.

Yeah, after sleeping on it I agree. If I want a "canary" buildfarm
animal to opt into compilation on unsupported platforms, I can instead
look into a manual #define or something; it doesn't have to be a
supported configure-time thing.

> > Since oauth/curl have some security implications, would it make more
> > sense to call the switch --enable-oauth (-Doauth) so users could
> > control better what features their libpq is going to have? Perhaps
> > some other feature (pg_service as URL?) is going to need libcurl as
> > well, but it should be configurable separately.
>
> Perhaps --with-oauth-client for the opt-in libpq-oauth?

It started as -Doauth way back when, but was changed as part of the
discussion at [1]. Peter, do you have any objections to switching back
to an OAuth-related name?

--Jacob

[1] https://postgr.es/m/6bde5f56-9e7a-4148-b81c-eb6532cb3651%40eisentraut.org




RE: Fix 035_standby_logical_decoding.pl race conditions

2025-04-01 Thread Hayato Kuroda (Fujitsu)
Dear Bertrand,

> s/to avoid the seeing a xl_running_xacts/to avoid seeing a xl_running_xacts/?

Fixed.

> 
> === 2 (Nit)
> 
> /* For testing slot invalidation due to the conflict */
> 
> Not sure "due to the conflict" is needed.
>

OK, removed.

>  About PG17-v2-0001
> 
> === 3
> 
> The commit message still mentions injection point.

Oh, removed.

> === 4
> 
> -# Note that pg_current_snapshot() is used to get the horizon.  It does
> -# not generate a Transaction/COMMIT WAL record, decreasing the risk of
> -# seeing a xl_running_xacts that would advance an active replication slot's
> -# catalog_xmin.  Advancing the active replication slot's catalog_xmin
> -# would break some tests that expect the active slot to conflict with
> -# the catalog xmin horizon.
> 
> I'd be tempted to not remove this comment but reword it a bit instead. 
> Something
> like?
> 
> # Note that pg_current_snapshot() is used to get the horizon.  It does
> # not generate a Transaction/COMMIT WAL record, decreasing the risk of
> # seeing a xl_running_xacts that would advance an active replication slot's
> # catalog_xmin.  Advancing the active replication slot's catalog_xmin
> # would break some tests that expect the active slot to conflict with
> # the catalog xmin horizon. We ensure that active replication slots are not
> # created for tests that might produce this race condition though.

Added.

> === 6 (Nit)
> 
> In drop_logical_slots(), s/needs_active_slot/drop_active_slot/?

Fixed.

> === 7 (Nit)
> 
> In check_slots_conflict_reason(), s/needs_active_slot/checks_active_slot/?

Fixed.

>  About PG16-v2-0001
> 
> Same as for PG17-v2-0001.

I made all the needed changes.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v3-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch
Description:  v3-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch


PG16-v3-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch
Description:  PG16-v3-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch


PG17-v3-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch
Description:  PG17-v3-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch

