Re: Parallel heap vacuum

2025-02-14 Thread Masahiko Sawada
On Thu, Feb 13, 2025 at 8:16 PM John Naylor  wrote:
>
> On Thu, Feb 13, 2025 at 5:37 AM Masahiko Sawada  wrote:
> >
> > During eager vacuum scan, we reset the eager_scan_remaining_fails
> > counter when we start to scan the new region. So if we want to make
> > parallel heap vacuum behave exactly the same way as the
> > single-process vacuum in terms of the eager vacuum scan, we would
> > need to have the eager_scan_remaining_fails counters for each region
> > so that the workers can decrement it corresponding to the region of
> > the block that the worker is scanning. But I'm concerned that it makes
> > the logic very complex. I'd like to avoid making newly introduced
> > codes more complex by adding yet another new code on top of that.
>
> Would it be simpler to make only phase III parallel?

Yes, I think so.

>  In other words,
> how much of the infrastructure and complexity needed for parallel
> phase I is also needed for phase III?

Both phases need some common changes to the parallel vacuum
infrastructure so that we can launch parallel workers using
ParallelVacuumContext for phases I and III, and so that parallel vacuum
workers can do their work on the heap table. Specifically, we need to
change vacuumparallel.c to also work in the parallel heap vacuum case,
and add some table AM callbacks. Beyond that, the two phases differ in
complexity. For phase III, the changes to lazy vacuum would not be very
complex, although both the radix tree and TidStore would need to support
shared iteration. Supporting parallelism in phase I is more complex
since it integrates a parallel table scan with the lazy heap scan.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Dmitry Dolgov
> On Thu, Feb 13, 2025 at 05:08:45PM GMT, Sami Imseih wrote:
> Constants passed as parameters to a prepared statement will not be
> handled as expected. I did not not test explicit PREPARE/EXECUTE statement,
> but I assume it will have the same issue.

This is the same question of how many cases to support. The original
patch implementation handled Param expressions as well, but that part was
explicitly rejected during review. I think as a first step it's
important to find a balance between applying this optimization in as
many cases as possible and, at the same time, keeping the implementation
simple enough to give the patch a chance. So far I'm inclined to leave Param
for future work, although of course I'm open to discussion.

> This case with an array passed to aa function seems to cause a regression
> in pg_stat_statements query text. As you can see the text is incomplete.

I've already mentioned that in the previous email. To reiterate, it's
not a functionality regression, but an incomplete representation of the
normalized query, which has turned out to be hard to change. While I'm
working on that, the suggestion so far is that it's not a blocker.

> A typo in the docs.
> c/lenght/length

One day I'll write documentation without any typos, but not today :) Thanks,
I'll fix it in the next version.

> Is this parameter only useful for merging the values of an IN list?
> Should the documentation be more specific and say that only IN lists
> will benefit from this parameter?

You can find one test showing that this optimization is applied to a
plain ARRAY[1, 2, 3, ...], so it's not only for IN expressions.
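
For illustration, a sketch of what that looks like (assuming the
query_id_squash_values parameter is enabled and the test_merge table
from the patch's regression tests; the normalized text follows the
format shown later in this thread):

    SELECT * FROM test_merge WHERE id = ANY (ARRAY[1, 2, 3, 4, 5]);
    SELECT * FROM test_merge WHERE id = ANY (ARRAY[1, 2, 3, 4, 5, 6, 7, 8]);
    -- with squashing enabled, both are expected to collapse into a single
    -- pg_stat_statements entry, with a normalized text such as:
    --   SELECT * FROM test_merge WHERE id = ANY (ARRAY[$1 /*, ... */])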

> Also, if there is only 1 value in the list, it will have a different
> queryId than
> that of the same query in which more than 1 value is passed to the IN list.
> Should the documentation be clear about that?

I tend to think there is not much value in emphasizing that. It would add more
mental overhead to process, and this part of the documentation already says "an
array of constants will contribute only the first and the last elements to the
query identifier" -- having only the first element differs from having both,
hence a new entry.
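
To sketch the consequence (hypothetical, using the same test_merge table):

    SELECT * FROM test_merge WHERE id IN (1);             -- single constant: its own entry
    SELECT * FROM test_merge WHERE id IN (1, 2);          -- first and last element jumbled
    SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5); -- expected to share the
                                                          -- query_id of the previous one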

> pg_node_attr of query_jumble_merge is doing something
> very specific to the elements list of an ArrayExpr. The
> merge code likely cannot be used for other node types.

It can be; take a look at the pg_node_attr commentary. Any node can have a
field marked with the query_jumble_merge attribute and benefit from merging.




Re: [RFC] Lock-free XLog Reservation from WAL

2025-02-14 Thread Zhou, Zhiguo



On 2/11/2025 9:25 AM, Japin Li wrote:

On Mon, 10 Feb 2025 at 22:12, "Zhou, Zhiguo"  wrote:

On 2/5/2025 4:32 PM, Japin Li wrote:

On Mon, 27 Jan 2025 at 17:30, "Zhou, Zhiguo"  wrote:

On 1/26/2025 10:59 PM, Yura Sokolov wrote:

24.01.2025 12:07, Japin Li пишет:

On Thu, 23 Jan 2025 at 21:44, Japin Li  wrote:

On Thu, 23 Jan 2025 at 15:03, Yura Sokolov
 wrote:

23.01.2025 11:46, Japin Li пишет:

On Wed, 22 Jan 2025 at 22:44, Japin Li  wrote:

On Wed, 22 Jan 2025 at 17:02, Yura Sokolov
 wrote:

I believe, I know why it happens: I was in hurry making v2 by
cherry-picking internal version. I reverted some changes in
CalcCuckooPositions manually and forgot to add modulo
PREV_LINKS_HASH_CAPA.

Here's the fix:

    pos->pos[0] = hash % PREV_LINKS_HASH_CAPA;
-   pos->pos[1] = pos->pos[0] + 1;
+   pos->pos[1] = (pos->pos[0] + 1) % PREV_LINKS_HASH_CAPA;
    pos->pos[2] = (hash >> 16) % PREV_LINKS_HASH_CAPA;
-   pos->pos[3] = pos->pos[2] + 2;
+   pos->pos[3] = (pos->pos[2] + 2) % PREV_LINKS_HASH_CAPA;

Any way, here's v3:
- excess file "v0-0001-Increase..." removed. I believe it was source
      of white-space apply warnings.
- this mistake fixed
- more clear slots strategies + "8 positions in two
cache-lines" strategy.

You may play with switching PREV_LINKS_HASH_STRATEGY to 2 or 3
and see
if it affects measurably.


Thanks for your quick fixing.  I will retest it tomorrow.

Hi, Yura Sokolov
Here is my test result of the v3 patch:
| case                          | min        | avg        | max        |
|-------------------------------+------------+------------+------------|
| master (44b61efb79)           | 865,743.55 | 871,237.40 | 874,492.59 |
| v3                            | 857,020.58 | 860,180.11 | 864,355.00 |
| v3 PREV_LINKS_HASH_STRATEGY=2 | 853,187.41 | 855,796.36 | 858,436.44 |
| v3 PREV_LINKS_HASH_STRATEGY=3 | 863,131.97 | 864,272.91 | 865,396.42 |
It seems there are some performance decreases :( or something I
missed?



Hi, Japin.
(Excuse me for duplicating message, I found I sent it only to you
first time).


Never mind!


v3 (as well as v2) doesn't increase NUM_XLOGINSERT_LOCKS itself.
With only 8 in-progress inserters spin-lock is certainly better
than any
more complex solution.

You need to compare "master" vs "master + NUM_XLOGINSERT_LOCKS=64" vs
"master + NUM_XLOGINSERT_LOCKS=64 + v3".

And even this way I don't claim "Lock-free reservation" gives any
profit.

That is why your benchmarking is very valuable! It could answer whether
we need such a not-small patch, or whether there is no real problem at all.



Hi, Yura Sokolov

Here is the test result compared with NUM_XLOGINSERT_LOCKS and the
v3 patch.

| case                  | min          | avg          | max          | rate% |
|-----------------------+--------------+--------------+--------------+-------|
| master (4108440)      | 891,225.77   | 904,868.75   | 913,708.17   |       |
| lock 64               | 1,007,716.95 | 1,012,013.22 | 1,018,674.00 | 11.84 |
| lock 64 attempt 1     | 1,016,716.07 | 1,017,735.55 | 1,019,328.36 | 12.47 |
| lock 64 attempt 2     | 1,015,328.31 | 1,018,147.74 | 1,021,513.14 | 12.52 |
| lock 128              | 1,010,147.38 | 1,014,128.11 | 1,018,672.01 | 12.07 |
| lock 128 attempt 1    | 1,018,154.79 | 1,023,348.35 | 1,031,365.42 | 13.09 |
| lock 128 attempt 2    | 1,013,245.56 | 1,018,984.78 | 1,023,696.00 | 12.61 |
| lock 64 v3            | 1,010,893.30 | 1,022,787.25 | 1,029,200.26 | 13.03 |
| lock 64 attempt 1 v3  | 1,014,961.21 | 1,019,745.09 | 1,025,511.62 | 12.70 |
| lock 64 attempt 2 v3  | 1,015,690.73 | 1,018,365.46 | 1,020,200.57 | 12.54 |
| lock 128 v3           | 1,012,653.14 | 1,013,637.09 | 1,014,358.69 | 12.02 |
| lock 128 attempt 1 v3 | 1,008,027.57 | 1,016,849.87 | 1,024,597.15 | 12.38 |
| lock 128 attempt 2 v3 | 1,020,552.04 | 1,024,658.92 | 1,027,855.90 | 13.24 |


The data looks really interesting and I recognize the need for further
investigation. I'm not very familiar with BenchmarkSQL but we've done
similar tests with HammerDB/TPCC by solely increasing
NUM_XLOGINSERT_LOCKS from 8 to 128, and we observed a significant
performance drop of ~50% and the cycle ratio of spinlock acquisition
(s_lock) rose to over 60% of the total, which is basically consistent
with the previous findings in [1].

Could you please share the details of your test environment, including
the device, configuration, and test approach, so we can collaborate on
understanding the differences?

Sorry for the late reply.  I'm on vacation.
I use a Hygon C86 7490 64-core machine; it has 8 NUMA nodes and 1.5 TB of memory.
I use NUMA nodes 0-3 to run the database, and 4-7 to run BenchmarkSQL.
Here are my database settings:
listen_addresses = '*'
max_connections = '1050'
shared_buffers = '100GB'
work_mem = '64MB'
maintenance_work_mem = '512MB'
max_wal_size = '50GB'
min_wal_size = '10GB'
random_page_cost = '1.1'
wal_buffers = '1GB'
wal_level = 'minimal'
max_wal_senders = '0'
wal_sync_method = 'open_datasync'
wal_compression = 'lz4'
tr

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-02-14 Thread Masahiko Sawada
On Tue, Feb 11, 2025 at 11:44 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Tue, Feb 11, 2025 at 02:11:10PM -0800, Masahiko Sawada wrote:
> > I've updated the patch that includes comment updates and bug fixes.
>
> Thanks!
>
> > The main idea of changing WAL level online is to decouple two aspects:
> > (1) the information included in WAL records and (2) the
> > functionalities available at each WAL level. With that, we can change
> > the WAL level gradually. For example, when increasing the WAL level
> > from 'replica' to 'logical', we first switch the WAL level on the
> > shared memory to a new higher level where we allow processes to write
> > WAL records with additional information required by the logical
> > decoding, while keeping the logical decoding unavailable. The new
> > level is something between 'replica' and 'logical'. Once we confirm
> > all processes have synchronized to the new level, we increase the WAL
> > level further to 'logical', allowing us to start logical decoding. The
> > patch supports all combinations of WAL level transitions. It makes
> > sense to me to use a background worker to proceed with this transition
> > work since we need to wait at some points, rather than delegating it
> > to the checkpointer process.
>
> The background worker being added is "wal_level control worker". I wonder if
> it would make sense to create a more "generic" one instead (to whom we could
> assign more "tasks" later on, as suggested in the past in [1]).
>
> +   /*
> +* XXX: Perhaps it's not okay that we failed to launch a bgworker and give
> +* up wal_level change because we already reported that the change has
> +* been accepted. Do we need to use aux process instead for that purpose?
> +*/
> +   if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
> +   ereport(WARNING,
> +   (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
> +errmsg("out of background worker slots"),
> +errhint("You might need to increase \"%s\".", 
> "max_worker_processes")));
>
> Not sure it has to be an aux process instead as it should be busy in rare 
> occasions.

Thank you for referring to the custodian worker thread. I'm not sure
that the online wal_level change work would fit the concept of the
custodian worker, which offloads work from time-critical tasks such as
checkpointing, but this idea made me think of other possible
directions for this work.

Looking at the latest custodian worker patch, the basic architecture
is to have a single custodian worker that processes can ask to do some
work, such as removing logical decoding related files. The online
wal_level change would be one of the tasks that processes (especially
the checkpointer) could ask it for. On the other hand, one point that
I think might not fit this wal_level work well is that while the
custodian worker is a long-lived worker process, it's sufficient for
the online wal_level change work to have a bgworker that does its work
and then exits. IOW, from the perspective of this work, I prefer the
idea of having one short-lived worker per task over having one
long-lived worker for multiple tasks. Reading that thread, while we
need to resolve the XID wraparound issue for the work of removing
logical decoding related files, the work of removing temporary files
seems to fit a short-lived worker style. So I thought that, as one
possible direction, it might be worth considering an infrastructure
where we can launch a bgworker just for one task, and implementing the
online wal_level change and temporary file removal on top of it.

> Maybe we could add some mechanism for ensuring that a bgworker slot is 
> available
> when needed (as suggested in [2])?

Yeah, we would need this mechanism if we use a bgworker for these tasks.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Allow io_combine_limit up to 1MB

2025-02-14 Thread Jakub Wartak
On Wed, Feb 12, 2025 at 1:03 AM Andres Freund  wrote:
>
> Hi,
>
> On 2025-02-11 13:12:17 +1300, Thomas Munro wrote:
> > Tomas queried[1] the limit of 256kB (or really 32 blocks) for
> > io_combine_limit.  Yeah, I think we should increase it and allow
> > experimentation with larger numbers.  Note that real hardware and
> > protocols have segment and size limits that can force the kernel to
> > split your I/Os, so it's not at all a given that it'll help much or at
> > all to use very large sizes, but YMMV.

+0.02 to the initiative, I've always been wondering why the I/Os were
so capped, I know :)

> FWIW, I see substantial performance *regressions* with *big* IO sizes using
> fio. Just looking at cached buffered IO.
>
> for s in 4 8 16 32 64 128 256 512 1024 2048 4096 8192;do echo -ne "$s\t\t"; 
> numactl --physcpubind 3 fio --directory /srv/dev/fio/ --size=32GiB 
> --overwrite 1 --time_based=0 --runtime=10 --name test --rw read --buffered 0 
> --ioengine psync --buffered 1 --invalidate 0 --output-format json 
> --bs=$((1024*${s})) |jq '.jobs[] | .read.bw_mean';done
>
> io size kB  throughput in MB/s
[..]
> 256         16864
> 512         19114
> 1024        12874
[..]

> It's worth noting that if I boot with mitigations=off clearcpuid=smap I get
> *vastly* better performance:
>
> io size kB  throughput in MB/s
[..]
> 128         23133
> 256         23317
> 512         25829
> 1024        15912
[..]
> Most of the gain isn't due to mitigations=off but clearcpuid=smap. Apparently
> SMAP, which requires explicit code to allow kernel space to access userspace
> memory, to make exploitation harder, reacts badly to copying lots of memory.
>
> This seems absolutely bonkers to me.

There are two bizarre things there: a +35% perf boost just like that due
to the security drama, and io_size=512kB being so special that it gives a
10-13% boost in your case. Any ideas why? I've run this on an Lsv2
individual MS NVMe under Hyper-V, on ext4, which seems much closer to a
real-world, average-Joe situation. It is much slower, and it shows no
advantage for block sizes beyond, let's say, 128:

io size kB  throughput in MB/s
4           1070
8           1117
16          1231
32          1264
64          1249
128         1313
256         1323
512         1257
1024        1216
2048        1271
4096        1304
8192        1214

The top hitters are of course things like clear_page_rep [k] and
rep_movs_alternative [k] (that was with mitigations=on).

-J.




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Julien Rouhaud
Hi,

On Fri, Feb 14, 2025 at 09:36:08AM +0100, Dmitry Dolgov wrote:
> > On Thu, Feb 13, 2025 at 05:08:45PM GMT, Sami Imseih wrote:
> > This case with an array passed to aa function seems to cause a regression
> > in pg_stat_statements query text. As you can see the text is incomplete.
>
> I've already mentioned that in the previous email. To reiterate, it's
> not a functionality regression, but an incomplete representation of the
> normalized query, which has turned out to be hard to change. While I'm
> working on that, the suggestion so far is that it's not a blocker.

While talking about the normalized query text with this patch, I see that
merged values are now represented like this, per the regression test files:

+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+query | calls
+--+---
+ SELECT * FROM test_merge_numeric WHERE data IN (...) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t   | 1
+(2 rows)

This was probably ok a few years back, but pg 16 introduced a new GENERIC_PLAN
option for EXPLAIN (3c05284d83b2) so that EXPLAIN can be run on a query
extracted from pg_stat_statements (among other things).

This feature would break that use case.  Note that this is not a hypothetical
need: on the PoWA project I get very frequent reports about it being impossible
to get an EXPLAIN (we do have some code that tries to reinject the parameters
from stored quals, but we cannot always do it), which is used for the automatic
index suggestion, and we planned to rely on EXPLAIN (GENERIC_PLAN) as an
always-working solution.  I suspect that other projects also rely on this
option for similar features.

Since the merging is a yes/no option (I think there used to be some discussion
about having a threshold or some other fancier modes), maybe you could instead
differentiate the merged version by having 2 constants rather than this "..."
or something like that?
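
To sketch the concern (using the test_merge_numeric table from the quoted
regression output; the queries are illustrative only):

    -- a text with plain placeholders keeps working:
    EXPLAIN (GENERIC_PLAN)
    SELECT * FROM test_merge_numeric WHERE data IN ($1, $2, $3);

    -- but "(...)" is not valid SQL, so the stored query text can no longer
    -- be fed back to EXPLAIN; this fails with a syntax error:
    EXPLAIN (GENERIC_PLAN)
    SELECT * FROM test_merge_numeric WHERE data IN (...);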




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Thomas Munro
On Sat, Feb 15, 2025 at 12:50 PM Melanie Plageman
 wrote:
> It fixed the issue (after an off-list correction to the patch by Thomas).

Thanks!  It's green again.




Decision by Monday: PQescapeString() vs. encoding violation

2025-02-14 Thread Noah Misch
The security team has a question below about how best to proceed with a recent
behavior change.

Commit 5dc1e42b4fa6a4434afa7d7cdcf0291351a7b873 for this week's CVE-2025-1094
changed how PQescapeString()[1] reacts to input that is not valid in the
client encoding.  Before that commit, the function would ignore encoding
problems except at the end of the string.  Now, it replaces the bad sequence
up to the length implied by the first byte.  For example, if UTF8 input has
0xc2 followed by an ASCII byte, the function removes both bytes.

Jeff Davis reported to the security team that
http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences forbids something
like this, saying a UTF-8 converter "must not consume the [second byte] if it
continues".  (That's my summary, not his.  He might reply to add color here.)
While PQescapeString() is not a UTF-8 converter, that standard still felt to
multiple security team members like a decent argument for removing only the
invalid 0xc2, not the byte following it.  UTF8 is the most important encoding,
and other encodings tolerate this.  Either way, the server will report an
encoding error.  The difference doesn't have functional consequences if one
simply puts the function result in a query.  The difference could matter for
debugging or if applications are postprocessing the PQescapeString() result in
some way.  Postprocessing is not supported, but we'd still like to do the best
thing for applications that may already be doing it.

Security team members disagreed on whether next week's releases are the last
reasonable chance to change this, or whether changing it in e.g. May would be
reasonable.  If applications make changes to cope with the new behavior, that
could be an argument against further change.

Question for all: would you switch to the "remove fewer bytes" behavior in
next week's releases, switch later, or switch never?  Why so?  Please answer
in the next 24hr if possible, given the little time until we wrap next week's
releases on Monday.  I regret the late notice.

I'm attaching a WIP patch from Andres Freund.  We may use it to adopt the
"remove fewer bytes" behavior, if that's the decision.

Thanks,
nm


[1] The commit changed other functions, but PQescapeString() is most
interesting for this discussion.  Others have ways to report errors, or they
have reason to believe the input is already checked.  New code should be using
the others and checking the error indicator.
>From 6f0b93afb6a4ba1157482e674e71f56cd9c555c9 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 14 Feb 2025 18:31:15 -0500
Subject: [PATCH v3 2/2] Have escape functions process bytes after invalid
 multi-byte char

Reviewed-by: Jeff Davis 
Backpatch: 13
---
 src/fe_utils/string_utils.c| 40 ++
 src/interfaces/libpq/fe-exec.c | 17 ---
 2 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index b6a7b197087..8621856fbc1 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -206,14 +206,13 @@ fmtIdEnc(const char *rawid, int encoding)
 * "skip" over quote characters, e.g. when 
parsing
 * character-by-character.
 *
-* Replace the bytes corresponding to the 
invalid character
-* with an invalid sequence, for the same 
reason as above.
+* Replace the current byte with with an 
invalid sequence, for the
+* same reason as above.
 *
-* It would be a bit faster to verify the whole 
string the
-* first time we encounter a set highbit, but 
this way we can
-* replace just the invalid characters, which 
probably makes
-* it easier for users to find the invalidly 
encoded portion
-* of a larger string.
+* It would be a bit faster to verify the whole 
string the first
+* time we encounter a set highbit, but this 
way we can replace
+* just the invalid byte, which probably makes 
it easier for users
+* to find the invalidly encoded portion of a 
larger string.
 */
enlargePQExpBuffer(id_return, 2);
pg_encoding_set_invalid(encoding,
@@ -222,11 +221,13 @@ fmtIdEnc(const char *rawid, int encoding)
id_return->data[id_return->len] = '\0';
 
/*
-* Copy the rest of the string after the 
invalid multi-byte
-* character

Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-02-14 Thread Jacob Champion
On Thu, Feb 13, 2025 at 4:03 PM Michael Paquier  wrote:
> > If a CA is issuing Subject data that is somehow dangerous to the
> > operation of the server, I think that's a security problem in and of
> > itself: there are clientcert HBA modes that don't validate the
> > Subject, but they're still going to push that data into the catalogs,
> > aren't they?
>
> Is that the case before we finish authentication now?

No, but I still don't understand why that's relevant. My point is that
transport authentication data should be neither less trustworthy prior
to ClientAuthentication, nor more trustworthy after it, since it's
signed by the same authentication provider that you're trusting to
make the authentication decisions in the first place. (But it doesn't
seem like we're going to agree on this for now; in the meantime I'll
prepare a version of the patch that only calls
pgstat_bestart_security() once.)

At some point in the future, I would really like to clarify what
potential problems there are if we put verified Subject data into the
catalogs before ClientAuthentication completes. I think that any such
problems would continue to be problems after ClientAuthentication
completes, too.

Thanks,
--Jacob




Re: pg17.3 PQescapeIdentifier() ignores len

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-13 14:00:09 -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > The fprintf suggests that since 5dc1e42b4 PQescapeIdentifier ignores its 
> > len.
> 
> Ugh, yes.  Need something like the attached.

I just pushed this fix, together with an expansion of test_escape.c. With the
expanded test both uses of strlen() are detected.


> FTR, 5dc1e42b4 et al were quite subtle patches done under extreme time
> pressure.  I wonder if they have any other issues. More eyes on those
> patches would be welcome, now that they are public.

Indeed.

Greetings,

Andres Freund




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Melanie Plageman
On Fri, Feb 14, 2025 at 6:09 PM Thomas Munro  wrote:
>
> > Agreed that right now is a bad time to push this to v17 --- we need to
> > keep the risk factors as low as possible for the re-release.  Master
> > now and v17 after the re-wrap seems like the right compromise.
>
> Cool, will push to master.  Melanie, could you please confirm that
> this patch works for you?  I haven't figured out what I'm doing wrong
> but my local Valgrind doesn't seem to show the problem (USE_VALGRIND
> defined, Debian's Valgrind v3.19.0).

It fixed the issue (after an off-list correction to the patch by Thomas).

- Melanie




Re: Improve pgindent exclude handling: ignore empty lines

2025-02-14 Thread Andrew Dunstan



On 2025-02-08 Sa 4:39 AM, Zsolt Parragi wrote:

Hello,

We ran into an issue where pgindent stopped reformatting anything with
our custom exclude file, and after some investigation we found an empty
line accidentally inserted into the exclude file.

Pgindent currently treats empty lines as valid exclusions and creates
an empty regex from them. The empty regex matches any pattern,
resulting in ignoring everything.

As this behavior doesn't seem to be useful in practice, and it is easy
to reproduce accidentally (it works even at the end of the file), I
propose a patch that ignores empty lines in the exclude file.

If somebody really wants to ignore everything, that is still possible
with more explicit patterns like ".*"



It seems reasonable to ignore an empty line. But your proposed commit 
message isn't correct w.r.t. whitespace, either relating to current or 
proposed behaviour: comment lines can't have leading whitespace, 
and your patch won't ignore lines consisting of one or more whitespace 
characters. So I'm not quite clear what you want to do.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Get rid of WALBufMappingLock

2025-02-14 Thread Yura Sokolov
14.02.2025 17:09, Yura Sokolov пишет:
> 14.02.2025 13:24, Alexander Korotkov пишет:
>> On Fri, Feb 14, 2025 at 11:45 AM Pavel Borisov  
>> wrote:
>>> On Fri, 14 Feb 2025 at 00:59, Alexander Korotkov  
>>> wrote:
 On Thu, Feb 13, 2025 at 6:39 PM Pavel Borisov  
 wrote:
> On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov  
> wrote:
>>
>> On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov  
>> wrote:
>>> 13.02.2025 12:34, Alexander Korotkov пишет:
 On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov 
  wrote:
> 08.02.2025 13:07, Alexander Korotkov пишет:
>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov 
>>  wrote:
>>> Good, thank you.  I think 0001 patch is generally good, but needs 
>>> some
>>> further polishing, e.g. more comments explaining how does it work.
>
> I tried to add more comments. I'm not good at, so recommendations are 
> welcome.
>
>> Two things comes to my mind worth rechecking about 0001.
>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
>> XLogCtl->xlblocks always page-aligned?  Because algorithm seems to be
>> sensitive to that.  If so, I would propose to explicitly comment that
>> and add corresponding asserts.
>
> They're certainly page aligned, since they are page borders.
> I added assert on alignment of InitializeReserved for the sanity.
>
>> 2) Check if there are concurrency issues between
>> AdvanceXLInsertBuffer() and switching to the new WAL file.
>
> There are no issues:
> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
> GetXLogBuffer to zero-out WAL page.
> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion 
> locks,
> so switching wal is not concurrent. (Although, there is no need in 
> this
> exclusiveness, imho.)

 Good, thank you.  I've also revised commit message and comments.

 But I see another issue with this patch.  In the worst case, we do
 XLogWrite() by ourselves, and it could potentially error out.
 Without patch, that would cause WALBufMappingLock be released and
 XLogCtl->InitializedUpTo not advanced.  With the patch, that would
 cause other processes infinitely waiting till we finish the
 initialization.

 Possible solution would be to save position of the page to be
 initialized, and set it back to XLogCtl->InitializeReserved on error
 (everywhere we do LWLockReleaseAll()).  We also must check that on
 error we only set XLogCtl->InitializeReserved to the past, because
 there could be multiple concurrent failures.  Also we need to
 broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
>>>
>>> The single place where AdvanceXLInsertBuffer is called outside of 
>>> critical
>>> section is in XLogBackgroundFlush. All other call stacks will issue 
>>> server
>>> restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer.
>>>
>>> XLogBackgroundFlush explicitely avoids writing buffers by passing
>>> opportunistic=true parameter.
>>>
>>> Therefore, error in XLogWrite will not cause hang in 
>>> AdvanceXLInsertBuffer
>>> since server will shutdown/restart.
>>>
>>> Perhaps, we just need to insert `Assert(CritSectionCount > 0);` before 
>>> call
>>> to XLogWrite here?
>>
>> You're correct.  I just reflected this in the next revision of the patch.
>
> I've looked at the patchset v6.

 Oh, sorry, I really did wrong.  I've done git format-patch for wrong
 local branch for v5 and v6.  Patches I've sent for v5 and v6 are
 actually the same as my v1.  This is really pity.  Please, find the
 right version of patchset attached.
>>>
>>> I've rechecked v7. In v6 a proposal from [1] was not reflected. Now it
>>> landed in v7.
>>>
>>> Other changes are not regarding code behavior. The things from my
>>> previous review that still could apply to v7:
>>>
>>> For 0001:
>>>
>>> Comment change proposed:
>>> "lock-free with cooperation with" -> "lock-free accompanied by changes
>>> to..." (maybe other variant)
>>
>> Good catch.  I've rephrased this comment even more.
>>
>>> I propose a new define:
>>> #define FirstValidXLogRecPtr 1
>>> While FirstValidXLogRecPtr = InvalidXLogRecPtr + 1 is true in the code
>>> that has no semantical meaning and it's better to avoid using direct
>>> arithmetics to relate meaning of FirstValidXLogRecPtr from
>>> InvalidXLogRecPtr.
>>
>> Makes sense, but I'm not sure if this change is required at all.  I've
>> reverted this to the state of master, and everything seems to work.
>>
>>> For 0002 both comments proposals from my message applied to v6 apply
>>> to v7 as well
>>
>> Than

Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Dmitry Dolgov
> On Fri, Feb 14, 2025 at 10:39:45PM GMT, Julien Rouhaud wrote:
> There seems to be an off-by-1 error in parameter numbering when merging them.

There are indeed three constants, but the second is not visible in the
query text. Maybe it makes sense to adjust the numbering in this case; let
me try.

> Note that the query text as-is can still successfully be used in an EXPLAIN
> (GENERIC_PLAN), but it might cause problems for third-party tools that try to
> do something smarter with the parameters.

Since the normalized query will now be a valid one, I hope that such
cases will be rare. On top of that, it will always be an option to not
enable constant squashing and avoid any trouble. Or do you have a
particular scenario in mind that might be problematic?




Re: Thread-safe nl_langinfo() and localeconv()

2025-02-14 Thread Peter Eisentraut

On 09.02.25 15:52, Peter Eisentraut wrote:
This patch set is still desirable.  Here is a rebased version of the v5 
patches.  I haven't applied any corrections or review comments.


Here is the same patch set with some review comments.

Patch 0001 looks okay to me.  I'm just offering some cosmetic 
improvements in patch 0004.


Patch 0002 also looks okay, except that the error handling could be 
unified with existing code, as I had previously pointed out.  Patch 0005 
fixes that.


About patch 0003:

I had previously pointed out that the canonicalization might have been 
intentional, and that we have canonicalization of ICU locale names.  But 
we don't have to keep the setlocale()-based locale checking 
implementation just for that, I think.  (If this was meant to be a real 
feature offered by libc, there should be a get_locale_name(locale_t) 
function.)


I'm unsure about the correct error handling of _create_locale() on 
Windows.  Does _dosmaperr(GetLastError()) do anything useful in this 
context, or is this just copied from elsewhere?  If it's useful, maybe 
it should be added to report_newlocale_failure().


Also, maybe we don't need per-category locale checking?  Would it not be 
enough to check the locale using LC_ALL_MASK?  Is there a scenario where 
a locale name would work for one category but not another?  I think the 
old code was just conveniently coded that way so that you only have to 
save and restore one locale category.  But we wouldn't have to do it 
that way anymore if we use newlocale().
From 94879f695ff8961255c2daa46b81ce378a55732d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 13 Aug 2024 14:15:54 +1200
Subject: [PATCH v6.1 1/5] Provide thread-safe pg_localeconv_r().

This involves four different implementation strategies:

1.  For Windows, we now require _configthreadlocale() to be available
and work, and the documentation says that the object returned by
localeconv() is in thread-local memory.

2.  For glibc, we translate to nl_langinfo_l() calls, because it offers
the same information that way as an extension, and that API is
thread-safe.

3.  For macOS/*BSD, use localeconv_l(), which is thread-safe.

4.  For everything else, use uselocale() to set the locale for the
thread, and use a big ugly lock to defend against the returned object
being concurrently clobbered.  In practice this currently means only
Solaris.

The new call is used in pg_locale.c, replacing calls to setlocale() and
localeconv().

This patch adds a hard requirement on Windows' _configthreadlocale().
In the past there were said to be MinGW systems that didn't have it, or
had it but it didn't work.  As far as I can tell, CI (optional MinGW
task + mingw cross build warning task) and build farm (fairywren msys)
should be happy with this.  Fingers crossed.  (There are places that use
configure checks for that in ECPG; other proposed patches would remove
those later.)

Reviewed-by: Heikki Linnakangas 
Discussion: 
https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com
---
 configure |   2 +-
 configure.ac  |   1 +
 meson.build   |   1 +
 src/backend/utils/adt/pg_locale.c | 128 ++-
 src/include/pg_config.h.in|   3 +
 src/include/port.h|   6 +
 src/port/Makefile |   1 +
 src/port/meson.build  |   1 +
 src/port/pg_localeconv_r.c| 367 ++
 9 files changed, 402 insertions(+), 108 deletions(-)
 create mode 100644 src/port/pg_localeconv_r.c

diff --git a/configure b/configure
index 0ffcaeb4367..3e7c5fc91d6 100755
--- a/configure
+++ b/configure
@@ -14934,7 +14934,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info 
getauxval getifaddrs getpeerucred inet_pton kqueue mbstowcs_l memset_s 
posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast 
strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info 
getauxval getifaddrs getpeerucred inet_pton localeconv_l kqueue mbstowcs_l 
memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle 
setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale 
wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index f56681e0d91..e56136049e9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1710,6 +1710,7 @@ AC_CHECK_FUNCS(m4_normalize([
getifaddrs
getpeerucred
inet_pton
+   localeconv_l
kqueue
mbstowcs_l
memset_s
diff --git a/meson.build b/meson.build
index 1ceadb9a830..a8a6f34f6c6 100644
--- a/meson.build
+++ b/meson.build
@@ -2634,6 +2634,7 @@ func_checks = [
   ['inet_aton'],
   [

Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Dmitry Dolgov
> On Fri, Feb 14, 2025 at 05:26:19AM GMT, Sami Imseih wrote:
> > > On Thu, Feb 13, 2025 at 05:08:45PM GMT, Sami Imseih wrote:
> > > Constants passed as parameters to a prepared statement will not be
> > > handled as expected. I did not not test explicit PREPARE/EXECUTE 
> > > statement,
> > > but I assume it will have the same issue.
>
> > This is the same question of how many cases to support. The original
> > patch implementation handled Param expressions as well, but that part was
> > explicitly rejected during review. I think as a first step it's
> > important to find a balance between applying this optimization in as
> > many cases as possible and, at the same time, keeping the implementation
> > simple enough to give the patch a chance. So far I'm inclined to leave Param
> > for future work, although of course I'm open to discussion.
>
> I do see the discussion here [1], sorry for not noticing it.
>
> I am not sure about this though. At minimum this needs to be documented,
> However, I think the  prepared statement case is too common of a case to
> skip for the first release of tis feature, and users that will likely
> benefit from this feature are using prepared statements ( i.e. JDBC, etc ).

Right, prepared statements are quite a common case. This would be the
first thing I'll take on if this patch finds its way into the release.
As you can see, it's not at all obvious that that will happen; I
estimate the chances to be higher if we move in smaller steps.

> > > pg_node_attr of query_jumble_merge is doing something
> > > very specific to the elements list of an ArrayExpr. The
> > > merge code likely cannot be used for other node types.
>
> > It can be; take a look at the pg_node_attr commentary. Any node can have a
> > field marked with the query_jumble_merge attribute and benefit from merging.
>
> I can't think of other cases beyond ArrayExpr where this will be needed.
> The node that could use this will need to carry constants, but ArrayExpr
> is the only case I can think of in which this will be useful for jumbling.
> There should be a really good reason IMO to do something other than the
> existing pattern of using custom_query_jumble.

Well, there are plenty of expression nodes that have lists in them, and
maybe more will be added in the future. And as before, the idea of using
pg_node_attr was a reasonable suggestion from Michael Paquier on top of
the original design (which indeed used a custom jumble function for
ArrayExpr).

> It's not a functionality regression as far as query execution
> or pg_stat_statements counters go, but it is a regression as far as
> displaying query text in pg_stat_statements. pg_stat_statements, unlike
> pg_stat_acitivty, makes a guaranteee not to trim text as stated in the docs 
> [2]
> "The representative query texts are kept in an external disk file,
> and do not consume shared memory. Therefore,
> even very lengthy query texts can be stored successfully."

Just to clarify, the part you reference doesn't say anything about
trimming, does it? In fact, the query text stored in
pg_stat_statements might well be somewhat different from the one that
was executed, since similar queries can have the same query_id and
differ only in e.g. parentheses.

But in any case, you're right that the original behavior was a bug. I
didn't realize you were talking about a missing chunk of the normalized
query. The issue could be triggered when there are multiple merged
intervals within the same query.

Btw, there was another mistake in the last version introducing the
"$1 /*, ... */" format: the constant position has of course to be
calculated as usual.
>From 0b65b35500906460c061007f7cbb372eeaf9c4ab Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 3 Dec 2024 14:55:45 +0100
Subject: [PATCH v26] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is jumbled. In certain situations it's undesirable, especially
if the list becomes too large.

Make an array of Const expressions contribute only the first/last
elements to the jumble hash. Allow to enable this behavior via the new
pg_stat_statements parameter query_id_squash_values with the default value off.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane,
Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier, Sutou Kouhei,
Sami Imseih, Julien Rouhaud
Tested-by: Chengxi Sun, Yasuo Honda
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/merging.out   | 465 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  56 ++-
 contrib/pg_stat_statements/sql/merging.sql| 180 +++
 doc/src/sgml/config.sgml  |  28 ++
 doc/src/sgml/pgstatstatements.sgml|  28 +-
 src/backend/nodes/ge

Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Dmitry Dolgov
> On Fri, Feb 14, 2025 at 09:06:24AM GMT, Sami Imseih wrote:
> I think it will be sad to not include this very common case from
> the start, because it is going to be one of the most common
> cases.
>
> Wouldn't doing something like this inside IsMergeableConst
> """
> if (!IsA(arg, Const) && !IsA(arg, Param))
> """
>
> instead of
> """
> if (!IsA(arg, Const))
> """
>
> be sufficient?

That's exactly what the original rejected implementation was doing. I
guess to answer this question fully I need to do some more detailed
investigation; I'm probably not aware of everything at this point.

> I perhap meant "missing chunk" instead of "trimming". To me it just
> looked like a trimmed text, which was wrong. Looks like v25
> deals with that better at least. I am just not sure about all that we are 
> doing
> here as I believe it may open up big changes for bugs generating the 
> normalized
> query texts. I'm a bit worried about that. IMO, we are better off just
> adding a comment
> at the start of a query that this query text such as "/*
> query_id_squash_values */"
> and keeping all the parameter symbols in-place.

I see what you mean, but keeping everything in place partially defeats
the purpose of the patch. The idea is not only to make those queries
have the same query_id, but also to reduce the size of the query texts
themselves. E.g. the use case that triggered the patch involved queries
with tens of thousands of such constants, so their size was a burden on
its own.




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Sami Imseih
> > I perhap meant "missing chunk" instead of "trimming". To me it just
> > looked like a trimmed text, which was wrong. Looks like v25
> > deals with that better at least. I am just not sure about all that we are 
> > doing
> > here as I believe it may open up big changes for bugs generating the 
> > normalized
> > query texts. I'm a bit worried about that. IMO, we are better off just
> > adding a comment
> > at the start of a query that this query text such as "/*
> > query_id_squash_values */"
> > and keeping all the parameter symbols in-place.
>
> I see what you mean, but keeping everything in place partially defeats
> the purpose of the patch. The idea is not only to make those queries
> have the same query_id, but also to reduce the size of the query texts
> themselves. E.g. the use case that triggered the patch involved queries
> with tens of thousands of such constants, so their size was a burden on
> its own.

My experience with this issue is not so much the size of the query text,
but the fact that similar queries (with varying-length IN-lists) are
tracked in different entries, causing high deallocation and heavy
garbage collection. This is besides the overall loss of quality of
the data from pg_stat_statements when there is constant deallocation.

But with what you are doing in this patch, we will now have
a single tracking entry for similar queries with varying IN-lists, and
even if the query text is *large*, it's only a single entry being
tracked, and we are no longer deallocating and garbage collecting as
frequently.

-- 
Sami




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Sami Imseih
> > > I perhap meant "missing chunk" instead of "trimming". To me it just
> > > looked like a trimmed text, which was wrong. Looks like v25
> > > deals with that better at least. I am just not sure about all that we are 
> > > doing
> > > here as I believe it may open up big changes for bugs generating the 
> > > normalized
> > > query texts. I'm a bit worried about that. IMO, we are better off just
> > > adding a comment
> > > at the start of a query that this query text such as "/*
> > > query_id_squash_values */"
> > > and keeping all the parameter symbols in-place.
> >
> > I see what you mean, but keeping everything in place partially defeats
> > the purpose of the patch. The idea is not only to make those queries
> > have the same query_id, but also to reduce the size of the query texts
> > themselves. E.g. the use case that triggered the patch involved queries
> > with tens of thousands of such constants, so their size was a burden on
> > its own.
>
> My experience with this issue is not so much the size of the query text,
> but the fact that similar queries (with varying-length IN-lists) are
> tracked in different entries, causing high deallocation and heavy
> garbage collection. This is besides the overall loss of quality of
> the data from pg_stat_statements when there is constant deallocation.
>
> But with what you are doing in this patch, we will now have
> a single tracking entry for similar queries with varying IN-lists, and
> even if the query text is *large*, it's only a single entry being
> tracked, and we are no longer deallocating and garbage collecting as
> frequently.

Another point: if we want to control the size of the query texts, that
could be something useful for pg_stat_statements overall, not just for
IN-list type queries.

-- 

Sami




Re: Get rid of WALBufMappingLock

2025-02-14 Thread Yura Sokolov
14.02.2025 13:24, Alexander Korotkov пишет:
> On Fri, Feb 14, 2025 at 11:45 AM Pavel Borisov  wrote:
>> On Fri, 14 Feb 2025 at 00:59, Alexander Korotkov  
>> wrote:
>>> On Thu, Feb 13, 2025 at 6:39 PM Pavel Borisov  
>>> wrote:
 On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov  
 wrote:
>
> On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov  
> wrote:
>> 13.02.2025 12:34, Alexander Korotkov пишет:
>>> On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov  
>>> wrote:
 08.02.2025 13:07, Alexander Korotkov пишет:
> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov 
>  wrote:
>> Good, thank you.  I think 0001 patch is generally good, but needs 
>> some
>> further polishing, e.g. more comments explaining how does it work.

 I tried to add more comments. I'm not good at, so recommendations are 
 welcome.

> Two things comes to my mind worth rechecking about 0001.
> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> XLogCtl->xlblocks always page-aligned?  Because algorithm seems to be
> sensitive to that.  If so, I would propose to explicitly comment that
> and add corresponding asserts.

 They're certainly page aligned, since they are page borders.
 I added assert on alignment of InitializeReserved for the sanity.

> 2) Check if there are concurrency issues between
> AdvanceXLInsertBuffer() and switching to the new WAL file.

 There are no issues:
 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
 GetXLogBuffer to zero-out WAL page.
 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion 
 locks,
 so switching wal is not concurrent. (Although, there is no need in this
 exclusiveness, imho.)
>>>
>>> Good, thank you.  I've also revised commit message and comments.
>>>
>>> But I see another issue with this patch.  In the worst case, we do
>>> XLogWrite() by ourselves, and it could potentially error out.
>>> Without patch, that would cause WALBufMappingLock be released and
>>> XLogCtl->InitializedUpTo not advanced.  With the patch, that would
>>> cause other processes infinitely waiting till we finish the
>>> initialization.
>>>
>>> Possible solution would be to save position of the page to be
>>> initialized, and set it back to XLogCtl->InitializeReserved on error
>>> (everywhere we do LWLockReleaseAll()).  We also must check that on
>>> error we only set XLogCtl->InitializeReserved to the past, because
>>> there could be multiple concurrent failures.  Also we need to
>>> broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
>>
>> The single place where AdvanceXLInsertBuffer is called outside of 
>> critical
>> section is in XLogBackgroundFlush. All other call stacks will issue 
>> server
>> restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer.
>>
>> XLogBackgroundFlush explicitely avoids writing buffers by passing
>> opportunistic=true parameter.
>>
>> Therefore, error in XLogWrite will not cause hang in 
>> AdvanceXLInsertBuffer
>> since server will shutdown/restart.
>>
>> Perhaps, we just need to insert `Assert(CritSectionCount > 0);` before 
>> call
>> to XLogWrite here?
>
> You're correct.  I just reflected this in the next revision of the patch.

 I've looked at the patchset v6.
>>>
>>> Oh, sorry, I really did wrong.  I've done git format-patch for wrong
>>> local branch for v5 and v6.  Patches I've sent for v5 and v6 are
>>> actually the same as my v1.  This is really pity.  Please, find the
>>> right version of patchset attached.
>>
>> I've rechecked v7. In v6 a proposal from [1] was not reflected. Now it
>> landed in v7.
>>
>> Other changes are not regarding code behavior. The things from my
>> previous review that still could apply to v7:
>>
>> For 0001:
>>
>> Comment change proposed:
>> "lock-free with cooperation with" -> "lock-free accompanied by changes
>> to..." (maybe other variant)
> 
> Good catch.  I've rephrased this comment even more.
> 
>> I propose a new define:
>> #define FirstValidXLogRecPtr 1
>> While FirstValidXLogRecPtr = InvalidXLogRecPtr + 1 is true in the code
>> that has no semantical meaning and it's better to avoid using direct
>> arithmetics to relate meaning of FirstValidXLogRecPtr from
>> InvalidXLogRecPtr.
> 
> Makes sense, but I'm not sure if this change is required at all.  I've
> reverted this to the state of master, and everything seems to work.
> 
>> For 0002 both comments proposals from my message applied to v6 apply
>> to v7 as well
> 
> Thank you for pointing.  For now, I'm concentrated on improvements on
> 0001.  Probably Yura could work on your notes to 0002.

I wrote good commit message for 0002 wi

Parameter binding for COPY TO queries

2025-02-14 Thread Jens-Wolfhard Schicke-Uffmann
Hi,

I'd like some input on the idea of adding parameter binding support to
queries executed as part of a COPY TO command. Is there a technical
or philosophical reason why these queries should not contain bindable
parameters?

As far as I could ascertain, the feature has been desired for a long
time (but only occasionally), see e.g.:
* 
https://www.postgresql.org/message-id/flat/12415.1550157763%40sss.pgh.pa.us#682d53c41bda2d21b7cd4fba5000793c
* https://postgrespro.com/list/thread-id/1893680
* 
https://stackoverflow.com/questions/69233792/how-to-pass-a-param-for-a-binding-in-postgresql-copy-to-stdout-format
* 
https://stackoverflow.com/questions/44190514/pdo-postgresql-copy/44190617#44190617
* https://github.com/npgsql/npgsql/issues/1677
* https://github.com/jackc/pgx/issues/2247#issuecomment-2645836118

Currently, the protocol forces users of COPY TO to use SQL quoting; as a
result, multiple client libraries do not support parameters in COPY TO,
and various users end up doing unsafe interpolation (I come here from
one such occasion).
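
As a sketch of the difference (hypothetical table and column names; the
second form is what this proposal would allow):

    -- today: the value has to be interpolated and quoted client-side
    COPY (SELECT * FROM orders WHERE customer_id = 42) TO STDOUT (FORMAT csv);

    -- desired: the value is supplied via a Bind message, no quoting needed;
    -- this form is currently rejected by the server
    COPY (SELECT * FROM orders WHERE customer_id = $1) TO STDOUT (FORMAT csv);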


I might have some time to attempt an implementation, but would first like
to know if there are reservations against it.


Thanks,
  Drahflow




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Julien Rouhaud
On Fri, Feb 14, 2025 at 03:20:24PM +0100, Dmitry Dolgov wrote:
> 
> Btw, there was another mistake in the last version introducing the
> "$1 /*, ... */" format: the constant position has of course to be
> calculated as usual.

I'm not sure what you mean here, but just in case:

> +SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9) AND data = 
> 2;
> + id | data 
> ++--
> +(0 rows)
> +
> +SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) AND 
> data = 2;
> + id | data 
> ++--
> +(0 rows)
> +
> +SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) AND 
> data = 2;
> + id | data 
> ++--
> +(0 rows)
> +
> +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
> +   query| calls 
> ++---
> + SELECT * FROM test_merge WHERE id IN ($1 /*, ... */) AND data = $3 | 3
> + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1
> +(2 rows)

There seems to be an off-by-1 error in parameter numbering when merging them.

Note that the query text as-is can still successfully be used in an EXPLAIN
(GENERIC_PLAN), but it might cause problems for third-party tools that try to
do something smarter with the parameters.




Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-14 Thread Greg Sabino Mullane
On Fri, Feb 14, 2025 at 1:41 AM Sébastien  wrote:

> I had huge problems on server 3 weeks after a 6 TB migration from other
> DB. I think it's sad to rewrite all data twice.
>

You could always COPY FREEZE into a new table, then move any remaining
rows, and finally rename the tables. That should be a win if the majority of
the table is newly imported data. Another thing you might look at is increasing
your three-week forced freeze window by adjusting params and/or decreasing
the number of transactions your cluster is using (granted, still the same
overall write volume). You could also do manual vacuum freezes at a time
better for you (since you mention autovacuum is doing this at
unpredictable times).
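
A minimal sketch of that approach (table, column, and file names are made
up; note that COPY FREEZE requires the target table to be created or
truncated in the same transaction):

    BEGIN;
    CREATE TABLE big_table_new (LIKE big_table INCLUDING ALL);
    COPY big_table_new FROM '/path/to/export.csv' WITH (FORMAT csv, FREEZE);
    -- move any rows that arrived in the old table after the export
    INSERT INTO big_table_new
        SELECT * FROM big_table WHERE inserted_at > '2025-01-01';
    ALTER TABLE big_table     RENAME TO big_table_old;
    ALTER TABLE big_table_new RENAME TO big_table;
    COMMIT;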

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Melanie Plageman
io_combine_limit 1, effective_io_concurrency 16, read ahead kb 16

On Fri, Feb 14, 2025 at 12:18 PM Tomas Vondra  wrote:
>
> Based on off-list discussion with Melanie, I ran a modified version of
> the benchmark, with these changes:

Thanks! It looks like the worst offender is io_combine_limit 1 (128
kB), effective_io_concurrency 16, read_ahead_kb 16. This is a
combination that people shouldn't run in practice I think -- an
io_combine_limit larger than read ahead.

But we do still see regressions with io_combine_limit 1,
effective_io_concurrency 16 at other read_ahead_kb values. Therefore
I'd be interested to see that subset (ioc 1, eic 16, diff read ahead
values) with the attached patch.

> FWIW this does not change anything in the detection of sequential access
> patterns, discussed nearby, because the benchmarks started before Andres
> looked into that. If needed, I can easily rerun these tests, I just need
> a patch to apply.

Attached a patch that has all the commits squashed + removes
sequential detection.

- Melanie
From 0617aebfdd635ba75f3cc7bc44cc9e72431aa9c5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 17 Jan 2025 16:42:25 -0500
Subject: [PATCH] Streaming BHS

---
 src/backend/access/gin/ginget.c   |  39 +-
 src/backend/access/gin/ginscan.c  |   2 +-
 src/backend/access/heap/heapam.c  |  70 
 src/backend/access/heap/heapam_handler.c  | 366 ++-
 src/backend/access/table/tableamapi.c |   2 -
 src/backend/executor/nodeBitmapHeapscan.c | 424 ++
 src/backend/nodes/tidbitmap.c | 101 +++---
 src/backend/optimizer/util/plancat.c  |   2 +-
 src/backend/storage/aio/read_stream.c |   4 +-
 src/include/access/gin_private.h  |   7 +-
 src/include/access/relscan.h  |   8 +-
 src/include/access/tableam.h  | 121 ++
 src/include/nodes/execnodes.h |  23 +-
 src/include/nodes/tidbitmap.h |  18 +-
 14 files changed, 411 insertions(+), 776 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 63dd1f3679f..ac1b749ed98 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -332,10 +332,11 @@ restartScanEntry:
 	entry->list = NULL;
 	entry->nlist = 0;
 	entry->matchBitmap = NULL;
-	entry->matchResult = NULL;
+	entry->matchResult.blockno = InvalidBlockNumber;
 	entry->reduceResult = false;
 	entry->predictNumberResult = 0;
 
+
 	/*
 	 * we should find entry, and begin scan of posting tree or just store
 	 * posting list in memory
@@ -824,19 +825,19 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 		{
 			/*
 			 * If we've exhausted all items on this block, move to next block
-			 * in the bitmap.
+			 * in the bitmap. tbm_private_iterate() sets matchResult->blockno
+			 * to InvalidBlockNumber when the bitmap is exhausted.
 			 */
-			while (entry->matchResult == NULL ||
-   (entry->matchResult->ntuples >= 0 &&
-	entry->offset >= entry->matchResult->ntuples) ||
-   entry->matchResult->blockno < advancePastBlk ||
+			while ((!BlockNumberIsValid(entry->matchResult.blockno)) ||
+   (entry->matchResult.ntuples >= 0 &&
+	entry->offset >= entry->matchResult.ntuples) ||
+   entry->matchResult.blockno < advancePastBlk ||
    (ItemPointerIsLossyPage(&advancePast) &&
-	entry->matchResult->blockno == advancePastBlk))
+	entry->matchResult.blockno == advancePastBlk))
 			{
-entry->matchResult =
-	tbm_private_iterate(entry->matchIterator);
+tbm_private_iterate(entry->matchIterator, &entry->matchResult);
 
-if (entry->matchResult == NULL)
+if (!BlockNumberIsValid(entry->matchResult.blockno))
 {
 	ItemPointerSetInvalid(&entry->curItem);
 	tbm_end_private_iterate(entry->matchIterator);
@@ -847,7 +848,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 
 /*
  * Reset counter to the beginning of entry->matchResult. Note:
- * entry->offset is still greater than matchResult->ntuples if
+ * entry->offset is still greater than matchResult.ntuples if
  * matchResult is lossy.  So, on next call we will get next
  * result from TIDBitmap.
  */
@@ -860,10 +861,10 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 * We're now on the first page after advancePast which has any
 			 * items on it. If it's a lossy result, return that.
 			 */
-			if (entry->matchResult->ntuples < 0)
+			if (entry->matchResult.ntuples < 0)
 			{
 ItemPointerSetLossyPage(&entry->curItem,
-		entry->matchResult->blockno);
+		entry->matchResult.blockno);
 
 /*
  * We might as well fall out of the loop; we could not
@@ -877,27 +878,27 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 * Not a lossy page. Skip over any offsets <= advancePast, and
 			 * return that.
 			 */
-			if (entry->matchResult->blockno == advancePastBlk)
+			if (entry->matchResult.blockno == advancePastBlk)
 		

Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Sami Imseih
> > Wouldn't doing something like this inside IsMergeableConst
> > """
> > if (!IsA(arg, Const) && !IsA(arg, Param))
> > """
> >
> > instead of
> > """
> > if (!IsA(arg, Const))
> > """
> >
> > be sufficient?
>
> That's exactly what the original rejected implementation was doing. I
> guess to answer this question fully I need to do some more detailed
> investigation, I'm probably not aware about everything at this point.

I am not sure which rejected implementation you are referring to,
as this is a long thread :). But I will just add my findings (as I
really wanted to try this out) on top of your latest v27 here. Maybe
this is all we need. Essentially, check for a PARAM_EXTERN as we are
scanning through the elements and only consider those types of args,
and the constants of course.

"""
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -295,6 +295,14 @@ IsMergeableConst(Node *element, List
**known_immutable_funcs)
{
Node *arg = lfirst(temp);

+   if (IsA(arg, Param))
+   {
+   Param *p = (Param *) arg;
+
+   if (p->paramkind == PARAM_EXTERN)
+   return true;
+   }
+
if (!IsA(arg, Const))
return false;
}
@@ -302,6 +310,14 @@ IsMergeableConst(Node *element, List
**known_immutable_funcs)
return true;
}

+   if (IsA(element, Param))
+   {
+   Param *p = (Param *) element;
+
+   if (p->paramkind == PARAM_EXTERN)
+   return true;
+   }
+
if (!IsA(element, Const))
return false;
 """

"""
set query_id_squash_values = on;
select pg_stat_statements_reset();

select where 1 in (1, 2, 3);
select where 1 in (1, 2, 3, 4);

prepare prep(int, int, int) as select where 1 in ($1, $2, $3);
execute prep(1, 2, 3);
deallocate prep;
prepare prep(int, int, int, int) as select where 1 in ($1, $2, $3, $4);
execute prep(1, 2, 3, 4);
deallocate prep;
-- mixed constants and parameters
prepare prep(int, int, int) as select where 1 in ($1, $2, $3, 4);
execute prep(1, 2, 3);
deallocate prep;
prepare prep(int, int, int, int) as select where 1 in ($1, $2, $3, 4, $4);
execute prep(1, 2, 3, 5);
deallocate prep;

select where 1 in ($1, $2, $3) \bind 1 2 3
;
select where 1 in ($1, $2, $3, $4) \bind 1 2 3 4
;
-- mixed constants and parameters
select where 1 in ($1, $2, $3, 4) \bind 1 2 3
;
select where 1 in ($1, $2, $3, 4, $4) \bind 1 2 3 5
;


select query, queryid, calls from pg_stat_statements;

postgres=# select query, queryid, calls from pg_stat_statements;
               query                |       queryid        | calls
------------------------------------+----------------------+-------
 select pg_stat_statements_reset()  |   522241623491678666 |     1
 deallocate $1                      | -3638851837470664936 |     4
 select where $1 in ($2 /*, ... */) | -7657972370536959080 |    10
(3 rows)
"""

---

Sami




Re: New buildfarm animals with FIPS mode enabled

2025-02-14 Thread Daniel Gustafsson
> On 14 Feb 2025, at 19:01, Tom Lane  wrote:

> I'm kind of disinclined to do all the work that'd be needed to turn
> these animals completely green, especially when the reason to do it
> seems to be that someone decided we should without any community
> consultation.  Perhaps others have different opinions though.

If the owner of the BF animal shows up with a patch for providing alternative
outputs for the backbranches I don't mind accepting it, I'm not volunteering
myself to do more than review though.

--
Daniel Gustafsson





Re: Parameter binding for COPY TO queries

2025-02-14 Thread Tom Lane
Andres Freund  writes:
> On 2025-02-14 10:06:13 -0500, Tom Lane wrote:
>> None of those arguments apply to the sub-SELECT of a "COPY (query) TO"
>> command, but there's a practical matter of how to get the parameters
>> passed through the COPY to the sub-SELECT without opening Pandora's
>> box for every other kind of utility statement.

> I think there's already precedent - CREATE TABLE AS does accept parameters
> today, and it's a utility command too:

Hmm ... yeah, now that I look, there's more pre-existing plumbing
in ProcessUtility than I remembered.  So maybe this wouldn't be too
hard after all.

regards, tom lane




Re: Feature Request: Add AES-128-CFB Mode Support to pgcrypto

2025-02-14 Thread Daniel Gustafsson
> On 5 Feb 2025, at 18:28, Daniel Gustafsson  wrote:
> 
>> On 5 Feb 2025, at 18:24, Álvaro Herrera  wrote:
> 
>> Anyway, at least the bytes appear to be interpreted the same by both
>> openssl and this new function, so that's good news.
> 
> Thanks for confirming.  Short of a very small doc change (which I don't have
> handy on this laptop) I think this patch is ready.  I wish we had a better way
> of adding support for ciphers in OpenSSL but it's not the responsibility of
> this patch to solve that.

I took another look at this, fixed the trivial detail in the docs, and applied
it.  Thanks for the submission!

--
Daniel Gustafsson





Re: Track the amount of time waiting due to cost_delay

2025-02-14 Thread Nathan Bossart
Committed.

I noticed two things that I felt I should note here:

* For the vacuum path, we call pgstat_progress_end_command() prior to
  accessing the value.  This works because pgstat_progress_end_command()
  doesn't clear the st_progress_param array (that is done in
  pgstat_progress_start_command()).  AFAICT it's worked this way since this
  stuff was introduced ~9 years ago, and I have no reason to believe it
  will change anytime soon.

* We are bypassing the changecount mechanism when accessing the value.  I
  believe this is okay because the calling process is the only one that
  updates these values.  Even in the parallel worker case, the worker sends
  a message to the leader to increment the value.  Perhaps this could break
  in the future if we switched to using atomics or something, but that
  approach was already considered and abandoned once before [0], and the
  worst-case scenario would likely be compilation errors or bogus delay
  values.

So, I chose to just add comments about this stuff for now.  If someone
feels strongly that we should do pro forma changecount checks before
pgstat_progress_end_command(), I'm happy to draft the patch.

[0] https://postgr.es/m/72CD33F6-C2B5-45E4-A78F-85EC923DCF0F%40amazon.com

-- 
nathan




Re: generic plans and "initial" pruning

2025-02-14 Thread Alexander Lakhin

Hello Amit,

06.02.2025 04:35, Amit Langote wrote:

I plan to push 0001 tomorrow, barring any objections.



Please try the following script:
CREATE TABLE pt (a int, b int) PARTITION BY range (a);
CREATE TABLE tp1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
CREATE TABLE tp2 PARTITION OF pt FOR VALUES FROM (2) TO (3);

MERGE INTO pt
USING (SELECT pg_backend_pid() AS pid) AS q JOIN tp1 ON (q.pid = tp1.a)
ON pt.a = tp1.a
WHEN MATCHED THEN DELETE;

which fails for me with segfault:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
3680            relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
(gdb) bt
#0  ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at nodeModifyTable.c:3680
#1  0x5a9b67e6dfb5 in ExecInitModifyTable (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at nodeModifyTable.c:4906
#2  0x5a9b67e273f7 in ExecInitNode (node=0x5a9b9fbd5858, estate=0x5a9b9fbcbe20, eflags=0) at execProcnode.c:177
#3  0x5a9b67e1b9d2 in InitPlan (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:1092
#4  0x5a9b67e1a524 in standard_ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:268
#5  0x5a9b67e1a223 in ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) at execMain.c:142
...

starting from cbc127917.

(I've discovered this anomaly with SQLsmith.)

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Re: Document How Commit Handles Aborted Transactions

2025-02-14 Thread Ahmed Ashour
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Summary:
-
The patch adds documentation to clarify how PostgreSQL handles aborted 
transactions during the commit process. The changes are clear and improve the 
existing documentation.

Testing:

1. Manually applied the patch to the latest master branch (commit 4cffc93).
2. Fixed SGML structure issues in `advanced.sgml` and `commit.sgml` by wrapping 
`` in ``.
3. Rebuilt the documentation using `make html`.
4. Verified the new sections are present and correctly formatted in the 
generated HTML.

Feedback:
-
- The patch was manually applied due to conflicts in `advanced.sgml` and 
`commit.sgml`.
- Fixed invalid SGML structure by wrapping `` in ``.
- The documentation is accurate and follows PostgreSQL’s style guidelines.
- No additional issues were found.

Recommendation:
---
Ready for committer. No objections.

The new status of this patch is: Ready for Committer


Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Thomas Munro
On Sat, Feb 15, 2025 at 10:50 AM Thomas Munro  wrote:
> On Sat, Feb 15, 2025 at 7:30 AM Melanie Plageman
>  wrote:
> > Seems valgrind doesn't like this [1]. I'm looking into it.
>
> Melanie was able to reproduce this on her local valgrind and
> eventually we figured out that it's my fault.  I put code into
> read_stream.c that calls wipe_mem(), thinking that that was our
> standard way of scribbling 0x7f on memory that you shouldn't access
> again until it's reused.  I didn't realise that wipe_mem() also tells
> valgrind that the memory is now "no access".  That makes sense for
> palloc/pfree because when that range is allocated again it'll clear
> that.  The point is to help people discover that they have a dangling
> reference to per-buffer data after they advance to the next buffer,
> which wouldn't work because it's in a circular queue and could be
> overwritten any time after that.

Here's a patch.  Is there a tidier way to write this?

It should probably be back-patched to 17, because external code might
use per-buffer data (obviously v17 core doesn't or skink would have
told us this sooner).   It's not a good time to push to 17 today,
though.  Push to master now to cheer skink up and 17 some time later
when the coast is clear, or just wait?
From bf4ba8b334c7ea6fcd33e8e6a6a4628c88161624 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 15 Feb 2025 11:06:30 +1300
Subject: [PATCH] Fix explicit valgrind interaction in read_stream.c.

By calling wipe_mem() on per-buffer data memory that has been released,
we are also telling Valgrind that the memory is "noaccess".  We need to
set it to "undefined" before giving it to the registered callback to
fill in, when a slot is reused.

As discovered by build farm animal skink when the VACUUM streamification
patches landed (the first users of per-buffer data).

Back-patch to 17, where read streams landed.  There aren't any users of
per-buffer data in 17, but extension code might do that.

Reported-by: Melanie Plageman 
Discussion: https://postgr.es/m/CA%2BhUKG%2Bg6aXpi2FEHqeLOzE%2BxYw%3DOV%2B-N5jhOEnnV%2BF0USM9xA%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c | 39 ++-
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index e4414b2e915..d722eda7d8e 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -193,9 +193,20 @@ read_stream_get_block(ReadStream *stream, void *per_buffer_data)
 	if (blocknum != InvalidBlockNumber)
 		stream->buffered_blocknum = InvalidBlockNumber;
 	else
+	{
+		/*
+		 * Tell Valgrind that the per-buffer data is undefined.  That replaces
+		 * the "noaccess" state that was set when the consumer moved past this
+		 * entry last time around the queue, and should also catch callbacks
+		 * that fail to initialize data that the buffer consumer later
+		 * accesses.  On the first go around, it is undefined already.
+		 */
+		VALGRIND_MAKE_MEM_UNDEFINED(per_buffer_data,
+	stream->per_buffer_data_size);
 		blocknum = stream->callback(stream,
 	stream->callback_private_data,
 	per_buffer_data);
+	}
 
 	return blocknum;
 }
@@ -752,8 +763,11 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	}
 
 #ifdef CLOBBER_FREED_MEMORY
-	/* Clobber old buffer and per-buffer data for debugging purposes. */
+	/* Clobber old buffer for debugging purposes. */
 	stream->buffers[oldest_buffer_index] = InvalidBuffer;
+#endif
+
+#if defined(CLOBBER_FREED_MEMORY) || defined(USE_VALGRIND)
 
 	/*
 	 * The caller will get access to the per-buffer data, until the next call.
@@ -762,11 +776,24 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	 * that is holding a dangling pointer to it.
 	 */
 	if (stream->per_buffer_data)
-		wipe_mem(get_per_buffer_data(stream,
-	 oldest_buffer_index == 0 ?
-	 stream->queue_size - 1 :
-	 oldest_buffer_index - 1),
- stream->per_buffer_data_size);
+	{
+		void	   *per_buffer_data;
+
+		per_buffer_data = get_per_buffer_data(stream,
+			  oldest_buffer_index == 0 ?
+			  stream->queue_size - 1 :
+			  oldest_buffer_index - 1);
+
+#if defined(CLOBBER_FREED_MEMORY)
+		/* This also tells Valgrind the memory is "noaccess". */
+		wipe_mem(per_buffer_data,
+				 stream->per_buffer_data_size);
+#elif defined(USE_VALGRIND)
+		/* Tell it ourselves. */
+		VALGRIND_MAKE_MEM_NO_ACCESS(per_buffer_data,
+	stream->per_buffer_data_size);
+#endif
+	}
 #endif
 
 	/* Pin transferred to caller. */
-- 
2.48.1



Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Thomas Munro
On Sat, Feb 15, 2025 at 12:03 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Here's a patch.  Is there a tidier way to write this?
>
> Hmm, I think not with the current set of primitives.  We could think
> about refactoring them, but that's not a job for a band-aid patch.

Thanks for looking.

> > It should probably be back-patched to 17, because external code might
> > use per-buffer data (obviously v17 core doesn't or skink would have
> > told us this sooner).   It's not a good time to push to 17 today,
> > though.  Push to master now to cheer skink up and 17 some time later
> > when the coast is clear, or just wait?
>
> Agreed that right now is a bad time to push this to v17 --- we need to
> keep the risk factors as low as possible for the re-release.  Master
> now and v17 after the re-wrap seems like the right compromise.

Cool, will push to master.  Melanie, could you please confirm that
this patch works for you?  I haven't figured out what I'm doing wrong
but my local Valgrind doesn't seem to show the problem (USE_VALGRIND
defined, Debian's Valgrind v3.19.0).




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Tom Lane
Thomas Munro  writes:
> Here's a patch.  Is there a tidier way to write this?

Hmm, I think not with the current set of primitives.  We could think
about refactoring them, but that's not a job for a band-aid patch.

> It should probably be back-patched to 17, because external code might
> use per-buffer data (obviously v17 core doesn't or skink would have
> told us this sooner).   It's not a good time to push to 17 today,
> though.  Push to master now to cheer skink up and 17 some time later
> when the coast is clear, or just wait?

Agreed that right now is a bad time to push this to v17 --- we need to
keep the risk factors as low as possible for the re-release.  Master
now and v17 after the re-wrap seems like the right compromise.

regards, tom lane




Re: [PATCH] SVE popcount support

2025-02-14 Thread Nathan Bossart
On Thu, Feb 06, 2025 at 10:33:35AM -0600, Nathan Bossart wrote:
> On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com 
> wrote:
>>> Does this hand-rolled loop unrolling offer any particular advantage?  What
>>> do the numbers look like if we don't do this or if we process, say, 4
>>> vectors at a time?
>> 
>> The unrolled version performs better than the non-unrolled one, but
>> processing four vectors provides no additional benefit. The numbers
>> and code used are given below.
> 
> Hm.  Any idea why that is?  I wonder if the compiler isn't using as many
> SVE registers as it could for this.

I've also noticed that the latest patch doesn't compile on my M3 macOS
machine.  After a quick glance, I think the problem is that the
TRY_POPCNT_FAST macro is set, so it's trying to compile the assembly
versions.

../postgresql/src/port/pg_bitutils.c:230:41: error: invalid output constraint 
'=q' in asm
  230 | __asm__ __volatile__(" popcntl %1,%0\n":"=q"(res):"rm"(word):"cc");
  | ^
../postgresql/src/port/pg_bitutils.c:247:41: error: invalid output constraint 
'=q' in asm
  247 | __asm__ __volatile__(" popcntq %1,%0\n":"=q"(res):"rm"(word):"cc");
  | ^
2 errors generated.

-- 
nathan




Re: Parallel heap vacuum

2025-02-14 Thread Melanie Plageman
On Wed, Feb 12, 2025 at 5:37 PM Masahiko Sawada  wrote:
>
> Since we introduced the eager vacuum scan (052026c9b9), I need to
> update the parallel heap vacuum patch. After thinking of how to
> integrate these two features, I find some complexities. The region
> size used by eager vacuum scan and the chunk size used by parallel
> table scan are different. While the region is fixed size the chunk
> becomes smaller as we scan the table. A chunk of the table that a
> parallel vacuum worker took could be across different regions or be
> within one region, and different parallel heap vacuum workers might
> scan the same region. And parallel heap vacuum workers could be
> scanning different regions of the table simultaneously.

Ah, I see. What are the chunk size ranges? I picked a 32 MB region
size after a little testing and mostly because it seemed reasonable. I
think it would be fine to use a different region size. Parallel workers
could just treat the chunk they get as an eager scan region (unless it
is too small or too large -- then it might not make sense).

> During eager vacuum scan, we reset the eager_scan_remaining_fails
> counter when we start to scan the new region. So if we want to make
> parallel heap vacuum behaves exactly the same way as the
> single-progress vacuum in terms of the eager vacuum scan, we would
> need to have the eager_scan_remaining_fails counters for each region
> so that the workers can decrement it corresponding to the region of
> the block that the worker is scanning. But I'm concerned that it makes
> the logic very complex. I'd like to avoid making newly introduced
> codes more complex by adding yet another new code on top of that.

I don't think it would have to behave exactly the same. I think we
just don't want to add a lot of complexity or make it hard to reason
about.

Since the failure rate is defined as a percent, couldn't we just have
parallel workers set eager_scan_remaining_fails when they get their
chunk assignment (as a percentage of their chunk size)? (I haven't
looked at the code, so maybe this doesn't make sense).

For the success cap, we could have whoever hits it first disable eager
scanning for all future assigned chunks.

> Another idea is to disable the eager vacuum scan when parallel heap
> vacuum is enabled. It might look like just avoiding difficult things
> but it could make sense in a sense. The eager vacuum scan is aimed to
> amortize the aggressive vacuum by incrementally freezing pages that
> are potentially frozen by the next aggressive vacuum. On the other
> hand, parallel heap vacuum is available only in manual VACUUM and
> would be used to remove garbage on a large table as soon as possible
> or to freeze the entire table to avoid reaching the XID limit. So I
> think it might make sense to disable the eager vacuum scan when
> parallel vacuum.

Do we only do parallelism in manual vacuum because we don't want to
use up too many parallel workers for a maintenance subsystem? I never
really tried to find out why parallel index vacuuming is only in
manual vacuum. I assume you made the same choice they did for the same
reasons.

If the idea is to never allow parallelism in vacuum, then I think
disabling eager scanning during manual parallel vacuum seems
reasonable. People could use vacuum freeze if they want more freezing.
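
For reference, a minimal sketch of that manual invocation as it exists today
(table name invented; PARALLEL currently governs index vacuuming):

-- explicit freezing plus parallel workers in a manual VACUUM
VACUUM (FREEZE, PARALLEL 4, VERBOSE) big_table;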

Also, if you start with only doing parallelism for the third phase of
heap vacuuming (second pass over the heap), this wouldn't be a problem
because eager scanning only impacts the first phase.


- Melanie




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Thomas Munro
On Sat, Feb 15, 2025 at 7:30 AM Melanie Plageman
 wrote:
> Seems valgrind doesn't like this [1]. I'm looking into it.

Melanie was able to reproduce this on her local valgrind and
eventually we figured out that it's my fault.  I put code into
read_stream.c that calls wipe_mem(), thinking that that was our
standard way of scribbling 0x7f on memory that you shouldn't access
again until it's reused.  I didn't realise that wipe_mem() also tells
valgrind that the memory is now "no access".  That makes sense for
palloc/pfree because when that range is allocated again it'll clear
that.  The point is to help people discover that they have a dangling
reference to per-buffer data after they advance to the next buffer,
which wouldn't work because it's in a circular queue and could be
overwritten any time after that.

This fixes it, but is not yet my proposed change:

@@ -193,9 +193,12 @@ read_stream_get_block(ReadStream *stream, void *per_buffer_data)
 	if (blocknum != InvalidBlockNumber)
 		stream->buffered_blocknum = InvalidBlockNumber;
 	else
+	{
+		VALGRIND_MAKE_MEM_UNDEFINED(per_buffer_data, stream->per_buffer_data_size);
 		blocknum = stream->callback(stream,
 									stream->callback_private_data,
 									per_buffer_data);
+	}

Thinking about how to make it more symmetrical...




Re: New buildfarm animals with FIPS mode enabled

2025-02-14 Thread Jacob Champion
On Fri, Feb 14, 2025 at 12:51 PM Daniel Gustafsson  wrote:
>
> > On 14 Feb 2025, at 19:01, Tom Lane  wrote:
>
> > I'm kind of disinclined to do all the work that'd be needed to turn
> > these animals completely green, especially when the reason to do it
> > seems to be that someone decided we should without any community
> > consultation.  Perhaps others have different opinions though.
>
> If the owner of the BF animal shows up with a patch for providing alternative
> outputs for the backbranches I don't mind accepting it, I'm not volunteering
> myself to do more than review though.

I'm not buildfarm@, but these animals have now been stopped until we
get them figured out. Sorry -- and thanks for the ping Tom!

--Jacob




Re: New buildfarm animals with FIPS mode enabled

2025-02-14 Thread Tom Lane
Jacob Champion  writes:
> I'm not buildfarm@, but these animals have now been stopped until we
> get them figured out. Sorry -- and thanks for the ping Tom!

Thanks for that.  Just to be clear, I think it'd be great to run
those RHEL9 animals on v17 and later.  I'm only questioning whether
it's worth the work to back-patch the regression changes to older
branches, and even more whether we'd learn anything by supporting
OpenSSL 1.x's variant spelling of the error messages.

Back-patching to make OpenSSL 3 green on all current branches would
at least be a one-time effort.  The other thing would entail a new
set of variant expected-files that we'd have to maintain into the
future, so I'm feeling much more dubious about that one.

regards, tom lane




Re: Change GUC hashtable to use simplehash?

2025-02-14 Thread John Naylor
On Fri, Feb 14, 2025 at 6:40 PM Anton A. Melnikov
 wrote:
> Yes, of course. I tested this patch on the current master at 9e17ac997
> in the same way and found no valgrind errors.

Thanks, I'll push next week after the next minor release.

PS: I now realize the source of the confusion: In the time after your
initial report, I misremembered what the bad commit was. Sorry about
that!

-- 
John Naylor
Amazon Web Services




Re: Move wal_buffers_full to WalUsage (and report it in pgss/explain)

2025-02-14 Thread Bertrand Drouvot
Hi,

On Fri, Feb 14, 2025 at 03:24:22PM +0900, Michael Paquier wrote:
> On Tue, Feb 11, 2025 at 09:37:37AM +, Bertrand Drouvot wrote:
> > While at it, adding 0004 to report wal_buffers_full in VACUUM/ANALYZE 
> > (VERBOSE).
> 
> Thanks for summarizing the history behind WalUsage and
> wal_buffers_full.

Thanks for looking at it.

> FWIW, 0001 that moves wal_buffers_full from
> PgStat_PendingWalStats to WalUsage is going to make our lives much
> easier for your pending patch to adds backend statistics for WAL.  WAL
> write/sync numbers/times will be covered in the backend stats by
> pg_stat_io, allowing us to remove entirely the dependency to
> PgStat_PendingWalStats.

Yup.

> I have been wondering for a bit if the comment at the top of WalUsage
> should be updated, but the current description fits as well with the
> follow-up patch series.

Agree that the comment is "still" fine with the extra struct member.

> 0002, 0003 and 0004 are straight-forward follow-ups.  It's IMO one of
> these things where extra data is cheap to have access to, and can be
> useful to be aware when distributed across multiple contexts like
> queries, plans or even EXPLAIN.  So no real objections here.

Yeah and that makes the comment at the top of WalUsage accurate ;-) 

"
and is displayed by EXPLAIN command, pg_stat_statements extension, etc
"

> show_wal_usage() should have its check on (usage->wal_buffers_full) in
> explain.c, as Ilia has mentioned.  It's true that you would not get a
> (wal_buffers_full > 0) if at least the condition on wal_bytes is not
> satisfied, but the addition makes sense on consistency grounds, at
> least.

I'm thinking the opposite: I think the current check could be simplified to
"usage->wal_bytes > 0" only. I don't think that wal_records, wal_fpi and
wal_buffers_full can be = 0 if wal_bytes > 0. I don't think that's worth a
dedicated thread; it could be done in passing instead of adding the check on
wal_buffers_full.

Done that way in the attached (added a comment in the commit message though).

> Agreed about the attribute ordering in pgss with everything associated
> to WalUsage grouped together, btw.

Ok, you have the majority then, and looking at the ordering of the
"api_version >=" checks in pg_stat_statements_internal() it looks like it's
not the first time we'd break queries relying on ordinal numbers. Done that
way in the attached.
Note that the attached also changes the ordering in the Counters struct (to be
consistent with what 5147ab1dd34a did for example).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 0cd60f0c623f338a3a023064c4245b42a7bf40ec Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 5 Feb 2025 13:17:43 +
Subject: [PATCH v3 1/4] Move wal_buffers_full to pgWalUsage

Moving wal_buffers_full to pgWalUsage. That simplifies the refactoring needed
in an upcoming commit (to display per-backend WAL statistics). Last but not
least that also open the door to add wal_buffers_full in EXPLAIN and in
pg_stat_statements (all existing pgWalUsage members are reported in the
above).
---
 src/backend/access/transam/xlog.c   | 2 +-
 src/backend/executor/instrument.c   | 2 ++
 src/backend/utils/activity/pgstat_wal.c | 2 +-
 src/include/executor/instrument.h   | 1 +
 src/include/pgstat.h| 1 -
 5 files changed, 5 insertions(+), 3 deletions(-)
  19.2% src/backend/access/transam/
  30.4% src/backend/executor/
  24.1% src/backend/utils/activity/
  17.7% src/include/executor/
   8.4% src/include/

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a50fd99d9e5..25a5c605404 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2057,7 +2057,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 	WriteRqst.Flush = 0;
 	XLogWrite(WriteRqst, tli, false);
 	LWLockRelease(WALWriteLock);
-	PendingWalStats.wal_buffers_full++;
+	pgWalUsage.wal_buffers_full++;
 	TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 }
 /* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 2d3569b3748..56e635f4700 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -280,6 +280,7 @@ WalUsageAdd(WalUsage *dst, WalUsage *add)
 	dst->wal_bytes += add->wal_bytes;
 	dst->wal_records += add->wal_records;
 	dst->wal_fpi += add->wal_fpi;
+	dst->wal_buffers_full += add->wal_buffers_full;
 }
 
 void
@@ -288,4 +289,5 @@ WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
 	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
 	dst->wal_records += add->wal_records - sub->wal_records;
 	dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
+	dst->wal_buffers_full += add->wal_buffers_full - sub->wal_buffers_full;
 }
diff --git a/src/backend/utils/acti

Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Jakub Wartak
On Wed, Feb 12, 2025 at 9:57 PM Robert Haas  wrote:
>
> On Wed, Feb 12, 2025 at 3:07 PM Tomas Vondra  wrote:
> > AFAICS the "1" value is simply one of the many "defensive" defaults in
> > our sample config. It's much more likely to help than cause harm, even
> > on smaller/older systems, but for many systems a higher value would be
> > more appropriate. There's usually a huge initial benefit (say, going to
> > 16 or 32), and then the benefits diminish fairly quickly.
>
> I'm happy to see us change the value to something that is likely to be
> good for most people. I think it's OK if people on very tiny systems
> need to change a few defaults for optimum performance. We should keep
> in mind that people do sometimes run PostgreSQL on fairly small VMs
> and not go crazy with it, but there's no reason to pretend that the
> typical database runs on a Raspberry Pi.

Is there any reason we couldn't have a new pg_test_iorates (similar to
the other pg_test_* proggies) that would literally do this: calibrate the
best e_io_c during initdb and put the result into postgresql.auto.conf
(pg_test_iorates --adjust-auto-conf)? That way we would avoid user
questions on how to come up with an optimal value.

root@jw-test3:/nvme# ./pg_test_iorates
File size: 16.00 GB, Block size: 8192 bytes
buffered sequential read: 1573.33 MB/s
direct sequential read: 289.46 MB/s
random read performance with different prefetch distances:
using prefetch distance 1: 173.58 MB/s
using prefetch distance 2: 243.05 MB/s
using prefetch distance 4: 376.78 MB/s
using prefetch distance 8: 590.66 MB/s
using prefetch distance 16: 824.49 MB/s
using prefetch distance 32: 861.45 MB/s
using prefetch distance 64: 830.36 MB/s
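
(For what it's worth, the --adjust-auto-conf step would boil down to something
like the following, done manually with a value picked from a run like the one
above; ALTER SYSTEM is what writes postgresql.auto.conf:)

ALTER SYSTEM SET effective_io_concurrency = 16;
SELECT pg_reload_conf();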

Attached, compile naive way via: gcc pg_test_iorates.c -o
pg_test_iorates -I /git/postgres/build/src/include -I
/git/postgres/src/include -L /git/postgres/build/src/common
-L/git/postgres/build/src/port -lpgcommon -lpgport -lm

-J.
/*-
 *
 * pg_test_iorates --- measures I/O and prefetch distance impact
 *
 * Win32 not tested
 *
 *-
 */

#include "c.h"
#include "postgres_fe.h"
#include 
#include 
#include 
#define _GNU_SOURCE
#define __USE_GNU 1
#include 
#include 
#include 
#include 
#include "common/logging.h"
#include "common/pg_prng.h"
#include "getopt_long.h"

#define DEFAULT_FILE_SIZE 16 * 1024 * 1024 * 1024   /* 16 GB */
#define DEFAULT_BLOCK_SIZE 8192 /* 8 KB */
#define DEFAULT_TEST_TIME 3 /* seconds */

static const char *progname;
static char*test_filename = "pg_test_io.tmp";
static int  needs_unlink = 0;
static ssize_t file_size = (ssize_t)DEFAULT_FILE_SIZE;
static size_t block_size = DEFAULT_BLOCK_SIZE;
static unsigned test_seconds = DEFAULT_TEST_TIME;
static volatile sig_atomic_t alarm_triggered;

static void handle_args(int argc, char **argv);
static void create_test_file(void);
static void cleanup(void);
static void signal_handler(SIGNAL_ARGS);
static void test_sequential_read(int use_direct);
static void test_random_read(int use_direct, int use_advise, int 
prefetch_distance);

#ifndef WIN32
#define DIRECT_FLAG O_DIRECT
#else
#define DIRECT_FLAG _O_DIRECT
#endif

int
main(int argc, char **argv)
{
progname = get_progname(argv[0]);
pg_logging_init(progname);

handle_args(argc, argv);

#ifndef WIN32
signal(SIGALRM, signal_handler);
#endif

pg_prng_seed(&pg_global_prng_state, (uint64) time(NULL));
printf("File size: %.2f GB, Block size: %zu bytes\n",
   (double)file_size / (1024 * 1024 * 1024), block_size);

create_test_file();

test_sequential_read(false);
#ifdef DIRECT_FLAG
test_sequential_read(true);
#endif

printf("random read performance with different prefetch distances:\n");
int test_distances[] = {1, 2, 4, 8, 16, 32, 64};
int num_distances = sizeof(test_distances) / sizeof(int);

for (int i = 0; i < num_distances; i++) {
int d = test_distances[i];
printf("using prefetch distance %d: ", d);
fflush(stdout);
test_random_read(false, true, d);
}

cleanup();
return 0;
}

static void
handle_args(int argc, char **argv)
{
static struct option long_options[] = {
{"file", required_argument, NULL, 'f'},
{"size", required_argument, NULL, 's'},
{"block-size", required_argument, NULL, 'b'},
{"time", required_argument, NULL, 't'},
{NULL, 0, NULL, 0}
};

int c;
while ((c = getopt_long(argc, argv, "f:s:b:t:", long_options, NULL)) != 
-1) {
switch (c) {
case 'f':
test_filename = pg_strdup(optarg);
break;
case 's':

Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Álvaro Herrera
On 2025-Feb-14, Julien Rouhaud wrote:

> Since the merging is a yes/no option (I think there used to be some 
> discussions
> about having a threshold or some other fancy modes), maybe you could instead
> differentiate the merged version by have 2 constants rather than this "..." or
> something like that?

Maybe the representation can be "($1 /*, ... */)" so that it's obvious
that the array extends beyond the first element but is still
syntactically valid.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"In Europe they call me Niklaus Wirth; in the US they call me Nickel's worth.
 That's because in Europe they call me by name, and in the US by value!"




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Sami Imseih
> > > Since the merging is a yes/no option (I think there used to be some 
> > > discussions
> > > about having a threshold or some other fancy modes), maybe you could 
> > > instead
> > > differentiate the merged version by have 2 constants rather than this 
> > > "..." or
> > > something like that?
> > >
> > Maybe the representation can be "($1 /*, ... */)" so that it's obvious
> > that the array extends beyond the first element but is still
> > syntactically valid.

> Yeah that works too and it's probably way easier to implement.

+1

Just to throw out an alternate idea using comments: what about
adding a comment at the start of the query, "/* query_id_squash_values */",
and keeping the parameter symbols as-is? The comment at the start would
indicate that this feature was used.
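
Purely to illustrate that alternative (not what the current patch produces),
the stored text would then look something like:

/* query_id_squash_values */ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4)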

> > On Thu, Feb 13, 2025 at 05:08:45PM GMT, Sami Imseih wrote:
> > Constants passed as parameters to a prepared statement will not be
> > handled as expected. I did not not test explicit PREPARE/EXECUTE statement,
> > but I assume it will have the same issue.

> This is the same question of supporting various cases. The original
> patch implementation handled Param expressions as well, this part was
> explicitly rejected during review. I think as a first step it's
> important to find a balance between applying this optimization in as
> many cases as possible, and at the same time keep the implementation
> simple to give the patch a chance. So far I'm inclined to leave Param
> for the future work, although of course I'm open to discussion.

I do see the discussion here [1], sorry for not noticing it.

I am not sure about this though. At minimum this needs to be documented.
However, I think the prepared statement case is too common a case to
skip for the first release of this feature, and the users that will likely
benefit from this feature are using prepared statements (e.g. JDBC, etc.).

But, others may disagree.

> > pg_node_attr of query_jumble_merge is doing something
> > very specific to the elements list of an ArrayExpr. The
> > merge code likely cannot be used for other node types.

> It can be, take a look at pg_node_attr commentary. Any node can have a
> field marked with query_jumble_merge attribut and benefit from merging.

I can't think of other cases beyond ArrayExpr where this will be needed.
The node that could use this will need to carry constants, but ArrayExpr
is the only case I can think of in which this will be useful for jumbling.
There should be a really good reason IMO to do something other than the
existing pattern of using custom_query_jumble.

I scanned through the thread and could not find a discussion on this,
but maybe others have an opinion.

> > This case with an array passed to aa function seems to cause a regression
> > in pg_stat_statements query text. As you can see the text is incomplete.

> I've already mentioned that in the previous email. To reiterate, it's
> not a functionality regression, but an incomplete representation of a
> normalized query which turned out to be hard to change. While I'm
> working on that, there is a suggestion that it's not a blocker.

It's not a functionality regression as far as query execution
or pg_stat_statements counters go, but it is a regression as far as
displaying query text in pg_stat_statements. pg_stat_statements, unlike
pg_stat_activity, makes a guarantee not to trim text, as stated in the docs [2]:
"The representative query texts are kept in an external disk file,
and do not consume shared memory. Therefore,
even very lengthy query texts can be stored successfully."

I don't think any feature that trims text in pg_stat_statements is acceptable,
IMO. Others may disagree.

Regards,

Sami


[1] 
https://www.postgresql.org/message-id/20230211104707.grsicemegr7d3mgh@erthalion.local
[2] https://www.postgresql.org/docs/current/pgstatstatements.html




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-02-14 Thread Shlok Kyal
On Fri, 14 Feb 2025 at 10:50, Shubham Khanna
 wrote:
>
> On Thu, Feb 13, 2025 at 5:36 PM Shlok Kyal  wrote:
> >
> > On Thu, 13 Feb 2025 at 15:20, Shubham Khanna
> >  wrote:
> > >
> > > On Tue, Feb 11, 2025 at 9:56 PM Shlok Kyal  
> > > wrote:
> > > >
> > > > On Tue, 11 Feb 2025 at 09:51, Shubham Khanna
> > > >  wrote:
> > > > >
> > > > > On Fri, Feb 7, 2025 at 7:46 AM Hayato Kuroda (Fujitsu)
> > > > >  wrote:
> > > > > >
> > > > > > Dear Shubham,
> > > > > >
> > > > > > Thanks for updating the patch.
> > > > > >
> > > > > > Previously you told that you had a plan to extend the patch to drop 
> > > > > > other replication
> > > > > > objects [1], but I think it is not needed. pg_createsubscriber has 
> > > > > > already been
> > > > > > able to drop the existing subscrisubscriptions in 
> > > > > > check_and_drop_existing_subscriptions().
> > > > > > As for the replication slot, I have told in [2], it would be 
> > > > > > created intentionally
> > > > > > thus I feel it should not be dropped.
> > > > > > Thus I regard the patch does not have concrete extending plan.
> > > > > >
> > > > > > Below part contains my review comment.
> > > > > >
> > > > > > 01. Option name
> > > > > >
> > > > > > Based on the above discussion, "--cleanup-publisher-objects" is not 
> > > > > > suitable because
> > > > > > it won't drop replication slots. How about "--cleanup-publications"?
> > > > > >
> > > > >
> > > > > I have changed the name of the option  to 
> > > > > "--cleanup-existing-publications"
> > > > >
> > > > > > 02. usage()
> > > > > > ```
> > > > > > +   printf(_("  -C  --cleanup-publisher-objects drop all 
> > > > > > publications on the logical replica\n"));
> > > > > > ```
> > > > >
> > > > > Fixed.
> > > > >
> > > > > > s/logical replica/subscriber
> > > > > >
> > > > > > 03. drop_all_publications
> > > > > > ```
> > > > > > +/* Drops all existing logical replication publications from all 
> > > > > > subscriber
> > > > > > + * databases
> > > > > > + */
> > > > > > +static void
> > > > > > ```
> > > > > >
> > > > > > Initial line of the comment must be blank [3].
> > > > > >
> > > > >
> > > > > Removed this function.
> > > > >
> > > > > > 04. main
> > > > > > ```
> > > > > > +   {"cleanup-publisher-objects", no_argument, NULL, 
> > > > > > 'C'},
> > > > > > ```
> > > > > >
> > > > > > Is there a reason why upper case is used? I feel lower one is 
> > > > > > enough.
> > > > > >
> > > > >
> > > > > Fixed.
> > > > >
> > > > > > 05. main
> > > > > > ```
> > > > > > +   /* Drop publications from the subscriber if requested */
> > > > > > +   if (opt.cleanup_publisher_objects)
> > > > > > +   drop_all_publications(dbinfo);
> > > > > > ```
> > > > > >
> > > > > > After considering more, I noticed that we have already called 
> > > > > > drop_publication()
> > > > > > in the setup_subscriber(). Can we call drop_all_publications() 
> > > > > > there instead when
> > > > > > -C is specified?
> > > > > >
> > > > >
> > > > > I agree with you on this. I have removed the drop_all_publication()
> > > > > and added the "--cleanup-existing-publications" option to the
> > > > > drop_publication()
> > > > >
> > > > > > 06. 040_pg_createsubscriber.pl
> > > > > >
> > > > > > ```
> > > > > > +$node_s->start;
> > > > > > +# Create publications to test it's removal
> > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL 
> > > > > > TABLES;");
> > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL 
> > > > > > TABLES;");
> > > > > > +
> > > > > > +# Verify the existing publications
> > > > > > +my $pub_count_before =
> > > > > > +  $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;");
> > > > > > +is($pub_count_before, '2',
> > > > > > +   'two publications created before 
> > > > > > --cleanup-publisher-objects is run');
> > > > > > +
> > > > > > +$node_s->stop;
> > > > > > ```
> > > > > >
> > > > > > I feel it requires unnecessary startup and shutdown. IIUC, creating 
> > > > > > publications and check
> > > > > > counts can be before stopping the node_s, around line 331.
> > > > > >
> > > > >
> > > > > Fixed.
> > > > >
> > > > > > 07. 040_pg_createsubscriber.pl
> > > > > >
> > > > > > ```
> > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL 
> > > > > > TABLES;");
> > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL 
> > > > > > TABLES;");
> > > > > > +
> > > > > > +# Verify the existing publications
> > > > > > +my $pub_count_before =
> > > > > > +  $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;");
> > > > > > +is($pub_count_before, '2',
> > > > > > +   'two publications created before 
> > > > > > --cleanup-publisher-objects is run');
> > > > > > +
> > > > > > ```
> > > > > >
> > > > > > Also, there is a possibility that CREATE PUBLICATION on node_p is 
> > > > > > not replicated yet
> > > > > > when SELECT COUNT(*) is executed. Please wait for the replay.
> > > > > >
> >

Re: Get rid of WALBufMappingLock

2025-02-14 Thread Pavel Borisov
Hi, Alexander!

On Fri, 14 Feb 2025 at 00:59, Alexander Korotkov  wrote:
>
> Hi, Pavel!
>
> On Thu, Feb 13, 2025 at 6:39 PM Pavel Borisov  wrote:
> > On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov  
> > wrote:
> > >
> > > On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov  
> > > wrote:
> > > > 13.02.2025 12:34, Alexander Korotkov пишет:
> > > > > On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov 
> > > > >  wrote:
> > > > >> 08.02.2025 13:07, Alexander Korotkov пишет:
> > > > >>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov 
> > > > >>>  wrote:
> > > >  Good, thank you.  I think 0001 patch is generally good, but needs 
> > > >  some
> > > >  further polishing, e.g. more comments explaining how does it work.
> > > > >>
> > > > >> I tried to add more comments. I'm not good at, so recommendations 
> > > > >> are welcome.
> > > > >>
> > > > >>> Two things comes to my mind worth rechecking about 0001.
> > > > >>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> > > > >>> XLogCtl->xlblocks always page-aligned?  Because algorithm seems to 
> > > > >>> be
> > > > >>> sensitive to that.  If so, I would propose to explicitly comment 
> > > > >>> that
> > > > >>> and add corresponding asserts.
> > > > >>
> > > > >> They're certainly page aligned, since they are page borders.
> > > > >> I added assert on alignment of InitializeReserved for the sanity.
> > > > >>
> > > > >>> 2) Check if there are concurrency issues between
> > > > >>> AdvanceXLInsertBuffer() and switching to the new WAL file.
> > > > >>
> > > > >> There are no issues:
> > > > >> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
> > > > >> GetXLogBuffer to zero-out WAL page.
> > > > >> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion 
> > > > >> locks,
> > > > >> so switching wal is not concurrent. (Although, there is no need in 
> > > > >> this
> > > > >> exclusiveness, imho.)
> > > > >
> > > > > Good, thank you.  I've also revised commit message and comments.
> > > > >
> > > > > But I see another issue with this patch.  In the worst case, we do
> > > > > XLogWrite() by ourselves, and it could potentially could error out.
> > > > > Without patch, that would cause WALBufMappingLock be released and
> > > > > XLogCtl->InitializedUpTo not advanced.  With the patch, that would
> > > > > cause other processes infinitely waiting till we finish the
> > > > > initialization.
> > > > >
> > > > > Possible solution would be to save position of the page to be
> > > > > initialized, and set it back to XLogCtl->InitializeReserved on error
> > > > > (everywhere we do LWLockReleaseAll()).  We also must check that on
> > > > > error we only set XLogCtl->InitializeReserved to the past, because
> > > > > there could be multiple concurrent failures.  Also we need to
> > > > > broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
> > > >
> > > > The single place where AdvanceXLInsertBuffer is called outside of 
> > > > critical
> > > > section is in XLogBackgroundFlush. All other call stacks will issue 
> > > > server
> > > > restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer.
> > > >
> > > > XLogBackgroundFlush explicitely avoids writing buffers by passing
> > > > opportunistic=true parameter.
> > > >
> > > > Therefore, error in XLogWrite will not cause hang in 
> > > > AdvanceXLInsertBuffer
> > > > since server will shutdown/restart.
> > > >
> > > > Perhaps, we just need to insert `Assert(CritSectionCount > 0);` before 
> > > > call
> > > > to XLogWrite here?
> > >
> > > You're correct.  I just reflected this in the next revision of the patch.
> >
> > I've looked at the patchset v6.
>
> Oh, sorry, I really did this wrong.  I've done git format-patch for the wrong
> local branch for v5 and v6.  The patches I've sent for v5 and v6 are
> actually the same as my v1.  This is a real pity.  Please find the
> right version of the patchset attached.

I've rechecked v7. In v6 a proposal from [1] was not reflected. Now it
landed in v7.

The other changes do not affect code behavior. The things from my
previous review that still apply to v7:

For 0001:

Comment change proposed:
"lock-free with cooperation with" -> "lock-free accompanied by changes
to..." (maybe other variant)

I propose a new define:
#define FirstValidXLogRecPtr 1
While FirstValidXLogRecPtr = InvalidXLogRecPtr + 1 is true in the code,
that has no semantic meaning, and it's better to avoid using direct
arithmetic to derive the meaning of FirstValidXLogRecPtr from
InvalidXLogRecPtr.

For 0002, both comment proposals from my message applied to v6 apply
to v7 as well.

[1] 
https://www.postgresql.org/message-id/d6799557-e352-42c8-80cc-ed36e3b8893c%40postgrespro.ru

Regards,
Pavel Borisov
Supabase




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Julien Rouhaud
On Fri, Feb 14, 2025 at 10:36:48AM +0100, Álvaro Herrera wrote:
> On 2025-Feb-14, Julien Rouhaud wrote:
> 
> > Since the merging is a yes/no option (I think there used to be some 
> > discussions
> > about having a threshold or some other fancy modes), maybe you could instead
> > differentiate the merged version by have 2 constants rather than this "..." 
> > or
> > something like that?
> 
> Maybe the representation can be "($1 /*, ... */)" so that it's obvious
> that the array extends beyond the first element but is still
> syntactically valid.

Yeah that works too and it's probably way easier to implement.




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Dmitry Dolgov
> On Fri, Feb 14, 2025 at 05:57:01PM GMT, Julien Rouhaud wrote:
> On Fri, Feb 14, 2025 at 10:36:48AM +0100, Álvaro Herrera wrote:
> > On 2025-Feb-14, Julien Rouhaud wrote:
> >
> > > Since the merging is a yes/no option (I think there used to be some 
> > > discussions
> > > about having a threshold or some other fancy modes), maybe you could 
> > > instead
> > > differentiate the merged version by have 2 constants rather than this 
> > > "..." or
> > > something like that?
> >
> > Maybe the representation can be "($1 /*, ... */)" so that it's obvious
> > that the array extends beyond the first element but is still
> > syntactically valid.
>
> Yeah that works too and it's probably way easier to implement.

Agree, that looks good, here is the version that uses this format.
>From e193fa6b99102c805e048d2cb70476291cd82fc3 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 3 Dec 2024 14:55:45 +0100
Subject: [PATCH v25] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is jumbled. In certain situations it's undesirable, especially
if the list becomes too large.

Make an array of Const expressions contribute only the first/last
elements to the jumble hash. Allow to enable this behavior via the new
pg_stat_statements parameter query_id_squash_values with the default value off.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane,
Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier, Sutou Kouhei,
Sami Imseih, Julien Rouhaud
Tested-by: Chengxi Sun, Yasuo Honda
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/merging.out   | 432 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  47 +-
 contrib/pg_stat_statements/sql/merging.sql| 169 +++
 doc/src/sgml/config.sgml  |  27 ++
 doc/src/sgml/pgstatstatements.sgml|  28 +-
 src/backend/nodes/gen_node_support.pl |  21 +-
 src/backend/nodes/queryjumblefuncs.c  | 165 ++-
 src/backend/postmaster/launch_backend.c   |   3 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/nodes/nodes.h |   3 +
 src/include/nodes/primnodes.h |   2 +-
 src/include/nodes/queryjumble.h   |   8 +-
 15 files changed, 894 insertions(+), 26 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/merging.out
 create mode 100644 contrib/pg_stat_statements/sql/merging.sql

diff --git a/contrib/pg_stat_statements/Makefile 
b/contrib/pg_stat_statements/Makefile
index 241c02587b..eef8d69cc4 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config 
$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
user_activity wal entry_timestamp privileges extended \
-   parallel cleanup oldextversions
+   parallel cleanup oldextversions merging
 # Disabled because these tests require 
"shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/merging.out 
b/contrib/pg_stat_statements/expected/merging.out
new file mode 100644
index 00..470cdf5a7f
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/merging.out
@@ -0,0 +1,432 @@
+--
+-- Const merging functionality
+--
+CREATE EXTENSION pg_stat_statements;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging is performed, as a baseline result
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+                                        query                                        | calls 
+--------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)            |     1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)       |     1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)  |     1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t

Re: Change GUC hashtable to use simplehash?

2025-02-14 Thread Anton A. Melnikov

Hi, John!

On 14.02.2025 08:17, John Naylor wrote:

Hi Anton, could you please test if the attached passes for you? This
seems the simplest way.


Yes, of course. I tested this patch on the current master at 9e17ac997
in the same way and found no valgrind errors.

Moreover, with an -O2 build on my PC the nsphash_lookup() has
become fully inlined with this patch and almost two times
faster - 112 asm instructions under the same conditions.
Thanks!

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PATCH: Disallow a netmask of zero unless the IP is also all zeroes

2025-02-14 Thread Andreas Karlsson

On 2/11/25 9:25 PM, Tom Lane wrote:

Greg Sabino Mullane  writes:

I say "of course" but few people (even tech ones) know the distinction.
(Nor should they have to! But that's for a nearby thread). This patch aims
to prevent this very bad footgun by only allowing a /0 if the IP consists
of only zeroes. It works for ipv4 and ipv6.


More generally, should we reject if the netmask causes *any* nonzero
IP bits to be ignored?  Our CIDR type already imposes that rule:


+1 from me too. I think we should fix the general issue rather than 
special casing /0.
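
For comparison, the existing cidr behaviour that inet would then match is
roughly this (a quick sketch, addresses are arbitrary):

    SELECT '192.168.1.1/24'::cidr;
    -- ERROR:  invalid cidr value: "192.168.1.1/24"
    -- DETAIL:  Value has bits set to right of mask.

    SELECT '192.168.1.1/24'::inet;   -- currently accepted, which is the footgun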


Andreas





Re: Assignment before assert

2025-02-14 Thread Daniel Gustafsson
> On 14 Feb 2025, at 06:44, Michael Paquier  wrote:

> the proposed patch makes sense to me.

Committed, with a backpatch down to 13.

--
Daniel Gustafsson





Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Daniel Gustafsson
> On 13 Feb 2025, at 18:39, Andres Freund  wrote:

> The banner being the same between queries made it hard to understand if a
> banner that appeared in the output was from the current query or a past
> query. Therefore I added a counter to it.

+   my $banner = "background_psql: QUERY_SEPARATOR $query_cnt";
+   my $banner_match = qr/(^|\n)$banner\r?\n/;
+   $self->{stdin} .= "$query\n;\n\\echo $banner\n\\warn $banner\n";
+   pump_until(
+   $self->{run}, $self->{timeout},
+   \$self->{stdout}, qr/$banner_match/);

Won't this allow "QUERY_SEPARATOR 11" to match against "QUERY_SEPARATOR 1"?
It's probably only of academic interest but appending an end-of-banner
character like "_" or something after the query counter should fix that.

> For debugging I added a "note" that shows stdout/stderr after executing the
> query, I think it may be worth keeping that, but I'm not sure.

I think it could be worth it, +1 for keeping it until it's been proven a
problem somewhere.

> This was a rather painful exercise.

I have no trouble believing that.

--
Daniel Gustafsson





Re: Add an option to skip loading missing publication to avoid logical replication failure

2025-02-14 Thread Amit Kapila
On Mon, Feb 19, 2024 at 12:49 PM vignesh C  wrote:
>
> Currently ALTER SUBSCRIPTION ... SET PUBLICATION will break the
> logical replication in certain cases. This can happen as the apply
> worker will get restarted after SET PUBLICATION, the apply worker will
> use the existing slot and replication origin corresponding to the
> subscription. Now, it is possible that before restart the origin has
> not been updated and the WAL start location points to a location prior
> to where PUBLICATION pub exists which can lead to such an error. Once
> this error occurs, apply worker will never be able to proceed and will
> always return the same error.
>
> There was discussion on this and Amit had posted a patch to handle
> this at [2]. Amit's patch does continue using a historic snapshot but
> ignores publications that are not found for the purpose of computing
> RelSyncEntry attributes. We won't mark such an entry as valid till all
> the publications are loaded without anything missing. This means we
> won't publish operations on tables corresponding to that publication
> till we found such a publication and that seems okay.
> I have added an option skip_not_exist_publication to enable this
> operation only when skip_not_exist_publication is specified as true.
> There is no change in default behavior when skip_not_exist_publication
> is specified as false.
>

Did you try to measure the performance impact of this change? We can
try a few cases where DDL and DMLs are involved, missing publication
(drop publication and recreate after a varying number of records to
check the impact).

The other names for the option could be:
skip_nonexistent_publications or ignore_nonexistent_publications. Can
we think of any others?

-- 
With Regards,
Amit Kapila.




Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-02-14 Thread Bertrand Drouvot
Hi,

On Fri, Feb 14, 2025 at 12:17:48AM -0800, Masahiko Sawada wrote:
> On Tue, Feb 11, 2025 at 11:44 PM Bertrand Drouvot
>  wrote:

> Looking at the latest custodian worker patch, the basic architecture
> is to have a single custodian worker and processes can ask it for some
> work such as removing logical decoding related files. The online
> wal_level change will be the one of the tasks that processes (eps.
> checkpointer) can ask for it. On the other hand, one point that I
> think might not fit this wal_level work well is that while the
> custodian worker is a long-lived worker process,

That was the case initially, but it looks like it would not have been the case
in the end. See Tom's comment in [1]:

"
I wonder if a single long-lived custodian task is the right model at all.
At least for RemovePgTempFiles, it'd make more sense to write it as a
background worker that spawns, does its work, and then exits,
independently of anything else
"

> it's sufficient for
> the online wal_level change work to have a bgworker that does its work
> and then exits.

Fully agree and I did not think about changing this behavior.

> IOW, from the perspective of this work, I prefer the
> idea of having one short-lived worker for one task over having one
> long-lived worker for multiple tasks.

Yeah, or one short-lived worker for multiple tasks could work too. It just 
starts when it has something to do and then exit.

> Reading that thread, while we
> need to resolve the XID wraparound issue for the work of removing
> logical decoding related files, the work of removing temporary files
> seems to fit a short-lived worker style. So I thought as one of the
> directions, it might be worth considering to have an infrastructure
> where we can launch a bgworker just for one task, and we implement the
> online wal_level change and temporary files removal on top of it.

Yeap, that was exactly my point when I mentioned the custodian thread (taking
into account Tom's comment quoted above).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Virtual generated columns

2025-02-14 Thread Peter Eisentraut

On 13.02.25 14:06, jian he wrote:

I didn't solve the out join semantic issue.
i am wondering, can we do the virtual generated column expansion in
the rewrite stage as is,
and wrap the expressions in PHVs if the virtual generated
columns come from the nullable side of an outer join.


PlaceHolderVar looks like a fitting mechanism for this.  But it's so far 
a planner node, so it might take some additional consideration if we 
want to expand where it's used.


Maybe a short-term fix would be to error out if we find ourselves about 
to expand a Var with varnullingrels != NULL.  That would mean you 
couldn't use a virtual generated column on the nullable output side of 
an outer join, which is annoying but not fatal, and we could fix it 
incrementally later.
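
For the record, the kind of case I think goes wrong with naive expansion is
roughly the following sketch (using the patch's VIRTUAL syntax and a non-strict
generation expression; table names are made up):

    CREATE TABLE t1 (a int);
    CREATE TABLE t2 (a int, b int GENERATED ALWAYS AS (coalesce(a, 0) + 1) VIRTUAL);

    SELECT t1.a, t2.b FROM t1 LEFT JOIN t2 USING (a);
    -- For a t1 row with no match, t2.b must come out as NULL (null-extended row),
    -- but expanding t2.b to coalesce(t2.a, 0) + 1 at rewrite time yields 1 unless
    -- the expansion is evaluated below the outer join (e.g. wrapped in a PHV).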






Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-02-14 Thread Ranier Vilela
Hi Álvaro.

Em qui., 13 de fev. de 2025 às 18:38, Álvaro Herrera <
alvhe...@alvh.no-ip.org> escreveu:

> On 2025-Feb-13, Ranier Vilela wrote:
>
> > Hi.
> >
> > Coverity complained about possible dereference null pointer
> > in *reindex_one_database* function.
> > That's not really true.
> > But the logic is unnecessarily complicated.
>
> Hmm, this code looks quite suspect, but I wonder if instead of (what
> looks more or less like) a straight revert of cc0e7ebd304a as you
> propose, a better fix wouldn't be to split get_parallel_object_list in
> two: get_parallel_table_list for the DATABASE and SCHEMA cases, and
> get_parallel_tabidx_list (or whatever) for the INDEX case.  In the first
> case we just return a list of values, but in the latter case we also
> meddle with the input list which becomes an output list ...
>
Sure, I'll try to do it.

best regards,
Ranier Vilela


Re: pg17.3 PQescapeIdentifier() ignores len

2025-02-14 Thread Ranier Vilela
Em qui., 13 de fev. de 2025 às 16:00, Tom Lane  escreveu:

> Justin Pryzby  writes:
> > The fprintf suggests that since 5dc1e42b4 PQescapeIdentifier ignores its
> len.
>
> Ugh, yes.  Need something like the attached.
>
> FTR, 5dc1e42b4 et al were quite subtle patches done under extreme time
> pressure.  I wonder if they have any other issues.  More eyes on those
> patches would be welcome, now that they are public.
>
Passes on standard tests at Windows 64 bits, msvc 2022 64 bits.

best regards,
Ranier Vilela


Re: Update Unicode data to Unicode 16.0.0

2025-02-14 Thread Peter Eisentraut

On 05.02.25 22:47, Jeff Davis wrote:

   (b) we should make reasonable attempts to mitigate potential
problems.

One idea for (b) resurfaced, which was to make a best-effort check at
pg_upgrade time for affected indexes. The check would not be
bulletproof, because we can't catch dependencies that are hidden inside
SPI (e.g. a plpgsql function that calls LOWER()), but it would catch
most potential problems.

Patch attached. A few notes:

   * The dependency entries don't exist because LOWER(), etc., are
system objects (pinned); so it queries the indexprs, indpreds,
partexprs, and conbin.
   * The query is large and perhaps too clever, but it seems to work. I
tried to add inline comments to the SQL, and pgindent had its own ideas
about how to format them -- suggestions welcome.
   * We haven't actually done the Unicode update yet, so it will notice
that the PG17 and PG18 Unicode versions are the same, and return early.
Either apply on top of the Unicode update patch, or comment out the
early return for testing.
   * It emits a warning rather than an error, so you need to specify
pg_upgrade with "-r" to see the output file.
   * I didn't adapt the query to run on pre-17 versions, even though it
could find some potential problem cases (like an index on NORMALIZE()).
I can add that if someone thinks it's worthwhile.


This all looks quite reasonable to me.  The code could obviously use a 
bit closer inspection, but the way you've structured it looked quite 
okay to me.







Re: pgbench without dbname worked differently with psql and pg_dump

2025-02-14 Thread Ashutosh Bapat
On Wed, Jan 22, 2025 at 7:24 AM Hayato Kuroda (Fujitsu)
 wrote:
> > ```
>
> Thanks for the confirmation. I found below executables seem to have the same
> logic to decide the dbname:
>
> - pg_amcheck

The documentation just says "If no such options are present, the
default database will be checked.". But it does not mention what the
default database is. So possibly we need to mention what the default
database is.

> - clusterdb
> - reindexdb
> - vacuumdb
>

I confirm that all these utilities need a documentation update.

Regarding the patch 0002: I would prefer to keep the old construct
i.e. "If that is not set, the current system user name is used as the
database name.". But I think your construct is acceptable too; the app
has to connect to the database before doing anything.

vaccumdb changes need a fix - s/d atabase/database/

Interestingly, all these utilities use the value of PGUSER as the
databasename if it exists. I think the doc change needs to reflect
that as well.

>- pgbench

This is the more interesting part and probably changes the perspective
on everything. Digging up history shows that when
412893b4168819955e9bf428036cd95b0832d657 was committed, we really used
the username (variable name login at that time) as the database name.
The username itself defaulted to PGUSER.
f1516ad7b3a9645a316846fa7b2229163bc55907 changed the code to use only
PGUSER and didn't use username at all. That's where I think the
document and the behaviour went out of sync. We should be fixing the
behaviour change caused by f1516ad7b3a9645a316846fa7b2229163bc55907,
which your 0001 tries to do. However, it should check username before
checking existence of PGUSER. It assumes that doConnect() and its
minions would use PGUSER if username is not specified, which is what
412893b4168819955e9bf428036cd95b0832d657 did as well. So it looks
good. I am adding Michael Paquier and Kota Miyake to this thread for
their opinions.

This makes me wonder whether the real intention of code in clusterdb,
reindexdb and vacuumdb is to use the connection user name as the
database name as the document suggests. But then the current code is
very old - like 22 years old or something. So we should just fix the
documentation. It's quite possible that all the users just pass -d.
That explains why nobody noticed this discrepancy for the last 22-odd
years. Or maybe we are missing something.

--
Best Wishes,
Ashutosh Bapat




Re: Get rid of WALBufMappingLock

2025-02-14 Thread Alexander Korotkov
On Fri, Feb 14, 2025 at 11:45 AM Pavel Borisov  wrote:
> On Fri, 14 Feb 2025 at 00:59, Alexander Korotkov  wrote:
> > On Thu, Feb 13, 2025 at 6:39 PM Pavel Borisov  
> > wrote:
> > > On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov  
> > > wrote:
> > > >
> > > > On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov 
> > > >  wrote:
> > > > > 13.02.2025 12:34, Alexander Korotkov пишет:
> > > > > > On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov 
> > > > > >  wrote:
> > > > > >> 08.02.2025 13:07, Alexander Korotkov пишет:
> > > > > >>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov 
> > > > > >>>  wrote:
> > > > >  Good, thank you.  I think 0001 patch is generally good, but 
> > > > >  needs some
> > > > >  further polishing, e.g. more comments explaining how does it 
> > > > >  work.
> > > > > >>
> > > > > >> I tried to add more comments. I'm not good at, so recommendations 
> > > > > >> are welcome.
> > > > > >>
> > > > > >>> Two things comes to my mind worth rechecking about 0001.
> > > > > >>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> > > > > >>> XLogCtl->xlblocks always page-aligned?  Because algorithm seems 
> > > > > >>> to be
> > > > > >>> sensitive to that.  If so, I would propose to explicitly comment 
> > > > > >>> that
> > > > > >>> and add corresponding asserts.
> > > > > >>
> > > > > >> They're certainly page aligned, since they are page borders.
> > > > > >> I added assert on alignment of InitializeReserved for the sanity.
> > > > > >>
> > > > > >>> 2) Check if there are concurrency issues between
> > > > > >>> AdvanceXLInsertBuffer() and switching to the new WAL file.
> > > > > >>
> > > > > >> There are no issues:
> > > > > >> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie 
> > > > > >> uses
> > > > > >> GetXLogBuffer to zero-out WAL page.
> > > > > >> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion 
> > > > > >> locks,
> > > > > >> so switching wal is not concurrent. (Although, there is no need in 
> > > > > >> this
> > > > > >> exclusiveness, imho.)
> > > > > >
> > > > > > Good, thank you.  I've also revised commit message and comments.
> > > > > >
> > > > > > But I see another issue with this patch.  In the worst case, we do
> > > > > > XLogWrite() by ourselves, and it could potentially could error out.
> > > > > > Without patch, that would cause WALBufMappingLock be released and
> > > > > > XLogCtl->InitializedUpTo not advanced.  With the patch, that would
> > > > > > cause other processes infinitely waiting till we finish the
> > > > > > initialization.
> > > > > >
> > > > > > Possible solution would be to save position of the page to be
> > > > > > initialized, and set it back to XLogCtl->InitializeReserved on error
> > > > > > (everywhere we do LWLockReleaseAll()).  We also must check that on
> > > > > > error we only set XLogCtl->InitializeReserved to the past, because
> > > > > > there could be multiple concurrent failures.  Also we need to
> > > > > > broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
> > > > >
> > > > > The single place where AdvanceXLInsertBuffer is called outside of 
> > > > > critical
> > > > > section is in XLogBackgroundFlush. All other call stacks will issue 
> > > > > server
> > > > > restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer.
> > > > >
> > > > > XLogBackgroundFlush explicitely avoids writing buffers by passing
> > > > > opportunistic=true parameter.
> > > > >
> > > > > Therefore, error in XLogWrite will not cause hang in 
> > > > > AdvanceXLInsertBuffer
> > > > > since server will shutdown/restart.
> > > > >
> > > > > Perhaps, we just need to insert `Assert(CritSectionCount > 0);` 
> > > > > before call
> > > > > to XLogWrite here?
> > > >
> > > > You're correct.  I just reflected this in the next revision of the 
> > > > patch.
> > >
> > > I've looked at the patchset v6.
> >
> > Oh, sorry, I really did wrong.  I've done git format-patch for wrong
> > local branch for v5 and v6.  Patches I've sent for v5 and v6 are
> > actually the same as my v1.  This is really pity.  Please, find the
> > right version of patchset attached.
>
> I've rechecked v7. In v6 a proposal from [1] was not reflected. Now it
> landed in v7.
>
> Other changes are not regarding code behavior. The things from my
> previous review that still could apply to v7:
>
> For 0001:
>
> Comment change proposed:
> "lock-free with cooperation with" -> "lock-free accompanied by changes
> to..." (maybe other variant)

Good catch.  I've rephrased this comment even more.

> I propose a new define:
> #define FirstValidXLogRecPtr 1
> While FirstValidXLogRecPtr = InvalidXLogRecPtr + 1 is true in the code
> that has no semantical meaning and it's better to avoid using direct
> arithmetics to relate meaning of FirstValidXLogRecPtr from
> InvalidXLogRecPtr.

Makes sense, but I'm not sure if this change is required at all.  I've
reverted this to the state of master, and everything seems to work.

> For

Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-02-14 Thread Shubham Khanna
On Fri, Feb 14, 2025 at 4:57 PM Shlok Kyal  wrote:
>
> On Fri, 14 Feb 2025 at 10:50, Shubham Khanna
>  wrote:
> >
> > On Thu, Feb 13, 2025 at 5:36 PM Shlok Kyal  wrote:
> > >
> > > On Thu, 13 Feb 2025 at 15:20, Shubham Khanna
> > >  wrote:
> > > >
> > > > On Tue, Feb 11, 2025 at 9:56 PM Shlok Kyal  
> > > > wrote:
> > > > >
> > > > > On Tue, 11 Feb 2025 at 09:51, Shubham Khanna
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Feb 7, 2025 at 7:46 AM Hayato Kuroda (Fujitsu)
> > > > > >  wrote:
> > > > > > >
> > > > > > > Dear Shubham,
> > > > > > >
> > > > > > > Thanks for updating the patch.
> > > > > > >
> > > > > > > Previously you told that you had a plan to extend the patch to 
> > > > > > > drop other replication
> > > > > > > objects [1], but I think it is not needed. pg_createsubscriber 
> > > > > > > has already been
> > > > > > > able to drop the existing subscrisubscriptions in 
> > > > > > > check_and_drop_existing_subscriptions().
> > > > > > > As for the replication slot, I have told in [2], it would be 
> > > > > > > created intentionally
> > > > > > > thus I feel it should not be dropped.
> > > > > > > Thus I regard the patch does not have concrete extending plan.
> > > > > > >
> > > > > > > Below part contains my review comment.
> > > > > > >
> > > > > > > 01. Option name
> > > > > > >
> > > > > > > Based on the above discussion, "--cleanup-publisher-objects" is 
> > > > > > > not suitable because
> > > > > > > it won't drop replication slots. How about 
> > > > > > > "--cleanup-publications"?
> > > > > > >
> > > > > >
> > > > > > I have changed the name of the option  to 
> > > > > > "--cleanup-existing-publications"
> > > > > >
> > > > > > > 02. usage()
> > > > > > > ```
> > > > > > > +   printf(_("  -C  --cleanup-publisher-objects drop all 
> > > > > > > publications on the logical replica\n"));
> > > > > > > ```
> > > > > >
> > > > > > Fixed.
> > > > > >
> > > > > > > s/logical replica/subscriber
> > > > > > >
> > > > > > > 03. drop_all_publications
> > > > > > > ```
> > > > > > > +/* Drops all existing logical replication publications from all 
> > > > > > > subscriber
> > > > > > > + * databases
> > > > > > > + */
> > > > > > > +static void
> > > > > > > ```
> > > > > > >
> > > > > > > Initial line of the comment must be blank [3].
> > > > > > >
> > > > > >
> > > > > > Removed this function.
> > > > > >
> > > > > > > 04. main
> > > > > > > ```
> > > > > > > +   {"cleanup-publisher-objects", no_argument, NULL, 
> > > > > > > 'C'},
> > > > > > > ```
> > > > > > >
> > > > > > > Is there a reason why upper case is used? I feel lower one is 
> > > > > > > enough.
> > > > > > >
> > > > > >
> > > > > > Fixed.
> > > > > >
> > > > > > > 05. main
> > > > > > > ```
> > > > > > > +   /* Drop publications from the subscriber if requested */
> > > > > > > +   if (opt.cleanup_publisher_objects)
> > > > > > > +   drop_all_publications(dbinfo);
> > > > > > > ```
> > > > > > >
> > > > > > > After considering more, I noticed that we have already called 
> > > > > > > drop_publication()
> > > > > > > in the setup_subscriber(). Can we call drop_all_publications() 
> > > > > > > there instead when
> > > > > > > -C is specified?
> > > > > > >
> > > > > >
> > > > > > I agree with you on this. I have removed the drop_all_publication()
> > > > > > and added the "--cleanup-existing-publications" option to the
> > > > > > drop_publication()
> > > > > >
> > > > > > > 06. 040_pg_createsubscriber.pl
> > > > > > >
> > > > > > > ```
> > > > > > > +$node_s->start;
> > > > > > > +# Create publications to test it's removal
> > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL 
> > > > > > > TABLES;");
> > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL 
> > > > > > > TABLES;");
> > > > > > > +
> > > > > > > +# Verify the existing publications
> > > > > > > +my $pub_count_before =
> > > > > > > +  $node_s->safe_psql($db1, "SELECT COUNT(*) FROM 
> > > > > > > pg_publication;");
> > > > > > > +is($pub_count_before, '2',
> > > > > > > +   'two publications created before 
> > > > > > > --cleanup-publisher-objects is run');
> > > > > > > +
> > > > > > > +$node_s->stop;
> > > > > > > ```
> > > > > > >
> > > > > > > I feel it requires unnecessary startup and shutdown. IIUC, 
> > > > > > > creating publications and check
> > > > > > > counts can be before stopping the node_s, around line 331.
> > > > > > >
> > > > > >
> > > > > > Fixed.
> > > > > >
> > > > > > > 07. 040_pg_createsubscriber.pl
> > > > > > >
> > > > > > > ```
> > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL 
> > > > > > > TABLES;");
> > > > > > > +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL 
> > > > > > > TABLES;");
> > > > > > > +
> > > > > > > +# Verify the existing publications
> > > > > > > +my $pub_count_before =
> > > > > > > +  $node_s->safe_psql($db1, "SELECT COUNT(*) FROM 
> > > > > > > pg_publication;");
> > > > > > > +

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-14 Thread Ashutosh Bapat
Hi Shubham,
Here are some review comments from my side


On Thu, Feb 13, 2025 at 8:58 AM Shubham Khanna
 wrote:
> The attached patch contains the required changes.
>

clusterdb and vacuumdb use -a and --all for all databases. Do we want to
use the same long option name here? They probably don't have
"databases" in the long option name since "db" is already in their names.
pg_createsubscriber doesn't have "db" in its name, but it might still be
better to have just --all, to be consistent with those utilities. Sorry
if this has already been discussed.

+ printf(_(" -a, --all-databases create subscriptions on the target
for all source databases\n"));

Suggest "create subscriptions for all non-template source databases".
"on the target" seems unnecessary, it's implicit.

+
+# run pg_createsubscriber with '--database' and '--all-databases' and
verify the
+# failure
+command_fails_like(
+ [
+ 'pg_createsubscriber',
+ '--verbose',
+ '--dry-run',
+ '--pgdata' => $node_s->data_dir,
+ '--publisher-server' => $node_p->connstr($db1),
+ '--socketdir' => $node_s->host,
+ '--subscriber-port' => $node_s->port,
+ '--database' => $db1,
+ '--all-databases',
+ ],
+ qr/--all-databases cannot be used with --database/,
+ 'fail if --all-databases is used with --database');

Need a test where --all-databases is specified before --database.

All the tests run in --dry-mode which won't test whether all the
replication slots, subscriptions and publications are working well
when --all-databases is specified. I think we need to test non-dry
mode creation. I may go as far as suggesting to split the current test
file into 3, 1. tests sanity of command line options 2. tests single
subscription creation thoroughly along with multiple --database
specifications 3. tests --all-databases health of the subscriptions
and --all-databases specific scenarios like non-template databases
aren't subscribed to.

-- 
Best Wishes,
Ashutosh Bapat




Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-02-14 Thread Nisha Moond
Please find the updated v78 patches after a few off-list review rounds.

Here is a summary of changes in v78:
patch-001:
- Fixed bugs reported by Hou-san and Peter in [1] and [2].
- Fixed a race condition reported by Hou-san off-list, which could
lead to an assert failure.
This failure happens when the checkpointer sets the invalidation cause
to idle_timeout on the first attempt, but if it later finds another
process's pid active for the slot, it retries after terminating that
process. By then, inactive_since may have been updated, and it
determines the invalidation_cause as RS_INVAL_NONE, and the assert below
fails:

```
Assert(!(invalidation_cause_prev != RS_INVAL_NONE && terminated &&
invalidation_cause_prev != invalidation_cause));
```

- Moved the slot's idle_time calculation to the caller of
ReportSlotInvalidation().
- Improved the patch commit message for better clarity.

patch-002:
- Fixed a bug reported by Kuroda-san - "check_extension() must be done
before the CREATE EXTENSION".
- Addressed a few other comments by Peter and Kuroda-san to optimize
code and improve comments.

[1] 
https://www.postgresql.org/message-id/CABdArM7eeejXEgd6t4wtBiK%3DaWc%2B%2Bgt1__WwAWm-Y_5xMVskWg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAHut%2BPtnWyOMvxb6mZHWFxqD-NdHuYL8Zp%3D-QasAQ3VvxauiMA%40mail.gmail.com

--
Thanks,
Nisha


v78-0001-Introduce-inactive_timeout-based-replication-slo.patch
Description: Binary data


v78-0002-Add-TAP-test-for-slot-invalidation-based-on-inac.patch
Description: Binary data


Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Andrew Dunstan



On 2025-02-13 Th 12:39 PM, Andres Freund wrote:

Hi,

One of the remaining tasks for AIO was to convert the tests to be proper tap
tests.  I did that and was thanked by the tests fairly randomly failing on
windows. Which test fails changes from run to run.

The symptom is that BackgroundPsql->query() sometimes would simply swallow
errors that were clearly generated by the backend. Which then would cause
tests to fail, because testing for errors was the whole point of $test.


At first I thought the issue was that psql didn't actually flush stderr after
displaying errors. And while that may be an issue, it doesn't seem to be the
one causing problems for me.

Lots of hair-pulling later, I have a somewhat confirmed theory for what's
happening:

BackgroundPsql::query() tries to detect if the passed in query has executed by
adding a "banner" after the query and using pump_until() to wait until that
banner has been "reached".  That seems to work reasonably well on !windows.

On windows however, it looks like there's no guarantee that if stdout has been
received by IPC::Run, stderr also has been received, even if the stderr
content has been generated first. I tried to add an extra ->pump_nb() call to
query(), thinking that maybe IPC::Run just didn't get input that had actually
arrived, due to waiting on just one pipe. But no success.

My understanding is that IPC::Run uses a proxy process on windows to execute
subprocesses and then communicates with that over TCP (or something along
those lines).  I suspect what's happening is that the communication with the
external process allows for reordering between stdout/stderr.

And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
and stderr and waiting on both seems to fix the issue.


One complication is that I found that just waiting for the banner, without
also its newline, sometimes lead to unexpected newlines causing later queries
to fail. I think that happens if the trailing newline is read separately from
the rest of the string.

However, matching the newlines caused tests to fail on some machines. After a
lot of cursing I figured out that for interactive psql we output \r\n, causing
my regex match to fail. I.e. tests failed whenever IO::PTY was availble...

It's also not correct, as we did before, to just look for the banner, it has
to be anchored to either the start of the output or a newline, otherwise the
\echo (or \warn) command itself will be matched by pump_until() (but then the
replacing the command would fail). Not sure that could cause active problems
without the addition of \warn (which is also echoed on stdout), but it
certainly could after.


The banner being the same between queries made it hard to understand if a
banner that appeared in the output was from the current query or a past
query. Therefore I added a counter to it.


For debugging I added a "note" that shows stdout/stderr after executing the
query, I think it may be worth keeping that, but I'm not sure.


This was a rather painful exercise.





It's been discussed before, but I'd really really like to get rid of 
BackgroundPsql. It's ugly, non-intuitive and fragile.


Last year I did some work on this. I was going to present it at Athens 
but illness prevented me, and then other life events managed to get in 
my way. But the basic work is around. See 
 
This introduces a libpq session object (PostgreSQL::Test::Session) which 
can be backed either by FFI or a small XS wrapper - the commit has 
recipes for both. Missing is a meson.build file for the XS wrapper. 
There are significant performance gains to be had too (poll_query_until 
is much nicer, for example, as are most uses of safe_psql). If there is  
interest I will bring the work up to date, and maybe we can introduce 
this early in the v19 cycle. It would significantly reduce our 
dependency on IPC::Run, especially the pump() stuff.



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Sample rate added to pg_stat_statements

2025-02-14 Thread Ilia Evdokimov

Hi hackers,

I've decided to explore a slightly different approach to reducing 
spinlock contention—by introducing a simple execution time threshold. If 
a query’s execution time exceeds this threshold, it is recorded in 
pg_stat_statements; otherwise, it is ignored. As Alexander [0] pointed 
out, this helps retain valuable queries for further analysis. A similar 
mechanism is already present in auto_explain and pg_store_plans. When 
pg_stat_statements.track_min_duration = -1, tracking is disabled. If 
pg_stat_statements.track_min_duration = 0, all statements are tracked.
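
For example, usage looks like this (the value is presumably in milliseconds, 
matching the benchmark below):

    SET pg_stat_statements.track_min_duration = 5;   -- only statements slower than 5ms are recorded
    SET pg_stat_statements.track_min_duration = 0;   -- record everything
    SET pg_stat_statements.track_min_duration = -1;  -- disable tracking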


I benchmarked this approach using -M prepared -S on my machine with 48 
CPUs. However, I couldn’t reproduce spinlock contention because the 
machine isn’t large enough to create sufficient concurrency. 
Nevertheless, I’m sharing my results for reference and to check that the 
threshold behaves correctly.


Here’s the benchmarking procedure I followed:
    createdb pgbench
    pgbench -i -s 3000 pgbench
    psql -c 'SELECT pg_stat_statements_reset()'
    pgbench -c 46 -j 46 -T 120 -M prepared -S --progress=10 pgbench

select query, calls, min_exec_time, max_exec_time, mean_exec_time, 
stddev_exec_time from pg_stat_statements where query = 'SELECT abalance 
FROM pgbench_accounts WHERE aid = $1';


 track_min_duration |   calls   | min_exec_time | max_exec_time |    mean_exec_time    |  stddev_exec_time
--------------------+-----------+---------------+---------------+----------------------+---------------------
                  0 | 111282955 |       0.00365 |      15.56946 | 0.015042374707317802 | 0.06067634978916631
                  5 |       458 |       5.00627 |     15.699129 |    5.962879746724887 |  1.1432124887616204
                 10 |        14 |     10.538461 |     16.113204 |           12.4152189 |  1.5598854455354354
                 20 |         - |             - |             - |                    - |                   -
                 -1 |         - |             - |             - |                    - |                   -


I’d greatly appreciate any feedback on this alternative approach, as 
well as benchmarking on a pretty large machine to see its impact at scale.


[0]: 
https://www.postgresql.org/message-id/CAPpHfdsTKAQqC3A48-MGQhrhfEamXZPb64w%3Dutk7thQcOMNr7Q%40mail.gmail.com


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From aaf82d481aeb40458dfd7debfb3bcd254f33cdee Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Fri, 14 Feb 2025 15:40:36 +0300
Subject: [PATCH v16] Allow setting execution time threshold for
 pg_stat_statements

New configuration parameter pg_stat_statements.track_min_duration makes it
possible to track statements whose execution time is greater than threshold,
to reduce the amount of tracking.
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/duration.out  | 201 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  79 ---
 contrib/pg_stat_statements/sql/duration.sql   |  58 +
 doc/src/sgml/pgstatstatements.sgml|  21 ++
 6 files changed, 332 insertions(+), 30 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/duration.out
 create mode 100644 contrib/pg_stat_statements/sql/duration.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 241c02587b..4b2a59d7c3 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
 	user_activity wal entry_timestamp privileges extended \
-	parallel cleanup oldextversions
+	parallel duration cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/duration.out b/contrib/pg_stat_statements/expected/duration.out
new file mode 100644
index 00..8bbd4b0183
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/duration.out
@@ -0,0 +1,201 @@
+--
+-- track_min_duration
+--
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+ pg_stat_statements.track 
+--
+ top
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 AS "int";
+ int 
+-
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+---+---
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+ int 
+-
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+   query| calls 
++---
+ SELECT $1 AS "int" | 1
+(1 row)
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -

Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-02-14 Thread Ranier Vilela
Em sex., 14 de fev. de 2025 às 09:13, Ranier Vilela 
escreveu:

> Hi Álvaro.
>
> Em qui., 13 de fev. de 2025 às 18:38, Álvaro Herrera <
> alvhe...@alvh.no-ip.org> escreveu:
>
>> On 2025-Feb-13, Ranier Vilela wrote:
>>
>> > Hi.
>> >
>> > Coverity complained about possible dereference null pointer
>> > in *reindex_one_database* function.
>> > That's not really true.
>> > But the logic is unnecessarily complicated.
>>
>> Hmm, this code looks quite suspect, but I wonder if instead of (what
>> looks more or less like) a straight revert of cc0e7ebd304a as you
>> propose, a better fix wouldn't be to split get_parallel_object_list in
>> two: get_parallel_table_list for the DATABASE and SCHEMA cases, and
>> get_parallel_tabidx_list (or whatever) for the INDEX case.  In the first
>> case we just return a list of values, but in the latter case we also
>> meddle with the input list which becomes an output list ...
>>
> Sure, I'll try to do it.
>
Attached is the prototype version v1.
What do you think?

best regards,
Ranier Vilela


v1-simplifies-reindex-one-database-reindexdb.patch
Description: Binary data


Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Tomas Vondra
On 2/14/25 18:31, Melanie Plageman wrote:
> io_combine_limit 1, effective_io_concurrency 16, read ahead kb 16
> 
> On Fri, Feb 14, 2025 at 12:18 PM Tomas Vondra  wrote:
>>
>> Based on off-list discussion with Melanie, I ran a modified version of
>> the benchmark, with these changes:
> 
> Thanks! It looks like the worst offender is io_combine_limit 1 (128
> kB), effective_io_concurrency 16, read_ahead_kb 16. This is a
> combination that people shouldn't run in practice I think -- an
> io_combine_limit larger than read ahead.
> 
> But we do still see regressions with io_combine_limit 1,
> effective_io_concurrency 16 at other read_ahead_kb values. Therefore
> I'd be interested to see that subset (ioc 1, eic 16, diff read ahead
> values) with the attached patch.
> 
>> FWIW this does not change anything in the detection of sequential access
>> patterns, discussed nearby, because the benchmarks started before Andres
>> looked into that. If needed, I can easily rerun these tests, I just need
>> a patch to apply.
> 
> Attached a patch that has all the commits squashed + removes
> sequential detection.
> 

OK, it's running.

I didn't limit the benchmarks any further, it seems useful to have
"full" comparison with the last run. I expect to have the results in ~48
hours, i.e. by Monday.


regards

-- 
Tomas Vondra





Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Jacob Champion
On Fri, Feb 14, 2025 at 8:53 AM Andres Freund  wrote:
> commit 70291a3c66e
> Author: Michael Paquier 
> Date:   2024-11-07 12:11:27 +0900
>
> Improve handling of empty query results in BackgroundPsql::query()
>
> commit ba08edb0654
> Author: Michael Paquier 
> Date:   2024-11-06 15:31:14 +0900
>
> Extend Cluster.pm's background_psql() to be able to start asynchronously
>
>
> Particularly the former makes it hard to backpatch, as it's a behavioural
> difference that really interacts with the problems described in this thread.
>
> Michael, Jacob, thoughts?

I think both should be backpatchable without too much risk, though
it's possible that there are more useless ok() calls in back branches
that would need to be touched when the first patch goes back. If we're
concerned about the second for any reason, the only conflicting part
should be the name and documentation of wait_connect, right?

--Jacob




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Melanie Plageman
On Thu, Feb 13, 2025 at 9:06 PM Melanie Plageman
 wrote:
>
> On Thu, Feb 13, 2025 at 8:30 PM Masahiko Sawada  wrote:
> > The rest looks good to me.
>
> Cool! I'll plan to push this tomorrow barring any objections.

I've committed this and marked it as such in the CF app.

- Melanie




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Tomas Vondra
On 2/14/25 18:14, Andres Freund wrote:
> Hi,
> 
> On 2025-02-14 10:04:41 +0100, Jakub Wartak wrote:
>> Is there any reason we couldn't have new pg_test_iorates (similiar to
>> other pg_test_* proggies), that would literally do this and calibrate
>> best e_io_c during initdb and put the result into postgresql.auto.conf
>> (pg_test_iorates --adjust-auto-conf) , that way we would avoid user
>> questions on how to come with optimal value?
> 
> Unfortunately I think this is a lot easier said than done:
> 
> - The optimal depth depends a lot on your IO patterns, there's a lot of things
>   between fully sequential and fully random.
> 
>   You'd really need to test different IO sizes and different patterns. The
>   test matrix for that gets pretty big.
> 
> - The performance characteristics of storage heavily changes over time.
> 
>   This is particularly true in cloud environments, where disk/VM combinations
>   will be "burstable", allowing higher throughput for a while, but then not
>   anymore.  Measureing during either of those states will not be great for the
>   other state.
> 
> - e_io_c is per-query-node, but impacts the whole system. If you set
>   e_io_c=1000 on a disk with a metered IOPS of say 1k/s you might get a
>   slightly higher throughput for a bitmap heap scan, but also your commit
>   latency in concurrent will go through the roof, because your fdatasync()
>   will be behind a queue of 1k reads that all are throttled.
> 

All of this is true, ofc, but maybe it's better to have a tool providing
at least some advice? I'd imagine pg_test_fsync is affected by many of
those issues too (considering both are about I/O).

I'd definitely not want initdb to do this automatically, though. Getting
good numbers is fairly expensive (in time and I/O), can be flaky, etc.

But maybe having a tool that gives you a bunch of numbers, as input for
manual tuning, would be good enough?

As you say, it's not just about the hardware (and how that changes over
time because of "burst" credits etc.), but also about the workload.
Would it be possible to track something, and adjust this dynamically
over time? And then adjust the prefetch distance in some adaptive way?

Perhaps it's a bit naive, but say we know what % of requests is handled
from cache, how long it took, etc. We opportunistically increase the
prefetch distance, and check the cache hit ratio after a while. Did it
help (% increased, consumed less time), maybe try another step. If not,
maybe try prefetching less?


regards

-- 
Tomas Vondra





Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> Introduce an INSERT FROZEN feature to bypass vacuum processing for
> large-scale cold data imports, reducing the impact on system performance
> post-import. For large imports, migrations and major version upgrades.
> Business Use-case:
> 
> When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
> loaded server, we observed that the system struggled significantly weeks
> later when the autovacuum process had to freeze all the imported data
> pages. This led to severe performance degradation, requiring manual
> intervention to prioritize vacuum jobs to complete them as quickly as
> possible.

What version of postgres was this?  What batch sizes do you need to support?
I.e. is all of this data inserted at once, or in steps?

As already discussed, it seems unlikely that we'll ever support INSERT FROZEN,
due to the potential of causing concurrent queries to give bogus answers. But
there's actually a lot we can do to improve this short of INSERT FROZEN.

The reason I asked for the version is that the behaviour would e.g. likely be
worse before autovacuum_vacuum_insert_scale_factor existed. We are working on
improvements around that in 18 too, ensuring that the gap between insert
triggered vacuums does not grow forever.

Several recent releases have also improved the situation around this in other
ways, e.g. by just making vacuuming faster and by avoiding doing redundant
work in more cases (by increasing relfrozenzid more aggressively).

We've also been talking about performing freezing during e.g. checkpoints, if
possible.

If you're inserting all the data in a single transaction however, it'll be
hard to improve most of this, because while that long long transaction runs,
we can't do anything that needs to know the transaction has finished. OTOH,
if it were a single transaction, you could already use COPY FREEZE.
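
E.g. a minimal sketch of that path (table and file names are made up; FREEZE
requires the table to have been created or truncated in the same transaction):

    BEGIN;
    TRUNCATE import_target;   -- or CREATE TABLE import_target (...) in this transaction
    COPY import_target FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
    COMMIT;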


A somewhat related issue is that bulk INSERTs, in contrast to COPY, currently
does not use the bulk-insert logic, leading the INSERT to cause a lot more WAL
to be emitted compared to inserting the same data via COPY.


> This issue is particularly critical during database *migrations* or *version
> upgrades*, where a full data reload is often necessary. Each time a major
> PostgreSQL upgrade occurs, users must reimport large datasets, leading to
> the same problem of vacuum storms post-import. An INSERT FROZEN feature
> would allow importing data that is known to be immutable, preventing
> unnecessary vacuum overhead and reducing system strain.

What are you using for such upgrades or migrations? I'd not expect INSERT to
be used, due to the overhead that has compared to COPY.

Greetings,

Andres Freund




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Sami Imseih
> > I do see the discussion here [1], sorry for not noticing it.
> >
> > I am not sure about this though. At minimum this needs to be documented,
> > However, I think the  prepared statement case is too common of a case to
> > skip for the first release of tis feature, and users that will likely
> > benefit from this feature are using prepared statements ( i.e. JDBC, etc ).
>
> Right, prepared statements are quite common case. This would be the
> first thing I'll take on in the case if this patch will find it's way
> into the release. As you can see it's not at all obvious that that will
> happen, I estimate chances for that to be higher if moving in smaller
> steps.

I think it would be sad not to include this from the start, because
prepared statements are going to be one of the most common cases.
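
For instance, something like the following (using the test_merge table from
the patch's own tests) would, as far as I can tell, not get squashed, because
the IN elements arrive as Params rather than Consts:

    PREPARE q(int, int, int, int, int) AS
        SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5);
    EXECUTE q(1, 2, 3, 4, 5);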

Wouldn't doing something like this inside IsMergeableConst
"""
if (!IsA(arg, Const) && !IsA(arg, Param))
"""

instead of
"""
if (!IsA(arg, Const))
"""

be sufficient?

> > I can't think of other cases beyond ArrayExpr where this will be needed.
> > The node that could use this will need to carry constants, but ArrayExpr
> > is the only case I can think of in which this will be useful for jumbling.
> > There should be a really good reason IMO to do something other than the
> > existing pattern of using custom_query_jumble.
>
> Well, there are plenty expression nodes that have lists in them, maybe
> more will be added in the future. And as before, the idea of using
> pg_node_attr was a resonable suggestion from Michael Paquier on top of
> the original design (which indeed used custom jumble function for
> ArrayExpr).

OK. I don't necessarily agree with this, but it's been discussed [1] and
I will not push this point any further.

> > It's not a functionality regression as far as query execution
> > or pg_stat_statements counters go, but it is a regression as far as
> > displaying query text in pg_stat_statements. pg_stat_statements, unlike
> > pg_stat_acitivty, makes a guaranteee not to trim text as stated in the docs 
> > [2]
> > "The representative query texts are kept in an external disk file,
> > and do not consume shared memory. Therefore,
> > even very lengthy query texts can be stored successfully."
>
> Just to clarify, the part you reference doesn't say anything about
> trimming, doesn't it? In fact, the query text stored in
> pg_stat_statements might be as well somewhat different from one that was
> executed, due to similar queries having the same query_id and differ
> only in e.g. parenthesis.

I perhaps meant "missing chunk" instead of "trimming". To me it just
looked like trimmed text, which was wrong. Looks like v25
deals with that better at least. I am just not sure about all that we are doing
here, as I believe it may open up chances for bugs in generating the normalized
query texts. I'm a bit worried about that. IMO, we are better off just
adding a comment at the start of the query text, such as
"/* query_id_squash_values */",
and keeping all the parameter symbols in place.

[1] https://www.postgresql.org/message-id/ZTmuCtymIS3n3fP_%40paquier.xyz

--

Sami




Re: Parameter binding for COPY TO queries

2025-02-14 Thread Tom Lane
Jens-Wolfhard Schicke-Uffmann  writes:
> I'd like some input on the idea of adding parameter binding support to
> queries executed as part of a COPY TO command. Is there a technical
> or philosophical reason why these queries should not contain bindable
> parameters?

It would require some rethinking of system structure.  The current
design is that plannable statements (SELECT/INSERT/UPDATE/DELETE/MERGE)
accept parameters while utility statements (everything else) don't.
This is not unrelated to the fact that plannable statements all go
through a standard parse analysis/plan/execute pipeline while
utility statements don't.

There are reasons to be skeptical of parameters in something like

ALTER TABLE t ADD COLUMN c integer DEFAULT $1;

one being that it feels a little action-at-a-distance-y for a Param's
value to become embedded in system catalogs (and indeed the user who
wrote that might not fully grasp when the Param is going to get
evaluated).  Another is that we just don't have any infrastructure
for passing such Params down to utility statement execution.

None of those arguments apply to the sub-SELECT of a "COPY (query) TO"
command, but there's a practical matter of how to get the parameters
passed through the COPY to the sub-SELECT without opening Pandora's
box for every other kind of utility statement.

Also, if we ever did want to open that box, would something we do
specially for COPY get in the way of such a larger redesign?

So I think an actual patch for this might not be terribly large,
but it'd require a fairly deep understanding of system structure
to propose something that doesn't create a mess.

regards, tom lane




Re: describe special values in GUC descriptions more consistently

2025-02-14 Thread Nathan Bossart
On Thu, Feb 13, 2025 at 05:01:59PM -0600, Nathan Bossart wrote:
> Okay, I took your suggestions in v7.

Committed.  Thanks, David, Peter, and Daniel!

-- 
nathan




Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
> It's been discussed before, but I'd really really like to get rid of
> BackgroundPsql. It's ugly, non-intuitive and fragile.

I agree, unfortunately we're stuck with this until we have a better
alternative in tree :(


> Last year I did some work on this. I was going to present it at Athens but
> illness prevented me, and then other life events managed to get in my way.
> But the basic work is around. See 
> 
> This introduces a libpq session object (PostgreSQL::Test::Session) which can
> be backed either by FFI or a small XS wrapper - the commit has recipes for
> both. Missing is a meson.build file for the XS wrapper. There are
> significant performance gains to be had too (poll_query_until is much nicer,
> for example, as are most uses of safe_psql). If there is  interest I will
> bring the work up to date, and maybe we can introduce this early in the v19
> cycle. It would significantly reduce our dependency on IPC::Run, especially
> the pump() stuff.

I definitely am interested.

Greetings,

Andres Freund




Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 13:35:40 +0100, Daniel Gustafsson wrote:
> > On 13 Feb 2025, at 18:39, Andres Freund  wrote:
>
> > The banner being the same between queries made it hard to understand if a
> > banner that appeared in the output was from the current query or a past
> > query. Therefore I added a counter to it.
>
> + my $banner = "background_psql: QUERY_SEPARATOR $query_cnt";
> + my $banner_match = qr/(^|\n)$banner\r?\n/;
> + $self->{stdin} .= "$query\n;\n\\echo $banner\n\\warn $banner\n";
> + pump_until(
> + $self->{run}, $self->{timeout},
> + \$self->{stdout}, qr/$banner_match/);
>
> Won't this allow "QUERY_SEPARATOR 11" to match against "QUERY_SEPARATOR 1"?
> It's probably only of academic interest but appending an end-of-banner
> character like "_" or something after the query counter should fix that.

You're right.  I went with ":".

Thanks for reviewing!

Updated patch attached.

I also applied similar changes to wait_connect(), it had the same issues as
query().  This mostly matters for interactive_psql() uses. The fact that we
matched on the \echo itself, lead to the first query() having additional query
output, along the lines of

  \echo background_psql: ready
  psql (18devel)
  Type "help" for help.

  postgres=# \echo background_psql: ready
  background_psql: ready

Which is rather confusing and can throw off checks based on the query results.


It does seem rather weird that psql outputs its input twice in this case, but
that's a separate issue.


I was thinking that this really ought to be backported, debugging failures due
to the set of fied bugs is really painful (just cost me 1 1/2 days), but
unfortunately there have been a bunch of recent changes that haven't been
backpatched:

commit 70291a3c66e
Author: Michael Paquier 
Date:   2024-11-07 12:11:27 +0900

Improve handling of empty query results in BackgroundPsql::query()

commit ba08edb0654
Author: Michael Paquier 
Date:   2024-11-06 15:31:14 +0900

Extend Cluster.pm's background_psql() to be able to start asynchronously


Particularly the former makes it hard to backpatch, as it's a behavioural
difference that really interacts with the problems described in this thread.

Michael, Jacob, thoughts?

Greetings,

Andres Freund
>From 0bea4887adabd458cdefa4d9fcb4c1a4baefc12a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 13 Feb 2025 10:09:49 -0500
Subject: [PATCH v2] tests: BackgroundPsql: Fix potential for lost errors on
 windows

This addresses various corner cases in BackgroundPsql:

- On windows stdout and stderr may arrive out of order, leading to errors not
  being reported, or attributed to the wrong statement.

  To fix, emit the "query-separation banner" on both stdout and stderr and
  wait for both.

- Very occasionally the "query-separation banner" would not get removed, because
  we waited until the banner arrived, but then replaced the banner plus
  newline.

  To fix, wait for banner and newline.

- For interactive psql replacing $banner\n is not sufficient, interactive psql
  outputs \r\n.

- For interactive psql, where commands are echoed to stdout, the \echo
  command, rather than its output, would be matched.

  This would sometimes lead to output from the prior query, or wait_connect(),
  being returned in the next command.

  This also affected wait_connect(), leading to sometimes sending queries to
  psql before the connection actually was established.

While debugging these issues I also found that it's hard to know whether a
query separation banner was attributed to the right query. Make that easier by
counting the queries each BackgroundPsql instance has emitted and include the
number in the banner.

Also emit psql stdout/stderr in query() and wait_connect() as Test::More
notes, without that it's rather hard to debug some issues in CI and buildfarm.

As this can cause issues not just to-be-added tests, but also existing ones,
backpatch the fix to all supported versions.

Reviewed-by: Daniel Gustafsson 
Discussion: https://postgr.es/m/wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft
Backpatch-through: 13
---
 .../perl/PostgreSQL/Test/BackgroundPsql.pm| 74 +++
 1 file changed, 59 insertions(+), 15 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index dbfd82e4fac..2fd44c1f867 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -89,7 +89,8 @@ sub new
 		'stdin' => '',
 		'stdout' => '',
 		'stderr' => '',
-		'query_timer_restart' => undef
+		'query_timer_restart' => undef,
+		'query_cnt' => 1,
 	};
 	my $run;
 
@@ -148,11 +149,24 @@ sub wait_connect
 	# connection failures are caught here, relieving callers of the need to
 	# handle those.  (Right now, we have no particularly good handling for
 	# errors anyway, but that might be added later.)
+	#
+	# See query() for details about why/how th

Re: NOT ENFORCED constraint feature

2025-02-14 Thread Ashutosh Bapat
On Thu, Feb 13, 2025 at 5:27 PM Álvaro Herrera  wrote:
>
> On 2025-Feb-13, Ashutosh Bapat wrote:
>
> > > So considering that, I think a three-state system makes more sense.
> > > Something like:
> > >
> > > 1) NOT ENFORCED -- no data is checked
> > > 2) NOT VALID -- existing data is unchecked, new data is checked
> > > 3) ENFORCED -- all data is checked
> > >
> > > Transitions:
> > >
> > > (1) - [ ALTER TABLE ... ALTER CONSTRAINT ... NOT VALID ] -> (2)
> >
> > Per your notation, this means the the constraint is not enforced but
> > new data is checked - that seems a contradiction, how would we check
> > the data when the constraint is not being enforced. Or do you suggest
> > that we convert a NOT ENFORCED constraint to ENFORCED as a result of
> > converting it to NOT VALID?
>
> I agree this one is a little weird.  For this I would have the command
> be
> ALTER TABLE ... ALTER CONSTRAINT ... ENFORCED NOT VALID
> this way it's explicit that what we want is flip the ENFORCED bit while
> leaving NOT VALID as-is.
>
> > > (2) - [ ALTER TABLE ... VALIDATE CONSTRAINT ... ] -> (3)
> >
> > As a result of this a not enforced constraint would turn into an
> > enforced constraint. The user might have intended to just validate the
> > data but not enforce it to avoid paying price for the checks on new
> > data.
>
> I'm not sure there's a use case for validating existing data without
> starting to enforce the constraint.  The data can become invalid
> immediately after you've run the command, so why bother?

Validating the whole table at a time is cheaper than doing it for every
row as it appears. So the ability to validate data in batches at
regular intervals, instead of validating every row, has some
attractiveness, especially in huge data/analytics cases. And we could
implement it without much cost. But I don't have a concrete use case.
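
As a rough sketch (using the syntax proposed above; table and constraint
names are invented), the pattern I have in mind would look like:

CREATE TABLE measurements (val int,
    CONSTRAINT val_positive CHECK (val > 0) NOT ENFORCED);
-- bulk loads pay no per-row check cost here
INSERT INTO measurements VALUES (1), (2), (3);
-- later, verify everything accumulated so far in one pass; whether this
-- should also flip the constraint to ENFORCED is exactly what is being
-- debated here
ALTER TABLE measurements VALIDATE CONSTRAINT val_positive;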

>
> > I think, what you intend to say is clearer with 4 state system {NE, E}
> > * {NV, V} = {(NE, NV), (NE, V), (E, NV), (E, V)} where (NE, V) is
> > unreachable. Let's name them S1, S2, S3, S4 respectively.
> [...]
> > Notice that there are no edges to and from S2.
>
> So why list it as a possible state?

For the sake of combinatorics. :)

-- 
Best Wishes,
Ashutosh Bapat




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Julien Rouhaud
On Fri, Feb 14, 2025 at 03:56:32PM +0100, Dmitry Dolgov wrote:
> > On Fri, Feb 14, 2025 at 10:39:45PM GMT, Julien Rouhaud wrote:
> > There seems to be an off-by-1 error in parameter numbering when merging 
> > them.
>
> There are indeed three constants, but the second is not visible in the
> query text. Maybe makes sense to adjust the number in this case, let me
> try.

Thanks!
>
> > Note that the query text as-is can still be successfully be used in an 
> > EXPLAIN
> > (GENERIC_PLAN), but it might cause problem to third party tools that try to 
> > do
> > something smarter about the parameters.
>
> Since the normalized query will be a valid one now, I hope that such
> cases will be rare. On top of that it always will be option to not
> enable constants squashing and avoid any troubles.

It might not always be an option.  I have seen applications that create
thousands of duplicated queryids just because they have a non-deterministic
number of parameters that they put in such IN () clauses.  If that leads to a
total number of unique (dbid, userid, queryid, toplevel) entries too big for a
reasonable pg_stat_statements.max, then the only choice might be to enable the
new merging parameter or to deactivate pg_stat_statements.
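
For example (table name invented), without the merging every distinct list
length becomes its own entry:

SELECT * FROM orders WHERE id IN (1, 2);
SELECT * FROM orders WHERE id IN (1, 2, 3);
-- pg_stat_statements then ends up with two separate entries:
--   SELECT * FROM orders WHERE id IN ($1, $2)
--   SELECT * FROM orders WHERE id IN ($1, $2, $3)
-- so an application generating lists of arbitrary length can blow well past
-- pg_stat_statements.max with what is logically a single query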

> Or do you have some
> particular scenario of what might be problematic?

I don't have a very specific scenario.  It's mostly for things like trying to
"un-jumble" a query: you may need to loop through the parameters, and a missing
number could be problematic.  But since the overall number of parameters might
change from execution to execution, that's probably the least of the problems
to deal with in this merging feature.




Re: SQLFunctionCache and generic plans

2025-02-14 Thread Alexander Pyhalov

Hi, folks.

I've looked into the performance and found that most of the performance
issues were caused by the CachedPlanSource machinery itself. At least, a lot
of them. So I decided to go ahead and implement a plan cache for SQL
functions. I'm not sure that it's clean enough, but at least it seems to be
working. While working on it I found issues in RevalidateCachedQuery() and
fixed them. What has changed:

- now plans are released after execution;
- revalidation now takes locks on analyzed_tree;
- query tree is copied prior to analyzing in RevalidateCachedQuery();
- queryTree_list in SQLFunctionCache is no longer necessary and so has been removed.

Now SQL function plans are actually saved. Most of this is a simplified
version of the plpgsql plan cache. Perhaps I've missed something.
We have some cases where we don't save the CachedPlanSource for future use.
One case is trigger functions (pl_comp.c has some logic to handle
them specially, and I didn't want to introduce that so far). Another (more
interesting) issue is invalidation. SQL functions have a feature of
rewriting the query when the target list doesn't match the function call
context. I hadn't thought this through carefully in the last patch version,
but luckily I had some tests which showed this behavior. When compiled with
RANDOMIZE_ALLOCATED_MEMORY, the following test case dumped core (because
after invalidation the executor got "name" fields, but expected text):


create table t (nspname text, tname  text);

CREATE OR REPLACE FUNCTION get_basic_attributes_from_pg_tables( 
_schemaname name, _tablename name)

 RETURNS TABLE(tname text, tablespace text, owner text)
 LANGUAGE sql
AS $function$
SELECT
schemaname || '.' || tablename AS "full name",
tablespace AS "tablespace",
tableowner AS "tableowner"
FROM pg_tables
WHERE pg_tables.schemaname = _schemaname AND pg_tables.tablename = 
_tablename

ORDER BY 1;
$function$;

create or replace function trg_func() RETURNS TRIGGER
AS
$$
declare
t record;
begin
  FOR t IN (SELECT * FROM  
get_basic_attributes_from_pg_tables(new.nspname, new.tname)) LOOP

RAISE WARNING '"% % %"', t.owner, t.tablespace, t.tname;
  END LOOP;
  RETURN NEW;
END
$$ LANGUAGE PLPGSQL;

create trigger t_ins_t AFTER INSERT ON t FOR EACH ROW EXECUTE FUNCTION 
trg_func();


set debug_discard_caches to 1;
insert into t values('pg_catalog', 'pg_class');

It's harder to achieve this without a permanent cache (it's harder to trigger
revalidation), but it's still possible.


What happened here is that during revalidation the query plan was rebuilt,
but the modifications to the query tree made by check_sql_fn_retval() were
lost.


To fix this issue:
1) We avoid caching modified plans (and check_sql_fn_retval() now
reports whether it touched the query tree);
2) For non-cached plans we still need a hack (a callback) into
CachedPlanSource to rebuild the query tree if invalidation happens. This
callback rebuilds the query tree using check_sql_fn_retval(). We are sure
that the callback parameters, which specify the actual function return type,
do not need to be recalculated, as such plans can appear only during one
function execution and are not reused.
3) To prove that the result type has not changed between plan executions, we
build plans with fixed_result = true.
4) When we get a saved plan, prior to using it, we check that its result
tlist matches the one built while planning the function execution.
Otherwise, we recreate the CachedPlanSource.


Well, it turned out more complicated than I expected, but now simple SQL
functions seem to have much better performance.


create or replace function fx(int) returns int as $$ select $1 + $1; $$ 
language sql immutable;
create or replace function fx2(int) returns int as $$ select 2 * $1; $$ 
language sql immutable;
create or replace function fx3 (int) returns int immutable begin atomic 
select $1 + $1; end;
create or replace function fx4(int) returns numeric as $$ select $1 + 
$1; $$ language sql immutable;


-- sql function
do $$
begin
  for i in 1..100 loop
perform fx((random()*100)::int);
  end loop;
end;
$$;
Time: 3008.869 ms (00:03.009)


-- dynamic SQL
do
$$ begin
  for i in 1..100 loop
execute 'select $1 + $1' using (random()*100)::int;
  end loop;
end;
$$;
Time: 4915.295 ms (00:04.915)

-- pre-parsed function
do $$
begin
  for i in 1..100 loop
perform fx3((random()*100)::int);
  end loop;
end;
$$;
Time: 2992.166 ms (00:02.992)

-- no plan caching due to need in fixing target list:
do $$
begin
  for i in 1..100 loop
perform fx4((random()*100)::int);
  end loop;
end;
$$;
Time: 11020.820 ms (00:11.021)


--
Best regards,
Alexander Pyhalov,
Postgres Professional
From d73e0a9463332d7f25dc42b5bca7e1878d0ee7fe Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Fri, 14 Feb 2025 10:08:52 +0300
Subject: [PATCH 4/4] Handle SQL functions which are modified between rewrite
 and plan stages.

Query can be modified between rewrite and plan stages by
check_sql_fn_retva

Re: NOT ENFORCED constraint feature

2025-02-14 Thread Isaac Morland
On Fri, 14 Feb 2025 at 10:11, Ashutosh Bapat 
wrote:


> > > I think, what you intend to say is clearer with 4 state system {NE, E}
> > > * {NV, V} = {(NE, NV), (NE, V), (E, NV), (E, V)} where (NE, V) is
> > > unreachable. Let's name them S1, S2, S3, S4 respectively.
> > [...]
> > > Notice that there are no edges to and from S2.
> >
> > So why list it as a possible state?
>
> For the sake of combinatorics. :)
>

Just because there are 2^n combinations of n boolean values does not mean
there are 2^n actual meaningful states. That's why we have CHECK
constraints.


Re: Allow default \watch interval in psql to be configured

2025-02-14 Thread Daniel Gustafsson
> On 2 Feb 2025, at 19:25, Daniel Gustafsson  wrote:

Attached is a rebase with some small amounts of polish; unless objected to, I
would like to go ahead with this version.
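
For reference, a quick sketch of the intended usage (assuming the psql
variable is named WATCH_INTERVAL, as in the attached version):

-- set the default once per session, or in ~/.psqlrc
\set WATCH_INTERVAL 5
SELECT count(*) FROM pg_stat_activity \watch
-- an explicit interval still overrides the default
SELECT count(*) FROM pg_stat_activity \watch 1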

--
Daniel Gustafsson



v5-0001-psql-Make-default-watch-interval-configurable.patch
Description: Binary data


Re: Allow io_combine_limit up to 1MB

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 09:32:32 +0100, Jakub Wartak wrote:
> On Wed, Feb 12, 2025 at 1:03 AM Andres Freund  wrote:
> > FWIW, I see substantial performance *regressions* with *big* IO sizes using
> > fio. Just looking at cached buffered IO.
> >
> > for s in 4 8 16 32 64 128 256 512 1024 2048 4096 8192;do echo -ne "$s\t\t"; 
> > numactl --physcpubind 3 fio --directory /srv/dev/fio/ --size=32GiB 
> > --overwrite 1 --time_based=0 --runtime=10 --name test --rw read --buffered 
> > 0 --ioengine psync --buffered 1 --invalidate 0 --output-format json 
> > --bs=$((1024*${s})) |jq '.jobs[] | .read.bw_mean';done
> >
> > io size kB  throughput in MB/s
> [..]
> > 256         16864
> > 512         19114
> > 1024        12874
> [..]
> 
> > It's worth noting that if I boot with mitigations=off clearcpuid=smap I get
> > *vastly* better performance:
> >
> > io size kB  throughput in MB/s
> [..]
> > 128         23133
> > 256         23317
> > 512         25829
> > 1024        15912
> [..]
> > Most of the gain isn't due to mitigations=off but clearcpuid=smap. 
> > Apparently
> > SMAP, which requires explicit code to allow kernel space to access userspace
> > memory, to make exploitation harder, reacts badly to copying lots of memory.
> >
> > This seems absolutely bonkers to me.
> 
> There are two bizarre things there, +35% perf boost just like that due
> to security drama, and that io_size=512kb being so special to give a
> 10-13% boost in Your case? Any ideas, why?

I think there are a few overlapping "cost factors", and that size happens to
be where their combined cost hits its global minimum:
- syscall overhead: the fewer the better
- memory copy cost: higher for small-ish amounts, then lower
- smap costs: seems to increase with larger amounts of memory
- CPU cache: copying less than L3 cache will be faster, as otherwise memory
  bandwidth plays a role
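
For anyone wanting to see where that minimum lands on their own system, the
corresponding postgres-side knob can be changed per session (anything above
256kB of course needs the patch discussed here):

SET io_combine_limit = '128kB';
-- rerun the workload, then try larger values, e.g.:
SET io_combine_limit = '512kB';
SHOW io_combine_limit;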



> I've got on that Lsv2
> individual MS nvme under Hyper-V, on ext4, which seems to be much more
> real world and average Joe situation, and it is much slower, but it is
> not showing advantage for blocksize beyond let's say 128:
> 
> io size kB  throughput in MB/s
> 4           1070
> 8           1117
> 16          1231
> 32          1264
> 64          1249
> 128         1313
> 256         1323
> 512         1257
> 1024        1216
> 2048        1271
> 4096        1304
> 8192        1214
> 
> top hitter on of course stuff like clear_page_rep [k] and
> rep_movs_alternative [k] (that was with mitigations=on).

I think you're measuring something different than I was. I was purposefully
measuring a fully-cached workload, which worked with that recipe because I
have more than 32GB of RAM available. But I assume you're running this in a VM
that doesn't have that much, and thus you're actually benchmarking reading data
from disk and - probably more influential in this case - finding buffers to
put the newly read data in.

Greetings,

Andres Freund




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 10:04:41 +0100, Jakub Wartak wrote:
> Is there any reason we couldn't have new pg_test_iorates (similiar to
> other pg_test_* proggies), that would literally do this and calibrate
> best e_io_c during initdb and put the result into postgresql.auto.conf
> (pg_test_iorates --adjust-auto-conf) , that way we would avoid user
> questions on how to come with optimal value?

Unfortunately I think this is a lot easier said than done:

- The optimal depth depends a lot on your IO patterns; there are a lot of
  things between fully sequential and fully random.

  You'd really need to test different IO sizes and different patterns. The
  test matrix for that gets pretty big.

- The performance characteristics of storage change heavily over time.

  This is particularly true in cloud environments, where disk/VM combinations
  will be "burstable", allowing higher throughput for a while, but then not
  anymore.  Measuring during either of those states will not be great for the
  other state.

- e_io_c is per-query-node, but impacts the whole system. If you set
  e_io_c=1000 on a disk with metered IOPS of, say, 1k/s, you might get a
  slightly higher throughput for a bitmap heap scan, but your commit latency
  under concurrent load will also go through the roof, because your fdatasync()
  will be stuck behind a queue of 1k reads that are all throttled.
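
Which is also a reason to scope the setting rather than raise it globally
when experimenting, e.g. (tablespace names invented):

SET effective_io_concurrency = 64;  -- just for this session
ALTER TABLESPACE fast_nvme SET (effective_io_concurrency = 256);
ALTER TABLESPACE slow_sata SET (effective_io_concurrency = 16);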

Greetings,

Andres Freund




Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Tom Lane
Andres Freund  writes:
> On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
>> ... If there is interest I will
>> bring the work up to date, and maybe we can introduce this early in the v19
>> cycle. It would significantly reduce our dependency on IPC::Run, especially
>> the pump() stuff.

> I definitely am interested.

+1.  Would there be a chance of removing our use of IPC::Run entirely?
There'd be a lot to like about that.

regards, tom lane




New buildfarm animals with FIPS mode enabled

2025-02-14 Thread Tom Lane
I see that somebody decided to crank up some animals running
RHEL8 and RHEL9 with FIPS mode turned on.  The RHEL9 animals
pass on v17 and master, but not older branches; the RHEL8
animals pass nowhere.  This is unsurprising given that the
v17-era commits that allowed our regression tests to pass
under FIPS mode (795592865 and a bunch of others) explicitly
targeted only OpenSSL 3:

These new expected files currently cover the FIPS mode provided by
OpenSSL 3.x as well as the modified OpenSSL 3.x from Red Hat (e.g.,
Fedora 38), but not the modified OpenSSL 1.x from Red Hat (e.g.,
Fedora 35).  (The latter will have some error message wording
differences.)

I'm kind of disinclined to do all the work that'd be needed to turn
these animals completely green, especially when the reason to do it
seems to be that someone decided we should without any community
consultation.  Perhaps others have different opinions though.

Thoughts?

regards, tom lane




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-14 Thread Melanie Plageman
On Fri, Feb 14, 2025 at 1:15 PM Melanie Plageman
 wrote:
>
> On Thu, Feb 13, 2025 at 9:06 PM Melanie Plageman
>  wrote:
> >
> > On Thu, Feb 13, 2025 at 8:30 PM Masahiko Sawada  
> > wrote:
> > > The rest looks good to me.
> >
> > Cool! I'll plan to push this tomorrow barring any objections.
>
> I've committed this and marked it as such in the CF app.

Seems valgrind doesn't like this [1]. I'm looking into it.

- Melanie

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-02-14%2018%3A00%3A12




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 18:18:47 +0100, Tomas Vondra wrote:
> FWIW this does not change anything in the detection of sequential access
> patterns, discussed nearby, because the benchmarks started before Andres
> looked into that. If needed, I can easily rerun these tests, I just need
> a patch to apply.
> 
> But if there really is some sort of issue, it'd make sense why it's much
> worse on the older SATA SSDs, while NVMe devices perform somewhat
> better. Because AFAICS the NVMe devices are better at handling random
> I/O with shorter queues.

I think the results are complicated because there are two counteracting
factors influencing performance:

1) read stream doing larger reads -> considerably faster
2) read stream not doing prefetching -> more IO stalls


1) will be a bigger boon on disks where you're not bottlenecked as much by
interface limits. Whereas SATA is limited to ~500MB/s, NVMe started out at
3GB/s. So this gain will matter more on NVMes.


At least on my machine 2) is what causes CPU idle states to kick in, which is
what causes a good bit of the slowdown.  How expensive the idle states are,
how quickly they kick in, etc. seem to depend a lot on CPU model, BIOS
settings and "platform settings" (mainboard manufacturer settings).

The worse a disk is at random IO, the longer the stalls are (adding time), and
the deeper an idle state can be reached (further increasing latency). I.e. SATA
will be worse.


It might be interesting to run the benchmark with CPU idle states disabled, at
least on the subset of cores you run the test on. E.g.
  cpupower -c 13 idle-set -D1
will disable idle states that have a transition time worse than 1us for core
13.

Sometimes disabling idle states for all cores will have deleterious effects,
due to reducing the thermal budget for turbo boost. E.g. on my older
workstation a core can boost to 3.4GHz if the whole system is at -E and only
3GHz at -D0.

Instead of disabling idle states, you could also just monitor them (cpupower
monitor  or turbostat --quiet ).


Greetings,

Andres Freund




Re: BackgroundPsql swallowing errors on windows

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 12:14:55 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
> >> ... If there is interest I will
> >> bring the work up to date, and maybe we can introduce this early in the v19
> >> cycle. It would significantly reduce our dependency on IPC::Run, especially
> >> the pump() stuff.
> 
> > I definitely am interested.
> 
> +1.  Would there be a chance of removing our use of IPC::Run entirely?
> There'd be a lot to like about that.

Unfortunately I doubt it'd get us all that close to that.

We do have several tests that intentionally involve psql,
e.g. 010_tab_completion.pl, 001_password.pl. Presumably those would continue
to use something like BackgroundPsql.pm, even if we had a sane way to have
long-running connections in TAP tests.

There are also a bunch of things we use IPC::Run to run synchronously; we
could probably replace most of those without too much pain. The hardest part
would probably be timeout handling.

Greetings,

Andres Freund




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 18:36:37 +0100, Tomas Vondra wrote:
> All of this is true, ofc, but maybe it's better to have a tool providing
> at least some advice

I agree, a tool like that would be useful!

One difficulty is that the relevant parameter space is really large, making it
hard to keep the runtime in a reasonable range...


> ? I'd imagine pg_test_fsync is affected by many of
> those issues too (considering both are about I/O).

I think pg_test_fsync is a bit less affected, because it doesn't have a high
queue depth, so it doesn't reach limits quite as quickly as something doing
higher queue depth IO.

Orthogonal aside: pg_test_fsync's numbers are not particularly helpful:

- It e.g. tests O_SYNC, O_DSYNC with O_DIRECT, while testing fsync/fdatasync
  with buffered IO.  But all the sync methods apply for both buffered and
  direct IO. And of course it'd be helpful to note what's being used, since
  postgres can do either...

- Only tests O_SYNC, not O_DSYNC with larger write sizes, even though that
  information is a lot more relevant for O_DSYNC (since O_SYNC is useless)

- Only tests write sizes of up to 16kB, even though larger writes are
  extremely common and performance critical...

There's more, but it's already a long enough aside.



> I'd definitely not want initdb to do this automatically, though. Getting
> good numbers is fairly expensive (in time and I/O), can be flaky, etc.

Yea.


> But maybe having a tool that gives you a bunch of numbers, as input for
> manual tuning, would be good enough?

I think it'd be useful. I'd perhaps make it an SQL-callable tool though, so
it can be run in cloud environments.
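
Just to sketch the shape of such an interface - entirely hypothetical, the
function name and parameters below are invented:

-- nothing like this exists today, it's just what an SQL-callable
-- calibration tool could roughly look like
SELECT * FROM pg_calibrate_io(duration => interval '30 seconds',
                              pattern => 'random');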


> As you say, it's not just about the hardware (and how that changes over
> time because of "burst" credits etc.), but also about the workload.
> Would it be possible to track something, and adjust this dynamically
> over time? And then adjust the prefetch distance in some adaptive way?

Yes, I do think so!  It's not trivial, but I think we eventually do want it.

Melanie has worked on this a fair bit, fwiw.

My current thinking is that we'd want something very roughly like TCP
BBR. Basically, it predicts the currently available bandwidth not just via
lost packets - the traditional approach - but also by building a continually
updated model of "bytes in flight" and latency and uses that to predict what
the achievable bandwidth is.

There are two snags on the way there:

1) Timestamps aren't cheap, so we probably can't do this for every IO.

   Modern NICs can associate timestamps with packets at the hardware level; we
   don't have that luxury.

2) It's not always easy to know an accurate completion timestamp.

   E.g. if a backend fired off a bunch of reads via io_uring and then is busy
   doing CPU-bound work, we don't know how long ago the requests actually
   completed.

   We probably can approximate that though. Or we could use a background
   process or timer interrupt to add a timestamp to IOs.



> Perhaps it's a bit naive, but say we know what % of requests is handled
> from cache, how long it took, etc. We opportunistically increase the
> prefetch distance, and check the cache hit ratio after a while. Did it
> help (% increased, consumed less time), maybe try another step. If not,
> maybe try prefetching less?

I don't immediately understand how you can use the cache hit ratio here? We
will only read data if it was a cache miss, after all? And we keep buffers
pinned, so they can't be thrown out.

Greetings,

Andres Freund




Re: pg_stat_statements and "IN" conditions

2025-02-14 Thread Dmitry Dolgov
> On Fri, Feb 14, 2025 at 11:12:25PM GMT, Julien Rouhaud wrote:
> > > There seems to be an off-by-1 error in parameter numbering when merging 
> > > them.
> >
> > There are indeed three constants, but the second is not visible in the
> > query text. Maybe makes sense to adjust the number in this case, let me
> > try.

This should do it. The last patch for today, otherwise I'll probably add
more bugs than features :)
From 122cac8eda36af877ac471322f681aa6ff46d61b Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 3 Dec 2024 14:55:45 +0100
Subject: [PATCH v27] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is jumbled. In certain situations it's undesirable, especially
if the list becomes too large.

Make an array of Const expressions contribute only the first/last
elements to the jumble hash. Allow enabling this behavior via the new
pg_stat_statements parameter query_id_squash_values, with the default value off.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane,
Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier, Sutou Kouhei,
Sami Imseih, Julien Rouhaud
Tested-by: Chengxi Sun, Yasuo Honda
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/merging.out   | 465 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  63 ++-
 contrib/pg_stat_statements/sql/merging.sql| 180 +++
 doc/src/sgml/config.sgml  |  28 ++
 doc/src/sgml/pgstatstatements.sgml|  28 +-
 src/backend/nodes/gen_node_support.pl |  21 +-
 src/backend/nodes/queryjumblefuncs.c  | 166 ++-
 src/backend/postmaster/launch_backend.c   |   3 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/nodes/nodes.h |   3 +
 src/include/nodes/primnodes.h |   2 +-
 src/include/nodes/queryjumble.h   |   8 +-
 15 files changed, 956 insertions(+), 26 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/merging.out
 create mode 100644 contrib/pg_stat_statements/sql/merging.sql

diff --git a/contrib/pg_stat_statements/Makefile 
b/contrib/pg_stat_statements/Makefile
index 241c02587b..eef8d69cc4 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config 
$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
user_activity wal entry_timestamp privileges extended \
-   parallel cleanup oldextversions
+   parallel cleanup oldextversions merging
 # Disabled because these tests require 
"shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/merging.out 
b/contrib/pg_stat_statements/expected/merging.out
new file mode 100644
index 00..ecf0a66a6b
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/merging.out
@@ -0,0 +1,465 @@
+--
+-- Const merging functionality
+--
+CREATE EXTENSION pg_stat_statements;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging is performed, as a baseline result
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+query  
  | calls 
+-+---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) 
  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, 
$10)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, 
$10, $11) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t
  | 1
+(4 rows)
+
+-- Normal scenario, too many simple constants for an IN query
+SET query_id_squash_values = on;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--

Re: Parameter binding for COPY TO queries

2025-02-14 Thread Andres Freund
Hi,

On 2025-02-14 10:06:13 -0500, Tom Lane wrote:
> None of those arguments apply to the sub-SELECT of a "COPY (query) TO"
> command, but there's a practical matter of how to get the parameters
> passed through the COPY to the sub-SELECT without opening Pandora's
> box for every other kind of utility statement.

I think there's already precedent - CREATE TABLE AS does accept parameters
today, and it's a utility command too:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo AS SELECT $1 as a, $2 as b \bind fooval, barval \g \d foo

Is there anything stopping us from implementing COPY along the same lines as
CTAS?  There's some special-case code for it, but it seems within reason,
unless we want to make dozens of commands accept parameters...
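
I.e. something like the following would then just work (hypothetical for now,
COPY doesn't accept parameters today):

COPY (SELECT * FROM foo WHERE a = $1) TO STDOUT \bind fooval \g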

Greetings,

Andres Freund




Re: generic plans and "initial" pruning

2025-02-14 Thread Amit Langote
Hi Alexander,

On Sat, Feb 15, 2025 at 6:00 AM Alexander Lakhin  wrote:
>
> Hello Amit,
>
> 06.02.2025 04:35, Amit Langote wrote:
>
> I plan to push 0001 tomorrow, barring any objections.
>
>
> Please try the following script:
> CREATE TABLE pt (a int, b int) PARTITION BY range (a);
> CREATE TABLE tp1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
> CREATE TABLE tp2 PARTITION OF pt FOR VALUES FROM (2) TO (3);
>
> MERGE INTO pt
> USING (SELECT pg_backend_pid() AS pid) AS q JOIN tp1 ON (q.pid = tp1.a)
> ON pt.a = tp1.a
> WHEN MATCHED THEN DELETE;
>
> which fails for me with segfault:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at 
> nodeModifyTable.c:3680
> 3680relationDesc = 
> RelationGetDescr(resultRelInfo->ri_RelationDesc);
> (gdb) bt
> #0  ExecInitMerge (mtstate=0x5a9b9fbccae0, estate=0x5a9b9fbcbe20) at 
> nodeModifyTable.c:3680
> #1  0x5a9b67e6dfb5 in ExecInitModifyTable (node=0x5a9b9fbd5858, 
> estate=0x5a9b9fbcbe20, eflags=0) at nodeModifyTable.c:4906
> #2  0x5a9b67e273f7 in ExecInitNode (node=0x5a9b9fbd5858, 
> estate=0x5a9b9fbcbe20, eflags=0) at execProcnode.c:177
> #3  0x5a9b67e1b9d2 in InitPlan (queryDesc=0x5a9b9fbb9970, eflags=0) at 
> execMain.c:1092
> #4  0x5a9b67e1a524 in standard_ExecutorStart (queryDesc=0x5a9b9fbb9970, 
> eflags=0) at execMain.c:268
> #5  0x5a9b67e1a223 in ExecutorStart (queryDesc=0x5a9b9fbb9970, eflags=0) 
> at execMain.c:142
> ...
>
> starting from cbc127917.
>
> (I've discovered this anomaly with SQLsmith.)

Thanks! It looks like I missed updating the MERGE-related lists in ModifyTable.

I've attached a fix with a test added based on your example. I plan to
push this on Monday.

-- 
Thanks, Amit Langote


0001-Fix-an-oversight-in-cbc127917-for-MERGE-handling.patch
Description: Binary data