Re: Function for listing pg_wal/summaries directory
On Mon, Oct 07, 2024 at 10:07:10AM +0900, Michael Paquier wrote:
> On Fri, Oct 04, 2024 at 10:02:11AM -0500, Nathan Bossart wrote:
>> Could you explain why you feel the existing support functions are
>> insufficient?
>
> Because it is not possible to outsource the scan of pg_wal/summaries/
> to a different role, no?

I was under the impression that you could do this with
pg_available_wal_summaries() [0].

[0] https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-WAL-SUMMARY

-- 
nathan
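For what it's worth, that delegation only needs a GRANT on the function. A
minimal sketch, assuming a hypothetical role name and that EXECUTE has not
already been granted more broadly:

    -- Run as a superuser (or the function's owner); wal_monitor is made up.
    GRANT EXECUTE ON FUNCTION pg_available_wal_summaries() TO wal_monitor;

    -- Then, as wal_monitor, list summaries without reading pg_wal/summaries/:
    SELECT tli, start_lsn, end_lsn FROM pg_available_wal_summaries();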
Re: Changing the state of data checksums in a running cluster
Hi,

I did a quick review today. First a couple minor comments:

1) monitoring.sgml typos: number of database -> number of databases,
   calcuated -> calculated

2) unnecessary newline in heapam.c (no other changes)

3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, shadowing
   earlier variable

4) unnecessary comment change in bufmgr.c (no other changes)

5) unnecessary include and newline in bulk_write.c (no other changes)

6) unnecessary newline in pgstat.c (no other changes)

7) controldata.c - maybe this

       if (oldctrl->data_checksum_version == 2)

   should use PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION instead of the magic
   constant? For "off" we use "0" which seems somewhat acceptable, but for
   other values it's less obvious what the meaning is.

8) xlog_internal.h - xl_checksum_state should be added to typedefs

9) system_functions.sql - Isn't it weird that this only creates the new
   pg_enable_data_checksums function, but not pg_disable_data_checksums? It
   also means it doesn't revoke EXECUTE from public on it, which I guess it
   probably should? Or why should this be different for the two functions?

   Also the error message seems to differ:

     test=> select pg_enable_data_checksums();
     ERROR: permission denied for function pg_enable_data_checksums

     test=> select pg_disable_data_checksums();
     ERROR: must be superuser

   Probably harmless, but seems a bit strange.

But there also seems to be a more serious problem with recovery. I did a
simple script that does a loop of

* start a cluster
* initialize a small pgbench database (scale 1 - 100)
* run "long" pgbench
* call pg_enable_data_checksums(), wait for it to complete
* stop the cluster with "-m immediate"
* start the cluster

And this unfortunately hits this assert:

    bool
    AbsorbChecksumsOnBarrier(void)
    {
        Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
        LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
        return true;
    }

Based on our short discussion about this, the controlfile gets updated right
after pg_enable_data_checksums() completes. The immediate stop however forces
a recovery since the last checkpoint, which means we see the XLOG_CHECKSUMS
WAL message again, and set the barrier. And then we exit recovery, try to
start checkpointer and it trips over this, because the control file already
has the "on" value :-(

I'm not sure what's the best way to fix this. Would it be possible to
remember we saw the XLOG_CHECKSUMS during recovery, and make the assert a
noop in that case? Or not set the barrier when exiting recovery. I'm not sure
the relaxed assert would remain meaningful, though. What would it check on
standbys, for example?

Maybe a better way would be to wait for a checkpoint before updating the
controlfile, similar to what we do at the beginning? Possibly even with the
same "fast=true/false" logic. That would prevent us from seeing the
XLOG_CHECKSUMS wal record with the updated flag. It would extend the "window"
where a crash would mean we have to redo the checksums, but I don't think
that matters much. For small databases who cares, and for large databases it
should not be a meaningful difference (setting the checksums already ran over
multiple checkpoints, so one checkpoint is not a big difference).

regards

-- 
Tomas Vondra
Re: Refactoring postmaster's code to cleanup after child exit
Hi,

On 2024-10-05 20:51:50 +0100, Dagfinn Ilmari Mannsåker wrote:
> Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un()
> on Windows, so that can be fixed by bumping the Perl version in
> https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
> to something more modern (such as 5.40.0.1), and only skipping the test
> if on Windows if Socket is too old.
>
> The decision to use 5.26 seems to come from the initial creation of the
> CI images in 2021 (when 5.34 was current), with the comment «newer
> versions don't currently work correctly for plperl». That claim is
> worth revisiting, and fixing if it's still the case.

I think we fixed the issues that were known at the time. I think I tried
upgrading to something newer at some point and there were some weird, but
fixable, encoding issues. Unfortunately I don't have the bandwidth to tackle
this rn.

Greetings,

Andres Freund
Re: Add minimal C example and SQL registration example for custom table access methods.
Glad to hear it. Thank you!

On Mon, Oct 7, 2024 at 2:50 AM Michael Paquier wrote:
>
> On Fri, May 24, 2024 at 03:59:08PM -0300, Fabrízio de Royes Mello wrote:
> > Nice... LGTM!
>
> I have noticed that this was still in the CF. After fixing a couple
> of inconsistencies in the markups and the names, trimming down the
> list of headers to avoid rot and adding a static in front of the const,
> the result looked pretty much OK, so applied.
> --
> Michael
Re: POC, WIP: OR-clause support for indexes
I assume v40 is the latest version.

In group_similar_or_args we can add a bool variable, so:

    bool        matched = false;

    foreach(lc, orargs)
    {
        if (match_index_to_operand(nonConstExpr, colnum, index))
        {
            matches[i].indexnum = indexnum;
            matches[i].colnum = colnum;
            matches[i].opno = opno;
            matches[i].inputcollid = clause->inputcollid;
            matched = true;
            break;
        }
    }

    ...

    if (!matched)
        return orargs;

    /* Sort clauses to make similar clauses go together */
    qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);

I guess it can save some cycles?
Re: Add trim_trailing_whitespace to editorconfig file
On Thu, 5 Sept 2024 at 23:28, Jelte Fennema-Nio wrote:
> Okay, I've done this now.

Is this blocked on anything? I feel it's ready to merge at this point. I'd
really like to not have this problem with trailing whitespace in sgml files
anymore.
Re: First draft of PG 17 release notes
On Mon, Oct 7, 2024 at 07:25:11PM -0400, Bruce Momjian wrote: > > Yes. This change on CREATE INDEX was introduced by 2af07e2f7 together with > > other commands, but it was missed to be mentioned in the commit message > > although the description was added to the documentation. > > > > The change on CEATE MATERIALIZED VIEW was introduced by a separate commit > > b4da732fd, since which the REFRESH logic is used when creating a matview. > > Should we add here a link to that commit, too? > > I developed the attached patch which adds the two commands and the > commit item. Okay, updated commit after running src/tools/add_commit_links.pl. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml index ad814737745..0ea9d96a47f 100644 --- a/doc/src/sgml/release-17.sgml +++ b/doc/src/sgml/release-17.sgml @@ -125,6 +125,8 @@ @@ -132,11 +134,14 @@ Author: Jeff Davis Change functions to use a safe during maintenance operations (Jeff Davis) § + § This prevents maintenance operations (ANALYZE, - CLUSTER, REFRESH + CLUSTER, CREATE + INDEX, CREATE + MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM) from performing unsafe access. Functions used by expression indexes and materialized views that
Re: Doc: typo in config.sgml
> On Tue, 1 Oct 2024 22:20:55 +0900
> Yugo Nagata wrote:
>
>> On Tue, 1 Oct 2024 15:16:52 +0900
>> Yugo NAGATA wrote:
>>
>> > On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
>> > Tatsuo Ishii wrote:
>> >
>> > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
>> > > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
>> > > >> point in Unicode. i.e. U+00A0).
>> > > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
>> > > >
>> > > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
>> > > > ([ and ] were not necessary.)
>> > > >
>> > > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters
>> > > > in charset.sgml, but I think it is better to specify both LC_ALL=C
>> > > > and "\xC2\xA0" for making sure detecting nbsp.
>> > > >
>> > > > One problem is that -P option can be used in only GNU grep, and grep
>> > > > in mac doesn't support it.
>> > > >
>> > > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
>> > > > assume the shell is bash.
>> > > >
>> > > > Maybe, better way is use perl itself rather than grep as following.
>> > > >
>> > > > `perl -ne '/\xC2\xA0/ and print' `
>> > > >
>> > > > I attached a patch fixed in this way.
>> > >
>> > > GNU sed can also be used without setting LC_ALL:
>> > >
>> > > sed -n /"\xC2\xA0"/p
>> > >
>> > > However I am not sure if non-GNU sed can do this too...
>> >
>> > Although I've not check it myself, BSD sed doesn't support \x escape
>> > according to [1].
>> >
>> > [1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
>> >
>> > By the way, I've attached a patch a bit modified to use the plural form
>> > statement as same as check-tabs.
>> >
>> > Non-breaking **spaces** appear in SGML/XML files
>>
>> The previous patch was broken because the perl command failed to return
>> the correct result. I've attached an updated patch to fix the return
>> value. In passing, I added line breaks for long lines.
>
> I've attached a updated patch.
> I added the comment to explain why Perl is used instead of grep or sed.

Looks good to me. If there's no objection, I will commit this to master
branch.

Best regards,
-- 
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Re: First draft of PG 17 release notes
On Tue, Oct 1, 2024 at 04:36:09PM +0200, Laurenz Albe wrote:
> I think that the removal of the "adminpack" extension should
> be listed in the section "migration to v17" as an incompatibility.
>
> I have seen one complaint that pg_upgrade fails if the extension
> is installed, but a dump/restore would also throw an error.

Agreed, moved.

-- 
  Bruce Momjian  https://momjian.us
  EDB            https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means
  "Am I going to die soon?"
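As a side note for anyone hitting that pg_upgrade failure: a possible
workaround, assuming nothing in the old cluster still relies on the
extension's functions, is simply to drop it before upgrading:

    -- Run in every database of the old cluster where adminpack is installed.
    DROP EXTENSION adminpack;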
Re: Collation & ctype method table, and extension hooks
On Fri, 2024-10-04 at 15:24 +0200, Andreas Karlsson wrote:
> Great! I had been planning to do this myself so great to see that you
> already did it before me. Will take a look at this work later.

Great! We'll need to test whether there are any regressions in the regex &
pattern matching code due to the indirection. What would be a good test for
that? Just running it over long strings?

Regards,
Jeff Davis
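Not from the thread, but one rough sketch of the "long strings" idea: a query
like the following, timed before and after the change, would at least push a
lot of characters through the regex machinery via the new method table
(string contents and sizes are arbitrary):

    SELECT count(*)
    FROM generate_series(1, 1000)
    WHERE repeat('abcxyz', 10000) ~ '(abc)+xyz$';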
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Hi,

It’s been a long gap in the activity of this thread, and I apologize for the
delay. However, I have now returned and reviewed the other threads [1],[2]
that have made changes in this area.

I would like to share a summary of the discussion that took place among
Robert, Andres, Bharath, and Tom on this thread, to make it easier to move
forward.

Robert was dissatisfied with the approach used in the patch to report
progress for the checkpointer process, as he felt the current mechanism is
not suitable. He proposed allocating a dedicated chunk of shared memory in
CheckpointerShmemStruct. Bharath opposed this, suggesting instead to use
PgStat_CheckpointerStats. Andres somewhat supported Robert’s idea but noted
that using PgStat_CheckpointerStats would allow for more code reuse.

The discussion then shifted towards the statistics handling for the
checkpointer process. Robert expressed dissatisfaction with the current
statistics handling mechanism. Andres explained the rationale behind the
existing setup and the improvements made in pg_stat_io. He also mentioned the
overhead of the changecount mechanism when updating for every buffer write.
However, for updates involving a single parameter at a time, performance can
be improved on platforms that support atomic 64-bit writes (indicated by
#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY). He also shared performance
metrics demonstrating good results with this approach. Tom agreed to use this
and restrict it to the specific case.

But I am not quite clear on the direction ahead. Let me summarise the
approaches based on the above discussion.

Approach-1: The approach used in the current patch, which uses the existing
mechanism of progress reporting. The advantage of this approach is that the
machinery is already in place and ready to use. However, it is not suitable
for the checkpointer process because only the checkpointer process runs the
checkpoint, even if the command is issued from a different backend. The
current mechanism is designed for any backend to report progress for any
command it is running, and we don’t know which command that will be at any
given point in time, or how many backends will be running any given command
simultaneously. Hence, this approach does not fit the checkpointer.
Additionally, there is complexity involved in mapping down to integers and
back.

Approach-2: Allocate a dedicated chunk of shared memory in
CheckpointerShmemStruct with an appropriate name and size. This approach
eliminates the complexity involved in Approach-1 related to mapping down to
integers and back. However, it requires building the necessary machinery to
suit checkpointer progress reporting, which might be similar to the current
progress reporting mechanism.

Approach-3: Using PgStat_CheckpointerStats to store the progress information.
Have we completely ruled out this approach?

Additionally, all three approaches require improvements in the changecount
mechanism on platforms that support atomic 64-bit writes. I’m inclined to
favor Approach-2 because it provides a clearer method for reporting progress
for the checkpointer process, with the additional work required to implement
the necessary machinery. However, I’m still uncertain about the best path
forward. Please share your thoughts.

[1]: https://www.postgresql.org/message-id/flat/CAOtHd0ApHna7_p6mvHoO%2BgLZdxjaQPRemg3_o0a4ytCPijLytQ%40mail.gmail.com#74ae447064932198495aa6d722fdc092
[2]: https://www.postgresql.org/message-id/CALj2ACVxX2ii=66rypxrweze2esbripmj0ahfrfhuexjcc7...@mail.gmail.com

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Tue, Jan 31, 2023 at 11:16 PM vignesh C wrote:
>
> On Thu, 8 Dec 2022 at 00:33, Andres Freund wrote:
> >
> > Hi,
> >
> > On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote:
> > > Please find attached a rebase in v7.
> >
> > cfbot complains that the docs don't build:
> > https://cirrus-ci.com/task/6694349031866368?logs=docs_build#L296
> >
> > [03:24:27.317] ref/checkpoint.sgml:66: element para: validity error :
> > Element para is not declared in para list of possible children
> >
> > I've marked the patch as waitin-on-author for now.
> >
> >
> > There's been a bunch of architectural feedback too, but tbh, I don't know
> > if we came to any conclusion on that front...
>
> There has been no updates on this thread for some time, so this has
> been switched as Returned with Feedback. Feel free to open it in the
> next commitfest if you plan to continue on this.
>
> Regards,
> Vignesh
Commit fest 2024-09
Hi all,

The commit fest 2024-09 should have ended one week ago, so I have been taking
one step ahead and I have begun cleaning up things. As of now, there is still
a total of 113 patches marked as "Needs Review", and I hope that my
non-auto-vacuuming will be over tomorrow. If you are an author of patches
still listed, I would recommend closing or moving your entries by yourself to
limit errors, because, well, I am going to mishandle some of them for sure.

I am done processing the patches waiting on author. Some of them have been
waiting on input from their author for a rather long time, up to six months,
so I have marked them as returned with feedback where it was clear that a
patch has been given up. This is where a "Not Interested" or a "Lack of
Interest" field would have been handy.

If you feel that one or more of your patches have been handled in an
incorrect way, feel free to reply back to this thread, to update your patch
entry to a better state, or to reply back to me using this thread.

Thanks,
-- 
Michael
Re: Add parallel columns for pg_stat_statements
Le lun. 7 oct. 2024 à 02:18, Michael Paquier a écrit : > On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote: > > I'm not sure I follow. That would mean that every time a query is > executed, > > it always gets the same amount of workers. Which is not guaranteed to be > > true. > > > > I would agree, though, that parallelized_queries_launched is probably not > > that interesting. I could get rid of it if you think it should go away. > > My point is that these stats are useful to know which action may have > to be taken when reaching a mean, and numbers in pg_stat_statements > offer hints that something is going wrong and that a closer lookup at > an EXPLAIN plan may be required, particularly if the total number of > workers planned and launched aggregated in the counters is unbalanced > across queries. If the planned/launched ratio is balanced across most > queries queries, a GUC adjustment may be OK. If the ratio is very > unbalanced in a lower set of queries, I'd also look at tweaking GUCs > instead like the per_gather. These counters give information that one > or the other may be required. > > > Well, I don't see this as an overlap. Rather more information. > > Later versions of Benoit's patch have been accumulating this data in > the executor state. v4 posted at [1] has the following diffs: > --- a/src/include/nodes/execnodes.h > +++ b/src/include/nodes/execnodes.h > @@ -724,6 +724,9 @@ typedef struct EState > */ > List *es_insert_pending_result_relations; > List *es_insert_pending_modifytables; > + > + int es_workers_launched; > + int es_workers_planned; > } EState; > > Your v2 posted on this thread has that: > @@ -707,6 +707,12 @@ typedef struct EState > struct EPQState *es_epq_active; > > booles_use_parallel_mode; /* can we use parallel > workers? */ > + booles_used_parallel_mode; /* was executed in > parallel */ > + int es_parallelized_workers_launched; > + int es_parallelized_workers_planned; > + int es_parallelized_nodes; /* # of > parallelized nodes */ > + int es_parallelized_nodes_all_workers; /* # of > nodes with all workers launched */ > + int es_parallelized_nodes_no_worker; /* # of > nodes with no workers launched */ > > es_parallelized_workers_launched and es_workers_launched are the same > thing in both. > > My bad. I agree this is the way to go. See patch v3-0001 attached. > > On this, I would agree with you. They are not that particularly useful to > > get better setting for parallel GUCs. I can drop them if you want. > > Yep. I would remove them for now. This leads to more bloat. > > Done. See patch v3-0002 attached. > > Did this on the v2 version of the patch (attached here). > > > > Thanks for your review. If you want the parallelized_queries_launched > > column and the parallelized_nodes_* columns dropped, I can do that on a > v3 > > patch. > > I'd recommend to split that into more independent patches: > - Introduce the two counters in EState with the incrementations done > in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may > want to coordinate with Benoit to avoid duplicating the work). > - Expand pg_stat_statements to use them for DMLs, SELECTs, well where > they matter. > - Look at expanding that for utilities that can do parallel jobs: > CREATE INDEX and VACUUM, but this has lower priority to me, and this > can reuse the same counters as the ones added by patch 2. > > The first two are done. The last one is beyond my scope. I'm now working on Benoit's patch to make it work with my v3-0001 patch. 
I'll send the resulting patch on his thread. > [1]: > https://www.postgresql.org/message-id/6ecad3ad-835c-486c-9ebd-da87a9a97...@dalibo.com > [2]: https://www.postgresql.org/message-id/zv46wtmjltuu2...@paquier.xyz > -- > Michael > Regards. -- Guillaume. From 95b300eeff0168f2618418102df660f1ba9b9113 Mon Sep 17 00:00:00 2001 From: Guillaume Lelarge Date: Mon, 7 Oct 2024 08:45:36 +0200 Subject: [PATCH v3 1/2] Introduce two new counters in EState They will be used by two other patchs to populate new columns in pg_stat_database and pg_statements. --- src/backend/executor/execUtils.c | 3 +++ src/backend/executor/nodeGather.c | 3 +++ src/backend/executor/nodeGatherMerge.c | 3 +++ src/include/nodes/execnodes.h | 3 +++ 4 files changed, 12 insertions(+) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 5737f9f4eb..1908481999 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -162,6 +162,9 @@ CreateExecutorState(void) estate->es_jit_flags = 0; estate->es_jit = NULL; + estate->es_parallelized_workers_launched = 0; + estate->es_parallelized_workers_planned = 0; + /* * Retur
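To make the planned/launched analysis above concrete, a lookup of the kind
Michael describes could look roughly like this, assuming the
pg_stat_statements columns end up with the names used in these patches
(parallel_workers_planned / parallel_workers_launched); treat it as a sketch,
not the final column names:

    -- Statements whose launched/planned worker ratio is unbalanced are the
    -- ones worth a closer look with EXPLAIN before touching any GUC.
    SELECT queryid,
           parallel_workers_planned  AS planned,
           parallel_workers_launched AS launched,
           round(parallel_workers_launched::numeric
                 / NULLIF(parallel_workers_planned, 0), 2) AS launch_ratio
    FROM pg_stat_statements
    WHERE parallel_workers_planned > 0
    ORDER BY launch_ratio;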
Re: Unlinking Parallel Hash Join inner batch files sooner
On Tue, May 14, 2024 at 02:56:37PM -0400, Robert Haas wrote:
> It doesn't appear to me that this got committed. On the procedural
> question, I would personally treat it as a non-back-patchable bug fix
> i.e. master-only but without regard to feature freeze. However, I can
> see arguments for either treating it as a back-patchable fix or for
> waiting until v18 development opens. What would you like to do?

I am not seeing anything committed, either. Thomas, an update?
-- 
Michael
Re: On disable_cost
On 03.10.2024 23:10, Laurenz Albe wrote:

On Thu, 2024-10-03 at 14:24 -0400, Robert Haas wrote:

On Thu, Oct 3, 2024 at 1:35 PM Alena Rybakina wrote:

I prepared a patch that includes the information we can add.

One general thing to think about is that we really document very little about
EXPLAIN. That might not be good, but we should consider whether it will look
strange if we document a bunch of stuff about this and still don't talk about
anything else. (This is not a comment on this specific patch, which I have
not examined. It's just a general thought.)

I think we should still add it because it might cause a lot of
misunderstanding.

The "EXPLAIN Basics" already mention "enable_seqscan", so I think it is
alright to expand on that a bit. Here is my take on a documentation patch
(assuming David's "Disabled: true" wording):

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index ff689b65245..db906841472 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -578,6 +578,28 @@ WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
    discussed below.

+
+Some plan node types cannot be completely disabled. For example, there is
+no other access method than a sequential scan for a table with no index.
+If you told the planner to disregard a certain node type, but it is forced
+to use it nonetheless, you will see the plan node marked as
+Disabled in the output of EXPLAIN:
+
+
+CREATE TABLE dummy (t text);
+
+SET enable_seqscan = off;
+
+EXPLAIN SELECT * FROM dummy;
+
+                       QUERY PLAN
+-----------------------------------------------------------
+ Seq Scan on dummy  (cost=0.00..23.60 rows=1360 width=32)
+   Disabled: true
+
+
+
+   subplan

Sorry for the late reply, I needed time to look into this feature to respond
to your email.

I think this is not entirely correct. I tested the last version of the patch
[0]: I created a table and disabled sequential scanning, so there were no
other options for the optimizer to scan table t1. It still displayed that it
has disabled nodes.

However you are right that this display will not appear for all nodes that
only contain a data collection procedure, such as Append, MergeAppend,
Gather, etc. And I agree with you that we should add information about it. I
also think it’s worth adding additional information that this option does not
appear in the postgres_fdw extension.

-- 
Regards,
Alena Rybakina
Postgres Professional
Re: Parallel workers stats in pg_stat_database
Hey, Le mer. 2 oct. 2024 à 11:12, Benoit Lobréau a écrit : > Hi, > > Thanks for your imput ! I will fix the doc as proposed and do the split > as soon as I have time. > > I've done the split, but I didn't go any further than that. Two patches attached: * v5-0001 adds the metrics (same patch than v3-0001 for pg_stat_statements) * v5-0002 handles the metrics for pg_stat_database. "make check" works, and I also did a few other tests without any issues. Regards. -- Guillaume. From 9413829616bdc0806970647c15d7d6bbd96489d1 Mon Sep 17 00:00:00 2001 From: Guillaume Lelarge Date: Mon, 7 Oct 2024 08:45:36 +0200 Subject: [PATCH v5 1/2] Introduce two new counters in EState They will be used by two other patchs to populate new columns in pg_stat_database and pg_statements. --- src/backend/executor/execUtils.c | 3 +++ src/backend/executor/nodeGather.c | 3 +++ src/backend/executor/nodeGatherMerge.c | 3 +++ src/include/nodes/execnodes.h | 3 +++ 4 files changed, 12 insertions(+) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 5737f9f4eb..1908481999 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -162,6 +162,9 @@ CreateExecutorState(void) estate->es_jit_flags = 0; estate->es_jit = NULL; + estate->es_parallelized_workers_launched = 0; + estate->es_parallelized_workers_planned = 0; + /* * Return the executor state structure */ diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c index 5d4ffe989c..0fb915175a 100644 --- a/src/backend/executor/nodeGather.c +++ b/src/backend/executor/nodeGather.c @@ -182,6 +182,9 @@ ExecGather(PlanState *pstate) /* We save # workers launched for the benefit of EXPLAIN */ node->nworkers_launched = pcxt->nworkers_launched; + estate->es_parallelized_workers_launched += pcxt->nworkers_launched; + estate->es_parallelized_workers_planned += pcxt->nworkers_to_launch; + /* Set up tuple queue readers to read the results. */ if (pcxt->nworkers_launched > 0) { diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c index 45f6017c29..149ab23d90 100644 --- a/src/backend/executor/nodeGatherMerge.c +++ b/src/backend/executor/nodeGatherMerge.c @@ -223,6 +223,9 @@ ExecGatherMerge(PlanState *pstate) /* We save # workers launched for the benefit of EXPLAIN */ node->nworkers_launched = pcxt->nworkers_launched; + estate->es_parallelized_workers_launched += pcxt->nworkers_launched; + estate->es_parallelized_workers_planned += pcxt->nworkers_to_launch; + /* Set up tuple queue readers to read the results. */ if (pcxt->nworkers_launched > 0) { diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index aab59d681c..f898590ece 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -708,6 +708,9 @@ typedef struct EState bool es_use_parallel_mode; /* can we use parallel workers? */ + int es_parallelized_workers_launched; + int es_parallelized_workers_planned; + /* The per-query shared memory area to use for parallel execution. 
*/ struct dsa_area *es_query_dsa; -- 2.46.2 From 873c03b99e1ccc05bac4c71b5f070e0dbe0bd779 Mon Sep 17 00:00:00 2001 From: benoit Date: Mon, 7 Oct 2024 10:12:46 +0200 Subject: [PATCH v5 2/2] Adds four parallel workers stat columns to pg_stat_database * parallel_workers_planned * parallel_workers_launched * parallel_maint_workers_planned * parallel_maint_workers_launched --- doc/src/sgml/monitoring.sgml | 36 +++ src/backend/access/brin/brin.c| 4 +++ src/backend/access/nbtree/nbtsort.c | 4 +++ src/backend/catalog/system_views.sql | 4 +++ src/backend/commands/vacuumparallel.c | 5 +++ src/backend/executor/execMain.c | 7 src/backend/utils/activity/pgstat_database.c | 36 +++ src/backend/utils/adt/pgstatfuncs.c | 12 +++ src/include/catalog/pg_proc.dat | 20 +++ src/include/pgstat.h | 7 src/test/regress/expected/rules.out | 4 +++ src/test/regress/expected/select_parallel.out | 11 ++ src/test/regress/expected/vacuum_parallel.out | 19 ++ src/test/regress/sql/select_parallel.sql | 8 + src/test/regress/sql/vacuum_parallel.sql | 11 ++ 15 files changed, 188 insertions(+) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 331315f8d3..9567ca5bd1 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3611,6 +3611,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage + + + parallel_workers_planned bigint + + + Number of parallel workers planned by queries on this database + + + + + + parallel_workers_launched
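With the columns added by the 0002 patch above, a per-database check would
look something like this (column names taken from the patch; the feature is
still under review, so they may change):

    SELECT datname,
           parallel_workers_planned,
           parallel_workers_launched,
           parallel_maint_workers_planned,
           parallel_maint_workers_launched
    FROM pg_stat_database
    WHERE datname IS NOT NULL;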
Re: Enable data checksums by default
> On 3 Oct 2024, at 23:13, Nathan Bossart wrote:
>
> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>> I have committed 0001 (the new option) and 0004 (the docs tweak). I think
>> there is consensus for the rest, too, but I'll leave it for a few more days
>> to think about. I guess the test failure has to be addressed.
>
> Here is a rebased patch with the test fix (for cfbot). I have made no
> other changes.

+Data checksums are enabled by default. They can be
+disabled by use of --no-data-checksums.

Nitpick, but I think this should be an xref, like how we link to --no-locale
in the -E docs, instead.

LGTM otherwise.

-- 
Daniel Gustafsson
Re: per backend I/O statistics
Hi,

On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote:
> On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:
> > On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
> >> As an additional benefit of this approach, the client can set a
> >> connection variable, for example, no_backend_iostats to true, or set
> >> its inverse variable to false, to restrict memory usage to only the
> >> required backends.
> >
> > Thanks for the feedback!
> >
> > If we were to add an on/off switch button, I think I'd vote for a global
> > one instead. Indeed, I see this feature more like an "Administrator" one,
> > where the administrator wants to be able to find out which session is
> > responsible of what (from an I/O point of view): like being able to
> > answer "which session is generating this massive amount of reads"?
> >
> > If we allow each session to disable the feature then the administrator
> > would lose this ability.
>
> Hmm, I've been studying this patch,

Thanks for looking at it!

> and I am not completely sure to
> agree with this feeling of using fixed-numbered stats, actually, after
> reading the whole and seeing the structure of the patch
> (FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
> know exactly the number of slots we need to know for the
> fixed-numbered stats as MaxBackends may change).

Right, that's a new way of dealing with "unknown" number of slots (and it has
cons as you mentioned in [1]).

> If we make these
> kind of stats variable-numbered, does it have to actually involve many
> creations or removals of the stats entries, though? One point is that
> the number of entries to know about is capped by max_connections,
> which is a PGC_POSTMASTER. That's the same kind of control as
> replication slots. So one approach would be to reuse entries in the
> dshash and use in the hashing key the number in the procarrays. If a
> new connection spawns and reuses a slot that was used in the past,
> then reset all the existing fields and assign its PID.

Yeah, like it's done currently with the "fixed-numbered" stats proposal. That
sounds reasonable to me, I'll look at this proposed approach and come back
with a new patch version, thanks!

> Another thing is the consistency of the data that we'd like to keep at
> shutdown. If the connections have a balanced amount of stats shared
> among them, doing decision-making based on them is kind of easy. But
> that may cause confusion if the activity is unbalanced across the
> sessions. We could also not flush them to disk as an option, but it
> still seems more useful to me to save this data across restarts if one
> takes frequent snapshots of the new system view reporting everything,
> so as it is possible to get an idea of the deltas across the snapshots
> for each connection slot.

The idea that has been implemented so far in this patch is that we still
maintain an aggregated version of the stats (visible through pg_stat_io) and
that only the aggregated stats are flushed/read to/from disk (means we don't
flush the per-backend stats). I think that it makes sense that way.

The way I see it is that the per-backend I/O stats is more for current
activity instrumentation. So it's not clear to me what would be the benefits
of restoring the per-backend stats at startup knowing that: 1) we restored
the aggregated stats and 2) the sessions that were responsible for the
restored stats are gone.

[1]: https://www.postgresql.org/message-id/Zuz5iQ4AjcuOMx_w%40paquier.xyz

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: ECPG cleanup and fix for clang compile-time problem
On Saturday, October 5, 2024, Tom Lane wrote: > > > Rebase needed after f22e84df1, so here's an update that rebases > up to HEAD and adds the missing "static". No other changes. > > (Anybody want to review this? I'm getting tired of rebasing it, > and we're missing out on the clang build time savings.) Sorry for the delay, I'll respond in a couple days.
Re: per backend I/O statistics
Hi,

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
> On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:
> > No problem at all! (I re-explained because I'm not always 100% sure that
> > my explanations are crystal clear ;-) )
>
> We've discussed a bit this patch offline, but after studying the patch
> I doubt that this part is a good idea:
>
> +   /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
> +   PgStatShared_IO io;
>  } PgStat_ShmemControl;
>
> We are going to be in trouble if we introduce a second member in this
> routine that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
> relies on the fact that all its members have deterministic offset
> positions in this structure.

Agree that it would be an issue should we have to add a new
FLEXIBLE_ARRAY_MEMBER.

> So this lacks flexibility. This choice
> is caused by the fact that we don't exactly know the number of
> backends because that's controlled by the PGC_POSTMASTER GUC
> max_connections so the size of the structure would be undefined.

Right.

> There is a parallel with replication slot statistics here, where we
> save the replication slot data in the dshash based on their index
> number in shmem. Hence, wouldn't it be better to do like replication
> slot stats, where we use the dshash and a key based on the procnum of
> each backend or auxiliary process (ProcNumber in procnumber.h)? If at
> restart max_connections is lower than what was previously used, we
> could discard entries that would not fit anymore into the charts.
> This is probably not something that happens often, so I'm not really
> worried about forcing the removal of these stats depending on how the
> upper-bound of ProcNumber evolves.

Yeah, I'll look at implementing the dshash based on their procnum and see
where it goes.

> So, using a new kind of ID and making this kind variable-numbered may
> ease the implementation quite a bit, while avoiding any extensibility
> issues with the shmem portion of the patch if these are
> fixed-numbered.

Agree.

> This would rely on the fact that we would use the ProcNumber for the
> dshash key, and this information is not provided in pg_stat_activity.
> Perhaps we should add this information in pg_stat_activity so as it
> would be easily possible to do joins with a SQL function that returns
> a SRF with all the stats associated with a given connection slot
> (auxiliary or backend process)?

I'm not sure that's needed. What has been done in the previous versions is to
get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like
GetNumberFromPGProc(BackendPidGetProc(pid)). Do you see an issue with that
approach?

> The active PIDs of the live sessions are not stored in the active
> stats, why not?

That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
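For illustration, the usage pattern Bertrand describes (joining on pid rather
than exposing ProcNumber) would look roughly like this, assuming the function
keeps the name and SRF shape of the in-progress patch, which may still
change:

    -- Per-backend I/O of the live client sessions; pg_stat_get_backend_io()
    -- is the patch's SQL function, so its exact signature is not final.
    SELECT a.pid, a.datname, a.usename, io.*
    FROM pg_stat_activity AS a,
         LATERAL pg_stat_get_backend_io(a.pid) AS io
    WHERE a.backend_type = 'client backend';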
Re: Commit fest 2024-09
> On 7 Oct 2024, at 09:47, Michael Paquier wrote: > The commit fest 2024-09 should have ended one week ago, so I have been > taking one step ahead and I have begun cleaning up things. Thanks for doing this, much appreciated! -- Daniel Gustafsson
Re: Use heap scan routines directly in vac_update_datfrozenxid()
On 2024-Oct-06, Tom Lane wrote:

> Soumyadeep Chakraborty writes:
> > Attached is a simple patch to directly use heap scan routines in
> > vac_update_datfrozenxid(), avoiding the multilayer overhead from the
> > sysscan infrastructure.

Though if there's anybody with a Postgres fork using catalog tables that
aren't heapam, then they aren't going to be happy with this change. (I
remember Tom commenting that Salesforce used to do that, I wonder if they
still do.)

> I would think the overhead of that is minuscule. If it isn't,
> we should try to make it so, not randomly hack up some of the callers
> to avoid it. The intention certainly was that it wouldn't cost
> anything compared to what happens within the actual table access.

I suspect the problem is not the systable layer per se, but the fact that it
has to go through the table AM interface. So by replacing systable with the
heap routines, it's actually _two_ layers of indirection that are being
skipped over.

systable seems indeed quite lean, or at least it was up to postgres 11 ...
but it's not clear to me that it continues to be. The table AM API is heavily
centered on slots, and I think having to build heap tuples from slots might
be slow.

I wonder if it would be better to add "systable_getnext_slot" which returns a
slot and omit the conversion to heaptuple. Callers for which it's significant
could skip that conversion by dealing with a slot instead. Converting just
one or two critical spots (such as vac_update_datfrozenxid, maybe
pg_publication.c) should be easy enough.

-- 
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)
Re: Add contrib/pg_logicalsnapinspect
Hi, On Wed, Sep 25, 2024 at 05:04:18PM -0700, Masahiko Sawada wrote: > I've reviewed v10 patch and here are some comments: Thanks for looking at it! > > +static void > +ValidateAndRestoreSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk, > + const char *path) > > This function and SnapBuildRestore() have duplicate codes. Can we have > a common function that reads the snapshot data from disk to > SnapBuildOnDisk, and have both ValidateAndRestoreSnapshotFile() and > SnapBuildRestore() call it? Right. I had in mind to remove the duplicate code while proposing the "in core" functions version of this patch, see [1]. Now that snapbuild_internal.h is there, I do not see any reason not to remove the duplicate code. That's done in v11 attached: ValidateAndRestoreSnapshotFile() has been modified a bit and can now be used in the new module and in SnapBuildRestore(). Bonus points, as compared to v10, it allows to: 1. move the SnapBuildRestoreContents() declaration back from snapbuild_internal.h to its original location (snapbuild.c) 2. same as above for the SnapBuildOnDiskConstantSize, SnapBuildOnDiskNotChecksummedSize, SNAPBUILD_MAGIC and SNAPBUILD_VERSION definitions 3. remove the changes in pg_crc32c.h as the extras "PGDLLIMPORT" are not needed anymore > --- > +CREATE FUNCTION pg_get_logical_snapshot_meta(IN in_lsn pg_lsn, > (snip) > +CREATE FUNCTION pg_get_logical_snapshot_info(IN in_lsn pg_lsn, > > Is there any reason why both functions take a pg_lsn value as an > argument? Given that the main usage would be to use these functions > with pg_ls_logicalsnapdir(), I think it would be easier for users if > these functions take a filename as a function argument. That way, we > can use these functions like: > > select info.* from pg_ls_logicalsnapdir(), > pg_get_logical_snapshot_info(name) as info; I think that makes sense. It also simplfies the tests and the examples, done that way in v11. > > If there are use cases where specifying a LSN is easier, I think we > would have both types. > > > +static const char *const SnapBuildStateDesc[] = { > +[SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start", > +[SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building", > +[SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full", > +[SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent", > +}; > > I think that it'd be better to have a dedicated function that returns > a string representation of the given state by using a switch > statement. That way, we don't need SNAPBUILD_STATE_INCR and a compiler > warning would help realize a missing state if a new state is > introduced in the future. Yeah, that sounds reasonable. Done in v11: the switch does not handle "default" so that [-Wswitch] would report a warning in case of enumeration value not handled in the switch (v11 keeps a remark on top of the SnapBuildState enum definition though). > It needs a function call but I believe it won't be a problem in this use case. Agree. [1]: https://www.postgresql.org/message-id/ZtGKi5FdW%2Bky%2B0fV%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 52fbead00e079132881ec51a3f913cd784ccd356 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Wed, 14 Aug 2024 08:46:05 + Subject: [PATCH v11] Add contrib/pg_logicalinspect Provides SQL functions that allow to inspect logical decoding components. 
It currently allows to inspect the contents of serialized logical snapshots of a running database cluster, which is useful for debugging or educational purposes. --- contrib/Makefile | 1 + contrib/meson.build | 1 + contrib/pg_logicalinspect/.gitignore | 4 + contrib/pg_logicalinspect/Makefile| 31 ++ .../expected/logical_inspect.out | 52 contrib/pg_logicalinspect/logicalinspect.conf | 1 + contrib/pg_logicalinspect/meson.build | 39 +++ .../pg_logicalinspect--1.0.sql| 43 +++ contrib/pg_logicalinspect/pg_logicalinspect.c | 199 + .../pg_logicalinspect.control | 5 + .../specs/logical_inspect.spec| 34 +++ doc/src/sgml/contrib.sgml | 1 + doc/src/sgml/filelist.sgml| 1 + doc/src/sgml/pglogicalinspect.sgml| 150 ++ src/backend/replication/logical/snapbuild.c | 279 -- src/include/replication/snapbuild.h | 6 +- src/include/replication/snapbuild_internal.h | 197 + 17 files changed, 823 insertions(+), 221 deletions(-) 7.5% contrib/pg_logicalinspect/expected/ 5.2% contrib/pg_logicalinspect/specs/ 27.9% contrib/pg_logicalinspect/ 14.2% doc/src/sgml/ 25.0% src/backend/replication/logical/ 19.7% src/include/replication/ diff
Re: System views for versions reporting
> On Sun, Oct 06, 2024 at 12:01:29PM GMT, Joe Conway wrote:
> On 10/6/24 11:36, Dmitry Dolgov wrote:
> > Hi,
> >
> > Based on the feedback in [1], here is my attempt at implementing system
> > views for versions reporting. It adds pg_system_versions for showing
> > things like core version, compiler, LLVM, etc, and pg_system_libraries
> > for showing linked shared objects. I think everyone has agreed that the
> > first was a good idea, where the second was only suggested -- after some
> > thinking I find shared objects useful enough to include here as well.
> >
> > The main idea is to facilitate bug reporting. In particular, in many JIT
> > related bug reports one of the first questions is often "which LLVM
> > version is used?". Gathering such information is a manual process,
> > mistakes could be made when verifying llvm-config output or anywhere
> > else. Having a system view for such purposes makes the process a bit
> > more robust.
> >
> > The first three patches are essential for this purpose, the fourth one
> > is somewhat independent and could be considered in isolation. The output
> > looks like this:
> >
> > =# select * from pg_system_versions;
> >    name   |   version    |     type
> > ----------+--------------+--------------
> >  Arch     | x86_64-linux | Compile Time
> >  ICU      | 15.1         | Compile Time
> >  Core     | 18devel      | Compile Time
> >  Compiler | gcc-14.0.1   | Compile Time
> >  LLVM     | 18.1.6       | Run Time
>
> I'm not sure why ICU is "Compile Time" rather than "Run Time" when it is
> not statically linked.

It reports U_UNICODE_VERSION at compile time. It's not necessarily correct
though, I can try to replace it with the runtime version. I think there was
some ICU functionality (something like u_getUnicodeVersion), which is maybe a
better fit.

> Also, if we are going to include ICU here, shouldn't we
> also include libc version?

Yeah, why not. One of my goals here is to identify a balanced set of useful
versions to report.

> > =# select * from pg_system_libraries;
> >         name
> > ---------------------
> >  /lib64/libkrb5.so.3
> >  /lib64/libz.so.1
> >  linux-vdso.so.1
> >  /lib64/libxml2.so.2
> >  [...]
> >
> > Any feedback is appreciated.
>
> I think it would be nice to include a sha256 hash (or something similar)
> of the libraries as well, so that they can be checked against known good
> values.

I was thinking about getting more info to show in this view, but haven't
found any reasonable way to achieve that. So I would agree with Tom on that.
Re: Make default subscription streaming option as Parallel
On Mon, 7 Oct 2024 at 12:26, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Vignesh,
>
> > One key point to consider is that the lock on transaction objects will
> > be held for a longer duration when using streaming in parallel. This
> > occurs because the parallel apply worker initiates the transaction as
> > soon as streaming begins, maintaining the lock until the transaction
> > is fully completed. As a result, for long-running transactions, this
> > extended lock can hinder concurrent access that requires a lock.
>
> So, the current default is the most conservative and can run anywhere; your
> proposal is a bit optimistic, right?

Yes, that is correct.

> One comment for your patch;
> Shouldn't we add a streaming=off case for pg_dump test? You added lines but
> no one passes it.

I have updated the existing pg_dump tests to cover the streaming options; the
attached patch has the changes for the same.

Regards,
Vignesh

Attachment: v2-0001-Make-default-value-for-susbcription-streaming-opt.patch
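For context, the behavior the patch would make the default is what can
already be requested explicitly today (subscription and publication names
below are made up):

    -- 'streaming = parallel' applies large in-progress transactions via
    -- parallel apply workers instead of spooling them to files.
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION mypub
        WITH (streaming = parallel);

    -- Or switch an existing subscription:
    ALTER SUBSCRIPTION mysub SET (streaming = parallel);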
Re: Add new COPY option REJECT_LIMIT
Thanks for your review! On Thu, Oct 3, 2024 at 4:27 PM jian he wrote: mentioning maxerror is a bigint type or explicitly mentioning the maximum allowed value of "maxerror" would be great. Added a description that it allows positive bigint. On Thu, Oct 3, 2024 at 11:43 PM Fujii Masao wrote: + if (opts_out->reject_limit && !opts_out->on_error) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + /*- translator: first and second %s are the names of COPY +* option, e.g. ON_ERROR, third is the value of the COPY option, +* e.g. IGNORE */ +errmsg("COPY %s requires %s to be set to %s", +"REJECT_LIMIT", "ON_ERROR", "IGNORE"))); This check might be better moved to other place, for example at the bottom of ProcessCopyOptions(). I noticed a comment in the code: "Check for incompatible options (must do these two before inserting defaults)." You added your condition after this comment and before inserting the defaults, but it's not related to that process. So, moving this check to other place seems more appropriate. Agreed. Moved it to the bottom of ProcessCopyOptions(). While reviewing, I also noticed that the check for "opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP" is similarly placed before setting the defaults, which might not be correct. This check should probably be moved as well. Additionally, the comment mentioning "must do these two" should be updated to "must do these three." These changes should be handled in a separate patch. Agreed and attached 0002 patch. + if (cstate->opts.reject_limit && Wouldn't it be clearer to use cstate->opts.reject_limit > 0 for better readability? Agreed. + cstate->num_errors > cstate->opts.reject_limit) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), +errmsg("skipped more than REJECT_LIMIT rows: \"%lld\",", + (long long) cstate->opts.reject_limit))); To make the error message clearer, similar to the NOTICE about skipped rows, how about rewording it to: "skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility"? Agreed. Also considering when REJECT_LIMIT is specified to 1, attached patch uses errmsg_plural() instead of errmsg. -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 360e170b8aee378c0f498d8899d9f3b5d41d0310 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 7 Oct 2024 19:57:51 +0900 Subject: [PATCH v7 1/2] Subject: Add REJECT_LIMIT option to COPY command. 9e2d870 enabled the COPY FROM command to skip soft errors, but there was no limitation about the number of errors and skipped all the soft errors. In some cases there would be a limitation how many errors are tolerable. This patch introduces REJECT_LIMIT to specify how many errors can be skipped. --- doc/src/sgml/ref/copy.sgml | 19 + src/backend/commands/copy.c | 33 + src/backend/commands/copyfrom.c | 9 src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 10 + src/test/regress/sql/copy2.sql | 21 ++ 6 files changed, 93 insertions(+) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index b9413d4892..59a17d7793 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * } FORCE_NULL { ( column_name [, ...] 
) | * } ON_ERROR error_action +REJECT_LIMIT maxerror ENCODING 'encoding_name' LOG_VERBOSITY verbosity @@ -413,6 +414,24 @@ COPY { table_name [ ( + +REJECT_LIMIT + + + Specifies the maximum number of errors tolerated while converting a + column's input value to its data type, when ON_ERROR is + set to ignore. + If the input causes more errors than the specified value, the COPY + command fails, even with ON_ERROR set to ignore. + This clause must be used with ON_ERROR=ignore + and maxerror must be positive bigint. + If not specified, ON_ERROR=ignore + allows an unlimited number of errors, meaning COPY will + skip all erroneous data. + + + + ENCODING diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 03eb7a4eba..befab92074 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -418,6 +418,23 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* keep compiler quiet */ } +/* + * Extract REJECT_LIMIT value from a Def
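Putting the documented syntax from the patch above together, usage would look
like this (hypothetical table and file names; REJECT_LIMIT only exists with
this patch applied):

    -- Skip rows that fail data type conversion, but abort the COPY once more
    -- than 10 rows have been skipped.
    COPY measurements FROM '/tmp/measurements.csv'
        WITH (FORMAT csv, ON_ERROR ignore, REJECT_LIMIT 10);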
Re: Doc: typo in config.sgml
On Tue, 1 Oct 2024 22:20:55 +0900 Yugo Nagata wrote: > On Tue, 1 Oct 2024 15:16:52 +0900 > Yugo NAGATA wrote: > > > On Tue, 01 Oct 2024 10:33:50 +0900 (JST) > > Tatsuo Ishii wrote: > > > > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in > > > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code > > > >> point in Unicode. i.e. U+00A0). > > > >> So grep -P "[\xC2\xA0]" should work to detect nbsp. > > > > > > > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment. > > > > ([ and ] were not necessary.) > > > > > > > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters > > > > in charset.sgml, > > > > but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for > > > > making sure detecting > > > > nbsp. > > > > > > > > One problem is that -P option can be used in only GNU grep, and grep in > > > > mac doesn't support it. > > > > > > > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can > > > > assume the shell is bash. > > > > > > > > Maybe, better way is use perl itself rather than grep as following. > > > > > > > > `perl -ne '/\xC2\xA0/ and print' ` > > > > > > > > I attached a patch fixed in this way. > > > > > > GNU sed can also be used without setting LC_ALL: > > > > > > sed -n /"\xC2\xA0"/p > > > > > > However I am not sure if non-GNU sed can do this too... > > > > Although I've not check it myself, BSD sed doesn't support \x escape > > according to [1]. > > > > [1] > > https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference > > > > By the way, I've attached a patch a bit modified to use the plural form > > statement > > as same as check-tabs. > > > > Non-breaking **spaces** appear in SGML/XML files > > The previous patch was broken because the perl command failed to return the > correct result. > I've attached an updated patch to fix the return value. In passing, I added > line breaks > for long lines. I've attached a updated patch. I added the comment to explain why Perl is used instead of grep or sed. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile index 9c9bbfe375..65ed32cd0a 100644 --- a/doc/src/sgml/Makefile +++ b/doc/src/sgml/Makefile @@ -194,7 +194,7 @@ MAKEINFO = makeinfo ## # Quick syntax check without style processing -check: postgres.sgml $(ALLSGML) check-tabs +check: postgres.sgml $(ALLSGML) check-tabs check-nbsp $(XMLLINT) $(XMLINCLUDE) --noout --valid $< @@ -257,7 +257,15 @@ endif # sqlmansectnum != 7 # tabs are harmless, but it is best to avoid them in SGML files check-tabs: - @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1) + @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \ + (echo "Tabs appear in SGML/XML files" 1>&2; exit 1) + +# Non-breaking spaces are harmless, but it is best to avoid them in SGML files. +# Use perl command because non-GNU grep or sed could not have hex escape sequence. +check-nbsp: + @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \ + $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \ + (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1) ## ## Clean
Re: Psql meta-command conninfo+
On Sun, Oct 6, 2024 at 11:17 PM Hunaid Sohail wrote:
>
> PQpass - no need
>

I would include this as presence/absence. I concur on all of the rest.

> For PQparameterStatus, some parameters are already used.
> server_version and application_name were already discussed and removed in
> v12 and v29 respectively. Do we need other parameters?
>

Ok, I'll need to go read the reasoning for why they are deemed unneeded and
form an opinion one way or the other.

>> Within that framework having \conninfo[+[CSE][…]] be the command -
>> printing out only the table specified would be the behavior (specifying no
>> suffix letters prints all three) - would be an option.
>
> 3 separate tables without suffix?
>

Yes, the tables need headers specific to their categories. I do like the idea
of having 4 though, placing settings into their own. Premised on having all
or most of the available parameters being on the table. If it only ends up
being a few of them then keeping those in the status table makes sense.

David J.
Re: Converting tab-complete.c's else-if chain to a switch
Thomas Munro writes: > On Mon, Oct 7, 2024 at 12:11 PM Tom Lane wrote: >> Hmm, I should think that if you break anything in tab-complete.in.c, >> the breakage would propagate to tab-complete.c without difficulty. >> Do you have an example of something that the preprocessor would mask? > Fair point, and nope. Thought withdrawn. Pushed, then. The original motivation for all this was to try to get tab completion working in MSVC builds. I'm not planning to do anything more towards that myself, but maybe somebody else would like to have a go at it. regards, tom lane
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 12:02 PM Tom Lane wrote: > Peter Geoghegan writes: > > To be clear, I don't think that it's essential that we have equivalent > > behavior in those cases where the patch applies its transformations. I > > have no objections to committing the patch without any handling for > > that. > > Oy. I don't agree with that *at all*. An "optimization" that changes > query semantics is going to be widely seen as a bug. I think that you must have misinterpreted what I meant by "equivalent behavior". The context was important. I really meant: "Ideally, the patch's transformations would produce an equivalent execution strategy to what we already get in when IN() is used directly, *even in the presence of constants of mixed though related types*. Ideally, the final patch would somehow be able to generate a SAOP with one array of the same common type in cases where an analogous IN() query can do the same. But I'm not going to insist on adding something for that now." Importantly, I meant equivalent outcome in terms of execution strategy, across similar queries where the patch sometimes succeeds in generating a SAOP, and sometimes fails -- I wasn't trying to say anything about query semantics. This wasn't intended to be a rigorous argument (if it was then I'd have explained why my detailed and rigorous proposal didn't break query semantics). -- Peter Geoghegan
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 12:02 PM Tom Lane wrote: > Peter Geoghegan writes: > > To be clear, I don't think that it's essential that we have equivalent > > behavior in those cases where the patch applies its transformations. I > > have no objections to committing the patch without any handling for > > that. > > Oy. I don't agree with that *at all*. An "optimization" that changes > query semantics is going to be widely seen as a bug. I think everyone agrees on that. The issue is that I don't know how to implement the optimization Peter wants without changing the query semantics, and it seems like Alexander doesn't either. By committing the patch without that optimization, we're *avoiding* changing the query semantics. -- Robert Haas EDB: http://www.enterprisedb.com
Re: On disable_cost
On Mon, 2024-10-07 at 11:14 -0400, Robert Haas wrote: > I accept that my commit created this problem and I'm > certainly willing to be involved too if we need to sort out more > things. Thank you. I think it is great that disabled nodes are now handled better, so I appreciate the change as such. But I had to focus on the one fly in the ointment; you know how it is... Yours, Laurenz Albe
Re: Should rolpassword be toastable?
Committed with the limit set to 512 bytes. We have plenty of time to adjust this limit as needed before it takes effect in v18. -- nathan
Re: POC, WIP: OR-clause support for indexes
Peter Geoghegan writes: > To be clear, I don't think that it's essential that we have equivalent > behavior in those cases where the patch applies its transformations. I > have no objections to committing the patch without any handling for > that. Oy. I don't agree with that *at all*. An "optimization" that changes query semantics is going to be widely seen as a bug. regards, tom lane
Re: On disable_cost
On Mon, 2024-10-07 at 10:17 +0300, Alena Rybakina wrote: > > diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml > > index ff689b65245..db906841472 100644 > > --- a/doc/src/sgml/perform.sgml > > +++ b/doc/src/sgml/perform.sgml > > @@ -578,6 +578,28 @@ WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; > > discussed below. > > > > > > + > > +Some plan node types cannot be completely disabled. For example, > > there is > > +no other access method than a sequential scan for a table with no > > index. > > +If you told the planner to disregard a certain node type, but it is > > forced > > +to use it nonetheless, you will see the plan node marked as > > +Disabled in the output of EXPLAIN: > > + > > + > > +CREATE TABLE dummy (t text); > > + > > +SET enable_seqscan = off; > > + > > +EXPLAIN SELECT * FROM dummy; > > + > > +QUERY PLAN > > +-- > > + Seq Scan on dummy (cost=0.00..23.60 rows=1360 width=32) > > + Disabled: true > > + > > + > > + > > + > > > > > > subplan > > I think this is not entirely correct. I tested last version of the > patch [0]: I created a table and disabled sequential scanning, so > there were no other options for optimizer to scan table t1. it still > displayed that it has disabled nodes. Isn't that exactly what my doc patch shows? > However you are right that this display will not appear for all > nodes that only contain a data collection procedure, such as Append, > MergeAppend, Gather, etc. And I agree with you that we should > information about it. I also think it’s worth adding additional > information that this option does not appear in the postgres_fdw > extension. I cannot quite follow that either... Yours, Laurenz Albe
Re: bt Scankey in another contradictory case
On Fri, Aug 30, 2024 at 10:32 AM Peter Geoghegan wrote: > It doesn't make a huge difference in practice, because we'll still end > the scan once the leaf level is reached. But it matters more when > array keys are involved, where there might be more than one descent to > the leaf level. Plus we might as well just be thorough about this > stuff. Was your "explain (analyze, buffers) select * from c_s where x >4000 and y >10 and y <10 order by x desc" example intended to illustrate that these earlier remarks of mine about the problem not being so bad aren't always correct? With your patch, we can detect contradictory quals regardless of their required-ness. There will be some cases where (with your patch) we'll now avoid a very inefficient full index scan -- contrary to what I said about it back on August 30. If that is what you meant, then I accept your argument. I didn't quite get your point before, but this is a logical, useful argument. (You didn't really need to convince me, but this argument still helps your patch.) -- Peter Geoghegan
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 12:02 PM Tom Lane wrote: > Oy. I don't agree with that *at all*. An "optimization" that changes > query semantics is going to be widely seen as a bug. I don't believe that I said otherwise? It's just rather unclear what query semantics really mean here, in detail. At least to me. But it's obvious that (for example) it would not be acceptable if a cast were to visibly fail, where that hadn't happened before. -- Peter Geoghegan
Re: On disable_cost
On 2024-Oct-03, Robert Haas wrote: > One general thing to think about is that we really document very > little about EXPLAIN. That might not be good, but we should consider > whether it will look strange if we document a bunch of stuff about > this and still don't talk about anything else. I completely agree that we document very little about EXPLAIN. However, I disagree that we should continue to do so. I'd rather take the opportunity to _add_ more details that we currently omit, and make the documentation more complete. A short blurb about Disabled Nodes such as the one Laurenz proposed seems an excellent way to start; we can add more later, as people propose them. We don't have to stop here, and we don't have to stay at statu quo re. other points. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
Re: On disable_cost
On Mon, Oct 7, 2024 at 11:28 AM Alvaro Herrera wrote: > On 2024-Oct-03, Robert Haas wrote: > > One general thing to think about is that we really document very > > little about EXPLAIN. That might not be good, but we should consider > > whether it will look strange if we document a bunch of stuff about > > this and still don't talk about anything else. > > I completely agree that we document very little about EXPLAIN. However, > I disagree that we should continue to do so. I'd rather take the > opportunity to _add_ more details that we currently omit, and make the > documentation more complete. A short blurb about Disabled Nodes such as > the one Laurenz proposed seems an excellent way to start; we can add > more later, as people propose them. We don't have to stop here, and we > don't have to stay at statu quo re. other points. Sure, that all makes sense. I was just raising it as a point to consider. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Re: bt Scankey in another contradictory case
On Sun, Sep 1, 2024 at 11:44 AM bigbro...@hotmail.com wrote: > I have reanalysed the code of function _bt_first. I notice that using a > multi-attribute index > if we can't identify the starting boundaries and the following attributes > markded not required , > that means we need start at first or last page in the index to examine every > tuple to satisfy the > qual or not, in the meantime the scan will be stopped while the first > attribute evaluated failed. I'm not sure of what it is that you're trying to draw attention to. The behavior of _bt_first doesn't seem relevant to this patch to me. For one thing, _bt_first doesn't actually care about whether or not a scan key is marked required -- _bt_first independently applies its own similar rules to determine which scan keys can be used in the insertion scan key used to find an initial position on the leaf level. More importantly, whether a scan key is marked required doesn't seem relevant to this patch (that is relevant to _bt_preprocess_keys, but doesn't seem relevant to the changes that you propose to make to _bt_preprocess_keys). > For instance: > create table c_s( x int, y int); > insert into c_s select generate_series(1, 2), generate_series(1, > 2); > create index my_c_s_idx on c_s using btree(x,y); > explain (analyze, buffers) select * from c_s where x >4000 and y >10 and > y <10 order by x desc; Your patch (which I haven't tested for myself) is based on the observation that "y > 10 and y < 10" is a contradictory condition. This is true regardless of any other factor, such as whether we're able to mark the "y" scan key required. All that _bt_first has to do is return when it notices "so->qual_ok" was set to false within _bt_preprocess_keys. It doesn't matter if there is an "x > 4000", and it doesn't matter if you use a composite index like this that completely omits a condition on "x". > What's more, if i change the number 4000 to 1000. > - > Sort (cost=441.01..441.01 rows=1 width=8) (actual time=2.974..2.975 rows=0 > loops=1) >Sort Key: x DESC >Sort Method: quicksort Memory: 25kB >Buffers: shared hit=89 >-> Seq Scan on c_s (cost=0.00..441.00 rows=1 width=8) (actual > time=2.971..2.971 rows=0 loops=1) > Filter: ((x > 1000) AND (y > 10) AND (y < 10)) > Rows Removed by Filter: 2 > Buffers: shared hit=89 > Planning: >Buffers: shared hit=2 > Planning Time: 0.113 ms > Execution Time: 2.990 ms > (12 rows) > > The planner choosed the Seq Scan, and the executor have done the unnecessary > jobs 2 times. It's true that a sequential scan won't ever apply the logic from _bt_preprocess_keys, nor any similar logic. A sequential scan with contradictory quals will therefore not be detected in cases where it would have been detected had we used an index scan with the same quals. This inconsistency doesn't make much sense; it's just an implementation deficiency. It's historical. We've talked about moving the current _bt_preprocess_keys logic to plan time before. See Tom's remarks at the end of this email: https://www.postgresql.org/message-id/2587523.1647982...@sss.pgh.pa.us I think that it would be possible to move _bt_preprocess_keys to plan time, and to generalize it to work with sequential scans, but that is a much larger project. It seems out of scope. I think that you should just focus on the immediate problem of not detecting contradictory quals that happen to involve (> or >=) and (< or <=) operators. It's a completely independent problem, and one that is well scoped. -- Peter Geoghegan
Re: Proposal to Enable/Disable Index using ALTER INDEX
On Mon, Sep 23, 2024 at 11:14 AM Peter Eisentraut wrote: > I think a better approach would be to make the list of disabled indexes > a GUC setting, which would then internally have an effect similar to > enable_indexscan, meaning it would make the listed indexes unattractive > to the planner. > > This seems better than the proposed DDL command, because you'd be able > to use this per-session, instead of forcing a global state, and even > unprivileged users could use it. > > (I think we have had proposals like this before, but I can't find the > discussion I'm thinking of right now.) I feel like a given user could want either one of these things. If you've discovered that a certain index is causing your production application to pick the wrong index, disabling it and thereby affecting all backends is what you want. If you're trying to experiment with different query plans without changing anything for other backends, being able to set some session-local state is better. I don't understand the argument that one of these is categorically better than the other. -- Robert Haas EDB: http://www.enterprisedb.com
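For illustration, the two approaches being compared would look roughly like this; both are proposed syntax only, and the GUC name below is made up for the sake of the example:

    -- proposed DDL: mark an index as disabled for every session
    ALTER INDEX my_index DISABLE;

    -- GUC-style alternative: only affects the current session
    SET disabled_indexes = 'my_index';
    EXPLAIN SELECT * FROM my_table WHERE id = 42;
    RESET disabled_indexes;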
Re: long-standing data loss bug in initial sync of logical replication
On Fri, 4 Oct 2024 at 12:52, Shlok Kyal wrote: > > Hi Kuroda-san, > > Thanks for reviewing the patch. > > > > 1. > > I feel the name of SnapBuildDistributeNewCatalogSnapshot() should be > > updated because it > > distributes two objects: catalog snapshot and invalidation messages. Do you > > have good one > > in your mind? I considered > > "SnapBuildDistributeNewCatalogSnapshotAndInValidations" or > > "SnapBuildDistributeItems" but seems not good :-(. > > I have renamed the function to 'SnapBuildDistributeSnapshotAndInval'. > Thoughts? > > > 2. > > Hmm, still, it is overengineering for me to add a new type of invalidation > > message > > only for the publication. According to the ExecRenameStmt() we can > > implement an > > arbitrary rename function like RenameConstraint() and RenameDatabase(). > > Regaring the ALTER PUBLICATION OWNER TO, I feel adding > > CacheInvalidateRelcacheAll() > > and InvalidatePublicationRels() is enough. > > I agree with you. > > > > > I attached a PoC which implements above. It could pass tests on my env. > > Could you > > please see it tell me how you think? > > I have tested the POC and it is working as expected. The changes look > fine to me. I have created a patch for the same. > Currently, we are passing 'PUBLICATION_PART_ALL' as an argument to > function 'GetPublicationRelations' and > 'GetAllSchemaPublicationRelations'. Need to check if we can use > 'PUBLICATION_PART_ROOT' or 'PUBLICATION_PART_LEAF' depending on the > 'publish_via_partition_root' option. Will test and address this in the > next version of the patch. For now, I have added a TODO. I have tested this part. I observed that ,whenever we insert data in a partition table, the function 'get_rel_sync_entry' is called and a hash entry is created for the corresponding leaf node relid. So I feel while invalidating here we can specify 'PUBLICATION_PART_LEAF' . I have made the corresponding changes 0002 patch. I have also modified the tests in 0001 patch. These changes are only related to syntax of writing tests. Thanks and Regards, Shlok Kyal v13-0002-Selective-Invalidation-of-Cache.patch Description: Binary data v13-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch Description: Binary data
Re: Enhance create subscription reference manual
On Thu, 03 Oct 2024 12:23:34 +0900 (JST) Tatsuo Ishii wrote: > > parameter in this case (it is an "optional" parameter, though). However, > > when we refer to the stored catalog value, we should call it an option or > > a property and calling it parameter is not suitable. > > Not sure. The stored catalog value of a subscription can be changed > ALTER SUBSCRIPTION. In the ALTER SUBSCRIPTION manual, the placeholders > for these properties are "parameter". So I think we should use > "parameter" in this case at least for the stored catalog values of > subscriptions. > > > If so, I feel that "the failover" in the following statement means > > the catalog value (or the failover feature itself), so we should not > > rewrite this to "the failover parameter". > > My conclusion is we should rewrite it as "the failover parameter" for > the reason above. > > >> To initiate replication, you must manually create the replication slot, > >> enable the failover if required, enable the subscription, and refresh the > >> subscription. > > > > Instead, should we use "failover option"? > > Yes. because "enable the failover" actually means an operation using > ALTER SUBSCRIPTION IMO. After reading the above, I think you would prefer "failover parameter" to "failover option". However, after all, I'm fine with either any way. If we use "the failover parameter", I would read "enable the failover parameter" as "enable the failover parameter on executing ALTER SUBSCRIPTION command". Otherwise in the "failover option" case, I would read "enable the failover option" as "enable the subscription's failover option by executing ALTER SUBSCRIPTION command". Regards, Yugo Nagata > > > Or, if it would mean to the failover > > feature rather than the parameter, is it not proper to add tag to > > this > > "failover"? > > I don't think so. > > Best reagards, > -- > Tatsuo Ishii > SRA OSS K.K. > English: http://www.sraoss.co.jp/index_en/ > Japanese:http://www.sraoss.co.jp -- Yugo NAGATA
Re: Set query_id for query contained in utility statement
On Mon, Oct 7, 2024 at 1:39 PM Michael Paquier wrote: > > On Fri, Oct 04, 2024 at 08:16:00PM +0800, jian he wrote: > > about v5 0001 > > select_with_parens: > > '(' select_no_parens ')' { $$ = $2; } > > | '(' select_with_parens ')' { $$ = $2; } > > ; > > > > > > toplevel | calls | query > > --+---+--- > > t| 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t > > t| 0 | SELECT toplevel, calls, query FROM pg_stat_statements+ > > | | ORDER BY query COLLATE "C", toplevel > > t| 1 | explain (select $1) > > f| 1 | select $1); > > > > query "select $1);" looks weird. not sure how to improve it, > > or this should be the expected behavior? > > Good point, this is confusing. The point is that having only > stmt_location is not enough to detect where the element in the query > you want to track is because it only points at its start location in > the full query string. In an ideal world, what we should have is its > start and end, pass it down to pgss_store(), and store only this > subquery between the start and end positions in the stats entry. > Making that right through the parser may be challenging, though. > It turns out UPDATE/DELETE/MERGE and other utility statements cannot have arbitrary parentheses with EXPLAIN. The attached patches can solve this specific problem. (based on v5-0001-Track-location-to-extract-relevant-part-in-nested.patch) The main gotcha is to add location information for the statement that is being explained. typedef struct ExplainStmt { NodeTag type; Node *query; /* the query (see comments above) */ List *options; /* list of DefElem nodes */ ParseLoc location; /* location of the statement being explained */ } ExplainStmt; explain select 1; explain (select 1); explain (((select 1))); The above three explained select queries will be normalized to one select query. v5-0001-exposse_explained_stmt_location.no-cfbot Description: Binary data
Re: [BUG FIX]Connection fails with whitespace after keepalives parameter value
On 2024/10/06 18:35, Michael Paquier wrote: On Thu, Oct 03, 2024 at 08:12:28PM -0400, Tom Lane wrote: OK, if there's no objections let's push both remaining patches to HEAD only. Done as of f22e84df1dea and 430ce189fc45. Commit 430ce189fc45 unexpectedly caused psql to report the error "error: trailing data found" when a connection URI contains a whitespace, e.g., in a parameter value. For example, the following command used to work but no longer does after this commit: $ psql -d "postgresql://localhost:5432/postgres?application_name=a b" I'm not sure if this URI format is valid (according to RFC 3986), though. + for (const char *s = q; *s == ' '; s++) + { + q++; + continue; + } Is the "continue" really necessary? Also could we simplify it like this? for (; *q == ' '; q++); Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
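To spell out the two equivalent forms being discussed (a sketch only, with the surrounding parsing context omitted):

    /* committed form: s and q advance in lockstep over any spaces */
    for (const char *s = q; *s == ' '; s++)
    {
        q++;
        continue;
    }

    /* suggested simplification: advance q past the spaces directly */
    for (; *q == ' '; q++)
        ;

Both loops leave q pointing at the first non-space character.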
[WIP]Vertical Clustered Index (columnar store extension) - take2
Hi All,

Suggestions
===========
When analyzing real-time data collected by PostgreSQL, it can be difficult to tune the current PostgreSQL server for satisfactory performance. Therefore, we propose Vertical Clustered Index (VCI), an in-memory column store feature that holds data in a state suitable for business analysis and is also expected to improve analysis performance. With VCI, you can expect to run analysis 7.8 times faster. This is achieved by the analytics engine, which optimizes parallel processing of column-oriented data, in addition to the fact that VCI stores data in a columnar format, enabling efficient retrieval of only the columns needed for analysis.

Similar Features
================
One column store feature available with Postgres is the Citus columnar table: installing the citus extension allows columnar tables to be used through the columnar access method. That feature is intended for analyzing accumulated data, so you cannot update or delete data. VCI supports data updates and deletions, which enables you to analyze not only accumulated data but also data that arrives in real time.

Implementing VCI
================
To introduce an updatable column store, we explain how updates to row-oriented data are propagated to column-oriented data. VCI has two storage areas:

- Write Optimized Storage (WOS)
- Read Optimized Storage (ROS)

WOS: the WOS stores data for all columns in the VCI in a row-oriented format. All newly added data is stored in the WOS relation along with the transaction information. Using the WOS to delete and update newly added data has no significant performance impact compared to deleting from columnar storage. When inserting/updating/deleting, data is written synchronously to the WOS. The WOS does not compress or index the data, which avoids the overhead of converting to columnar format while updating the data.

ROS: the ROS is the storage area where all column data is stored. After a certain amount of data accumulates in the WOS, the ROS control daemon converts it to column data asynchronously with respect to updates. The column data transformation compresses and indexes the data and writes it to the ROS.

Searching for data: since there are two storage formats, the SELECT process needs to convert the WOS data to a local ROS to determine whether each tuple is visible or invisible. This conversion cost depends on the number of tuples present in the WOS, which may introduce some performance overhead. Search results are obtained by referencing the local ROS and the ROS in parallel.

These implementation ideas are also posted on Fujitsu's blog for your reference. [1]

Past discussions
================
We've proposed this feature before. [2] That thread also explains the details, so please check it. In the previous thread, we suggested implementing the feature as a modification to the PostgreSQL backend code. Based on the feedback we received at that time, we think we need to re-implement this feature as pluggable storage using the table access method API. I also received feedback on the required features from a PostgreSQLPro member. We believe it is necessary to deal with these issues in stages:

- Need to provide vector processing for nodes (filter, grand aggregate, aggregation with group by...) to speed up computation
- Requires parallel processing support such as scanning

It is assumed that the re-implementation will also require additional functionality in the current table access method API, which would be useful not only for VCI but also for other access methods. Therefore, we decided to propose the VCI feature to the community and proceed with development.
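As a purely hypothetical sketch of how such an extension might look to users once re-implemented through the table access method API (neither the extension name nor the syntax below exists today; it only illustrates the pluggable-storage route described above):

    CREATE EXTENSION vci;   -- hypothetical extension providing a table access method
    CREATE TABLE sales (id bigint, region text, amount numeric) USING vci;
    INSERT INTO sales VALUES (1, 'apac', 100.0);             -- lands in the row-oriented WOS first
    SELECT region, sum(amount) FROM sales GROUP BY region;   -- reads the ROS plus converted WOS data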
Request
=======
Are members of pgsql-hackers interested in the VCI feature? We welcome your comments and suggestions on this feature. In particular, if you have any questions, required features, or implementation concerns, please let me know.

[1] https://www.postgresql.fastware.com/blog/improve-data-analysis-performance-without-impacting-business-transactions-with-vertical-clustered-index
[2] https://www.postgresql.org/message-id/cajrrpgfac7wc9nk6ptty6yn-nn+hcy8xolah2doyhvg5d6h...@mail.gmail.com

Regards, Aya Iwata FUJITSU LIMITED
Re: On disable_cost
On Sat, Oct 5, 2024 at 3:35 PM Tom Lane wrote: > BTW, getting off the question of EXPLAIN output for a moment, > I don't understand why disable_cost is still a thing. The > one remaining usage seems trivial to replace, as attached. I believe I commented on that somewhere upthread, but maybe I meant to and didn't or maybe you didn't see it in the flurry of emails. Basically, I wasn't confident that it made sense to treat this as the same kind of thing as other cases where we increment disabled_nodes. Because I couldn't make up my mind what to do and didn't get clear feedback from anybody else, I did nothing. The thing is, if somebody says enable_mergejoin=false, presumably they REALLY, REALLY don't want a merge join. If we start using that same mechanism for other purposes -- like making sure that a hash join doesn't overrun work_mem -- then the user might get a merge join anyway because we've represented a hash join that is big, but not disabled, in the same way that we represent as merge join that is actually disabled. I'm pretty uncomfortable with that. Sure, the user probably doesn't want us to overrun work_mem either, but when push comes to shove, shouldn't a very explicit user instruction like "don't use a merge join, l don't want that!" take precedence over any sort of planner estimate? Estimates can be wrong, and the user is in charge. -- Robert Haas EDB: http://www.enterprisedb.com
Re: On disable_cost
On Fri, Oct 4, 2024 at 10:37 PM David Rowley wrote: > I'd encourage anyone else on the sidelines who has an opinion on how > to display the disabled-ness of a plan node in EXPLAIN to speak up > now, even if it's just a +1 to something someone has already written. > It would be nice to see what more people think. > > As a DBA when I set one or more of the enable_* settings to false, and explain a query, I need to know: 1, that the plan shown to me is constrained, 2, which constraints are in place, and 3, which constraints were violated. The Settings option to Explain fulfills my second need. It is not a precise solution nor is it automatic. Because of these two things it really doesn't qualify as fulfilling the first need. To fulfill the first need I would want to see a data block containing the following information: How many (>= 1) enable_* settings are set to false. This is the bare requirement, but we can also include a count of how many violations exist, thus aggregating the count of the third need. This information is not specific to any node and thus should be displayed outside of the execution tree, the specific choice consistent with the output format under consideration. The detail for the third need, violations, is tied to specific executor nodes. The information provided here should inform me as to which specific setting was violated as well as, if possible, why. This is effectively three pieces of information: "Disabled: * (footnote)" The word disabled is the indicator that this node type was requested to not be included in the query plan. The * tells me exactly which of the disabled settings is at play here, reducing the cognitive burden of memorizing node types to settings. The footnote would be a reference into the documentation under the enable_* setting that explains why this node is appearing in the query plan even though I explicitly asked for it to be excluded. In a verbose output (add a new violations option for this) it would even be possible to save the trip to the documentation by adding the footnote text to the explain output. Now, existing proposals include another piece of data - for every node calculate how many violations occur in its tree (inclusive). I'm not understanding the motivation for this data. Knowing which nodes are violations seems like it is enough. I could always count, and processing tools could add this aggregate to their displays, but the count itself only seems useful at the scope of the entire query plan. And possibly sub-plans. So, by way of example: set enable_sort=0; explain (costs off, settings, violations) select * from lp order by a; Append -> Index Only Scan using lp1_a_idx on lp1 lp_1 -> Sort Disabled: Sort (A) Sort Key: lp_2.a -> Seq Scan on lp2 lp_2 Disabled Planner Settings: 1 Disabled Node Violations: 1 Settings: ... enable_sort = false Violation Reasons: Sort (A): The query contains an order by clause over data coming from a table being sequentially scanned. This scan's output must be sorted to fulfill the order by requirement. I was considering doing a layout like: Sort (disabled_sort.A) (cost...) (actual...) but having its own line on those nodes having the violation seems reasonable. It should be noticeable when the violations occur and this does stand out. The less pronounced format would be more appropriate for the "Disabled: #" display that would appear on every single node; which I believe is counter-productive. 
Only marking the violation provides the same amount of detail and allows for the computation of those counts should the need arise. As a DBA, though, I do not know how to use that count in a meaningful way. In text format we place additional information at the bottom of the query result. It is worth considering whether to place information before the planner tree. If that is acceptable the two "Disabled Planner %:" counts should be moved to before the node tree. This draws immediate attention to the explain output consumer that this plan is constrained and that other options, like settings and violations, should be added to the explain command to show additional details. But the two counts and the node detail "Disabled: * (footnote)" will always be visible. The footnote definitely is its own feature added to increase usability. I'm expecting it to not be accepted given the current design of explain, and also it seems quite difficult to get good data out of the planner to make the display accurate. But if we tell someone that a setting they disable is violated they are going to wonder why. David J.
Re: POC, WIP: OR-clause support for indexes
On Fri, Oct 4, 2024 at 2:40 PM Robert Haas wrote: > > On Fri, Oct 4, 2024 at 2:20 PM Peter Geoghegan wrote: > > On Fri, Oct 4, 2024 at 2:00 PM Alexander Korotkov > > wrote: > > > Yes, transformAExprIn() does the work to coerce all the expressions in > > > the right part to the same type. Similar logic could be implemented > > > in match_orclause_to_indexcol(). What worries me is whether it's > > > quite late stage for this kind of work. transformAExprIn() works > > > during parse stage, when we need to to resolve types, operators etc. > > > And we do that once. > > > > I agree that it would be a bit awkward. Especially having spent so > > much time talking about doing this later on, not during parsing. That > > doesn't mean that it's necessarily the wrong thing to do, though. > > True, but we also can't realistically use select_common_type() here. I > mean, it thinks that we have a ParseState and that there might be > values with type UNKNOWNOID floating around. By the time we reach the > planner, neither thing is true. And honestly, it looks to me like > that's pointing to a deeper problem with your idea. OK. To be clear, I don't think that it's essential that we have equivalent behavior in those cases where the patch applies its transformations. I have no objections to committing the patch without any handling for that. It's an important patch, and I really want it to get into 18 in a form that everybody can live with. -- Peter Geoghegan
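For concreteness, the kind of case being discussed looks like this (illustrative only): transformAExprIn() coerces the IN-list constants to one common type at parse time, whereas the OR form reaches the planner as separate OpExprs whose constants keep their own (related) types:

    SELECT * FROM t WHERE a IN (1, 2::smallint, 3::bigint);
    SELECT * FROM t WHERE a = 1 OR a = 2::smallint OR a = 3::bigint;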
Re: On disable_cost
On Sat, Oct 5, 2024 at 1:37 AM David Rowley wrote: > Thanks for explaining your point of view. I've not shifted my opinion > any, so I guess we just disagree. I feel a strong enough dislike for > the current EXPLAIN output to feel it's worth working harder to have a > better output. > > I won't push my point any further unless someone else appears > supporting Laurenz and I. Thank you for working on getting rid of the > disabled_cost. I think what we have is now much better than before. > The EXPLAIN output is the only part I dislike about this work. > > I'd encourage anyone else on the sidelines who has an opinion on how > to display the disabled-ness of a plan node in EXPLAIN to speak up > now, even if it's just a +1 to something someone has already written. > It would be nice to see what more people think. I think you have adequate consensus to proceed with this. I'd just ask that you don't disappear completely if it turns out that there are problems. I accept that my commit created this problem and I'm certainly willing to be involved too if we need to sort out more things. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Changing the state of data checksums in a running cluster
Tomas Vondra writes: > 3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, > shadowing earlier variable All the ListCell variables can be eliminated by using the foreach_ptr and foreach_oid macros instead of plain foreach. - ilmari
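For reference, the rewrite being suggested looks roughly like this (variable and list names are hypothetical):

    /* before: explicit ListCell iteration */
    ListCell   *lc;

    foreach(lc, relid_list)
    {
        Oid         relid = lfirst_oid(lc);

        process_relation(relid);
    }

    /* after: foreach_oid declares the loop variable itself, no ListCell needed */
    foreach_oid(relid, relid_list)
    {
        process_relation(relid);
    }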
Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.
On Fri, Sep 27, 2024 at 02:10:47PM -0500, Nathan Bossart wrote: > Alright. I was able to back-patch it to v12 without too much trouble, > fortunately. I'll commit that soon unless anyone else has feedback. Committed, thanks! I refrained from introducing INT64_HEX_FORMAT/UINT64_HEX_FORMAT in c.h because I felt there was a nonzero chance of that causing problems with third-party code on the back-branches. We could probably add them for v18, though. -- nathan
Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.
On Mon, Oct 07, 2024 at 02:00:05PM -0500, Nathan Bossart wrote: > I refrained from introducing INT64_HEX_FORMAT/UINT64_HEX_FORMAT in c.h > because I felt there was a nonzero chance of that causing problems with > third-party code on the back-branches. We could probably add them for v18, > though. Like so... -- nathan >From 03f7536acd06b91e6891ccfc86470a5a51b45736 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 7 Oct 2024 14:16:07 -0500 Subject: [PATCH 1/1] add INT64_HEX_FORMAT and UINT64_HEX_FORMAT --- contrib/postgres_fdw/option.c| 2 +- src/backend/utils/error/csvlog.c | 2 +- src/backend/utils/error/elog.c | 4 ++-- src/backend/utils/error/jsonlog.c| 2 +- src/include/c.h | 2 ++ src/test/modules/test_radixtree/test_radixtree.c | 2 -- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c index d740893918..9a6785720d 100644 --- a/contrib/postgres_fdw/option.c +++ b/contrib/postgres_fdw/option.c @@ -522,7 +522,7 @@ process_pgfdw_appname(const char *appname) appendStringInfoString(&buf, application_name); break; case 'c': - appendStringInfo(&buf, "%" INT64_MODIFIER "x.%x", MyStartTime, MyProcPid); + appendStringInfo(&buf, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); break; case 'C': appendStringInfoString(&buf, cluster_name); diff --git a/src/backend/utils/error/csvlog.c b/src/backend/utils/error/csvlog.c index eab8df3fcc..acdffb6d06 100644 --- a/src/backend/utils/error/csvlog.c +++ b/src/backend/utils/error/csvlog.c @@ -120,7 +120,7 @@ write_csvlog(ErrorData *edata) appendStringInfoChar(&buf, ','); /* session id */ - appendStringInfo(&buf, "%" INT64_MODIFIER "x.%x", MyStartTime, MyProcPid); + appendStringInfo(&buf, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); appendStringInfoChar(&buf, ','); /* Line number */ diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 987ff98067..c71508c923 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -2944,12 +2944,12 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata) { charstrfbuf[128]; - snprintf(strfbuf, sizeof(strfbuf) - 1, "%" INT64_MODIFIER "x.%x", + snprintf(strfbuf, sizeof(strfbuf) - 1, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); appendStringInfo(buf, "%*s", padding, strfbuf); } else - appendStringInfo(buf, "%" INT64_MODIFIER "x.%x", MyStartTime, MyProcPid); + appendStringInfo(buf, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); break; case 'p': if (padding != 0) diff --git a/src/backend/utils/error/jsonlog.c b/src/backend/utils/error/jsonlog.c index 2c7b14cacb..492383a89e 100644 --- a/src/backend/utils/error/jsonlog.c +++ b/src/backend/utils/error/jsonlog.c @@ -168,7 +168,7 @@ write_jsonlog(ErrorData *edata) } /* Session id */ - appendJSONKeyValueFmt(&buf, "session_id", true, "%" INT64_MODIFIER "x.%x", + appendJSONKeyValueFmt(&buf, "session_id", true, INT64_HEX_FORMAT ".%x", MyStartTime, MyProcPid); /* Line number */ diff --git a/src/include/c.h b/src/include/c.h index 3297d89ff0..9520b89b00 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -550,6 +550,8 @@ typedef unsigned long long int uint64; /* snprintf format strings to use for 64-bit integers */ #define INT64_FORMAT "%" INT64_MODIFIER "d" #define UINT64_FORMAT "%" INT64_MODIFIER "u" +#define INT64_HEX_FORMAT "%" INT64_MODIFIER "x" +#define UINT64_HEX_FORMAT "%" INT64_MODIFIER "x" /* * 128-bit signed and unsigned integers diff --git a/src/test/modules/test_radixtree/test_radixtree.c 
b/src/test/modules/test_radixtree/test_radixtree.c index 1d9165a3a2..3e6863f660 100644 --- a/src/test/modules/test_radixtree/test_radixtree.c +++ b/src/test/modules/test_radixtree/test_radixtree.c @@ -23,8 +23,6 @@ /* uncomment to use shared memory for the tree */ /* #define TEST_SHARED_RT */ -#define UINT64_HEX_FORMAT "%" INT64_MODIFIER "X" - /* Convenient macros to test results */ #define EXPECT_TRUE(expr) \ do { \ -- 2.39.5 (Apple Git-154)
Re: pg_upgrade check for invalid databases
On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote: > > On 1 Oct 2024, at 00:20, Tom Lane wrote: > > > > Daniel Gustafsson writes: > >>> On 30 Sep 2024, at 16:55, Tom Lane wrote: > >>> TBH I'm not finding anything very much wrong with the current > >>> behavior... this has to be a rare situation, do we need to add > >>> debatable behavior to make it easier? > > > >> One argument would be to make the checks consistent, pg_upgrade generally > >> tries > >> to report all the offending entries to help the user when fixing the source > >> database. Not sure if it's a strong enough argument for carrying code > >> which > >> really shouldn't see much use though. > > > > OK, but the consistency argument would be to just report and fail. > > I don't think there's a precedent in other pg_upgrade checks for > > trying to fix problems automatically. > > Correct, sorry for being unclear. The consistency argument would be to expand > pg_upgrade to report all invalid databases rather than just the first found; > attempting to fix problems would be a new behavior. Yes, historically pg_upgrade will fail if it finds anything unusual, mostly because what it does normally is already scary enough. If users want pg_upgrade to do cleanups, it would be enabled by a separate flag, or even a new command-line app. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: First draft of PG 17 release notes
On Mon, Sep 30, 2024 at 02:20:21PM +0900, Yugo NAGATA wrote: > On Sat, 28 Sep 2024 21:19:11 -0400 > Bruce Momjian wrote: > > > On Thu, Sep 26, 2024 at 02:19:21PM +0900, Yugo Nagata wrote: > > > On Thu, 9 May 2024 00:03:50 -0400 > > > Bruce Momjian wrote: > > > > > > > I have committed the first draft of the PG 17 release notes; you can > > > > see the results here: > > > > > > I propose to improve the following description in "Migration to Version > > > 17" > > > section by adding CREATE INDEX and CREATE MATERIALIZED VIEW into the > > > command list. > > > > > > > > > Change functions to use a safe > > > during maintenance operations (Jeff Davis) > > > § > > > > > > > > > It is suggested in the thread [1] that users could not notice the > > > behaviour > > > of CREATE INDEX is changed because the explicit command name is not > > > listed in > > > the release notes. So, I think it is better to add CREATE INDEX and > > > CREATE MATERIALIZED VIEW into the command list. > > > > > > I've attached a patch. > > > > It this a valid change? Seems so. > > Yes. This change on CREATE INDEX was introduced by 2af07e2f7 together with > other commands, but it was missed to be mentioned in the commit message > although the description was added to the documentation. > > The change on CEATE MATERIALIZED VIEW was introduced by a separate commit > b4da732fd, since which the REFRESH logic is used when creating a matview. > Should we add here a link to that commit, too? I developed the attached patch which adds the two commands and the commit item. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml index ad814737745..7acdee05e42 100644 --- a/doc/src/sgml/release-17.sgml +++ b/doc/src/sgml/release-17.sgml @@ -125,6 +125,8 @@ @@ -136,7 +138,9 @@ Author: Jeff Davis This prevents maintenance operations (ANALYZE, - CLUSTER, REFRESH + CLUSTER, CREATE + INDEX, CREATE + MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM) from performing unsafe access. Functions used by expression indexes and materialized views that
Re: [BUG FIX]Connection fails with whitespace after keepalives parameter value
On Tue, Oct 08, 2024 at 01:19:59AM +0900, Fujii Masao wrote: > Commit 430ce189fc45 unexpectedly caused psql to report the error > "error: trailing data found" when a connection URI contains > a whitespace, e.g., in a parameter value. For example, > the following command used to work but no longer does after this commit: > > $ psql -d "postgresql://localhost:5432/postgres?application_name=a b" > > I'm not sure if this URI format is valid (according to RFC 3986), > though. I may be missing something, of course, but I am under the impression that ' ' is invalid, meaning that you should use "a%20b" here to get what you want as %20 would be translated to a space character. > Is the "continue" really necessary? Also could we simplify it like this? > > for (; *q == ' '; q++); Sure, it could be changed this way. -- Michael
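Spelled out, the percent-encoded form of the URI discussed above is the one expected to keep working:

    $ psql -d "postgresql://localhost:5432/postgres?application_name=a%20b"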
Re: Avoiding superfluous buffer locking during nbtree backwards scans
On Mon, Sep 02, 2024 at 01:31:55PM +0200, Matthias van de Meent wrote: > I noticed I attached an older version of the patch which still had 1 > assertion failure case remaining (thanks cfbot), so here's v3 which > solves that problem. Peter G., this is in your area of expertise. Could you look at what Matthias has proposed? -- Michael
Re: Support specify tablespace for each merged/split partition
On Thu, Aug 08, 2024 at 09:47:20AM +0800, Junwang Zhao wrote: > Thanks for your review. The SPLIT/MERGE grammar has been reverted in 3890d90c1508, so this patch concept does not apply anymore. -- Michael
Re: Commit fest 2024-09
On Mon, Oct 07, 2024 at 04:47:12PM +0900, Michael Paquier wrote: > The commit fest 2024-09 should have ended one week ago, so I have been > taking one step ahead and I have begun cleaning up things. As of now, > there is still a total of 113 patches marked as "Needs Review", and I > hope that my non-auto-vacuuming will be over tomorrow. And the CF is closed. Final score: Committed: 90. Moved to next CF: 202. Withdrawn: 14. Rejected: 6. Returned with Feedback: 54. Total: 366. > If you feel that one or more of your patches have been handled in an > incorrect way, feel free to reply back to this thread, to update your > patch entry to a better state, or to reply back to me using this > thread. The number of entries that had an incorrect state or were left unanswered was much higher than the last time I did such a cleanup. Moving CF entries marked as "Needs Review" in bulk is perhaps not the best thing, because this contributes to the bloat in the CF entries. Please double-check your CF entries. -- Michael
Re: Function for listing pg_wal/summaries directory
On 2024/10/07 23:35, Nathan Bossart wrote: On Mon, Oct 07, 2024 at 10:07:10AM +0900, Michael Paquier wrote: On Fri, Oct 04, 2024 at 10:02:11AM -0500, Nathan Bossart wrote: Could you explain why you feel the existing support functions are insufficient? Because it is not possible to outsource the scan of pg_wal/summaries/ to a different role, no? I was under the impression that you could do this with pg_available_wal_summaries() [0]. [0] https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-WAL-SUMMARY One benefit of supporting something like pg_ls_summariesdir() is that it allows us to view the last modification time of each WAL summary file and estimate when they'll be removed based on wal_summary_keep_time. Of course, we could also extend the existing function to report the last modification time if this use case is valid, though. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
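For comparison, the existing function already exposes the summaries themselves, just not file-level metadata such as size or modification time; for example:

    -- one row per WAL summary file: timeline, start LSN, end LSN
    SELECT tli, start_lsn, end_lsn
      FROM pg_available_wal_summaries()
     ORDER BY end_lsn DESC
     LIMIT 5;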
Re: Expand applicability of aggregate's sortop optimization
On Tue, Oct 01, 2024 at 04:25:59PM +0700, Andrei Lepikhov wrote: > I have written a sketch patch to implement the idea with aggregate > prosupport in code - see the attachment. > My intention was to provide an example, not a ready-to-commit patch. > As I see, the only problem there lies in the tests - CREATE AGGREGATE > doesn't allow us to set a prosupport routine, and we need to update the > pg_proc table manually. Please note that the CF bot is red. -- Michael
pgindent fails with perl 5.40
I get this error when running pgindent with perl 5.40: Attempt to call undefined import method with arguments ("devnull") via package "File::Spec" (Perhaps you forgot to load the package?) at src/tools/pgindent/pgindent line 10. BEGIN failed--compilation aborted at src/tools/pgindent/pgindent line 10. It definitely worked with perl 5.38 before. Not sure if something's wrong on my side. Is anybody else already using 5.40? For the moment, I just changed the File::Spec import to make it work: diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent index 48d83bc434..028d057ea4 100755 --- a/src/tools/pgindent/pgindent +++ b/src/tools/pgindent/pgindent @@ -7,7 +7,7 @@ use warnings FATAL => 'all'; use Cwd qw(abs_path getcwd); use File::Find; -use File::Spec qw(devnull); +use File::Spec; use File::Temp; use IO::Handle; use Getopt::Long; -- Erik
Re: [PATCH] pg_permissions
On Thu, Jun 13, 2024 at 07:34:30AM +0200, Joel Jacobson wrote: > Hmm, strange, the commitfest system didn't pick up the email with patch 0006 > for some reason, > with message id 0c5a6b79-408c-4910-9b2e-4aa9a7b30...@app.fastmail.com > > It's rebased to latest HEAD, so not sure why. > > Maybe it got confused when I quickly afterwards sent a new email without a > patch? > > Here is a new attempt, file content unchanged, just named to 0007 and added > "pg_get_acl" to the name. The CF bot has been red for some time now. Please rebase. -- Michael
Re: pgindent fails with perl 5.40
I wrote: > I get this error when running pgindent with perl 5.40: > > Attempt to call undefined import method with arguments ("devnull") via > package "File::Spec" (Perhaps you forgot to load the package?) at > src/tools/pgindent/pgindent line 10. > BEGIN failed--compilation aborted at src/tools/pgindent/pgindent line 10. > > It definitely worked with perl 5.38 before. Not sure if something's > wrong on my side. Ah, it's intentional: https://metacpan.org/release/HAARG/perl-5.40.0/view/pod/perldelta.pod#Calling-the-import-method-of-an-unknown-package-produces-a-warning > Calling the import method of an unknown package produces a warning > [...] > It will also detect cases where a user passes an argument when using a > package that does not provide its own import > [...] Because we use fatal warnings, pgindent fails. -- Erik
Re: Improve EXPLAIN output for multicolumn B-Tree Index
On Tue, Jul 02, 2024 at 03:44:01AM +, masahiro.ik...@nttdata.com wrote: > Although I plan to support "Rows Removed by Non Key Filtered"(or "Skip Scan > Filtered"), > I'd like to know whether the current direction is good. One of my concerns is > there might > be a better way to extract quals for boundary conditions in btexplain(). The CF bot has been red for some time now, please provide a rebase. -- Michael
Re: Assorted style changes with a tiny improvement
On Tue, Jul 02, 2024 at 02:39:20PM -0300, Ranier Vilela wrote: > This is a series of patches to change styles, in assorted sources. > IMO, this improves a tiny bit and is worth trying. > > 1. Avoid dereference iss_SortSupport if it has nulls. > 2. Avoid dereference plan_node_id if no dsm area. > 3. Avoid dereference spill partitions if zero ntuples. > 4. Avoid possible useless palloc call with zero size. > 5. Avoid redundant variable initialization in foreign. > 6. Check the cheap test first in ExecMain. > 7. Check the cheap test first in pathkeys. > 8. Delay possible expensive bms_is_empty call in sub_select. > 9. Reduce some scope in execPartition. > 10. Reduce some scope for TupleDescAttr array dereference. > 11. Remove useless duplicate test in ruleutils. > This is already checked at line 4566. > > 12. Remove useless duplicate test in string_utils. > This is already checked at line 982. You have something here, but not everything is worth changing without a reason to do so, other than code "correctness". For example, bms_is_empty() is a NULL comparison, so it does not matter. I don't see a point in the three dereference patches, either, as these states are expected AFAIK so it does not matter. If we crash, it's actually an indication that something has gone wrong. Same comment about the two remove-useless patches and the two reduce-some-scope. The point about group_keys_reorder_by_pathkeys() is to be proved; I doubt it matters. Same for the ExecBuildSlotValueDescription() business to check for the acl_result before bms_is_member() does not really matter performance-wise. The allocation in execTuplesMatchPrepare() is indeed something that we'd better avoid, that's minimal but memory that can be avoided is always less error-prone. pg_options_to_table() also, is a bit better this way. Applied these two, let's move on. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Add log_transaction setting
On Wed, Aug 14, 2024 at 09:08:00PM +0300, Сергей Соловьев wrote: > From b5e779771e8a7582951aff6f43a716e9e5975884 Mon Sep 17 00:00:00 2001 > From: "Sergey Solovev" > Date: Thu, 4 Jul 2024 17:02:13 +0300 > Subject: [PATCH] Add log_transaction configuration setting CF bot is red, please provide a rebase. And also, perhaps avoid HTML messages.. -- Michael
Re: Partition-wise join with whole row vars
On Fri, Jul 12, 2024 at 02:39:13PM +0300, Alexander Pyhalov wrote: > I was looking at enabling partition-wise join with whole row vars. My main > motivation > was to enable push down DML with partition-wise join in postgres_fdw. The > work is > based on the earlier patches of Ashutosh Bapat [1]. The CF bot has been red for some time now. Please provide a rebase. -- Michael
Re: Parallel CREATE INDEX for GIN indexes
On Fri, Jul 12, 2024 at 05:34:25PM +0200, Tomas Vondra wrote: > I got to do the detailed benchmarking on the latest version of the patch > series, so here's the results. My goal was to better understand the > impact of each patch individually - especially the two parts introduced > by Matthias, but not only - so I ran the test on a build with each of > the 0001-0009 patches. _gin_parallel_build_main() is introduced in 0001. Please make sure to pass down a query ID. -- Michael
Re: WIP: parallel GiST index builds
On Tue, Jul 30, 2024 at 11:05:56AM +0200, Tomas Vondra wrote: > I tried implementing this, see the attached 0002 patch that replaces the > fake LSN with an atomic counter in shared memory. It seems to work (more > testing needed), but I can't say I'm very happy with the code :-( While quickly scanning the code, please be careful about the query ID that should be passed down to _gist_parallel_build_main(). -- Michael
RE: long-standing data loss bug in initial sync of logical replication
Dear Shlok, > I have tested this part. I observed that ,whenever we insert data in a > partition table, the function 'get_rel_sync_entry' is called and a > hash entry is created for the corresponding leaf node relid. So I feel > while invalidating here we can specify 'PUBLICATION_PART_LEAF' . I > have made the corresponding changes 0002 patch. I also verified this and it seems true. The root table is a virtual table and the actual changes are recorded in the leaf ones. It is the same for the WAL layer. Logical decoding obtains info from WAL records, so leaf tables are passed to the pgoutput layer as "relation". I.e., I think it is enough to invalidate the relcache of the leaf tables. > I have also modified the tests in 0001 patch. These changes are only > related to syntax of writing tests. LGTM. I found small improvements, please find the attached. Best regards, Hayato Kuroda FUJITSU LIMITED minor_fix.diffs Description: minor_fix.diffs
Re: Function for listing pg_wal/summaries directory
On Tue, Oct 08, 2024 at 12:41:16PM +0900, Fujii Masao wrote: > One benefit of supporting something like pg_ls_summariesdir() is that > it allows us to view the last modification time of each WAL summary file > and estimate when they'll be removed based on wal_summary_keep_time. > > Of course, we could also extend the existing function to report > the last modification time if this use case is valid, though. My argument is about knowing the size of each file, for monitoring of disk space. The retention can be controlled by a GUC based on time, and this function requires knowing about the file name format. -- Michael
Re: Consider the number of columns in the sort cost model
On 8/23/24 01:46, Andrei Lepikhov wrote: Just a rebase onto current master to make cf-bot be happy. -- regards, Andrei Lepikhov From 73472897442516f0df4ed945cda28d125df08197 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Tue, 8 Oct 2024 11:20:46 +0700 Subject: [PATCH v2] Introduce the number of columns in the cost-sort model. The sort node calls the comparison operator for each pair of attributes for each couple of tuples. However, the current cost model uses a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging. This technique can lead to incorrect estimations when sorting on three, four, or more columns, quite common in practice. Moreover, further elaboration of the optimiser forms more strict requirements for the balance of sortings, as caused by IncrementalSort, MergeAppend, and MergeJoin. In this patch, the multiplier is a linear function of a number of columns. Member 1.0 needs to smooth the fact that dependence on the number of columns is weaker than linear. It is an extreme formula. The number of comparisons depends on the distinct values in each column. As a TODO, we can natively elaborate this model by the next step, involving distinct statistics to make estimations more precise. --- .../postgres_fdw/expected/postgres_fdw.out| 15 +++-- contrib/postgres_fdw/postgres_fdw.c | 2 +- src/backend/optimizer/path/costsize.c | 34 ++ src/backend/optimizer/plan/createplan.c | 2 +- src/backend/optimizer/plan/planner.c | 2 +- src/backend/optimizer/prep/prepunion.c| 2 +- src/backend/optimizer/util/pathnode.c | 8 +-- src/include/optimizer/cost.h | 2 +- src/test/regress/expected/aggregates.out | 13 ++-- src/test/regress/expected/join.out| 20 +++--- src/test/regress/expected/partition_join.out | 8 ++- src/test/regress/expected/union.out | 66 +-- 12 files changed, 95 insertions(+), 79 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index f2bcd6aa98..5da33ab69e 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J -- left outer join + nullable clause EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; - QUERY PLAN - - Foreign Scan +QUERY PLAN +--- + Sort Output: t1.a, fprt2.b, fprt2.c - Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) - Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10 WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST -(4 rows) + Sort Key: t1.a, fprt2.b, fprt2.c + -> Foreign Scan + Output: t1.a, fprt2.b, fprt2.c + Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) + Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10 WHERE ((r5.a < 10)) +(7 rows) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; a | b | c diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index adc62576d1..f9a2e88a17 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -3667,7 +3667,7 @@ 
adjust_foreign_grouping_path_cost(PlannerInfo *root, cost_sort(&sort_path, root, - pathkeys, + list_length(pathkeys), 0, *p_startup_cost + *p_run_cost, retrieved_rows, diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..a4d5e94b65 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsiz
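For readers skimming the thread, here is a minimal standalone sketch of the shape of the change the commit message above describes. The exact coefficients and the place where the multiplier is applied live in the patch itself, so the "new" formula below is illustrative only.

    #include <stdio.h>

    /* default value of the cpu_operator_cost GUC, used only for illustration */
    static const double cpu_operator_cost = 0.0025;

    /* Current model: a flat multiplier, independent of the sort-key width. */
    static double
    sort_cmp_cost_old(void)
    {
        return 2.0 * cpu_operator_cost;
    }

    /*
     * Shape of the proposed model: linear in the number of sort columns,
     * with the additive 1.0 smoothing the fact that the real dependence
     * on column count is weaker than linear (per the commit message).
     */
    static double
    sort_cmp_cost_new(int n_sort_columns)
    {
        return (1.0 + n_sort_columns) * cpu_operator_cost;
    }

    int
    main(void)
    {
        for (int ncols = 1; ncols <= 4; ncols++)
            printf("%d column(s): old %.5f, new %.5f\n",
                   ncols, sort_cmp_cost_old(), sort_cmp_cost_new(ncols));
        return 0;
    }

As the commit message notes, a later refinement could scale this by per-column distinct-value estimates rather than the raw column count.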
Re: Make COPY format extendable: Extract COPY TO format implementations
On Sat, Sep 28, 2024 at 8:56 AM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Fri, 27 Sep 2024 16:33:13 -0700, > Masahiko Sawada wrote: > > >> * 0005 (that add "void *opaque" to Copy{From,To}StateData) > >> has a bit negative impact for FROM and a bit positive > >> impact for TO > >> * But I don't know why. This doesn't change per row > >> related codes. Increasing Copy{From,To}StateData size > >> ("void *opaque" is added) may be related. > > > > I was surprised that the 0005 patch made COPY FROM slower (with fewer > > rows) and COPY TO faster overall in spite of just adding one struct > > field and some functions. > > Me too... > > > I'm interested in why the performance trends of COPY FROM are > > different between fewer than 6M rows and more than 6M rows. > > My hypothesis: > > With this patch set: > 1. One row processing is faster than master. > 2. Non row related processing is slower than master. > > If we have many rows, 1. impact is greater than 2. impact. > > > > Separating the patches into two parts (one is for COPY TO and another > > one is for COPY FROM) could be a good idea. It would help reviews and > > investigate performance regression in COPY FROM cases. And I think we > > can commit them separately. > > > > Also, could you please rebase the patches as they conflict with the > > current HEAD? > > OK. I've prepared 2 patch sets: > > v20: It just rebased on master. It still mixes COPY TO and > COPY FROM implementations. > > v21: It's based on v20 but splits COPY TO implementations > and COPY FROM implementations. > 0001-0005 includes only COPY TO related changes. > 0006-0010 includes only COPY FROM related changes. > > (v21 0001 + 0006) == (v20 v0001), > (v21 0002 + 0007) == (v20 v0002) and so on. > > > I'll run some benchmarks on my environment as well. > Thank you for updating the patches! I've run the same benchmark script on my various machines (Mac, Linux (with Intel CPU and Ryzen CPU) and Raspberry Pi etc). I've not investigated the results in depth yet but let me share the results. Please find the attached file, extensible_copy_benchmark_20241007.pdf. In the benchmark, I've applied the v20 patch set and 'master' in the result refers to a19f83f87966. And I disabled CPU turbo boost where possible. Overall, v20 patch got a similar or better performance in both COPY FROM and COPY TO compared to master except for on MacOS. I'm not sure that changes made to master since the last benchmark run by Tomas and Suto-san might contribute to these results. I'll try to investigate the performance regression that happened on MacOS. I think that other performance differences in my results seem to be within noises and could be acceptable. Of course, it would be great if others also could try to run benchmark tests. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com extensible_copy_benchmark_20241007.pdf Description: Adobe PDF document
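To make the 0005 discussion a bit more concrete: the struct and callback names below are invented for illustration (the real definitions live in the patch set), but they show the pattern being benchmarked. Copy{From,To}StateData grows a single "void *opaque" pointer, a custom format handler allocates its private state once at startup, and the per-row path only dereferences that pointer, which is why a per-row slowdown from this change alone would be surprising.

    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in for CopyToStateData with the added opaque member */
    typedef struct CopyToStateSketch
    {
        /* ... existing members elided ... */
        void       *opaque;         /* format-specific state, owned by the handler */
    } CopyToStateSketch;

    /* hypothetical private state of a custom COPY TO format */
    typedef struct MyFormatState
    {
        long        rows_seen;
    } MyFormatState;

    /* hypothetical start callback: allocate per-COPY state once */
    static void
    my_format_to_start(CopyToStateSketch *cstate)
    {
        MyFormatState *state = malloc(sizeof(MyFormatState));

        state->rows_seen = 0;
        cstate->opaque = state;
    }

    /* hypothetical per-row callback: only touches already-allocated state */
    static void
    my_format_to_one_row(CopyToStateSketch *cstate)
    {
        MyFormatState *state = (MyFormatState *) cstate->opaque;

        state->rows_seen++;
    }

    int
    main(void)
    {
        CopyToStateSketch cstate = {0};

        my_format_to_start(&cstate);
        for (int i = 0; i < 3; i++)
            my_format_to_one_row(&cstate);
        printf("rows seen: %ld\n", ((MyFormatState *) cstate.opaque)->rows_seen);
        free(cstate.opaque);
        return 0;
    }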
Re: On disable_cost
Robert Haas writes: > On Sat, Oct 5, 2024 at 3:35 PM Tom Lane wrote: >> BTW, getting off the question of EXPLAIN output for a moment, >> I don't understand why disable_cost is still a thing. The >> one remaining usage seems trivial to replace, as attached. > I believe I commented on that somewhere upthread, but maybe I meant to > and didn't or maybe you didn't see it in the flurry of emails. > Basically, I wasn't confident that it made sense to treat this as the > same kind of thing as other cases where we increment disabled_nodes. I don't buy your argument that this case is so special that it warrants preserving disable_cost. I certainly didn't think it was special when I added it. There may be another way to do this that doesn't rely on disabling the path in the same way as the user-accessible knobs do, but I don't really believe it's worth the trouble to think of one. And I definitely don't want to keep disable_cost around even for just one usage, because then we've not fixed the user-experience aspect of this (that is, "why does this plan have a ridiculously high cost?"), nor have we fixed all the concerns you had about higher-level planning decisions being skewed by that cost. One other point here is that if disable_cost remains exposed as a global variable (as it is in HEAD), there is no reason to expect that any extensions that are using it will get on board with the new approach. regards, tom lane
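For anyone following along, a toy sketch of the contrast being discussed (the structure and field names are schematic, not the real planner data structures): the old mechanism hides a path by inflating its cost with the disable_cost constant, which is what produces the "ridiculously high cost" in EXPLAIN, while the disabled_nodes approach counts disabled paths separately and leaves the displayed costs meaningful.

    #include <stdio.h>

    /* toy stand-in for a planner Path, just to contrast the two styles */
    typedef struct ToyPath
    {
        double      startup_cost;
        double      total_cost;
        int         disabled_nodes;
    } ToyPath;

    /* the historical constant from costsize.c */
    static const double disable_cost = 1.0e10;

    /* old style: make the unwanted path look absurdly expensive */
    static void
    disable_path_by_cost(ToyPath *p)
    {
        p->startup_cost += disable_cost;
        p->total_cost += disable_cost;
    }

    /* new style: count the disabled node; costs stay sane and comparable */
    static void
    disable_path_by_count(ToyPath *p)
    {
        p->disabled_nodes++;
    }

    int
    main(void)
    {
        ToyPath     a = {10.0, 100.0, 0};
        ToyPath     b = {10.0, 100.0, 0};

        disable_path_by_cost(&a);
        disable_path_by_count(&b);
        printf("cost-based:  total_cost=%.1f\n", a.total_cost);
        printf("count-based: disabled_nodes=%d, total_cost=%.1f\n",
               b.disabled_nodes, b.total_cost);
        return 0;
    }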
Re: pg_parse_json() should not leak token copies on failure
On Wed, Oct 2, 2024 at 10:45 AM Andrew Dunstan wrote: > Generally looks good. Should we have a check in > setJsonLexContextOwnsTokens() that we haven't started parsing yet, for > the incremental case? Good catch. Added in v4. > > At the moment, we have a test matrix consisting of "standard frontend" > > and "shlib frontend" tests for the incremental parser. I'm planning > > for the v4 patch to extend that with a "owned/not owned" dimension; > > any objections? > > > > Sounds reasonable. This is also done, along with runs of pgindent/perltidy. I've added a 0002 as well. While running tests under Valgrind, it complained about the uninitialized dummy_lex on the stack, which is now fixed. (That bug was introduced with my patch in 0785d1b8b and is technically orthogonal to 0001, but I figured I could track it here for now.) This is also how I found out that my existing fuzzers weren't checking for uninitialized memory, like I thought they were. Grumble... Thanks, --Jacob 1: 81b1d7cdb6 ! 1: 6a0d088cbe jsonapi: add lexer option to keep token ownership @@ Commit message of any tokens it wants to persist past the callback lifetime, but the lexer will handle necessary cleanup on failure. +A -o option has been added to test_json_parser_incremental to exercise +the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP +tests make use of it. (The test program now cleans up allocated memory, +so that tests can be usefully run under leak sanitizers.) + ## src/common/jsonapi.c ## +@@ src/common/jsonapi.c: struct JsonParserStack + */ + struct JsonIncrementalState + { ++ boolstarted; + boolis_last_chunk; + boolpartial_completed; + jsonapi_StrValType partial_token; @@ src/common/jsonapi.c: static JsonParseErrorType parse_array_element(JsonLexContext *lex, const JsonSem static JsonParseErrorType parse_array(JsonLexContext *lex, const JsonSemAction *sem); static JsonParseErrorType report_parse_error(JsonParseContext ctx, JsonLexContext *lex); @@ src/common/jsonapi.c: makeJsonLexContextIncremental(JsonLexContext *lex, int enc +void +setJsonLexContextOwnsTokens(JsonLexContext *lex, bool owned_by_context) +{ ++ if (lex->incremental && lex->inc_state->started) ++ { ++ /* ++ * Switching this flag after parsing has already started is a ++ * programming error. ++ */ ++ Assert(false); ++ return; ++ } ++ + if (owned_by_context) + lex->flags |= JSONLEX_CTX_OWNS_TOKENS; + else @@ src/common/jsonapi.c: inc_lex_level(JsonLexContext *lex) + if (lex->incremental) + { + /* -+ * Ensure freeJsonLexContext() remains safe even if no fname is assigned -+ * at this level. ++ * Ensure freeJsonLexContext() remains safe even if no fname is ++ * assigned at this level. 
+ */ + lex->pstack->fnames[lex->lex_level] = NULL; + } @@ src/common/jsonapi.c: inc_lex_level(JsonLexContext *lex) static inline void dec_lex_level(JsonLexContext *lex) { -+ set_fname(lex, NULL); /* free the current level's fname, if needed */ ++ set_fname(lex, NULL); /* free the current level's fname, if needed */ lex->lex_level -= 1; } @@ src/common/jsonapi.c: freeJsonLexContext(JsonLexContext *lex) FREE(lex->pstack); } +@@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex, + lex->input = lex->token_terminator = lex->line_start = json; + lex->input_length = len; + lex->inc_state->is_last_chunk = is_last; ++ lex->inc_state->started = true; + + /* get the initial token */ + result = json_lex(lex); @@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex, if (sfunc != NULL) { @@ src/common/jsonapi.c: pg_parse_json_incremental(JsonLexContext *lex, + + /* + * Either ownership of the token passed to the -+ * callback, or we need to free it now. Either way, -+ * clear our pointer to it so it doesn't get freed -+ * in the future. ++ * callback, or we need to free it now. Either ++ * way, clear our pointer to it so it doesn't get ++ * fr
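A short usage sketch of the new API, pieced together from the hunks above (PostgreSQL headers assumed; callback setup and error handling elided). The key points from the thread: setJsonLexContextOwnsTokens() has to be called before the first chunk is parsed, which is what the new "started" flag asserts, and tokens the lexer owns are released by freeJsonLexContext() even when parsing fails partway.

    #include "postgres.h"
    #include "common/jsonapi.h"
    #include "mb/pg_wchar.h"

    /* sketch: parse a JSON document with lexer-owned tokens */
    static JsonParseErrorType
    parse_json_with_owned_tokens(const char *json, size_t len,
                                 const JsonSemAction *sem)
    {
        JsonLexContext lex = {0};
        JsonParseErrorType result;

        makeJsonLexContextIncremental(&lex, PG_UTF8, true);

        /*
         * Must happen before any input is handed to the parser; with the v4
         * patch, doing it later trips the new Assert on inc_state->started.
         */
        setJsonLexContextOwnsTokens(&lex, true);

        /* single chunk here; real callers may feed many chunks */
        result = pg_parse_json_incremental(&lex, sem, json, len, true);

        /* tokens still owned by the lexer are freed here, even on failure */
        freeJsonLexContext(&lex);
        return result;
    }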
Re: Doc: typo in config.sgml
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote: > > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote: > > > I think there's an unnecessary underscore in config.sgml. > > > > I was wrong. The particular byte sequence just looked like an underscore > > in my editor, but the byte sequence is actually 0xc2a0, which must be a > > "non breaking space" encoded in UTF-8. I guess someone mistakenly > > inserted a non-breaking space while editing config.sgml. > > I wonder if it would be worth adding a check for this like we have for tabs? > The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp > (doing so made me realize we don't have an equivalent meson target). Can we check for any character outside the supported range of SGML? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
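Not the patch itself (the attached rule lives in the doc/src/sgml Makefile), but just to make the byte pattern concrete for anyone grepping their tree: a no-break space is the two-byte UTF-8 sequence 0xC2 0xA0, so a check only has to trap that pair, e.g.:

    #include <stdio.h>

    /* return 1 if the NUL-terminated string contains U+00A0 encoded as UTF-8 */
    static int
    has_nbsp(const unsigned char *s)
    {
        for (; *s != '\0'; s++)
        {
            if (s[0] == 0xC2 && s[1] == 0xA0)
                return 1;
        }
        return 0;
    }

    int
    main(void)
    {
        const char *plain = "shared_buffers (integer)";
        const char *tainted = "shared_buffers\xC2\xA0(integer)";

        printf("plain: %d, tainted: %d\n",
               has_nbsp((const unsigned char *) plain),
               has_nbsp((const unsigned char *) tainted));
        return 0;
    }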
Re: pg_upgrade check for invalid databases
On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote: > On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote: >> Correct, sorry for being unclear. The consistency argument would be to >> expand >> pg_upgrade to report all invalid databases rather than just the first found; >> attempting to fix problems would be a new behavior. > > Yes, historically pg_upgrade will fail if it finds anything unusual, > mostly because what it does normally is already scary enough. If users > what pg_upgrade to do cleanups, it would be enabled by a separate flag, > or even a new command-line app. While I suspect it's rare that someone CTRL-C's out of an accidental DROP DATABASE and then runs pg_upgrade before trying to recover the data, I agree with the principle of having pg_upgrade fail by default for things like this. If we did add a new flag, the new invalid database report that Daniel mentions could say something like "try again with --skip-invalid-databases to have pg_upgrade automatically drop invalid databases." -- nathan
Re: Refactoring postmaster's code to cleanup after child exit
On 05/10/2024 22:51, Dagfinn Ilmari Mannsåker wrote: Heikki Linnakangas writes: Sadly Windows' IO::Socket::UNIX hasn't been implemented on Windows (or at least on this perl distribution we're using in Cirrus CI): Socket::pack_sockaddr_un not implemented on this architecture at C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872. so I'll have to disable this test on Windows anyway. Socket version 2.028 (included in Perl 5.32) provides pack_sockaddr_un() on Windows, so that can be fixed by bumping the Perl version in https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl to something more modern (such as 5.40.0.1), and only skipping the test if on Windows if Socket is too old. The decision to use 5.26 seems to come from the initial creation of the CI images in 2021 (when 5.34 was current), with the comment «newer versions don't currently work correctly for plperl». That claim is worth revisiting, and fixing if it's still the case. Yeah, it would be nice to update it. I wonder if commit 341f4e002d461a3c5513cb864490cddae2b43a64 fixed whatever the problem was. In the meanwhile, here is a one more version of the test patches, with a SKIP that checks that IO::Socket::UNIX works. -- Heikki Linnakangas Neon (https://neon.tech) From 9b499e8cdf09a127d7506837d5e2a697dd342820 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 8 Oct 2024 00:54:30 +0300 Subject: [PATCH v6 1/3] Add test for connection limits Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec...@iki.fi --- src/test/Makefile | 2 +- src/test/meson.build | 1 + src/test/postmaster/Makefile | 23 ++ src/test/postmaster/README| 27 +++ src/test/postmaster/meson.build | 12 +++ .../postmaster/t/001_connection_limits.pl | 79 +++ 6 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 src/test/postmaster/Makefile create mode 100644 src/test/postmaster/README create mode 100644 src/test/postmaster/meson.build create mode 100644 src/test/postmaster/t/001_connection_limits.pl diff --git a/src/test/Makefile b/src/test/Makefile index dbd3192874..abdd6e5a98 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -12,7 +12,7 @@ subdir = src/test top_builddir = ../.. include $(top_builddir)/src/Makefile.global -SUBDIRS = perl regress isolation modules authentication recovery subscription +SUBDIRS = perl postmaster regress isolation modules authentication recovery subscription ifeq ($(with_icu),yes) SUBDIRS += icu diff --git a/src/test/meson.build b/src/test/meson.build index c3d0dfedf1..67376e4b7f 100644 --- a/src/test/meson.build +++ b/src/test/meson.build @@ -4,6 +4,7 @@ subdir('regress') subdir('isolation') subdir('authentication') +subdir('postmaster') subdir('recovery') subdir('subscription') subdir('modules') diff --git a/src/test/postmaster/Makefile b/src/test/postmaster/Makefile new file mode 100644 index 00..dfcce9c9ee --- /dev/null +++ b/src/test/postmaster/Makefile @@ -0,0 +1,23 @@ +#- +# +# Makefile for src/test/postmaster +# +# Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group +# Portions Copyright (c) 1994, Regents of the University of California +# +# src/test/postmaster/Makefile +# +#- + +subdir = src/test/postmaster +top_builddir = ../../.. 
+include $(top_builddir)/src/Makefile.global + +check: + $(prove_check) + +installcheck: + $(prove_installcheck) + +clean distclean: + rm -rf tmp_check diff --git a/src/test/postmaster/README b/src/test/postmaster/README new file mode 100644 index 00..7e47bf5cff --- /dev/null +++ b/src/test/postmaster/README @@ -0,0 +1,27 @@ +src/test/postmaster/README + +Regression tests for postmaster +=== + +This directory contains a test suite for postmaster's handling of +connections, connection limits, and startup/shutdown sequence. + + +Running the tests += + +NOTE: You must have given the --enable-tap-tests argument to configure. + +Run +make check +or +make installcheck +You can use "make installcheck" if you previously did "make install". +In that case, the code in the installation tree is tested. With +"make check", a temporary installation tree is built from the current +sources and then tested. + +Either way, this test initializes, starts, and stops a test Postgres +cluster. + +See src/test/perl/README for more info about running these tests. diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build new file mode 100644 index 00..c2de2e0eb5 --- /dev/null +++ b/src/test/postmaster/meson.build @@ -0,0 +1,12 @@ +# Copyright (c) 2022-2024, PostgreSQL Global Development Group + +tests += { + 'n
Re: First draft of PG 17 release notes
On Sun, Sep 29, 2024 at 06:33:29PM +0530, Amit Kapila wrote: > > > It is better to write the above statement as: > > > "pg_upgrade now preserves replication slots on > > > publishers and full subscription's state on subscribers". This is > > > because replication slots are preserved on publishers. The subscribers > > > preserve the subscription state. > > > > So, as I understand it, this preservation only happens when the _old_ > > Postgres version is 17+. > > > > Yes. > > > Do we want to try and explain that in the > > Postgres 17 release notes? > > > > It would be good if we can capture that information without bloating > the release document. However, this information is already present in > pg_upgrade docs, so users have a way to know the same even if we can't > mention it in the release notes. I have developed the attached patch to mention it is "logical" slots, and to mention its future use. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml index eb65d1d725d..1e620d810d4 100644 --- a/doc/src/sgml/release-17.sgml +++ b/doc/src/sgml/release-17.sgml @@ -63,7 +63,10 @@ pg_upgrade now - preserves replication slots on both publishers and subscribers + preserves logical replication slots on publishers and full + subscription state on subscribers. This will allow upgrades + to future major versions to continue logical replication without + requiring copy to resynchronize.
Re: Add contrib/pg_logicalsnapinspect
Hi, here are some review comments for patch v11. == contrib/pg_logicalinspect/specs/logical_inspect.spec 1. nit - Add some missing spaces after commas (,) in the SQL. nit - Also update the expected results accordingly == doc/src/sgml/pglogicalinspect.sgml 2. + + + The pg_logicalinspect functions are called + using a text argument that can be extracted from the output name of the + pg_ls_logicalsnapdir() function. + + 2a. wording The wording "using a text argument that can be extracted" seems like a hangover from the previous implementation; it does not even say what that "text argument" means. Why not just say it is a snapshot filename, something like below? SUGGESTION: The pg_logicalinspect functions are called passing a snapshot filename to be inspected. For example, pass a name obtained from the pg_ls_logicalsnapdir() function. ~ 2b. formatting nit - In the previous implementation the extraction of the LSN was trickier, so this part was worthy of an SGML "NOTE". Now that it is just a filename, I don't know if it needs to be a special note anymore. ~~~ 3. +postgres=# SELECT meta.* FROM pg_ls_logicalsnapdir(), +pg_get_logical_snapshot_meta(name) AS meta; + +-[ RECORD 1 ] +magic| 1369563137 +checksum | 1028045905 +version | 6 3a. If you are going to wrap the SQL across multiple lines like this, then you should show the psql continuation prompt, so that the example looks the same as what the user would see. ~ 3b. FYI, the output of that can return multiple records, which is b.i) probably not what you intended to demonstrate b.ii) not the same as what the example says e.g., I got this: test_pub=# SELECT meta.* FROM pg_ls_logicalsnapdir(), test_pub-# pg_get_logical_snapshot_meta(name) AS meta; -[ RECORD 1 ] magic| 1369563137 checksum | 681884630 version | 6 -[ RECORD 2 ] magic| 1369563137 checksum | 2213048308 version | 6 -[ RECORD 3 ] magic| 1369563137 checksum | 3812680762 version | 6 -[ RECORD 4 ] magic| 1369563137 checksum | 3759893001 version | 6 ~~~ (Also those #3a, #3b comments apply to both examples) == src/backend/replication/logical/snapbuild.c 4. - SnapBuild builder; - - /* variable amount of TransactionIds follows */ -} SnapBuildOnDisk; - #define SnapBuildOnDiskConstantSize \ offsetof(SnapBuildOnDisk, builder) #define SnapBuildOnDiskNotChecksummedSize \ Is it better to try to keep those "Size" macros defined along with wherever the SnapBuildOnDisk is defined? Otherwise, if the structure is ever changed, adjusting the macros could be easily overlooked. ~~~ 5. ValidateAndRestoreSnapshotFile nit - See [1] #4 suggestion to declare 'sz' at scope where used. The previous reason not to change this (e.g. "mainly inspired from SnapBuildRestore") seems less relevant because now most lines of this function have already been modified for other reasons. ~~~ 6. SnapBuildRestore: + if (fd < 0 && errno == ENOENT) + return false; + else if (fd < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not open file \"%s\": %m", path))); I think this code fragment looked like this before, and you only relocated it, but it still seems a bit awkward to write this way. Since so much else has changed, how about also improving this in passing, like below: if (fd < 0) { if (errno == ENOENT) return false; ereport(ERROR, (errcode_for_file_access(), errmsg("could not open file \"%s\": %m", path))); } == [1] https://www.postgresql.org/message-id/ZusgK0/B8F/1rqft%40ip-10-97-1-34.eu-west-3.compute.internal Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/contrib/pg_logicalinspect/expected/logical_inspect.out b/contrib/pg_logicalinspect/expected/logical_inspect.out index 219afd6..71dea4a 100644 --- a/contrib/pg_logicalinspect/expected/logical_inspect.out +++ b/contrib/pg_logicalinspect/expected/logical_inspect.out @@ -37,7 +37,7 @@ table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null COMMIT (3 rows) -step s1_get_logical_snapshot_info: SELECT info.state,info.catchange_count,array_length(info.catchange_xip,1) AS catchange_array_length,info.committed_count,array_length(info.committed_xip,1) AS committed_array_length FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info ORDER BY 2; +step s1_get_logical_snapshot_info: SELECT info.state, info.catchange_count, array_length(info.catchange_xip, 1) AS catchange_array_length, info.committed_count, array_length(info.committed_xip, 1) AS committed_array_length FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info ORDER BY 2; state |catchange_count|catchange_array_length|committed_count|committed_array_length --+---+--+---+-- consistent| 0| | 2| 2 diff --git a/contrib/pg_
Re: long-standing data loss bug in initial sync of logical replication
Hi Kuroda-san, > > I have also modified the tests in 0001 patch. These changes are only > > related to syntax of writing tests. > > LGTM. I found small improvements, please find the attached. I have applied the changes and updated the patch. Thanks & Regards, Shlok Kyal From 07f94de76be177d0e39762cb2bd36a4bc04a7993 Mon Sep 17 00:00:00 2001 From: Shlok Kyal Date: Fri, 23 Aug 2024 14:02:20 +0530 Subject: [PATCH v14 1/2] Distribute invalidatons if change in catalog tables Distribute invalidations to inprogress transactions if the current committed transaction change any catalog table. --- .../replication/logical/reorderbuffer.c | 5 +- src/backend/replication/logical/snapbuild.c | 34 ++- src/include/replication/reorderbuffer.h | 4 + src/test/subscription/t/100_bugs.pl | 267 ++ 4 files changed, 296 insertions(+), 14 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 22bcf171ff..c5dfc1ab06 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -221,9 +221,6 @@ int debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED */ static ReorderBufferTXN *ReorderBufferGetTXN(ReorderBuffer *rb); static void ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn); -static ReorderBufferTXN *ReorderBufferTXNByXid(ReorderBuffer *rb, - TransactionId xid, bool create, bool *is_new, - XLogRecPtr lsn, bool create_as_top); static void ReorderBufferTransferSnapToParent(ReorderBufferTXN *txn, ReorderBufferTXN *subtxn); @@ -622,7 +619,7 @@ ReorderBufferReturnRelids(ReorderBuffer *rb, Oid *relids) * (with the given LSN, and as top transaction if that's specified); * when this happens, is_new is set to true. */ -static ReorderBufferTXN * +ReorderBufferTXN * ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create, bool *is_new, XLogRecPtr lsn, bool create_as_top) { diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 0450f94ba8..1f7c24cad0 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -300,7 +300,7 @@ static void SnapBuildFreeSnapshot(Snapshot snap); static void SnapBuildSnapIncRefcount(Snapshot snap); -static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn); +static void SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid); static inline bool SnapBuildXidHasCatalogChanges(SnapBuild *builder, TransactionId xid, uint32 xinfo); @@ -859,18 +859,21 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid, } /* - * Add a new Snapshot to all transactions we're decoding that currently are - * in-progress so they can see new catalog contents made by the transaction - * that just committed. This is necessary because those in-progress - * transactions will use the new catalog's contents from here on (at the very - * least everything they do needs to be compatible with newer catalog - * contents). + * Add a new Snapshot and invalidation messages to all transactions we're + * decoding that currently are in-progress so they can see new catalog contents + * made by the transaction that just committed. This is necessary because those + * in-progress transactions will use the new catalog's contents from here on + * (at the very least everything they do needs to be compatible with newer + * catalog contents). 
*/ static void -SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn) +SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid) { dlist_iter txn_i; ReorderBufferTXN *txn; + ReorderBufferTXN *curr_txn; + + curr_txn = ReorderBufferTXNByXid(builder->reorder, xid, false, NULL, InvalidXLogRecPtr, false); /* * Iterate through all toplevel transactions. This can include @@ -913,6 +916,14 @@ SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn) SnapBuildSnapIncRefcount(builder->snapshot); ReorderBufferAddSnapshot(builder->reorder, txn->xid, lsn, builder->snapshot); + + /* + * Add invalidation messages to the reorder buffer of inprogress + * transactions except the current committed transaction + */ + if (txn->xid != xid && curr_txn->ninvalidations > 0) + ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn, + curr_txn->ninvalidations, curr_txn->invalidations); } } @@ -1184,8 +1195,11 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid, /* refcount of the snapshot builder for the new snapshot */ SnapBuildSnapIncRefcount(builder->snapshot); - /* add a new catalog snapshot to all currently running transactions */ - SnapBuildDistributeNewCatalogSnapshot(builder, lsn); +
Re: Expand applicability of aggregate's sortop optimization
On 10/8/24 08:19, Michael Paquier wrote: On Tue, Oct 01, 2024 at 04:25:59PM +0700, Andrei Lepikhov wrote: I have written a sketch patch to implement the idea with aggregate prosupport in code - see the attachment. My intention was to provide an example, not a ready-to-commit patch. As I see, the only problem there lies in the tests - CREATE AGGREGATE doesn't allow us to set a prosupport routine, and we need to update the pg_proc table manually. Please note that the CF bot is red. Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail. Also, I see that the union test fails: left join lateral (select t1.tenthous from tenk2 t2 union all (values(1))) on true limit 1; -QUERY PLAN - Limit - -> Nested Loop Left Join - -> Seq Scan on tenk1 t1 - -> Append - -> Index Only Scan using tenk2_hundred on tenk2 t2 - -> Result -(6 rows) - +ERROR: deadlock detected +DETAIL: Process 414506 waits for AccessShareLock on relation 24696 of database 16387; blocked by process 414511. +Process 414511 waits for AccessExclusiveLock on relation 16421 of database 16387; blocked by process 414506. +HINT: See server log for query details. But I wonder if new prosupport routine caused that. Please, let me know if it is not a well-known issue. -- regards, Andrei Lepikhov From c3678a52d8cac9293a50ff5a4bab951155d45c5c Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Tue, 1 Oct 2024 16:08:11 +0700 Subject: [PATCH v1] Introduce prosupport helpers for aggregates. For example, add minmax_support routine to simplify min/max aggregates. --- src/backend/optimizer/plan/planagg.c | 76 ++ src/backend/optimizer/prep/prepagg.c | 26 + src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 93 src/include/nodes/supportnodes.h | 7 ++ src/test/regress/expected/aggregates.out | 128 ++- src/test/regress/sql/aggregates.sql | 55 ++ src/tools/pgindent/typedefs.list | 1 + 8 files changed, 340 insertions(+), 48 deletions(-) diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index afb5445b77..598da61af8 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -33,6 +33,7 @@ #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/supportnodes.h" #include "optimizer/cost.h" #include "optimizer/optimizer.h" #include "optimizer/pathnode.h" @@ -43,6 +44,7 @@ #include "parser/parse_clause.h" #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" +#include "utils/fmgrprotos.h" #include "utils/lsyscache.h" #include "utils/syscache.h" @@ -510,3 +512,77 @@ fetch_agg_sort_op(Oid aggfnoid) return aggsortop; } + +Datum +minmax_support(PG_FUNCTION_ARGS) +{ + Node *rawreq = (Node *) PG_GETARG_POINTER(0); + Oid aggsortop; + + if (IsA(rawreq, SupportRequestMinMax)) + { + SupportRequestMinMax *req = (SupportRequestMinMax *) rawreq; + Aggref *aggref = req->aggref; + + if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL) + PG_RETURN_POINTER(NULL); + + aggsortop = fetch_agg_sort_op(aggref->aggfnoid); + if (!OidIsValid(aggsortop)) + PG_RETURN_POINTER(NULL); /* not a MIN/MAX aggregate */ + + if (aggref->aggorder != NIL) + { + SortGroupClause *orderClause; + TargetEntry *curTarget; + + curTarget = (TargetEntry *) linitial(aggref->args); + + /* + * If the order clause is the same column as the one we're + * aggregating, we can still use the index: It is undefined which + * value is MIN() or MAX(), as well as which value is first or + * last when sorted. 
So, we can still use the index IFF the + * aggregated expression equals the expression used in the + * ordering operation. + */ + + /* + * We only accept a single argument to min/max aggregates, + * orderings that have more clauses won't provide correct results. + */ + Assert(list_length(aggref->aggorder) == 1); + + orderClause = castNode(SortGroupClause, linitial(aggref->aggorder)); + + if (orderClause->tleSortGroupRef != curTarget->ressortgroupref) +elog(ERROR, "Aggregate order clause isn't found in target list"); + + if (orderClause->sortop != aggsortop) + { +List *btclasses; +ListCell *lc; + +btclasses = get_op_btree_interpretation(orderClause->sortop); + +foreach(lc, btclasses) +{ + OpBtreeInterpretation *interpretation; + + interpretation = (OpBtreeInterpretation *) lfirst(lc); + if (op_in_opfamily(aggsortop, interpretation->opfamily_id)) + { + aggref->aggorder = NIL; + break; + } +} + +list_free_deep(btclasses); + } + else +aggref->aggorder = NIL; + } + } + + PG_RETURN_POINTER(NULL);
Re: MergeJoin beats HashJoin in the case of multiple hash clauses
On 7/8/24 19:45, Andrei Lepikhov wrote: On 3/11/2023 23:43, Tomas Vondra wrote: On 9/11/23 10:04, Lepikhov Andrei wrote: * Determine bucketsize fraction and MCV frequency for the inner * relation. We use the smallest bucketsize or MCV frequency estimated * for any individual hashclause; this is undoubtedly conservative. I'm sure this may lead to inflated cost for "good" cases (where the actual bucket size really is a product), which may push the optimizer to use the less efficient/slower join method. Yes, It was contradictory idea, though. IMHO the only principled way forward is to get a better ndistinct estimate (which this implicitly does), perhaps by using extended statistics. I haven't tried, but I guess it'd need to extract the clauses for the inner side, and call estimate_num_groups() on it. And I've done it. Sorry for so long response. This patch employs of extended statistics for estimation of the HashJoin bucket_size. In addition, I describe the idea in more convenient form here [1]. Obviously, it needs the only ndistinct to make a prediction that allows to reduce computational cost of this statistic. Minor change to make cfbot happier. -- regards, Andrei Lepikhov From 73e4d16812431b91fb760d994d0198baab2fe034 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Mon, 13 May 2024 16:15:02 +0700 Subject: [PATCH v1] Use extended statistics for precise estimation of bucket size in hash join. Recognizing the real-life complexity where columns in the table often have functional dependencies, PostgreSQL's estimation of the number of distinct values over a set of columns can be underestimated (or much rarely, overestimated) when dealing with multi-clause JOIN. In the case of hash join, it can end up with a small number of predicted hash buckets and, as a result, picking non-optimal merge join. To improve the situation, we introduce one additional stage of bucket size estimation - having two or more join clauses estimator lookup for extended statistics and use it for multicolumn estimation. Clauses are grouped into lists, each containing expressions referencing the same relation. The result of the multicolumn estimation made over such a list is combined with others according to the caller's logic. Clauses that are not estimated are returned to the caller for further estimation. --- src/backend/optimizer/path/costsize.c | 12 +- src/backend/utils/adt/selfuncs.c | 172 ++ src/include/utils/selfuncs.h | 4 + 3 files changed, 187 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..af1cb8ba78 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -4294,9 +4294,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, } else { + List *otherclauses; + innerbucketsize = 1.0; innermcvfreq = 1.0; - foreach(hcl, hashclauses) + + /* At first, try to estimate bucket size using extended statistics. 
*/ + otherclauses = estimate_multivariate_bucketsize(root, + inner_path->parent, + hashclauses, + &innerbucketsize); + + /* Pass through the remaining clauses */ + foreach(hcl, otherclauses) { RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl); Selectivity thisbucketsize; diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 03d7fb5f48..7a57f444c8 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -3752,6 +3752,178 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows, return numdistinct; } +/* + * Try to estimate the bucket size of the hash join inner side when the join + * condition contains two or more clauses by employing extended statistics. + * + * The main idea of this approach is that the distinct value generated by + * multivariate estimation on two or more columns would provide less bucket size + * than estimation on one separate column. + * + * IMPORTANT: It is crucial to synchronise the approach of combining different + * estimations with the caller's method. + * Return a list of clauses that didn't fetch any extended statistics. + */ +List * +estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner, + List *hashclauses, + Selectivity *innerbucketsize) +{ + List *clauses = list_copy(hashclauses); + List *otherclauses = NIL; + double ndistinct = 1.0; + + if (list_length(hashclauses) <= 1) + /* + * Nothing to do for a single clause. Could we employ univariate + * extended stat here? + */ + return hashclauses; + + while (clauses != NIL) + { + ListCell *lc; + int relid = -1; + List *varinfos = NIL; + List *saveList = NIL; + double mvndistinct; + List *origin_varinfos; + int group_relid = -1; + RelOptInfo *group_rel = NULL; + ListCell *lc1, *lc2; + + /* + * Find clauses, referenci
Re: Incremental Sort Cost Estimation Instability
On 9/23/24 20:02, Andrei Lepikhov wrote: On 12/9/2024 12:12, David Rowley wrote: On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov wrote: Initial problem causes wrong cost_sort estimation. Right now I think about providing cost_sort() the sort clauses instead of (or in addition to) the pathkeys. I'm not quite sure why the sort clauses matter any more than the EquivalenceClass. If the EquivalanceClass defines that all members will have the same value for any given row, then, if we had to choose any single member to drive the n_distinct estimate from, isn't the most accurate distinct estimate from the member with the smallest n_distinct estimate? (That assumes the less distinct member has every value the more distinct member has, which might not be true) Finally, I implemented this approach in code (see attachment). The effectiveness may be debatable, but general approach looks even better than previous one. Change status to 'Need review'. Minor change to make compiler and cfbot happy. -- regards, Andrei Lepikhov From bdaf7cd46eb10353b6cbaf5f22d1ea835881db24 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Mon, 23 Sep 2024 14:30:28 +0200 Subject: [PATCH v3] Stabilize incremental sort estimation. Carefully identify an expression that can represent the path key on sort estimation. Columns may have different distinct estimations that can trigger wrong cost estimations and choice of sort strategy. Sorting has only pathkeys as input for the cost estimation. This patch, instead of a blind choice of the first equivalence class member, attempts to find columns that are used under the estimating sort operator and chooses the most negligible ndistinct value. --- src/backend/optimizer/path/costsize.c | 52 --- src/backend/optimizer/util/pathnode.c | 1 + src/include/optimizer/cost.h | 1 + 3 files changed, 50 insertions(+), 4 deletions(-) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..01f166f504 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1984,6 +1984,50 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost, *run_cost = cpu_operator_cost * tuples; } +static Expr * +identify_sort_expression(PlannerInfo *root, PathKey *key, Relids relids) +{ + EquivalenceMember *candidate = linitial(key->pk_eclass->ec_members); + double ndistinct = -1.0; /* Not defined */ + + if (root == NULL || list_length(key->pk_eclass->ec_members) == 1) + /* Fast path */ + return candidate->em_expr; + + foreach_node(EquivalenceMember, em, key->pk_eclass->ec_members) + { + VariableStatData vardata; + boolisdefault = true; + doublendist = -1.0; + + /* Sorting over single partition? */ + if (em->em_is_child && bms_num_members(relids) > 1) + continue; + + if (!bms_is_subset(em->em_relids, relids)) + /* + * Avoid expressions not based on a table column. At least, the + * candidate value already initialised as the first EC member. 
+ */ + continue; + + /* Let's check candidate's ndistinct value */ + examine_variable(root, (Node *) em->em_expr, 0, &vardata); + if (HeapTupleIsValid(vardata.statsTuple)) + ndist = get_variable_numdistinct(&vardata, &isdefault); + ReleaseVariableStats(vardata); + + if (ndistinct < 0.0 || (!isdefault && ndist < ndistinct)) + { + candidate = em; + ndistinct = ndist; + } + } + + Assert(candidate != NULL); + return candidate->em_expr; +} + /* * cost_incremental_sort * Determines and returns the cost of sorting a relation incrementally, when @@ -1999,6 +2043,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost, void cost_incremental_sort(Path *path, PlannerInfo *root, List *pathkeys, int presorted_keys, + Relids relids, int input_disabled_nodes, Cost input_startup_cost, Cost input_total_cost, double input_tuples, int width, Cost comparison_cost, int sort_mem, @@ -2053,21 +2098,20 @@ cost_incremental_sort(Path *path, foreach(l, pathkeys) { PathKey*key = (PathKey *) lfirst(l); - EquivalenceMember *member = (EquivalenceMember *) - linitial(key->pk_eclass->ec_members); + Expr *expr = identify_sort_expression(root, key, relids); /* * Check if the expression contains Var with "varno 0" so that we * don't call estimate_num_groups in that case. */ - if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr))) + if (bms_is_member(0, pull_varnos(root, (Node *) expr))) { unknown_varno = true; break; } /* expression not containing any Vars with "varno 0" */ - presortedExprs = lappend(presortedExprs, member->em_expr); + presortedExprs = lappend(presortedExprs, expr); if (foreach_current_index(l) + 1 >= presorted_keys) break; diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index fc97bf6ee2..8ed566b1bc 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/
Re: per backend I/O statistics
On Mon, Oct 07, 2024 at 09:54:21AM +, Bertrand Drouvot wrote: > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: >> This would rely on the fact that we would use the ProcNumber for the >> dshash key, and this information is not provided in pg_stat_activity. >> Perhaps we should add this information in pg_stat_activity so as it >> would be easily possible to do joins with a SQL function that returns >> a SRF with all the stats associated with a given connection slot >> (auxiliary or backend process)? > > I'm not sure that's needed. What has been done in the previous versions is > to get the stats based on the pid (see pg_stat_get_backend_io()) where the > procnumber is retrieved with something like > GetNumberFromPGProc(BackendPidGetProc(pid)). Ah, I see. So you could just have the proc number in the key to control the upper-bound on the number of possible stats entries in the dshash. Assuming that none of this data is persisted to the stats file at shutdown and that the stats of a single entry are reset each time a new backend reuses a previous proc slot, that would be OK by me, I guess. >> The active PIDs of the live sessions are not stored in the active >> stats, why not? > > That was not needed. We can still retrieve the stats based on the pid thanks > to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having > to actually store the pid in the stats. I think that's fine because the pid > only matters at "display" time (pg_stat_get_backend_io()). Okay, per the above and the persistency of the stats. -- Michael signature.asc Description: PGP signature
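To spell out the lookup pattern Bertrand describes (the surrounding stats machinery is hypothetical, but BackendPidGetProc() and GetNumberFromPGProc() are the calls named above): the pid is only resolved at display time, and the resulting ProcNumber is what would key the dshash entry, which also bounds the number of possible stats entries.

    #include "postgres.h"
    #include "storage/proc.h"
    #include "storage/procarray.h"
    #include "storage/procnumber.h"

    /* sketch: resolve a backend PID to the key of its per-backend I/O stats */
    static void
    report_backend_io_stats(int pid)
    {
        PGPROC     *proc;
        ProcNumber  procno;

        proc = BackendPidGetProc(pid);
        if (proc == NULL)
            return;             /* backend already gone; nothing to show */

        procno = GetNumberFromPGProc(proc);

        /* ... fetch the dshash entry keyed by procno and build the SRF row ... */
        (void) procno;
    }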
Re: First draft of PG 17 release notes
On Mon, 2024-10-07 at 20:11 -0400, Bruce Momjian wrote: > On Tue, Oct 1, 2024 at 04:36:09PM +0200, Laurenz Albe wrote: > > I think that the removal of the "adminpack" extension should > > be listed in the section "migration to v17" as an incompatibility. > > > > I have seen one complaint that pg_upgrade fails if the extension > > is installed, but a dump/restore would also throw an error. > > Agreed. moved. Thank you! Yours, Laurenz Albe
Re: POC, WIP: OR-clause support for indexes
On Mon, Oct 7, 2024 at 10:06 PM jian he wrote: > > assume v40 is the latest version. make_bitmap_paths_for_or_group { /* * First, try to match the whole group to the one index. */ orargs = list_make1(ri); indlist = build_paths_for_OR(root, rel, orargs, other_clauses); if (indlist != NIL) { bitmapqual = choose_bitmap_and(root, rel, indlist); jointcost = bitmapqual->total_cost; jointlist = list_make1(bitmapqual); } /* * Also try to match all containing clauses 'one-by-one. */ foreach(lc, args) { orargs = list_make1(lfirst(lc)); indlist = build_paths_for_OR(root, rel, orargs, other_clauses); if (indlist == NIL) { splitlist = NIL; break; } bitmapqual = choose_bitmap_and(root, rel, indlist); } if other_clauses is not NIL, then "try to match all containing clauses 'one-by-one" the foreach loop "foreach(lc, args)" will apply other_clauses in build_paths_for_OR every time. then splitcost will obviously be higher than jointcost. if other_clauses is NIL. "foreach(lc, args)" will have list_length(args) startup cost. So overall, it looks like jointcost will alway less than splitcost, the only corner case would be both are zero. anyway, in make_bitmap_paths_for_or_group, above line "Pick the best option." I added: if (splitcost <= jointcost && splitcost != 0 && jointcost != 0) elog(INFO, "%s:%d splitcost <= jointcost and both is not zero", __FILE_NAME__, __LINE__); and the regress tests passed. That means we don't need to iterate "((BoolExpr *) ri->orclause)->args" in make_bitmap_paths_for_or_group ?
Re: Expand applicability of aggregate's sortop optimization
On Tue, 8 Oct 2024 at 18:47, Andrei Lepikhov wrote: > Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail. Please leave the cat version bump out of your patch. It's a waste of time and resource if you plan to post another patch every time a committer bumps the cat version. David
Re: Pgoutput not capturing the generated columns
On Fri, Oct 4, 2024 at 9:43 AM Peter Smith wrote: > > Hi Shubham, I don't have any new comments for the patch v36-0002. > > But, according to my records, there are multiple old comments not yet > addressed for this patch. I am giving reminders for those below so > they don't get accidentally overlooked. Please re-confirm and at the > next posted version please respond individually to each of these to > say if they are addressed or not. > > == > > 1. General > From review v31 [1] comment #1. Patches 0001 and 0002 should be merged. > > == > src/backend/replication/logical/tablesync.c > > make_copy_attnamelist: > > 2. > From review v31 [1] comment #4. Make the detailed useful error message > common if possible. > > ~~~ This comment is still open. Will fix this in next versions of patches. > > fetch_remote_table_info: > > 3. > From review v31 [1] comment #5. I was not sure if this logic is > sophisticated enough to handle the case when the same table has > gencols but there are multiple subscribed publications and the > 'publish_generated_columns' parameter differs. Is this scenario > tested? > > ~ > > 4. > + * Get column lists for each relation, and check if any of the > + * publications have the 'publish_generated_columns' parameter enabled. > > From review v32 [2] comment #1. This needs some careful testing. I was > not sure if sufficient to just check the 'publish_generated_columns' > flag. Now that "column lists take precedence" it is quite possible for > all publications to say 'publish_generated_columns=false', but the > publication can still publish gencols *anyway* if they are specified > in a column list. > > == > [1] review v31 18/9 - > https://www.postgresql.org/message-id/flat/CAHv8Rj%2BKOoh58Uf5k2MN-%3DA3VdV60kCVKCh5ftqYxgkdxFSkqg%40mail.gmail.com#f2f3b48080f96ea45e1410f5b1cd9735 > [2] review v32 24/9 - > https://www.postgresql.org/message-id/CAHut%2BPu7EcK_JTgWS7GzeStHk6Asb1dmEzCJU2TJf%2BW1Zy30LQ%40mail.gmail.com > I have fixed the comments and posted the v37 patches for them. Please refer to the updated v37 Patches here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8Rj%2BRnw%2B_SfSyyrvWL49AfJzx4O8YVvdU9gB%2BSQdt3%3DqF%2BA%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Partition-wise join with whole row vars
Michael Paquier писал(а) 2024-10-08 05:01: On Fri, Jul 12, 2024 at 02:39:13PM +0300, Alexander Pyhalov wrote: I was looking at enabling partition-wise join with whole row vars. My main motivation was to enable push down DML with partition-wise join in postgres_fdw. The work is based on the earlier patches of Ashutosh Bapat [1]. The CF bot is red for some time now. Please provide a rebase. Hi. Attaching rebased patches. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom cd7c56dab70c48b7f2a104041ee6bb734050f4bb Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Wed, 27 Dec 2023 17:31:23 +0300 Subject: [PATCH 6/6] postgres_fdw: fix partition-wise DML --- contrib/postgres_fdw/deparse.c| 12 +- .../postgres_fdw/expected/postgres_fdw.out| 160 ++ contrib/postgres_fdw/postgres_fdw.c | 98 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 38 + src/backend/optimizer/util/appendinfo.c | 32 +++- src/include/optimizer/appendinfo.h| 1 + 6 files changed, 323 insertions(+), 18 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index f879ff3100f..d4c657fa4d5 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2309,7 +2309,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root, appendStringInfoString(buf, "UPDATE "); deparseRelation(buf, rel); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex); appendStringInfoString(buf, " SET "); @@ -2336,7 +2336,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root, reset_transmission_modes(nestlevel); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) { List *ignore_conds = NIL; @@ -2352,7 +2352,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root, if (additional_conds != NIL) list_free_deep(additional_conds); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) deparseExplicitTargetList(returningList, true, retrieved_attrs, &context); else @@ -2417,10 +2417,10 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root, appendStringInfoString(buf, "DELETE FROM "); deparseRelation(buf, rel); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) { List *ignore_conds = NIL; @@ -2435,7 +2435,7 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root, if (additional_conds != NIL) list_free_deep(additional_conds); - if (foreignrel->reloptkind == RELOPT_JOINREL) + if (IS_JOIN_REL(foreignrel)) deparseExplicitTargetList(returningList, true, retrieved_attrs, &context); else diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 46f005307e1..f1af06b7a6b 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -10184,6 +10184,166 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a (4 rows) reset enable_sort; +-- test partition-wise DML +EXPLAIN (COSTS OFF, VERBOSE) +UPDATE fprt1 SET b=fprt1.b+1 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0; + QUERY PLAN +-- + Update on public.fprt1 + Foreign Update on public.ftprt1_p1 fprt1_1 + Foreign Update on public.ftprt1_p2 fprt1_2 + -> Append + -> Foreign Update + Remote SQL: UPDATE public.fprt1_p1 r3 SET b = (r3.b + 1) FROM public.fprt2_p1 r5 WHERE ((r3.a = r5.b)) AND (((r5.a % 
25) = 0)) + -> Foreign Update + Remote SQL: UPDATE public.fprt1_p2 r4 SET b = (r4.b + 1) FROM public.fprt2_p2 r6 WHERE ((r4.a = r6.b)) AND (((r6.a % 25) = 0)) +(8 rows) + +UPDATE fprt1 SET b=fprt1.b+1 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0; +EXPLAIN (COSTS OFF, VERBOSE) +UPDATE fprt1 SET b=(fprt1.a+fprt2.b)/2 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0; + QUERY PLAN +--- + Update on public.fprt1 + Foreign Update on public.ftprt1_p1 fprt1_1 + Foreign Update on public.ftprt1_p2 fprt1_2 + -> Append + -> Foreign Update + Remote SQL: UPDATE public.fprt1_p1 r3 SET b
Re: First draft of PG 17 release notes
On Tue, Oct 8, 2024 at 6:25 AM Bruce Momjian wrote: > > On Sun, Sep 29, 2024 at 06:33:29PM +0530, Amit Kapila wrote: > > > > It is better to write the above statement as: > > > > "pg_upgrade now preserves replication slots on > > > > publishers and full subscription's state on subscribers". This is > > > > because replication slots are preserved on publishers. The subscribers > > > > preserve the subscription state. > > > > > > So, as I understand it, this preservation only happens when the _old_ > > > Postgres version is 17+. > > > > > > > Yes. > > > > > Do we want to try and explain that in the > > > Postgres 17 release notes? > > > > > > > It would be good if we can capture that information without bloating > > the release document. However, this information is already present in > > pg_upgrade docs, so users have a way to know the same even if we can't > > mention it in the release notes. > > I have developed the attached patch to mention it is "logical" slots, > and to mention its future use. > LGTM. Thanks! -- With Regards, Amit Kapila.
Re: Proposal: allow database-specific role memberships
Denis Laxalde wrote: Michael Paquier wrote: On Wed, Sep 07, 2022 at 12:50:32PM +0500, Ibrar Ahmed wrote: The patch requires a rebase, please do that. Hunk #5 succeeded at 454 (offset 28 lines). 1 out of 5 hunks FAILED -- saving rejects to file doc/src/sgml/ref/grant.sgml.rej There have been no updates on this thread for one month, so this has been switched to RwF. I took the liberty of rebasing this (old) patch, originally authored by Kenaniah Cerny. As the original commitfest entry, https://commitfest.postgresql.org/36/3374/, was "stalled", I created a new one at https://commitfest.postgresql.org/50/5284/; hoping this is okay. This is about adding an "IN DATABASE " clause to the GRANT and REVOKE commands, allowing role membership to be controlled at database scope rather than cluster-wide. This could be interesting in combination with predefined roles, e.g.: GRANT pg_read_all_data TO bob IN DATABASE app; GRANT pg_maintain TO dba IN DATABASE metrics; without having to grant too many privileges when a user is supposed to operate on only some databases. The logic of the original patch (as of its version 11) is preserved. One noticeable change concerns the tests: they were moved to src/test/regress (they were previously in 'unsafe_tests'), with proper cleanup, and now avoid using superuser as well as modifying templates. Is this a feature that's still interesting? (Feedback in the thread, from 2022, was a bit mixed.) Personally, I have a few concerns regarding the feature and its implementation: - The IN DATABASE clause does not make much sense for some roles, like pg_read_all_stats (the implementation does not guard against this). - An 'IN SCHEMA' clause might be a natural supplementary feature. However, the current implementation, relying on a new 'dbid' column added to the pg_auth_members catalog, might not fit well in that case.