Re: Add contrib/pg_logicalsnapinspect

2024-09-17 Thread Bertrand Drouvot
Hi,

On Mon, Sep 16, 2024 at 10:16:16PM -0700, David G. Johnston wrote:
> On Monday, September 16, 2024, shveta malik  wrote:
> 
> > On Tue, Sep 17, 2024 at 10:18 AM shveta malik 
> > wrote:
> > >
> > > Thanks for addressing the comments. I have not started reviewing v4
> > > yet, but here are few more comments on v3:
> > >
> >
> > I just noticed that when we pass NULL input, both the new functions
> > give 1 row as output, all cols as NULL:
> >
> > newdb1=# SELECT * FROM pg_get_logical_snapshot_meta(NULL);
> >  magic | checksum | version
> > ---+--+-
> >|  |
> >
> > (1 row)
> >
> > Similar behavior with pg_get_logical_snapshot_info(). While the
> > existing 'pg_ls_logicalsnapdir' function gives this error, which looks
> > more meaningful:
> >
> > newdb1=# select * from pg_ls_logicalsnapdir(NULL);
> > ERROR:  function pg_ls_logicalsnapdir(unknown) does not exist
> > LINE 1: select * from pg_ls_logicalsnapdir(NULL);
> > HINT:  No function matches the given name and argument types. You
> > might need to add explicit type casts.
> >
> >
> > Shouldn't the new functions have same behavior?
> >
> 
> No. Since the name pg_ls_logicalsnapdir has zero single-argument
> implementations, passing a null value as an argument is indeed an attempt to
> invoke a function signature that doesn’t exist.

Agree.
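
For illustration, here is a minimal standalone sketch of that resolution rule
(throwaway functions, not the ones from this patch):

CREATE FUNCTION f(text) RETURNS text LANGUAGE sql AS 'SELECT $1';
CREATE FUNCTION g() RETURNS text LANGUAGE sql AS $$SELECT 'x'$$;

SELECT f(NULL);  -- exactly one single-argument candidate: the unknown NULL is
                 -- cast to text and the call resolves (returning NULL)
SELECT g(NULL);  -- no single-argument candidate at all:
                 -- ERROR:  function g(unknown) does not exist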

> I can see an argument that they should produce an empty set instead of a
> single all-null row,

Yeah, it's outside the scope of this patch but I've seen different behavior
in this area.

For example:

postgres=# select * from  pg_ls_replslotdir(NULL);
 name | size | modification
--+--+--
(0 rows)

as compared to:

postgres=# select * from  pg_walfile_name_offset(NULL);
 file_name | file_offset
---+-
   |
(1 row)

I thought that it might be linked to the volatility but it is not:

postgres=# select * from pg_stat_get_xact_blocks_fetched(NULL);
 pg_stat_get_xact_blocks_fetched
-

(1 row)

postgres=# select * from pg_get_multixact_members(NULL);
 xid | mode
-+--
(0 rows)

while both are volatile.

I think both make sense: it's either "empty" or we "don't know the values of the
fields". I don't know if there is any reason for this "inconsistency".
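
That said, a quick sketch with throwaway functions suggests it may simply come
down to set-returning vs. non-set-returning strict functions:

CREATE FUNCTION demo_set(int) RETURNS SETOF int LANGUAGE sql STRICT AS 'SELECT $1';
CREATE FUNCTION demo_scalar(int) RETURNS int LANGUAGE sql STRICT AS 'SELECT $1';

SELECT * FROM demo_set(NULL);     -- (0 rows), like the "empty" cases above
SELECT * FROM demo_scalar(NULL);  -- one all-NULL row, like the other cases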

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add contrib/pg_logicalsnapinspect

2024-09-17 Thread Bertrand Drouvot
Hi,

On Tue, Sep 17, 2024 at 10:18:35AM +0530, shveta malik wrote:
> Thanks for addressing the comments. I have not started reviewing v4
> yet, but here are few more comments on v3:
> 
> 1)
> +#include "port/pg_crc32c.h"
> 
> It is not needed in pg_logicalinspect.c as it is already included in
> internal_snapbuild.h

Yeap, I forgot to remove that one when creating the new "internal" .h file; done
in v5 attached, thanks!

> 
> 2)
> + values[0] = Int16GetDatum(ondisk.builder.state);
> 
> + values[8] = LSNGetDatum(ondisk.builder.last_serialized_snapshot);
> + values[9] = TransactionIdGetDatum(ondisk.builder.next_phase_at);
> + values[10] = Int64GetDatum(ondisk.builder.committed.xcnt);
> 
> We can have values[i++] in all the places and later we can check:
> Assert(i == PG_GET_LOGICAL_SNAPSHOT_INFO_COLS);
> Then we need not keep track of the number in the later part of the code,
> as it goes up to 14.

Right, let's do it that way (as it is done in pg_walinspect for example).

> 4)
> Most of the output columns in pg_get_logical_snapshot_info() look
> self-explanatory except 'state'. Should we have meaningful 'text' here
> corresponding to SnapBuildState? Similar to what we do for
> 'invalidation_reason' in pg_replication_slots. (SlotInvalidationCauses
> for ReplicationSlotInvalidationCause)

Yeah, we could. I was not sure about that (and that was my first remark in [1]);
as the module is mainly for debugging purposes, I was thinking that anyone
using it could refer to "snapbuild.h". Let's see what others think.

[1]: 
https://www.postgresql.org/message-id/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From ef6ce23b7145d3a1639dc776c56f0b94ae33bff0 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 14 Aug 2024 08:46:05 +
Subject: [PATCH v5] Add contrib/pg_logicalinspect

Provides SQL functions that allow inspection of logical decoding components.

It currently allows inspecting the contents of serialized logical snapshots of
a running database cluster, which is useful for debugging or educational
purposes.
---
 contrib/Makefile  |   1 +
 contrib/meson.build   |   1 +
 contrib/pg_logicalinspect/.gitignore  |   4 +
 contrib/pg_logicalinspect/Makefile|  31 +++
 .../expected/logical_inspect.out  |  52 
 contrib/pg_logicalinspect/logicalinspect.conf |   1 +
 contrib/pg_logicalinspect/meson.build |  39 +++
 .../pg_logicalinspect--1.0.sql|  43 +++
 contrib/pg_logicalinspect/pg_logicalinspect.c | 253 ++
 .../pg_logicalinspect.control |   5 +
 .../specs/logical_inspect.spec|  34 +++
 doc/src/sgml/contrib.sgml |   1 +
 doc/src/sgml/filelist.sgml|   1 +
 doc/src/sgml/pglogicalinspect.sgml| 145 ++
 src/backend/replication/logical/snapbuild.c   | 190 +
 src/include/port/pg_crc32c.h  |  16 +-
 src/include/replication/snapbuild.h   |   2 +-
 src/include/replication/snapbuild_internal.h  | 204 ++
 18 files changed, 830 insertions(+), 193 deletions(-)
   7.6% contrib/pg_logicalinspect/expected/
   5.7% contrib/pg_logicalinspect/specs/
  32.6% contrib/pg_logicalinspect/
  13.2% doc/src/sgml/
  17.4% src/backend/replication/logical/
   4.1% src/include/port/
  18.9% src/include/replication/

diff --git a/contrib/Makefile b/contrib/Makefile
index abd780f277..952855d9b6 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
 		passwordcheck	\
 		pg_buffercache	\
 		pg_freespacemap \
+		pg_logicalinspect \
 		pg_prewarm	\
 		pg_stat_statements \
 		pg_surgery	\
diff --git a/contrib/meson.build b/contrib/meson.build
index 14a8906865..159ff41555 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -46,6 +46,7 @@ subdir('passwordcheck')
 subdir('pg_buffercache')
 subdir('pgcrypto')
 subdir('pg_freespacemap')
+subdir('pg_logicalinspect')
 subdir('pg_prewarm')
 subdir('pgrowlocks')
 subdir('pg_stat_statements')
diff --git a/contrib/pg_logicalinspect/.gitignore b/contrib/pg_logicalinspect/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/pg_logicalinspect/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_logicalinspect/Makefile b/contrib/pg_logicalinspect/Makefile
new file mode 100644
index 00..55124514d4
--- /dev/null
+++ b/contrib/pg_logicalinspect/Makefile
@@ -0,0 +1,31 @@
+# contrib/pg_logicalinspect/Makefile
+
+MODULE_big = pg_logicalinspect
+OBJS = \
+	$(WIN32RES) \
+	pg_logicalinspect.o
+PGFILEDESC = "pg_logicalinspect - functions to inspect logical decoding components"
+
+EXTENSION = pg_logicalinspect
+DATA = pg_logicalinspect--1.0.sql
+
+EXTRA_INSTALL = contrib/test_decoding

Re: Pgoutput not capturing the generated columns

2024-09-17 Thread Peter Smith
Here are some review comments for v31-0001 (for the docs only)

There may be some overlap here with some comments already made for
v30-0001 which are not yet addressed in v31-0001.

==
Commit message

1.
When introducing the 'publish_generated_columns' parameter, you must
also say this is a PUBLICATION parameter.

~~~

2.
With this enhancement, users can now include the 'include_generated_columns'
option when querying logical replication slots using either the pgoutput
plugin or the test_decoding plugin. This option, when set to 'true' or '1',
instructs the replication system to include generated column information
and data in the replication stream.

~

The above is stale information because it still refers to the old name
'include_generated_columns', and to test_decoding which was already
removed in this patch.

==
doc/src/sgml/ddl.sgml

3.
+  Generated columns may be skipped during logical replication
according to the
+  CREATE PUBLICATION option
+  
+  publish_generated_columns.

3a.
nit - The linkend is based on the old name instead of the new name.

3b.
nit - Better to call this a parameter instead of an option because
that is what the CREATE PUBLICATION docs call it.

==
doc/src/sgml/protocol.sgml

4.
+
+ publish_generated_columns
+  
+   
+Boolean option to enable generated columns. This option controls
+whether generated columns should be included in the string
+representation of tuples during logical decoding in PostgreSQL.
+   
+  
+
+

Is this even needed anymore? Now that the implementation is using a
PUBLICATION parameter, isn't everything determined just by that
parameter? I don't see the reason why a protocol change is needed
anymore. And, if there is no protocol change needed, then this
documentation change is also not needed.



5.
  
-  Next, the following message part appears for each column included in
-  the publication (except generated columns):
+  Next, the following message parts appear for each column included in
+  the publication (generated columns are excluded unless the parameter
+  
+  publish_generated_columns specifies otherwise):
  

Like the previous comment above, I think everything is now determined
by the PUBLICATION parameter. So maybe this should just be referring
to that instead.

==
doc/src/sgml/ref/create_publication.sgml

6.
+   
+publish_generated_columns
(boolean)
+

nit - the ID is based on the old parameter name.

~

7.
+ 
+  This option is only available for replicating generated
column data from the publisher
+  to a regular, non-generated column in the subscriber.
+ 

IMO remove this paragraph. I really don't think you should be
mentioning the subscriber here at all. AFAIK this parameter is only
for determining if the generated column will be published or not. What
happens at the other end (e.g. the logic for whether it gets ignored or not by
the subscriber) is more like a matrix of behaviours that could be
documented in the "Logical Replication" section. But not here.

(I removed this in my nitpicks attachment)

~~~

8.
+ 
+ This parameter can only be set true if
copy_data is
+ set to false.
+ 

IMO remove this paragraph too. The user can create a PUBLICATION
before a SUBSCRIPTION even exists, so to say it "can only be set..." is
not correct. Sure, your patch 0001 does not support the COPY of
generated columns, but if you want to document that then it should be
documented in the CREATE SUBSCRIPTION docs. But not here.

(I removed this in my nitpicks attachment)

TBH, it would be better if patches 0001 and 0002 were merged; then you
could avoid all this. IIUC they were only separate in the first place
because 2 different people wrote them. It is not making reviews easier
to have them split.

==

Please see the attachment which implements some of the nits above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
 
  
   Generated columns may be skipped during logical replication according to 
the
-  CREATE PUBLICATION option
-  
+  CREATE PUBLICATION parameter
+  
   publish_generated_columns.
  
 
diff --git a/doc/src/sgml/ref/create_publication.sgml 
b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 

 
-   
+   
 publish_generated_columns 
(boolean)
 
  
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
   associated with the publication should be replicated.
   The default is false.
  
- 
-  This opti

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-09-17 Thread Alexander Korotkov
On Tue, Sep 17, 2024 at 9:00 AM Alexander Lakhin  wrote:
> 16.09.2024 21:55, Alexander Korotkov wrote:
> > Please find two patches attached.  The first one does minor cleanup
> > including misuse of words you've pointed.  The second one adds missing
> > wait_for_catchup().  That should fix the test failure you've spotted.
> > Please, check if it fixes an issue for you.
>
> Thank you for looking at that!
>
> Unfortunately, the issue is still here — the test failed for me 6 out of
> 10 runs, as below:
> [05:14:02.807](0.135s) ok 8 - got error after standby promote
> error running SQL: 'psql::1: ERROR:  recovery is not in progress
> HINT:  Waiting for LSN can only be executed during recovery.'
> while running 'psql -XAtq -d port=12734 host=/tmp/04hQ75NuXf 
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'CALL
> pg_wal_replay_wait('0/300F248');' at 
> .../src/test/recovery/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 
> 2140.
>
> 043_wal_replay_wait_standby.log:
> 2024-09-17 05:14:02.714 UTC [1817258] 043_wal_replay_wait.pl ERROR: recovery 
> is not in progress
> 2024-09-17 05:14:02.714 UTC [1817258] 043_wal_replay_wait.pl DETAIL:  
> Recovery ended before replaying target LSN
> 2/570CD648; last replay LSN 0/300F210.
> 2024-09-17 05:14:02.714 UTC [1817258] 043_wal_replay_wait.pl STATEMENT:  CALL 
> pg_wal_replay_wait('2/570CD648');
> 2024-09-17 05:14:02.714 UTC [1817155] LOG:  checkpoint starting: force
> 2024-09-17 05:14:02.714 UTC [1817154] LOG:  database system is ready to 
> accept connections
> 2024-09-17 05:14:02.811 UTC [1817270] 043_wal_replay_wait.pl LOG: statement: 
> CALL pg_wal_replay_wait('0/300F248');
> 2024-09-17 05:14:02.811 UTC [1817270] 043_wal_replay_wait.pl ERROR: recovery 
> is not in progress
>
> pg_waldump -p .../t_043_wal_replay_wait_primary_data/pgdata/pg_wal/ 
> 00010003
> rmgr: Transaction len (rec/tot): 34/34, tx:748, lsn: 
> 0/0300F1E8, prev 0/0300F1A8, desc: COMMIT
> 2024-09-17 05:14:01.654874 UTC
> rmgr: Standby len (rec/tot): 50/50, tx:  0, lsn: 
> 0/0300F210, prev 0/0300F1E8, desc: RUNNING_XACTS
> nextXid 749 latestCompletedXid 748 oldestRunningXid 749
>
> I wonder, can you reproduce this with that bgwriter's modification?

Yes, now I did reproduce it.  It seems the problem could be that the insert
LSN is not yet written on the primary, so wait_for_catchup doesn't wait
for it.  I've worked around that using pg_switch_wal().  The revised
patchset is attached.

> I've also found two more "achievements" coined by 3c5db1d6b:
> doc/src/sgml/func.sgml:   It may also happen that target 
> lsn is not achieved
> src/backend/access/transam/xlog.c-   * recovery was ended before 
> achieving the target LSN.

Fixed this as well in 0001.

--
Regards,
Alexander Korotkov
Supabase


v2-0001-Minor-cleanup-related-to-pg_wal_replay_wait-proce.patch
Description: Binary data


v2-0002-Ensure-standby-promotion-point-in-043_wal_replay_.patch
Description: Binary data


Re: First draft of PG 17 release notes

2024-09-17 Thread Jelte Fennema-Nio
On Wed, 11 Sept 2024 at 16:10, Bruce Momjian  wrote:
> You are right that I do mention changes specifically designed for the
> use of extensions, but there is no mention in the commit message of its
> use for extensions.  In fact, I thought this was too low-level to be of
> use for extensions.  However, if people feel it should be added, we have
> enough time to add it.

Another new API that is useful for extension authors is the following
one (I'm obviously biased since I'm the author, and I don't know if
there's still time):

commit 14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
Author: Nathan Bossart 
Date:   Thu Jan 4 16:09:34 2024 -0600

Add macros for looping through a List without a ListCell.

Many foreach loops only use the ListCell pointer to retrieve the
content of the cell, like so:

ListCell   *lc;

foreach(lc, mylist)
{
int myint = lfirst_int(lc);

...
}

This commit adds a few convenience macros that automatically
declare the loop variable and retrieve the current cell's contents.
This allows us to rewrite the previous loop like this:

foreach_int(myint, mylist)
{
...
}

> An interesting idea would be
> to report all function signature changes in each major release in some
> way.

I think that might be useful, but it very much depends on how long that
list gets. If it gets too long, I think authors will just try to
compile and only look at the ones that break for them.




Re: Detailed release notes

2024-09-17 Thread Alvaro Herrera
On 2024-Sep-16, Bruce Momjian wrote:

> We could try:
> 
>Add backend support for injection points (Michael Paquier) §1 §2 §3 §4
> 
> and maybe only add numbers if there is more than one commit.

Well, this reads like references to sections 1, 2, 3 and 4, but they
aren't that, and probably such sections don't even exist.  I think the
use of the section symbol is misleading and typographically wrong.
https://www.monotype.com/resources/punctuation-series-section-sign


Maybe it would work to use numbers in square brackets instead:

Add backend support for injection points (Michael Paquier) [1] [2] [3] [4]

Maybe use the word "commit" for the first one, which would make it
clearer what the links are about:

Add backend support for injection points (Michael Paquier) [commit 1] [2] 
[3] [4]

and if there's only one, say just "[commit]".

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: First draft of PG 17 release notes

2024-09-17 Thread Alvaro Herrera
On 2024-Sep-11, Bruce Momjian wrote:

> An interesting idea would be to report all function signature changes
> in each major release in some way.

Hmm, extension authors are going to realize this as soon as they try to
compile, so it doesn't seem necessary.  Having useful APIs _added_ is a
different matter, because those might help them realize that they can
remove parts (or #ifdef-out for newer PG versions) of their code, or add
new features; there's no Clippit saying "it looks like you're compiling
for Postgres 18, would you like to ...?".

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."(Freeman Dyson)




Re: Detailed release notes

2024-09-17 Thread Jelte Fennema-Nio
On Tue, 17 Sept 2024 at 10:12, Alvaro Herrera  wrote:
> Maybe it would work to use numbers in square brackets instead:
>
> Add backend support for injection points (Michael Paquier) [1] [2] [3] [4]

Another option would be to add them in superscript using the <sup>
html tag (or even using some unicode stuff):

Add backend support for injection points (Michael Paquier)<sup>1</sup>,
<sup>2</sup>, <sup>3</sup>, <sup>4</sup>

So similar to footnotes in a sense, i.e. if you want details click on
the small numbers




Re: Can we rely on the ordering of paths in pathlist?

2024-09-17 Thread wenhui qiu
Hi Richard Guo
I tried to change the comment; can you help me check if this is
correct? Many thanks.
-  /*
-  * get_cheapest_parallel_safe_total_inner
-  *  Find the unparameterized parallel-safe path with the least total cost.
-  */
+  /*
+  * get_cheapest_parallel_safe_total_inner
+  *  Skip paths that do not meet the criteria; find the unparameterized
+  *  parallel-safe path with the least total cost, and return NULL if no
+  *  such path exists.
+  */

Thanks

wenhui qiu  wrote on Wed, Jul 31, 2024 at 09:21:

> Hi Richard Guo
> Today is the last day of the commitfest cycle. I think this patch
> should be committed. Except for the comments, I tested it and it looks good to me.
>
>
> Thanks
>
> wenhui qiu  wrote on Thu, Jul 25, 2024 at 16:18:
>
>> Hi Richard Guo
>> Is it necessary to add some comments here?
>>
>>
>> + if (!innerpath->parallel_safe ||
>> + !bms_is_empty(PATH_REQ_OUTER(innerpath)))
>> + continue;
>> +
>> + if (matched_path != NULL &&
>> + compare_path_costs(matched_path, innerpath, TOTAL_COST) <= 0)
>> + continue;
>> +
>> + matched_path = innerpath;
>>
>> Richard Guo  wrote on Wed, Jan 10, 2024 at 15:08:
>>
>>>
>>> On Tue, Apr 11, 2023 at 11:43 AM Andy Fan 
>>> wrote:
>>>
 On Tue, Apr 11, 2023 at 11:03 AM Richard Guo 
 wrote:

> As the comment above add_path() says, 'The pathlist is kept sorted by
> total_cost, with cheaper paths at the front.'  And it seems that
> get_cheapest_parallel_safe_total_inner() relies on this ordering
> (without being mentioned in the comments, which I think we should do).
>

 I think the answer for ${subject} should be yes. Per the comments in
 add_partial_path, we have

  * add_partial_path
  *
  *  As in add_path, the partial_pathlist is kept sorted with the
 cheapest
  *  total path in front.  This is depended on by multiple places, which
  *  just take the front entry as the cheapest path without searching.
  *

>>>
>>> I'm not sure about this conclusion.  Surely we can depend on that the
>>> partial_pathlist is kept sorted by total_cost ASC.  This is emphasized
>>> in the comment of add_partial_path, and also leveraged in practice, such
>>> as in many places we just use linitial(rel->partial_pathlist) as the
>>> cheapest partial path.
>>>
>>> But get_cheapest_parallel_safe_total_inner works on pathlist not
>>> partial_pathlist.  And for pathlist, I'm not sure if it's a good
>>> practice to depend on its ordering.  Because
>>>
>>> 1) the comment of add_path only mentions that add_path_precheck relies
>>> on this ordering, but it does not mention that other functions also do;
>>>
>>> 2) other than add_path_precheck, I haven't observed any other functions
>>> that rely on this ordering.  The only exception as far as I notice is
>>> get_cheapest_parallel_safe_total_inner.
>>>
>>> On the other hand, if we declare that we can rely on the pathlist being
>>> sorted in ascending order by total_cost, we should update the comment
>>> for add_path to highlight this aspect.  We should also include a comment
>>> for get_cheapest_parallel_safe_total_inner to clarify why an early exit
>>> is possible, similar to what we do for add_path_precheck.  Additionally,
>>> in several places, we can optimize our code by taking advantage of this
>>> fact and terminate the iteration through the pathlist early when looking
>>> for the cheapest path of a certain kind.
>>>
>>> Thanks
>>> Richard
>>>
>>


Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-09-17 Thread Alexander Lakhin

17.09.2024 10:47, Alexander Korotkov wrote:

Yes, now I did reproduce it.  It seems the problem could be that the insert
LSN is not yet written on the primary, so wait_for_catchup doesn't wait
for it.  I've worked around that using pg_switch_wal().  The revised
patchset is attached.


Thank you for the revised patch!

The improved test works reliably for me (100 out of 100 runs passed).

Best regards,
Alexander




Re: Using per-transaction memory contexts for storing decoded tuples

2024-09-17 Thread Amit Kapila
On Mon, Sep 16, 2024 at 10:43 PM Masahiko Sawada  wrote:
>
> On Fri, Sep 13, 2024 at 3:58 AM Amit Kapila  wrote:
> >
> > On Thu, Sep 12, 2024 at 4:03 AM Masahiko Sawada  
> > wrote:
> > >
> > > We have several reports that logical decoding uses memory much more
> > > than logical_decoding_work_mem[1][2][3]. For instance in one of the
> > > reports[1], even though users set logical_decoding_work_mem to
> > > '256MB', a walsender process was killed by OOM because of using more
> > > than 4GB memory.
> > >
> > > As per the discussion in these threads so far, what happened was that
> > > there was huge memory fragmentation in rb->tup_context.
> > > rb->tup_context uses GenerationContext with 8MB memory blocks. We
> > > cannot free memory blocks until all memory chunks in the block are
> > > freed. If there is a long-running transaction making changes, its
> > > changes could be spread across many memory blocks and we end up not
> > > being able to free memory blocks unless the long-running transaction
> > > is evicted or completed. Since we don't account fragmentation, block
> > > header size, nor chunk header size into per-transaction memory usage
> > > (i.e. txn->size), rb->size could be less than
> > > logical_decoding_work_mem but the actual allocated memory in the
> > > context is much higher than logical_decoding_work_mem.
> > >
> >
> > It is not clear to me how the fragmentation happens. Is it because of
> > some interleaving transactions which are even ended but the memory
> > corresponding to them is not released?
>
> In a generation context, we can free a memory block only when all
> memory chunks there are freed. Therefore, individual tuple buffers are
> already pfree()'ed but the underlying memory blocks are not freed.
>

I understood this part but didn't understand the cases leading to this
problem. For example, if there is a large (and only) transaction in
the system that allocates many buffers for change records during
decoding, in the end, it should free memory for all the buffers
allocated in the transaction. So, wouldn't that free all the memory
chunks corresponding to the memory blocks allocated? My guess was that
we couldn't free all the chunks because there were small interleaving
transactions that allocated memory but didn't free it before the large
transaction ended.

> > Can we try reducing the size of
> > 8MB memory blocks? The comment atop allocation says: "XXX the
> > allocation sizes used below pre-date generation context's block
> > growing code.  These values should likely be benchmarked and set to
> > more suitable values.", so do we need some tuning here?
>
> Reducing the size of the 8MB memory block would be one solution and
> could be better as it could be back-patchable. It would mitigate the
> problem but would not resolve it. I agree to try reducing it and do
> some benchmark tests. If it reasonably makes the problem less likely
> to happen, it would be a good solution.
>

makes sense.

-- 
With Regards,
Amit Kapila.




[PATCH] Mention service key word more prominently in pg_service.conf docs

2024-09-17 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

I just noticed that the docs for pg_service.conf
(https://www.postgresql.org/docs/current/libpq-pgservice.html) don't
mention the actual key word to use in the libpq connection string until
the example in the last sentence, but it mentions the env var in the
first paragraph.  Here's a patch to make the key word equally prominent.

- ilmari

>From 380f9a717ff8216201e44c4f2caa34b596b13e7a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Tue, 17 Sep 2024 10:17:46 +0100
Subject: [PATCH] Mention the `service` key word more prominently in the
 pg_service.conf docs

The env var was mentioned in the first paragraph, but the key word
wasn't mentioned until the last sentence.
---
 doc/src/sgml/libpq.sgml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 783e8e750b..ef7e3076ce 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -9317,7 +9317,8 @@
   
The connection service file allows libpq connection parameters to be
associated with a single service name. That service name can then be
-   specified in a libpq connection string, and the associated settings will be
+   specified using the service key word
+   in a libpq connection string, and the associated settings will be
used. This allows connection parameters to be modified without requiring
a recompile of the libpq-using application. The service name can also be
specified using the PGSERVICE environment variable.
-- 
2.39.5



jsonb_strip_nulls with arrays?

2024-09-17 Thread Florents Tselai
Currently: 

jsonb_strip_nulls ( jsonb ) → jsonb
Deletes all object fields that have null values from the given JSON value, 
recursively. Null values that are not object fields are untouched.

> Null values that are not object fields are untouched. 

Can we revisit this and make it work with arrays, too?
Tbh, at first sight that looked like the expected behavior to me,
that is, stripping nulls from arrays as well.

This has been available since 9.5 and iiuc predates lots of the jsonb array 
work.

In practice, though, whenever jsonb_build_array is used (especially with
jsonpath), a few nulls do appear in the resulting array most of the time.
Currently, there’s no expressive way to remove them.
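
For example (current behavior, and the kind of ad-hoc workaround a flat array
needs today):

SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}');
-- returns {"b": [1, null, 2]}: the object field is gone, the array null stays

SELECT jsonb_agg(elem)
FROM jsonb_array_elements('[1, null, 2]'::jsonb) AS t(elem)
WHERE jsonb_typeof(elem) <> 'null';
-- returns [1, 2], but this only handles a single, non-nested array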

We could also have a jsonb_array_strip_nulls(jsonb) function.



Re: [PATCH] Mention service key word more prominently in pg_service.conf docs

2024-09-17 Thread Daniel Gustafsson
> On 17 Sep 2024, at 11:24, Dagfinn Ilmari Mannsåker  wrote:

> I just noticed that the docs for pg_service.conf
> (https://www.postgresql.org/docs/current/libpq-pgservice.html) don't
> mention the actual key word to use in the libpq connection string until
> the example in the last sentence, but it mentions the env var in the
> first paragraph.  Here's a patch to make the key word equally prominent.

Fair point, that seems like a good change, will apply.

--
Daniel Gustafsson





Re: Pgoutput not capturing the generated columns

2024-09-17 Thread Peter Smith
Review comments for v31-0001.

(I tried to give only new comments, but there might be some overlap
with comments I previously made for v30-0001)

==
src/backend/catalog/pg_publication.c

1.
+
+ if (publish_generated_columns_given)
+ {
+ values[Anum_pg_publication_pubgencolumns - 1] =
BoolGetDatum(publish_generated_columns);
+ replaces[Anum_pg_publication_pubgencolumns - 1] = true;
+ }

nit - unnecessary whitespace above here.

==
src/backend/replication/pgoutput/pgoutput.c

2. prepare_all_columns_bms

+ /* Iterate the cols until generated columns are found. */
+ cols = bms_add_member(cols, i + 1);

How does the comment relate to the statement that follows it?

~~~

3.
+ * Skip generated column if pubgencolumns option was not
+ * specified.

nit - /pubgencolumns option/publish_generated_columns parameter/

==
src/bin/pg_dump/pg_dump.c

4.
getPublications:

nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)

==
src/bin/pg_dump/pg_dump.h

5.
+ bool pubgencolumns;
 } PublicationInfo;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
vsrc/bin/psql/describe.c

6.
  bool has_pubviaroot;
+ bool has_pubgencol;

nit - /has_pubgencol/has_pubgencols/ (plural consistency)

==
src/include/catalog/pg_publication.h

7.
+ /* true if generated columns data should be published */
+ bool pubgencolumns;
 } FormData_pg_publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

~~~

8.
+ bool pubgencolumns;
  PublicationActions pubactions;
 } Publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
src/test/regress/sql/publication.sql

9.
+-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
+\dRp+ pub1
+
+CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
+\dRp+ pub2

9a.
nit - Use lowercase for the parameters.

~

9b.
nit - Fix the comment to say what the test is actually doing:
"Test the publication 'publish_generated_columns' parameter enabled or disabled"

==
src/test/subscription/t/031_column_list.pl

10.
Later I think you should add another test here to cover the scenario
that I was discussing with Sawada-San -- e.g. when there are 2
publications for the same table subscribed by just 1 subscription but
having different values of the 'publish_generated_columns' for the
publications.
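
For example, something along these lines (just a sketch of the scenario; table,
publication, and subscription names are placeholders, and the connection string
is elided):

-- publisher
CREATE TABLE tab_gen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub_gen_on  FOR TABLE tab_gen WITH (publish_generated_columns = true);
CREATE PUBLICATION pub_gen_off FOR TABLE tab_gen WITH (publish_generated_columns = false);

-- subscriber, subscribing to both publications at once
CREATE TABLE tab_gen (a int, b int);
CREATE SUBSCRIPTION sub_gen CONNECTION '...' PUBLICATION pub_gen_on, pub_gen_off;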

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
 
  
   Generated columns may be skipped during logical replication according to 
the
-  CREATE PUBLICATION option
-  
+  CREATE PUBLICATION parameter
+  
   publish_generated_columns.
  
 
diff --git a/doc/src/sgml/ref/create_publication.sgml 
b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 

 
-   
+   
 publish_generated_columns 
(boolean)
 
  
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
   associated with the publication should be replicated.
   The default is false.
  
- 
-  This option is only available for replicating generated column data 
from the publisher
-  to a regular, non-generated column in the subscriber.
- 
- 
- This parameter can only be set true if 
copy_data is
- set to false.
- 
 

 
diff --git a/src/backend/catalog/pg_publication.c 
b/src/backend/catalog/pg_publication.c
index 272b6a1..7ebb851 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -999,7 +999,7 @@ GetPublication(Oid pubid)
pub->pubactions.pubdelete = pubform->pubdelete;
pub->pubactions.pubtruncate = pubform->pubtruncate;
pub->pubviaroot = pubform->pubviaroot;
-   pub->pubgencolumns = pubform->pubgencolumns;
+   pub->pubgencols = pubform->pubgencols;
 
ReleaseSysCache(tup);
 
@@ -1211,7 +1211,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
if (att->attisdropped)
continue;
 
-   if (att->attgenerated && !pub->pubgencolumns)
+   if (att->attgenerated && !pub->pubgencols)
continue;
 
attnums[nattnums++] = att->attnum;
diff --git a/src/backend/commands/publicationcmds.c 
b/src/backend/commands/publicationcmds.c
index 6242a09..0129db1 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/sr

Re: SQL:2011 application time

2024-09-17 Thread Peter Eisentraut

On 05.09.24 14:09, Peter Eisentraut wrote:

On 07.08.24 22:54, Paul Jungwirth wrote:
Here are some fixes based on outstanding feedback (some old some new). 


I have studied your patches v39-0001 through v39-0004, which correspond 
to what had been reverted plus the new empty range check plus various 
minor fixes.  This looks good to me now, so I propose to go ahead with 
that.


Btw., in your 0003 you point out that this prevents using the WITHOUT 
OVERLAPS functionality for non-range types.  But I think this could be 
accomplished by adding an "is empty" callback as a support function or 
something like that.  I'm not suggesting to do that here, but it might 
be worth leaving a comment about that possibility.


I have committed these, as explained here.

I look forward to an updated patch set from you to review the "FOR 
PORTION OF" patches next.






Re: Add contrib/pg_logicalsnapinspect

2024-09-17 Thread shveta malik
On Tue, Sep 17, 2024 at 10:46 AM David G. Johnston
 wrote:
>
>
>
> On Monday, September 16, 2024, shveta malik  wrote:
>>
>> On Tue, Sep 17, 2024 at 10:18 AM shveta malik  wrote:
>> >
>> > Thanks for addressing the comments. I have not started reviewing v4
>> > yet, but here are few more comments on v3:
>> >
>>
>> I just noticed that when we pass NULL input, both the new functions
>> give 1 row as output, all cols as NULL:
>>
>> newdb1=# SELECT * FROM pg_get_logical_snapshot_meta(NULL);
>>  magic | checksum | version
>> ---+--+-
>>|  |
>>
>> (1 row)
>>
>> Similar behavior with pg_get_logical_snapshot_info(). While the
>> existing 'pg_ls_logicalsnapdir' function gives this error, which looks
>> more meaningful:
>>
>> newdb1=# select * from pg_ls_logicalsnapdir(NULL);
>> ERROR:  function pg_ls_logicalsnapdir(unknown) does not exist
>> LINE 1: select * from pg_ls_logicalsnapdir(NULL);
>> HINT:  No function matches the given name and argument types. You
>> might need to add explicit type casts.
>>
>>
>> Shouldn't the new functions have same behavior?
>
>
> No. Since the name pg_ls_logicalsnapdir has zero single-argument
> implementations, passing a null value as an argument is indeed an attempt to
> invoke a function signature that doesn’t exist.
>
> If there is exactly one single-input-argument function of the given name, the
> parser is going to cast the null literal to the data type of the single
> argument and invoke the function.  It will not and cannot be convinced to
> fail to find a matching function.

Okay, understood. Thanks for explaining.

> I can see an argument that they should produce an empty set instead of a 
> single all-null row, but the idea that they wouldn’t even be found is 
> contrary to a core design of the system.

Okay, the single all-null row behavior can be investigated if it comes under
this patch's scope. But I now see why an 'ERROR' is not a possibility here.

thanks
Shveta




Re: per backend I/O statistics

2024-09-17 Thread Nazir Bilal Yavuz
Hi,

On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
 wrote:
> On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
> > - The pgstat_reset_io_counter_internal() is called in the
> > pgstat_shutdown_hook(). This causes the stats_reset column showing the
> > termination time of the old backend when its proc num is reassigned to
> > a new backend.
>
> doh! Nice catch, thanks!
>
> And also new backends that are not re-using a previous "existing" process slot
> are getting the last reset time (if any). So the right place to fix this is in
> pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time 
> to
> 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to 
> be
> right "conceptually" speaking).

Thanks, I confirm that it is fixed.

You mentioned some refactoring upthread:

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
 wrote:
>
> - If we agree on the current proposal then I'll do  some refactoring in
> pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
> duplicated code (it's not done yet to ease the review).

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats? If you meant the same thing, please don't
mind it.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Psql meta-command conninfo+

2024-09-17 Thread Hunaid Sohail
Hi,

On Mon, Sep 16, 2024 at 8:31 PM Tom Lane  wrote:

> Alvaro Herrera  writes:
> > On 2024-Sep-16, Jim Jones wrote:
> >> * The value of "Current User" does not match the function current_user()
> >> --- as one might expcect. It is a little confusing, as there is no
> >> mention of "Current User" in the docs. In case this is the intended
> >> behaviour, could you please add it to the docs?
>
> > It is intended.  As Peter said[1], what we wanted was to display
> > client-side info, so PQuser() is the right thing to do.  Now maybe
> > "Current User" is not the perfect column header, but at least the
> > definition seems consistent with the desired end result.
>
> Seems like "Session User" would be closer to being accurate, since
> PQuser()'s result does not change when you do SET ROLE etc.
>
> > Now, I think
> > the current docs saying to look at session_user() are wrong, they should
> > point to the libpq docs for the function instead; something like "The
> > name of the current user, as returned by PQuser()" and so on.
>
> Sure, but this does not excuse choosing a misleading column name
> when there are better choices readily available.
>

Maybe we can rename "Current User" to "Authenticated User", as the previous
author did, because it is the user returned by PQuser().

For the "Session User", I believe it is working as expected, since
session_user can be changed with SET SESSION AUTHORIZATION.

```
$ bin/psql "port=5430 sslmode=disable dbname=postgres" -x -h localhost

postgres=# \conninfo+
Connection Information
-[ RECORD 1 ]+--
Database | postgres
Current User | hunaid
Session User | hunaid
Host | localhost
Host Address | 127.0.0.1
Port | 5430
Protocol Version | 3
SSL Connection   | false
GSSAPI Authenticated | false
Client Encoding  | UTF8
Server Encoding  | UTF8
Backend PID  | 1337

postgres=# set SESSION AUTHORIZATION postgres;
SET
postgres=# \conninfo+
Connection Information
-[ RECORD 1 ]+--
Database | postgres
Current User | hunaid
Session User | postgres
Host | localhost
Host Address | 127.0.0.1
Port | 5430
Protocol Version | 3
SSL Connection   | false
GSSAPI Authenticated | false
Client Encoding  | UTF8
Server Encoding  | UTF8
Backend PID  | 1337
```
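
For comparison, the server-side view in that same session after the SET SESSION
AUTHORIZATION (a quick sketch; output trimmed):

```
postgres=# SELECT current_user, session_user;
 current_user | session_user
--------------+--------------
 postgres     | postgres
(1 row)
```

i.e. both server-side identifiers now report postgres, while "Current User"
above still shows the name libpq authenticated with (PQuser()).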

We can update the docs as follows:
Authenticated User: The name of the user returned by PQuser().
Session User: The session user's name.

Opinions?

Regards,
Hunaid Sohail


Re: Parallel workers stats in pg_stat_database

2024-09-17 Thread Benoit Lobréau

Here is an updated patch fixing the aforementioned problems
with tests and vacuum stats.

--
Benoit Lobréau
Consultant
http://dalibo.comFrom 1b52f5fb4e977599b8925c69193d31148042ca7d Mon Sep 17 00:00:00 2001
From: benoit 
Date: Wed, 28 Aug 2024 02:27:13 +0200
Subject: [PATCH] Adds four parallel workers stat columns to pg_stat_database

* parallel_workers_planned
* parallel_workers_launched
* parallel_maint_workers_planned
* parallel_maint_workers_launched
---
 doc/src/sgml/monitoring.sgml  | 36 +++
 src/backend/access/brin/brin.c|  4 +++
 src/backend/access/nbtree/nbtsort.c   |  4 +++
 src/backend/catalog/system_views.sql  |  4 +++
 src/backend/commands/vacuumparallel.c |  5 +++
 src/backend/executor/execMain.c   |  7 
 src/backend/executor/execUtils.c  |  3 ++
 src/backend/executor/nodeGather.c |  3 ++
 src/backend/executor/nodeGatherMerge.c|  3 ++
 src/backend/utils/activity/pgstat_database.c  | 36 +++
 src/backend/utils/adt/pgstatfuncs.c   | 12 +++
 src/include/catalog/pg_proc.dat   | 20 +++
 src/include/nodes/execnodes.h |  3 ++
 src/include/pgstat.h  |  7 
 src/test/regress/expected/rules.out   |  4 +++
 src/test/regress/expected/select_parallel.out | 27 ++
 src/test/regress/expected/vacuum_parallel.out | 19 ++
 src/test/regress/sql/select_parallel.sql  | 15 
 src/test/regress/sql/vacuum_parallel.sql  | 11 ++
 19 files changed, 223 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 55417a6fa9..8c4b11c11d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3508,6 +3508,42 @@ description | Waiting for a newly initialized WAL file 
to reach durable storage
   
  
 
+ 
+  
+   parallel_workers_planned bigint
+  
+  
+   Number of parallel workers planned by queries on this database
+  
+ 
+
+ 
+  
+   parallel_workers_launched bigint
+  
+  
+   Number of parallel workers obtained by queries on this database
+  
+ 
+
+ 
+  
+   parallel_maint_workers_planned 
bigint
+  
+  
+   Number of parallel workers planned by utilities on this database
+  
+ 
+
+ 
+  
+   parallel_maint_workers_launched 
bigint
+  
+  
+   Number of parallel workers obtained by utilities on this database
+  
+ 
+
  
   
stats_reset timestamp with time 
zone
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 6467bed604..9eceb87b52 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2540,6 +2540,10 @@ _brin_end_parallel(BrinLeader *brinleader, 
BrinBuildState *state)
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(brinleader->pcxt);
 
+   pgstat_update_parallel_maint_workers_stats(
+   (PgStat_Counter) brinleader->pcxt->nworkers_to_launch,
+   (PgStat_Counter) brinleader->pcxt->nworkers_launched);
+
/*
 * Next, accumulate WAL usage.  (This must wait for the workers to 
finish,
 * or we might get incomplete data.)
diff --git a/src/backend/access/nbtree/nbtsort.c 
b/src/backend/access/nbtree/nbtsort.c
index f5d7b3b0c3..232e1a0942 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1611,6 +1611,10 @@ _bt_end_parallel(BTLeader *btleader)
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
 
+   pgstat_update_parallel_maint_workers_stats(
+   (PgStat_Counter) btleader->pcxt->nworkers_to_launch,
+   (PgStat_Counter) btleader->pcxt->nworkers_launched);
+
/*
 * Next, accumulate WAL usage.  (This must wait for the workers to 
finish,
 * or we might get incomplete data.)
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 19cabc9a47..48bf9e5535 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1073,6 +1073,10 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_sessions_abandoned(D.oid) AS sessions_abandoned,
 pg_stat_get_db_sessions_fatal(D.oid) AS sessions_fatal,
 pg_stat_get_db_sessions_killed(D.oid) AS sessions_killed,
+pg_stat_get_db_parallel_workers_planned(D.oid) as 
parallel_workers_planned,
+pg_stat_get_db_parallel_workers_launched(D.oid) as 
parallel_workers_launched,
+pg_stat_get_db_parallel_maint_workers_planned(D.oid) as 
parallel_maint_workers_planned,
+pg_stat_get_db_parallel_maint_workers_launched(D.oid) as 
parallel_maint_workers_launched,
 pg_stat_get_db_stat_reset_time(D.oid) AS stats_re

Re: Exit walsender before confirming remote flush in logical replication

2024-09-17 Thread Greg Sabino Mullane
Thanks for everyone's work on this; I am very interested in it getting into
a release. What is the status of this?

My use case is Patroni - when it needs to do a failover, it shuts down the
primary. However, large transactions can cause it to stay in the "shutting
down" state for a long time, which means your entire HA system is now
non-functional. I like the idea of a new flag. I'll test this out soon if
the original authors want to make a rebased patch. This thread is old, so
if I don't hear back in a bit, I'll create and test a new one myself. :)

Cheers,
Greg


Re: per backend I/O statistics

2024-09-17 Thread Bertrand Drouvot
Hi,

On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
>  wrote:
> > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
> > > - The pgstat_reset_io_counter_internal() is called in the
> > > pgstat_shutdown_hook(). This causes the stats_reset column showing the
> > > termination time of the old backend when its proc num is reassigned to
> > > a new backend.
> >
> > doh! Nice catch, thanks!
> >
> > And also new backends that are not re-using a previous "existing" process 
> > slot
> > are getting the last reset time (if any). So the right place to fix this is 
> > in
> > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset 
> > time to
> > 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just 
> > to be
> > right "conceptually" speaking).
> 
> Thanks, I confirm that it is fixed.

Thanks!

> You mentioned some refactoring upthread:
> 
> On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
>  wrote:
> >
> > - If we agree on the current proposal then I'll do  some refactoring in
> > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
> > duplicated code (it's not done yet to ease the review).
> 
> Could we remove pg_stat_get_my_io() completely and use
> pg_stat_get_backend_io() with the current backend's pid to get the
> current backend's stats?

The reason why I keep pg_stat_get_my_io() is that (as mentioned in [1]) the
statistics snapshot is built for "my backend stats" (meaning it depends on the
stats_fetch_consistency value). I can see use cases for that.

OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency were set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not built in each backend to copy all the other backends' stats,
as 1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while taking care of stats_fetch_consistency) and 2/ that
could be memory expensive depending on the number of max connections.

So I think it's better to keep both functions, as they behave differently.

Thoughts?

> If you meant the same thing, please don't
> mind it.

What I meant to say is to try to remove the duplicate code that we can find in
those 3 functions (say, creating a new function that would contain the duplicate
code and making use of this new function in the 3 of them). I did not look at it
in detail yet.

[1]: 
https://www.postgresql.org/message-id/ZtXR%2BCtkEVVE/LHF%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: 
https://www.postgresql.org/message-id/ZtsZtaRza9bFFeF8%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: per backend I/O statistics

2024-09-17 Thread Nazir Bilal Yavuz
Hi,

On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
 wrote:
> On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
> > Could we remove pg_stat_get_my_io() completely and use
> > pg_stat_get_backend_io() with the current backend's pid to get the
> > current backend's stats?
>
> The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), 
> the
> statistics snapshot is build for "my backend stats" (means it depends of the
> stats_fetch_consistency value). I can see use cases for that.
>
> OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
> none (each execution re-fetches counters from shared memory) (see [2]). 
> Indeed,
> the snapshot is not build in each backend to copy all the others backends 
> stats,
> as 1/ I think that there is no use case (there is no need to get others 
> backends
> I/O statistics while taking care of the stats_fetch_consistency) and 2/ that
> could be memory expensive depending of the number of max connections.
>
> So I think it's better to keep both functions as they behave differently.
>
> Thoughts?

Yes, that is correct. Sorry, you had already explained it and I had
agreed with it, but I forgot.

> > If you meant the same thing, please don't
> > mind it.
>
> What I meant to say is to try to remove the duplicate code that we can find in
> those 3 functions (say creating a new function that would contain the 
> duplicate
> code and make use of this new function in the 3 others). Did not look at it in
> details yet.

I got it, thanks for the explanation.

--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: per backend I/O statistics

2024-09-17 Thread Bertrand Drouvot
Hi,

On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
>  wrote:
> > So I think it's better to keep both functions as they behave differently.
> >
> > Thoughts?
> 
> Yes, that is correct. Sorry, you already had explained it and I had
> agreed with it but I forgot.

No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: jsonb_strip_nulls with arrays?

2024-09-17 Thread Andrew Dunstan


On 2024-09-17 Tu 5:26 AM, Florents Tselai wrote:


Currently:


jsonb_strip_nulls ( jsonb ) → jsonb

Deletes all object fields that have null values from the given JSON
value, recursively. Null values that are not object fields are untouched.



> Null values that are not object fields are untouched.


Can we revisit this and make it work with arrays, too?

Tbh, at first sight that looked like the expected behavior for me.

That is strip nulls from arrays as well.


This has been available since 9.5 and iiuc predates lots of the jsonb 
array work.




I don't think that's a great idea. Removing an object field which has a 
null value shouldn't have any effect on the surrounding data, nor really 
any on other operations (If you try to get the value of the missing 
field it should give you back null). But removing a null array member 
isn't like that at all - unless it's the trailing member of the array it 
will renumber all the succeeding array members.


And I don't think we should be changing the behaviour of a function, 
that people might have been relying on for the better part of a decade.





In practice, though, whenever jsonb_build_array is used (especially 
with jsonpath),


a few nulls do appear in the resulting array most of the times,

Currently, there’s no expressive way to remove this.


We could also have jsonb_array_strip_nulls(jsonb) as well



We could, if we're going to do anything at all in this area. Another 
possibility would be to provide a second optional parameter for 
json{b}_strip_nulls. That's probably a better way to go.
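
To make that concrete, a purely hypothetical shape for it (this parameter does
not exist today; the name and default are made up for illustration):

-- hypothetical: jsonb_strip_nulls(target jsonb, strip_in_arrays boolean DEFAULT false)
SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}', true);
-- would be expected to return {"b": [1, 2]}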



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Regression tests fail with tzdata 2024b

2024-09-17 Thread Tom Lane
Wolfgang Walther  writes:
> The core regression tests need to be run with a timezone that tests 
> special cases in the timezone handling code. But that might not be true 
> for extensions - all they want could be a stable output across major and 
> minor versions of postgres and versions of tzdata. It could be helpful 
> to set pg_regress' timezone to UTC, for example?

I would not recommend that choice.  It would mask simple errors such
as failing to apply the correct conversion between timestamptz and
timestamp values.  Also, if you have test cases that are affected by
this issue at all, you probably have a need/desire to test things like
DST transitions.

regards, tom lane




Re: A starter task

2024-09-17 Thread Andrew Dunstan


On 2024-09-15 Su 6:17 PM, sia kc wrote:



About inlining not sure how it is done with gmail. Maybe should use 
another email client.



Click the three dots with the tooltip "Show trimmed content". Then you 
can scroll down and put your reply inline. (Personally I detest the 
Gmail web interface, and use a different MUA, but you can do this even 
with the Gmail web app.)



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind

2024-09-17 Thread Alexander Lakhin

Hello Michael and Anthonin,

22.08.2024 10:33, Michael Paquier wrote:

Looks OK to me.  I have spent more time double-checking the whole, and
it looks like we're there, so applied.  Now let's play with it in more
regression tests.  Note that the refactoring patch has been merged
with the original one in a single commit.


Please look at an assertion failure, caused by \bind_named:
regression=# SELECT $1 \parse s
\bind_named s

regression=# \bind_named
\bind_named: missing required argument
regression=# 1 \g
psql: common.c:1501: ExecQueryAndProcessResults: Assertion `pset.stmtName != 
((void *)0)' failed.

Best regards,
Alexander




Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-17 Thread Peter Geoghegan
On Mon, Sep 16, 2024 at 3:13 PM Peter Geoghegan  wrote:
> I agree with your approach, but I'm concerned about it causing
> confusion inside _bt_parallel_done. And so I attach a v2 revision of
> your bug fix. v2 adds a check that nails that down, too.

Pushed this just now.

Thanks
-- 
Peter Geoghegan




RE: AIX support

2024-09-17 Thread Srirama Kucherlapati
> Do you still need mkldexport.sh? Surely there's a better way to do that
> in year 2024. Some quick googling says there's a '-bexpall' option to
> 'ld', which kind of sounds like what we want. Will that work? How do
> other programs do this?

We have noticed a couple of caveats with these flags (-bexpall/-bexpfull) in other
open-source tools on AIX.  These options export too many symbols, causing
problems because a shared library may re-export symbols from another library,
leading to confused dependencies and duplicate symbols.




There has been a similar discussion about these flags in CMake:

https://gitlab.kitware.com/cmake/cmake/-/issues/19163





Also, I tried a sample program to verify this, as shown below:



>> cat foo.c

#include <stdio.h>
#include <string.h>

int func1()
{
    /* str1 needs enough room to hold the concatenated result */
    char str1[32] = "Hello ", str2[] = "world! ";

    strcat(str1, str2);
    puts(str1);
    return 0;
}



>> gcc -c foo.c -o foo.o
>> gcc -shared -Wl,-bexpall -o foo.so foo.o
>> dump -Tov foo.so

foo.so:

***Object Module Header***
# Sections  Symbol Ptr  # Symbols   Opt Hdr Len Flags
 4  0x0d8812072 0x3002
Flags=( EXEC DYNLOAD SHROBJ DEP_SYSTEM )
Timestamp = "Sep 17 10:17:35 2024"
Magic = 0x1df  (32-bit XCOFF)

***Optional Header***
Tsize   Dsize   Bsize   Tstart  Dstart
0x0548  0x010c  0x0004  0x1128  0x2670

SNloaderSNentry SNtext  SNtoc   SNdata
0x0004  0x  0x0001  0x0002  0x0002

TXTalignDATAalign   TOC vstamp  entry
0x0005  0x0004  0x2750  0x0001  0x

maxSTACKmaxDATA SNbss   magic   modtype
0x  0x  0x0003  0x010bRE

***Loader Section***

***Loader Symbol Table Information***
[Index]  Value  Scn IMEX Sclass   Type   IMPid Name

[0] 0x268c.data  RW SECdef[noIMid] __rtinit
[1] 0xundef  IMP DS EXTref libgcc_s.a(shr.o) __cxa_finalize
[2] 0xundef  IMP DS EXTref libgcc_s.a(shr.o) _GLOBAL__AIXI_shr_o
[3] 0xundef  IMP DS EXTref libgcc_s.a(shr.o) _GLOBAL__AIXD_shr_o
[4] 0xundef  IMP DS EXTref   libc.a(shr.o) __strtollmax
[5] 0xundef  IMP DS EXTref   libc.a(shr.o) puts
[6] 0x26f4.data  EXP DS   Ldef[noIMid] __init_aix_libgcc_cxa_atexit
[7] 0x2724.data  EXP DS   Ldef[noIMid] _GLOBAL__AIXI_foo_so
[8] 0x2730.data  EXP DS   Ldef[noIMid] _GLOBAL__AIXD_foo_so
>>[9] 0x273c.data  EXP DS SECdef[noIMid] strcat
[10]0x2744.data  EXP DS   Ldef[noIMid] func1

The code makes use of strcat from libc but re-exports the symbol (because of
-bexpall).





As of now, due to the limitations of these flags (-bexpall / -bexpfull?), the
solution here is to continue to extract the symbols from the object files and
use that export file as part of building the shared library (i.e., continue to
use the mkldexport.sh script to generate the export symbols).



Thanks,
Sriram.




Re: Conflict detection for update_deleted in logical replication

2024-09-17 Thread Masahiko Sawada
On Mon, Sep 16, 2024 at 11:53 PM Amit Kapila  wrote:
>
> On Tue, Sep 17, 2024 at 6:08 AM Masahiko Sawada  wrote:
> >
> > On Fri, Sep 13, 2024 at 12:56 AM shveta malik  
> > wrote:
> > >
> > > On Fri, Sep 13, 2024 at 11:38 AM Amit Kapila  
> > > wrote:
> > > >
> > > > > >
> > > > > > So in brief, this solution is only for bidirectional setup? For 
> > > > > > non-bidirectional,
> > > > > > feedback_slots is non-configurable and thus irrelevant.
> > > > >
> > > > > Right.
> > > > >
> > > >
> > > > One possible idea to address the non-bidirectional case raised by
> > > > Shveta is to use a time-based cut-off to remove dead tuples. As
> > > > mentioned earlier in my email [1], we can define a new GUC parameter
> > > > say vacuum_committs_age which would indicate that we will allow rows
> > > > to be removed only if the modified time of the tuple as indicated by
> > > > committs module is greater than the vacuum_committs_age. We could keep
> > > > this parameter a table-level option without introducing a GUC as this
> > > > may not apply to all tables. I checked and found that some other
> > > > replication solutions like GoldenGate also allowed similar parameters
> > > > (tombstone_deletes) to be specified at table level [2]. The other
> > > > advantage of allowing it at table level is that it won't hamper the
> > > > performance of hot-pruning or vacuum in general. Note, I am careful
> > > > here because to decide whether to remove a dead tuple or not we need
> > > > to compare its committs_time both during hot-pruning and vacuum.
> > >
> > > +1 on the idea,
> >
> > I agree that this idea is much simpler than the idea originally
> > proposed in this thread.
> >
> > IIUC vacuum_committs_age specifies a time rather than an XID age.
> >
>
> Your understanding is correct that vacuum_committs_age specifies a time.
>
> >
> > But
> > how can we implement it? If it ends up affecting the vacuum cutoff, we
> > should be careful not to end up with the same result of
> > vacuum_defer_cleanup_age that was discussed before[1]. Also, I think
> > the implementation needs not to affect the performance of
> > ComputeXidHorizons().
> >
>
> I haven't thought about the implementation details yet but I think
> during pruning (for example in heap_prune_satisfies_vacuum()), apart
> from checking if the tuple satisfies
> HeapTupleSatisfiesVacuumHorizon(), we should also check if the tuple's
> committs is greater than configured vacuum_committs_age (for the
> table) to decide whether tuple can be removed.

Sounds very costly. I think we need to do performance tests. Even if
the vacuum gets slower only on the particular table having the
vacuum_committs_age setting, it would affect overall autovacuum
performance. Also, it would affect HOT pruning performance.
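
For reference, a minimal sketch of the extra check being discussed might look
like the following (purely illustrative, not actual PostgreSQL code; it
assumes a per-table vacuum_committs_age expressed in seconds and the usual
commit_ts and timestamp helpers):

/*
 * Sketch only: report a dead tuple as "too recent to remove" when the
 * transaction that last updated/deleted it committed less than
 * vacuum_committs_age seconds ago.  Pruning/vacuum would check this in
 * addition to HeapTupleSatisfiesVacuumHorizon().
 */
static bool
tuple_too_recent_to_remove(HeapTuple tup, int vacuum_committs_age)
{
    TransactionId xmax = HeapTupleHeaderGetUpdateXid(tup->t_data);
    TimestampTz   committs;

    /* needs track_commit_timestamp; otherwise fall back to the normal rules */
    if (!TransactionIdGetCommitTsData(xmax, &committs, NULL))
        return false;

    return !TimestampDifferenceExceeds(committs, GetCurrentTimestamp(),
                                       vacuum_committs_age * 1000);
}

That extra commit_ts lookup per dead tuple is exactly the cost being worried
about here.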

>
> > > but IIUC this value doesn’t need to be significant; it
> > > can be limited to just a few minutes. The one which is sufficient to
> > > handle replication delays caused by network lag or other factors,
> > > assuming clock skew has already been addressed.
> >
> > I think that in a non-bidirectional case the value could need to be a
> > large number. Is that right?
> >
>
> As per my understanding, even for non-bidirectional cases, the value
> should be small. For example, in the case, pointed out by Shveta [1],
> where the updates from 2 nodes are received by a third node, this
> setting is expected to be small. This setting primarily deals with
> concurrent transactions on multiple nodes, so it should be small but I
> could be missing something.
>

I might be missing something but the scenario I was thinking of is
something below.

Suppose that we setup uni-directional logical replication between Node
A and Node B (e.g., Node A -> Node B) and both nodes have the same row
with key = 1:

Node A:
T1: UPDATE t SET val = 2 WHERE key = 1; (10:00 AM)
  -> This change is applied on Node B at 10:01 AM.

Node B:
T2: DELETE FROM t WHERE key = 1; (05:00 AM)

If a vacuum runs on Node B at 06:00 AM, the change of T1 coming from
Node A would raise an "update_missing" conflict. On the other hand, if
a vacuum runs on Node B at 11:00 AM, the change would raise an
"update_deleted" conflict. It looks like whether we detect an
"update_deleted" or an "update_missing" depends on the timing of
vacuum, and to avoid such a situation, we would need to set
vacuum_committs_age to more than 5 hours.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: AIO v2.0

2024-09-17 Thread Noah Misch
On Mon, Sep 16, 2024 at 01:51:42PM -0400, Andres Freund wrote:
> On 2024-09-16 07:43:49 -0700, Noah Misch wrote:
> > On Fri, Sep 06, 2024 at 03:38:16PM -0400, Andres Freund wrote:

> > Reattaching descriptors and memory in each child may work, or one could just
> > block io_method=io_uring under EXEC_BACKEND.
> 
> I think the latter option is saner

Works for me.

> > > +pgaio_uring_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
> > > +{
> 
> > > + if (ret == -EINTR)
> > > + {
> > > + elog(DEBUG3, "submit EINTR, nios: %d", num_staged_ios);
> > > + continue;
> > > + }
> >
> > Since io_uring_submit() is a wrapper around io_uring_enter(), this should 
> > also
> > retry on EAGAIN.  "man io_uring_enter" has:
> >
> > EAGAIN The kernel was unable to allocate memory for the request, or
> > otherwise ran out of resources to handle it. The application should wait
> > for some completions and try again.
> 
> Hm. I'm not sure that makes sense. We only allow a limited number of IOs to be
> in flight for each uring instance. That's different to a use of uring to
> e.g. wait for incoming network data on thousands of sockets, where you could
> have essentially unbounded amount of requests outstanding.
> 
> What would we wait for? What if we were holding a critical lock in that
> moment? Would it be safe to just block for some completions? What if there's
> actually no IO in progress?

I'd try the following.  First, scan for all IOs of all processes at
AHS_DEFINED and later, advancing them to AHS_COMPLETED_SHARED.  This might be
unsafe today, but discovering why it's unsafe likely will inform design beyond
EAGAIN returns.  I don't specifically know of a way it's unsafe.  Do just one
pass of that; there may be newer IOs in progress afterward.  If submit still
gets EAGAIN, sleep a bit and retry.  Like we do in pgwin32_open_handle(), fail
after a fixed number of iterations.  This isn't great if we hold a critical
lock, but it beats the alternative of PANIC on the first EAGAIN.
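
For illustration, that bounded-retry idea could look roughly like the sketch
below (names such as the ring variable and the retry limit are made up; it
only covers the EINTR/EAGAIN handling, not the completion-draining pass
described above):

    /*
     * Sketch: io_uring_submit() returns the number of submitted SQEs,
     * or a negative errno.
     */
    for (int attempts = 0;;)
    {
        int     ret = io_uring_submit(uring_ring);

        if (ret >= 0)
            break;              /* submitted */
        if (ret == -EINTR)
            continue;           /* interrupted, just retry */
        if (ret == -EAGAIN && attempts++ < 30)
        {
            /* kernel out of resources: wait a bit for completions to drain */
            pg_usleep(10000L);  /* 10ms */
            continue;
        }
        elog(PANIC, "io_uring_submit() failed: %s", strerror(-ret));
    }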

> > > +FileStartWriteV(struct PgAioHandle *ioh, File file,
> > > + int iovcnt, off_t offset,
> > > + uint32 wait_event_info)
> > > +{
> > > ...
> >
> > FileStartWriteV() gets to state AHS_PREPARED, so let's align with the state
> > name by calling it FilePrepareWriteV (or FileWriteVPrepare or whatever).
> 
> Hm - that doesn't necessarily seem right to me. I don't think the caller
> should assume that the IO will just be prepared and not already completed by
> the time FileStartWriteV() returns - we might actually do the IO
> synchronously.

Yes.  Even if it doesn't become synchronous IO, some other process may advance
the IO to AHS_COMPLETED_SHARED by the next wake-up of the process that defined
the IO.  Still, I think this shouldn't use the term "Start" while no state
name uses that term.  What else could remove that mismatch?

> > Is there any specific decision you'd like to settle before patch 6 exits
> > WIP?
> 
> Patch 6 specifically? That I really mainly kept separate for review - it

No.  I'll rephrase as "Is there any specific decision you'd like to settle
before the next cohort of patches exits WIP?"

> doesn't seem particularly interesting to commit it earlier than 7, or do you
> think differently?

No, I agree a lone commit of 6 isn't a win.  Roughly, the eight patches
6-9,12-15 could be a minimal attractive unit.  I've not thought through that
grouping much.

> In case you mean 6+7 or 6 to ~11, I can think of the following:
> 
> - I am worried about the need for bounce buffers for writes of checksummed
>   buffers. That quickly ends up being a significant chunk of memory,
>   particularly when using a small shared_buffers with a higher than default
>   number of connection. I'm currently hacking up a prototype that'd prevent us
>   from setting hint bits with just a share lock. I'm planning to start a
>   separate thread about that.

AioChooseBounceBuffers() limits usage to 256 blocks (2MB) per MaxBackends.
Doing better is nice, but I don't consider this a blocker.  I recommend
dealing with the worry by reducing the limit initially (128 blocks?).  Can
always raise it later.

> - The header split doesn't yet quite seem right yet

I won't have a strong opinion on that one.  The aio.c/aio_io.c split did catch
my attention.  I made a note to check it again once those files get header
comments.

> - I'd like to implement retries in the later patches, to make sure that it
>   doesn't have design implications

Yes, that's a blocker to me.

> - Worker mode needs to be able to automatically adjust the number of running
>   workers, I think - otherwise it's going to be too hard to tune.

Changing that later wouldn't affect much else, so I'd not consider it a
blocker.  (The worst case is that we think the initial AIO release will be a
loss for most users, so we wrap it in debug_ terminology like we did for
debug_io_direct.  I'm

Re: Using per-transaction memory contexts for storing decoded tuples

2024-09-17 Thread Masahiko Sawada
On Tue, Sep 17, 2024 at 2:06 AM Amit Kapila  wrote:
>
> On Mon, Sep 16, 2024 at 10:43 PM Masahiko Sawada  
> wrote:
> >
> > On Fri, Sep 13, 2024 at 3:58 AM Amit Kapila  wrote:
> > >
> > > On Thu, Sep 12, 2024 at 4:03 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > We have several reports that logical decoding uses memory much more
> > > > than logical_decoding_work_mem[1][2][3]. For instance in one of the
> > > > reports[1], even though users set logical_decoding_work_mem to
> > > > '256MB', a walsender process was killed by OOM because of using more
> > > > than 4GB memory.
> > > >
> > > > As per the discussion in these threads so far, what happened was that
> > > > there was huge memory fragmentation in rb->tup_context.
> > > > rb->tup_context uses GenerationContext with 8MB memory blocks. We
> > > > cannot free memory blocks until all memory chunks in the block are
> > > > freed. If there is a long-running transaction making changes, its
> > > > changes could be spread across many memory blocks and we end up not
> > > > being able to free memory blocks unless the long-running transaction
> > > > is evicted or completed. Since we don't account fragmentation, block
> > > > header size, nor chunk header size into per-transaction memory usage
> > > > (i.e. txn->size), rb->size could be less than
> > > > logical_decoding_work_mem but the actual allocated memory in the
> > > > context is much higher than logical_decoding_work_mem.
> > > >
> > >
> > > It is not clear to me how the fragmentation happens. Is it because of
> > > some interleaving transactions which are even ended but the memory
> > > corresponding to them is not released?
> >
> > In a generation context, we can free a memory block only when all
> > memory chunks there are freed. Therefore, individual tuple buffers are
> > already pfree()'ed but the underlying memory blocks are not freed.
> >
>
> I understood this part but didn't understand the cases leading to this
> problem. For example, if there is a large (and only) transaction in
> the system that allocates many buffers for change records during
> decoding, in the end, it should free memory for all the buffers
> allocated in the transaction. So, wouldn't that free all the memory
> chunks corresponding to the memory blocks allocated? My guess was that
> we couldn't free all the chunks because there were small interleaving
> transactions that allocated memory but didn't free it before the large
> transaction ended.

We haven't actually checked with the person who reported the problem,
so this is just a guess, but I think there were concurrent
transactions, including long-running INSERT transactions. For example,
suppose a transaction that inserts 10 million rows and many OLTP-like
(short) transactions are running at the same time. The scenario I
thought of was that one 8MB Generation Context Block contains 1MB of
the large insert transaction changes, and the other 7MB contains short
OLTP transaction changes. If there are just 256 such blocks, even
after all short-transactions have completed, the Generation Context
will have allocated 2GB of memory until we decode the commit record of
the large transaction, but rb->size will say 256MB.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Detailed release notes

2024-09-17 Thread Marcos Pegoraro
On Tue, Sep 17, 2024 at 05:12, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:

> Add backend support for injection points (Michael Paquier) [commit 1] [2]
> [3] [4]
>
I think this way would be fine.

And it would be good to have a target="_blank" on commit links so a new
window or tab would be opened instead of going back and forth.
This way you can open these 4 links and then navigate them to see
exactly what was done in every commit.

regards
Marcos


Re: [PATCH] WIP: replace method for jsonpath

2024-09-17 Thread David E. Wheeler
On Sep 16, 2024, at 18:39, Florents Tselai  wrote:

> Here’s an updated version of this patch.

Oh, nice function.

But a broader question for hackers: Is replace() specified in the SQL/JSON 
spec? If not, what’s the process for evaluating whether or not to add features 
not specified by the spec?

> As a future note:
> It’s worth noting that both this newly added jspItem and other ones like 
> (jpiDecimal, jpiString)
> use jspGetRightArg and jspGetLeftArg.
> left and right can be confusing if more complex methods are added in the 
> future.
> i.e. jsonpath methods with nargs>=3 .
> I was wondering if we’d like something like JSP_GETARG(n)

So far I think we have only functions defined by the spec, which tend to be 
unary or binary, so left and right haven’t been an issue.

Best,

David





Re: [PATCH] WIP: replace method for jsonpath

2024-09-17 Thread Florents Tselai



> On 17 Sep 2024, at 9:40 PM, David E. Wheeler  wrote:
> 
> On Sep 16, 2024, at 18:39, Florents Tselai  wrote:
> 
>> Here’s an updated version of this patch.
> 
> Oh, nice function.
> 
> But a broader question for hackers: Is replace() specified in the SQL/JSON 
> spec? If not, what’s the process for evaluating whether or not to add 
> features not specified by the spec?

That’s the first argument I was expecting, and it’s a valid one.

From a user’s perspective the answer is:
Why not?
The more text-processing facilities I have in jsonb,
The less back-and-forth-parentheses-fu I do,
The easier my life is.

From a PG gatekeeping perspective, it’s a more complicated issue.

It’s not part of the spec,
but I think the jsonb infrastructure in PG is really powerful already and we
can build on it,
and it can evolve into a superset DSL of jsonpath.

For example, apache/age have lift-and-shifted this infra and built another DSL 
(cypher) on top of it.

> 
>> As a future note:
>> It’s worth noting that both this newly added jspItem and other ones like 
>> (jpiDecimal, jpiString)
>> use jspGetRightArg and jspGetLeftArg.
>> left and right can be confusing if more complex methods are added in the 
>> future.
>> i.e. jsonpath methods with nargs>=3 .
>> I was wondering if we’d like something like JSP_GETARG(n)
> 
> So far I think we have only functions defined by the spec, which tend to be 
> unary or binary, so left and right haven’t been an issue.

If the answer to the Q above is: “we stick to the spec” then there’s no 
thinking about this.

But tbh, I’ve already started experimenting with other text methods:
$.strip() / trim() / upper() / lower(), etc.

Fallback scenario: make this an extension, but in a first pass I didn’t find 
any convenient hooks.
One has to create a whole new scanner, grammar etc.

> 
> Best,
> 
> David
> 





Re: query_id, pg_stat_activity, extended query protocol

2024-09-17 Thread Michael Paquier
On Wed, Sep 18, 2024 at 07:50:27AM +0900, Michael Paquier wrote:
> On Tue, Sep 17, 2024 at 05:01:18PM -0500, Sami Imseih wrote:
> > > Then, please see attached two lightly-updated patches. 0001 is for a
> > > backpatch down to v14. This is yours to force things in the exec and
> > > bind messages for all portal types, with the test (placed elsewhere in
> > > 14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
> > > up the tests of pg_stat_statements if one is not careful with the
> > > query ID reporting.
> > 
> > These 2 patches look good to me; except for the slight typo
> > In the commit message of 0002. "backpatch" instead of "backpatck".
> 
> Yes, I've noticed this one last Friday and fixed the typo in the
> commit log after sending the previous patch series.

So, I have applied 0001 down to 14, followed by 0002 on HEAD.

For the sake of completeness, I have tested all the five
PortalStrategys with the extended query protocol and with the sanity
checks of 0002 in place and compute_query_id=regress to force the
computations, so I'd like to think that we are pretty good now.

0002 is going to be interesting to see moving forward.  I am wondering
how existing out-of-core extensions will react on that and if it will
help catching up any issues.  So, let's see how the experiment goes
with HEAD on this side.  Perhaps we'll have to revert 0002 at the end,
or perhaps not...
--
Michael


signature.asc
Description: PGP signature


Re: Conflict detection for update_deleted in logical replication

2024-09-17 Thread Amit Kapila
On Tue, Sep 17, 2024 at 11:24 PM Masahiko Sawada  wrote:
>
> On Mon, Sep 16, 2024 at 11:53 PM Amit Kapila  wrote:
> >
> > On Tue, Sep 17, 2024 at 6:08 AM Masahiko Sawada  
> > wrote:
> >
> > I haven't thought about the implementation details yet but I think
> > during pruning (for example in heap_prune_satisfies_vacuum()), apart
> > from checking if the tuple satisfies
> > HeapTupleSatisfiesVacuumHorizon(), we should also check if the tuple's
> > committs is greater than configured vacuum_committs_age (for the
> > table) to decide whether tuple can be removed.
>
> Sounds very costly. I think we need to do performance tests. Even if
> the vacuum gets slower only on the particular table having the
> vacuum_committs_age setting, it would affect overall autovacuum
> performance. Also, it would affect HOT pruning performance.
>

Agreed that we should do some performance testing and additionally
think of any better way to implement. I think the cost won't be much
if the tuples to be removed are from a single transaction because the
required commit_ts information would be cached but when the tuples are
from different transactions, we could see a noticeable impact. We need
to test to say anything concrete on this.

> >
> > > > but IIUC this value doesn’t need to be significant; it
> > > > can be limited to just a few minutes. The one which is sufficient to
> > > > handle replication delays caused by network lag or other factors,
> > > > assuming clock skew has already been addressed.
> > >
> > > I think that in a non-bidirectional case the value could need to be a
> > > large number. Is that right?
> > >
> >
> > As per my understanding, even for non-bidirectional cases, the value
> > should be small. For example, in the case, pointed out by Shveta [1],
> > where the updates from 2 nodes are received by a third node, this
> > setting is expected to be small. This setting primarily deals with
> > concurrent transactions on multiple nodes, so it should be small but I
> > could be missing something.
> >
>
> I might be missing something but the scenario I was thinking of is
> something below.
>
> Suppose that we setup uni-directional logical replication between Node
> A and Node B (e.g., Node A -> Node B) and both nodes have the same row
> with key = 1:
>
> Node A:
> T1: UPDATE t SET val = 2 WHERE key = 1; (10:00 AM)
>   -> This change is applied on Node B at 10:01 AM.
>
> Node B:
> T2: DELETE FROM t WHERE key = 1; (05:00 AM)
>
> If a vacuum runs on Node B at 06:00 AM, the change of T1 coming from
> Node A would raise an "update_missing" conflict. On the other hand, if
> a vacuum runs on Node B at 11:00 AM, the change would raise an
> "update_deleted" conflict. It looks like whether we detect an
> "update_deleted" or an "update_missing" depends on the timing of
> vacuum, and to avoid such a situation, we would need to set
> vacuum_committs_age to more than 5 hours.
>

Yeah, in this case, it would detect a different conflict (if we don't
set vacuum_committs_age to greater than 5 hours) but as per my
understanding, the primary purpose of conflict detection and
resolution is to avoid data inconsistency in a bi-directional setup.
Assume, in the above case it is a bi-directional setup, then we want
to have the same data in both nodes. Now, if there are other cases
like the one you mentioned that require to detect the conflict
reliably than I agree this value could be large and probably not the
best way to achieve it. I think we can mention in the docs that the
primary purpose of this is to achieve data consistency among
bi-directional kind of setups.

Having said that even in the above case, the result should be the same
whether the vacuum has removed the row or not. Say, if the vacuum has
not yet removed the row (due to vacuum_committs_age or otherwise) then
also because the incoming update has a later timestamp, we will
convert the update to insert as per last_update_wins resolution
method, so the conflict will be considered as update_missing. And,
say, the vacuum has removed the row and the conflict detected is
update_missing, then also we will convert the update to insert. In
short, if UPDATE has lower commit-ts, DELETE should win and if UPDATE
has higher commit-ts, UPDATE should win.

So, we can expect data consistency in bidirectional cases and expect a
deterministic behavior in other cases (e.g. the final data in a table
does not depend on the order of applying the transactions from other
nodes).

-- 
With Regards,
Amit Kapila.




Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-09-17 Thread Tom Lane
Егор Чиндяскин  writes:
> This query does not expect that the test database may already contain some 
> information about the custom user that ran test_pg_dump-running.

I'm perfectly content to reject this as being an abuse of the test
case.  Our TAP tests are built on the assumption that they use
databases created within the test case.  Apparently, you've found a
way to use the meson test infrastructure to execute a TAP test in
the equivalent of "make installcheck" rather than "make check" mode.
I am unwilling to buy into the proposition that our TAP tests should
be proof against doing that after making arbitrary changes to the
database's initial state.  If anything, the fact that this is possible
is a bug in our meson scripts.

regards, tom lane




Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

2024-09-17 Thread nikhil raj
Hi All,

I hope you're doing well.

I'm writing to kindly ask whether there is a bug tracker ID or any
reference number associated with this issue; if there is, I would
appreciate it if you could share it with me.

Thank you for your time and assistance. Please let me know if there's any
additional information you need from me.

Best regards,

Nikhil

On Fri, 30 Aug, 2024, 2:23 am Tom Lane,  wrote:

> Richard Guo  writes:
> > On Thu, Aug 29, 2024 at 4:47 AM Tom Lane  wrote:
> >> In the meantime, I think this test case is mighty artificial,
> >> and it wouldn't bother me any to just take it out again for the
> >> time being.
>
> > Yeah, I think we can remove the 't1.two+t2.two' test case if we go
> > with your proposed patch to address this performance regression.
>
> Here's a polished-up patchset for that.  I made the memoize test
> removal a separate patch because (a) it only applies to master
> and (b) it seems worth calling out as something we might be able
> to revert later.
>
> I found one bug in the draft patch: add_nulling_relids only processes
> Vars of level zero, so we have to apply it before not after adjusting
> the Vars' levelsup.  An alternative could be to add a levelsup
> parameter to add_nulling_relids, but that seemed like unnecessary
> complication.
>
> regards, tom lane
>
>


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-09-17 Thread shveta malik
On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
>
> Please find the attached v46 patch having changes for the above review
> comments and your test review comments and Shveta's review comments.
>

Thanks for addressing comments.

Is there a reason that we don't support this invalidation on hot
standby for non-synced slots? Shouldn't we support this time-based
invalidation there too just like other invalidations?

thanks
Shveta




Re: Conflict Detection and Resolution

2024-09-17 Thread vignesh C
On Thu, 12 Sept 2024 at 14:03, Ajin Cherian  wrote:
>
> On Tue, Sep 3, 2024 at 7:42 PM vignesh C  wrote:
>
> On Fri, 30 Aug 2024 at 11:01, Nisha Moond  
> wrote:
> >
> > Here is the v11 patch-set. Changes are:
>
> 1) This command crashes:
> ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR NULL;
> #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
> #1  0x55c67270600a in ResetConflictResolver (subid=16404,
> conflict_type=0x0) at conflict.c:744
> #2  0x55c67247e0c3 in AlterSubscription (pstate=0x55c6748ff9d0,
> stmt=0x55c67497dfe0, isTopLevel=true) at subscriptioncmds.c:1664
>
> +   | ALTER SUBSCRIPTION name RESET CONFLICT
> RESOLVER FOR conflict_type
> +   {
> +   AlterSubscriptionStmt *n =
> +   
> makeNode(AlterSubscriptionStmt);
> +
> +   n->kind =
> ALTER_SUBSCRIPTION_RESET_CONFLICT_RESOLVER;
> +   n->subname = $3;
> +   n->conflict_type = $8;
> +   $$ = (Node *) n;
> +   }
> +   ;
> +conflict_type:
> +   Sconst
>  { $$ = $1; }
> +   | NULL_P
>  { $$ = NULL; }
> ;
>
> Maybe conflict_type should be changed to:
> +conflict_type:
> +   Sconst
>  { $$ = $1; }
> ;
>
>
> Fixed.
>

Few comments:
1) This should be in  (fout->remoteVersion >= 18) check to support
dumping backward compatible server objects, else dump with older
version will fail:
+   /* Populate conflict type fields using the new query */
+   confQuery = createPQExpBuffer();
+   appendPQExpBuffer(confQuery,
+ "SELECT confrtype,
confrres FROM pg_catalog.pg_subscription_conflict "
+ "WHERE confsubid =
%u;", subinfo[i].dobj.catId.oid);
+   confRes = ExecuteSqlQuery(fout, confQuery->data,
PGRES_TUPLES_OK);
+
+   ntuples = PQntuples(confRes);
+   for (j = 0; j < ntuples; j++)

2) Can we check and throw an error before the warning is logged in
this case as it seems strange to throw a warning first and then an
error for the same track_commit_timestamp configuration:
postgres=# create subscription sub1 connection ... publication pub1
conflict resolver (insert_exists = 'last_update_wins');
WARNING:  conflict detection and resolution could be incomplete due to
disabled track_commit_timestamp
DETAIL:  Conflicts update_origin_differs and delete_origin_differs
cannot be detected, and the origin and commit timestamp for the local
row will not be logged.
ERROR:  resolver last_update_wins requires "track_commit_timestamp" to
be enabled
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set.

Regards,
Vignesh




define pg_structiszero(addr, s, r)

2024-09-17 Thread Bertrand Drouvot
Hi hackers,

There are some places where we check that a struct is full of zeroes:

pgstat_report_bgwriter()
pgstat_report_checkpointer()
pgstat_relation_flush_cb()

Indeed, that's the way we check whether there are pending statistics to flush/report.

The current code is like (taking pgstat_relation_flush_cb() as an example):

"
static const PgStat_TableCounts all_zeroes;
.
.
if (memcmp(&lstats->counts, &all_zeroes,
  sizeof(PgStat_TableCounts)) == 0)
.
.
"

The static declaration is not "really" related to the purpose of the function
it is declared in. It's there "only" to initialize a memory area with zeroes 
and to use it in the memcmp.

I think it would make sense to "hide" all of this in a new macro, so please find
attached a patch proposal doing so (Andres suggested something along those lines
in [1] IIUC).

The macro is created in pgstat_internal.h as it looks like "only" the
statistics-related code would benefit from it currently (it could be moved to
another header file later on if needed).
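
For readers following along, one way such a macro could look is sketched
below (a minimal illustration of the idea, not necessarily what the attached
patch does):

/*
 * Set "r" to true if the memory region at "addr", of size sizeof(s),
 * contains only zero bytes; "s" is the struct type being checked.
 */
#define pg_structiszero(addr, s, r) \
    do { \
        static const s _all_zeroes; \
        (r) = (memcmp((addr), &_all_zeroes, sizeof(s)) == 0); \
    } while (0)

With something like that, the call sites above reduce to
pg_structiszero(&lstats->counts, PgStat_TableCounts, is_all_zeroes) followed
by a check of is_all_zeroes, as the diff below shows.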

[1]: 
https://www.postgresql.org/message-id/20230105002733.ealhzubjaiqis6ua%40awork3.anarazel.de

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 2e9071b54f7fd28027759de871b662a3eaa8e4a3 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Tue, 10 Sep 2024 01:22:02 +
Subject: [PATCH v1] define pg_structiszero(addr, s, r)

This new macro allows to test if a memory region (representing a struct "s")
starting at addr and of size sizeof(s) is full of zeroes.
---
 src/backend/utils/activity/pgstat_bgwriter.c |  6 --
 src/backend/utils/activity/pgstat_checkpointer.c |  9 +
 src/backend/utils/activity/pgstat_relation.c |  9 -
 src/include/utils/pgstat_internal.h  | 11 +++
 4 files changed, 24 insertions(+), 11 deletions(-)
  69.1% src/backend/utils/activity/
  30.8% src/include/utils/

diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c
index 364a7a2024..61cbc27760 100644
--- a/src/backend/utils/activity/pgstat_bgwriter.c
+++ b/src/backend/utils/activity/pgstat_bgwriter.c
@@ -30,7 +30,7 @@ void
 pgstat_report_bgwriter(void)
 {
 	PgStatShared_BgWriter *stats_shmem = &pgStatLocal.shmem->bgwriter;
-	static const PgStat_BgWriterStats all_zeroes;
+	bool		is_all_zeroes;
 
 	Assert(!pgStatLocal.shmem->is_shutdown);
 	pgstat_assert_is_up();
@@ -39,7 +39,9 @@ pgstat_report_bgwriter(void)
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid unnecessarily modifying the stats entry.
 	 */
-	if (memcmp(&PendingBgWriterStats, &all_zeroes, sizeof(all_zeroes)) == 0)
+	pg_structiszero(&PendingBgWriterStats, PgStat_BgWriterStats, is_all_zeroes);
+
+	if (is_all_zeroes)
 		return;
 
 	pgstat_begin_changecount_write(&stats_shmem->changecount);
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e18..3b9b688ae8 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -29,8 +29,7 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0};
 void
 pgstat_report_checkpointer(void)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_CheckpointerStats all_zeroes;
+	bool		is_all_zeroes;
 	PgStatShared_Checkpointer *stats_shmem = &pgStatLocal.shmem->checkpointer;
 
 	Assert(!pgStatLocal.shmem->is_shutdown);
@@ -40,8 +39,10 @@ pgstat_report_checkpointer(void)
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid unnecessarily modifying the stats entry.
 	 */
-	if (memcmp(&PendingCheckpointerStats, &all_zeroes,
-			   sizeof(all_zeroes)) == 0)
+	pg_structiszero(&PendingCheckpointerStats, PgStat_CheckpointerStats,
+	is_all_zeroes);
+
+	if (is_all_zeroes)
 		return;
 
 	pgstat_begin_changecount_write(&stats_shmem->changecount);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 8a3f7d434c..09e519a6ff 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -801,7 +801,7 @@ pgstat_twophase_postabort(TransactionId xid, uint16 info,
 bool
 pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 {
-	static const PgStat_TableCounts all_zeroes;
+	bool		is_all_zeroes;
 	Oid			dboid;
 	PgStat_TableStatus *lstats; /* pending stats entry  */
 	PgStatShared_Relation *shtabstats;
@@ -816,11 +816,10 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	 * Ignore entries that didn't accumulate any actual counts, such as
 	 * indexes that were opened by the planner but not used.
 	 */
-	if (memcmp(&lstats->counts, &all_zeroes,
-			   sizeof(PgStat_TableCounts)) == 0)
-	{
+	pg_structiszero(&lstats->counts, PgStat_TableCounts, is_all_zeroes);
+
+	if (is_all_zeroes

Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-09-17 Thread Masahiko Sawada
On Mon, Sep 16, 2024 at 9:24 AM Noah Misch  wrote:
>
> On Thu, Sep 12, 2024 at 03:42:48PM -0700, Masahiko Sawada wrote:
> > On Tue, Sep 10, 2024 at 3:05 PM Noah Misch  wrote:
> > > On Tue, Sep 10, 2024 at 05:56:47PM -0400, Tom Lane wrote:
> > > > Got it.  So now I'm wondering if we need all the complexity of storing
> > > > stuff in the GIN metapages.  Could we simply read the (primary's)
> > > > signedness out of pg_control and use that?
>
> > I've attached a PoC patch for this idea. We write  the default char
> > signedness to the control file at initdb time. Then when comparing two
> > trgms, pg_trgm opclasses use a comparison function based on the char
> > signedness of the cluster. I've confirmed that the patch fixes the
> > reported case at least.
>
> I agree that proves the concept.

Thanks. I like the simplicity of this approach. If we agree with this
approach, I'd like to proceed with it.
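
Just to spell out the signedness dependency for readers skimming the thread,
the comparison effectively has to pick between the two behaviors below based
on the default char signedness recorded at initdb time (hypothetical helper,
illustrative only, not the PoC code itself):

static int
trgm_char_cmp(unsigned char a, unsigned char b, bool default_char_signed)
{
    /* compare the byte the way the origin cluster's compiler would have */
    if (default_char_signed)
        return (int) ((signed char) a) - (int) ((signed char) b);
    else
        return (int) a - (int) b;
}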

Regardless of what approach we take, I wanted to provide some
regression tests for these changes, but I could not come up with a
reasonable idea. It would be great if we could do something like
027_stream_regress.pl on cross-architecture replication. But just
executing 027_stream_regress.pl on cross-architecture replication
would not be sufficient since we would like to confirm query results
as well. If we had a reasonable tool or way to do this, it would also help
find other similar bugs related to architectural differences.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] WIP: replace method for jsonpath

2024-09-17 Thread David E. Wheeler
On Sep 17, 2024, at 15:03, Florents Tselai  wrote:

> Fallback scenario: make this an extension, but in a first pass I didn’t find 
> any convenient hooks.
> One has to create a whole new scanner, grammar etc.

Yeah, it got me thinking about the RFC-9535 JSONPath "Function Extension" 
feature[1], which allows users to add functions. Would be cool to have a way to 
register jsonpath functions somehow, but I would imagine it’d need quite a bit 
of specification similar to RFC-9535. Wouldn’t surprise me to see something 
like that appear in a future version of the spec, with an interface something 
like CREATE OPERATOR.

I don’t have a strong feeling about what should be added that’s not in the 
spec; my main interest is not having to constantly sync my port[2]. I’m already 
behind, and’t just been a couple months! 😂

Best,

David

[1]: https://www.rfc-editor.org/rfc/rfc9535.html#name-function-extensions
[2]: https://github.com/theory/sqljson





miscellaneous pg_upgrade cleanup

2024-09-17 Thread Nathan Bossart
Here are a few miscellaneous cleanup patches for pg_upgrade.  I don't think
there's anything controversial here.

0001 removes some extra whitespace in the status message for failed data
type checks.  I noticed that when the check fails, this status message is
indented beyond all the other output.  This appears to have been introduced
in commit 347758b, so I'd back-patch this one to v17.

0002 improves the coding style in many of the new upgrade task callback
functions.  I refrained from adjusting this code too much when converting
these tasks to use the new pg_upgrade task framework (see commit 40e2e5e),
but now I think we should.  This decreases the amount of indentation in
some places and removes a few dozen lines of code.

0003 adds names to the UpgradeTaskSlotState enum and the UpgradeTaskSlot
struct.  I'm not aware of any established project policy in this area, but
I figured it'd be good to at least be consistent within the same file.

Thoughts?

-- 
nathan
>From c958311e5fd6df460e0c1289646fe99b9dd5f7f7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 17 Sep 2024 11:30:50 -0500
Subject: [PATCH v1 1/3] remove extra whitespace in pg_upgrade report

---
 src/bin/pg_upgrade/check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1c277ce17b..c9a3f06b80 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -419,7 +419,7 @@ process_data_type_check(DbInfo *dbinfo, PGresult *res, void 
*arg)
 */
if (!(*state->result))
{
-   pg_log(PG_REPORT, "failed check: %s", 
_(state->check->status));
+   pg_log(PG_REPORT, "failed check: %s", 
_(state->check->status));
appendPQExpBuffer(*state->report, "\n%s\n%s%s\n",
  
_(state->check->report_text),
  _("A list of the 
problem columns is in the file:"),
-- 
2.39.3 (Apple Git-146)

>From 6f3cd97974eb6f73c1513c4e6093c168c42c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 17 Sep 2024 11:57:52 -0500
Subject: [PATCH v1 2/3] improve style of upgrade task callback functions

---
 src/bin/pg_upgrade/check.c   | 205 +++
 src/bin/pg_upgrade/version.c |  29 +++--
 2 files changed, 99 insertions(+), 135 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index c9a3f06b80..bc4c9b11a0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -390,69 +390,55 @@ process_data_type_check(DbInfo *dbinfo, PGresult *res, 
void *arg)
 {
struct data_type_check_state *state = (struct data_type_check_state *) 
arg;
int ntups = PQntuples(res);
+   charoutput_path[MAXPGPATH];
+   int i_nspname = PQfnumber(res, "nspname");
+   int i_relname = PQfnumber(res, "relname");
+   int i_attname = PQfnumber(res, "attname");
+   FILE   *script = NULL;
 
AssertVariableIsOfType(&process_data_type_check, UpgradeTaskProcessCB);
 
-   if (ntups)
-   {
-   charoutput_path[MAXPGPATH];
-   int i_nspname;
-   int i_relname;
-   int i_attname;
-   FILE   *script = NULL;
-   booldb_used = false;
+   if (ntups == 0)
+   return;
 
-   snprintf(output_path, sizeof(output_path), "%s/%s",
-log_opts.basedir,
-state->check->report_filename);
+   snprintf(output_path, sizeof(output_path), "%s/%s",
+log_opts.basedir,
+state->check->report_filename);
 
-   /*
-* Make sure we have a buffer to save reports to now that we 
found a
-* first failing check.
-*/
-   if (*state->report == NULL)
-   *state->report = createPQExpBuffer();
+   /*
+* Make sure we have a buffer to save reports to now that we found a 
first
+* failing check.
+*/
+   if (*state->report == NULL)
+   *state->report = createPQExpBuffer();
 
-   /*
-* If this is the first time we see an error for the check in 
question
-* then print a status message of the failure.
-*/
-   if (!(*state->result))
-   {
-   pg_log(PG_REPORT, "failed check: %s", 
_(state->check->status));
-   appendPQExpBuffer(*state->report, "\n%s\n%s%s\n",
- 
_(state->check->report_text),
-

Re: jsonb_strip_nulls with arrays?

2024-09-17 Thread Florents Tselai
On Tue, Sep 17, 2024 at 5:11 PM Andrew Dunstan  wrote:

>
> On 2024-09-17 Tu 5:26 AM, Florents Tselai wrote:
>
> Currently:
>
>
> jsonb_strip_nulls ( jsonb ) → jsonb
>
> Deletes all object fields that have null values from the given JSON value,
> recursively. Null values that are not object fields are untouched.
>
>
> > Null values that are not object fields are untouched.
>
>
> Can we revisit this and make it work with arrays, too?
>
> Tbh, at first sight that looked like the expected behavior for me.
>
> That is strip nulls from arrays as well.
>
>
> This has been available since 9.5 and iiuc predates lots of the jsonb
> array work.
>
>
> I don't think that's a great idea. Removing an object field which has a
> null value shouldn't have any effect on the surrounding data, nor really
> any on other operations (If you try to get the value of the missing field
> it should give you back null). But removing a null array member isn't like
> that at all - unless it's the trailing member of the array it will renumber
> all the succeeding array members.
>
> And I don't think we should be changing the behaviour of a function, that
> people might have been relying on for the better part of a decade.
>
>
>
> In practice, though, whenever jsonb_build_array is used (especially with
> jsonpath),
>
> a few nulls do appear in the resulting array most of the times,
>
> Currently, there’s no expressive way to remove this.
>
>
> We could also have jsonb_array_strip_nulls(jsonb) as well
>
>
> We could, if we're going to do anything at all in this area. Another
> possibility would be to provide a second optional parameter for
> json{b}_strip_nulls. That's probably a better way to go.
>
Here's a patch that adds that argument (only for jsonb; no json
implementation yet)

That's how I imagined & implemented it,
but there may be non-obvious pitfalls in the semantics.

as-is version

select jsonb_strip_nulls('[1,2,null,3,4]');
 jsonb_strip_nulls

 [1, 2, null, 3, 4]
(1 row)

select
jsonb_strip_nulls('{"a":1,"b":null,"c":[2,null,3],"d":{"e":4,"f":null}}');
 jsonb_strip_nulls

 {"a": 1, "c": [2, null, 3], "d": {"e": 4}}
(1 row)

with the additional boolean flag added

select jsonb_strip_nulls('[1,2,null,3,4]', true);
 jsonb_strip_nulls
---
 [1, 2, 3, 4]
(1 row)

select
jsonb_strip_nulls('{"a":1,"b":null,"c":[2,null,3],"d":{"e":4,"f":null}}',
true);
  jsonb_strip_nulls
--
 {"a": 1, "c": [2, 3], "d": {"e": 4}}
(1 row)


GH PR view: https://github.com/Florents-Tselai/postgres/pull/6/files

> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


v1-0002-Add-docs-for-strip_in_arrays-argument.patch
Description: Binary data


v1-0001-jsonb_strip_nulls-jsonb-bool-wip.patch
Description: Binary data


Re: Test improvements and minor code fixes for formatting.c.

2024-09-17 Thread Nathan Bossart
On Sun, Sep 08, 2024 at 05:32:16PM -0400, Tom Lane wrote:
> In looking at this, I found that there's also no test coverage
> for the , V, or PL format codes.  Also, the possibility of
> overflow while converting an input value to int in order to
> pass it to int_to_roman was ignored.  Attached is a patch that
> adds more test coverage and cleans up the Roman-numeral code
> a little bit.

I stared at the patch for a while, and it looks good to me.

> BTW, I also discovered that there is a little bit of support
> for a "B" format code: we can parse it, but then we ignore it.
> And it's not documented.  Oracle defines this code as a flag
> that:
> 
>   Returns blanks for the integer part of a fixed-point number
>   when the integer part is zero (regardless of zeros in the
>   format model).
> 
> It doesn't seem super hard to implement that, but given the
> complete lack of complaints about it being missing, maybe we
> should just rip out the incomplete support instead?

AFAICT it's been like that since it was introduced [0].  I searched the
archives and couldn't find any discussion about this format code.  Given
that, I don't have any concerns about removing it unless it causes ERRORs
for calls that currently succeed, but even then, it's probably fine.  This
strikes me as something that might be fun for an aspiring hacker, though.

[0] https://postgr.es/c/b866d2e

-- 
nathan




Re: query_id, pg_stat_activity, extended query protocol

2024-09-17 Thread Sami Imseih
> Then, please see attached two lightly-updated patches. 0001 is for a
> backpatch down to v14. This is yours to force things in the exec and
> bind messages for all portal types, with the test (placed elsewhere in
> 14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
> up the tests of pg_stat_statements if one is not careful with the
> query ID reporting.

These 2 patches look good to me; except for the slight typo
In the commit message of 0002. "backpatch" instead of "backpatck".

That leaves us with considering v5-0002 [1]. I do think this is good
for overall correctness of the queryId being advertised after a cache 
revalidation, even if users of pg_stat_activity will hardly notice this.

[1] 
https://www.postgresql.org/message-id/DB325894-3EE3-4B2E-A18C-4B34E7B2F5EC%40gmail.com
 


Regards,

Sami 






Custom connstr in background_psql()

2024-09-17 Thread a . imamov

Hi, hackers!

I've noticed that there is no way to specify a custom connection string when
calling the PostgreSQL::Test::Cluster->background_psql() method, compared to
PostgreSQL::Test::Cluster->psql(). It seems useful to have this feature while
testing with BackgroundPsql, for example, when the default host value needs to
be overridden to establish different types of connections.

What do you think?

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 143dc8c101..ff0bda0c8d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2181,6 +2181,11 @@ returned.  Set B to 0 to ignore errors instead.
 Set a timeout for a background psql session. By default, timeout of
 $PostgreSQL::Test::Utils::timeout_default is set up.
 
+=item connstr => B
+
+If set, use this as the connection string for the connection to the
+backend.
+
 =item replication => B
 
 If set, add B to the conninfo string.
@@ -2204,12 +2209,23 @@ sub background_psql
 	my $replication = $params{replication};
 	my $timeout = undef;
 
+	# Build the connection string.
+	my $psql_connstr;
+	if (defined $params{connstr})
+	{
+		$psql_connstr = $params{connstr};
+	}
+	else
+	{
+		$psql_connstr = $self->connstr($dbname);
+	}
+	$psql_connstr .= defined $replication ? " replication=$replication" : "";
+
 	my @psql_params = (
 		$self->installed_command('psql'),
 		'-XAtq',
 		'-d',
-		$self->connstr($dbname)
-		  . (defined $replication ? " replication=$replication" : ""),
+		$psql_connstr,
 		'-f',
 		'-');
 


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-09-17 Thread Peter Geoghegan
On Mon, Sep 16, 2024 at 6:05 PM Tomas Vondra  wrote:
> I've been looking at this patch over the last couple of days, mostly doing
> some stress testing / benchmarking (hence the earlier report) and basic
> review.

Thanks for taking a look! Very helpful.

> I do have some initial review comments, and the testing produced
> some interesting regressions (not sure if those are the cases where
> skipscan can't really help, that Peter mentioned he needs to look into).

The one type of query that's clearly regressed in a way that's just
not acceptable are queries where we waste CPU cycles during scans
where it's truly hopeless. For example, I see a big regression on one
of the best cases for the Postgres 17 work, described here:

https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans#a-practical-example-3x-performance-improvement

Notably, these cases access exactly the same buffers/pages as before,
so this really isn't a matter of "doing too much skipping". The number
of buffers hit exactly matches what you'll see on Postgres 17. It's
just that we waste too many CPU cycles in code such as
_bt_advance_array_keys, to uselessly maintain skip arrays.

I'm not suggesting that there won't be any gray area with these
regressions -- nothing like this will ever be that simple. But it
seems to me like I should go fix these obviously-not-okay cases next,
and then see where that leaves everything else, regressions-wise. That
seems likely to be the most efficient way of dealing with the
regressions. So I'll start there.

That said, I *would* be surprised if you found a regression in any
query that simply didn't receive any new scan key transformations in
new preprocessing code in places like _bt_decide_skipatts and
_bt_skip_preproc_shrink. I see that many of the queries that you're
using for your stress-tests "aren't really testing skip scan", in this
sense. But I'm hardly about to tell you that you shouldn't spend time
on such queries -- that approach just discovered a bug affecting
Postgres 17 (that was also surprising, but it still happened!). My
point is that it's worth being aware of which test queries actually
use skip arrays in the first place -- it might help you with your
testing. There are essentially no changes to _bt_advance_array_keys
that'll affect traditional SAOP arrays (with the sole exception of
changes made by
v6-0003-Refactor-handling-of-nbtree-array-redundancies.patch, which
affect every kind of array in the same way).

> 1) v6-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch
>
> - I find the places that increment "nsearches" a bit random. Each AM
> does it in entirely different place (at least it seems like that to me).
> Is there a way make this a bit more consistent?

From a mechanical perspective there is nothing at all random about it:
we do this at precisely the same point that we currently call
pgstat_count_index_scan, which in each index AM maps to one descent of
the index. It is at least consistent. Whenever a B-Tree index scan
shows "Index Scans: N", you'll see precisely the same number by
swapping it with an equivalent contrib/btree_gist-based GiST index and
running the same query again (assuming that index tuples that match
the array keys are spread apart in both the B-Tree and GiST indexes).

(Though I see problems with the precise place that nbtree calls
pgstat_count_index_scan right now, at least in certain edge-cases,
which I discuss below in response to your questions about that.)

> uint64   btps_nsearches; /* instrumentation */
>
> Instrumentation what? What's the counter for?

Will fix.

In case you missed it, there is another thread + CF Entry dedicated to
discussing this instrumentation patch:

https://commitfest.postgresql.org/49/5183/
https://www.postgresql.org/message-id/flat/cah2-wzkrqvaqr2ctnqtzp0z6ful4-3ed6eqb0yx38xbnj1v...@mail.gmail.com

> - I see _bt_first moved the pgstat_count_index_scan, but doesn't that
> mean we skip it if the earlier code does "goto readcomplete"? Shouldn't
> that still count as an index scan?

In my opinion, no, it should not.

We're counting the number of times we'll have descended the tree using
_bt_search (or using _bt_endpoint, perhaps), which is a precisely
defined physical cost. A little like counting the number of buffers
accessed. I actually think that this aspect of how we call
pgstat_count_index_scan is a bug that should be fixed, with the fix
backpatched to Postgres 17. Right now, we see completely different
counts for a parallel index scan, compared to an equivalent serial
index scan -- differences that cannot be explained as minor
differences caused by parallel scan implementation details. I think
that it's just wrong right now, on master, since we're simply not
counting the thing that we're supposed to be counting (not reliably,
not if it's a parallel index scan).

> - show_indexscan_nsearches does this:
>
> if (scanDesc && scanDesc->nsearches > 0)
> ExplainPropertyUInteger("Index Searches", NULL,
>   

Re: query_id, pg_stat_activity, extended query protocol

2024-09-17 Thread Michael Paquier
On Tue, Sep 17, 2024 at 05:01:18PM -0500, Sami Imseih wrote:
> > Then, please see attached two lightly-updated patches. 0001 is for a
> > backpatch down to v14. This is yours to force things in the exec and
> > bind messages for all portal types, with the test (placed elsewhere in
> > 14~15 branches). 0002 is for HEAD to add some sanity checks, blowing
> > up the tests of pg_stat_statements if one is not careful with the
> > query ID reporting.
> 
> These 2 patches look good to me; except for the slight typo
> In the commit message of 0002. "backpatch" instead of "backpatck".

Yes, I've noticed this one last Friday and fixed the typo in the
commit log after sending the previous patch series.

> That leaves us with considering v5-0002 [1]. I do think this is good
> for overall correctness of the queryId being advertised after a cache 
> revalidation, even if users of pg_stat_activity will hardly notice this.
> 
> [1] 
> https://www.postgresql.org/message-id/DB325894-3EE3-4B2E-A18C-4B34E7B2F5EC%40gmail.com
>  

Yeah.  I need more time to evaluate this one.

Also, please find one of the scripts I have used for the execute/fetch
case, that simply does an INSERT RETURNING with a small fetch size to
create a larger window in pg_stat_activity where we don't report the
query ID.  One can run it like that, crude but still on point:
# Download a JDBC driver
# Create the table to use.
psql -c 'create table aa (a int);' postgres
CLASSPATH=postgresql-42.7.4.jar java TestReturning.java

Then, while running the script, you would notice that pg_stat_activity
reports NULL for the query ID with the query text while the batch
fetches are processing.  I've taken and expanded one of the scripts
you have sent for 1d477a907e63.

I'd like to get to the point where we are able to test that in core
reliably.  The sanity checks in the executor paths are a good step
forward but they do nothing for the fetch cases.  Perhaps Andrew
Dunstan's work to expose libpq's APIs with the perl TAP tests would
help at some point to control the extended protocol queries, but we
are going to need more for the fetch case as there are no hooks that
would help to grab a query ID.  A second option I have in mind would
be to set up an injection point that produces a NOTICE if a query ID
is set when we end processing an execute message, then check the
number of NOTICE messages produced as these can be predictable
depending on the number of rows and the fetch size.  This won't fly
far unless we can control the fetch size.
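
To sketch the idea (illustrative only; the point name is made up and this is
not actual code), the tail of exec_execute_message() could expose something
like:

#ifdef USE_INJECTION_POINTS
    if (pgstat_get_my_query_id() != 0)
        INJECTION_POINT("execute-message-query-id-set");
#endif

with a test callback attached to that point emitting a NOTICE, so the test
could count the NOTICEs against the expected number of fetch batches; that
still leaves the problem of controlling the fetch size from the test.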
--
Michael
import java.sql.*;

public class Test {
static final String DB_URL = "jdbc:postgresql://localhost/postgres";

public static void main(String[] args) {
Connection conn = null;
// Open a connection
try {
conn = DriverManager.getConnection(DB_URL);
conn.setAutoCommit(false);
PreparedStatement statement = conn.prepareStatement("INSERT into aa SELECT generate_series(1,1000) RETURNING a");
			statement.setFetchSize(100);
ResultSet rs = statement.executeQuery();
			while (rs.next()) {
			}
conn.commit();
			statement.close();
} catch (SQLException e) {
e.printStackTrace();
try {
conn.rollback();
} catch (SQLException ee) {
ee.printStackTrace();
}
}
}
}


signature.asc
Description: PGP signature


Re: Custom connstr in background_psql()

2024-09-17 Thread Michael Paquier
On Wed, Sep 18, 2024 at 01:08:26AM +0300, a.ima...@postgrespro.ru wrote:
> I've noticed that there is no way to specify a custom connection string when
> calling the PostgreSQL::Test::Cluster->background_psql() method compared to 
> the
> PostgreSQL::Test:Cluster->psql(). It seems useful to have this feature while
> testing with BackgroundPsql, for example, when the default host value needs
> to be overridden to establish different types of connections.
> 
> What do you think?

I think that it makes sense to extend the routine as you are
suggesting.  At least I can see myself using it depending on the test
suite I am dealing with.  So count me in.
--
Michael


signature.asc
Description: PGP signature


Re: query_id, pg_stat_activity, extended query protocol

2024-09-17 Thread Sami Imseih
> would help to grab a query ID. A second option I have in mind would
> be to set up an injection point that produces a NOTICE if a query ID
> is set when we end processing an execute message, then check the
> number of NOTICE messages produced as these can be predictible
> depending on the number of rows and the fetch size.. This won't fly
> far unless we can control the fetch size.

FWIW, I do like the INJECTION_POINT idea and actually mentioned something 
similar up the thread [1] for the revalidate cache case, but I can see it being 
applied
to all the other places we expect the queryId to be set. 


[1] 
https://www.postgresql.org/message-id/465EECA3-D98C-4E46-BBDB-4D057617DD89%40gmail.com

--

Sami 






Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION

2024-09-17 Thread Justin Pryzby
This commit seems to trigger an elog() error that is not reproducible
in the parent commit.

6e086fa2e77 Allow parallel workers to cope with a newly-created session user ID.

postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING 
pg_attribute_relid_attnum_index;
ERROR:  pg_attribute catalog is missing 26 attribute(s) for relation OID 70321
postgres=# \errverbose
ERROR:  XX000: pg_attribute catalog is missing 26 attribute(s) for relation OID 
70321
LOCATION:  RelationBuildTupleDesc, relcache.c:658

This is not completely deterministic:

postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
CLUSTER
postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
CLUSTER
postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
CLUSTER
postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
CLUSTER
postgres=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
ERROR:  pg_attribute catalog is missing 26 attribute(s) for relation OID 70391

But I think this will be reproducible in any database with a nontrivial
number of attributes.




Re: Detailed release notes

2024-09-17 Thread Bruce Momjian
On Tue, Sep 17, 2024 at 03:29:54PM -0300, Marcos Pegoraro wrote:
> On Tue, Sep 17, 2024 at 05:12, Alvaro Herrera
> wrote:
> 
> Add backend support for injection points (Michael Paquier) [commit 1] [2]
> [3] [4]
> 
> I think this way would be fine.
> 
> And it would be good to have a target="_blank" on commit links so a new
> window or tab would be opened instead of going back and forth.
> This way you can open these 4 links and then navigate on them to see exactly
> what was done on every commit.

I think trying to add text to each item is both redundant and confusing,
because I am guessing that many people will not even know what a commit
is, and will be confused by clicking on the link.

What I have done is to add text to the top of "Appendix E. Release
Notes" explaining the symbol and what it links to.   Patch attached.

We can still consider a different symbol or number labeling, but I think
this patch is in the right direction.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index cf6ba540876..8d408a50eb1 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -69,6 +69,15 @@ For new features, add links to the documentation sections.
review, so each item is truly a community effort.
   
 
+  
+   Section markers (§) in the release notes link to <ulink url="https://git.postgresql.org/gitweb/?p=postgresql.git">gitweb</ulink>
+   pages which show the primary git commit
+   messages and source tree changes responsible for the release note item.
+   There might be additional git commits which
+   are not shown.
+  
+
 

Re: Detailed release notes

2024-09-17 Thread Tom Lane
Bruce Momjian  writes:
> On Tue, Sep 17, 2024 at 03:29:54PM -0300, Marcos Pegoraro wrote:
>> On Tue, Sep 17, 2024 at 05:12, Alvaro Herrera
>> 
>>> Add backend support for injection points (Michael Paquier) [commit 1] [2]

> I think trying to add text to each item is both redundant and confusing,

Also very clutter-y.  I'm not convinced that any of this is a good
idea that will stand the test of time: I estimate that approximately
0.01% of people who read the release notes will want these links.
But if we keep it I think the annotations have to be very unobtrusive.

(Has anyone looked at the PDF rendering of this?  It seems rather
unfortunate to me.)

regards, tom lane




Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION

2024-09-17 Thread Tom Lane
Justin Pryzby  writes:
> This commit seems to trigger elog(), not reproducible in the
> parent commit.

Yeah, I can reproduce that.  Will take a look tomorrow.

regards, tom lane




Re: query_id, pg_stat_activity, extended query protocol

2024-09-17 Thread Michael Paquier
On Tue, Sep 17, 2024 at 06:39:17PM -0500, Sami Imseih wrote:
> FWIW, I do like the INJECTION_POINT idea and actually mentioned something 
> similar up the thread [1] for the revalidate cache case, but I can see it 
> being applied
> to all the other places we expect the queryId to be set. 
> 
> [1] 
> https://www.postgresql.org/message-id/465EECA3-D98C-4E46-BBDB-4D057617DD89%40gmail.com

FWIW, I was thinking about something like what has been done in
indexcmds.c for 5bbdfa8a18dc, as the query ID value is not predictable
across releases, but we could see whether it is set or not.
--
Michael




Re: Detailed release notes

2024-09-17 Thread Bruce Momjian
On Tue, Sep 17, 2024 at 08:22:41PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Tue, Sep 17, 2024 at 03:29:54PM -0300, Marcos Pegoraro wrote:
> >> On Tue, Sep 17, 2024 at 05:12, Alvaro Herrera
> >> 
> >>> Add backend support for injection points (Michael Paquier) [commit 1] [2]
> 
> > I think trying to add text to each item is both redundant and confusing,
> 
> Also very clutter-y.  I'm not convinced that any of this is a good
> idea that will stand the test of time: I estimate that approximately
> 0.01% of people who read the release notes will want these links.

Yes, I think 0.01% is accurate.

> But if we keep it I think the annotations have to be very unobtrusive.

I very much agree.

> (Has anyone looked at the PDF rendering of this?  It seems rather
> unfortunate to me.)

Oh, not good.  See my build:

https://momjian.us/expire/postgres-US.pdf#page=2890

Those numbers are there because all external links get numbers that
start at 1 at the top of the section and increment for every link --- I
have no idea why.  You can see that acronyms that have external links
have the same numbering:

https://momjian.us/expire/postgres-US.pdf#page=3150

I wonder if we should remove the numbers in both cases.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Virtual generated columns

2024-09-17 Thread jian he
On Mon, Sep 16, 2024 at 5:22 PM jian he  wrote:
>
> in v7.
>
It seems I got confused with the version number.

Here, I have attached another minor change to the tests.

Make
ERROR:  invalid ON DELETE action for foreign key constraint containing
generated column
become
ERROR:  foreign key constraints on virtual generated columns are not supported

Change contrib/pageinspect/sql/page.sql to
expand the information on t_infomask and t_bits.

Change RelationBuildLocalRelation to
make the three bool flags in the transient TupleDesc->TupleConstr more accurate.


v6-0001-virtual-generated-columns-misc-changes.no-cfbot
Description: Binary data


Re: Trim the heap free memory

2024-09-17 Thread shawn wang
Thank you very much for your response and suggestions.

As you mentioned, the patch here is actually designed for glibc's ptmalloc2
and is not applicable to other platforms. I will consider supporting it only
on the Linux platform in the future. In the memory management strategy of
ptmalloc2, there is a certain amount of non-garbage-collected memory, which
is closely related to the order and method of memory allocation and
release. To reduce the performance overhead caused by frequent allocation
and release of small blocks of memory, ptmalloc2 intentionally retains this
part of the memory. The malloc_trim function locks, traverses memory
blocks, and uses madvise to release this part of the memory, but this
process may also have a negative impact on performance. In the process of
exploring solutions, I also considered a variety of strategies, including
scheduling malloc_trim to be executed at regular intervals or triggering
malloc_trim after a specific number of free operations. However, we found
that these methods are not optimal solutions.

> We can see that out of about 43K test queries, 32K saved nothing
> whatever, and in only four was more than a couple of meg saved.
> That's pretty discouraging IMO.  It might be useful to look closer
> at the behavior of those top four though.  I see them as


I have previously encountered situations where the non-garbage-collected
memory of wal_sender was approximately hundreds of megabytes or even
exceeded 1GB, but I was unable to reproduce this situation using simple
SQL. Therefore, I introduced an asynchronous processing function, hoping to
manage memory more efficiently without affecting performance.


In addition, I have considered the following optimization strategies:

   1. Adjust the configuration of ptmalloc2 through the mallopt function to
      use mmap rather than sbrk for memory allocation. This can immediately
      return the memory to the operating system when it is released, but it
      may affect performance due to the higher overhead of mmap. (See the
      sketch right after this list.)

   2. Use other memory allocators such as jemalloc or tcmalloc, and adjust
      relevant parameters to reduce the generation of non-garbage-collected
      memory. However, these allocators are designed for multi-threaded use
      and may lead to increased memory usage per process.

   3. Build a set of memory context allocation functions based on mmap,
      delegating the responsibility of memory management entirely to the
      database level. Although this solution can effectively control memory
      allocation, it requires a large-scale engineering implementation.
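
A minimal sketch of strategy 1 (glibc-specific; the mallopt() thresholds
below are illustrative only, not tuned recommendations):

    #include <malloc.h>

    static void
    tune_glibc_malloc(void)
    {
    #ifdef __GLIBC__
        /* Satisfy large requests with mmap so freed memory returns to the
         * OS immediately, and trim the sbrk heap more eagerly. */
        mallopt(M_MMAP_THRESHOLD, 64 * 1024);
        mallopt(M_TRIM_THRESHOLD, 128 * 1024);
    #endif
    }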

I look forward to further discussing these solutions with you and exploring
the best memory management practices together.

Best regards, Shawn

Tom Lane  于2024年9月16日周一 03:16写道:

> I wrote:
> > The single test case you showed suggested that maybe we could
> > usefully prod glibc to free memory at query completion, but we
> > don't need all this interrupt infrastructure to do that.  I think
> > we could likely get 95% of the benefit with about a five-line
> > patch.
>
> To try to quantify that a little, I wrote a very quick-n-dirty
> patch to apply malloc_trim during finish_xact_command and log
> the effects.  (I am not asserting this is the best place to
> call malloc_trim; it's just one plausible possibility.)  Patch
> attached, as well as statistics collected from a run of the
> core regression tests followed by
>
> grep malloc_trim postmaster.log | sed 's/.*LOG:/LOG:/' | sort -k4n | uniq
> -c >trim_savings.txt
>
> We can see that out of about 43K test queries, 32K saved nothing
> whatever, and in only four was more than a couple of meg saved.
> That's pretty discouraging IMO.  It might be useful to look closer
> at the behavior of those top four though.  I see them as
>
> 2024-09-15 14:58:06.146 EDT [960138] LOG:  malloc_trim saved 7228 kB
> 2024-09-15 14:58:06.146 EDT [960138] STATEMENT:  ALTER TABLE
> delete_test_table ADD PRIMARY KEY (a,b,c,d);
>
> 2024-09-15 14:58:09.861 EDT [960949] LOG:  malloc_trim saved 12488 kB
> 2024-09-15 14:58:09.861 EDT [960949] STATEMENT:  with recursive
> search_graph(f, t, label, is_cycle, path) as (
> select *, false, array[row(g.f, g.t)] from graph g
> union distinct
> select g.*, row(g.f, g.t) = any(path), path || row(g.f,
> g.t)
> from graph g, search_graph sg
> where g.f = sg.t and not is_cycle
> )
> select * from search_graph;
>
> 2024-09-15 14:58:09.866 EDT [960949] LOG:  malloc_trim saved 12488 kB
> 2024-09-15 14:58:09.866 EDT [960949] STATEMENT:  with recursive
> search_graph(f, t, label) as (
> select * from graph g
> union distinct
> select g.*
> from graph g, search_graph sg
> where g.f = sg.t
> ) cycle f, t set is_cycle to 'Y' default 'N' using path
> select * from search_graph;
>
> 2024-09-15 14:58:09.853

Re: Use streaming read API in ANALYZE

2024-09-17 Thread Thomas Munro
On Sun, Sep 15, 2024 at 12:14 AM Mats Kindahl  wrote:
> I used the combination of your patch and making the computation of 
> vacattrstats for a relation available through the API and managed to 
> implement something that I think does the right thing. (I just sampled a few 
> different statistics to check if they seem reasonable, like most common vals 
> and most common freqs.) See attached patch.

Cool.  I went ahead and committed that small new function and will
mark the open item closed.

> I need the vacattrstats to set up the two streams for the internal relations. 
> I can just re-implement them in the same way as is already done, but this 
> seems like a small change that avoids unnecessary code duplication.

Unfortunately we're not in a phase where we can make non-essential
changes, we're right about to release and we're only committing fixes,
and it seems like you have a way forward (albeit with some
duplication).  We can keep talking about that for v18.

From your earlier email:
> I'll take a look at the thread. I really think the ReadStream abstraction is 
> a good step in the right direction.

Here's something you or your colleagues might be interested in: I was
looking around for a fun extension to streamify as a demo of the
technology, and I finished up writing a quick patch to streamify
pgvector's HNSW index scan, which worked well enough to share[1] (I
think it should in principle be able to scale with the number of graph
connections, at least 16x), but then people told me that it's of
limited interest because everybody knows that HNSW indexes have to fit
in memory (I think there may also be memory prefetch streaming
opportunities, unexamined for now).  But that made me wonder what the
people with the REALLY big indexes do for hyperdimensional graph
search on a scale required to build Skynet, and that led me back to
Timescale pgvectorscale[2].  I see two obvious signs that this thing
is eminently and profitably streamifiable: (1) The stated aim is
optimising for indexes that don't fit in memory, hence "Disk" in the
name of the research project it is inspired by, (2) I see that
DiskANN[3] is aggressively using libaio (Linux) and overlapped/IOCP
(Windows).  So now I am waiting patiently for a Rustacean to show up
with patches for pgvectorscale to use ReadStream, which would already
get read-ahead advice and vectored I/O (Linux, macOS, FreeBSD soon
hopefully), and hopefully also provide a nice test case for the AIO
patch set which redirects buffer reads through io_uring (Linux,
basically the newer better libaio) or background I/O workers (other
OSes, which works surprisingly competitively).  Just BTW for
comparison with DiskANN we have also had early POC-quality patches
that drive AIO with overlapped/IOCP (Windows) which will eventually be
rebased and proposed (Windows isn't really a primary target but we
wanted to validate that the stuff we're working on has abstractions
that will map to the obvious system APIs found in the systems
PostgreSQL targets).  For completeness, I've also had it mostly
working on the POSIX AIO of FreeBSD, HP-UX and AIX (though we dropped
support for those last two so that was a bit of a dead end).

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ_7NKd46nx1wbyXWriuZSNzsTfm%2BrhEuvU6nxZi3-KVw%40mail.gmail.com
[2] https://github.com/timescale/pgvectorscale
[3] https://github.com/microsoft/DiskANN
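
For anyone who wants to try, the consumer side of the API is small; here
is a rough sketch of the shape based on read_stream.h in v17 (the
relation, next_block_cb and callback_state are placeholders to fill in):

    /* next_block_cb is a ReadStreamBlockNumberCB returning the next
     * block number to read, or InvalidBlockNumber when done. */
    ReadStream *stream;
    Buffer      buf;

    stream = read_stream_begin_relation(READ_STREAM_DEFAULT,
                                        NULL,   /* no strategy */
                                        rel,
                                        MAIN_FORKNUM,
                                        next_block_cb,
                                        callback_state,
                                        0);     /* no per-buffer data */

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... process the page in buf ... */
        ReleaseBuffer(buf);
    }

    read_stream_end(stream);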




Re: Pgoutput not capturing the generated columns

2024-09-17 Thread Peter Smith
Hi, here are my review comments for patch v31-0002.

==

1. General.

IMO patches 0001 and 0002 should be merged when next posted. IIUC the
reason for the split was only that there were 2 different authors,
but that no longer seems relevant.

==
Commit message

2.
When 'copy_data' is true, during the initial sync, the data is replicated from
the publisher to the subscriber using the COPY command. The normal COPY
command does not copy generated columns, so when 'publish_generated_columns'
is true, we need to copy using the syntax:
'COPY (SELECT column_name FROM table_name) TO STDOUT'.

~

2a.
Should clarify that 'copy_data' is a SUBSCRIPTION parameter.

2b.
Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

3.
- for (i = 0; i < rel->remoterel.natts; i++)
+ desc = RelationGetDescr(rel->localrel);
+ localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));

Each time I review this code I am tricked into thinking it is wrong to
use rel->remoterel.natts here for the localgenlist. AFAICT the code is
actually fine because you do not store *all* the subscriber gencols in
'localgenlist' -- you only store those with matching names on the
publisher table. It might be good if you could add an explanatory
comment about that to prevent any future doubts.
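
Something along these lines would do (wording is only a suggestion):

    /*
     * Note: localgenlist is indexed by the remote attribute number; we
     * only record subscriber generated columns whose names match a
     * column on the publisher, so remoterel.natts entries are enough.
     */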

~~~

4.
+ if (!remotegenlist[remote_attnum])
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication target relation \"%s.%s\" has a
generated column \"%s\" "
+ "but corresponding column on source relation is not a generated column",
+ rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));

This error message has lots of good information. OTOH, I think when
copy_data=false the error would report the subscriber column just as
"missing", which is maybe less helpful. Perhaps that other
copy_data=false "missing" case can be improved to share the same error
message that you have here.

~~~

fetch_remote_table_info:

5.
IIUC, this logic needs to be more sophisticated to handle the case
that was being discussed earlier with Sawada-san [1]. e.g. when the
same table has gencols but there are multiple subscribed publications
where the 'publish_generated_columns' parameter differs.

Also, you'll need test cases for this scenario, because it is too
difficult to judge correctness just by visual inspection of the code.



6.
nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
readability, and initialize it to 'false' to make it easy to use
later.

~~~

7.
- * Get column lists for each relation.
+ * Get column lists for each relation and check if any of the publication
+ * has generated column option.

and

+ /* Check if any of the publication has generated column option */
+ if (server_version >= 18)

nit - tweak the comments to name the publication parameter properly.

~~~

8.
foreach(lc, MySubscription->publications)
{
if (foreach_current_index(lc) > 0)
appendStringInfoString(&pub_names, ", ");
appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
}

I know this is existing code, but shouldn't all this be done by using
the purpose-built function 'get_publications_str'?

~~~

9.
+ ereport(ERROR,
+ errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not fetch gencolumns information from publication list: %s",
+pub_names.data));

and

+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("failed to fetch tuple for gencols from publication list: %s",
+pub_names.data));

nit - /gencolumns information/generated column publication
information/ to make the errmsg more human-readable

~~~

10.
+ bool gencols_allowed = server_version >= 18 && hasgencolpub;
+
+ if (!gencols_allowed)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

Can the 'gencols_allowed' var be removed, and the condition just be
replaced with if (!has_pub_with_pubgencols)? It seems equivalent
unless I am mistaken.

==

Please refer to the attachment which implements some of the nits
mentioned above.

==
[1] 
https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 723c44c..6d17984 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -850,7 +850,7 @@ fetch_remote_table_info(char *nspname, char *relname, bool 
**remotegenlist_res,
Oid qualRow[] = {TEXTOID};
    bool        isnull;
    bool       *remotegenlist;
-   bool        hasgencolpub;
+   bool        has_pub_with_pubgencols = false;
    int         natt;
ListCell   *lc;
Bitmapset  *included_cols = NULL;
@@ -897,8 +897,8 @@ fetch_remote_table_info(char *nspname, char *relname, bool 
**

Re: Switch PgStat_HashKey.objoid from Oid to uint64

2024-09-17 Thread Michael Paquier
On Fri, Sep 13, 2024 at 04:03:13AM +, Bertrand Drouvot wrote:
> Overall, the patch LGTM.

Thanks for the review, I've applied that, then, detailing in the
commit log what this changes and the three format bumps required.
--
Michael




Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-09-17 Thread Егор Чиндяскин

 
>On Thu, Jun 20, 2024 at 3:43PM Hannu Krosing < han...@google.com > wrote:
>> Still it would be nice to have some public support for users of
>> non-managed PostgreSQL databases as well
>+1.
>
>--
>Robert Haas
>EDB:  http://www.enterprisedb.com
Hello! I have recently been researching building postgres with meson, and I
came across a failure of the src/test/modules/test_pg_dump-running test.
You can see regression.diffs in the attached file.
It turned out that the test fails if the cluster is initialized by a custom
user. In my case, by the user postgres.
Script to reproduce the test_pg_dump failure:
---
meson setup -Dprefix=$PGPREFIX build
 
ninja -j64 -C build install >/dev/null
ninja -j64 -C build install-test-files >/dev/null
 
initdb -U postgres -k -D $PGPREFIX/data
pg_ctl -D $PGPREFIX/data -l logfile start
 
psql -U postgres -c "CREATE USER test SUPERUSER"
psql -U postgres -c "CREATE DATABASE test"
 
meson test --setup running --suite postgresql:test_pg_dump-running -C build/
---
You can catch the same failure if you build using make and run make
installcheck -C src/test/modules/test_pg_dump.
 
I have looked at the test and found queries like the one below:
---
SELECT pg_describe_object(classid,objid,objsubid) COLLATE "C" AS obj,
  pg_describe_object(refclassid,refobjid,0) AS refobj,
  deptype
  FROM pg_shdepend JOIN pg_database d ON dbid = d.oid
  WHERE d.datname = current_database()
  ORDER BY 1, 3;
---
This query does not expect that the test database may already contain some
information about the custom user that ran test_pg_dump-running.
--
Egor Chindyaskin
Postgres Professional:  https://postgrespro.com
 

regression.diffs
Description: Binary data


Re: define pg_structiszero(addr, s, r)

2024-09-17 Thread Michael Paquier
On Wed, Sep 18, 2024 at 04:16:12AM +, Bertrand Drouvot wrote:
> The macro is created in pgstat_internal.h as it looks like that "only" the 
> statistics related code would benefit of it currently (could be moved to other
> header file later on if needed).

I'm OK to add a helper macro in pgstat_internal.h as this is a pattern
used only for some stats kinds (the other one I'm aware of is the
allzero check for pages around bufmgr.c), cleaning up all these static
declarations to make the memcpy() calls cheaper.  That can also be
useful for anybody doing a custom pgstats kind, fixed or
variable-numbered.

#define pg_structiszero(addr, s, r) \

Locating that at the top of pgstat_internal.h seems a bit out of order
to me.  Perhaps it would be better to move it closer to the inline
functions?

Also, is this the best name to use here?  Right, this is something
that may be quite generic.  However, if we limit its scope in the
stats, perhaps this should be named pgstat_entry_all_zeros() or
something like that?
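
For readers without the patch at hand, the helper being discussed is
essentially a memcmp() against a zeroed local of the same type,
something like this sketch (not necessarily the patch's exact text):

    #define pg_structiszero(addr, s, r) \
        do { \
            s   zero_struct; \
            memset(&zero_struct, 0, sizeof(zero_struct)); \
            r = (memcmp(addr, &zero_struct, sizeof(zero_struct)) == 0); \
        } while (0)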
--
Michael




Re: query_id, pg_stat_activity, extended query protocol

2024-09-17 Thread Michael Paquier
On Wed, Sep 18, 2024 at 09:38:32AM +0900, Michael Paquier wrote:
> FWIW, I was thinking about something like what has been done in
> indexcmds.c for 5bbdfa8a18dc as the query ID value is not predictible
> across releases, but we could see whether it is set or not.

By the way, with the main issue fixed as of 933848d16dc9, could it be
possible to deal with the plan cache part in a separate thread?  To my
mind this has been a separate topic from the start, and we've done quite
a bit here already.
--
Michael




Re: Add contrib/pg_logicalsnapinspect

2024-09-17 Thread shveta malik
On Tue, Sep 17, 2024 at 12:44 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Tue, Sep 17, 2024 at 10:18:35AM +0530, shveta malik wrote:
> > Thanks for addressing the comments. I have not started reviewing v4
> > yet, but here are few more comments on v3:
> >
> > 1)
> > +#include "port/pg_crc32c.h"
> >
> > It is not needed in pg_logicalinspect.c as it is already included in
> > internal_snapbuild.h
>
> Yeap, forgot to remove that one when creating the new "internal".h file, done
> in v5 attached, thanks!
>
> >
> > 2)
> > + values[0] = Int16GetDatum(ondisk.builder.state);
> > 
> > + values[8] = LSNGetDatum(ondisk.builder.last_serialized_snapshot);
> > + values[9] = TransactionIdGetDatum(ondisk.builder.next_phase_at);
> > + values[10] = Int64GetDatum(ondisk.builder.committed.xcnt);
> >
> > We can have values[i++] in all the places and later we can check :
> > Assert(i == PG_GET_LOGICAL_SNAPSHOT_INFO_COLS);
> > Then we need not to keep track of number even in later part of code,
> > as it goes till 14.
>
> Right, let's do it that way (as it is done in pg_walinspect for example).
>
> > 4)
> > Most of the output columns in pg_get_logical_snapshot_info() look
> > self-explanatory except 'state'. Should we have meaningful 'text' here
> > corresponding to SnapBuildState? Similar to what we do for
> > 'invalidation_reason' in pg_replication_slots. (SlotInvalidationCauses
> > for ReplicationSlotInvalidationCause)
>
> Yeah we could. I was not sure about that (and that was my first remark in [1])
> , as the module is mainly for debugging purpose, I was thinking that the one
> using it could refer to "snapbuild.h". Let's see what others think.
>

Okay, makes sense. Let's wait and see what others have to say.
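
If we do end up exposing text, a small helper over the states in
snapbuild.h would be enough; a sketch (label wording is only a
suggestion):

    static const char *
    snapbuild_state_desc(SnapBuildState state)
    {
        switch (state)
        {
            case SNAPBUILD_START:
                return "start";
            case SNAPBUILD_BUILDING_SNAPSHOT:
                return "building";
            case SNAPBUILD_FULL_SNAPSHOT:
                return "full";
            case SNAPBUILD_CONSISTENT:
                return "consistent";
        }
        return "unknown";
    }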

Thanks for the patch. Few trivial things:

1)
Maybe we should change 'INTERNAL_SNAPBUILD_H' in snapbuild_internal.h
to 'SNAPBUILD_INTERNAL_H'?

2)
ValidateSnapshotFile()

It is not only validating, but loading the content as well. So maybe
we can rename it to ValidateAndRestoreSnapshotFile?

3) sgml:
a)
+ The pg_logicalinspect functions are called using an LSN argument
that can be extracted from the output name of the
pg_ls_logicalsnapdir() function.

Is it possible to give a link to the pg_ls_logicalsnapdir function here?

b)
+ Gets logical snapshot metadata about a snapshot file that is located
in the pg_logical/snapshots directory.

located in the server's pg_logical/snapshots directory
(i.e. use the word "server", similar to how pg_ls_logicalsnapdir and
pg_ls_logicalmapdir describe it)

thanks
Shveta




Re: Regression tests fail with tzdata 2024b

2024-09-17 Thread Sven Klemm
On Tue, Sep 17, 2024 at 4:15 PM Tom Lane  wrote:

> Wolfgang Walther  writes:
> > The core regression tests need to be run with a timezone that tests
> > special cases in the timezone handling code. But that might not be true
> > for extensions - all they want could be a stable output across major and
> > minor versions of postgres and versions of tzdata. It could be helpful
> > to set pg_regress' timezone to UTC, for example?
>
> I would not recommend that choice.  It would mask simple errors such
> as failing to apply the correct conversion between timestamptz and
> timestamp values.  Also, if you have test cases that are affected by
> this issue at all, you probably have a need/desire to test things like
> DST transitions.
>

As far as I'm aware, timescaledb does not rely on the specifics of the
tzdata version but just needs a stable timezone setting. I guess I'll
adjust our tests to not depend on the upstream pg_regress timezone.

-- 
Regards, Sven Klemm