Fix a comment error in logicalrep_write_typ()

2024-07-11 Thread cca5507
Hi,


-       /* use Oid as relation identifier */
+       /* use Oid as type identifier */
        pq_sendint32(out, typoid);



I think it must be "type identifier" rather than "relation identifier".


--
Regards,
ChangAo Chen

0001-Fix-a-comment-error-in-logicalrep_write_typ.patch
Description: Binary data


Re: Pluggable cumulative statistics

2024-07-11 Thread Michael Paquier
On Wed, Jul 10, 2024 at 09:00:31AM +0000, Bertrand Drouvot wrote:
> On Wed, Jul 10, 2024 at 08:28:56AM +0000, Bertrand Drouvot wrote:
>> v5-0001 LGTM.

Thanks.  I've applied this refactoring piece.

> /*
>  * We found an existing statistics file. Read it and put all the hash
>  * table entries into place.
>  */

Indeed.  Reworded that slightly and applied it as well.

So we are down to the remaining parts of the patch, and this is going
to need a consensus about a few things because this impacts the
developer experience when implementing one's own custom stats:
- Are folks OK with fixing the kind IDs in time, like RMGR IDs, with
a control in the wiki?  Or should a more artistic approach be used,
like what I am mentioning at the bottom of [1]?  The patch allows a
range of IDs to be used, to make the access to the stats faster even
if some area of memory may not be used.
- The fixed-numbered custom stats kinds are stored in an array in
PgStat_Snapshot and PgStat_ShmemControl, so that we have something
consistent with the built-in kinds.  This splits the tracking of the
validity of the data in the snapshots into separate parts of the
structure for built-in and custom kinds.  Perhaps there are better
ideas than that?  The built-in fixed-numbered kinds have no
- The handling of both built-in and custom kinds touches some areas of
pgstat.c and pgstat_shmem.c, which is the minimal I could come up
with.
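
For illustration, here is a rough sketch of what extension-side
registration could look like under this proposal.  All names below
(PGSTAT_KIND_MYEXT, MyExtStats, the PgStat_KindInfo fields and
pgstat_register_kind()) are assumptions based on the patch set, not a
committed API:

/* hypothetical custom kind ID, fixed in time like an RMGR ID */
#define PGSTAT_KIND_MYEXT	128

typedef struct MyExtStats
{
	PgStat_Counter requests;	/* illustration only */
} MyExtStats;

static const PgStat_KindInfo myext_stats_info = {
	.name = "myext",
	.fixed_amount = true,		/* lives in the snapshot/shmem arrays */
	.shared_size = sizeof(MyExtStats),
};

void
_PG_init(void)
{
	/* must run at library load time, before stats shmem is sized */
	pgstat_register_kind(PGSTAT_KIND_MYEXT, &myext_stats_info);
}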

Attached is a rebased patch set with the remaining pieces.

[1]: https://www.postgresql.org/message-id/ZoshTO9K7O7Z1wrX%40paquier.xyz
--
Michael
From 2cb5aa81d0d93a910934105756d893d2c0586ea2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 8 Jul 2024 12:28:26 +0900
Subject: [PATCH v6 1/5] Switch PgStat_Kind from enum to uint32

A follow-up patch is planned to make this counter extensible, and
keeping a trace of the kind behind a type is useful in the internal
routines used by pgstats.  While on it, switch pgstat_is_kind_valid() to
use PgStat_Kind, to be more consistent with its callers.
---
 src/include/pgstat.h| 35 ++---
 src/backend/utils/activity/pgstat.c |  6 ++---
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6b99bb8aad..cc97274708 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -32,26 +32,25 @@
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
 
 /* The types of statistics entries */
-typedef enum PgStat_Kind
-{
-	/* use 0 for INVALID, to catch zero-initialized data */
-	PGSTAT_KIND_INVALID = 0,
+#define PgStat_Kind uint32
 
-	/* stats for variable-numbered objects */
-	PGSTAT_KIND_DATABASE,		/* database-wide statistics */
-	PGSTAT_KIND_RELATION,		/* per-table statistics */
-	PGSTAT_KIND_FUNCTION,		/* per-function statistics */
-	PGSTAT_KIND_REPLSLOT,		/* per-slot statistics */
-	PGSTAT_KIND_SUBSCRIPTION,	/* per-subscription statistics */
+/* use 0 for INVALID, to catch zero-initialized data */
+#define PGSTAT_KIND_INVALID 0
 
-	/* stats for fixed-numbered objects */
-	PGSTAT_KIND_ARCHIVER,
-	PGSTAT_KIND_BGWRITER,
-	PGSTAT_KIND_CHECKPOINTER,
-	PGSTAT_KIND_IO,
-	PGSTAT_KIND_SLRU,
-	PGSTAT_KIND_WAL,
-} PgStat_Kind;
+/* stats for variable-numbered objects */
+#define PGSTAT_KIND_DATABASE	1	/* database-wide statistics */
+#define PGSTAT_KIND_RELATION	2	/* per-table statistics */
+#define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
+#define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
+#define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+
+/* stats for fixed-numbered objects */
+#define PGSTAT_KIND_ARCHIVER	6
+#define PGSTAT_KIND_BGWRITER	7
+#define PGSTAT_KIND_CHECKPOINTER	8
+#define PGSTAT_KIND_IO	9
+#define PGSTAT_KIND_SLRU	10
+#define PGSTAT_KIND_WAL	11
 
 #define PGSTAT_KIND_FIRST_VALID PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_LAST PGSTAT_KIND_WAL
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index c0ec9e8195..805712d414 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -182,7 +182,7 @@ static void pgstat_prep_snapshot(void);
 static void pgstat_build_snapshot(void);
 static void pgstat_build_snapshot_fixed(PgStat_Kind kind);
 
-static inline bool pgstat_is_kind_valid(int ikind);
+static inline bool pgstat_is_kind_valid(PgStat_Kind kind);
 
 
 /* --
@@ -1298,9 +1298,9 @@ pgstat_get_kind_from_str(char *kind_str)
 }
 
 static inline bool
-pgstat_is_kind_valid(int ikind)
+pgstat_is_kind_valid(PgStat_Kind kind)
 {
-	return ikind >= PGSTAT_KIND_FIRST_VALID && ikind <= PGSTAT_KIND_LAST;
+	return kind >= PGSTAT_KIND_FIRST_VALID && kind <= PGSTAT_KIND_LAST;
 }
 
 const PgStat_KindInfo *
-- 
2.45.2

From 0869c45608b7acb73afa4afc7bf2e481c239d705 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 8 Jul 2024 14:03:17 +0900
Subject: [PATCH v6 2/5] Introduce pluggable APIs for Cumulative Statistics

This commit adds support in the backend 

CFbot failed on Windows platform

2024-07-11 Thread Tatsuo Ishii
Hi,

A cfbot test suddenly failed with regression test error.

http://cfbot.cputube.org/highlights/all.html#4460

Following the link "regress" shows:
https://api.cirrus-ci.com/v1/artifact/task/5428792720621568/testrun/build/testrun/recovery/027_stream_regress/data/regression.diffs

diff --strip-trailing-cr -U3 
C:/cirrus/src/test/regress/expected/collate.windows.win1252.out 
C:/cirrus/build/testrun/recovery/027_stream_regress/data/results/collate.windows.win1252.out
--- C:/cirrus/src/test/regress/expected/collate.windows.win1252.out 
2024-07-11 02:44:08.385966100 +0000
+++ 
C:/cirrus/build/testrun/recovery/027_stream_regress/data/results/collate.windows.win1252.out
2024-07-11 02:49:51.280212100 +0000
@@ -21,10 +21,10 @@
 );
 \d collate_test1
 Table "collate_tests.collate_test1"
- Column |  Type   | Collation | Nullable | Default
+ Column |  Type   | Collation | Nullable | Default 
 +-+---+--+-
- a  | integer |   |  |
- b  | text| en_US | not null |
+ a  | integer |   |  | 
+ b  | text| en_US | not null | 
 :
 :

The differences are that the result has an extra space at the end of
some lines.  I am not familiar with Windows and have no idea why this
could happen.  (I haven't changed the patch set since May 24; the last
successful test was on July 9 14:58:44, though I am not sure about the
time zone.)  Also, I see exactly the same test failures in some other
tests (for example http://cfbot.cputube.org/highlights/all.html#4337).

Any idea?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: tests fail on windows with default git settings

2024-07-11 Thread Nazir Bilal Yavuz
Hi,

On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan  wrote:
>
>
> On 2024-07-10 We 9:25 AM, Tom Lane wrote:
> > Dave Page  writes:
> >> On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan  wrote:
> >>> As I was looking at this I wondered if there might be anywhere else that
> >>> needed adjustment. One thing that occurred to me was that maybe we
> >>> should replace the use of "-w" in pg_regress.c with this rather less
> >>> dangerous flag, so instead of ignoring any white space difference we would
> >>> only ignore line end differences. The use of "-w" apparently dates back to
> >>> 2009.
> >> That seems like a good improvement to me.
> > +1
> >
> >
>
>
> OK, done.

It looks like Postgres CI did not like this change. The 'Windows - Server
2019, VS 2019 - Meson & ninja' task [1] started to fail after this
commit; there is one extra space at the end of a line in the regress
test's output.

[1] https://cirrus-ci.com/task/6753781205958656

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: CFbot failed on Windows platform

2024-07-11 Thread Nazir Bilal Yavuz
Hi,

On Thu, 11 Jul 2024 at 11:07, Tatsuo Ishii  wrote:
>
> Hi,
>
> A cfbot test suddenly failed with regression test error.
>
> http://cfbot.cputube.org/highlights/all.html#4460
>
> Following the link "regress" shows:
> https://api.cirrus-ci.com/v1/artifact/task/5428792720621568/testrun/build/testrun/recovery/027_stream_regress/data/regression.diffs
>
> diff --strip-trailing-cr -U3 
> C:/cirrus/src/test/regress/expected/collate.windows.win1252.out 
> C:/cirrus/build/testrun/recovery/027_stream_regress/data/results/collate.windows.win1252.out
> --- C:/cirrus/src/test/regress/expected/collate.windows.win1252.out 
> 2024-07-11 02:44:08.385966100 +0000
> +++ 
> C:/cirrus/build/testrun/recovery/027_stream_regress/data/results/collate.windows.win1252.out
> 2024-07-11 02:49:51.280212100 +0000
> @@ -21,10 +21,10 @@
>  );
>  \d collate_test1
>  Table "collate_tests.collate_test1"
> - Column |  Type   | Collation | Nullable | Default
> + Column |  Type   | Collation | Nullable | Default
>  +-+---+--+-
> - a  | integer |   |  |
> - b  | text| en_US | not null |
> + a  | integer |   |  |
> + b  | text| en_US | not null |
>  :
>  :
>
> The differences are that the result has an extra space at the end of
> some lines.  I am not familiar with Windows and have no idea why this
> could happen.  (I haven't changed the patch set since May 24; the last
> successful test was on July 9 14:58:44, though I am not sure about the
> time zone.)  Also, I see exactly the same test failures in some other
> tests (for example http://cfbot.cputube.org/highlights/all.html#4337)
>
> Any idea?

I think it is related to commit '628c1d1f2c', which changed the
output of the regress test on Windows.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Conflict detection and logging in logical replication

2024-07-11 Thread shveta malik
On Wed, Jul 10, 2024 at 3:09 PM shveta malik  wrote:
>
> On Wed, Jul 3, 2024 at 8:31 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Wednesday, June 26, 2024 10:58 AM Zhijie Hou (Fujitsu) 
> >  wrote:
> > >
> >
> > Hi,
> >
> > As suggested by Sawada-san in another thread[1].
> >
> > I am attaching the V4 patch set which tracks the delete_differ
> > conflict in logical replication.
> >
> > delete_differ means that the replicated DELETE is deleting a row
> > that was modified by a different origin.
> >

Thanks for the patch. Please find a few comments for patch 0002:

1)
Commit msg says: The conflicts will be tracked only when
track_conflict option of the subscription is enabled.

track_conflict --> detect_conflict

2)
monitoring.sgml: Below are my suggestions, please change if you feel apt.

2a) insert_exists_count : Number of times inserting a row that
violates a NOT DEFERRABLE unique constraint while applying changes.
Suggestion: Number of times a row insertion violated a NOT DEFERRABLE
unique constraint while applying changes.

2b) update_differ_count : Number of times updating a row that was
previously modified by another source while applying changes.
Suggestion: Number of times update was performed on a row that was
previously modified by another source while applying changes.

2c) delete_differ_count: Number of times deleting a row that was
previously modified by another source while applying changes.
Suggestion: Number of times delete was performed on a row that was
previously modified by another source while applying changes.

2d) To be consistent, we can change 'is not found' to 'was not found'
in the update_missing_count and delete_missing_count cases as well.


3)
create_subscription.sgml has:
When conflict detection is enabled, additional logging is triggered
and the conflict statistics are collected in the following scenarios:

-- Can we rephrase this a little and link to the
pg_stat_subscription_stats structure here for reference?

4)
IIUC, conflict_count array (in pgstat.h) maps directly to ConflictType
enum. So if the order of entries ever changes in this enum, without
changing it in pg_stat_subscription_stats and pg_proc, we may get
wrong values under each column when querying
pg_stat_subscription_stats. If so, then perhaps it is good to add a
comment atop ConflictType saying that if someone changes this order,
the order in those other files needs to be changed too.
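
As a sketch, a comment along these lines atop the enum could do
(wording is mine, not from the patch):

/*
 * NB: the order of entries in ConflictType must match the order of the
 * conflict count columns of pg_stat_subscription_stats (and the
 * corresponding entry in pg_proc.dat); when adding or reordering
 * entries here, update those files as well.
 */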

5)
conflict.h:CONFLICT_NUM_TYPES

-- Shall the macro be CONFLICT_TYPES_NUM instead?

6)
pgstatsfuncs.c

-
	/* conflict count */
	for (int i = 0; i < CONFLICT_NUM_TYPES; i++)
		values[3 + i] = Int64GetDatum(subentry->conflict_count[i]);

	/* stats_reset */
	if (subentry->stat_reset_timestamp == 0)
		nulls[8] = true;
	else
		values[8] = TimestampTzGetDatum(subentry->stat_reset_timestamp);
-

After setting values for [3 + i], we abruptly jump to index [8]. I think
it would be better to use i++ to increment the values' index, and at the
end we can check that it reached PG_STAT_GET_SUBSCRIPTION_STATS_COLS.
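
A sketch of that suggestion (structure assumed from the snippet above):

	int			i = 3;

	/* conflict counts, one column per ConflictType */
	for (int j = 0; j < CONFLICT_NUM_TYPES; j++)
		values[i++] = Int64GetDatum(subentry->conflict_count[j]);

	/* stats_reset */
	if (subentry->stat_reset_timestamp == 0)
		nulls[i++] = true;
	else
		values[i++] = TimestampTzGetDatum(subentry->stat_reset_timestamp);

	Assert(i == PG_STAT_GET_SUBSCRIPTION_STATS_COLS);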

thanks
Shveta




Re: Check lateral references within PHVs for memoize cache keys

2024-07-11 Thread Richard Guo
On Fri, Jun 28, 2024 at 10:14 PM Andrei Lepikhov  wrote:
> I have reviewed v7 of the patch. This improvement is good enough to be
> applied, though.

Thank you for reviewing this patch!

> Comment may be rewritten for clarity:
> "Determine if the clauses in param_info and innerrel's lateral_vars" -
> I'd replace lateral_vars with 'lateral references' to combine in one
> phrase PHV from rel and root->placeholder_list sources.

Makes sense.  I ended up using 'innerrel's lateral vars' to include
both the lateral Vars/PHVs found in innerrel->lateral_vars and those
extracted from within PlaceHolderVars that are due to be evaluated at
innerrel.

> I wonder if we can add whole PHV expression instead of the Var (as
> discussed above) just under some condition:
> if (!bms_intersect(pull_varnos(root, (Node *) phinfo->ph_var->phexpr),
> innerrelids))
> {
>// Add whole PHV
> }
> else
> {
>// Add only pulled vars
> }

Good point.  After considering it further, I think we should do this.
As David explained, this can be beneficial in cases where the whole
expression results in fewer distinct values to cache tuples for.

> I got the point about Memoize over join, but as a join still calls
> replace_nestloop_params to replace parameters in its clauses, why not to
> invent something similar to find Memoize keys inside specific JoinPath
> node? It is not the issue of this patch, though - but is it doable?

I don't think it's impossible to do, but I'm skeptical that there's an
easy way to identify all the cache keys for joinrels, without having
available ppi_clauses and lateral_vars.

> IMO, the code:
> if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids))
>cache_purge_all(node);
>
> is a good place to check an assertion: is it really the parent query
> parameters that make a difference between memoize keys and node list of
> parameters?

I don't think we have enough info available here to identify which
params within outerPlan->chgParam are from outer levels.  Maybe we can
store root->outer_params in the MemoizeState node to help with this
assertion, but I'm not sure if it's worth the trouble.

Attached is an updated version of this patch.

Thanks
Richard


v8-0001-Check-lateral-references-within-PHVs-for-memoize-cache-keys.patch
Description: Binary data


Re: Improve the connection failure error messages

2024-07-11 Thread Nisha Moond
On Tue, Jul 9, 2024 at 1:00 AM Tom Lane  wrote:
>
> Nisha Moond  writes:
> > Attached v5 patch with the translator comments as suggested.
>
> I looked at this, and I agree with the goal, but I find just about all
> of the translator comments unnecessary.  The ones that are useful are
> useful only because the message is violating one of our message style
> guidelines [1]:
>
> When citing the name of an object, state what kind of object it is.
>
> Rationale: Otherwise no one will know what “foo.bar.baz” refers to.
>
> So, for example, where you have
>
> +
> +/*
> + * translator: first %s is the subscription name, second %s is the
> + * error
> + */
> + errmsg("subscription \"%s\" could not connect to the publisher: %s",
> + stmt->subname, err)));
>
> I don't find that that translator comment is adding anything.
> But there are a couple of places like
>
> +/*
> + * translator: first %s is the slotsync worker name, second %s is the
> + * error
> + */
> +errmsg("\"%s\" could not connect to the primary server: %s",
> +       app_name.data, err));
>
> I think that the right cure for the ambiguity here is not to add a
> translator comment, but to label the name properly, perhaps like
>
> errmsg("synchronization worker \"%s\" could not connect to 
> the primary server: %s",
>app_name.data, err));
>
>
> regards, tom lane
>
> [1] 
> https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-OBJECT-TYPE

Thank you for the review.

Attached the patch v6 with suggested improvements.
- Removed unnecessary translator comments.
- Added appropriate identifier names where missing.

--
Thanks,
Nisha


v6-0001-Improve-the-connection-failure-error-messages.patch
Description: Binary data


Re: CFbot failed on Windows platform

2024-07-11 Thread Tatsuo Ishii
>> The differences are that the result has an extra space at the end of
>> line.  I am not familiar with Windows and has no idea why this could
>> happen (I haven't changed the patch set since May 24. The last
>> succeeded test was on July 9 14:58:44 (I am not sure about time zone).
>> Also I see exactly the same test failures in some other tests (for
>> example http://cfbot.cputube.org/highlights/all.html#4337)
>>
>> Any idea?
> 
> I think it is related to commit '628c1d1f2c', which changed the
> output of the regress test on Windows.

Yeah, that seems to explain it. I see a few buildfarm Windows animals
complaining too.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp






Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2024-07-11 Thread Aleksander Alekseev
Hi,

> The proposed patch looks rather incomplete to me, based on the fact
> that this stuff has a lot of inconsistencies with the types used when
> manipulating 64b SLRU pages.  Some of them are harder to catch as the
> variables don't specifically refer to pages.
>
> So, even after v2, there are two more of these in asyncQueueUsage()
> with the two QUEUE_POS_PAGE() for the head and tail positions:
> int headPage = QUEUE_POS_PAGE(QUEUE_HEAD);
> int tailPage = QUEUE_POS_PAGE(QUEUE_TAIL);
>
> asyncQueueReadAllNotifications() also has one:
> int curpage = QUEUE_POS_PAGE(pos);
>
> asyncQueueAdvanceTail() declares the following:
> int oldtailpage;
> int newtailpage;
> int boundary;
>
> AsyncQueueControl.stopPage is an int.
>
> And that's only for async.c.  Alexander K., as the owner of the open
> item, are you planning to look at that?

Thanks, Michael. I prepared a corrected patchset.
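
For illustration, the kind of change involved is the following (a
sketch, not the actual hunks from the attached patches): page-number
locals become 64-bit to match 64-bit SLRU pages, e.g. in
asyncQueueUsage():

-	int			headPage = QUEUE_POS_PAGE(QUEUE_HEAD);
-	int			tailPage = QUEUE_POS_PAGE(QUEUE_TAIL);
+	int64		headPage = QUEUE_POS_PAGE(QUEUE_HEAD);
+	int64		tailPage = QUEUE_POS_PAGE(QUEUE_TAIL);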

-- 
Best regards,
Aleksander Alekseev


v3-0001-Fix-the-comment-for-SlruCtlData.long_segment_name.patch
Description: Binary data


v3-0002-Use-int64-for-page-numbers-in-clog.c-async.c.patch
Description: Binary data


Re: speed up a logical replica setup

2024-07-11 Thread Amit Kapila
On Tue, Jul 9, 2024 at 4:30 PM Alexander Lakhin  wrote:
>
> Please look at another failure of the test [1]:
> [13:28:05.647](2.460s) not ok 26 - failover slot is synced
> [13:28:05.648](0.001s) #   Failed test 'failover slot is synced'
> #   at 
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
>  line 307.
> [13:28:05.648](0.000s) #  got: ''
> # expected: 'failover_slot'
>
> with 040_pg_createsubscriber_node_s.log containing:
> 2024-07-08 13:28:05.369 UTC [3985464][client backend][0/2:0] LOG: statement: 
> SELECT pg_sync_replication_slots()
> 2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] LOG: could not 
> sync slot "failover_slot" as remote slot
> precedes local slot
> 2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] DETAIL:  Remote 
> slot has LSN 0/30047B8 and catalog xmin
> 743, but local slot has LSN 0/30047B8 and catalog xmin 744.
>
> I could not reproduce it locally, but I've discovered that that subtest
> somehow depends on pg_createsubscriber executed for the
> 'primary contains unmet conditions on node P' check. For example with this
> test modification:
> @@ -249,7 +249,7 @@ command_fails(
>   $node_p->connstr($db1), '--socket-directory',
>   $node_s->host, '--subscriber-port',
>   $node_s->port, '--database',
> -$db1, '--database',
> +'XXX', '--database',
>   $db2
>   ],
>   'primary contains unmet conditions on node P');
>
> I see the same failure:
> 2024-07-09 10:19:43.284 UTC [938890] 040_pg_createsubscriber.pl LOG:  
> statement: SELECT pg_sync_replication_slots()
> 2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl LOG:  could 
> not sync slot "failover_slot" as remote slot
> precedes local slot
> 2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl DETAIL:  
> Remote slot has LSN 0/3004780 and catalog xmin
> 743, but local slot has LSN 0/3004780 and catalog xmin 744.
>
> Thus maybe even a normal pg_createsubscriber run can affect the primary
> server (it's catalog xmin) differently?
>

Yes, pg_createsubscriber can affect the primary server's catalog xmin
because it starts the standby server that can send HSFeedback (See
XLogWalRcvSendHSFeedback()), which can advance the physical slot's
xmin corresponding to the following Insert in the test:

# Insert another row on node P and wait node S to catch up
$node_p->safe_psql($db1, "INSERT INTO tbl1 VALUES('second row')");
$node_p->wait_for_replay_catchup($node_s);

In the success case, pg_createsubscriber is able to send HSFeedback
and in the failure case, it won't. We can see the following logs in
040_pg_createsubscriber_node_p.log:

2024-07-08 13:28:00.872 UTC [3982331][walsender][:0] FATAL:  the
database system is starting up
2024-07-08 13:28:00.875 UTC [3982328][startup][:0] LOG:  database
system was shut down at 2024-07-08 13:28:00 UTC
2024-07-08 13:28:01.105 UTC [3981996][postmaster][:0] LOG:  database
system is ready to accept connections

This shows that when the test 'primary contains unmet conditions on
node P' starts the standby server, the corresponding primary node was
not ready, because we had just restarted node_p before that test and
didn't ensure that node_p was up and ready to accept connections before
starting the pg_createsubscriber test.

Even in the successful cases where the standby is able to connect to
the primary for the test 'primary contains unmet conditions on node P',
there is no guarantee that the xmin of the physical slot will be
updated; at least, we don't have anything in the test to ensure that.

Now, as we didn't ensure before creating the logical replication slot
that the physical slot's xmin had caught up to the latest value, the
test can lead to a failure like: "Remote slot has LSN 0/3004780 and
catalog xmin 743, but local slot has LSN 0/3004780 and catalog xmin
744".

The xmin on standby could have been advanced due to the following Insert:
# Insert another row on node P and wait node S to catch up
$node_p->safe_psql($db1, "INSERT INTO tbl1 VALUES('second row')");
$node_p->wait_for_replay_catchup($node_s);

We don't wait for the xmin to catch up corresponding to this insert
and I don't know if there is a way to do that. So, we should move this
Insert to after the call to pg_sync_replication_slots(). It won't
impact the general test of pg_createsubscriber.

Thanks to Hou-San for helping me in the analysis of this BF failure.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-07-11 Thread Amit Kapila
On Wed, Jul 10, 2024 at 4:51 PM Euler Taveira  wrote:
>
> On Tue, Jul 9, 2024, at 8:00 AM, Alexander Lakhin wrote:
>
> Hello Amit and Kuroda-san,
>
> 03.07.2024 14:02, Amit Kapila wrote:
> > Pushed 0002 and 0003. Let's wait for a discussion on 0001.
>
> Please look at another failure of the test [1]:
> [13:28:05.647](2.460s) not ok 26 - failover slot is synced
> [13:28:05.648](0.001s) #   Failed test 'failover slot is synced'
> #   at 
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
>  line 307.
> [13:28:05.648](0.000s) #  got: ''
> # expected: 'failover_slot'
>
>
> I'm wondering if the attached patch is sufficient to move the restart_lsn
> forward.
>

This won't guarantee that xmin of the logical slot on the primary has
moved ahead or at least we haven't ensured the same. So, we should try
not to perform statements that can increase the xmin on standby before
creating a logical slot. See the analysis sent in the previous email.

-- 
With Regards,
Amit Kapila.




Re: Fix a comment error in logicalrep_write_typ()

2024-07-11 Thread Amit Kapila
On Thu, Jul 11, 2024 at 12:46 PM cca5507  wrote:
>
> -   /* use Oid as relation identifier */
> +   /* use Oid as type identifier */
> pq_sendint32(out, typoid);
>
> I think it must be "type identifier" rather than "relation identifier".
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: Send duration output to separate log files

2024-07-11 Thread Alastair Turner
On Wed, 10 Jul 2024 at 16:58, Greg Sabino Mullane 
wrote:
--snip--

> Why not build a more generic log filtering case?
>
> I looked into this, but it would be a large undertaking, given the way our
> logging system works. And as per above, I don't think the pain would be
> worth it, as duration covers 99% of the use cases for separate logs.
>

 The other category of logging which would benefit from a separate file is
audit. It also can create massive volumes of log content. Splitting audit
information off into a separate file for use by a separate team or function
is also a request I have heard from some financial institutions adopting
Postgres. With audit being provided by an extension, this would become
quite an intrusive change.

> Why not use an extension for this?
>
> I did start this as an extension, but it only goes so far. We can use
> emit_log_hook, but it requires careful coordination of open filehandles,
> has to do an inefficient regex over every log message, and cannot do
> things like log rotation.
>

Would an extension be able to safely modify the message_type field you have
added using emit_log_hook? If so, the field becomes more of a log
destination label than a type marker. If an extension could hook into the
log file creation/rotation process, that would be a nice basis for enabling
extensions (I'm particularly thinking of pgAudit) to manage separate
logging destinations.
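
For context, the emit_log_hook pattern under discussion looks roughly
like this (a sketch; emit_log_hook and ErrorData are existing backend
APIs, while the message_type/destination field is only proposed in this
thread and does not exist today):

static emit_log_hook_type prev_emit_log_hook = NULL;

static void
audit_emit_log_hook(ErrorData *edata)
{
	/*
	 * Hypothetically, an extension could relabel a message here so that
	 * the log collector routes it to a separate audit file, e.g.
	 * edata->message_type = ...;  (proposed field, not in core today)
	 */

	if (prev_emit_log_hook)
		prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
	prev_emit_log_hook = emit_log_hook;
	emit_log_hook = audit_emit_log_hook;
}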


Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal

2024-07-11 Thread Fujii Masao




On 2024/07/11 1:35, Robert Haas wrote:

On Wed, Jul 10, 2024 at 10:10 AM Robert Haas  wrote:

On Wed, Jul 10, 2024 at 1:56 AM Fujii Masao  wrote:

I believe this issue occurs when the server is shut down cleanly.
The shutdown-checkpoint record retains the wal_level value used
before the shutdown. If wal_level is changed after this,
the wal_level indicated by the shutdown-checkpoint record and the one
that the WAL data generated afterwards depends on may not match.


Oh, that's a problem. I'll have to think about that.


Here is an attempt at fixing this problem.


Thanks for updating the patch!

+	 * fast_forward is normally false, but is true when we have encountered
+	 * WAL generated with wal_level=minimal and are skipping over it without
+	 * emitting summary files. In this case, summarized_tli and summarized_lsn
+	 * will advance even though nothing is being written to disk, until we
+	 * again reach a point where wal_level>minimal.
+	 *
 * summarizer_pgprocno is the proc number of the summarizer process, if
 * one is running, or else INVALID_PROC_NUMBER.
 *
@@ -83,6 +89,7 @@ typedef struct
TimeLineID  summarized_tli;
XLogRecPtr  summarized_lsn;
boollsn_is_exact;
+   boolfast_forward;

It looks like the fast_forward field in WalSummarizerData is no longer 
necessary.

So far, I haven't found any other issues with the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Fix a comment error in logicalrep_write_typ()

2024-07-11 Thread cca5507
Thank you for the review!

The commitfest link for tracking:
https://commitfest.postgresql.org/49/5121/


--
Regards,
ChangAo Chen

Re: MERGE/SPLIT partition commands should create new partitions in the parent's tablespace?

2024-07-11 Thread Fujii Masao



On 2024/07/10 22:35, Masahiko Sawada wrote:

BTW the new regression tests don't check the table and index names.
Isn't it better to show table and index names for better
diagnosability?


Sounds good to me. I've updated the patch as suggested.
Please see the attached patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

From 46d4cebc88b5f7e89205bc02c0689872233bafd8 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Sat, 6 Jul 2024 14:18:54 +0900
Subject: [PATCH v2] Ensure MERGE/SPLIT partition commands create new
 partitions in the parent's tablespace.

Previously, ALTER TABLE MERGE PARTITIONS and ALTER TABLE SPLIT PARTITION
commands always created new partitions in the default tablespace,
ignoring the parent's tablespace. This commit fixes that, making sure
new partitions inherit the parent's tablespace.

Backpatch to v17 where those commands were added.
---
 doc/src/sgml/ref/alter_table.sgml |  6 +++--
 src/backend/commands/tablecmds.c  |  2 +-
 src/test/regress/expected/partition_merge.out | 23 
 src/test/regress/expected/partition_split.out | 26 +++
 src/test/regress/sql/partition_merge.sql  | 12 +
 src/test/regress/sql/partition_split.sql  | 13 ++
 6 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index c062a36880..6a2822adad 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1163,7 +1163,8 @@ WITH ( MODULUS numeric_literal, REM
   New partitions will have the same table access method as the parent.
   If the parent table is persistent then new partitions are created
   persistent.  If the parent table is temporary then new partitions
-  are also created temporary.
+  are also created temporary.  New partitions will also be created in
+  the same tablespace as the parent.
  
  
   
@@ -1235,7 +1236,8 @@ WITH ( MODULUS numeric_literal, REM
   The new partition will have the same table access method as the parent.
   If the parent table is persistent then the new partition is created
   persistent.  If the parent table is temporary then the new partition
-  is also created temporary.
+  is also created temporary.  The new partition will also be created in
+  the same tablespace as the parent.
  
  
   
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dbfe0d6b1c..9e1eff19e0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -20331,7 +20331,7 @@ createPartitionTable(RangeVar *newPartName, Relation 
modelRel,
createStmt->constraints = NIL;
createStmt->options = NIL;
createStmt->oncommit = ONCOMMIT_NOOP;
-   createStmt->tablespacename = NULL;
+   createStmt->tablespacename = 
get_tablespace_name(modelRel->rd_rel->reltablespace);
createStmt->if_not_exists = false;
createStmt->accessMethod = get_am_name(modelRel->rd_rel->relam);
 
diff --git a/src/test/regress/expected/partition_merge.out 
b/src/test/regress/expected/partition_merge.out
index 9c67a4a8b1..26bf58b23d 100644
--- a/src/test/regress/expected/partition_merge.out
+++ b/src/test/regress/expected/partition_merge.out
@@ -861,6 +861,29 @@ SET search_path = partitions_merge_schema, pg_temp, public;
 -- Can't merge temporary partitions into a persistent partition
 ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
 ROLLBACK;
+-- Check the new partition inherits parent's tablespace
+CREATE TABLE t (i int PRIMARY KEY USING INDEX TABLESPACE regress_tblspace)
+  PARTITION BY RANGE (i) TABLESPACE regress_tblspace;
+CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1);
+CREATE TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2);
+ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
+SELECT tablename, tablespace FROM pg_tables
+  WHERE tablename IN ('t', 'tp_0_2') ORDER BY tablename, tablespace;
+ tablename |tablespace
+---+--
+ t | regress_tblspace
+ tp_0_2| regress_tblspace
+(2 rows)
+
+SELECT tablename, indexname, tablespace FROM pg_indexes
+  WHERE tablename IN ('t', 'tp_0_2') ORDER BY tablename, indexname, tablespace;
+ tablename |  indexname  |tablespace
+---+-+--
+ t | t_pkey  | regress_tblspace
+ tp_0_2| tp_0_2_pkey | regress_tblspace
+(2 rows)
+
+DROP TABLE t;
 -- Check the new partition inherits parent's table access method
 SET search_path = partitions_merge_schema, public;
 CREATE ACCESS METHOD partitions_merge_heap TYPE TABLE HANDLER 
heap_tableam_handler;
diff --git a/src/test/regress/expected/partition_split.out 
b/src/test/regress/expected/partition_split.out
index 5fbfc8f805..1a8c95ad81 100644
--- a/src/test/regress/expected/partition_split.out
++

RE: speed up a logical replica setup

2024-07-11 Thread Zhijie Hou (Fujitsu)
On Thursday, July 11, 2024 6:21 PM Amit Kapila  wrote:
> 
> On Tue, Jul 9, 2024 at 4:30 PM Alexander Lakhin 
> wrote:
> >
> > Please look at another failure of the test [1]:
> > [13:28:05.647](2.460s) not ok 26 - failover slot is synced
> > [13:28:05.648](0.001s) #   Failed test 'failover slot is synced'
> > #   at
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/bin/pg_basebackup/t/04
> 0_pg_createsubscriber.pl line 307.
> > [13:28:05.648](0.000s) #  got: ''
> > # expected: 'failover_slot'
> >
> > with 040_pg_createsubscriber_node_s.log containing:
> > 2024-07-08 13:28:05.369 UTC [3985464][client backend][0/2:0] LOG:
> > statement: SELECT pg_sync_replication_slots()
> > 2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] LOG:
> > could not sync slot "failover_slot" as remote slot precedes local slot
> > 2024-07-08 13:28:05.557 UTC [3985464][client backend][0/2:0] DETAIL:
> > Remote slot has LSN 0/30047B8 and catalog xmin 743, but local slot has LSN
> 0/30047B8 and catalog xmin 744.
> >
> > I could not reproduce it locally, but I've discovered that that
> > subtest somehow depends on pg_createsubscriber executed for the
> > 'primary contains unmet conditions on node P' check. For example with
> > this test modification:
> > @@ -249,7 +249,7 @@ command_fails(
> >   $node_p->connstr($db1), '--socket-directory',
> >   $node_s->host, '--subscriber-port',
> >   $node_s->port, '--database',
> > -$db1, '--database',
> > +'XXX', '--database',
> >   $db2
> >   ],
> >   'primary contains unmet conditions on node P');
> >
> > I see the same failure:
> > 2024-07-09 10:19:43.284 UTC [938890] 040_pg_createsubscriber.pl LOG:
> > statement: SELECT pg_sync_replication_slots()
> > 2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl LOG:
> > could not sync slot "failover_slot" as remote slot precedes local slot
> > 2024-07-09 10:19:43.292 UTC [938890] 040_pg_createsubscriber.pl
> > DETAIL:  Remote slot has LSN 0/3004780 and catalog xmin 743, but local
> slot has LSN 0/3004780 and catalog xmin 744.
> >
> > Thus maybe even a normal pg_createsubscriber run can affect the
> > primary server (it's catalog xmin) differently?
> >
> 
> Yes, pg_createsubscriber can affect the primary server's catalog xmin because
> it starts the standby server that can send HSFeedback (See
> XLogWalRcvSendHSFeedback()), which can advance the physical slot's xmin
> corresponding to the following Insert in the test:
> 
> # Insert another row on node P and wait node S to catch up
> $node_p->safe_psql($db1, "INSERT INTO tbl1 VALUES('second row')");
> $node_p->wait_for_replay_catchup($node_s);
> 
> In the success case, pg_createsubscriber is able to send HSFeedback and in
> the failure case, it won't. We can see the following logs in
> 040_pg_createsubscriber_node_p.log:
> 
> 2024-07-08 13:28:00.872 UTC [3982331][walsender][:0] FATAL:  the database
> system is starting up
> 2024-07-08 13:28:00.875 UTC [3982328][startup][:0] LOG:  database system
> was shut down at 2024-07-08 13:28:00 UTC
> 2024-07-08 13:28:01.105 UTC [3981996][postmaster][:0] LOG:  database
> system is ready to accept connections
> 
> This shows that when the test 'primary contains unmet conditions on node P'
> starts the standby server, the corresponding primary node was not ready,
> because we had just restarted node_p before that test and didn't ensure
> that node_p was up and ready to accept connections before starting the
> pg_createsubscriber test.
> 
> Even in the successful cases where the standby is able to connect to the
> primary for the test 'primary contains unmet conditions on node P', there
> is no guarantee that the xmin of the physical slot will be updated; at
> least, we don't have anything in the test to ensure the same.
> 
> Now, as we didn't ensure before creating the logical replication slot that
> the physical slot's xmin had caught up to the latest value, the test can
> lead to a failure like: "Remote slot has LSN 0/3004780 and catalog xmin
> 743, but local slot has LSN 0/3004780 and catalog xmin 744".
> 
> The xmin on standby could have been advanced due to the following Insert:
> # Insert another row on node P and wait node S to catch up
> $node_p->safe_psql($db1, "INSERT INTO tbl1 VALUES('second row')");
> $node_p->wait_for_replay_catchup($node_s);
> 
> We don't wait for the xmin to catch up corresponding to this insert and I 
> don't
> know if there is a way to do that. So, we should move this Insert to after 
> the call
> to pg_sync_replication_slots(). It won't impact the general test of
> pg_createsubscriber.

The analysis and suggestion look reasonable to me.
Here is a small patch which does the same.


> Thanks to Hou-San for helping me in the analysis of this BF failure.

Best Regards,
Hou zj


0001-fix-unstable-test-in-040_pg_createsubscriber.patch
Description: 0001-fix-unstable-test-in-040_pg_createsubscriber.patch


Re: tests fail on windows with default git settings

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan  wrote:


On 2024-07-10 We 9:25 AM, Tom Lane wrote:

Dave Page  writes:

On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan  wrote:

As I was looking at this I wondered if there might be anywhere else that
needed adjustment. One thing that occurred to me was that maybe we
should replace the use of "-w" in pg_regress.c with this rather less
dangerous flag, so instead of ignoring any white space difference we would
only ignore line end differences. The use of "-w" apparently dates back to
2009.

That seems like a good improvement to me.

+1




OK, done.

It looks like Postgres CI did not like this change. The 'Windows - Server
2019, VS 2019 - Meson & ninja' task [1] started to fail after this
commit; there is one extra space at the end of a line in the regress
test's output.

[1] https://cirrus-ci.com/task/6753781205958656



Oh, that's annoying. Will investigate. Thanks for the heads up.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Things I don't like about \du's "Attributes" column

2024-07-11 Thread Rafia Sabih
On Thu, 6 Jun 2024 at 23:10, Pavel Luzanov  wrote:
>
> On 06.06.2024 17:29, Robert Haas wrote:
>
> I think the first of these special interpretations is unnecessary and
> should be removed. It seems pretty clear what 0 means.
>
> Agree.
> There is an additional technical argument for removing this replacement.
> I don't like explicit cast to text of the "Connection limit" column.
> Without 'Not allowed' it is no longer required.
> Value -1 can be replaced by NULL with an implicit cast to integer.
>
> Next version with this change attached.
>
> Example output:
>
> \du+ regress_du*
>                                                    List of roles
>     Role name     | Login | Attributes  |         Valid until          | Connection limit |   Description
> ------------------+-------+-------------+------------------------------+------------------+------------------
>  regress_du_admin | yes   | Superuser  +|                              |                  | some description
>                   |       | Create DB  +|                              |                  |
>                   |       | Create role+|                              |                  |
>                   |       | Inherit    +|                              |                  |
>                   |       | Replication+|                              |                  |
>                   |       | Bypass RLS  |                              |                  |
>  regress_du_role0 | yes   | Inherit     | Tue Jun 04 00:00:00 2024 PDT |                0 |
>  regress_du_role1 | no    | Create role+| infinity                     |                  |
>                   |       | Inherit     |                              |                  |
>  regress_du_role2 | yes   | Inherit    +|                              |               42 |
>                   |       | Replication+|                              |                  |
>                   |       | Bypass RLS  |                              |                  |
> (4 rows)
>
> Current version for comparison:
This looks much better than the current version. The only thing is, I
find the column name Valid until confusing. With that name I am in
danger of taking it as the role's validity and not the password's.
How about naming it something like Password validity?
>
>                                          List of roles
>     Role name     |                         Attributes                          |   Description
> ------------------+-------------------------------------------------------------+------------------
>  regress_du_admin | Superuser, Create role, Create DB, Replication, Bypass RLS  | some description
>  regress_du_role0 | No connections                                             +|
>                   | Password valid until 2024-06-04 00:00:00+03                 |
>  regress_du_role1 | Create role, Cannot login                                  +|
>                   | Password valid until infinity                               |
>  regress_du_role2 | Replication, Bypass RLS                                    +|
>                   | 42 connections                                              |
>
>
> Data:
> CREATE ROLE regress_du_role0 LOGIN PASSWORD '123' VALID UNTIL '2024-06-04' 
> CONNECTION LIMIT 0;
> CREATE ROLE regress_du_role1 CREATEROLE CONNECTION LIMIT -1 VALID UNTIL 
> 'infinity';
> CREATE ROLE regress_du_role2 LOGIN REPLICATION BYPASSRLS CONNECTION LIMIT 42;
> CREATE ROLE regress_du_admin LOGIN SUPERUSER CREATEROLE CREATEDB BYPASSRLS 
> REPLICATION INHERIT;
> COMMENT ON ROLE regress_du_admin IS 'some description';
>
> --
> Pavel Luzanov
> Postgres Professional: https://postgrespro.com

-- 
Regards,
Rafia Sabih




Re: CREATE INDEX CONCURRENTLY on partitioned index

2024-07-11 Thread Justin Pryzby
On Sat, Jun 15, 2024 at 07:56:38PM +0100, Ilya Gladyshev wrote:
> In addition, I noticed that progress tracking is once again broken for
> partitioned tables, while looking at REINDEX implementation, attaching the
> second patch to fix it.

Thanks for the fixes, I started reviewing them but need some more time
to digest.

Do you mean that progress reporting is broken in master, for REINDEX, or
just with this patch ?

-- 
Justin




Re: long-standing data loss bug in initial sync of logical replication

2024-07-11 Thread Nitin Motiani
On Wed, Jul 10, 2024 at 11:22 PM Nitin Motiani  wrote:
>
> On Wed, Jul 10, 2024 at 10:39 PM vignesh C  wrote:
> >
> > On Wed, 10 Jul 2024 at 12:28, Amit Kapila  wrote:
> > > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > > attached. I haven't tested it but they should also face the same
> > > problem. Apart from that, I have changed the comments in a few places
> > > in the patch.
> >
> > I could not hit the updated ShareRowExclusiveLock changes through the
> > partition table, instead I could verify it using the inheritance
> > table. Added a test for the same and also attaching the backbranch
> > patch.
> >
>
> Hi,
>
> I tested alternative-experimental-fix-lock.patch provided by Tomas
> (replaces SUE with SRE in OpenTableList). I believe there are a couple
> of scenarios the patch does not cover.
>
> 1. It doesn't handle the case of "ALTER PUBLICATION  ADD TABLES
> IN SCHEMA  ".
>
> I took crash-test.sh provided by Tomas and modified it to add all
> tables in the schema to publication using the following command :
>
>ALTER PUBLICATION p ADD TABLES IN SCHEMA  public
>
> The modified script is attached (crash-test-with-schema.sh). With this
> script, I can reproduce the issue even with the patch applied. This is
> because the code path to add a schema to the publication doesn't go
> through OpenTableList.
>
> I have also attached a script run-test-with-schema.sh to run
> crash-test-with-schema.sh in a loop with randomly generated parameters
> (modified from run.sh provided by Tomas).
>
> 2.  The second issue is a deadlock which happens when the alter
> publication command is run for a comma separated list of tables.
>
> I created another script create-test-tables-order-reverse.sh. This
> script runs a command like the following :
>
> ALTER PUBLICATION p ADD TABLE test_2,test_1
>
> Running the above script, I was able to get a deadlock error (the
> output is attached in deadlock.txt). In the alter publication command,
> I added the tables in the reverse order to increase the probability of
> the deadlock. But it should happen with any order of tables.
>
> I am not sure if the deadlock is a major issue because detecting the
> deadlock is better than data loss. The schema issue is probably more
> important. I didn't test it out with the latest patches sent by
> Vignesh but since the code changes in that patch are also in
> OpenTableList, I think the schema scenario won't be covered by those.
>


Hi,

I looked further into the scenario of adding the tables in schema to
the publication. Since in that case, the entry is added to
pg_publication_namespace instead of pg_publication_rel, the codepaths
for 'add table' and 'add tables in schema' are different. And in the
'add tables in schema' scenario, the OpenTableList function is not
called to get the relation ids. Therefore even with the proposed
patch, the data loss issue still persists in that case.

To validate this idea, I tried locking all the affected tables in the
schema just before the invalidation for those relations (in
ShareRowExclusiveLock mode). I am attaching the small patch for that
(alter_pub_for_schema.patch) where the change is made in the function
publication_add_schema in pg_publication.c. I am not sure if this is
the best place to make this change or if it is the right fix. It is
conceptually similar to the proposed change in OpenTableList but here
we are not just changing the lockmode but taking locks which were not
taken before. But with this change, the data loss errors went away in
my test script.
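
For reference, the gist of alter_pub_for_schema.patch is roughly the
following (a sketch; exact placement and variable names may differ),
run in publication_add_schema() before the relcache invalidations:

	List	   *schemaRels;
	ListCell   *lc;

	/* lock all relations in the schema before invalidating them */
	schemaRels = GetSchemaPublicationRelations(schemaid,
											   PUBLICATION_PART_ALL);
	foreach(lc, schemaRels)
	{
		Oid			relid = lfirst_oid(lc);

		LockRelationOid(relid, ShareRowExclusiveLock);
	}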

Another issue which persists with this change is the deadlock. Since
multiple table locks are acquired, the test script detects deadlock a
few times. Therefore I'm also attaching another modified script which
does a few retries in case of deadlock. The script is
crash-test-with-retries-for-schema.sh. It runs the following command
in a retry loop :

  ALTER PUBLICATION p ADD TABLES IN SCHEMA  public

If the command fails, it sleeps for a random amount of time (upper
bound by a MAXWAIT parameter) and then retries the command. If it
fails to run the command in the max number of retries, the final
return value from the script is DEADLOCK as we can't do a consistency
check in this scenario. Also attached is another script
run-with-deadlock-detection.sh which can run the above script for
multiple iterations.

I tried the test scripts with and without alter_pub_for_schema.patch.
Without the patch, I get the final output ERROR the majority of the time
which means that the publication was altered successfully but the data
was lost on the subscriber. When I run it with the patch, I get a mix
of OK (no data loss) and DEADLOCK (the publication was not altered)
but no ERROR. I think by changing the parameters of sleep time and
number of retries we can get different fractions of OK and DEADLOCK.

I am not sure if this is the right or a clean way to fix the issue but
I think conceptually this might be 

Re: Built-in CTYPE provider

2024-07-11 Thread Noah Misch
On Tue, Jul 09, 2024 at 04:20:12PM -0700, Jeff Davis wrote:
> On Mon, 2024-07-08 at 18:05 -0700, Noah Misch wrote:
> > I'm thinking about these
> > aggravating factors for $SUBJECT:
> 
> This is still marked as an open item for 17, but you've already
> acknowledged[1] that no code changes are necessary in version 17.

Later posts on the thread made that obsolete.  The next step is to settle the
question at https://postgr.es/m/20240706195129...@rfd.leadboat.com.  If that
conclusion entails a remedy, v17 code changes may be part of that remedy.

> The idea that you're arguing against is "stability within a PG major
> version". There's no new discovery here: it was listed under the
> heading of "Benefits" near the top of my initial proposal[2]

Thanks for that distillation.  More specifically, I'm arguing against the lack
of choice about instability across major versions:

                                  | ICU collations    | pg_c_utf8
----------------------------------+-------------------+----------
Corruption within a major version | packager's choice | no
Corruption at pg_upgrade time     | packager's choice | yes

I am a packager who chooses no-corruption (chooses stability).  As a packager,
the pg_c_utf8 stability within major versions is never a bad thing, but it
does not compensate for instability across major versions.  I don't want a
future in which pg_c_utf8 is the one provider that integrity-demanding
pg_upgrade users should not use.

> and known to all reviewers.

If after https://postgr.es/m/20240706195129...@rfd.leadboat.com and
https://postgr.es/m/20240709010545.8c.nmi...@google.com they think $SUBJECT
should continue as-committed, they should vote that way.  Currently, we have
multiple votes in one direction and multiple votes in the other direction.  If
all three reviewers were to vote in the same direction (assuming no other new
votes), I argue that such a count would render whichever way they vote as the
conclusion.  Does that match your count?

> [1]
> https://www.postgresql.org/message-id/20240701230352.2c.nmi...@google.com
> [2]
> https://www.postgresql.org/message-id/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.ca...@j-davis.com




Re: MAINTAIN privilege -- what do we need to un-revert it?

2024-07-11 Thread Noah Misch
On Tue, Jul 09, 2024 at 05:47:36PM -0700, Jeff Davis wrote:
> On Tue, 2024-07-09 at 15:20 +0900, Michael Paquier wrote:
> > On Sun, Jun 30, 2024 at 03:23:44PM -0700, Noah Misch wrote:

> > Hmm.  Is RestrictSearchPath() something that we should advertise more
> > strongly, thinking here about extensions that call NewGUCNestLevel()?
> > That would be really easy to miss, and it could have bad
> > consequences.
> > I know that this is not something that's published in the release
> > notes, but it looks like something sensible to have, though.
> 
> The pattern also involves SetUserIdAndSecContext(). Perhaps we could
> come up with a wrapper function to better encapsulate the general
> pattern?

Worth a look.  usercontext.c has an existing wrapper for a superuser process
switching to an untrusted user.  It could become the home for another wrapper
targeting MAINTAIN-relevant callers.

> > > While "not necessary for security", ExecCreateTableAs() should do
> > > it for the
> > > same reason it calls NewGUCNestLevel().
> > 
> > +1.
> 
> Do you have a suggestion about how that should be done?
> 
> It's not trivial, because it both creates the table and populates it
> in ExecutorRun. For table creation, we need to use the original
> search_path, but we need to use the restricted search_path when
> populating it.
> 
> I could try to refactor it into two statements and execute them
> separately, or I could try to rewrite the statement to use a fully-
> qualified destination table before execution. Thoughts?

Those sound fine.  Also fine: just adding a comment on why creation namespace
considerations led to not doing it there.




Re: Pluggable cumulative statistics

2024-07-11 Thread Bertrand Drouvot
Hi,

On Fri, Jul 05, 2024 at 09:35:19AM +0900, Michael Paquier wrote:
> On Thu, Jul 04, 2024 at 11:30:17AM +, Bertrand Drouvot wrote:
> > 
> > /*
> >  * Reads in existing statistics file into the shared stats hash.
> > 
> > This comment above pgstat_read_statsfile() is not correct, fixed stats
> > are not going to the hash (was there before your patch though).
> 
> Good catch.  Let's adjust that separately.

Please find attached a patch to do so (attached as .txt to not perturb the
cfbot).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 67fdfa3eeef3e089a7b99b97f47f6c1e64b501cf Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Thu, 11 Jul 2024 12:26:05 +
Subject: [PATCH v1] Fix comment on top of pgstat_read_statsfile()

The stats are read from the file and then loaded into the shared stats hash (
for non fixed amount stats) or into the fixed amount stats. Previous comment
was mentioning only the shared stats hash case.
---
 src/backend/utils/activity/pgstat.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat.c 
b/src/backend/utils/activity/pgstat.c
index c0ec9e8195..a7b8ecc1b6 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1495,10 +1495,11 @@ read_chunk(FILE *fpin, void *ptr, size_t len)
 #define read_chunk_s(fpin, ptr) read_chunk(fpin, ptr, sizeof(*ptr))
 
 /*
- * Reads in existing statistics file into the shared stats hash.
+ * Reads in existing statistics file into the shared stats hash (for non fixed
+ * amount stats) or into the fixed amount stats.
  *
- * This function is called in the only process that is accessing the shared
- * stats so locking is not required.
+ * This function is called in the only process that is accessing the stats
+ * so locking is not required.
  */
 static void
 pgstat_read_statsfile(void)
-- 
2.34.1



Re: speed up a logical replica setup

2024-07-11 Thread Euler Taveira
On Thu, Jul 11, 2024, at 8:22 AM, Zhijie Hou (Fujitsu) wrote:
> The analysis and suggestion look reasonable to me.
> Here is a small patch which does the same.

LGTM. However, I would add a comment above the INSERT you moved around to remind
you that a transaction cannot be added there because it will break this test.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Redundant syscache access in get_rel_sync_entry()

2024-07-11 Thread Ashutosh Bapat
On Thu, Jul 11, 2024 at 11:38 AM cca5507  wrote:

> Hi,
> in func get_rel_sync_entry() we access the same tuple in pg_class three
> times:
> Oid schemaId = get_rel_namespace(relid);
> bool am_partition = get_rel_relispartition(relid);
> char relkind = get_rel_relkind(relid);
> Why not just merge into one?
>

I think it's just convenient. We do that in multiple places; not exactly
these functions, but functions which fetch relation attributes from cached
tuples. Given that the tuple is cached and local to the backend, it's not
too expensive.  But if there are multiple places which do something like
this, we may want to create more get_rel_* functions which return
multiple properties in one function call. I see get_rel_namespace() and
get_rel_name() called together in many places.
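
For illustration, a combined accessor could be a thin wrapper over one
syscache lookup (a sketch; the function name get_rel_info() is
hypothetical):

static void
get_rel_info(Oid relid, Oid *schemaId, bool *am_partition, char *relkind)
{
	HeapTuple	tp;
	Form_pg_class reltup;

	tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tp))
		elog(ERROR, "cache lookup failed for relation %u", relid);
	reltup = (Form_pg_class) GETSTRUCT(tp);

	/* one cache access instead of three */
	*schemaId = reltup->relnamespace;
	*am_partition = reltup->relispartition;
	*relkind = reltup->relkind;

	ReleaseSysCache(tp);
}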

-- 
Best Wishes,
Ashutosh Bapat


Re: CFbot failed on Windows platform

2024-07-11 Thread Tom Lane
Tatsuo Ishii  writes:
>> I think it is related to commit '628c1d1f2c', which changed the
>> output of the regress test on Windows.

> Yeah, that seems to explain it. I see a few buildfarm Windows animals
> complaining too.

I think that the contents of
src/test/regress/expected/collate.windows.win1252.out are actually
wrong, and we'd not noticed because it was only checked with diff -w.
psql does put an extra trailing space in some lines of table output,
but that space isn't there in collate.windows.win1252.out.

regards, tom lane




Re: CFbot failed on Windows platform

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 5:34 AM, Tatsuo Ishii wrote:

The differences are that the result has an extra space at the end of
some lines.  I am not familiar with Windows and have no idea why this
could happen.  (I haven't changed the patch set since May 24; the last
successful test was on July 9 14:58:44, though I am not sure about the
time zone.)  Also, I see exactly the same test failures in some other
tests (for example http://cfbot.cputube.org/highlights/all.html#4337)

Any idea?

I think it is related to commit '628c1d1f2c', which changed the output
of the regress test on Windows.

Yeah, that seems to explain it. I see a few buildfarm Windows animals
complaining too.



I have partially reverted that patch. Thanks for the report.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: tests fail on windows with default git settings

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 7:29 AM, Andrew Dunstan wrote:


On 2024-07-11 Th 4:59 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 10 Jul 2024 at 17:04, Andrew Dunstan  wrote:


On 2024-07-10 We 9:25 AM, Tom Lane wrote:

Dave Page  writes:
On Wed, 10 Jul 2024 at 12:12, Andrew Dunstan  wrote:
As I was looking at this I wondered if there might be anywhere else that
needed adjustment. One thing that occurred to me was that maybe we
should replace the use of "-w" in pg_regress.c with this rather less
dangerous flag, so instead of ignoring any white space difference we would
only ignore line end differences. The use of "-w" apparently dates back to
2009.

That seems like a good improvement to me.

+1




OK, done.

It looks like Postgres CI did not like this change. 'Windows - Server
2019, VS 2019 - Meson & ninja' [1] task started to fail after this
commit, there is one extra space at the end of line in regress test's
output.

[1] https://cirrus-ci.com/task/6753781205958656



Oh, that's annoying. Will investigate. Thanks for the heads up.





I have reverted the pg_regress.c portion of the patch. I will 
investigate non line-end differences on Windows further.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Redundant syscache access in get_rel_sync_entry()

2024-07-11 Thread cca5507
-- Original --
From:   
 "Ashutosh Bapat"   
 


Re: Restart pg_usleep when interrupted

2024-07-11 Thread Sami Imseih
> I did a few tests with the patch and did not see any "large" drifts like the
> ones observed above.

Thanks for testing.

> I think it could be "simplified" by making use of instr_time instead of
> timespec for current and absolute. Then I think it would be enough to
> compare their ticks.

Correct. I attached a v2 of this patch that uses instr_time to check the
elapsed time and break out of the loop. It needs some more benchmarking.

>> Since sub-millisecond sleep times are not guaranteed, as suggested by the
>> vacuum_cost_delay docs (see below), an alternative idea is to use
>> clock_nanosleep for vacuum delay when it's available, else fall back to
>> WaitLatch.
>
> Wouldn't that increase even more the cases where sub-millisecond won't be
> guaranteed?

Yes, nanosleep is going to provide the most coverage as it's widely available.

Regards,
Sami

v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data
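
For illustration, the clock_nanosleep() idea mentioned above sleeps to an
absolute deadline, so interrupted sleeps can be retried without drift. A
minimal sketch, assuming CLOCK_MONOTONIC and POSIX semantics (this is not
the attached patch):

#include <errno.h>
#include <time.h>

/* Sleep for msec milliseconds; EINTR retries keep the same deadline. */
static void
sleep_absolute(long msec)
{
    struct timespec deadline;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += msec / 1000;
    deadline.tv_nsec += (msec % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                           &deadline, NULL) == EINTR)
        ;                       /* absolute deadline, so no drift */
}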


Re: Restart pg_usleep when interrupted

2024-07-11 Thread Sami Imseih

> 
> Correct I attached a v2 of this patch that uses instr_time to check the 
> elapsed
> time and break out of the loop. It needs some more benchmarking.

My email client has an issue sending attachments it seems. Reattaching 

v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Regards,

Sami

Re: Restart pg_usleep when interrupted

2024-07-11 Thread Nathan Bossart
+   /*
+* We allow nanosleep to handle interrupts and retry with the 
remaining time.
+* However, since nanosleep is susceptible to time drift when 
interrupted
+* frequently, we add a safeguard to break out of the nanosleep 
whenever the
+* total time of the sleep exceeds the requested sleep time. 
Using nanosleep
+* is a more portable approach than clock_nanosleep.
+*/

I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check.  Also, I
think we need some commentary about why we want to retry after an interrupt
in this case.

-- 
nathan




Re: tests fail on windows with default git settings

2024-07-11 Thread Dave Page
On Wed, 10 Jul 2024 at 10:35, Dave Page  wrote:

> > The other failures I see are the following, which I'm just starting to
>>> dig
>>> > into:
>>> >
>>> >  26/298 postgresql:recovery / recovery/019_replslot_limit
>>> > ERROR43.05s   exit status 2
>>> >  44/298 postgresql:recovery / recovery/027_stream_regress
>>> > ERROR   383.08s   exit status 1
>>> >  50/298 postgresql:recovery / recovery/035_standby_logical_decoding
>>> > ERROR   138.06s   exit status 25
>>> >  68/298 postgresql:recovery / recovery/040_standby_failover_slots_sync
>>> >  ERROR   132.87s   exit status 25
>>> > 170/298 postgresql:pg_dump / pg_dump/002_pg_dump
>>> >  ERROR93.45s   exit status 2
>>> > 233/298 postgresql:bloom / bloom/001_wal
>>> >  ERROR54.47s   exit status 2
>>> > 236/298 postgresql:subscription / subscription/001_rep_changes
>>> >  ERROR46.46s   exit status 2
>>> > 246/298 postgresql:subscription / subscription/010_truncate
>>> > ERROR47.69s   exit status 2
>>> > 253/298 postgresql:subscription / subscription/013_partition
>>> >  ERROR   125.63s   exit status 25
>>> > 255/298 postgresql:subscription / subscription/022_twophase_cascade
>>> > ERROR58.13s   exit status 2
>>> > 257/298 postgresql:subscription / subscription/015_stream
>>> > ERROR   128.32s   exit status 2
>>> > 262/298 postgresql:subscription / subscription/028_row_filter
>>> > ERROR43.14s   exit status 2
>>> > 263/298 postgresql:subscription / subscription/027_nosuperuser
>>> >  ERROR   102.02s   exit status 2
>>> > 269/298 postgresql:subscription / subscription/031_column_list
>>> >  ERROR   123.16s   exit status 2
>>> > 271/298 postgresql:subscription / subscription/032_subscribe_use_index
>>> >  ERROR   139.33s   exit status 2
>>>
>>> Hm, it'd be good to see some of errors behind that ([1]).
>>>
>>> I suspect it might be related to conflicting ports. I had to use
>>> PG_TEST_USE_UNIX_SOCKETS to avoid random tests from failing:
>>>
>>>   # use unix socket to prevent port conflicts
>>>   $env:PG_TEST_USE_UNIX_SOCKETS = 1;
>>>   # otherwise pg_regress insists on creating the directory and
>>> does it
>>>   # in a non-existing place, this needs to be fixed :(
>>>   mkdir d:/sockets
>>>   $env:PG_REGRESS_SOCK_DIR = "d:/sockets/"
>>>
>>
>> No, it all seems to be fallout from GSSAPI being included in the build.
>> If I build without that, everything passes. Most of the tests are failing
>> with a "too many clients already" error, but a handful do seem to include
>> GSSAPI auth related errors as well. For example, this is from
>>
>
>
> ... this is from subscription/001_rep_changes:
>
> [14:46:57.723](2.318s) ok 11 - check rows on subscriber after table drop
> from publication
> connection error: 'psql: error: connection to server at "127.0.0.1", port
> 58059 failed: could not initiate GSSAPI security context: No credentials
> were supplied, or the credentials were unavailable or inaccessible:
> Credential cache is empty
> connection to server at "127.0.0.1", port 58059 failed: FATAL:  sorry, too
> many clients already'
> while running 'psql -XAtq -d port=58059 host=127.0.0.1 dbname='postgres'
> -f - -v ON_ERROR_STOP=1' at
> C:/Users/dpage/git/postgresql/src/test/perl/PostgreSQL/Test/Cluster.pm line
> 2129.
> # Postmaster PID for node "publisher" is 14488
> ### Stopping node "publisher" using mode immediate
> # Running: pg_ctl -D
> C:\Users\dpage\git\postgresql\build/testrun/subscription/001_rep_changes\data/t_001_rep_changes_publisher_data/pgdata
> -m immediate stop
> waiting for server to shut down done
> server stopped
> # No postmaster PID for node "publisher"
> # Postmaster PID for node "subscriber" is 15012
> ### Stopping node "subscriber" using mode immediate
> # Running: pg_ctl -D
> C:\Users\dpage\git\postgresql\build/testrun/subscription/001_rep_changes\data/t_001_rep_changes_subscriber_data/pgdata
> -m immediate stop
> waiting for server to shut down done
> server stopped
> # No postmaster PID for node "subscriber"
> [14:46:59.068](1.346s) # Tests were run but no plan was declared and
> done_testing() was not seen.
> [14:46:59.069](0.000s) # Looks like your test exited with 2 just after 11.
>

So I received an off-list tip to checkout [1], a discussion around GSSAPI
causing test failures on windows that Alexander Lakhin was looking at.
Thomas Munro's v2 patch to try to address the issue brought me down to just
a single test failure with GSSAPI enabled on 17b2 (with a second, simple
fix for the OpenSSL/Kerberos/x509 issue): pg_dump/002_pg_dump. The
relevant section from the log looks like this:

[15:28:42.692](0.006s) not ok 2 - connecting to a non-existent database:
matches
[15

Re: MIN/MAX functions for a record

2024-07-11 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jul 08, 2024 at 12:20:30PM +0300, Aleksander Alekseev wrote:
> -and arrays of any of these types.
> +and also arrays and records of any of these types.

> This update of the docs is incorrect, no?  Records could include many
> more types than the ones currently supported for min()/max().

Yeah, actually the contained data types could be anything with
btree sort support.  This is true for arrays too, so the text
was wrong already.  I changed it to

+and also arrays and composite types containing sortable data types.

(Using "composite type" not "record" is a judgment call here, but
I don't see anyplace else in func.sgml preferring "record" for this
meaning.)

> I am not sure to get the concerns of upthread regarding the type
> caching in the context of an aggregate, which is the business with
> lookup_type_cache(), especially since there is a btree operator
> relying on record_cmp().  Tom, what were your concerns here?

Re-reading, I was just mentioning that as something to check,
not a major problem.  It isn't, because array min/max are already
relying on the ability to use fcinfo->flinfo->fn_extra as cache space
in an aggregate.  (Indeed, the array aggregate code is almost
identical to where we ended up.)

AFAICS this is good to go.  I made a couple of tiny cosmetic
adjustments, added a catversion bump, and pushed.

regards, tom lane




Re: Logging which local address was connected to in log_line_prefix

2024-07-11 Thread Greg Sabino Mullane
Thanks for the review. Please find attached a new version with proper tabs
and indenting.

Cheers,
Greg


0002-Add-local-address-to-log_line_prefix.patch
Description: Binary data


Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal

2024-07-11 Thread Robert Haas
On Thu, Jul 11, 2024 at 6:51 AM Fujii Masao  wrote:
> It looks like the fast_forward field in WalSummarizerData is no longer 
> necessary.
>
> So far, I haven't found any other issues with the patch.

Thanks for reviewing. Regarding fast_forward, I think I had the idea
in mind that perhaps it should be exposed by
pg_get_wal_summarizer_state(), but I didn't actually implement that.
Thinking about it again, I think maybe it's fine to just remove it
from the shared memory state, as this should be a rare scenario in
practice. What is your opinion?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2024-07-11 Thread Alena Rybakina

Hi, again!

I have finished the patch and processed almost all of your suggestions (from
[0], [1], [2]). It remains only to add tests where the conversion should
work, but I will add this in the next version.


[0] 
https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro


[1] 
https://www.postgresql.org/message-id/9736220.CDJkKcVGEf%40thinkpad-pgpro


[2] 
https://www.postgresql.org/message-id/2193851.QkHrqEjB74%40thinkpad-pgpro


On 09.07.2024 04:57, Alena Rybakina wrote:


Hi! Thank you for your review! Sorry for the delay in responding.

I rewrote the patch as you requested, but now I'm faced with the problem of
processing the elements of the or_entries list. For some reason, the pointer
to the list is cleared and I couldn't find the place where it happened.
Maybe I'm missing something simple in view of the heavy workload right now,
but maybe you'll see a problem? I have displayed part of the stack below.


#5  0x5b0f6d9f6a6a in ExceptionalCondition (conditionName=0x5b0f6dbb74f7 "IsPointerList(list)", fileName=0x5b0f6dbb7418 "list.c", lineNumber=341) at assert.c:66
#6  0x5b0f6d5dc3ba in lappend (list=0x5b0f6eec5ca0, datum=0x5b0f6eec0d90) at list.c:341
#7  0x5b0f6d69230c in transform_or_to_any (root=0x5b0f6eeb13c8, orlist=0x5b0f6eec57c0) at initsplan.c:2818
#8  0x5b0f6d692958 in add_base_clause_to_rel (root=0x5b0f6eeb13c8, relid=1, restrictinfo=0x5b0f6eec5990) at initsplan.c:2982
#9  0x5b0f6d692e5f in distribute_restrictinfo_to_rels (root=0x5b0f6eeb13c8, restrictinfo=0x5b0f6eec5990) at initsplan.c:3175
#10 0x5b0f6d691bf2 in distribute_qual_to_rels (root=0x5b0f6eeb13c8, clause=0x5b0f6eec0fc0, jtitem=0x5b0f6eec4330, sjinfo=0x0, security_level=0, qualscope=0x5b0f6eec4730, ojscope=0x0, outerjoin_nonnullable=0x0, incompatible_relids=0x0, allow_equivalence=true, has_clone=false, is_clone=false, postponed_oj_qual_list=0x0) at initsplan.c:2576
#11 0x5b0f6d69146f in distribute_quals_to_rels (root=0x5b0f6eeb13c8, clauses=0x5b0f6eec0bb0, jtitem=0x5b0f6eec4330, sjinfo=0x0, security_level=0, qualscope=0x5b0f6eec4730, ojscope=0x0, outerjoin_nonnullable=0x0, incompatible_relids=0x0, allow_equivalence=true, has_clone=false, is_clone=false, postponed_oj_qual_list=0x0) at initsplan.c:2144


This is still the first iteration of the fixes you have proposed, so I have
attached the patch in diff format. I rewrote it, as you suggested in the
first letter [0]. I created a separate function that tries to form an
OrClauseGroup node; if it fails in this, it returns false, otherwise it
processes the generated element according to what it found: either it adds
it to the list as new, or it adds a constant to an existing one.

I also divided the one general list of clauses suitable and unsuitable for
conversion into two different ones: appropriate_entries and or_entries. Now
we only look in the list of suitable elements to form the ANY expression.

This helps us to get rid of the repetitions in the code you mentioned.
Please write if this is not the logic that you have seen before.


[0] 
https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro


The error was caused by the specifics of storing the "OR" clauses in the
RestrictInfo structure. Scanning the orclauses list of the RestrictInfo
variable, we could face not only items of the RestrictInfo type, but also
the BoolExpr type.


For example, when we have both or clauses and "AND" clauses together, 
like x = 1 and (y =1 or y=2 or y=3 and z = 1). The structure looks like:


RestrictInfo->orclauses = [RestrictInfo [x=1],
RestrictInfo->orclauses = [RestrictInfo[y=1],
                    RestrictInfo [y=2],
                    BoolExpr = [Restrictinfo [y=3], RestrictInfo [z=1]
                   ]
                                     ]

It's working fine now.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
From 26eca98229749b20ad0c82a9fa55f4f37fd34d29 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 11 Jul 2024 19:01:10 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
during adding restrictinfo's to the base relation.

Here Cn is a n-th constant or parameters expression, 'expr' is non-constant
expression, 'op' is an operator which returns boolean result and has a commuter
(for the case of reverse order of constant and non-constant parts of the
expression, like 'Cn op expr').

Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina 
Author: Andrey Lepikhov 
Reviewed-by: Peter Geoghegan 
Reviewed-by: Ranier Vilela 
Reviewed-by: Alexander Korotkov 
Reviewed-by: Robert Haas 
Reviewed-by: Jian He 
Reviewed-by: Tom Lane 
Reviewed-by: Nikolay Shaplov 
---
 src/backend/optimizer/plan/initsplan.c| 335 ++
 src/include/nodes/pathnodes.h |  31 ++
 src/test/regress/expected/create_index.out|  32 +-
 src/test/regress/expected/partition_prune.out |  46 +--
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/tidscan.out |   6 +-
 src/tools

Re: CFbot failed on Windows platform

2024-07-11 Thread Andrew Dunstan



On 2024-07-11 Th 9:50 AM, Tom Lane wrote:

Tatsuo Ishii  writes:

I think it is related to the '628c1d1f2c' commit. This commit changed
the output of the regress test on Windows.

Yeah, that seems to explain it. I see a few buildfarm Windows animals
complaining too.

I think that the contents of
src/test/regress/expected/collate.windows.win1252.out are actually
wrong, and we'd not noticed because it was only checked with diff -w.
psql does put an extra trailing space in some lines of table output,
but that space isn't there in collate.windows.win1252.out.





Yeah, makes sense I will produce an updated output file and then 
re-enable --strip-trailing-cr (after a full test run).



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: POC, WIP: OR-clause support for indexes

2024-07-11 Thread Alena Rybakina
Sorry for repeating, but I have noticed that this message displays
incorrectly, so just in case I'll duplicate it.


On 11.07.2024 19:17, Alena Rybakina wrote:
The error was caused by the specifics of storing the "OR" clauses in the
RestrictInfo structure. Scanning the orclauses list of the RestrictInfo
variable, we could face not only items of the RestrictInfo type, but also
the BoolExpr type.


For example, when we have both or clauses and "AND" clauses together, 
like x = 1 and (y =1 or y=2 or y=3 and z = 1). The structure looks like:


RestrictInfo->orclauses = [RestrictInfo [x=1],
RestrictInfo->orclauses = [RestrictInfo[y=1],
                    RestrictInfo [y=2],
                    BoolExpr = [Restrictinfo [y=3], RestrictInfo [z=1]
                   ]
                                     ]

It's working fine now.

The error was caused by the specifics of storing the "OR" clauses in the 
RestrictInfo structure. When viewing the list of OR clauses, we could 
encounter not only the RestrictInfo type, but also the BoolExpr type. 
It's working fine now.


For example, when we have both or clauses and "AND" clauses together, 
like x = 1 and (y =1 or y=2 or y=3 and z = 1). The structure looks like:


RestrictInfo->orclauses = [RestrictInfo [x=1],
RestrictInfo->orclauses = [RestrictInfo[y=1],
RestrictInfo [y=2],
BoolExpr = [Restrictinfo [y=3], RestrictInfo [z=1]
]
 ]

It's working fine now.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: speed up a logical replica setup

2024-07-11 Thread Alexander Lakhin

Hello Amit and Hou-San,

11.07.2024 13:21, Amit Kapila wrote:


We don't wait for the xmin to catch up corresponding to this insert
and I don't know if there is a way to do that. So, we should move this
Insert to after the call to pg_sync_replication_slots(). It won't
impact the general test of pg_createsubscriber.

Thanks to Hou-San for helping me in the analysis of this BF failure.


Thank you for investigating that issue!

May I ask you to look at another failure of the test that occurred today [1]?
The failure log contains:
recovery_target_lsn = '0/30098D0'
pg_createsubscriber: starting the subscriber
...
pg_createsubscriber: server was started
pg_createsubscriber: waiting for the target server to reach the consistent state
...
2024-07-11 07:40:10.837 UTC [2948531][postmaster][:0] LOG:  received fast 
shutdown request

Though what I'm seeing after a successful run is:
recovery_target_lsn = '0/3009860'
pg_createsubscriber: starting the subscriber
...
pg_createsubscriber: server was started
pg_createsubscriber: waiting for the target server to reach the consistent state
...
2024-07-11 15:19:40.733 UTC [207517] 040_pg_createsubscriber.pl LOG:  
statement: SELECT pg_catalog.pg_is_in_recovery()
2024-07-11 15:19:41.635 UTC [207514] LOG:  recovery stopping after WAL location (LSN) 
"0/3009860"
2024-07-11 15:19:41.635 UTC [207514] LOG:  redo done at 0/3009860 system usage: CPU: user: 0.00 s, system: 0.00 s, 
elapsed: 21.00 s


I've managed to reproduce the failure locally. With the bgwriter mod:
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 150

and wal_debug=on, when running 5 test instances with parallel, I get the
failure with the following log:
recovery_target_lsn = '0/3009A20'
pg_createsubscriber: starting the subscriber

2024-07-11 14:40:04.551 UTC [205589:72][startup][33/0:0] LOG:  REDO @ 0/30099E8; LSN 0/3009A20: prev 0/30099B0; xid 0; 
len 24 - Standby/RUNNING_XACTS: nextXid 747 latestCompletedXid 746 oldestRunningXid 747

# ^^^ the last REDO record in the log
...
pg_createsubscriber: server was started
pg_createsubscriber: waiting for the target server to reach the consistent state
...
pg_createsubscriber: server was stopped
pg_createsubscriber: error: recovery timed out
...
[14:43:06.011](181.800s) not ok 30 - run pg_createsubscriber on node S
[14:43:06.012](0.000s)
[14:43:06.012](0.000s) #   Failed test 'run pg_createsubscriber on node S'
#   at t/040_pg_createsubscriber.pl line 356.

$ pg_waldump -p src/bin/pg_basebackup_1/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata/pg_wal/ 
00010003 00010003 | tail -2
rmgr: Standby len (rec/tot): 50/    50, tx:  0, lsn: 0/030099B0, prev 0/03009948, desc: RUNNING_XACTS 
nextXid 747 latestCompletedXid 746 oldestRunningXid 747
rmgr: Standby len (rec/tot): 50/    50, tx:  0, lsn: 0/030099E8, prev 0/030099B0, desc: RUNNING_XACTS 
nextXid 747 latestCompletedXid 746 oldestRunningXid 747


Whilst
$ pg_waldump -p src/bin/pg_basebackup_1/tmp_check/t_040_pg_createsubscriber_node_p_data/pgdata/pg_wal/ 
00010003 00010003 | tail
rmgr: Standby len (rec/tot): 50/    50, tx:  0, lsn: 0/030099B0, prev 0/03009948, desc: RUNNING_XACTS 
nextXid 747 latestCompletedXid 746 oldestRunningXid 747
rmgr: Standby len (rec/tot): 50/    50, tx:  0, lsn: 0/030099E8, prev 0/030099B0, desc: RUNNING_XACTS 
nextXid 747 latestCompletedXid 746 oldestRunningXid 747
rmgr: Heap2   len (rec/tot): 60/    60, tx:    747, lsn: 0/03009A20, prev 0/030099E8, desc: NEW_CID rel: 
1663/16384/6104, tid: 0/1, cmin: 4294967295, cmax: 0, combo: 4294967295
rmgr: Heap    len (rec/tot): 54/    54, tx:    747, lsn: 0/03009A60, prev 0/03009A20, desc: DELETE xmax: 
747, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16384/6104 blk 0
rmgr: Transaction len (rec/tot): 78/    78, tx:    747, lsn: 0/03009A98, prev 0/03009A60, desc: INVALIDATION ; 
inval msgs: catcache 49 catcache 46 relcache 0
rmgr: Transaction len (rec/tot): 98/    98, tx:    747, lsn: 0/03009AE8, prev 0/03009A98, desc: COMMIT 
2024-07-11 14:43:05.994561 UTC; relcache init file inval dbid 16384 tsid 1663; inval msgs: catcache 49 catcache 46 
relcache 0
rmgr: Heap2   len (rec/tot): 60/    60, tx:    748, lsn: 0/03009B50, prev 0/03009AE8, desc: NEW_CID rel: 
1663/16385/6104, tid: 0/1, cmin: 4294967295, cmax: 0, combo: 4294967295
rmgr: Heap    len (rec/tot): 54/    54, tx:    748, lsn: 0/03009B90, prev 0/03009B50, desc: DELETE xmax: 
748, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16385/6104 blk 0
rmgr: Transaction len (rec/tot): 78/    78, tx:    748, lsn: 0/03009BC8, prev 0/03009B90, desc: INVALIDATION ; 
inval msgs: catcache 49 catcache 46 relcache 0
rmgr: Transaction len (rec/tot): 98/    98, tx:    748, lsn: 0/03009C18, prev 0/03009BC8, desc: COMMIT 
2024-07-11 14:43:06.008619

Re: Improve the connection failure error messages

2024-07-11 Thread Tom Lane
Nisha Moond  writes:
> Attached the patch v6 with suggested improvements.
> - Removed unnecessary translator comments.
> - Added appropriate identifier names where missing.

I think this is generally OK, although one could argue that it's
violating our message style guideline that primary error messages
should be short [1].  The text itself isn't that bad, but once you
tack on a libpq connection failure message it's hard to claim that
the result "fits on one line".

Another way we could address this that'd reduce that problem is to
leave the primary messages as-is and add an errdetail or errcontext
message to show what's throwing the error.  However, I'm not convinced
that's better.  The main argument against it is that detail/context
lines can get lost, eg if you're running the server in terse logging
mode.  
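
For concreteness, that alternative would look roughly like this (a sketch
only; the message wording and variables are hypothetical, and this is not
what was pushed):

ereport(ERROR,
        (errcode(ERRCODE_CONNECTION_FAILURE),
         errmsg("could not connect to the publisher: %s", connection_error),
         errdetail("The connection was attempted by the logical replication apply worker for subscription \"%s\".",
                   subname)));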

On balance I think it's OK, so I pushed it.  I did take out a couple
of uses of "logical replication" that seemed unnecessary and were
making the length problem worse.

regards, tom lane

[1] 
https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-WHAT-GOES-WHERE




Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal

2024-07-11 Thread Robert Haas
On Thu, Jul 11, 2024 at 6:51 AM Fujii Masao  wrote:
> So far, I haven't found any other issues with the patch.

Here is a new version that removes the hunks you highlighted and also
adds a test case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v4-0001-Do-not-summarize-WAL-if-generated-with-wal_level-.patch
Description: Binary data


Re: Restart pg_usleep when interrupted

2024-07-11 Thread Sami Imseih


> 
> I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
> the bottom of the while loop to avoid needing this extra check.  

Can you elaborate further? I am not sure how this will work since delay is a 
timespec 
and elapsed time is an instr_time. 

Also, in every iteration of the loop, the delay must be set to the remaining
time. The purpose of elapsed_time is to make sure that we don't surpass the
requested time delay, as an additional safeguard.

> Also, I
> think we need some commentary about why we want to retry after an interrupt
> in this case.

I will elaborate further in the comments for the next revision.


Regards,

Sami 





Re: Restart pg_usleep when interrupted

2024-07-11 Thread Nathan Bossart
On Thu, Jul 11, 2024 at 01:10:25PM -0500, Sami Imseih wrote:
>> I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
>> the bottom of the while loop to avoid needing this extra check.  
> 
> Can you elaborate further? I am not sure how this will work since delay is a 
> timespec 
> and elapsed time is an instr_time. 
> 
> Also, in every iteration of the loop, the delay must be set to the remaining 
> time. The
> purpose of the elapsed_time is to make sure that we don't surpass requested 
> time
> delay as an additional safeguard.

I'm imagining something like this:

struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long        secs;
    int         microsecs;

TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);

delay.tv_sec = secs;
delay.tv_nsec = microsecs * 1000;

} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

-- 
nathan




Re: Parent/child context relation in pg_get_backend_memory_contexts()

2024-07-11 Thread Robert Haas
On Wed, Jul 10, 2024 at 9:16 PM David Rowley  wrote:
> Melih and I talked about this in a meeting yesterday evening. I think
> I'm about on the fence about having the IDs in leaf-to-root or
> root-to-leaf.  My main concern about which order is chosen is around
> how easy it is to write hierarchical queries. I think I'd feel better
> about having it in root-to-leaf order if "level" was 1-based rather
> than 0-based. That would allow querying CacheMemoryContext and all of
> its descendants with:
>
> WITH c AS (SELECT * FROM pg_backend_memory_contexts)
> SELECT c1.*
> FROM c c1, c c2
> WHERE c2.name = 'CacheMemoryContext'
> AND c1.path[c2.level] = c2.path[c2.level];

I don't object to making it 1-based.

> Ideally, no CTE would be needed here, but unfortunately, there's no
> way to know the CacheMemoryContext's ID beforehand.  We could make the
> ID more stable if we did a breadth-first traversal of the context.
> i.e., assign IDs in level order.  This would stop TopMemoryContext's
> 2nd child getting a different ID if its first child became a parent
> itself.

Do we ever have contexts with the same name at the same level? Could
we just make the path an array of strings, so that you could then say
something like this...

SELECT * FROM pg_backend_memory_contexts where path[2] = 'CacheMemoryContext'

...and get all the things with that in the path?

> select * from pg_backend_memory_contexts;
> -- Observe that CacheMemoryContext has ID=22 and level=2. Get the
> total of that and all of its descendants.
> select sum(total_bytes) from pg_backend_memory_contexts where path[2] = 22;
> -- or just it and direct children
> select sum(total_bytes) from pg_backend_memory_contexts where path[2]
> = 22 and level <= 3;

I'm doubtful about this because nothing prevents the set of memory
contexts from changing between one query and the next. We should try
to make it so that it's easy to get what you want in a single query.
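
For example, with a name-based path the subtree sum collapses to a single
statement (a sketch, assuming the path column held context names as
suggested above):

-- Total bytes of CacheMemoryContext and all of its descendants,
-- without a CTE or self-join.
SELECT sum(total_bytes)
FROM pg_backend_memory_contexts
WHERE 'CacheMemoryContext' = ANY (path);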

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Converting tab-complete.c's else-if chain to a switch

2024-07-11 Thread Tom Lane
Per discussion elsewhere [1], I've been looking at $SUBJECT.
I have a very crude Perl hack (too ugly to show yet ;-)) that
produces output along the lines of


Existing code:

/* CREATE STATISTICS  */
else if (Matches("CREATE", "STATISTICS", MatchAny))
COMPLETE_WITH("(", "ON");
else if (Matches("CREATE", "STATISTICS", MatchAny, "("))
COMPLETE_WITH("ndistinct", "dependencies", "mcv");
else if (Matches("CREATE", "STATISTICS", MatchAny, "(*)"))
COMPLETE_WITH("ON");
else if (HeadMatches("CREATE", "STATISTICS", MatchAny) &&
 TailMatches("FROM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);


Generated tables:

enum TCPatternID
{
...
M_CREATE_STATISTICS_ANY,
M_CREATE_STATISTICS_ANY_LPAREN,
M_CREATE_STATISTICS_ANY_PARENSTARPAREN,
HM_CREATE_STATISTICS_ANY,
...
};

enum TCPatternKind
{
Match,
MatchCS,
HeadMatch,
HeadMatchCS,
TailMatch,
TailMatchCS,
};

typedef struct
{
enum TCPatternID id;
enum TCPatternKind kind;
int nwords;
const char *const *words;
} TCPattern;

#define TCPAT(id, kind, ...) \
{ (id), (kind), VA_ARGS_NARGS(__VA_ARGS__), \
  (const char * const []) { __VA_ARGS__ } }

static const TCPattern tcpatterns[] =
{
...
TCPAT(M_CREATE_STATISTICS_ANY,
  Match, "CREATE", "STATISTICS", MatchAny),
TCPAT(M_CREATE_STATISTICS_ANY_LPAREN,
  Match, "CREATE", "STATISTICS", MatchAny, "("),
TCPAT(M_CREATE_STATISTICS_ANY_PARENSTARPAREN,
  Match, "CREATE", "STATISTICS", MatchAny, "(*)"),
TCPAT(HM_CREATE_STATISTICS_ANY,
  HeadMatch, "CREATE", "STATISTICS", MatchAny),
...
};


Converted code:

/* CREATE STATISTICS  */
case M_CREATE_STATISTICS_ANY:
COMPLETE_WITH("(", "ON");
break;
case M_CREATE_STATISTICS_ANY_LPAREN:
COMPLETE_WITH("ndistinct", "dependencies", "mcv");
break;
case M_CREATE_STATISTICS_ANY_PARENSTARPAREN:
COMPLETE_WITH("ON");
break;
case HM_CREATE_STATISTICS_ANY:
if (TailMatches("FROM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
break;


The idea (I've not actually written this part yet) is that an
outer loop would iterate through the table entries and invoke
the appropriate switch case for any successful match.  As soon
as the switch code sets "matches" non-NULL (it might not, as in
the last case in the example), we can exit.

While this clearly can be made to work, it doesn't seem like the
result will be very maintainable.  You have to invent a single-use
enum label, and the actual definition of the primary match pattern
is far away from the code using it, and we have two fundamentally
different ways of writing the same pattern test (since we'll still
need some instances of direct calls to TailMatches and friends,
as in the last example case).

What I'm thinking about doing instead of pursuing this exact
implementation idea is that we should create a preprocessor
to deal with building the table.  I'm envisioning that the
new source code will look like


/* CREATE STATISTICS  */
case Matches("CREATE", "STATISTICS", MatchAny):
COMPLETE_WITH("(", "ON");
break;
case Matches("CREATE", "STATISTICS", MatchAny, "("):
COMPLETE_WITH("ndistinct", "dependencies", "mcv");
break;
case Matches("CREATE", "STATISTICS", MatchAny, "(*)"):
COMPLETE_WITH("ON");
break;
case HeadMatches("CREATE", "STATISTICS", MatchAny):
if (TailMatches("FROM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
break;


Of course this isn't legal C, since the case labels are not
constant expressions.  The preprocessing script would look
for case labels that look like this, generate the appropriate
table entries, and replace the case labels with mechanically-
generated ID codes that don't need to be particularly human
readable.  On the downside, YA preprocessor and mini-language
isn't much fun; but at least this is a lot closer to real C
than some of the other things we've invented.

(Further down the line, we can look into improvements such as
avoiding duplicate string comparisons; but that can be done behind
the scenes, without messing with this source-code notation.)

Does anyone particularly hate this plan, or have a better idea?

BTW, we have quite a few instances of code like the aforementioned

else if (HeadMatches("CREATE", "STATISTICS", MatchAny) &&
 TailMatches("FROM"))

I'm thinking we should invent a Matches() option "MatchAnyN",
which could appear at most once and would represent an automatic
match to zero or more words appearing between the head part and
the tail part.  Then this could be transformed to

else if (Matches("CREATE", "STATISTICS", MatchAny, MatchAnyN, "FROM"))

the advantage being that more of the pattern can be embedded in the
table and we need fewer bits of ad-hoc logic.  Maybe this'd be worth
doing eve
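
For illustration, a matcher that understands such a wildcard is simple. A
sketch (hypothetical, not from any patch; MATCH_ANY stands in for MatchAny
and MATCH_ANYN for the proposed MatchAnyN):

#include <stdbool.h>
#include <string.h>
#include <strings.h>

#define MATCH_ANY  "\001*"      /* matches exactly one word */
#define MATCH_ANYN "\001**"     /* matches zero or more words */

static bool
word_matches(const char *pat, const char *word)
{
    return strcmp(pat, MATCH_ANY) == 0 || strcasecmp(pat, word) == 0;
}

/* Match pattern[0..npat) against words[0..nwords), one MATCH_ANYN allowed. */
static bool
pattern_matches(const char *const *pattern, int npat,
                const char *const *words, int nwords)
{
    int         gap = -1;

    for (int i = 0; i < npat; i++)
        if (strcmp(pattern[i], MATCH_ANYN) == 0)
            gap = i;

    if (gap < 0)
    {
        if (npat != nwords)
            return false;
        for (int i = 0; i < npat; i++)
            if (!word_matches(pattern[i], words[i]))
                return false;
        return true;
    }

    if (nwords < npat - 1)      /* head and tail must not overlap */
        return false;
    for (int i = 0; i < gap; i++)       /* head part */
        if (!word_matches(pattern[i], words[i]))
            return false;
    for (int i = gap + 1; i < npat; i++)    /* tail part */
        if (!word_matches(pattern[i], words[nwords - (npat - i)]))
            return false;
    return true;
}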

Re: CREATE INDEX CONCURRENTLY on partitioned index

2024-07-11 Thread Ilya Gladyshev
It is broken in master, I just didn’t want to create a separate thread, but it 
can be fixed independently. As I remember, the problem is that progress is 
tracked for each table in the hierarchy as if the table is processed 
separately, without ever setting partitions_total and partitions_done counters.

> On 11 July 2024, at 13:31, Justin Pryzby  wrote:
> 
> On Sat, Jun 15, 2024 at 07:56:38PM +0100, Ilya Gladyshev wrote:
>> In addition, I noticed that progress tracking is once again broken for
>> partitioned tables, while looking at REINDEX implementation, attaching the
>> second patch to fix it.
> 
> Thanks for the fixes, I started reviewing them but need some more time
> to digest.
> 
> Do you mean that progress reporting is broken in master, for REINDEX, or
> just with this patch ?
> 
> -- 
> Justin





Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal

2024-07-11 Thread Nathan Bossart
On Wed, Jul 10, 2024 at 01:41:41PM -0500, Nathan Bossart wrote:
> On Wed, Jul 10, 2024 at 11:11:13AM -0400, Tom Lane wrote:
>> We went through a ton of permutations of that kind of
>> idea years ago, when it first became totally clear that cross-checks
>> between GUCs do not work nicely if implemented in check_hooks.
>> (You can find all the things we tried in the commit log, although
>> I don't recall exactly when.)
> 
> Do you remember the general timeframe or any of the GUCs involved?  I spent
> some time searching through the commit log and mailing lists, but I've thus
> far only found allusions to past bad experiences with such hooks.

Could it be the effective_cache_size work from 2013-2014?


https://www.postgresql.org/message-id/flat/CAHyXU0weDECnab1pypNum-dWGwjso_XMTY8-NvvzRphzM2Hv5A%40mail.gmail.com

https://www.postgresql.org/message-id/flat/CAMkU%3D1zTMNZsnUV6L7aMvfJZfzjKbzAtuML3N35wyYaia9MJAw%40mail.gmail.com

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ee1e5662d8d8330726eaef7d3110cb7add24d058

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2850896961994aa0993b9e2ed79a209750181b8a

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=af930e606a3217db3909029c6c3f8d003ba70920

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a16d421ca4fc639929bc964b2585e8382cf16e33;hp=08c8e8962f56c23c6799178d52d3b31350a0708f

-- 
nathan




Re: CFbot failed on Windows platform

2024-07-11 Thread Tatsuo Ishii
> On 2024-07-11 Th 5:34 AM, Tatsuo Ishii wrote:
 The differences are that the result has an extra space at the end of
 line.  I am not familiar with Windows and have no idea why this could
 happen. I haven't changed the patch set since May 24; the last
 successful test was on July 9 14:58:44 (I am not sure about the time zone).
 Also I see exactly the same test failures in some other tests (for
 example http://cfbot.cputube.org/highlights/all.html#4337)

 Any idea?
>>> I think it is related to the '628c1d1f2c' commit. This commit changed
>>> the output of the regress test on Windows.
>> Yeah, that seems to explain it. I see a few buildfarm Windows animals
>> complaining too.
>>
> 
> I have partially reverted that patch. Thanks for the report.

Thank you for taking care of this!
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Add support to TLS 1.3 cipher suites and curves lists

2024-07-11 Thread Peter Eisentraut

On 03.07.24 17:20, Daniel Gustafsson wrote:

After fiddling a bit with the code and documentation I came up with the
attached version which also makes the testsuite use the list syntax in order to
test it.  It's essentially just polish and adding comments with the functional
changes that a) it parses the entire list of curves so all errors can be
reported instead of giving up at the first error; b) leaving the cipher suite
GUC blank will set the suites to the OpenSSL default value.


It would be worth checking the discussion at 
 
about strtok()/strtok_r() issues.  First, for list parsing, it sometimes 
gives the wrong semantics, which I think might apply here.  Maybe it's 
worth comparing this with the semantics that OpenSSL provides natively. 
And second, strtok_r() is not available on Windows without the 
workaround provided in that thread.


I'm doubtful that it's worth replicating all this list parsing logic 
instead of just letting OpenSSL do it.  This is a very marginal feature 
after all.
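
For reference, deferring the parsing to OpenSSL is one call per list. A
sketch (the GUC variable names are assumptions; SSL_CTX_set1_groups_list()
and SSL_CTX_set_ciphersuites() are the native OpenSSL entry points):

if (SSL_CTX_set1_groups_list(context, SSLECDHCurve) != 1)
    ereport(FATAL,
            (errcode(ERRCODE_CONFIG_FILE_ERROR),
             errmsg("could not set group names: %s", SSLECDHCurve)));

if (SSL_CTX_set_ciphersuites(context, SSLCipherSuites) != 1)
    ereport(FATAL,
            (errcode(ERRCODE_CONFIG_FILE_ERROR),
             errmsg("could not set TLSv1.3 cipher suites: %s",
                    SSLCipherSuites)));

Both calls return 1 on success and 0 on a parse failure.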






Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-07-11 Thread Peter Eisentraut

On 07.05.24 11:36, Daniel Gustafsson wrote:

Yeah, that depends on how much version you expect your application to
work on.  Still it seems to me that there's value in mentioning that
if your application does not care about anything older than OpenSSL
1.1.0, like PG 18 assuming that this patch is merged, then these calls
are pointless for HEAD.  The routine definitions would be around only
for the .so compatibility.


Fair enough.  I've taken a stab at documenting that the functions are
deprecated, while at the same time documenting when and how they can be used
(and be useful).  The attached also removes one additional comment in the
testcode which is now obsolete (since removing 1.0.1 support really), and fixes
the spurious whitespace you detected upthread.


The 0001 patch removes the functions pgtls_init_library() and 
pgtls_init() but keeps the declarations in libpq-int.h.  This should be 
fixed.






Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal

2024-07-11 Thread Tom Lane
Nathan Bossart  writes:
> On Wed, Jul 10, 2024 at 01:41:41PM -0500, Nathan Bossart wrote:
>> On Wed, Jul 10, 2024 at 11:11:13AM -0400, Tom Lane wrote:
>>> We went through a ton of permutations of that kind of
>>> idea years ago, when it first became totally clear that cross-checks
>>> between GUCs do not work nicely if implemented in check_hooks.

>> Do you remember the general timeframe or any of the GUCs involved?  I spent
>> some time searching through the commit log and mailing lists, but I've thus
>> far only found allusions to past bad experiences with such hooks.

> Could it be the effective_cache_size work from 2013-2014?

Yeah, that's what I was remembering.  It looks like my memory was
slightly faulty, in that what ee1e5662d tried to do was make the
default value of one GUC depend on the actual value of another one,
not implement a consistency check per se.  But the underlying problem
is the same: a check_hook can't assume it is seeing the appropriate
value of some other GUC, since a change of that one may be pending.

regards, tom lane




Re: Optimize WindowAgg's use of tuplestores

2024-07-11 Thread Ranier Vilela
On Thu, 11 Jul 2024 at 09:09, David Rowley 
wrote:

> On Wed, 10 Jul 2024 at 02:42, Ashutosh Bapat
>  wrote:
> > Observations
> > 1. The numbers corresponding to 10 and 100 partitions are higher when
> > patched. That might be just noise. I don't see any reason why it would
> > impact negatively when there are a small number of partitions. The
> > lower partition cases also have a higher number of rows per partition,
> > so is the difference between MemoryContextDelete() vs
> > MemoryContextReset() making any difference here. May be worth
> > verifying those cases carefully. Otherwise upto 1000 partitions, it
> > doesn't show any differences.
>
> I think this might just be noise as a result of rearranging code. In
> terms of C code, I don't see any reason for it to be slower.  If you
> look at GenerationDelete() (as what is getting called from
> MemoryContextDelete()), it just calls GenerationReset(). So resetting
> is going to always be less work than deleting the context, especially
> given we don't need to create the context again when we reset it.
>
> I wrote the attached script to see if I can also see the slowdown and
> I do see the patched code come out slightly slower (within noise
> levels) in lower partition counts.
>
> To get my compiler to produce code in a more optimal order for the
> common case, I added unlikely() to the "if (winstate->all_first)"
> condition.  This is only evaluated on the first time after a rescan,
> so putting that code at the end of the function makes more sense.  The
> attached v2 patch has it this way.

Not that it's going to make any difference,
but since you're going to touch this code, why not?

Function *begin_partition*:
1. You can reduce the scope for variable *outerPlan*,
it is not needed anywhere else.
+ /*
+ * If this is the very first partition, we need to fetch the first input
+ * row to store in first_part_slot.
+ */
+ if (TupIsNull(winstate->first_part_slot))
+ {
+ PlanState *outerPlan = outerPlanState(winstate);
+ TupleTableSlot *outerslot = ExecProcNode(outerPlan);

2. There is one reference to the variable numfuncs;
you can reduce the scope.

+ /* reset mark and seek positions for each real window function */
+ for (int i = 0; i < winstate->numfuncs; i++)

Function *prepare_tuplestore*:
1. There is one reference to the variable numfuncs;
you can reduce the scope.
  /* create mark and read pointers for each real window function */
- for (i = 0; i < numfuncs; i++)
+ for (int i = 0; i < winstate->numfuncs; i++)

2. You can safely initialize the default value for the variable
*winstate->grouptail_ptr*
in the *else* part.

  if ((frameOptions & (FRAMEOPTION_EXCLUDE_GROUP |
  FRAMEOPTION_EXCLUDE_TIES)) &&
  node->ordNumCols != 0)
  winstate->grouptail_ptr =
  tuplestore_alloc_read_pointer(winstate->buffer, 0);
  }
+ else
+ winstate->grouptail_ptr = -1;

best regards,
Ranier Vilela


Re: Eager aggregation, take 3

2024-07-11 Thread Paul George
Hey Richard,

Looking more closely at this example

>select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by t2.a
having t2.a is null;

I wonder if the inability to exploit eager aggregation is more based on the
fact that COUNT(*) cannot be decomposed into an aggregation of PARTIAL
COUNT(*)s (apologies if my terminology is off/made up...I'm new to the
codebase). In other words, is it the case that a given aggregate function
already has built-in protection against the error case you correctly
pointed out?

To highlight this, in the simple example below we don't see aggregate
pushdown even with an INNER JOIN when the agg function is COUNT(*) but we
do when it's COUNT(t2.*):

-- same setup
drop table if exists t;
create table t(a int, b int, c int);
insert into t select i % 100, i % 10, i from generate_series(1, 1000) i;
analyze t;

-- query 1: COUNT(*) --> no pushdown

set enable_eager_aggregate=on;
explain (verbose, costs off) select t1.a, count(*) from t t1 join t t2 on
t1.a=t2.a group by t1.a;

QUERY PLAN
---
 HashAggregate
   Output: t1.a, count(*)
   Group Key: t1.a
   ->  Hash Join
 Output: t1.a
 Hash Cond: (t1.a = t2.a)
 ->  Seq Scan on public.t t1
   Output: t1.a, t1.b, t1.c
 ->  Hash
   Output: t2.a
   ->  Seq Scan on public.t t2
 Output: t2.a
(12 rows)


-- query 2: COUNT(t2.*) --> agg pushdown

set enable_eager_aggregate=on;
explain (verbose, costs off) select t1.a, count(t2.*) from t t1 join t t2
on t1.a=t2.a group by t1.a;

  QUERY PLAN
---
 Finalize HashAggregate
   Output: t1.a, count(t2.*)
   Group Key: t1.a
   ->  Hash Join
 Output: t1.a, (PARTIAL count(t2.*))
 Hash Cond: (t1.a = t2.a)
 ->  Seq Scan on public.t t1
   Output: t1.a, t1.b, t1.c
 ->  Hash
   Output: t2.a, (PARTIAL count(t2.*))
   ->  Partial HashAggregate
 Output: t2.a, PARTIAL count(t2.*)
 Group Key: t2.a
 ->  Seq Scan on public.t t2
   Output: t2.*, t2.a
(15 rows)

...while it might be true that COUNT(*) ... INNER JOIN should allow eager
agg pushdown (I haven't thought deeply about it, TBH), I did find this
result pretty interesting.


-Paul

On Wed, Jul 10, 2024 at 1:27 AM Richard Guo  wrote:

> On Sun, Jul 7, 2024 at 10:45 AM Paul George 
> wrote:
> > Thanks for reviving this patch and for all of your work on it! Eager
> aggregation pushdown will be beneficial for my work and I'm hoping to see
> it land.
>
> Thanks for looking at this patch!
>
> > The output of both the original query and this one match (and the plans
> with eager aggregation and the subquery are nearly identical if you restore
> the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does
> this mean that there are conditions under which eager aggregation can be
> pushed down to the nullable side?
>
> I think it's a very risky thing to push a partial aggregation down to
> the nullable side of an outer join, because the NULL-extended rows
> produced by the outer join would not be available when we perform the
> partial aggregation, while with a non-eager-aggregation plan these
> rows are available for the top-level aggregation.  This may put the
> rows into groups in a different way than expected, or get wrong values
> from the aggregate functions.  I've managed to compose an example:
>
> create table t (a int, b int);
> insert into t select 1, 1;
>
> select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
> t2.a having t2.a is null;
>  a | count
> ---+---
>| 1
> (1 row)
>
> This is the expected result, because after the outer join we have got
> a NULL-extended row.
>
> But if we somehow push down the partial aggregation to the nullable
> side of this outer join, we would get a wrong result.
>
> explain (costs off)
> select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
> t2.a having t2.a is null;
> QUERY PLAN
> ---
>  Finalize HashAggregate
>Group Key: t2.a
>->  Nested Loop Left Join
>  Filter: (t2.a IS NULL)
>  ->  Seq Scan on t t1
>  ->  Materialize
>->  Partial HashAggregate
>  Group Key: t2.a
>  ->  Seq Scan on t t2
>Filter: (b > 1)
> (10 rows)
>
> select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
> t2.a having t2.a is null;
>  a | count
> ---+---
>| 0
> (1 row)
>
> I believe there are cases where pushing a partial aggregation down to
> the nullable side of an outer join can be safe, but I doubt that there
> is an easy way to identify these cases and do the push-down for them.
> So for now I think we'd better refrain from doing tha

Re: speed up a logical replica setup

2024-07-11 Thread Euler Taveira
On Thu, Jul 11, 2024, at 2:00 PM, Alexander Lakhin wrote:
> Hello Amit and Hou-San,
> 
> 11.07.2024 13:21, Amit Kapila wrote:
> >
> > We don't wait for the xmin to catch up corresponding to this insert
> > and I don't know if there is a way to do that. So, we should move this
> > Insert to after the call to pg_sync_replication_slots(). It won't
> > impact the general test of pg_createsubscriber.
> >
> > Thanks to Hou-San for helping me in the analysis of this BF failure.
> 
> Thank you for investigating that issue!
> 
> May I ask you to look at another failure of the test that occurred today [1]?
> 

Thanks for the report!

You are observing the same issue that Amit explained in [1].
pg_create_logical_replication_slot returns the EndRecPtr (see
slot->data.confirmed_flush in DecodingContextFindStartpoint()). EndRecPtr points
to the next record and is a future position for an idle server. That's why the
recovery takes some time to finish: it is waiting for activity to increase the
LSN position. Since you modified LOG_SNAPSHOT_INTERVAL_MS to create additional
WAL records sooner, the EndRecPtr position is reached rapidly and the recovery
ends quickly.

Hayato proposes a patch [2] that creates an additional WAL record and has the
same effect as your little hack: it increases the LSN position so that the
recovery finishes soon. I don't like that solution although it seems simple to
implement. As Amit said, if we know the ReadRecPtr, we could use it as a
consistent LSN. The problem is that it is used by logical decoding but is not
exposed. [reading the code...] When the logical replication slot is created,
restart_lsn points to the lastReplayedEndRecPtr (see
ReplicationSlotReserveWal()), that is, the last record replayed. Since the
replication slots aren't in use, we could use the restart_lsn from the last
replication slot as a consistent LSN.

I'm attaching a patch that implements it. It runs in 6s instead of 26s.
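
The value the patch latches onto is also visible from SQL; for
illustration (the slot name here is made up):

-- Sketch: the slot's restart_lsn (last record replayed) is used as the
-- recovery target instead of its confirmed_flush_lsn (end + 1).
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'pg_createsubscriber_slot';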


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2Bp%2B7Ag6nqdFRdqowK1EmJ6bG-MtZQ_54dnFBi%3D_OO5RQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/OSBPR01MB25521B15BF950D2523BBE143F5D32%40OSBPR01MB2552.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From a3be65c9bea08d3c7ffd575342a879d37b28b9a8 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 11 Jul 2024 19:59:56 -0300
Subject: [PATCH] pg_createsubscriber: fix slow recovery

If the primary server is idle when you are running pg_createsubscriber,
it used to take some time during recovery. The reason is that it was
using the LSN returned by pg_create_logical_replication_slot as
recovery_target_lsn. This LSN points to "end + 1" record that might not
be available at WAL, hence, the recovery routine waits until some
activity from auxiliary processes write WAL records and once it reaches
the recovery_target_lsn position.

Instead, use restart_lsn from the last replication slot. It points to
the last WAL record that was replayed. Hence, the recovery finishes
soon.

create_logical_replication_slot() does not return LSN so change its
signature.

Discussion: https://www.postgresql.org/message-id/2377319.1719766794%40sss.pgh.pa.us
Discussion: https://www.postgresql.org/message-id/68de6498-0449-a113-dd03-e198dded0bac%40gmail.com
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 56 -
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 21dd50f8089..5ef3f751a5b 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -86,8 +86,8 @@ static void setup_recovery(const struct LogicalRepInfo *dbinfo, const char *data
 static void drop_primary_replication_slot(struct LogicalRepInfo *dbinfo,
 		  const char *slotname);
 static void drop_failover_replication_slots(struct LogicalRepInfo *dbinfo);
-static char *create_logical_replication_slot(PGconn *conn,
-			 struct LogicalRepInfo *dbinfo);
+static bool create_logical_replication_slot(PGconn *conn,
+			struct LogicalRepInfo *dbinfo);
 static void drop_replication_slot(PGconn *conn, struct LogicalRepInfo *dbinfo,
   const char *slot_name);
 static void pg_ctl_status(const char *pg_ctl_cmd, int rc);
@@ -769,15 +769,47 @@ setup_publisher(struct LogicalRepInfo *dbinfo)
 		create_publication(conn, &dbinfo[i]);
 
 		/* Create replication slot on publisher */
-		if (lsn)
-			pg_free(lsn);
-		lsn = create_logical_replication_slot(conn, &dbinfo[i]);
-		if (lsn != NULL || dry_run)
+		if (create_logical_replication_slot(conn, &dbinfo[i]) || dry_run)
 			pg_log_info("create replication slot \"%s\" on publisher",
 		dbinfo[i].replslotname);
 		else
 			exit(1);
 
+		/*
+		 * Get restart_lsn from the last replication slot. It points to the
+		 * last record replayed. The LSN returned by
+		 * pg_create_logical_replication_slot() should not be used because it
+		 * po

Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

2024-07-11 Thread Noah Misch
On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:
> I am working on using read streams in the CREATE DATABASE command when the
> strategy is wal_log. RelationCopyStorageUsingBuffer() function is used in
> this context. This function reads source buffers then copies them to the

Please rebase.  I applied this to 40126ac for review purposes.

> a. Timings:
> b. strace:
> c. perf:

Thanks for including those details.  That's valuable confirmation.

> Subject: [PATCH v1 1/3] Refactor PinBufferForBlock() to remove if checks about
>  persistence
> 
> There are if checks in PinBufferForBlock() function to set persistence
> of the relation and this function is called for the each block in the
> relation. Instead of that, set persistence of the relation before
> PinBufferForBlock() function.

I tried with the following additional patch to see if PinBufferForBlock() ever
gets invalid smgr_relpersistence:


--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1098,6 +1098,11 @@ PinBufferForBlock(Relation rel,
 
Assert(blockNum != P_NEW);
 
+   if (!(smgr_persistence == RELPERSISTENCE_TEMP ||
+ smgr_persistence == RELPERSISTENCE_PERMANENT ||
+ smgr_persistence == RELPERSISTENCE_UNLOGGED))
+   elog(WARNING, "unexpected relpersistence %d", smgr_persistence);
+
if (smgr_persistence == RELPERSISTENCE_TEMP)
{
io_context = IOCONTEXT_NORMAL;


That still gets relpersistence==0 in various src/test/regress cases.  I think
the intent was to prevent that.  If not, please add a comment about when
relpersistence==0 is still allowed.

> --- a/src/backend/storage/aio/read_stream.c
> +++ b/src/backend/storage/aio/read_stream.c
> @@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
>   {
>   stream->ios[i].op.rel = rel;
>   stream->ios[i].op.smgr = RelationGetSmgr(rel);
> - stream->ios[i].op.smgr_persistence = 0;
> + stream->ios[i].op.smgr_persistence = 
> rel->rd_rel->relpersistence;

Does the following comment in ReadBuffersOperation need an update?

/*
 * The following members should be set by the caller.  If only smgr is
 * provided without rel, then smgr_persistence can be set to override 
the
 * default assumption of RELPERSISTENCE_PERMANENT.
 */

> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c

> +/*
> + * Helper struct for read stream object used in
> + * RelationCopyStorageUsingBuffer() function.
> + */
> +struct copy_storage_using_buffer_read_stream_private
> +{
> + BlockNumber blocknum;
> + int64   last_block;
> +};

Why is last_block an int64, not a BlockNumber?

> @@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator 
> srclocator,

>   /* Iterate over each block of the source relation file. */
>   for (blkno = 0; blkno < nblocks; blkno++)
>   {
>   CHECK_FOR_INTERRUPTS();
>  
>   /* Read block from source relation. */
> - srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
> - 
>RBM_NORMAL, bstrategy_src,
> - 
>permanent);
> + srcBuf = read_stream_next_buffer(src_stream, NULL);
>   LockBuffer(srcBuf, BUFFER_LOCK_SHARE);

I think this should check for read_stream_next_buffer() returning
InvalidBuffer.  pg_prewarm doesn't, but the other callers do, and I think the
other callers are a better model.  LockBuffer() doesn't check the
InvalidBuffer case, so let's avoid the style of using a
read_stream_next_buffer() return value without checking.
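
Concretely, the loop body could guard the return value like this (a
sketch; since the loop runs exactly nblocks times, stream exhaustion here
would indicate a bug):

/* Read block from source relation. */
srcBuf = read_stream_next_buffer(src_stream, NULL);
if (!BufferIsValid(srcBuf))
    elog(ERROR, "read stream ended early at block %u", blkno);
LockBuffer(srcBuf, BUFFER_LOCK_SHARE);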

Thanks,
nm




Re: CREATE INDEX CONCURRENTLY on partitioned index

2024-07-11 Thread Michael Paquier
On Thu, Jul 11, 2024 at 09:35:24PM +0100, Ilya Gladyshev wrote:
> It is broken in master, I just didn’t want to create a separate
> thread, but it can be fixed independently. As I remember, the
> problem is that progress is tracked for each table in the hierarchy
> as if the table is processed separately, without ever setting
> partitions_total and partitions_done counters.

Please let's move this point to its own thread and deal with it with
an independent patch.  Hiding that in a thread that's already quite
long is not a good idea.  This needs proper review, and a separate
thread with a good subject to describe the problem will attract a
better audience to deal with the problem you are seeing.

I was not paying much attention, until you've mentioned that this was
an issue with HEAD.
--
Michael


signature.asc
Description: PGP signature


Re: Pluggable cumulative statistics

2024-07-11 Thread Michael Paquier
On Thu, Jul 11, 2024 at 01:29:08PM +, Bertrand Drouvot wrote:
> Please find attached a patch to do so (attached as .txt to not perturb the
> cfbot).

+ * Reads in existing statistics file into the shared stats hash (for non fixed
+ * amount stats) or into the fixed amount stats.

Thanks.  I have applied a simplified version of that, not mentioning
the details of what happens depending on the kinds of stats dealt
with.
--
Michael


signature.asc
Description: PGP signature


Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2024-07-11 Thread Michael Paquier
On Thu, Jul 11, 2024 at 01:11:05PM +0300, Aleksander Alekseev wrote:
> Thanks, Michael. I prepared a corrected patchset.

A comment about v3-0001.

-* If true, use long segment filenames formed from lower 48 bits of the
-* segment number, e.g. pg_xact/1234. Otherwise, use short
-* filenames formed from lower 16 bits of the segment number e.g.
-* pg_xact/1234.
+* If true, use long segment filenames. Use short filenames otherwise.
+* See SlruFileName().

We're losing some details here even if SlruFileName() has some
explanations, because one would need to read through the snprintf's
04X to know that short file names include 4 characters.  I'm OK to
mention SlruFileName() rather than duplicate the knowledge here, but
SlruFileName() should also be updated to mention the same level of
details with some examples of file names.
--
Michael


signature.asc
Description: PGP signature


Re: Clang function pointer type warnings in v14, v15

2024-07-11 Thread Thomas Munro
On Sat, Jul 6, 2024 at 2:35 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > So I propose to back-patch 101c37cd to those two branches.
>
> +1.

Done.

> I see that there are a boatload of related warnings in older
> branches too; do we want to try to do anything about that?  (I doubt
> code changes would be in-scope, but maybe adding a -Wno-foo switch
> to the build flags would be appropriate.)

I don't see any on clang 16 in the 12 and 13 branches.  Where are you
seeing them?

Has anyone thought about the -Wclobbered warnings from eg tamandua?




Re: Redundant syscache access in get_rel_sync_entry()

2024-07-11 Thread Michael Paquier
On Thu, Jul 11, 2024 at 07:10:58PM +0530, Ashutosh Bapat wrote:
> I think it's just convenient. We do that at multiple places; not exactly
> these functions but functions which fetch relation attributes from cached
> tuples. Given that the tuple is cached and local to the backend, it's not
> too expensive.  But if there are multiple places which do something like
> this, we may want to create more function get_rel_* function which return
> multiple properties in one function call. I see get_rel_namspace() and
> get_rel_name() called together at many places.

That's not worth the complications based on the level of caching.
This code is fine as-is.
--
Michael


signature.asc
Description: PGP signature


Re: Allow non-superuser to cancel superuser tasks.

2024-07-11 Thread Nathan Bossart
On Wed, Jul 10, 2024 at 02:14:55PM +0900, Michael Paquier wrote:
> +# Only non-superuser roles granted pg_signal_autovacuum_worker are allowed
> +# to signal autovacuum workers.  This test uses an injection point located
> +# at the beginning of the autovacuum worker startup.

nitpick: Superuser roles are also allowed to signal autovacuum workers.
Maybe this should read "Only roles with privileges of..."

> +# Create some content and set an aggressive autovacuum.
> +$node->safe_psql(
> + 'postgres', qq(
> +CREATE TABLE tab_int(i int);
> +ALTER TABLE tab_int SET (autovacuum_vacuum_cost_limit = 1);
> +ALTER TABLE tab_int SET (autovacuum_vacuum_cost_delay = 100);
> +));
> +
> +$node->safe_psql(
> + 'postgres', qq(
> +INSERT INTO tab_int VALUES(1);
> +));
> +
> +# Wait until an autovacuum worker starts.
> +$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');

I'm not following how this is guaranteed to trigger an autovacuum quickly.
Shouldn't we set autovacuum_vacuum_insert_threshold to 1 so that it is
eligible for autovacuum?
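That is, something like the following in the setup SQL (assuming the
insert-threshold reloption is the right trigger here):

    ALTER TABLE tab_int SET (autovacuum_vacuum_insert_threshold = 1);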

> +# Wait for the autovacuum worker to exit before scanning the logs.
> +$node->poll_query_until('postgres',
> + "SELECT count(*) = 0 FROM pg_stat_activity "
> + . "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");

WFM.  Even if the PID is quickly reused, this should work.  We just might
end up waiting a little longer.

Is it worth testing cancellation, too?

-- 
nathan




Re: improve performance of pg_dump with many sequences

2024-07-11 Thread Nathan Bossart
On Wed, Jul 10, 2024 at 11:52:33PM -0300, Euler Taveira wrote:
> On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote:
>> Unfortunately, I think we have to keep this workaround since older minor
>> releases of PostgreSQL don't have the fix.
> 
> Hmm. Right.

On second thought, maybe we should just limit this improvement to the minor
releases with the fix so that we _can_ get rid of the workaround.  Or we
could use the hacky workaround only for versions with the bug.
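As a sketch of the second idea, the gate could be a simple version check;
the cutoff below is only a placeholder, since the real one depends on which
minor releases carry the fix:

	/* hypothetical: true when the server no longer needs the workaround */
	static bool
	server_has_sequence_fix(Archive *fout)
	{
		return fout->remoteVersion >= 170000;	/* placeholder cutoff */
	}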

-- 
nathan




Re: Clang function pointer type warnings in v14, v15

2024-07-11 Thread Tom Lane
Thomas Munro  writes:
> On Sat, Jul 6, 2024 at 2:35 PM Tom Lane  wrote:
>> I see that there are a boatload of related warnings in older
>> branches too; do we want to try to do anything about that?  (I doubt
>> code changes would be in-scope, but maybe adding a -Wno-foo switch
>> to the build flags would be appropriate.)

> I don't see any on clang 16 in the 12 and 13 branches.  Where are you
> seeing them?

In the buildfarm.  "adder" and a bunch of other machines are throwing
-Wcast-function-type in about two dozen places in v13, and "jay" is
emitting several hundred -Wdeprecated-non-prototype warnings.

> Has anyone thought about the -Wclobbered warnings from eg tamandua?

I decided long ago that gcc's algorithm for emitting that warning
has no detectable connection to real problems.  Maybe it's worth
silencing them on general principles, but I've seen no sign that
it would actually fix any bugs.

regards, tom lane




Re: Redundant syscache access in get_rel_sync_entry()

2024-07-11 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Jul 11, 2024 at 07:10:58PM +0530, Ashutosh Bapat wrote:
>> I think it's just convenient. We do that at multiple places; not exactly
>> these functions but functions which fetch relation attributes from cached
>> tuples. Given that the tuple is cached and local to the backend, it's not
>> too expensive.  But if there are multiple places which do something like
>> this, we may want to create more function get_rel_* function which return
>> multiple properties in one function call. I see get_rel_namspace() and
>> get_rel_name() called together at many places.

> That's not worth the complications based on the level of caching.
> This code is fine as-is.

I could get behind creating such functions if there were a
demonstrable performance win, but in places that are not hot-spots
that's unlikely to be demonstrable.

regards, tom lane




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2024-07-11 Thread Richard Guo
On Sat, Jul 6, 2024 at 5:32 PM Richard Guo  wrote:
> Here is a new rebase.
>
> I'm planning to push it next week, barring any objections.

Pushed.

Thanks
Richard




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2024-07-11 Thread Tender Wang
Richard Guo  wrote on Fri, Jul 12, 2024 at 10:30:

> On Sat, Jul 6, 2024 at 5:32 PM Richard Guo  wrote:
> > Here is a new rebase.
> >
> > I'm planning to push it next week, barring any objections.
>
> Pushed.
>
> Thanks
> Richard
>

Thanks for pushing.

-- 
Tender Wang


Re: Logical Replication of sequences

2024-07-11 Thread Peter Smith
Hi Vignesh. Here are the rest of my comments for patch v20240705-0003.

(Apologies for the length of this post, but it was unavoidable since this
is the first review of a very large 1700-line patch.)

==
src/backend/catalog/pg_subscription.c

1. GetSubscriptionSequences

+/*
+ * Get the sequences for the subscription.
+ *
+ * The returned list is palloc'ed in the current memory context.
+ */

Is that comment right? The palloc seems to be done in
CacheMemoryContext, not in the current context.

~

2.
The code is very similar to the other function
GetSubscriptionRelations(). In fact I did not understand how the 2
functions know what they are returning:

E.g. how does GetSubscriptionRelations not return sequences too?
E.g. how does GetSubscriptionSequences not return relations too?

==
src/backend/commands/subscriptioncmds.c

CreateSubscription:
nitpick - put the sequence logic *after* the relations logic because
that is the order that seems used everywhere else.

~~~

3. AlterSubscription_refresh

- logicalrep_worker_stop(sub->oid, relid);
+ /* Stop the worker if relation kind is not sequence*/
+ if (relkind != RELKIND_SEQUENCE)
+ logicalrep_worker_stop(sub->oid, relid);

Can you give more reasons in the comment for why the stop is skipped for sequence workers?

~

nitpick - period and space in the comment

~~~

4.
  for (off = 0; off < remove_rel_len; off++)
  {
  if (sub_remove_rels[off].state != SUBREL_STATE_READY &&
- sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE)
+ sub_remove_rels[off].state != SUBREL_STATE_SYNCDONE &&
+ get_rel_relkind(sub_remove_rels[off].relid) != RELKIND_SEQUENCE)
  {
Would this new logic perhaps be better written as:

if (get_rel_relkind(sub_remove_rels[off].relid) == RELKIND_SEQUENCE)
  continue;

~~~

AlterSubscription_refreshsequences:
nitpick - rename to AlterSubscription_refresh_sequences

~
5.
There is significant code overlap between the existing
AlterSubscription_refresh and the new function
AlterSubscription_refreshsequences. I wonder if it is better to try to
combine the logic and just pass another parameter to
AlterSubscription_refresh saying to update the existing sequences if
necessary. Particularly since the AlterSubscription_refresh is already
tweaked to work for sequences. Of course, the resulting combined
function would be large and complex, but maybe that would still be
better than having giant slabs of nearly identical cut/paste code.
Thoughts?

~~~

check_publications_origin:
nitpick - move variable declarations
~~~

fetch_sequence_list:
nitpick - change /tablelist/seqlist/
nitpick - tweak the spaces of the SQL for alignment (similar to
fetch_table_list)

~

6.
+" WHERE s.pubname IN (");
+ first = true;
+ foreach_ptr(String, pubname, publications)
+ {
+ if (first)
+ first = false;
+ else
+ appendStringInfoString(&cmd, ", ");
+
+ appendStringInfoString(&cmd, quote_literal_cstr(pubname->sval));
+ }
+ appendStringInfoChar(&cmd, ')');

IMO this can be written much better by using get_publications_str()
function to do all this list work.
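A sketch of that suggestion, assuming fetch_sequence_list() lives in
tablesync.c where get_publications_str() is visible:

	appendStringInfoString(&cmd, " WHERE s.pubname IN (");
	get_publications_str(publications, &cmd, true);
	appendStringInfoChar(&cmd, ')');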

==
src/backend/replication/logical/launcher.c

7. logicalrep_worker_find

/*
 * Walks the workers array and searches for one that matches given
 * subscription id and relid.
 *
 * We are only interested in the leader apply worker or table sync worker.
 */

The above function comment (not in patch 0003) is stale because, AFAICT,
this is also going to return sequence workers if it finds one.

~~~

8. logicalrep_sequence_sync_worker_find

+/*
+ * Walks the workers array and searches for one that matches given
+ * subscription id.
+ *
+ * We are only interested in the sequence sync worker.
+ */
+LogicalRepWorker *
+logicalrep_sequence_sync_worker_find(Oid subid, bool only_running)

There are other similar functions for walking the workers array to
search for a worker. Instead of having different functions for
different cases, wouldn't it be cleaner to combine these into a single
function, where you pass a parameter (e.g. a mask of worker types that
you are interested in finding)?
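For example (everything below is hypothetical, assuming a bitmask over the
worker types):

	static LogicalRepWorker *
	logicalrep_worker_find_mask(Oid subid, Oid relid, int type_mask,
								bool only_running)
	{
		Assert(LWLockHeldByMe(LogicalRepWorkerLock));

		for (int i = 0; i < max_logical_replication_workers; i++)
		{
			LogicalRepWorker *w = &LogicalRepCtx->workers[i];

			if (w->in_use && w->subid == subid && w->relid == relid &&
				(type_mask & (1 << w->type)) != 0 &&
				(!only_running || w->proc != NULL))
				return w;
		}
		return NULL;
	}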

~

nitpick - declare a for loop variable 'i'

~~~

9. logicalrep_apply_worker_find

+static LogicalRepWorker *
+logicalrep_apply_worker_find(Oid subid, bool only_running)

All the other find* functions assume the lock is already held
(Assert(LWLockHeldByMe(LogicalRepWorkerLock));). But this one is
different. IMO it might be better to acquire the lock in the caller to
make all the find* functions look the same. Anyway, that will help to
combine everything into 1 "find" worker as suggested in the previous
review comment #8.

~

nitpick - declare a for loop variable 'i'
nitpick - removed unnecessary parens in condition.

~~~

10. logicalrep_worker_launch

/*--
* Sanity checks:
* - must be valid worker type
* - tablesync workers are only ones to have relid
* - parallel apply worker is the only kind of subworker
*/

The above code-comment (not in the 0003 patch) seems stale. This
should now also mention sequence sync workers, right?

~

Re: Logging which local address was connected to in log_line_prefix

2024-07-11 Thread David Steele

On 7/11/24 23:09, Greg Sabino Mullane wrote:
Thanks for the review. Please find attached a new version with proper 
tabs and indenting.


This looks good to me now. +1 overall for the feature.

Regards,
-David




Re: Removing unneeded self joins

2024-07-11 Thread Andrei Lepikhov

On 7/11/24 14:43, jian he wrote:

On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov  wrote:


On 7/2/24 07:25, jian he wrote:

to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.

Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.


thanks.
Because the previous thread mentioned the EPQ problem, in
remove_useless_self_joins I made it process only CMD_SELECT queries.
I would like to object here: IMO, it is just a mishap that crept in over a 
long history of patch transformations. Along the way we lost the case where 
a RowMark exists for only one of the candidate relations.
Also, after review I think we don't need so many new tests. Specifically 
for DML we already have one:


EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;

And we should just add something to elaborate on it a bit.
See the attached patch containing my proposal to improve the main v4-0001 
SJE patch. I think it resolves the EPQ assertion issue as well as the 
problems with the RETURNING value.


--
regards, Andrei Lepikhov
From c04add30999ecd64c51bde7db56a6e5637c16c74 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Tue, 9 Jul 2024 12:25:23 +0700
Subject: [PATCH] Apply SJE to DML queries: Just don't include result relation
 to the set of SJE candidates.

Also, fix the subtle bug with RowMarks.
---
 src/backend/optimizer/plan/analyzejoins.c | 24 +++--
 src/test/regress/expected/join.out| 61 +++
 src/test/regress/sql/join.sql | 17 ++-
 3 files changed, 84 insertions(+), 18 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bb14597762..d2b9ba7c08 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1860,10 +1860,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
 	/* restore the rangetblref in a proper order. */
 	restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
 
-	/* See remove_self_joins_one_group() */
-	Assert(root->parse->resultRelation != toRemove->relid);
-	Assert(root->parse->resultRelation != toKeep->relid);
-
 	/* Replace links in the planner info */
 	remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
 
@@ -2046,14 +2042,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 	{
 		RelOptInfo *inner = root->simple_rel_array[r];
 
-		/*
-		 * We don't accept result relation as either source or target relation
-		 * of SJE, because result relation has different behavior in
-		 * EvalPlanQual() and RETURNING clause.
-		 */
-		if (root->parse->resultRelation == r)
-			continue;
-
 		k = r;
 
 		while ((k = bms_next_member(relids, k)) > 0)
@@ -2069,9 +2057,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			PlanRowMark *imark = NULL;
 			List	   *uclauses = NIL;
 
-			if (root->parse->resultRelation == k)
-continue;
-
 			/* A sanity check: the relations have the same Oid. */
 			Assert(root->simple_rte_array[k]->relid ==
    root->simple_rte_array[r]->relid);
@@ -2121,7 +2106,8 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 if (omark && imark)
 	break;
 			}
-			if (omark && imark && omark->markType != imark->markType)
+			if (((omark == NULL) ^ (imark == NULL)) ||
+(omark && omark->markType != imark->markType))
 continue;
 
 			/*
@@ -2231,7 +2217,8 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
 			 */
 			if (rte->rtekind == RTE_RELATION &&
 rte->relkind == RELKIND_RELATION &&
-rte->tablesample == NULL)
+rte->tablesample == NULL &&
+varno != root->parse->resultRelation)
 			{
 Assert(!bms_is_member(varno, relids));
 relids = bms_add_member(relids, varno);
@@ -2300,6 +2287,9 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
 
 relids = bms_del_members(relids, group);
 
+/* Don't apply SJE to result relation */
+Assert(!bms_is_member(root->parse->resultRelation, group));
+
 /*
  * Try to remove self-joins from a group of identical entries.
  * Make the next attempt iteratively - if something is deleted
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4e4cec633a..78dfcd4866 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7068,6 +7068,18 @@ UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
->  Seq Scan on sj sz
 (6 rows)
 
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = sz.b FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN  
+-
+ Update on sj sq
+   ->  Nested Loop
+ Join Filter: (sq.a =

Re: Check lateral references within PHVs for memoize cache keys

2024-07-11 Thread Andrei Lepikhov

On 7/11/24 16:18, Richard Guo wrote:

On Fri, Jun 28, 2024 at 10:14 PM Andrei Lepikhov  wrote:

I got the point about Memoize over join, but as a join still calls
replace_nestloop_params to replace parameters in its clauses, why not to
invent something similar to find Memoize keys inside specific JoinPath
node? It is not the issue of this patch, though - but is it doable?


I don't think it's impossible to do, but I'm skeptical that there's an
easy way to identify all the cache keys for joinrels, without having
available ppi_clauses and lateral_vars.

Ok




IMO, the code:
if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids))
cache_purge_all(node);

is a good place to check an assertion: is it really the parent query
parameters that make a difference between memoize keys and node list of
parameters?


I don't think we have enough info available here to identify which
params within outerPlan->chgParam are from outer levels.  Maybe we can
store root->outer_params in the MemoizeState node to help with this
assertion, but I'm not sure if it's worth the trouble.

Got it
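For the record, the assertion being discussed might look roughly like this,
assuming root->outer_params were copied into the node as outer_params at
initialization time:

	Assert(bms_is_subset(bms_difference(outerPlan->chgParam,
										node->keyparamids),
						 node->outer_params));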


Attached is an updated version of this patch.
I'm not sure about the stability of the AVG aggregate's output format 
across different platforms. Maybe it's better to return the result of a 
comparison between the AVG() and the expected value?
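Something of this shape, with a hypothetical table and expected value:

	-- compare against the expected value instead of printing the average
	SELECT avg(t.x) = 1.5 AS avg_ok FROM tab t;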


--
regards, Andrei Lepikhov





Re: Optimize WindowAgg's use of tuplestores

2024-07-11 Thread Tatsuo Ishii
Hi David,

Thank you for the patch.

> I think this might just be noise as a result of rearranging code. In
> terms of C code, I don't see any reason for it to be slower.  If you
> look at GenerationDelete() (as what is getting called from
> MemoryContextDelete()), it just calls GenerationReset(). So resetting
> is going to always be less work than deleting the context, especially
> given we don't need to create the context again when we reset it.
> 
> I wrote the attached script to see if I can also see the slowdown and
> I do see the patched code come out slightly slower (within noise
> levels) in lower partition counts.
> 
> To get my compiler to produce code in a more optimal order for the
> common case, I added unlikely() to the "if (winstate->all_first)"
> condition.  This is only evaluated on the first time after a rescan,
> so putting that code at the end of the function makes more sense.  The
> attached v2 patch has it this way.  You can see the numbers look
> slightly better in the attached graph.
> 
> Thanks for having a look at this.
> 
> David

@@ -2684,6 +2726,14 @@ ExecEndWindowAgg(WindowAggState *node)
PlanState  *outerPlan;
int i;
 
+   if (node->buffer != NULL)
+   {
+   tuplestore_end(node->buffer);
+
+   /* nullify so that release_partition skips the 
tuplestore_clear() */
+   node->buffer = NULL;
+   }
+

Is it possible that node->buffer == NULL in ExecEndWindowAgg()?  If
not, probably making it an Assert() or just removing the if() should
be fine.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Clang function pointer type warnings in v14, v15

2024-07-11 Thread Thomas Munro
On Fri, Jul 12, 2024 at 2:26 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Sat, Jul 6, 2024 at 2:35 PM Tom Lane  wrote:
> >> I see that there are a boatload of related warnings in older
> >> branches too; do we want to try to do anything about that?  (I doubt
> >> code changes would be in-scope, but maybe adding a -Wno-foo switch
> >> to the build flags would be appropriate.)
>
> > I don't see any on clang 16 in the 12 and 13 branches.  Where are you
> > seeing them?
>
> In the buildfarm.  "adder" and a bunch of other machines are throwing
> -Wcast-function-type in about two dozen places in v13, and "jay" is
> emitting several hundred -Wdeprecated-non-prototype warnings.

Ah, I see.  A few animals running with -Wextra.  Whereas in v14+ we
have -Wcast-function-type in the actual tree, which affects people's
workflows more directly.  Like my regular machine, or CI, when a
couple of the OSes' house compilers eventually reach clang 16.

I guess we have to decide whether it's a matter for the tree, or for the
people who add -Wextra, i.e. whether they want to filter that down a bit
with some -Wno-XXX.  Adder already has some of those:

 'CFLAGS' => '-O1 -ggdb -g3
-fno-omit-frame-pointer -Wall -Wextra -Wno-unused-parameter
-Wno-sign-compare -Wno-missing-field-initializers -O0',




Re: Clang function pointer type warnings in v14, v15

2024-07-11 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Jul 12, 2024 at 2:26 PM Tom Lane  wrote:
>> Thomas Munro  writes:
>>> I don't see any on clang 16 in the 12 and 13 branches.  Where are you
>>> seeing them?

>> In the buildfarm.  "adder" and a bunch of other machines are throwing
>> -Wcast-function-type in about two dozen places in v13, and "jay" is
>> emitting several hundred -Wdeprecated-non-prototype warnings.

> Ah, I see.  A few animals running with -Wextra.

Oh, got it.

> I guess we have to decide whether it's a matter for the tree, or for the
> people who add -Wextra, i.e. whether they want to filter that down
> a bit with some -Wno-XXX.

I think it's reasonable to say that if you add -Wextra then it's
up to you to deal with the results.  If we were contemplating
enabling -Wextra as standard, then it'd be our problem --- but
nobody has proposed that AFAIR.  In any case we'd surely not
add it now to near-dead branches.

regards, tom lane




Re: Test to dump and restore objects left behind by regression

2024-07-11 Thread Ashutosh Bapat
On Tue, Jul 9, 2024 at 1:07 PM Michael Paquier  wrote:
>
> On Mon, Jul 08, 2024 at 03:59:30PM +0530, Ashutosh Bapat wrote:
> > Before submitting the patch, I looked for all the places which mention
> > AdjustUpgrade or AdjustUpgrade.pm to find places where the new module needs
> > to be mentioned. But I didn't find any. AdjustUpgrade is not mentioned
> > in src/test/perl/Makefile or src/test/perl/meson.build. Do we want to also
> > add AdjustUpgrade.pm in those files?
>
> Good question.  This has not been mentioned on the thread that added
> the module:
> https://www.postgresql.org/message-id/891521.1673657296%40sss.pgh.pa.us
>
> And I could see it as being useful if installed.  The same applies to
> Kerberos.pm, actually.  I'll ping that on a new thread.

For now, it may be better to maintain the status quo. If we see a need for
these modules in the future, say from extensions or tests outside the core
tree, we will add them to the meson and make files.

>
> > We could forget 0002. I am fine with that.  But I can change the code such
> > that formats other than "plain" are tested when PG_TEST_EXTRAS contains
> > "regress_dump_formats". Would that be acceptable?
>
> Interesting idea.  That may be acceptable, under the same arguments as
> the xid_wraparound one.

Done. Added a new entry in PG_TEST_EXTRA documentation.

I have merged the two patches now.
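For anyone following along, the new tests are then enabled the same way as
the other opt-in suites, e.g.:

    make check-world PG_TEST_EXTRA='regress_dump_formats'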


-- 
Best Wishes,
Ashutosh Bapat
From aff0939b8aa8566daecf1ac35b9c7fce9fa851ca Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 27 Jun 2024 10:03:53 +0530
Subject: [PATCH 2/2] Test pg_dump/restore of regression objects

002_pg_upgrade.pl tests pg_upgrade on the regression database left
behind by regression run. Modify it to test dump and restore of the
regression database as well.

The regression database created by a regression run contains almost all
the database objects supported by PostgreSQL in various states. The test
thus covers a wide range of dump and restore scenarios.

When PG_TEST_EXTRA has 'regress_dump_formats' in it, test dump and
restore in all supported formats. Otherwise test only "plain" format so
that the test finishes quickly.

Author: Ashutosh Bapat
Reviewed by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2wf61pt+hfquzjbqourz...@mail.gmail.com
---
 doc/src/sgml/regress.sgml   |  13 ++
 src/bin/pg_upgrade/t/002_pg_upgrade.pl  | 173 ++--
 src/test/perl/PostgreSQL/Test/AdjustDump.pm | 109 
 3 files changed, 277 insertions(+), 18 deletions(-)
 create mode 100644 src/test/perl/PostgreSQL/Test/AdjustDump.pm

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index d1042e0222..8c1a9ddc40 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -336,6 +336,19 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
   
  
 
+
+
+ regress_dump_formats
+ 
+  
+   When enabled,
+   src/bin/pg_upgrade/t/002_pg_upgrade.pl tests dump
+   and restore of regression database using all dump formats. Otherwise
+   tests only plain format. Not enabled by default
+   because it is resource intensive.
+  
+ 
+

 
Tests for features that are not supported by the current build
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..613512ffe7 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,6 +13,7 @@ use File::Path qw(rmtree);
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::AdjustUpgrade;
+use PostgreSQL::Test::AdjustDump;
 use Test::More;
 
 # Can be changed to test the other modes.
@@ -36,9 +37,9 @@ sub generate_db
 		"created database with ASCII characters from $from_char to $to_char");
 }
 
-# Filter the contents of a dump before its use in a content comparison.
-# This returns the path to the filtered dump.
-sub filter_dump
+# Filter the contents of a dump before its use in a content comparison for
+# upgrade testing. This returns the path to the filtered dump.
+sub filter_dump_for_upgrade
 {
 	my ($is_old, $old_version, $dump_file) = @_;
 	my $dump_contents = slurp_file($dump_file);
@@ -61,6 +62,44 @@ sub filter_dump
 	return $dump_file_filtered;
 }
 
+# Test that the given two files match.  The files usually contain pg_dump
+# output in "plain" format. Output the difference if any.
+sub compare_dumps
+{
+	my ($dump1, $dump2, $testname) = @_;
+
+	my $compare_res = compare($dump1, $dump2);
+	is($compare_res, 0, $testname);
+
+	# Provide more context if the dumps do not match.
+	if ($compare_res != 0)
+	{
+		my ($stdout, $stderr) =
+		  run_command([ 'diff', '-u', $dump1, $dump2 ]);
+		print "=== diff of $dump1 and $dump2\n";
+		print "=== stdout ===\n";
+		print $stdout;
+		print "=== stderr ===\n";
+		print $stderr;
+		print "=== EOF ===\n";
+	}
+}
+
+# Adjust the contents of a dump before its use in a conte

RE: Adding skip scan (including MDAM style range skip scan) to nbtree

2024-07-11 Thread Masahiro.Ikeda
Hi,

Since I'd like to understand skip scan in order to improve the EXPLAIN output
for multicolumn B-Tree indexes [1], I began trying the skip scan with some
queries and looking into the source code.

I have some feedback and comments.

(1)

At first, I was surprised by your benchmark results, because skip scan can
improve performance that much. I agree that many users will be happy with
this feature, especially for OLAP use cases. I'm looking forward to it in v18.


(2)

I found that the cost is estimated to be much higher if the number of skipped
attributes is more than two. Is this expected behavior?

# Test result. The attached file is the detail of tests.

-- Index Scan
-- The actual time is low since the skip scan works well
-- But the cost is higher than that of seqscan
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101;
  QUERY PLAN
   
---
 Index Scan using idx_id1_id2_id3 on public.test  (cost=0.42..26562.77 rows=984 
width=20) (actual time=0.051..15.533 rows=991 loops=1)
   Output: id1, id2, id3, value
   Index Cond: (test.id3 = 101)
   Buffers: shared hit=4402
 Planning:
   Buffers: shared hit=7
 Planning Time: 0.234 ms
 Execution Time: 15.711 ms
(8 rows)

-- Seq Scan
-- Actual time is high, but the cost is lower than that of the above Index Scan.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101;
  QUERY PLAN
   
---
 Gather  (cost=1000.00..12676.73 rows=984 width=20) (actual time=0.856..113.861 
rows=991 loops=1)
   Output: id1, id2, id3, value
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=6370
   ->  Parallel Seq Scan on public.test  (cost=0.00..11578.33 rows=410 
width=20) (actual time=0.061..102.016 rows=330 loops=3)
 Output: id1, id2, id3, value
 Filter: (test.id3 = 101)
 Rows Removed by Filter: 333003
 Buffers: shared hit=6370
 Worker 0:  actual time=0.099..98.014 rows=315 loops=1
   Buffers: shared hit=2066
 Worker 1:  actual time=0.054..97.162 rows=299 loops=1
   Buffers: shared hit=1858
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.194 ms
 Execution Time: 114.129 ms
(18 rows)


I looked at btcostestimate() to find the reason and found that the bound quals
and cost.num_sa_scans are different from my expectations.

My assumptions were:
* bound quals is id3=XXX (and id1 and id2 are skipped attributes)
* cost.num_sa_scans = 100 (=10*10 because assuming 10 primitive index scans
  per skipped attribute)

But they turned out to be wrong. The actual result for the above index scan is
* bound quals is NULL
* cost.num_sa_scans = 1


I know you said the following, but I'd like to know whether the above is expected or not.

> That approach seems far more practicable than preempting the problem
> during planning or during nbtree preprocessing. It seems like it'd be
> very hard to model the costs statistically. We need revisions to
> btcostestimate, of course, but the less we can rely on btcostestimate
> the better. As I said, there are no new index paths generated by the
> optimizer for any of this.

I couldn't quite understand why the logic below exists.

  btcostestimate()
  (...omit...)
if (indexcol != iclause->indexcol)
{
/* no quals at all for indexcol */
found_skip = true;
if (index->pages < 100)
break;
num_sa_scans += 10 * (indexcol - 
iclause->indexcol); // why add a negative value?
continue;   
   // why skip adding bound quals?
}


(3)

Currently, there is an assumption that "there will be 10 primitive index scans
per skipped attribute". Is there any chance of using pg_stats.n_distinct?
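For example, the per-column estimates are already visible there (table and
column names taken from my test above):

	SELECT attname, n_distinct
	  FROM pg_stats
	 WHERE tablename = 'test' AND attname IN ('id1', 'id2');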

[1] Improve EXPLAIN output for multicolumn B-Tree Index
https://www.postgresql.org/message-id/flat/TYWPR01MB1098260B694D27758FE2BA46FB1C92%40TYWPR01MB10982.jpnprd01.prod.outlook.com

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION



compare_cost.sql
Description: compare_cost.sql


compare_cost_with_v2_patch.out
Description: compare_cost_with_v2_patch.out


Re: Allow non-superuser to cancel superuser tasks.

2024-07-11 Thread Michael Paquier
On Thu, Jul 11, 2024 at 08:50:57PM -0500, Nathan Bossart wrote:
> I'm not following how this is guaranteed to trigger an autovacuum quickly.
> Shouldn't we set autovacuum_vacuum_insert_threshold to 1 so that it is
> eligible for autovacuum?

You are right, this is not going to make the worker start up any faster,
so we could remove that entirely.  On closer look, the main bottleneck is
that the test spends a lot of time waiting on the naptime even when using
the minimum value of 1s, while the scan of pg_stat_activity checking for
workers keeps looping.

[ ..thinks.. ]

I have one trick up my sleeve for this one to make the launcher more
responsive in checking the timestamps of the database list.  That's not
perfect, but with a reload to set the launcher's latch after the injection
point has been attached, it reduces the wait before worker startup from 1s
(the naptime) to 600ms, a 400ms gain.  The difference in runtime is
noticeable.

>> +# Wait for the autovacuum worker to exit before scanning the logs.
>> +$node->poll_query_until('postgres',
>> +"SELECT count(*) = 0 FROM pg_stat_activity "
>> +. "WHERE pid = $av_pid AND backend_type = 'autovacuum worker';");
> 
> WFM.  Even if the PID is quickly reused, this should work.  We just might
> end up waiting a little longer.
> 
> Is it worth testing cancellation, too?

The point is to check after pg_signal_backend, so I am not sure it is
worth the extra cycles for the cancellation.

Attaching the idea, with a fix for the comment you mentioned and, while on
it, prepending "regress_" to the role names to silence the warnings
generated by -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.

What do you think?
--
Michael
From 8a827a8a59da4d3c2f2115151e1241cf663c11a1 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 12 Jul 2024 14:19:51 +0900
Subject: [PATCH v3] Add tap test for pg_signal_autovacuum role

---
 src/backend/postmaster/autovacuum.c   |  7 ++
 .../test_misc/t/006_signal_autovacuum.pl  | 94 +++
 2 files changed, 101 insertions(+)
 create mode 100644 src/test/modules/test_misc/t/006_signal_autovacuum.pl

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 928754b51c..4e4a0ccbef 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -100,6 +100,7 @@
 #include "utils/fmgroids.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1902,6 +1903,12 @@ do_autovacuum(void)
 	/* Start a transaction so our commands have one to play into. */
 	StartTransactionCommand();
 
+	/*
+	 * This injection point is put in a transaction block to work with a wait
+	 * that uses a condition variable.
+	 */
+	INJECTION_POINT("autovacuum-worker-start");
+
 	/*
 	 * Compute the multixact age for which freezing is urgent.  This is
 	 * normally autovacuum_multixact_freeze_max_age, but may be less if we are
diff --git a/src/test/modules/test_misc/t/006_signal_autovacuum.pl b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
new file mode 100644
index 00..ded42b1be9
--- /dev/null
+++ b/src/test/modules/test_misc/t/006_signal_autovacuum.pl
@@ -0,0 +1,94 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test signaling autovacuum worker with pg_signal_autovacuum_worker.
+#
+# Only roles with privileges of pg_signal_autovacuum_worker are allowed to
+# signal autovacuum workers.  This test uses an injection point located
+# at the beginning of the autovacuum worker startup.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize postgres
+my $psql_err = '';
+my $psql_out = '';
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+
+# This ensures a quick worker spawn.
+$node->append_conf(
+	'postgresql.conf', 'autovacuum_naptime = 1');
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node->safe_psql(
+	'postgres', qq(
+CREATE ROLE regress_regular_role;
+CREATE ROLE regress_worker_role;
+GRANT pg_signal_autovacuum_worker TO regress_worker_role;
+));
+
+# From this point, autovacuum worker will wait at startup.
+$node->safe_psql('postgres',
+	"SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+$node->reload();
+
+# Wait until an autovacuum worker starts.
+$node->wait_for_event('autovacuum worker', 'autovacuum-worker-start');
+
+my $av_pid = $node->safe_psql(
+	'postgres', qq(
+SELECT pid FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';
+));
+
+# Regular role cannot terminate autovacuum worker.
+my $terminate_with_no_pg_signal_av = $node->psql(
+	'postgres', qq(
+SET ROLE regress_regular_role;
+SELECT pg_terminate_backend($av_pid)

Re: speed up a logical replica setup

2024-07-11 Thread Amit Kapila
On Thu, Jul 11, 2024 at 7:04 PM Euler Taveira  wrote:
>
> On Thu, Jul 11, 2024, at 8:22 AM, Zhijie Hou (Fujitsu) wrote:
>
> The analysis and suggestion look reasonable to me.
> Here is a small patch which does the same.
>
>
> LGTM. However, I would add a comment above the INSERT you moved around to 
> remind
> you that a transaction cannot be added there because it will break this test.
>

Pushed after adding a comment.

-- 
With Regards,
Amit Kapila.




Re: Optimize WindowAgg's use of tuplestores

2024-07-11 Thread Tatsuo Ishii
> @@ -2684,6 +2726,14 @@ ExecEndWindowAgg(WindowAggState *node)
>   PlanState  *outerPlan;
>   int i;
>  
> + if (node->buffer != NULL)
> + {
> + tuplestore_end(node->buffer);
> +
> + /* nullify so that release_partition skips the 
> tuplestore_clear() */
> + node->buffer = NULL;
> + }
> +
> 
> Is it possible that node->buffer == NULL in ExecEndWindowAgg()?  If
> not, probably making it an Assert() or just removing the if() should
> be fine.

Of course it is possible, for example when there are no rows in a partition.
Sorry for the noise.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Optimize WindowAgg's use of tuplestores

2024-07-11 Thread Ashutosh Bapat
On Thu, Jul 11, 2024 at 5:39 PM David Rowley  wrote:
>
> On Wed, 10 Jul 2024 at 02:42, Ashutosh Bapat
>  wrote:
> > Observations
> > 1. The numbers corresponding to 10 and 100 partitions are higher when
> > patched. That might be just noise. I don't see any reason why it would
> > impact negatively when there are a small number of partitions. The
> > lower partition cases also have a higher number of rows per partition,
> > so is the difference between MemoryContextDelete() vs
> > MemoryContextReset() making any difference here. May be worth
> > verifying those cases carefully. Otherwise upto 1000 partitions, it
> > doesn't show any differences.
>
> I think this might just be noise as a result of rearranging code. In
> terms of C code, I don't see any reason for it to be slower.  If you
> look at GenerationDelete() (as what is getting called from
> MemoryContextDelete()), it just calls GenerationReset(). So resetting
> is going to always be less work than deleting the context, especially
> given we don't need to create the context again when we reset it.
>
> I wrote the attached script to see if I can also see the slowdown and
> I do see the patched code come out slightly slower (within noise
> levels) in lower partition counts.
>
> To get my compiler to produce code in a more optimal order for the
> common case, I added unlikely() to the "if (winstate->all_first)"
> condition.  This is only evaluated on the first time after a rescan,
> so putting that code at the end of the function makes more sense.  The
> attached v2 patch has it this way.  You can see the numbers look
> slightly better in the attached graph.

The change to all_first seems unrelated to the tuplestore optimization,
but it brings the results in line with master for lower partition counts.

Thanks for the script. I have similar results on my laptop.
From master
Testing with 100 partitions
latency average = 505.738 ms
latency average = 509.407 ms
latency average = 522.461 ms
Testing with 10 partitions
latency average = 329.026 ms
latency average = 327.504 ms
latency average = 342.556 ms
Testing with 1 partitions
latency average = 299.496 ms
latency average = 298.266 ms
latency average = 306.773 ms
Testing with 1000 partitions
latency average = 299.006 ms
latency average = 302.188 ms
latency average = 301.701 ms
Testing with 100 partitions
latency average = 305.411 ms
latency average = 286.935 ms
latency average = 302.432 ms
Testing with 10 partitions
latency average = 288.091 ms
latency average = 294.506 ms
latency average = 305.082 ms
Testing with 1 partitions
latency average = 301.121 ms
latency average = 319.615 ms
latency average = 301.141 ms

Patched
Testing with 100 partitions
latency average = 351.683 ms
latency average = 352.516 ms
latency average = 352.086 ms
Testing with 10 partitions
latency average = 300.626 ms
latency average = 303.584 ms
latency average = 306.959 ms
Testing with 1 partitions
latency average = 289.560 ms
latency average = 302.248 ms
latency average = 297.423 ms
Testing with 1000 partitions
latency average = 308.600 ms
latency average = 299.215 ms
latency average = 289.681 ms
Testing with 100 partitions
latency average = 301.216 ms
latency average = 286.240 ms
latency average = 291.232 ms
Testing with 10 partitions
latency average = 305.260 ms
latency average = 296.707 ms
latency average = 300.266 ms
Testing with 1 partitions
latency average = 316.199 ms
latency average = 314.043 ms
latency average = 309.425 ms

Now that you are also seeing the slowdown with your earlier patch, I
am wondering whether adding unlikely() by itself is a good
optimization. There might be some other reason behind the perceived
slowdown. How do the numbers look when you just add unlikely() without
any other changes?
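For reference, the isolated change in question is just the branch hint:

-   if (winstate->all_first)
+   if (unlikely(winstate->all_first))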

-- 
Best Wishes,
Ashutosh Bapat




Re: Flush pgstats file during checkpoints

2024-07-11 Thread Michael Paquier
On Fri, Jul 05, 2024 at 01:52:31PM +0900, Michael Paquier wrote:
> On Sat, Jun 29, 2024 at 11:13:04PM +0200, Tomas Vondra wrote:
>> I think those are two independent issues - knowing that the snapshot is
>> from the last checkpoint, and knowing that it's correct (not corrupted).
>> And yeah, we should be careful about fsync/durable_rename.
> 
> Yeah, that's bugging me as well.  I don't really get why we would not
> want durability at shutdown for this data.  So how about switching the
> end of pgstat_write_statsfile() to use durable_rename()?  Sounds like
> an independent change, worth on its own.

Please find attached a rebased patch set with the durability point
addressed in 0001.  There were also some conflicts.

Note that I have applied the previous 0002 adding an assert in
pgstat_write_statsfile() as 734c057a8935, as I've managed to break this
assumption again while hacking more on this area.
--
Michael
From b03087f1746233bce0d1fbbb69789795296e779b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 12 Jul 2024 15:31:27 +0900
Subject: [PATCH v2 1/4] Make write of pgstats file durable

This switches the pgstats code to use durable_rename() rather than
rename(), to make sure that the stats file is not lost should the host
be powered off after PostgreSQL has been stopped.
---
 src/backend/utils/activity/pgstat.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ed7baa6e04..66cda798fb 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1478,12 +1478,9 @@ pgstat_write_statsfile(void)
 		tmpfile)));
 		unlink(tmpfile);
 	}
-	else if (rename(tmpfile, statfile) < 0)
+	else if (durable_rename(tmpfile, statfile, LOG) < 0)
 	{
-		ereport(LOG,
-(errcode_for_file_access(),
- errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
-		tmpfile, statfile)));
+		/* error logged already */
 		unlink(tmpfile);
 	}
 }
-- 
2.45.2

From 4340ebd842049b5c706ada324bba36a5c5a78bb9 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 12 Jul 2024 15:22:12 +0900
Subject: [PATCH v2 2/4] Add redo LSN to pgstats file

This is used in the startup process to check that the file we are
loading is the one referring to the latest checkpoint.

Bump PGSTAT_FILE_FORMAT_ID.
---
 src/include/pgstat.h|  5 +++--
 src/backend/access/transam/xlog.c   |  2 +-
 src/backend/utils/activity/pgstat.c | 26 +++---
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6b99bb8aad..043d39a565 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -11,6 +11,7 @@
 #ifndef PGSTAT_H
 #define PGSTAT_H
 
+#include "access/xlogdefs.h"
 #include "datatype/timestamp.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
@@ -235,7 +236,7 @@ typedef struct PgStat_TableXactStatus
  * 
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BCAD
+#define PGSTAT_FILE_FORMAT_ID	0x01A5BCAE
 
 typedef struct PgStat_ArchiverStats
 {
@@ -466,7 +467,7 @@ extern Size StatsShmemSize(void);
 extern void StatsShmemInit(void);
 
 /* Functions called during server startup / shutdown */
-extern void pgstat_restore_stats(void);
+extern void pgstat_restore_stats(XLogRecPtr redo);
 extern void pgstat_discard_stats(void);
 extern void pgstat_before_server_shutdown(int code, Datum arg);
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 33e27a6e72..f137551e8d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5641,7 +5641,7 @@ StartupXLOG(void)
 	if (didCrash)
 		pgstat_discard_stats();
 	else
-		pgstat_restore_stats();
+		pgstat_restore_stats(checkPoint.redo);
 
 	lastFullPageWrites = checkPoint.fullPageWrites;
 
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 66cda798fb..2952604a51 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -94,6 +94,7 @@
 #include 
 
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "lib/dshash.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -171,8 +172,8 @@ typedef struct PgStat_SnapshotEntry
  * --
  */
 
-static void pgstat_write_statsfile(void);
-static void pgstat_read_statsfile(void);
+static void pgstat_write_statsfile(XLogRecPtr redo);
+static void pgstat_read_statsfile(XLogRecPtr redo);
 
 static void pgstat_reset_after_failure(void);
 
@@ -448,9 +449,9 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
  * Should only be called by the startup process or in single user mode.
  */
 void
-pgstat_restore_stats(void)
+pgstat_restore_stats(XLogRecPtr redo)
 {
-	pgstat_read_statsfile();
+	pgstat_read_statsfile(redo);
 }
 
 /*
@@ -526,7 +527,7 @@ pgstat_before_serv

Re: Pgoutput not capturing the generated columns

2024-07-11 Thread Peter Smith
Hi Shubham.

Thanks for separating the new BMS 'columns' modification.

Here are my review comments for the latest patch v17-0001.

==

1. src/backend/replication/pgoutput/pgoutput.c

  /*
  * Columns included in the publication, or NULL if all columns are
  * included implicitly.  Note that the attnums in this bitmap are not
+ * publication and include_generated_columns option: other reasons should
+ * be checked at user side.  Note that the attnums in this bitmap are not
  * shifted by FirstLowInvalidHeapAttributeNumber.
  */
  Bitmapset  *columns;

With this latest 0001 there is now no change to the original
interpretation of RelationSyncEntry BMS 'columns'. So, I think this
field comment should remain unchanged; i.e. it should be the same as
the current HEAD comment.

==
src/test/subscription/t/011_generated.pl

nitpick - comment changes for 'tab2' and 'tab3' to make them more consistent.

==
99.
Please refer to the attached diff patch which implements any nitpicks
described above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index f449969..fe32987 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -28,14 +28,16 @@ $node_subscriber->safe_psql('postgres',
"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
22) STORED, c int)"
 );
 
-# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has 
NON-generated col 'b'.
+# tab2:
+# publisher-side tab2 has generated col 'b'.
+# subscriber-side tab2 has non-generated col 'b'.
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab2 (a int, b int GENERATED ALWAYS AS (a * 2) STORED)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE tab2 (a int, b int)");
 
 # tab3:
-# publisher-side tab3 has generated col 'b' but
-# subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.
+# publisher-side tab3 has generated col 'b'.
+# subscriber-side tab3 has generated col 'b', using a different computation.
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab3 (a int, b int GENERATED ALWAYS AS (a + 10) STORED)");
 $node_subscriber->safe_psql('postgres',
@@ -97,8 +99,11 @@ is( $result, qq(1|22|
 6|132|), 'generated columns replicated');
 
 #
-# TEST tab2: the publisher-side col 'b' is generated, and the subscriber-side
-# col 'b' is not generated, so confirm that col 'b' IS replicated.
+# TEST tab2:
+# publisher-side tab2 has generated col 'b'.
+# subscriber-side tab2 has non-generated col 'b'.
+#
+# Confirm that col 'b' is replicated.
 #
 $node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
 $node_publisher->wait_for_catchup('sub2');
@@ -110,10 +115,13 @@ is( $result, qq(4|8
 );
 
 #
-# TEST tab3: the publisher-side col 'b' is generated, and the subscriber-side
-# col 'b' is also generated, so confirmed that col 'b' IS NOT replicated. We
-# can know this because the result value is the subscriber-side computation
-# (which is not the same as the publisher-side computation for col 'b').
+# TEST tab3:
+# publisher-side tab3 has generated col 'b'.
+# subscriber-side tab3 has generated col 'b', using a different computation.
+#
+# Confirm that col 'b' is NOT replicated. We can know this because the result
+# value is the subscriber-side computation (which is different from the
+# publisher-side computation for this column).
 #
 $node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
 $node_publisher->wait_for_catchup('sub3');