RE: Exit walsender before confirming remote flush in logical replication

2023-02-08 Thread Hayato Kuroda (Fujitsu)
Dear Horiguchi-san,

Thank you for checking the patch! PSA new version.

> 0002:
> 
> This patch doesn't seem to offer a means to change the default
> walsender behavior.  We need a subscription option named like
> "walsender_exit_mode" to do that.

As I said in another mail[1], I think the feature does not have to be
usable on its own for now.

> +ConsumeWalsenderOptions(List *options, WalSndData *data)
> 
> I wonder if it is the right design to put options for different things
> into a single list. I rather choose to embed the walsender option in
> the syntax than needing this function.
> 
> K_START_REPLICATION opt_slot opt_physical RECPTR opt_timeline
> opt_shutdown_mode
> 
> K_START_REPLICATION K_SLOTIDENT K_LOGICAL RECPTR
> opt_shutdown_mode plugin_options
> 
> where opt_shutdown_mode would be like "SHUTDOWN_MODE immediate".

Right, the option handling was quite bad. I added the new syntax opt_shutdown_mode
to logical replication, and modified much of the code accordingly.
Note that based on the other discussion, I removed the code
for supporting physical replication but tried to keep the extensibility.
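
For illustration, a logical START_REPLICATION command using the new clause
might look like this (hypothetical syntax following Horiguchi-san's
suggestion; the exact spelling and option placement are defined by the
attached patch, not by any released server):

    START_REPLICATION SLOT "sub_slot" LOGICAL 0/0
        SHUTDOWN_MODE immediate
        (proto_version '3', publication_names '"mypub"')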

> ==
> If we go with the current design, I think it is better to share the
> option list rule between the logical and physical START_REPLICATION
> commands.
> 
> I'm not sure I like the option syntax
> "exit_before_confirming=". I imagine that other options may
> come in the future. Thus, how about "walsender_shutdown_mode=",
> where the mode is one of "wait_flush" (default) and "immediate"?

Seems better; I changed it from a boolean to an enumeration.

> +typedef struct
> +{
> +	bool	exit_before_confirming;
> +} WalSndData;
> 
> Data doesn't seem to represent the variable. Why not WalSndOptions?

This is inspired by PGOutputData, but I prefer your idea. Fixed.

> - !equal(newsub->publications, MySubscription->publications))
> + !equal(newsub->publications, MySubscription->publications) ||
> + (newsub->minapplydelay > 0 &&
> MySubscription->minapplydelay == 0) ||
> + (newsub->minapplydelay == 0 &&
> MySubscription->minapplydelay > 0))
> 
>  I slightly prefer the following expression (Others may disagree:p):
> 
>   ((newsub->minapplydelay == 0) != (MySubscription->minapplydelay == 0))

I think the conditions for the same parameter should be aligned on one line,
so your suggestion seems better. Fixed.
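
Assembled from the hunks above, the resulting check reads roughly like this
(a sketch only; the restart path is abbreviated):

    /*
     * minapplydelay affects the START_REPLICATION option
     * exit_before_confirming, so a toggle between zero and non-zero
     * must restart the apply worker.
     */
    if (!equal(newsub->publications, MySubscription->publications) ||
        ((newsub->minapplydelay == 0) != (MySubscription->minapplydelay == 0)))
        proc_exit(0);   /* worker restarts and reconnects with new options */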

> 
>  And I think we need a comment for the term. For example,
> 
>   /* minapplydelay affects START_REPLICATION option exit_before_confirming
> */

Added just above the condition.

> + * Reads all entrly of the list and consume if needed.
> s/entrly/entries/ ?
> ...

This part is no longer needed.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866D3EC780D251953BDE7FAF5D89%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v4-0001-Time-delayed-logical-replication-subscriber.patch
Description: v4-0001-Time-delayed-logical-replication-subscriber.patch


v4-0002-Extend-START_REPLICATION-command-to-accept-walsen.patch
Description:  v4-0002-Extend-START_REPLICATION-command-to-accept-walsen.patch


Re: Logical replication timeout problem

2023-02-08 Thread Amit Kapila
On Wed, Feb 8, 2023 at 10:57 AM Andres Freund  wrote:
>
> On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > I am planning to push this to HEAD sometime next week (by Wednesday).
> > To backpatch this, we need to fix it in some non-standard way, like
> > without introducing a callback which I am not sure is a good idea. If
> > some other committers vote to get this in back branches with that or
> > some different idea that can be backpatched then we can do that
> > separately as well. I don't see this as a must-fix in back branches
> > because we have a workaround (increase timeout) or users can use the
> > streaming option (for >=14).
>
> I just saw the commit go in, and a quick scan over it makes me think neither
> this commit, nor f95d53eded, which unfortunately was already backpatched, is
> the right direction. The wrong direction likely started quite a bit earlier,
> with 024711bb544.
>
> It feels quite fundamentally wrong that basically every output plugin needs to
> call a special function in nearly every callback.
>
> In 024711bb544 there was just one call to OutputPluginUpdateProgress() in
> pgoutput.c. Quite tellingly, it just updated pgoutput, without touching
> test_decoding.
>
> Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.
>

I think the original commit 024711bb544 forgot to call it in
test_decoding, and the later commits followed the same pattern and
also missed updating test_decoding.

>
> This makes no sense.  There's lots of output plugins out there. There's an
> increasing number of callbacks.  This isn't a maintainable path forward.
>
>
> If we want to call something to maintain state, it has to be happening from
> central infrastructure.
>
>
> It feels quite odd architecturally that WalSndUpdateProgress() ends up
> flushing out writes - that's far far from obvious.
>
> I don't think:
> /*
>  * Wait until there is no pending write. Also process replies from the other
>  * side and check timeouts during that.
>  */
> static void
> ProcessPendingWrites(void)
>
> Is really a good name. What are we processing?
>

It is for sending a keepalive message (if required). That is
normally done when we have skipped processing a transaction, to ensure sync
replication is not delayed. Extending the WalSndUpdateProgress() interface
has been discussed previously [1][2]. Basically, as explained
by Craig [2], this has to be done from the plugin, as it can do filtering,
or there could be other reasons why the output plugin skips all
changes. We used the same interface for sending a keepalive message
when we processed a lot of (DDL) changes without sending anything to the
plugin.

[1] - 
https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
[2] - 
https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com
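
To make that flow concrete, here is a minimal sketch of the pattern in a
pgoutput-style commit callback (types as in pgoutput.c; the signature of
OutputPluginUpdateProgress has varied across branches, so treat this as
illustrative only):

    static void
    sketch_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                      XLogRecPtr commit_lsn)
    {
        PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
        bool        sent_begin_txn = txndata->sent_begin_txn;

        /*
         * Only the plugin knows whether every change in this transaction was
         * filtered out.  Reporting progress here lets the walsender send a
         * keepalive, so synchronous replication is not held up by skipped
         * transactions.
         */
        OutputPluginUpdateProgress(ctx, !sent_begin_txn);

        if (!sent_begin_txn)
            return;             /* nothing was sent, nothing to commit */

        /* ... otherwise write the COMMIT message as usual ... */
    }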

-- 
With Regards,
Amit Kapila.




Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID

2023-02-08 Thread Pavel Stehule
hi



On Wed, Feb 8, 2023 at 7:33, Julien Rouhaud wrote:

> On Tue, Feb 07, 2023 at 08:48:22PM +0100, Pavel Stehule wrote:
> >
> > I have a question about the possibility of simply getting the name of the
> > currently executed function. The reason for this request is to simplify
> > the writing of debug messages.
> >
> > GET DIAGNOSTICS _oid = PG_ROUTINE_OID;
> > RAISE NOTICE '... % ... %', _oid, _oid::regproc::text;
> >
> > The advantage of this dynamic access to the function name is that the
> > value is always valid, not sensitive to renaming or moving between schemas.
> >
> > I am able to separate the name from the context, but it is hard to write
> > this separation really robustly. It would be very easy to enhance the GET
> > DIAGNOSTICS statement to return the oid of the currently executed function.
> >
> > Do you think it could be a useful feature?
>
> +1, it would have been quite handy in a few of my projects.
>

It can look like this:

create or replace function foo(a int)
returns int as $$
declare s text; n text; o oid;
begin
  get diagnostics s = pg_current_routine_signature,
                  n = pg_current_routine_name,
                  o = pg_current_routine_oid;
  raise notice 'sign:%,  name:%,  oid:%', s, n, o;
  return a;
end;
$$ language plpgsql;
CREATE FUNCTION
(2023-02-08 09:04:03) postgres=# select foo(10);
NOTICE:  sign:foo(integer),  name:foo,  oid:16392
┌─────┐
│ foo │
╞═════╡
│  10 │
└─────┘
(1 row)

The name pg_routine_oid could be confusing, because it is not clear whether
it refers to the currently executed routine or to the routine at the top of
the exception context.

Regards

Pavel
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 70a002a0f6..908022a88a 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2475,6 +2475,28 @@ exec_stmt_getdiag(PLpgSQL_execstate *estate, PLpgSQL_stmt_getdiag *stmt)
 }
 break;
 
+			case PLPGSQL_GETDIAG_CURRENT_ROUTINE_NAME:
+{
+	char   *funcname;
+
+	funcname = get_func_name(estate->func->fn_oid);
+	exec_assign_c_string(estate, var, funcname);
+	if (funcname)
+		pfree(funcname);
+}
+break;
+
+			case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID:
+exec_assign_value(estate, var,
+  ObjectIdGetDatum(estate->func->fn_oid),
+  false, OIDOID, -1);
+  break;
+
+			case PLPGSQL_GETDIAG_CURRENT_ROUTINE_SIGNATURE:
+exec_assign_c_string(estate, var,
+	 estate->func->fn_signature);
+break;
+
 			default:
 elog(ERROR, "unrecognized diagnostic item kind: %d",
 	 diag_item->kind);
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index 5a6eadccd5..bdf02f36cc 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -325,6 +325,12 @@ plpgsql_getdiag_kindname(PLpgSQL_getdiag_kind kind)
 			return "TABLE_NAME";
 		case PLPGSQL_GETDIAG_SCHEMA_NAME:
 			return "SCHEMA_NAME";
+		case PLPGSQL_GETDIAG_CURRENT_ROUTINE_NAME:
+			return "PG_CURRENT_ROUTINE_NAME";
+		case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID:
+			return "PG_CURRENT_ROUTINE_OID";
+		case PLPGSQL_GETDIAG_CURRENT_ROUTINE_SIGNATURE:
+			return "PG_CURRENT_ROUTINE_SIGNATURE";
 	}
 
 	return "unknown";
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index edeb72c380..9ea21fec3d 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -318,6 +318,9 @@ static	void			check_raise_parameters(PLpgSQL_stmt_raise *stmt);
 %token 	K_OR
 %token 	K_PERFORM
 %token 	K_PG_CONTEXT
+%token 	K_PG_CURRENT_ROUTINE_NAME
+%token 	K_PG_CURRENT_ROUTINE_OID
+%token 	K_PG_CURRENT_ROUTINE_SIGNATURE
 %token 	K_PG_DATATYPE_NAME
 %token 	K_PG_EXCEPTION_CONTEXT
 %token 	K_PG_EXCEPTION_DETAIL
@@ -1035,6 +1038,9 @@ stmt_getdiag	: K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';'
 	break;
 /* these fields are allowed in either case */
 case PLPGSQL_GETDIAG_CONTEXT:
+case PLPGSQL_GETDIAG_CURRENT_ROUTINE_NAME:
+case PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID:
+case PLPGSQL_GETDIAG_CURRENT_ROUTINE_SIGNATURE:
 	break;
 default:
 	elog(ERROR, "unrecognized diagnostic item kind: %d",
@@ -1123,6 +1129,15 @@ getdiag_item :
 		else if (tok_is_keyword(tok, &yylval,
 K_RETURNED_SQLSTATE, "returned_sqlstate"))
 			$$ = PLPGSQL_GETDIAG_RETURNED_SQLSTATE;
+		else if (tok_is_keyword(tok, &yylval,
+K_PG_CURRENT_ROUTINE_NAME, "pg_current_routine_name"))
+			$$ = PLPGSQL_GETDIAG_CURRENT_ROUTINE_NAME;
+		else if (tok_is_keyword(tok, &yylval,
+K_PG_CURRENT_ROUTINE_OID, "pg_current_routine_oid"))
+			$$ = PLPGSQL_GETDIAG_CURRENT_ROUTINE_OID;
+		else if (tok_is_keyword(tok, &yylval,
+K_PG_CURRENT_ROUTINE_SIGNATURE, "pg_current_routine_signature"))
+			$$ = PLPGSQL_GETDIAG_CURRENT_ROUTINE_SIGNATURE;
 		else
 			yyerror("unrecognized GET DIAGNOSTICS item");
 	}
@@ -2523,6 +2538,9 @@ unreserved_keyword	:
 | K_OPEN
 

Re: bitscan forward/reverse on Windows

2023-02-08 Thread John Naylor
I wrote:
> Attached is a quick-and-dirty attempt to add MSVC support for the
rightmost/leftmost-one-pos functions.
>
> 0001 adds asserts to the existing coding.
> 0002 adds MSVC support. Tests pass on CI, but it's of course possible
that there is some bug that prevents hitting the fastpath. I've mostly used
the platform specific types, so some further cleanup might be needed.

I've cleaned these up and verified on godbolt.org that they work as
intended and still pass CI. I kept the Windows types, as does other Windows
code in the tree, but used bool instead of unsigned char because it's used
like a boolean.
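
For reference, the MSVC fast path boils down to an intrinsic like this (a
minimal sketch of the idea, not the patch itself; the patch wraps the
intrinsic inside pg_leftmost_one_pos32() and, in assert builds,
cross-checks it against the portable fallback):

    #ifdef _MSC_VER
    #include <intrin.h>

    /* Position of the most significant set bit; word must be non-zero. */
    static inline int
    leftmost_one_pos32(unsigned long word)
    {
        unsigned long idx;

        (void) _BitScanReverse(&idx, word); /* returns 0 only for word == 0 */
        return (int) idx;
    }
    #endif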

0003 is separate because I'm not quite sure how detailed the comments on the
#ifdef maze should be. It could be left out.
0004 simplifies AllocSetFreeIndex() in the course of supporting MSVC. The
output is identical to HEAD in non-assert builds using gcc.

0002 through 0004 could be squashed together.

This plugs a hole in our platform-specific intrinsic support and is fairly
straightforward. Review welcome, but if there is none I intend to commit in
a week or two.

--
John Naylor
EDB: http://www.enterprisedb.com
From eb2e7f6aad9447010f34ce8d7230be754c61f6fb Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 8 Feb 2023 12:34:13 +0700
Subject: [PATCH v4 3/4] Add more comments to #else directives

WIP: squash or leave out
---
 src/include/port/pg_bitutils.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 9150789aaf..04f2fe8d3e 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -18,7 +18,7 @@
 #define HAVE_BITSCAN_FORWARD
 #define HAVE_BITSCAN_REVERSE
 
-#else
+#else			/* ! _MSC_VER */
 #if defined(HAVE__BUILTIN_CTZ)
 #define HAVE_BITSCAN_FORWARD
 #endif
@@ -70,7 +70,7 @@ pg_leftmost_one_pos32(uint32 word)
 	Assert(bitscan_result == result);
 	return bitscan_result;
 
-#else
+#else			/* ! HAVE_BITSCAN_REVERSE */
 	return result;
 #endif			/* HAVE_BITSCAN_REVERSE */
 }
@@ -119,7 +119,7 @@ pg_leftmost_one_pos64(uint64 word)
 	Assert(bitscan_result == result);
 	return bitscan_result;
 
-#else
+#else			/* ! HAVE_BITSCAN_REVERSE */
 	return result;
 #endif			/* HAVE_BITSCAN_REVERSE */
 }
@@ -165,7 +165,7 @@ pg_rightmost_one_pos32(uint32 word)
 	Assert(bitscan_result == result);
 	return bitscan_result;
 
-#else
+#else			/* ! HAVE_BITSCAN_FORWARD */
 	return result;
 #endif			/* HAVE_BITSCAN_FORWARD */
 }
@@ -217,7 +217,7 @@ pg_rightmost_one_pos64(uint64 word)
 	Assert(bitscan_result == result);
 	return bitscan_result;
 
-#else
+#else			/* ! HAVE_BITSCAN_FORWARD */
 	return result;
 #endif			/* HAVE_BITSCAN_FORWARD */
 }
-- 
2.39.1

From 4a84bb6980fa05ae3bde84179b8427799d6ee538 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 3 Feb 2023 16:54:39 +0700
Subject: [PATCH v4 4/4] Rationalize platform support in AllocSetFreeIndex

Previously, we directly tested for HAVE__BUILTIN_CLZ and copied the
internals of pg_leftmost_one_pos32(), with a special fallback that does
less work than the general fallback for that function. In the wake of
, we have MSVC support for bitscan intrinsics, and a more
general way to test for them, so just call pg_leftmost_one_pos32()
directly.

On gcc at least, there is no difference in the binary, showing that
compilers can do constant folding across the boundary of an inlined
function.
---
 src/backend/utils/mmgr/aset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 740729b5d0..d12977b7b5 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -289,7 +289,7 @@ AllocSetFreeIndex(Size size)
 		 * or equivalently
 		 *		pg_leftmost_one_pos32(size - 1) - ALLOC_MINBITS + 1
 		 *
-		 * However, rather than just calling that function, we duplicate the
+		 * However, for platforms without intrinsic support, we duplicate the
 		 * logic here, allowing an additional optimization.  It's reasonable
 		 * to assume that ALLOC_CHUNK_LIMIT fits in 16 bits, so we can unroll
 		 * the byte-at-a-time loop in pg_leftmost_one_pos32 and just handle
@@ -299,8 +299,8 @@ AllocSetFreeIndex(Size size)
 		 * much trouble.
 		 *--
 		 */
-#ifdef HAVE__BUILTIN_CLZ
-		idx = 31 - __builtin_clz((uint32) size - 1) - ALLOC_MINBITS + 1;
+#ifdef HAVE_BITSCAN_REVERSE
+		idx = pg_leftmost_one_pos32(size - 1) - ALLOC_MINBITS + 1;
 #else
 		uint32		t,
 	tsize;
-- 
2.39.1

From 3b75d2b05a1ed02525b86152168151300a827613 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 8 Feb 2023 12:05:58 +0700
Subject: [PATCH v4 2/4] Add MSVC support for bitscan reverse/forward

---
 src/include/port/pg_bitutils.h | 75 ++
 1 file changed, 67 insertions(+), 8 deletions(-)

diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index a5df4f1a35..9150789aaf 100644
--- a/src/include/port/pg_bitutils.h
++

RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-08 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.

> ==
> doc/src/sgml/glossary.sgml
> 
> 1.
> + 
> +  Replication setup that applies time-delayed copy of the data.
> +
> 
> That sentence seemed a bit strange to me.
> 
> SUGGESTION
> Replication setup that delays the application of changes by a
> specified minimum time-delay period.

Fixed.

> ==
> 
> src/backend/replication/logical/worker.c
> 
> 2. maybe_apply_delay
> 
> + if (wal_receiver_status_interval > 0 &&
> + diffms > wal_receiver_status_interval * 1000L)
> + {
> + WaitLatch(MyLatch,
> +   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> +   wal_receiver_status_interval * 1000L,
> +   WAIT_EVENT_RECOVERY_APPLY_DELAY);
> + send_feedback(last_received, true, false, true);
> + }
> 
> I felt that introducing another variable like:
> 
> long statusinterval_ms = wal_receiver_status_interval * 1000L;
> 
> would help here by doing 2 things:
> 1) The condition would be easier to read because the ms units would be the 
> same
> 2) Won't need * 1000L repeated in two places.
> 
> Only, do take care to assign this variable in the right place in this
> loop in case the configuration is changed.

Fixed. The calculation is done in two places - the first at the entrance of
the loop, and the second after SIGHUP is detected.
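
The refactored code now looks roughly like this (variable name taken from
the review comment; the surrounding loop is omitted):

    /* Recomputed at loop entry, and again after a SIGHUP reload */
    long        status_interval_ms = wal_receiver_status_interval * 1000L;

    if (status_interval_ms > 0 && diffms > status_interval_ms)
    {
        WaitLatch(MyLatch,
                  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                  status_interval_ms,
                  WAIT_EVENT_RECOVERY_APPLY_DELAY);
        send_feedback(last_received, true, false, true);
    }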

> ==
> src/test/subscription/t/001_rep_changes.pl
> 
> 3.
> +# Test time-delayed logical replication
> +#
> +# If the subscription sets min_apply_delay parameter, the logical replication
> +# worker will delay the transaction apply for min_apply_delay milliseconds. 
> We
> +# look the time duration between tuples are inserted on publisher and then
> +# changes are replicated on subscriber.
> 
> This comment and the other one appearing later in this test are both
> explaining the same test strategy. I think both comments should be
> combined into one big one up-front, like this:
> 
> SUGGESTION
> If the subscription sets min_apply_delay parameter, the logical
> replication worker will delay the transaction apply for
> min_apply_delay milliseconds. We verify this by looking at the time
> difference between a) when tuples are inserted on the publisher, and
> b) when those changes are replicated on the subscriber. Even on slow
> machines, this strategy will give predictable behavior.

Changed.

> 4.
> +my $delay = 3;
> +
> +# Set min_apply_delay parameter to 3 seconds
> +$node_subscriber->safe_psql('postgres',
> + "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay =
> '${delay}s')");
> 
> IMO that "my $delay = 3;" assignment should be *after* the comment:
> 
> e.g.
> +
> +# Set min_apply_delay parameter to 3 seconds
> +my $delay = 3;
> +$node_subscriber->safe_psql('postgres',
> + "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay =
> '${delay}s')");

Right, changed.

> 5.
> +# Make new content on publisher and check its presence in subscriber
> depending
> +# on the delay applied above. Before doing the insertion, get the
> +# current timestamp that will be used as a comparison base. Even on slow
> +# machines, this allows to have a predictable behavior when comparing the
> +# delay between data insertion moment on publisher and replay time on
> subscriber.
> 
> Most of this comment is now redundant because this was already
> explained in the big comment up-front (see #3). Only one useful
> sentence is left.
> 
> SUGGESTION
> Before doing the insertion, get the current timestamp that will be
> used as a comparison base.

Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v32-0001-Time-delayed-logical-replication-subscriber.patch
Description:  v32-0001-Time-delayed-logical-replication-subscriber.patch


Re: daitch_mokotoff module

2023-02-08 Thread Alvaro Herrera
On 2023-Jan-17, Dag Lem wrote:

> + * Daitch-Mokotoff Soundex
> + *
> + * Copyright (c) 2021 Finance Norway
> + * Author: Dag Lem 

Hmm, I don't think we accept copyright lines that aren't "PostgreSQL
Global Development Group".  Is it okay to use that, and update the year
to 2023?  (Note that answering "no" very likely means your patch is not a
candidate for inclusion.)  Also, we tend not to have "Author:" lines.

> + * Permission to use, copy, modify, and distribute this software and its
> + * documentation for any purpose, without fee, and without a written 
> agreement
> + * is hereby granted, provided that the above copyright notice and this
> + * paragraph and the following two paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
> + * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
> + * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
> + * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
> + * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
> + * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

We don't keep a separate copyright statement in the file; rather we
assume that all files are under the PostgreSQL license, which is in the
COPYRIGHT file at the top of the tree.  Changing it thus has the side
effect that these disclaimer notes refer to the University of California
rather than "the Author".  IANAL.


I think we should add SPDX markers to all the files we distribute:
/* SPDX-License-Identifier: PostgreSQL */

https://spdx.dev/ids/
https://spdx.org/licenses/PostgreSQL.html

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)




Re: Add sub-transaction overflow status in pg_stat_activity

2023-02-08 Thread Kirill Reshke
On Tue, 20 Dec 2022 at 09:23, Dilip Kumar  wrote:
>
> On Tue, Dec 20, 2022 at 2:32 AM Robert Haas  wrote:
> >
> > On Mon, Dec 19, 2022 at 3:48 PM Ted Yu  wrote:
> > > It seems the comment for `backend_subxact_overflowed` missed a word.
> > >
> > > Please see the patch.
> >
> > Committed this fix, thanks.
>
> Thanks, Robert!
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>
>


Hi hackers!

Nice patch! It seems it may be useful in cases like alerting that a subxid
overflow happened in a database.
But I'm curious what the follow-up work on this is. I think it may be
even more helpful to, for example, log the queries causing sub-transaction
overflow, or even kill the backend causing it, with GUC variables
controlling the server behaviour.
For example, in Greenplum there is the gp_subtransaction_overflow
extension and a GUC for simply logging problematic queries[1]. Can we
have something similar in PostgreSQL on the server side?

[1] 
https://github.com/greenplum-db/gpdb/blob/6X_STABLE/gpcontrib/gp_subtransaction_overflow/gp_subtransaction_overflow.c#L42
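
For what it's worth, the overflow state can already be observed from SQL; a
minimal illustration (assuming the pg_stat_get_backend_subxact() function
this work added for PostgreSQL 16, and psql's \gexec; more than 64 cached
subxids in one top-level transaction marks the backend as overflowed):

    -- Session 1: keep many XID-bearing subtransactions open at once.
    CREATE TABLE subxact_demo (i int);
    BEGIN;
    SELECT format('SAVEPOINT s%1$s; INSERT INTO subxact_demo VALUES (%1$s);', i)
    FROM generate_series(1, 70) AS i \gexec

    -- Session 2: inspect every backend's subtransaction state.
    SELECT id, s.subxact_count, s.subxact_overflowed
    FROM pg_stat_get_backend_idset() AS id,
         LATERAL pg_stat_get_backend_subxact(id) AS s;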




Re: Deadlock between logrep apply worker and tablesync worker

2023-02-08 Thread Peter Smith
On Tue, Feb 7, 2023 at 6:46 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, February 7, 2023 12:12 PM Peter Smith  
> wrote:
> > On Fri, Feb 3, 2023 at 6:58 PM houzj.f...@fujitsu.com 
> > 
> > wrote:
> > >
> > ...
> > > > Right, I think that case could be addressed by Tom's patch to some
> > > > extent but I am thinking we should also try to analyze if we can
> > > > completely avoid the need to remove origins from both processes. One
> > > > idea could be to introduce another relstate something like
> > > > PRE_SYNCDONE and set it in a separate transaction before we set the
> > > > state as SYNCDONE and remove the slot and origin in tablesync worker.
> > > > Now, if the tablesync worker errors out due to some reason during
> > > > the second transaction, it can remove the slot and origin after restart 
> > > > by
> > checking the state.
> > > > However, it would add another relstate which may not be the best way
> > > > to address this problem. Anyway, that can be accomplished as a separate
> > patch.
> > >
> > > Here is an attempt to achieve the same.
> > > Basically, the patch removes the code that drops the origin in the apply
> > > worker, and adds a new state PRE_SYNCDONE after synchronization has
> > > finished in front of apply (sublsn set), but before dropping the
> > > origin and other final cleanups. The tablesync will restart and redo
> > > the cleanup if it failed after reaching the new state. Besides, since
> > > the changes can already be applied on the table in the PRE_SYNCDONE state,
> > > I also modified the check in should_apply_changes_for_rel(). Some
> > > other conditions for the origin drop in subscription commands were
> > > adjusted in this patch as well.
> > >
> >
> > Here are some review comments for the 0001 patch
> >
> > ==
> > General Comment
> >
> > 0.
> > The idea of using the extra relstate for clean-up seems OK, but the
> > implementation of the new state in this patch appears misordered and
> > misnamed to me.
> >
> > The state name should indicate what it is doing (PRE_SYNCDONE is
> > meaningless). The patch describes in several places that this state means
> > "synchronized, but not yet cleaned up" therefore IMO it means the SYNCDONE
> > state should be *before* this new state. And since this new state is for
> > "cleanup" then let's call it something like that.
> >
> > To summarize, I don’t think the meaning of SYNCDONE should be touched.
> > SYNCDONE means the synchronization is done, same as before. And your new
> > "cleanup" state belongs directly *after* that. IMO it should be like this:
> >
> > 1. STATE_INIT
> > 2. STATE_DATASYNC
> > 3. STATE_FINISHEDCOPY
> > 4. STATE_SYNCDONE
> > 5. STATE_CLEANUP <-- new relstate
> > 6. STATE_READY
> >
> > Of course, this is going to impact almost every aspect of the patch, but I 
> > think
> > everything will be basically the same as you have it now
> > -- only all the state names and comments need to be adjusted according to 
> > the
> > above.
>
> Although I agree that CLEANUP is easier to understand, I am a bit concerned
> that the changes would be quite invasive.
>
> If we add a CLEANUP state at the end as suggested, it will change the meaning
> of the existing SYNCDONE state: before the change it means both the data sync
> and the cleanup have been done, but after the change it only means the data
> sync is over. This also means all the current C code that considers SYNCDONE
> the final state of table sync will need to be changed. Moreover, it's common
> for users to query the relation state from pg_subscription_rel to identify
> whether the table sync of a table is finished (e.g. check relstate IN
> ('r', 's')), but if we add a new state (CLEANUP) as the final state, then all
> these SQLs would need to be changed to check something like relstate IN
> ('r', 'x' (new cleanup state)).
>

IIUC, you are saying that we still want to keep the SYNCDONE state as
the last state before READY mainly because otherwise there is too much
impact on user/test SQL that is currently checking those ('s','r')
states.

OTOH, in the current 001 patch you had the SUBREL_STATE_PRE_SYNCDONE
meaning "synchronized but not yet cleaned up" (that's verbatim from
your PGDOCS). And there is C code where you are checking
SUBREL_STATE_PRE_SYNCDONE and essentially giving the state before the
SYNCDONE an equal status to the SYNCDONE (e.g.
should_apply_changes_for_rel seemed to be doing this).

It seems to be trying to have an each-way bet...

~~~

But I think there may be an easy way out of this problem:

Current HEAD
1. STATE_INIT 'i'
2. STATE_DATASYNC 'd'
3. STATE_FINISHEDCOPY 'f'
4. STATE_SYNCDONE 's'
5. STATE_READY 'r'

The patch 0001
1. STATE_INIT 'i'
2. STATE_DATASYNC 'd'
3. STATE_FINISHEDCOPY 'f'
4. STATE_PRESYNCDONE 'p' <-- new relstate
5. STATE_SYNCDONE 's'
6. STATE_READY 'r'

My previous suggestion (which you acknowledge is easier to understand,
but might cause hassles for existing SQL)
1. STATE_INIT 'i'
2. STATE_DATASYNC 'd'
3. STATE_FINISHEDCOPY 'f'
4. S

A bug with ExecCheckPermissions

2023-02-08 Thread o . tselebrovskiy

Greetings, everyone!

While working on an extension, my colleague and I found an
interesting case.

When you try to execute the following SQL statements on the master branch of
PostgreSQL:


CREATE TABLE parted_fk_naming (
id bigint NOT NULL default 1,
id_abc bigint,
CONSTRAINT dummy_constr FOREIGN KEY (id_abc)
REFERENCES parted_fk_naming (id),
PRIMARY KEY (id)
)
PARTITION BY LIST (id);

CREATE TABLE parted_fk_naming_1 (
id bigint NOT NULL default 1,
id_abc bigint,
PRIMARY KEY (id),
CONSTRAINT dummy_constr CHECK (true)
);

ALTER TABLE parted_fk_naming ATTACH PARTITION parted_fk_naming_1 FOR 
VALUES IN ('1');


seemingly nothing suspicious happens.

But if you debug the function ExecCheckPermissions and look at what is
passed to it (the contents of rangeTable and rteperminfos, to be exact),
you'll see some strange behaviour:

(
   {RANGETBLENTRY
   :alias <>
   :eref <>
   :rtekind 0
   :relid 16395
   :relkind r
   :rellockmode 1
   :tablesample <>
   :perminfoindex 0
   :lateral false
   :inh false
   :inFromCl false
   :securityQuals <>
   }
   {RANGETBLENTRY
   :alias <>
   :eref <>
   :rtekind 0
   :relid 16384
   :relkind p
   :rellockmode 1
   :tablesample <>
   :perminfoindex 0
   :lateral false
   :inh false
   :inFromCl false
   :securityQuals <>
   }
)

(
   {RTEPERMISSIONINFO
   :relid 16395
   :inh false
   :requiredPerms 2
   :checkAsUser 0
   :selectedCols (b 9)
   :insertedCols (b)
   :updatedCols (b)
   }
   {RTEPERMISSIONINFO
   :relid 16384
   :inh false
   :requiredPerms 2
   :checkAsUser 0
   :selectedCols (b 8)
   :insertedCols (b)
   :updatedCols (b)
   }
)

Both RangeTblEntries have a perminfoindex of 0, and yet each simultaneously
has an RTEPERMISSIONINFO entry!


Right now this behaviour doesn't affect anything, but in the future, should
someone want to use ExecutorCheckPerms_hook from
src/backend/executor/execMain.c, its input parameters won't correspond to
each other, since members of rangeTable will have an incorrect
perminfoindex.


To fix this, we set the FK's index to 1 and the PK's index to 2 in
src/backend/utils/adt/ri_triggers.c, so that the list passed to
ExecCheckPermissions and its hook has indexes pointing at the corresponding
rteperminfos entries. 1 and 2 are chosen because perminfoindex is 1-based
and the FK entry is passed to list_make2 first.


We are eager to hear some thoughts from the community!

Regards,

Oleg Tselebrovskii
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 29f29d664b6..7666886742a 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1399,6 +1399,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	pkrte->relid = RelationGetRelid(pk_rel);
 	pkrte->relkind = pk_rel->rd_rel->relkind;
 	pkrte->rellockmode = AccessShareLock;
+	pkrte->perminfoindex = 2;
 
 	pk_perminfo = makeNode(RTEPermissionInfo);
 	pk_perminfo->relid = RelationGetRelid(pk_rel);
@@ -1409,6 +1410,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	fkrte->relid = RelationGetRelid(fk_rel);
 	fkrte->relkind = fk_rel->rd_rel->relkind;
 	fkrte->rellockmode = AccessShareLock;
+	fkrte->perminfoindex = 1;
 
 	fk_perminfo = makeNode(RTEPermissionInfo);
 	fk_perminfo->relid = RelationGetRelid(fk_rel);


Why cann't simplify stable function in planning phase?

2023-02-08 Thread tender wang
Hi hackers,
   In evaluate_function(), I found the following code:

    /*
     * Ordinarily we are only allowed to simplify immutable functions. But for
     * purposes of estimation, we consider it okay to simplify functions that
     * are merely stable; the risk that the result might change from planning
     * time to execution time is worth taking in preference to not being able
     * to estimate the value at all.
     */
    if (funcform->provolatile == PROVOLATILE_IMMUTABLE)
        /* okay */ ;
    else if (context->estimate && funcform->provolatile == PROVOLATILE_STABLE)
        /* okay */ ;
    else
        return NULL;

The code says that a stable function cannot be simplified here (i.e., in the
planning phase).
I want to know the reason why stable functions cannot be simplified in the
planning phase.
Could someone show me an example where a query would produce incorrect
results if a stable function were simplified during planning?

With kindest regards, tender wang


PostgreSQL 16 Dev apt-based Linux unable to install

2023-02-08 Thread André Verwijs


PostgreSQL 16 Dev on apt-based Linux is unable to install.
Make sure all dependencies are resolved, like libpq5 (16 or higher), for
testing ...


" postgresql-client-16 : Prerequisites: libpq5 (>= 16~~devel) but 
15.1-1.pgdg+1+b1 will be installed "






RE: Exit walsender before confirming remote flush in logical replication

2023-02-08 Thread Hayato Kuroda (Fujitsu)
> 
> Dear Horiguchi-san,
> 
> Thank you for checking the patch! PSA new version.

PSA a rebased patch that supports the updated time-delayed patch[1].

[1]: 
https://www.postgresql.org/message-id/tyapr01mb5866c11daf8ab04f3cc181d3f5...@tyapr01mb5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v5-0001-Time-delayed-logical-replication-subscriber.patch
Description: v5-0001-Time-delayed-logical-replication-subscriber.patch


v5-0002-Extend-START_REPLICATION-command-to-accept-walsen.patch
Description:  v5-0002-Extend-START_REPLICATION-command-to-accept-walsen.patch


Re: pglz compression performance, take two

2023-02-08 Thread Tomas Vondra



On 2/7/23 21:18, Andres Freund wrote:
> Hi,
> 
> On 2023-02-05 10:36:39 -0800, Andrey Borodin wrote:
>> On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin  wrote:
>>>
>>> Hello! Please find attached v8.
>>
>> I got some interesting feedback from some patch users.
>> There was an oversight that frequently yielded results that are 1, 2 or
>> 3 bytes longer than expected.
>> Looking closer, I found that the correctness of the last 3-byte tail is
>> checked in two places. PFA a fix for this. Previously compressed data
>> was correct, however in some cases a few bytes longer than the result of
>> the current pglz implementation.
> 
> This version fails on cfbot, due to address sanitizer:
> 
> https://cirrus-ci.com/task/4921632586727424
> https://api.cirrus-ci.com/v1/artifact/task/4921632586727424/log/src/test/regress/log/initdb.log
> 
> 
> performing post-bootstrap initialization ... 
> =
> ==15991==ERROR: AddressSanitizer: heap-buffer-overflow on address 
> 0x61e02ee0 at pc 0x558e1b847b16 bp 0x7ffd35782f70 sp 0x7ffd35782f68
> READ of size 1 at 0x61e02ee0 thread T0
> #0 0x558e1b847b15 in pglz_hist_add 
> /tmp/cirrus-ci-build/src/common/pg_lzcompress.c:310
> #1 0x558e1b847b15 in pglz_compress 
> /tmp/cirrus-ci-build/src/common/pg_lzcompress.c:680
> #2 0x558e1aa86ef0 in pglz_compress_datum 
> /tmp/cirrus-ci-build/src/backend/access/common/toast_compression.c:65
> #3 0x558e1aa87af2 in toast_compress_datum 
> /tmp/cirrus-ci-build/src/backend/access/common/toast_internals.c:68
> #4 0x558e1ac22989 in toast_tuple_try_compression 
> /tmp/cirrus-ci-build/src/backend/access/table/toast_helper.c:234
> #5 0x558e1ab6af24 in heap_toast_insert_or_update 
> /tmp/cirrus-ci-build/src/backend/access/heap/heaptoast.c:197
> #6 0x558e1ab4a2a6 in heap_update 
> /tmp/cirrus-ci-build/src/backend/access/heap/heapam.c:3533
> ...
> 

Yeah, and valgrind seems to hit the same issue (it's not labeled as
buffer overflow, but it seems to be exactly the same place):

==380682== Invalid read of size 1
==380682==at 0xBCEAAB: pglz_hist_add (pg_lzcompress.c:310)
==380682==by 0xBCF130: pglz_compress (pg_lzcompress.c:670)
==380682==by 0x4A911F: pglz_compress_datum (toast_compression.c:65)
==380682==by 0x4A97E2: toast_compress_datum (toast_internals.c:68)
==380682==by 0x54CCA4: toast_tuple_try_compression (toast_helper.c:234)
==380682==by 0x4FFC33: heap_toast_insert_or_update (heaptoast.c:197)
==380682==by 0x4ED498: heap_update (heapam.c:3624)
==380682==by 0x4EE023: simple_heap_update (heapam.c:4060)
==380682==by 0x5B1B2B: CatalogTupleUpdateWithInfo (indexing.c:329)
==380682==by 0x65C3AB: update_attstats (analyze.c:1741)
==380682==by 0x65A054: do_analyze_rel (analyze.c:602)
==380682==by 0x659405: analyze_rel (analyze.c:261)
==380682==by 0x70A162: vacuum (vacuum.c:523)
==380682==by 0x8DF8F7: autovacuum_do_vac_analyze (autovacuum.c:3155)
==380682==by 0x8DE74A: do_autovacuum (autovacuum.c:2473)
==380682==by 0x8DD49E: AutoVacWorkerMain (autovacuum.c:1716)
==380682==by 0x8DD097: StartAutoVacWorker (autovacuum.c:1494)
==380682==by 0x8EA5B2: StartAutovacuumWorker (postmaster.c:5481)
==380682==by 0x8EA10A: process_pm_pmsignal (postmaster.c:5192)
==380682==by 0x8E6121: ServerLoop (postmaster.c:1770)
==380682==  Address 0xe722c78 is 103,368 bytes inside a recently
re-allocated block of size 131,072 alloc'd
==380682==at 0x48457AB: malloc (vg_replace_malloc.c:393)
==380682==by 0xB95423: AllocSetAlloc (aset.c:929)
==380682==by 0xBA2B6C: palloc (mcxt.c:1224)
==380682==by 0x4A0962: heap_copytuple (heaptuple.c:687)
==380682==by 0x73A2BB: tts_buffer_heap_copy_heap_tuple
(execTuples.c:842)
==380682==by 0x658E42: ExecCopySlotHeapTuple (tuptable.h:464)
==380682==by 0x65B288: acquire_sample_rows (analyze.c:1261)
==380682==by 0x659E42: do_analyze_rel (analyze.c:536)
==380682==by 0x659405: analyze_rel (analyze.c:261)
==380682==by 0x70A162: vacuum (vacuum.c:523)
==380682==by 0x8DF8F7: autovacuum_do_vac_analyze (autovacuum.c:3155)
==380682==by 0x8DE74A: do_autovacuum (autovacuum.c:2473)
==380682==by 0x8DD49E: AutoVacWorkerMain (autovacuum.c:1716)
==380682==by 0x8DD097: StartAutoVacWorker (autovacuum.c:1494)
==380682==by 0x8EA5B2: StartAutovacuumWorker (postmaster.c:5481)
==380682==by 0x8EA10A: process_pm_pmsignal (postmaster.c:5192)
==380682==by 0x8E6121: ServerLoop (postmaster.c:1770)
==380682==by 0x8E5B54: PostmasterMain (postmaster.c:1463)
==380682==by 0x7A806C: main (main.c:200)
==380682==

The place allocating the buffer changes over time, but the first part
(invalid read) seems to be exactly the same.

FWIW I did run previous versions using valgrind, so this has got to be due
to some recent change.

> 
> Independent of this failure, I'm worried about the cost/benefit analysis of a
> pglz change that changes this much at once. It's quite 

Re: Why cann't simplify stable function in planning phase?

2023-02-08 Thread Laurenz Albe
On Wed, 2023-02-08 at 16:59 +0800, tender wang wrote:
>    In evaluate_function(), I find codes as shown below:
> 
>  /*
>   * Ordinarily we are only allowed to simplify immutable functions. But for
>   * purposes of estimation, we consider it okay to simplify functions that
>   * are merely stable; the risk that the result might change from planning
>   * time to execution time is worth taking in preference to not being able
>    * to estimate the value at all.
>    */
>  if (funcform->provolatile == PROVOLATILE_IMMUTABLE)
>     /* okay */ ;
>  else if (context->estimate && funcform->provolatile == PROVOLATILE_STABLE)
>      /* okay */ ;
>  else
>     return NULL;
> 
> The codes say that stable function can not be simplified here(e.g. planning 
> phase). 
> I want to know the reason why stable function can not be simplified in 
> planning phase.
> Maybe show me a example that it will be incorrect for  a query if simplify 
> stable function in 
> planning phases.

Query planning and query execution can happen at different times and using
different snapshots, so the result of a stable function can change in the
meantime.  Think of prepared statements using a generic plan.
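
A concrete illustration of the hazard (now() is a stable function returning
the transaction timestamp):

    PREPARE q AS SELECT now();

    EXECUTE q;   -- evaluated under this execution's snapshot
    -- ... later, in another transaction ...
    EXECUTE q;   -- returns a newer timestamp

    -- Had the planner folded now() into a constant while building the
    -- (generic) plan, every EXECUTE would keep returning the plan-time value.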

Yours,
Laurenz Albe




Re: Why cann't simplify stable function in planning phase?

2023-02-08 Thread Tomas Vondra



On 2/8/23 09:59, tender wang wrote:
> Hi hackers,
>    In evaluate_function(), I find codes as shown below:
> 
>  /*
>   * Ordinarily we are only allowed to simplify immutable functions. But for
>   * purposes of estimation, we consider it okay to simplify functions that
>   * are merely stable; the risk that the result might change from planning
>   * time to execution time is worth taking in preference to not being able
>    * to estimate the value at all.
>    */
> if (funcform->provolatile == PROVOLATILE_IMMUTABLE)
>     /* okay */ ;
> else if (context->estimate && funcform->provolatile == PROVOLATILE_STABLE)
>      /* okay */ ;
> else
>     return NULL;
> 
> The codes say that stable function can not be simplified here(e.g.
> planning phase). 
> I want to know the reason why stable function can not be simplified in
> planning phase.
> Maybe show me a example that it will be incorrect for  a query if
> simplify stable function in 
> planning phases.
> 

A function is "stable" only within a particular execution - if you run a
query with a stable function twice, the function is allowed to return
different results.

If you consider parse analysis / planning as a separate query, this
explains why we can't simply evaluate the function in parse analysis and
then use the value in actual execution. See analyze_requires_snapshot()
references in postgres.c.

Note: To be precise this is not about "executions" but about snapshots,
and we could probably simplify the function call with isolation levels
that maintain a single snapshot (e.g. REPEATABLE READ). But we don't.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: make_ctags: use -I option to ignore pg_node_attr macro

2023-02-08 Thread Fujii Masao




On 2023/02/08 9:49, Tatsuo Ishii wrote:

I am not sure if this is a good way to check if ctags supports "-e" or not.

+	then	ctags --version 2>&1 | grep -- -e >/dev/null

Perhaps, "--help" might be intended rather than "--version" to check
supported options?


Yeah, that was my mistake.


  Even so, ctags may have other options whose names contain
"-e" besides Emacs support, so this check could end in a wrong result.  Therefore,
it seems to me that it is better to check immediately whether etags is available
in case we don't have Exuberant-type ctags.


That makes sense.


Attached is the v2 patch.


Thanks for the patch!

With the patch, I got the following error when executing make_etags:

$ ./src/tools/make_etags
etags: invalid option -- 'e'
Try 'etags --help' for a complete list of options.
sort: No such file or directory


+	if [ $? != 0 -a -z "$ETAGS_EXISTS" ]
+	then	echo "'ctags' does not support emacs mode and etags does not exist" 1>&2; exit 1
+	fi

This code can be reached after "rm -f ./$TAGS_FILE" is executed in make_ctags.
But shouldn't we check whether the required program has already been installed,
and exit immediately if not, before modifying anything?


This is the comment for the commit d1e2a380cb. I found that make_etags with
an invalid option reported the following usage message mentioning make_ctags
(not make_etags). Isn't this confusing?

$ ./src/tools/make_etags -a
Usage: /.../make_ctags [-e][-n]


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: meson: Optionally disable installation of test modules

2023-02-08 Thread Peter Eisentraut

On 01.02.23 13:41, Nazir Bilal Yavuz wrote:

On 1/31/23 11:44, Peter Eisentraut wrote:

On 30.01.23 18:42, Andres Freund wrote:

Bilal, with a bit of help from me, worked on an alternative approach to
this. It's a lot more verbose in the initial change, but wouldn't increase
the amount of work/lines for new test modules. The main advantage is that
we wouldn't have to disable the modules by default, which I think would be
quite likely to result in plenty of people not running the tests.

likely to result in plenty people not running the tests.

Sending a link instead of attaching, in case you already registered a 
cfbot entry:

https://github.com/anarazel/postgres/commit/d1d192a860da39af9aa63d7edf643eed07c4

Probably worth adding an install-test-modules target for manual use.


Looks like a good idea.  I'm happy to proceed along that line.


I am attaching a patch with an alternative approach. Also, I updated the
commit at the link by adding regress_module, autoinc_regress and
refint_regress to test_install_libs.


If you feel that your patch is ready, please add it to the commit fest. 
I look forward to reviewing it.






Re: generic plans and "initial" pruning

2023-02-08 Thread Amit Langote
On Tue, Feb 7, 2023 at 23:38 Andres Freund  wrote:

> Hi,
>
> On 2023-02-03 22:01:09 +0900, Amit Langote wrote:
> > I've added a test case under src/modules/delay_execution by adding a
> > new ExecutorStart_hook that works similarly as
> > delay_execution_planner().  The test works by allowing a concurrent
> > session to drop an object being referenced in a cached plan being
> > initialized while the ExecutorStart_hook waits to get an advisory
> > lock.  The concurrent drop of the referenced object is detected during
> > ExecInitNode() and thus triggers replanning of the cached plan.
> >
> > I also fixed a bug in the ExplainExecuteQuery() while testing and some
> comments.
>
> The tests seem to frequently hang on freebsd:
>
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478


Thanks for the heads up.  I’ve noticed this one too, though I couldn’t find
the testrun artifacts like I could get for some other failures (on other
cirrus machines).  Has anyone else been in a similar situation?

>
> 

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: deadlock-hard flakiness

2023-02-08 Thread Thomas Munro
On Wed, Feb 8, 2023 at 2:10 PM Andres Freund  wrote:
> I'm fairly sure I've seen this failure on the buildfarm as well, but I'm too
> impatient to wait for the buildfarm database query (it really should be
> updated to use lz4 toast compression).

Failures in deadlock-hard (excluding crashes, because they make lots
of tests appear to fail bogusly) all occurred on animals that are no
longer with us:

  animal   |  last_report_time
---+-
 anole | 2022-07-05 12:31:02
 dory  | 2021-09-30 04:50:08
 fossa | 2021-10-28 01:50:29
 gharial   | 2022-07-05 22:04:23
 hyrax | 2021-05-10 15:11:53
 lousyjack | 2020-05-18 10:03:03

The failures stopped in mid '21 as far as my scraper noticed:

 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-13%2016:31:57
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017:13:44
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-06-11%2006:14:39
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-31%2006:41:25
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-23%2019:43:04
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-16%2000:36:16
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-10%2000:42:43
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-05-08%2006:34:13
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-22%2021:24:02
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fossa&dt=2021-04-08%2019:36:06
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-03-22%2013:26:03
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-03-13%2007:24:02
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-03-05%2019:39:46
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-01-08%2003:16:28
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-12-28%2011:05:15
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-11-27%2015:39:27
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2020-10-25%2023:47:21
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-09-29%2021:35:52
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-07-29%2014:34:49
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-05-15%2005:03:03
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-05-14%2013:33:03
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-05-13%2022:03:03
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-05-12%2015:03:03
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-05-11%2022:03:06
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-05-10%2022:33:02
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-01-14%2010:11:30
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-12-04%2001:38:28
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2019-11-12%2005:43:59
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-10-16%2016:43:58
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2019-07-18%2021:57:59
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-07-10%2005:59:16
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2019-07-08%2015:02:17
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-06-23%2004:17:09
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-06-12%2021:46:24
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-09%2021:58:03
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-08%2021:58:04
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-08%2005:19:17
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-07%2000:23:39
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018:58:04




Re: deadlock-hard flakiness

2023-02-08 Thread Thomas Munro
On Wed, Feb 8, 2023 at 11:34 PM Thomas Munro  wrote:
> On Wed, Feb 8, 2023 at 2:10 PM Andres Freund  wrote:
> > I'm fairly sure I've seen this failure on the buildfarm as well, but I'm too
> > impatient to wait for the buildfarm database query (it really should be
> > updated to use lz4 toast compression).
>
> Failures in deadlock-hard (excluding crashes, because they make lots
> of tests appear to fail bogusly) all occurred on animals that are no
> longer with us:

Oh, and there was this thread:

https://www.postgresql.org/message-id/flat/CA%2BhUKGJ6xtAsXFFs%2BSGcR%3DJkv0wCje_W-SUxV1%2BN451Q-5t6MA%40mail.gmail.com




Re: Performance issues with parallelism and LIMIT

2023-02-08 Thread Tomas Vondra
On 2/1/23 14:41, David Geier wrote:
> Hi hackers,
> 
> While migrating from PostgreSQL 14 to 15, we encountered the following
> performance degradation caused by commit 46846433a03dff: "shm_mq: Update
> mq_bytes_written less often", discussion in [1].
> 
> The batching can make queries with a LIMIT clause run significantly
> slower compared to PostgreSQL 14, because neither the ring buffer write
> position is updated, nor the latch to inform the leader that there's
> data available is set, before a worker's queue is 1/4th full. This can
> be seen in the number of rows produced by a parallel worker. Worst-case,
> the data set is large and all rows to answer the query appear early, but
> are not big enough to fill the queue to 1/4th (e.g. when the LIMIT and
> the tuple sizes are small). Here is an example to reproduce the problem.
> 

Yeah, this is a pretty annoying regression. We can already hit poor
behavior when matching rows are not distributed uniformly in the tables
(which is what LIMIT costing assumes), and this makes it more likely to
hit similar issues. A bit like how doing many HTTP requests makes it
more likely to hit at least one 99th-percentile outlier.

> ...
> 
> I had a quick look at the code and I started wondering if we can't
> achieve the same performance improvement without batching by e.g.:
> 
> - Only set the latch if new data is written to an empty queue.
> Otherwise, the leader should anyway keep trying to read from the queues
> without waiting for the latch, so no need to set the latch again.
> 
> - Reorganize struct shm_mq. There seems to be false sharing happening
> between at least mq_ring_size and the atomics and potentially also
> between the atomics. I'm wondering if the that's not the root cause of
> the "slow atomics" observed in [1]? I'm happy to do some profiling.
> 
> Alternatively, we could always set the latch if numberTuples in
> ExecutePlan() is reasonably low. To do so, the DestReceiver's receive()
> method would only need an additional "force flush" argument.
> 

No opinion on these options, but worth a try. Alternatively, we could
try the usual doubling approach - start with a low threshold (and set
the latch frequently), and then gradually increase it up to 1/4 of the
queue size.

That should work both for queries expecting only a few rows and those
producing a lot of data.
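
A rough sketch of that doubling idea (all names here are illustrative
assumptions, not the actual shm_mq API):

    /* Flush eagerly at first, then back off geometrically up to ring/4. */
    if (bytes_pending >= mqh->flush_threshold)
    {
        shm_mq_flush_write_pos(mqh);        /* hypothetical helper */
        SetLatch(mqh->receiver_latch);      /* wake the leader */
        mqh->flush_threshold = Min(mqh->flush_threshold * 2,
                                   mqh->ring_size / 4);
    }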

> 
> A slightly different but related problem is when some workers have
> already produced enough rows to answer the LIMIT query, but other
> workers are still running without producing any new rows. In that case
> the "already done" workers will stop running even though they haven't
> reached 1/4th of the queue size, because the for-loop in execMain.c
> bails out in the following condition:
> 
>     if (numberTuples && numberTuples == current_tuple_count)
>     break;
> 
> Subsequently, the leader will end the plan and then wait in the Gather
> node for all workers to shutdown. However, workers still running but not
> producing any new rows will never reach the following condition in
> execMain.c to check if they're supposed to stop (the shared memory queue
> dest receiver will return false on detached queues):
> 
>     /*
>  * If we are not able to send the tuple, we assume the
> destination
>  * has closed and no more tuples can be sent. If that's the
> case,
>  * end the loop.
>  */
>     if (!dest->receiveSlot(slot, dest))
>     break;
> 
> Reproduction steps for this problem are below. Here the worker getting
> the first table page will be done right away, but the query takes as
> long as it takes to scan all pages of the entire table.
> 

Ouch!

> ...
> 
> We would need something similar to CHECK_FOR_INTERRUPTS() which returns
> a NULL slot if a parallel worker is supposed to stop execution (we could
> e.g. check if the queue got detached). Or could we amend
> CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
> got detached?
> 

That sounds reasonable, but I'm not very familiar the leader-worker
communication, so no opinion on how it should be done.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




meson: Non-feature feature options

2023-02-08 Thread Peter Eisentraut
Most meson options (meson_options.txt) that enable an external 
dependency (e.g., icu, ldap) are of type 'feature'.  Most of these have 
a default value of 'auto', which means they are pulled in automatically 
if found.  Some have a default value of 'disabled' for specific reasons 
(e.g., selinux).  This is all good.


Two options deviate from this in annoying ways:

option('ssl', type : 'combo', choices : ['none', 'openssl'],
  value : 'none',
  description: 'use LIB for SSL/TLS support (openssl)')

option('uuid', type : 'combo', choices : ['none', 'bsd', 'e2fs', 'ossp'],
  value : 'none',
  description: 'build contrib/uuid-ossp using LIB')

These were moved over from configure like that.

The problem is that these features now cannot be automatically enabled,
and they behave annoyingly differently from the other feature options.


For the 'ssl' option, we have deprecated the --with-openssl option in 
configure and replaced it with --with-ssl, in anticipation of other SSL 
implementations.  None of that ever happened or is currently planned 
AFAICT.  So I suggest that we semi-revert this, so that we can make 
'openssl' an auto option in meson.


For the 'uuid' option, I'm not sure what the best way to address this
would be.  We could establish a search order of libraries that is used if
no specific one is set (similar to libreadline, libedit, in a way).  So
we'd have one option 'uuid' that is of type feature with default 'auto'
and another option, say, 'uuid-library' of type 'combo'.
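
Concretely, that could look like this in meson_options.txt (a sketch of the
proposal, not an agreed design):

    option('uuid', type : 'feature', value : 'auto',
      description: 'build contrib/uuid-ossp')

    option('uuid_library', type : 'combo',
      choices : ['auto', 'bsd', 'e2fs', 'ossp'], value : 'auto',
      description: 'UUID library to use when uuid is enabled')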


Thoughts?




Re: A bug with ExecCheckPermissions

2023-02-08 Thread Alvaro Herrera
On 2023-Feb-08, o.tselebrovs...@postgrespro.ru wrote:

> But if you debug function ExecCheckPermissions and look into what is passed
> to function (contents of rangeTable and rteperminfos to be exact),
> you'll see some strange behaviour:

> Both of RangeTableEntries have a perminfoindex of 0 and simultaneously have
> a RTEPERMISSIONINFO entry for them!

Ouch.  Yeah, that's not great.  As you say, it doesn't really affect
anything, and we know full well that these RTEs are ad-hoc
manufactured.  But since we claim that we still pass the RTEs for the
benefit of hooks, we should at least make them match.

I think we should also patch ExecCheckPermissions to use forboth(),
scanning the RTEs as it goes over the perminfos, and make sure that the
entries are consistent.
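
For illustration, that check might look like the sketch below. Note that the
assumption baked in here, namely that the two lists run in lockstep, is
exactly what the follow-up later in this thread points out does not hold
(inheritance children RTEs carry no RTEPermissionInfo):

	ListCell   *l1,
			   *l2;

	forboth(l1, rangeTable, l2, rteperminfos)
	{
		RangeTblEntry *rte = lfirst_node(RangeTblEntry, l1);
		RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l2);

		/* each RTE should point back at its own permission info */
		Assert(rte->perminfoindex == foreach_current_index(l1) + 1);
		Assert(perminfo->relid == rte->relid);
	}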

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




REASSIGN OWNED vs ALTER TABLE OWNER TO permission inconsistencies

2023-02-08 Thread Nazir Bilal Yavuz

Hi,

My colleague Adam realized that when transferring ownership, the 'REASSIGN
OWNED' command doesn't check for CREATE privilege on the table's schema for
the new owner, but the 'ALTER TABLE OWNER TO' docs state that:


To alter the owner, you must also be a direct or indirect member of the 
new owning role, and that role must have CREATE privilege on the table's 
schema. (These restrictions enforce that altering the owner doesn't do 
anything you couldn't do by dropping and recreating the table. However, 
a superuser can alter ownership of any table anyway.)


I tested that with:

# Connect as a superuser
$ psql test
test=# CREATE ROLE source_role WITH LOGIN;
CREATE ROLE
test=# CREATE ROLE target_role WITH LOGIN;
CREATE ROLE
test=# GRANT target_role to source_role;
GRANT ROLE
test=# GRANT CREATE on schema public to source_role;
GRANT

# Connect as a source_role
$ psql test -U source_role
test=> CREATE TABLE test_table();
CREATE TABLE

test=> \dt
 List of relations
 Schema |    Name    | Type  |    Owner
--------+------------+-------+-------------
 public | test_table | table | source_role
(1 row)

# Alter owner with 'ALTER TABLE OWNER TO'
test=> ALTER TABLE test_table owner to target_role;
ERROR:  permission denied for schema public

# Alter owner with 'REASSIGN OWNED'
test=> REASSIGN OWNED BY source_role to target_role;
REASSIGN OWNED

test=> \dt
 List of relations
 Schema |    Name    | Type  |    Owner
--------+------------+-------+-------------
 public | test_table | table | target_role
(1 row)

As you can see, 'ALTER TABLE OWNER TO' checked 'CREATE privilege on the 
table's schema' on target_role but 'REASSIGN OWNED' didn't check it and 
transferred ownership of the table. Is this a potential security gap or 
intentional behaviour?


Regards,
Nazir Bilal Yavuz
Microsoft





Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-02-08 Thread Nitin Jadhav
> Okay, the part to add an initialization check for GUC_NO_SHOW_ALL and
> GUC_NOT_IN_SAMPLE looked fine by me, so applied after more comment
> polishing.
>
> 0001 has been applied to clean up the existing situation.

Thanks for committing these 2 changes.


> On top of that, I have noticed an extra combination that would not
> make sense and that could be checked with the SQL queries:
> GUC_DISALLOW_IN_FILE implies GUC_NOT_IN_SAMPLE.  The opposite may not
> be true, though, as some developer GUCs are marked as
> GUC_NOT_IN_SAMPLE but they are allowed in a file.  The only exception
> to that currently is config_file.  It is just a special case whose
> value is enforced at startup and it can be passed down as an option
> switch via the postgres binary, still it seems like it would be better
> to also mark it as GUC_NOT_IN_SAMPLE?  This is done in 0002, only for
> HEAD, as that would be a new check.
>
> Remains
> 0002, that I am letting sleep to see if there's interest for it, or
> perhaps more ideas around it.

Makes sense and the patch looks good to me.

Thanks & Regards,
Nitin Jadhav

On Wed, Feb 8, 2023 at 1:29 PM Michael Paquier  wrote:
>
> On Mon, Feb 06, 2023 at 04:23:02PM +0900, Michael Paquier wrote:
> > On top of that, I have noticed an extra combination that would not
> > make sense and that could be checked with the SQL queries:
> > GUC_DISALLOW_IN_FILE implies GUC_NOT_IN_SAMPLE.  The opposite may not
> > be true, though, as some developer GUCs are marked as
> > GUC_NOT_IN_SAMPLE but they are allowed in a file.  The only exception
> > to that currently is config_file.  It is just a special case whose
> > value is enforced at startup and it can be passed down as an option
> > switch via the postgres binary, still it seems like it would be better
> > to also mark it as GUC_NOT_IN_SAMPLE?  This is done in 0002, only for
> > HEAD, as that would be a new check.
>
> 0001 has been applied to clean up the existing situation.  Remains
> 0002, that I am letting sleep to see if there's interest for it, or
> perhaps more ideas around it.
> --
> Michael




Re: A bug with ExecCheckPermissions

2023-02-08 Thread Amit Langote
On Wed, Feb 8, 2023 at 16:19 Alvaro Herrera  wrote:

> On 2023-Feb-08, o.tselebrovs...@postgrespro.ru wrote:
>
> > But if you debug the function ExecCheckPermissions and look into what is
> > passed to the function (contents of rangeTable and rteperminfos, to be
> > exact), you'll see some strange behaviour:
>
> > Both of the RangeTableEntries have a perminfoindex of 0 and
> > simultaneously have an RTEPERMISSIONINFO entry for them!
>
> Ouch.  Yeah, that's not great.  As you say, it doesn't really affect
> anything, and we know full well that these RTEs are ad-hoc
> manufactured.  But as we claim that we still pass the RTEs for the
> benefit of hooks, then we should at least make them match.


+1.   We don’t have anything in this (core) code path that would try to use
perminfoindex for these RTEs, but there might well be in the future.

I think we should also patch ExecCheckPermissions to use forboth(),
> scanning the RTEs as it goes over the perminfos, and make sure that the
> entries are consistent.


Hmm, we can’t use forboth here, because not all RTEs have a corresponding
RTEPermissionInfo (inheritance children RTEs, for example).  Also, it
doesn’t make much sense to reinstate the original loop over the range table and
fetch the RTEPermissionInfo for the RTEs with non-0 perminfoindex, because
the main goal of the patch was to make ExecCheckPermissions() independent
of range table length.

> --
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: make_ctags: use -I option to ignore pg_node_attr macro

2023-02-08 Thread Tatsuo Ishii
>> Attached is the v2 patch.
> 
> Thanks for the patch!
> 
> With the patch, I got the following error when executing make_etags..
> 
> $ ./src/tools/make_etags
> etags: invalid option -- 'e'
>   Try 'etags --help' for a complete list of options.
> sort: No such file or directory

Oops. Thank you for pointing it out. BTW, just out of curiosity, do
you have etags on your Mac? Mine doesn't have etags. That's why I
missed the error.

> + if [ $? != 0 -a -z "$ETAGS_EXISTS" ]
> + then echo "'ctags' does not support emacs mode and etags does not
> exist" 1>&2; exit 1
> + fi
> 
> This code can be reached after "rm -f ./$TAGS_FILE" is executed in
> make_ctags. But we should check whether the required program has already
> been installed, and exit immediately if not, before modifying anything?

Agreed.

> This is a comment on commit d1e2a380cb. I found that make_etags with
> an invalid option reported the following usage message mentioning
> make_ctags (not make_etags). Isn't this confusing?
> 
> $ ./src/tools/make_etags -a
> Usage: /.../make_ctags [-e][-n]

That's hard to fix without some code duplication. We decided that
make_etags is not a symlink to make_ctags, but rather execs make_ctags. Of
course we could let make_etags perform the same option check, but I
doubt it's worth the trouble.

Anyway, attached is the v3 patch.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 102881667b..38f4d8e2d5 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -12,12 +12,15 @@ fi
 MODE=
 NO_SYMLINK=
 TAGS_FILE="tags"
+ETAGS_EXISTS=
+PROG=ctags
 
 while [ $# -gt 0 ]
 do
 	if [ $1 = "-e" ]
 	then	MODE="-e"
 		TAGS_FILE="TAGS"
+		command -v etags >/dev/null && ETAGS_EXISTS="Y"
 	elif [ $1 = "-n" ]
 	then	NO_SYMLINK="Y"
 	else
@@ -27,15 +30,33 @@ do
 	shift
 done
 
-command -v ctags >/dev/null || \
+if test -z "$MODE"
+then	(command -v ctags >/dev/null) || \
 	{ echo "'ctags' program not found" 1>&2; exit 1; }
+fi
 
 trap "ret=$?; rm -rf /tmp/$$; exit $ret" 0 1 2 3 15
-rm -f ./$TAGS_FILE
 
 IS_EXUBERANT=""
 ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"
 
+# ctags is not exuberant and emacs mode is specified
+if [ -z "$IS_EXUBERANT" -a -n "$MODE" ]
+then
+	if [ -n "$ETAGS_EXISTS" ]
+	then
+		# etags exists. Use it.
+		PROG=etags
+		MODE=
+	else	ctags --help 2>&1 | grep -- -e >/dev/null
+		# Note that "ctags --help" does not always work; certain ctags builds do not have the option at all.
+		# In that case we assume that ctags does not support emacs mode.
+		if [ $? != 0 -a -z "$ETAGS_EXISTS" ]
+		then	echo "'ctags' does not support emacs mode and etags does not exist" 1>&2; exit 1
+		fi
+	fi
+fi
+
 # List of kinds supported by Exuberant Ctags 5.8
 # generated by ctags --list-kinds
 # --c-kinds was called --c-types before 2003
@@ -65,11 +86,19 @@ then	IGNORE_IDENTIFIES="-I pg_node_attr+"
 else	IGNORE_IDENTIFIES=
 fi
 
+if [ $PROG = "ctags" ]
+then	TAGS_OPT="-a -f"
+else	TAGS_OPT="-a -o"
+	FLAGS=
+fi
+
+rm -f ./$TAGS_FILE
+
 # this is outputting the tags into the file 'tags', and appending
 find `pwd`/ \( -name tmp_install -prune -o -name tmp_check -prune \) \
 	-o \( -name "*.[chly]" -o -iname "*makefile*" -o -name "*.mk" -o -name "*.in" \
 	-o -name "*.sql" -o -name "*.p[lm]" \) -type f -print |
-	xargs ctags $MODE -a -f $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"
+	xargs $PROG $MODE $TAGS_OPT $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"
 
 # Exuberant tags has a header that we cannot sort in with the other entries
 # so we skip the sort step


Re: meson: Non-feature feature options

2023-02-08 Thread Nazir Bilal Yavuz

Hi,


On 2/8/23 13:45, Peter Eisentraut wrote:


The problem is that these features now cannot be automatically enabled 
and behave annoyingly different from other feature options.


Agreed.


For the 'ssl' option, we have deprecated the --with-openssl option in 
configure and replaced it with --with-ssl, in anticipation of other 
SSL implementations.  None of that ever happened or is currently 
planned AFAICT.  So I suggest that we semi-revert this, so that we can 
make 'openssl' an auto option in meson.


+1


For the 'uuid' option, I'm not sure what the best way to address this 
would be.  We could establish a search order of libraries that is used if 
no specific one is set (similar to libreadline, libedit, in a way).  
So we'd have one option 'uuid' that is of type feature with default 
'auto' and another option, say, 'uuid-library' of type 'combo'.




Your suggestion looks good, and TCL already has a similar implementation 
to what you suggested:


option('pltcl', type : 'feature', value: 'auto',
  description: 'build with TCL support')

option('tcl_version', type : 'string', value : 'tcl',
  description: 'specify TCL version')


Regards,
Nazir Bilal Yavuz
Microsoft





Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2023-02-08 Thread Alvaro Herrera
On 2022-Sep-30, Yugo NAGATA wrote:

> Well, I still don't understand why we need to prepare only "the
> commands within a pipeline" before starting the pipeline.  In the current
> behavior,  the entire script is prepared in advance just before executing
> the first SQL command in the script, instead of preparing each command
> one by one. This patch also prepare the entire script in advance, so
> there is no behavioural change in this sense.
> 
> However, there are a few behavioural changes. One is that the preparation
> is not counted in the command performance statistics as Fabien mentioned.
> Another is that all meta-commands including \shell and \sleep etc. are
> executed before the preparation.
> 
> To reduce impact of these changes, I updated the patch to prepare the
> commands just before executing the first SQL command or \startpipeline
> meta-command instead of at the beginning of the script. 

I propose instead the following: each command is prepared just before
it's executed, as previously, and if we see a \startpipeline, then we
prepare all commands starting with the one just after, and until the
\endpipeline.

I didn't test additional cases other than the one you submitted.

Testing this I noticed that pg_log_debug et al don't support
multithreading very well -- the lines are interspersed.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2023-02-08 Thread Alvaro Herrera
On 2023-Feb-08, Alvaro Herrera wrote:

> I propose instead the following: each command is prepared just before
> it's executed, as previously, and if we see a \startpipeline, then we
> prepare all commands starting with the one just after, and until the
> \endpipeline.

Here's the patch.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
>From 6d8938009b246463efe4104f5312e37b28b55235 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 8 Feb 2023 13:01:47 +0100
Subject: [PATCH v7] Prepare commands in pipelines in advance

---
 src/bin/pgbench/pgbench.c| 147 +--
 src/bin/pgbench/t/001_pgbench_with_server.pl |  17 +++
 2 files changed, 117 insertions(+), 47 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 508ed218e8..a66fd51f92 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -628,7 +628,8 @@ typedef struct
 	pg_time_usec_t txn_begin;	/* used for measuring schedule lag times */
 	pg_time_usec_t stmt_begin;	/* used for measuring statement latencies */
 
-	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
+	/* whether client prepared each command of each script */
+	bool	  **prepared;
 
 	/*
 	 * For processing failures and repeating transactions with serialization
@@ -739,6 +740,7 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  * aset			do gset on all possible queries of a combined query (\;).
  * expr			Parsed expression, if needed.
  * stats		Time spent in this command.
+ * prepname		The name that this command is prepared under, in prepare mode
  * retries		Number of retries after a serialization or deadlock error in the
  *current command.
  * failures		Number of errors in the current command that were not retried.
@@ -754,6 +756,7 @@ typedef struct Command
 	char	   *varprefix;
 	PgBenchExpr *expr;
 	SimpleStats stats;
+	char	   *prepname;
 	int64		retries;
 	int64		failures;
 } Command;
@@ -3006,13 +3009,6 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
 	return true;
 }
 
-#define MAX_PREPARE_NAME		32
-static void
-preparedStatementName(char *buffer, int file, int state)
-{
-	sprintf(buffer, "P%d_%d", file, state);
-}
-
 /*
  * Report the abortion of the client when processing SQL commands.
  */
@@ -3053,6 +3049,87 @@ chooseScript(TState *thread)
 	return i - 1;
 }
 
+/*
+ * Prepare the SQL command from st->use_file at command_num.  The name of the
+ * new prepared statement is stored in command->prepname.
+ */
+static void
+prepareCommand(CState *st, int command_num)
+{
+	Command*command = sql_script[st->use_file].commands[command_num];
+	static int	prepnr = 0;
+
+	/* No prepare for non-SQL commands */
+	if (command->type != SQL_COMMAND)
+		return;
+
+	if (!st->prepared)
+	{
+		st->prepared = pg_malloc(sizeof(bool *) * num_scripts);
+		for (int i = 0; i < num_scripts; i++)
+		{
+			ParsedScript *script = &sql_script[i];
+			int			numcmds;
+
+			for (numcmds = 0; script->commands[numcmds] != NULL; numcmds++)
+;
+			st->prepared[i] = pg_malloc0(numcmds);
+		}
+	}
+
+	if (!st->prepared[st->use_file][command_num])
+	{
+		PGresult   *res;
+
+		if (!command->prepname)
+			command->prepname = psprintf("P%d_%d", st->use_file, prepnr++);
+		pg_log_debug("client %d preparing %s", st->id, command->prepname);
+		res = PQprepare(st->con, command->prepname,
+		command->argv[0], command->argc - 1, NULL);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pg_log_error("%s", PQerrorMessage(st->con));
+		PQclear(res);
+		st->prepared[st->use_file][command_num] = true;
+	}
+}
+
+/*
+ * Prepare all the commands in the script that come after the \startpipeline
+ * that's at position st->command, and the first \endpipeline we find.
+ *
+ * (This sets the ->prepared flag for each relevant command, but doesn't move
+ * the st->command counter)
+ */
+static void
+prepareCommandsInPipeline(CState *st)
+{
+	int			j;
+	Command   **commands = sql_script[st->use_file].commands;
+
+	Assert(commands[st->command]->type == META_COMMAND &&
+		   commands[st->command]->meta == META_STARTPIPELINE);
+
+	/*
+	 * We set the 'prepared' flag on the \startpipeline itself to flag that we
+	 * don't need to do this next time, even though we don't actually prepare
+	 * it.
+	 */
+	if (st->prepared &&
+		st->prepared[st->use_file][st->command])
+		return;
+
+	for (j = st->command + 1; commands[j] != NULL; j++)
+	{
+		if (commands[j]->type == META_COMMAND &&
+			commands[j]->meta == META_ENDPIPELINE)
+			break;
+
+		prepareCommand(st, j);
+	}
+
+	st->prepared[st->use_file][st->command] = true;
+}
+
 /* Send a SQL command, using the chosen querymode */
 static bool
 sendCommand(CState *st, Command *command)
@@ -3083,49 +3160,14 @@ sendCommand(CState *st, Command *command)
 	}
 	else if (querymode == QUERY_PREPARED)
 	{
-		char		name[MAX_PREPARE_NAME];
 		const char *params[MAX_ARGS];
 
-		if (!st->prepared[st->use_file])
-	

Re: OpenSSL 3.0.0 vs old branches

2023-02-08 Thread Andrew Dunstan


On 2023-02-07 Tu 23:37, Tom Lane wrote:

Michael Paquier  writes:

On Tue, Feb 07, 2023 at 01:28:26PM -0500, Tom Lane wrote:

I think Peter's misremembering the history, and OpenSSL 3 *is*
supported in these branches.  There could be an argument for
not back-patching f0d2c65f17 on the grounds that pre-1.1.1 is
also supported there.  On the whole though, it seems more useful
today for that test to pass with 3.x than for it to pass with 0.9.8.
And I can't see investing effort to make it do both (but if Peter
wants to, I won't stand in the way).

Cutting support for 0.9.8 in oldest branches would be a very risky
move, but as you say, if that only involves a failure in the SSL
tests while still allowing anything we have to work, fine by me to
live with that.

Question: is anybody around here still testing with 0.9.8 (or 1.0.x)
at all?  The systems I had that had that version on them are dead.



In the last 30 days, only the following buildfarm animals have reported 
running the ssl checks on the relevant branches:


 crake
 eelpout
 fairywren
 gokiburi
 hachi
 longfin

I don't think any of these run openssl <= 1.0.x. If we want to preserve 
testability for those very old versions we should actually be doing some 
testing. Or we could just move on and backpatch this as I've suggested. 
I'll be pretty surprised if we get a single complaint.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: run pgindent on a regular basis / scripted manner

2023-02-08 Thread Andrew Dunstan


On 2023-02-07 Tu 12:21, Jelte Fennema wrote:

On Mon, Feb 6, 2023 at 10:21 AM Andrew Dunstan  wrote:

Here's a quick patch for 1 and 3. Would also need to adjust the docco.



This time with patch.

When supplying the --commit flag it still formats all files for me. I
was able to fix that by replacing:
# no non-option arguments given. so do everything in the current directory
$code_base ||= '.' unless @ARGV;

with:
# no files, dirs or commits given. so do everything in the current directory
$code_base ||= '.' unless @ARGV || @commits;



Yeah, thanks for testing. Here's a new patch with that change and the 
comment adjusted.





Does the code-base flag still make sense if you can simply pass a
directory as regular args now?



Probably not. I'll look into removing it.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 56640e576a..34fb7d604d 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -23,12 +23,14 @@ my $devnull = File::Spec->devnull;
 
 my ($typedefs_file, $typedef_str, $code_base,
 	@excludes,  $indent,  $build,
-	$show_diff, $silent_diff, $help);
+	$show_diff, $silent_diff, $help,
+	@commits,);
 
 $help = 0;
 
 my %options = (
 	"help"   => \$help,
+	"commit=s"   => \@commits,
 	"typedefs=s" => \$typedefs_file,
 	"list-of-typedefs=s" => \$typedef_str,
 	"code-base=s"=> \$code_base,
@@ -44,6 +46,9 @@ usage() if $help;
 usage("Cannot have both --silent-diff and --show-diff")
   if $silent_diff && $show_diff;
 
+usage("Cannot use --commit with --code-base or command line file list")
+  if (@commits && ($code_base || @ARGV));
+
 run_build($code_base) if ($build);
 
 # command line option wins, then environment (which is how --build sets it) ,
@@ -53,8 +58,9 @@ $typedefs_file ||= $ENV{PGTYPEDEFS};
 # build mode sets PGINDENT
 $indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent";
 
-# no non-option arguments given. so do everything in the current directory
-$code_base ||= '.' unless @ARGV;
+# if no non-option arguments or commits are given, default to looking in the
+# current directory
+$code_base ||= '.' unless (@ARGV || @commits);
 
 my $sourcedir = locate_sourcedir();
 
@@ -388,6 +394,7 @@ Usage:
 pgindent [OPTION]... [FILE]...
 Options:
 	--help  show this message and quit
+--commit=gitref use files modified by the named commit
 	--typedefs=FILE file containing a list of typedefs
 	--list-of-typedefs=STR  string containing typedefs, space separated
 	--code-base=DIR path to the base of PostgreSQL source code
@@ -396,7 +403,7 @@ Options:
 	--build build the pg_bsd_indent program
 	--show-diff show the changes that would be made
 	--silent-diff   exit with status 2 if any changes would be made
-The --excludes option can be given more than once.
+The --excludes and --commit options can be given more than once.
 EOF
 	if ($help)
 	{
@@ -412,27 +419,38 @@ EOF
 
 # main
 
-# get the list of files under code base, if it's set
-File::Find::find(
-	{
-		wanted => sub {
-			my ($dev, $ino, $mode, $nlink, $uid, $gid);
-			(($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))
-			  && -f _
-			  && /^.*\.[ch]\z/s
-			  && push(@files, $File::Find::name);
-		}
-	},
-	$code_base) if $code_base;
-
 $filtered_typedefs_fh = load_typedefs();
 
 check_indent();
 
-# any non-option arguments are files to be processed
-push(@files, @ARGV);
+build_clean($code_base) if $build;
 
-# the exclude list applies to command line arguments as well as found files
+my $wanted = sub
+{
+	my ($dev, $ino, $mode, $nlink, $uid, $gid);
+	(($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))
+	  && -f _
+	  && /^.*\.[ch]\z/s
+	  && push(@files, $File::Find::name);
+};
+
+# get the list of files under code base, if it's set
+File::Find::find({wanted => $wanted }, $code_base) if $code_base;
+
+# any non-option arguments are files or directories to be processed
+File::Find::find({wanted => $wanted}, @ARGV) if @ARGV;
+
+# process named commits by comparing each with their immediate ancestor
+foreach my $commit (@commits)
+{
+	my $prev="$commit~";
+	my @affected=`git diff-tree --no-commit-id --name-only -r $commit $prev`;
+	die "git error" if $?;
+	chomp(@affected);
+	push(@files,@affected);
+}
+
+# remove excluded files from the file list
 process_exclude();
 
 foreach my $source_filename (@files)
@@ -481,6 +499,4 @@ foreach my $source_filename (@files)
 	}
 }
 
-build_clean($code_base) if $build;
-
 exit 0;


Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-02-08 Thread Aleksander Alekseev
Hi,

> To me it's a pretty fundamental violation of how heap visibility works.

I don't think this has much to do with heap visibility. It's true that
generally a command doesn't see its own tuples. This is done in order
to avoid the Halloween problem, which however can't happen in this
particular case.

Other than that, the heap doesn't care about visibility; it merely
stores the tuples. The visibility is determined by xmin/xmax, the
isolation level, etc.

It's true that the patch changes visibility rules in one very
particular edge case. This alone is arguably not a good enough reason
to reject a patch.

> Given that we skip the update in "UPDATE", your argument doesn't hold much
> water.

Peter made this argument above too, and I will give the same answer.
There is no reason why two completely different SQL statements should
behave the same.

> > That's a reasonable concern, however I was unable to break unique
> > constraints or triggers so far:
>
> I think you'd have to do a careful analysis of a lot of code for that to hold
> any water.

Alternatively we could work smarter, not harder, and let the hardware
find the bugs for us. Writing tests is much simpler and more bullet-proof
than analyzing the code.

Again, to clarify, I'm merely playing the role of Devil's advocate
here. I'm not saying that the patch should necessarily be accepted,
nor am I 100% sure that it has any undiscovered bugs. However the
arguments against received so far don't strike me personally as being
particularly convincing.

As an example, one could argue that there are applications that
*expect* to get an ERROR in the case of self-conflicting inserts. And
by changing this behavior we will break these applications. If the
majority believes that we seriously care about this it would be a good
enough reason to withdraw the patch.

-- 
Best regards,
Aleksander Alekseev




Re: daitch_mokotoff module

2023-02-08 Thread Dag Lem
Tomas Vondra  writes:

> On 2/7/23 18:08, Paul Ramsey wrote:
>> 
>> 
>>> On Feb 7, 2023, at 6:47 AM, Dag Lem  wrote:
>>>
>>> I just went by to check the status of the patch, and I noticed that
>>> you've added yourself as reviewer earlier - great!
>>>
>>> Please tell me if there is anything I can do to help bring this across
>>> the finish line.
>> 
>> Honestly, I had set it to Ready for Committer, but then I went to
>> run regression one more time and my regression blew up. I found I
>> couldn't enable the UTF tests without things failing. And I don't
>> blame you! I think my installation is probably out-of-alignment in
>> some way, but I didn't want to flip the Ready flag without having
>> run everything through to completion, so I flipped it back. Also,
>> are the UTF tests enabled by default? It wasn't clear to me that
>> they were?
>> 
> The utf8 tests are enabled depending on the encoding returned by
> getdatabaseencoding(). Systems with other encodings will simply use the
> alternate .out file. And it works perfectly fine for me.
>
> IMHO it's ready for committer.
>
>
> regards

Yes, the UTF-8 tests follow the current best practice as has been
explained to me earlier. The following patch exemplifies this:

https://github.com/postgres/postgres/commit/c2e8bd27519f47ff56987b30eb34a01969b9a9e8


Best regards,

Dag Lem




Re: run pgindent on a regular basis / scripted manner

2023-02-08 Thread Andrew Dunstan


On 2023-02-08 We 07:41, Andrew Dunstan wrote:



On 2023-02-07 Tu 12:21, Jelte Fennema wrote:



Does the code-base flag still make sense if you can simply pass a
directory as regular args now?



Probably not. I'll look into removing it.





What we should probably do is remove all the build stuff along with 
$code_base. It dates back to the time when I developed this as an out of 
tree replacement for the old pgindent, and is just basically wasted 
space now. After I get done with the current round of enhancements I'll 
reorganize the script and get rid of things we don't need any more.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Parallelize correlated subqueries that execute within each worker

2023-02-08 Thread James Coleman
On Mon, Feb 6, 2023 at 11:39 AM Antonin Houska  wrote:
>
> James Coleman  wrote:
> > With this patch we do in fact now see (as expected) rels with
> > non-empty lateral_relids showing up in generate_[useful_]gather_paths.
> > And the partial paths can now have non-empty required outer rels. I'm
> > not able to come up with a plan that would actually be caught by those
> > checks; I theorize that because of the few places we actually call
> > generate_[useful_]gather_paths we are in practice already excluding
> > those, but for now I've left these as a conditional rather than an
> > assertion because it seems like the kind of guard we'd want to ensure
> > those methods are safe.
>
> Maybe we can later (in separate patches) relax the restrictions imposed on
> partial path creation a little bit, so that more parameterized partial paths
> are created.
>
> One particular case that should be rejected by your checks is a partial index
> path, which can be parameterized, but I couldn't construct a query that makes
> your checks fire. Maybe the reason is that a parameterized index path is
> mostly used on the inner side of a NL join, however no partial path can be
> used there. (The problem is that each worker evaluating the NL join would only
> see a subset of the inner relation, which would lead to incorrect results.)
>
> So I'd also choose conditions rather than assert statements.

Thanks for confirming.

>
> Following are my (minor) findings:
>
> In generate_gather_paths() you added this test
>
> /*
>  * Delay gather path creation until the level in the join tree where 
> all
>  * params used in a worker are generated within that worker.
>  */
> if (!bms_is_subset(required_outer, rel->relids))
> return;
>
> but I'm not sure if required_outer can contain anything of rel->relids. How
> about using bms_is_empty(required_outer), or even this?
>
> if (required_outer)
> return;
>
> Similarly,
>
> /* We can't pass params to workers. */
> if (!bms_is_subset(PATH_REQ_OUTER(cheapest_partial_path), 
> rel->relids))
>
> might look like
>
> if (!bms_is_empty(PATH_REQ_OUTER(cheapest_partial_path)))
>
> or
>
> if (PATH_REQ_OUTER(cheapest_partial_path))

I'm not sure about this change. Deciding is difficult given the fact
that we don't seem to currently generate these paths, but I don't see
a reason why lateral_relids can't be present on the rel, and if so,
then we need to check to see if they're a subset of relids we can
satisfy rather than checking that they don't exist.

> In particular, build_index_paths() does the following when setting
> outer_relids (which eventually becomes path->param_info->ppi_req_outer):
>
> /* Enforce convention that outer_relids is exactly NULL if empty */
> if (bms_is_empty(outer_relids))
> outer_relids = NULL;
>
>
> Another question is whether in this call
>
> simple_gather_path = (Path *)
> create_gather_path(root, rel, cheapest_partial_path, 
> rel->reltarget,
>required_outer, rowsp);
>
> required_outer should be passed to create_gather_path(). Shouldn't it rather
> be PATH_REQ_OUTER(cheapest_partial_path) that you test just above? Again,
> build_index_paths() initializes outer_relids this way
>
> outer_relids = bms_copy(rel->lateral_relids);
>
> but then it may add some more relations:
>
> /* OK to include this clause */
> index_clauses = lappend(index_clauses, iclause);
> outer_relids = bms_add_members(outer_relids,
>
> rinfo->clause_relids);
>
> So I think that PATH_REQ_OUTER(cheapest_partial_path) in
> generate_gather_paths() can eventually contain more relations than
> required_outer, and therefore it's safer to check the first.

Yes, this is a good catch. Originally I didn't know about
PATH_REQ_OUTER, and I'd missed using it in these places.

>
> Similar comments might apply to generate_useful_gather_paths(). Here I also
> suggest to move this test
>
> /* We can't pass params to workers. */
> if (!bms_is_subset(PATH_REQ_OUTER(subpath), rel->relids))
> continue;
>
> to the top of the loop because it's relatively cheap.

Moved.

Attached is v9.

James Coleman


v9-0001-Add-tests-before-change.patch
Description: Binary data


v9-0002-Parallelize-correlated-subqueries.patch
Description: Binary data


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-08 Thread shveta malik
On Tue, Feb 7, 2023 at 8:18 AM shiy.f...@fujitsu.com
 wrote:
>
>
> On Thu, Feb 2, 2023 11:48 AM shveta malik  wrote:
> >
> >
> > So to fix this, I think either we update origin and slot entries in
> > the system catalog after the creation has succeeded, or we clean up the
> > system catalog in case of failure. What do you think?
> >
>
> I think the first way seems better.

Yes, I agree.

>
> I reproduced the problem I reported before with the latest patch (v7-0001,
> v10-0002), and looked into this problem. It is caused by a similar reason.
> Here is some analysis for problem #6 that I reported in [1].
>
> First, a tablesync worker (worker-1) started for "tbl1"; its originname is
> "pg_16398_1". And it exited because of a unique constraint violation. In
> LogicalRepSyncTableStart(), originname in pg_subscription_rel is updated
> when updating the table state to DATASYNC, and the origin is created when
> updating the table state to FINISHEDCOPY. So when it exited with state
> DATASYNC, the origin was not created but the originname had been updated
> in pg_subscription_rel.
>
> Then a tablesync worker (worker-2) started for "tbl2"; its originname is
> "pg_16398_2". After tablesync of "tbl2" finished, this worker moved on to
> sync table "tbl1". In LogicalRepSyncTableStart(), it got the originname of
> "tbl1", "pg_16398_1", by calling ReplicationOriginNameForLogicalRep(), and
> tried to drop the origin (although it was not actually created before).
> After that, it called replorigin_by_name to get the originid whose name is
> "pg_16398_1" and the result was InvalidOid. The origin won't be created in
> this case because the sync worker has already created a replication slot
> (when it synced tbl2), so the originid was still invalid and it caused an
> assertion failure when calling replorigin_advance().
>
> It seems we don't need to drop the previous origin in worker-2 because the
> previous origin was not created in worker-1. I think one way to fix it is
> to not update the originname of pg_subscription_rel when setting the state
> to DATASYNC, and only do that when setting the state to FINISHEDCOPY. If
> so, the originname in pg_subscription_rel will be set at the same time the
> origin is created.

+1. Updates of the system catalog need to be done carefully and only when
the origin is created.

thanks
Shveta




Re: Hi i am Intrested to contribute

2023-02-08 Thread Aleksander Alekseev
Hi, Shivam!

> You may find useful the guide on how to contribute [1]. You can freely
> choose what you want (from the list of TODOs linked or anything else)
> and work on it, no permission from anyone is necessary.
> The downside is that it's not easy to detect what is useful for the
> first time, so I'd recommend first joining in reviewing existing patches
> at the commitfest page [2] and/or trying to fix some bugs from the
> pgsql-bugs mailing list. Then over time, you can gather some context
> and you can choose more and more complicated things.

Additionally, take a look at several recent discussions [1][2] of the subject.

[1]: https://postgr.es/m/48279D7D-F780-4F79-B820-4336D2EA10BE%40u.nus.edu
[2]: 
https://postgr.es/m/jVE8e0yCYML-PtkT9EkRu7L31k05D2PptAmrjx2CMP2CE0v4kFI1rysuo4lAuYmcPUXsUD-0UXISJ62GZC2P5Ktf9KukCxjDfADCHOaorfY%3D%40pm.me

-- 
Best regards,
Aleksander Alekseev




Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-02-08 Thread Hans Buschmann
During data refactoring of our application I encountered $subject when joining
4 CTEs with left join or inner join.


1. Background

PG 15.1 on Windows x64 (the OS seems to have no bearing here)


I try to collect data from 4 (analyzed) tables (up, li, in, ou) by grouping
certain data (4 CTEs: qup, qli, qin, qou).

The grouping of the data in the CTEs gives estimated row counts of about 1000
(a tenth of the real value). This is OK for estimation.


These 4 CTEs are then used to combine the data by joining them.


2. Problem

The 4 CTEs are joined by left joins as shown below:


from qup
left join qli on (qli.curr_season=qup.curr_season and 
qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
left join qin on (qin.curr_season=qup.curr_season and 
qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
left join qou on (qou.curr_season=qup.curr_season and 
qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) 
<=11)
where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21

The plan first retrieves qup and qli, taking the estimated row counts of 1163
and 1147 respectively.

BUT the result is then hashed and the row count is estimated as 33!

In a left join the row count always stays the same as that of the left table
(here qup with 1163 rows).

The same algorithm which reduces the row estimate from 1163 to 33 is used in
the next step to give an estimate of 1 row.

This is totally wrong.


Here is the execution plan of the query:

(search the plan for rows=33)


QUERY PLAN
--
 Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual 
time=168.307..222.670 rows=9963 loops=1)
   CTE qup
 ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual 
time=35.466..68.131 rows=10735 loops=1)
   Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
   ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual 
time=35.454..36.819 rows=50969 loops=1)
 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
 Sort Method: quicksort  Memory: 4722kB
 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) 
(actual time=0.148..10.687 rows=50969 loops=1)
   Hash Cond: ((sa_upper.sup_mat_code)::text = 
upper_target.up_mat_code)
   ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 
width=16) (actual time=0.005..1.972 rows=50969 loops=1)
   ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual 
time=0.140..0.140 rows=495 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 27kB
 ->  Seq Scan on upper_target  (cost=0.00..35.53 
rows=495 width=6) (actual time=0.007..0.103 rows=495 loops=1)
   Filter: (id_up <= 495)
   Rows Removed by Filter: 1467
   CTE qli
 ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual 
time=9.446..27.388 rows=10469 loops=1)
   Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
   ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual 
time=9.440..9.811 rows=11774 loops=1)
 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE 
"C"
 Sort Method: quicksort  Memory: 1120kB
 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) 
(actual time=0.045..2.438 rows=11774 loops=1)
   Hash Cond: ((sa_lining.sli_mat_code)::text = 
lining_target.li_mat_code)
   ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 
width=16) (actual time=0.008..0.470 rows=11774 loops=1)
   ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual 
time=0.034..0.034 rows=119 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 13kB
 ->  Seq Scan on lining_target  (cost=0.00..5.86 
rows=118 width=6) (actual time=0.008..0.024 rows=119 loops=1)
   Filter: (id_li <= 119)
   Rows Removed by Filter: 190
   CTE qin
 ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual 
time=11.424..31.508 rows=10678 loops=1)
   Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
   ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual 
time=11.416..11.908 rows=15230 loops=1)
 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE 
"C"
 Sort Method: quicksort  Memory: 1336kB
 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) 
(actual time=0.051..3.108 rows=15230 loops=1)
   Hash Cond

Re: PostgreSQL 16 Dev apt-based Linux unable to install

2023-02-08 Thread Justin Pryzby
On Wed, Feb 08, 2023 at 10:46:42AM +0100, André Verwijs wrote:
> 
> PostgreSQL 16 Dev  apt-based Linux,  unable to install  
> make sure all dependencies are resolved, like libpq5 (or higher) for testing
> ...
> 
> " postgresql-client-16 : Prerequisites: libpq5 (>= 16~~devel) but
> 15.1-1.pgdg+1+b1 will be installed "

Few things:

You're always going to want to show the command that you ran in addition
to the error that you got.

This has to do with the debian packages, and not with postgres itself, so
this other list is a better place to ask than -hackers:
https://www.postgresql.org/list/pgsql-pkg-debian/

I think you'll need to use a command like
sudo apt-get install postgresql-16 -t buster-pgdg-snapshot

-- 
Justin




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-02-08 Thread Bharath Rupireddy
On Wed, Feb 8, 2023 at 10:33 AM Dilip Kumar  wrote:
>
> On Wed, Feb 8, 2023 at 9:57 AM Bharath Rupireddy
>  wrote:
> >
> > I can also do a few other things, but before working on them, I'd like
> > to hear from others:
> > 1. A separate wait event (WAIT_EVENT_WAL_READ_FROM_BUFFERS) for
> > reading from WAL buffers - right now, WAIT_EVENT_WAL_READ is being
> > used both for reading from WAL buffers and WAL files. Given the fact
> > that we won't wait for a lock or do a time-consuming task while reading
> > from buffers, it seems unnecessary.
>
> Yes, we do not need this separate wait event, and we also don't need the
> WAIT_EVENT_WAL_READ wait event while reading from the buffer.  Because
> we are not performing any IO, no specific wait event is needed; and since
> reading from the WAL buffer acquires WALBufMappingLock, that lwlock wait
> event will be tracked under that lock.

Nope, LWLockConditionalAcquire doesn't wait, so no lock wait event (no
LWLockReportWaitStart) there. I agree to not have any wait event for
reading from WAL buffers as no IO is involved there. I removed it in
the attached v4 patch.

> > 2. A separate TAP test for verifying that the WAL is actually read
> > from WAL buffers - right now, existing tests for recovery,
> > subscription, pg_walinspect already cover the code, see [1]. However,
> > if needed, I can add a separate TAP test.
>
> Can we write a test that can actually validate that we have read from
> a WAL buffer? If so then it would be good to have such a test to avoid
> any future breakage in that logic.  But if it just hits the code, with
> no way to validate as part of the test whether it has hit the WAL
> buffer or not, then I think the existing cases are sufficient.

We could set up a standby, a logical replication subscriber, or the
pg_walinspect extension and verify that the code got hit with the help
of the server log (DEBUG1) message added by the patch. However, this
could make the test volatile.

Therefore, I came up with a simple and small test module/extension
named test_wal_read_from_buffers under src/test/modules. It basically
exposes a SQL function that, given an LSN, calls XLogReadFromBuffers()
and returns true if it hits the WAL buffers, otherwise false. And the
simple TAP test of this module verifies that the function returns true.
I attached the test module as v4-0002 here. The test module is quite
specific, and it also serves as a demonstration of how one can use the
new XLogReadFromBuffers().
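
For reference, a bare-bones version of such a function could look like the
sketch below. The XLogReadFromBuffers() signature is an assumption here
(destination buffer, start LSN, byte count, timeline; returning the number
of bytes read from the buffers); the attached patch is authoritative:

	#include "postgres.h"

	#include "access/xlog.h"
	#include "fmgr.h"
	#include "utils/pg_lsn.h"

	PG_MODULE_MAGIC;

	PG_FUNCTION_INFO_V1(test_wal_read_from_buffers);

	/*
	 * Given an LSN, try to read one WAL page starting there directly
	 * from the WAL buffers; return true on a buffer hit, else false.
	 */
	Datum
	test_wal_read_from_buffers(PG_FUNCTION_ARGS)
	{
		XLogRecPtr	lsn = PG_GETARG_LSN(0);
		char		buf[XLOG_BLCKSZ];
		Size		nread;

		nread = XLogReadFromBuffers(buf, lsn, XLOG_BLCKSZ,
									GetWALInsertionTimeLine());

		PG_RETURN_BOOL(nread > 0);
	}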

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v4-0001-Improve-WALRead-to-suck-data-directly-from-WAL-bu.patch
Description: Binary data


v4-0002-Add-test-module-for-verifying-WAL-read-from-WAL-b.patch
Description: Binary data


Re: daitch_mokotoff module

2023-02-08 Thread Dag Lem
Alvaro Herrera  writes:

> On 2023-Jan-17, Dag Lem wrote:
>
>> + * Daitch-Mokotoff Soundex
>> + *
>> + * Copyright (c) 2021 Finance Norway
>> + * Author: Dag Lem 
>
> Hmm, I don't think we accept copyright lines that aren't "PostgreSQL
> Global Development Group".  Is it okay to use that, and update the year
> to 2023?  (Note that answering "no" very likely means your patch is not
> candidate for inclusion.)  Also, we tend not to have "Author:" lines.
>

You'll have to forgive me for not knowing about this rule:

  grep -ER "Copyright.*[0-9]{4}" contrib/ | grep -v PostgreSQL

In any case, I have checked with the copyright owner, and it would be OK
to assign the copyright to "PostgreSQL Global Development Group".

To avoid going back and forth with patches, how do you propose that the
sponsor and the author of the contributed module should be credited?
Would something like this be acceptable?

/*
 * Daitch-Mokotoff Soundex
 *
 * Copyright (c) 2023, PostgreSQL Global Development Group
 *
 * This module was sponsored by Finance Norway / Trafikkforsikringsforeningen
 * and implemented by Dag Lem 
 *
 ...

[...]

>
> We don't keep a separate copyright statement in the file; rather we
> assume that all files are under the PostgreSQL license, which is in the
> COPYRIGHT file at the top of the tree.  Changing it thus has the side
> effect that these disclaim notes refer to the University of California
> rather than "the Author".  IANAL.

OK, no problem. Note that you will again find counterexamples under
contrib/ (and in some other places):

  grep -R "Permission to use" .

> I think we should add SPDX markers to all the files we distribute:
> /* SPDX-License-Identifier: PostgreSQL */
>
> https://spdx.dev/ids/
> https://spdx.org/licenses/PostgreSQL.html

As far as I can tell, this is not included in any file so far, and is
thus better left for someone else to decide and implement.

Best regards

Dag Lem




Re: Why cann't simplify stable function in planning phase?

2023-02-08 Thread Tom Lane
Tomas Vondra  writes:
> Note: To be precise this is not about "executions" but about snapshots,
> and we could probably simplify the function call with isolation levels
> that maintain a single snapshot (e.g. REPEATABLE READ). But we don't.

We don't do that because, in fact, execution is *never* done with the same
snapshot used for planning.  See comment in postgres.c:

 * While it looks promising to reuse the same snapshot for query
 * execution (at least for simple protocol), unfortunately it causes
 * execution to use a snapshot that has been acquired before locking
 * any of the tables mentioned in the query.  This creates user-
 * visible anomalies, so refrain.  Refer to
 * https://postgr.es/m/flat/5075d8df.6050...@fuzzy.cz for details.

I'm not entirely sure that that locking argument still holds, but having
been burned once I'm pretty hesitant to try that again.

regards, tom lane




Re: OpenSSL 3.0.0 vs old branches

2023-02-08 Thread Erik Rijkers

On 08-02-2023 05:37, Tom Lane wrote:

Michael Paquier  writes:

On Tue, Feb 07, 2023 at 01:28:26PM -0500, Tom Lane wrote:

I think Peter's misremembering the history, and OpenSSL 3 *is*
supported in these branches.  There could be an argument for
not back-patching f0d2c65f17 on the grounds that pre-1.1.1 is
also supported there.  On the whole though, it seems more useful
today for that test to pass with 3.x than for it to pass with 0.9.8.
And I can't see investing effort to make it do both (but if Peter
wants to, I won't stand in the way).



Cutting support for 0.9.8 in oldest branches would be a very risky
move, but as you say, if that only involves a failure in the SSL
tests while still allowing anything we have to work, fine by me to
live with that.


Question: is anybody around here still testing with 0.9.8 (or 1.0.x)
at all?  The systems I had that had that version on them are dead.

regards, tom lane


I've hoarded an old centos 6.1 system that I don't really use anymore 
but sometimes (once every few weeks, I guess) start up and build master 
on, for instance to test with postgres_fdw/replication. Such a build 
would include a make check, and I think I would have noticed any failures.


That system says:
OpenSSL> OpenSSL 1.0.1e-fips 11 Feb 2013

FWIW, just now I built & ran check-world for 15 and 16 with 
PG_TEST_EXTRA=ssl (which I didn't use before).  Both finished ok.


Erik Rijkers




Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Robert Haas
On Sun, Feb 5, 2023 at 7:46 PM Nathan Bossart  wrote:
> Got it.  I suspect we'll want to do something similar for archive modules
> eventually, too.

+1.

I felt like the archive modules work was a step forward when we did it,
because basic_archive does some things that you're not likely to get
right if you do it on your own. And a similar approach to
restore_command might also be valuable, at least in my opinion.
However, the gains that we can get out of the archive module facility
in its present form do seem to be somewhat limited, for exactly the
kinds of reasons being discussed here.

I kind of wonder whether we ought to try to flip the model around. At
present, the idea is that the archiver is doing its thing and it makes
callbacks into the archive module. But what if we got rid of the
archiver main loop altogether and put the main loop inside of the
archive module, and have it call back to some functions that we
provide? One function could be like char
*pgarch_next_file_to_be_archived_if_there_is_one_ready(void) and the
other could be like void
pgarch_some_file_that_you_gave_me_previously_is_now_fully_archived(char
*which_one). That way, we'd break the tight coupling where you have to
get a unit of work and perform it in full before you can get the next
unit of work. Some variant of this could work on the restore side,
too, I think, although we have less certainty about how much sense it makes
to prefetch for restore than we do about what needs to be archived.
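
To make that shape concrete, a module's main loop under such an API might
look roughly like the sketch below. Only the two pgarch_* callbacks are from
the proposal above; begin_async_archive(), reap_completed_archive() and
wait_for_archiver_work() are hypothetical stand-ins for whatever
asynchronous machinery the module brings:

	static void
	my_archive_module_main(void)
	{
		for (;;)
		{
			char	   *file;

			/* start as many copies as the module wants in flight */
			while ((file = pgarch_next_file_to_be_archived_if_there_is_one_ready()) != NULL)
				begin_async_archive(file);

			/* report completions back to the server, in any order */
			while ((file = reap_completed_archive()) != NULL)
				pgarch_some_file_that_you_gave_me_previously_is_now_fully_archived(file);

			/* sleep until there is more work to do (latch wait, poll, ...) */
			wait_for_archiver_work();
		}
	}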

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: OpenSSL 3.0.0 vs old branches

2023-02-08 Thread Peter Eisentraut

On 07.02.23 19:28, Tom Lane wrote:

I think Peter's misremembering the history, and OpenSSL 3 *is*
supported in these branches.  There could be an argument for
not back-patching f0d2c65f17 on the grounds that pre-1.1.1 is
also supported there.  On the whole though, it seems more useful
today for that test to pass with 3.x than for it to pass with 0.9.8.


Ok, let's do it.





Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 16:08:39 +0300, Aleksander Alekseev wrote:
> > To me it's a pretty fundamental violation of how heap visibility works.
> 
> I don't think this has much to do with heap visibility. It's true that
> generally a command doesn't see its own tuples. This is done in order
> to avoid the Halloween problem which however can't happen in this
> particular case.
> 
> Other than that the heap doesn't care about the visibility, it merely
> stores the tuples. The visibility is determined by xmin/xmax, the
> isolation level, etc.

Yes, and the fact is that cmin == cmax is something that we don't normally
produce, yet you emit it now, without, as far as I can tell, a convincing
reason.


> > > That's a reasonable concern, however I was unable to break unique
> > > constraints or triggers so far:
> >
> > I think you'd have to do a careful analysis of a lot of code for that
> > to hold any water.
> 
> Alternatively we could work smarter, not harder, and let the hardware
> find the bugs for us. Writing tests is much simpler and bullet-proof
> than analyzing the code.

That's a spectacularly wrong argument in almost all cases. Unless you have a
way to get to full branch coverage or use a model checker that basically does
the same, testing isn't going to give you a whole lot of confidence that you
haven't introduced bugs. This is particularly true for something like heapam,
where a lot of the tricky behaviour requires complicated interactions between
multiple connections.


> Again, to clarify, I'm merely playing the role of Devil's advocate
> here. I'm not saying that the patch should necessarily be accepted,
> nor am I 100% sure that it has any undiscovered bugs. However the
> arguments against received so far don't strike me personally as being
> particularly convincing.

I've said my piece, as-is I vote to reject the patch.

Greetings,

Andres Freund




Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID

2023-02-08 Thread Kirk Wolak
On Wed, Feb 8, 2023 at 3:08 AM Pavel Stehule 
wrote:

> hi
>
> On Wed, Feb 8, 2023 at 7:33, Julien Rouhaud wrote:
>
>> On Tue, Feb 07, 2023 at 08:48:22PM +0100, Pavel Stehule wrote:
>> >
>> > GET DIAGNOSTICS _oid = PG_ROUTINE_OID;
>> > RAISE NOTICE '... % ... %', _oid, _oid::regproc::text;
>> >
>> > Do you think it can be useful feature?
>>
>> +1, it would have been quite handy in a few of my projects.
>>
>
> it can looks like that
>
> create or replace function foo(a int)
> returns int as $$
> declare s text; n text; o oid;
> begin
>   get diagnostics s = pg_current_routine_signature,
>   n = pg_current_routine_name,
>   o = pg_current_routine_oid;
>   raise notice 'sign:%,  name:%,  oid:%', s, n, o;
>   return a;
> end;
> $$ language plpgsql;
> CREATE FUNCTION
> (2023-02-08 09:04:03) postgres=# select foo(10);
> NOTICE:  sign:foo(integer),  name:foo,  oid:16392
> ┌─┐
> │ foo │
> ╞═╡
> │  10 │
> └─┘
> (1 row)
>
The name pg_routine_oid can be confusing, because it is not clear whether it
is the OID of the currently executed routine or of the routine at the top of
the exception stack.
>
> Regards
>
> Pavel
>

I agree that changing the name to pg_current_routine_... makes the most
sense, great call...

+1


Re: typos

2023-02-08 Thread Justin Pryzby
Some more accumulated/new typos.
>From 6c79a0d4e0251dbbac38babb60bb2d0fbae3da8d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 11 Jan 2023 12:52:25 -0600
Subject: [PATCH 01/10] use "an SQL" rather than a SQL

Per 04539e
---
 doc/src/sgml/ecpg.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index a76cf3538f1..f52165165dc 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -1506,7 +1506,7 @@ EXEC SQL TYPE serial_t IS long;
  
 
  
-  Any word you declare as a typedef cannot be used as a SQL keyword
+  Any word you declare as a typedef cannot be used as an SQL keyword
   in EXEC SQL commands later in the same program.
   For example, this won't work:
 
@@ -1518,7 +1518,7 @@ EXEC SQL START TRANSACTION;
 
   ECPG will report a syntax error for START
   TRANSACTION, because it no longer
-  recognizes START as a SQL keyword,
+  recognizes START as an SQL keyword,
   only as a typedef.
   (If you have such a conflict, and renaming the typedef
   seems impractical, you could write the SQL command
@@ -1530,7 +1530,7 @@ EXEC SQL START TRANSACTION;
In PostgreSQL releases before v16, use
of SQL keywords as typedef names was likely to result in syntax
errors associated with use of the typedef itself, rather than use
-   of the name as a SQL keyword.  The new behavior is less likely to
+   of the name as an SQL keyword.  The new behavior is less likely to
cause problems when an existing ECPG application is recompiled in
a new PostgreSQL release with new
keywords.
-- 
2.25.1

>From fef49b822bad849722ec9b043ffd676c80492d0d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 15 Jan 2023 17:00:06 -0600
Subject: [PATCH 02/10] comment typos

---
 src/backend/commands/dbcommands.c  | 2 +-
 src/backend/executor/execMain.c| 2 +-
 src/backend/jit/llvm/llvmjit_inline.cpp| 2 +-
 src/backend/replication/logical/snapbuild.c| 2 +-
 src/backend/replication/walsender.c| 6 +++---
 src/bin/pg_dump/pg_backup_custom.c | 2 +-
 src/bin/pg_dump/pg_dumpall.c   | 2 +-
 src/include/lib/ilist.h| 2 +-
 src/test/regress/expected/alter_table.out  | 2 +-
 src/test/regress/expected/create_procedure.out | 2 +-
 src/test/regress/sql/alter_table.sql   | 2 +-
 src/test/regress/sql/create_procedure.sql  | 2 +-
 src/test/subscription/t/031_column_list.pl | 2 +-
 13 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 1f4ce2fb9cf..ef05633bb05 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -3090,7 +3090,7 @@ dbase_redo(XLogReaderState *record)
 
 		/*
 		 * There's a case where the copy source directory is missing for the
-		 * same reason above.  Create the emtpy source directory so that
+		 * same reason above.  Create the empty source directory so that
 		 * copydir below doesn't fail.  The directory will be dropped soon by
 		 * recovery.
 		 */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a5115b9c1f7..39bfb48dc22 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -134,7 +134,7 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
 	/*
 	 * In some cases (e.g. an EXECUTE statement) a query execution will skip
 	 * parse analysis, which means that the query_id won't be reported.  Note
-	 * that it's harmless to report the query_id multiple time, as the call
+	 * that it's harmless to report the query_id multiple times, as the call
 	 * will be ignored if the top level query_id has already been reported.
 	 */
 	pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index dc35e002f51..c765add8564 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -753,7 +753,7 @@ function_inlinable(llvm::Function &F,
 		/* import referenced function itself */
 		importVars.insert(referencedFunction->getName());
 
-		/* import referenced function and its dependants */
+		/* import referenced function and its dependents */
 		for (auto& recImportVar : recImportVars)
 			importVars.insert(recImportVar.first());
 	}
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 829c5681120..62542827e4b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1816,7 +1816,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	fsync_fname("pg_logical/snapshots", true);
 
 	/*
-	 * Now there's no way we can loose the dumped state anymore, remember this
+	 * Now there's no way we can lose the dumped state anymore, remember this

Re: Named Operators

2023-02-08 Thread Tom Lane
Peter Eisentraut  writes:
> On 12.01.23 14:55, Matthias van de Meent wrote:
>>> Matter of taste, I guess. But more importantly, defining an operator
>>> gives you many additional features that the planner can use to
>>> optimize your query differently, which it can't do with functions. See
>>> the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.

>> I see. Wouldn't it be better then to instead make it possible for the
>> planner to detect the use of the functions used in operators and treat
>> them as aliases of the operator? Or am I missing something w.r.t.
>> differences between operator and function invocation?
>> E.g. an index on `int8pl(my_bigint, 1)` does not match queries for
>> `my_bigint + 1` (and vice versa), while they should be able to support
>> that, as OPERATOR(pg_catalog.+(int8, int8))'s function is int8pl.

> I have been thinking about something like this for a long time. 
> Basically, we would merge pg_proc and pg_operator internally.  Then, all 
> the special treatment for operators would also be available to 
> two-argument functions.

I had a thought about this ...

I do not think this proposal is going anywhere as-written.
There seems very little chance that we can invent a syntax that
is concise, non-ugly, and not likely to get blindsided by future
SQL spec extensions.  Even if we were sure that, say, "{foo}"
was safe from spec interference, the syntax "a {foo} b" has
exactly nothing to recommend it compared to "foo(a,b)".
It's not shorter, it's not standard, it won't help any pre-existing
queries, and it can't use function-call features such as named
arguments.

As Matthias said, what we actually need is for the planner to be able
to optimize function calls on the same basis as operators.  We should
tackle that directly rather than inventing new syntax.

We could go after that by inventing a bunch of new function properties
to parallel operator properties, but there is a simpler way: just
teach the planner to look to see if a function call is a call of the
underlying function of some operator, and if so treat it like that
operator.  Right now that'd be an expensive lookup, but we could
remove that objection with an index on pg_operator.oprcode or a
single new field in pg_proc.
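
For illustration (a sketch only; the table and index names are invented),
this is the mismatch Matthias describes, which such a lookup would close:

-- An expression index built on the operator's underlying function:
CREATE TABLE nums (my_bigint bigint);
CREATE INDEX nums_plus_one ON nums (int8pl(my_bigint, 1));

-- Today the planner compares expression trees, so the OpExpr form of
-- the same computation never matches the FuncExpr form, and the index
-- is not considered for:
SELECT * FROM nums WHERE my_bigint + 1 = 42;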

This approach does have a couple of shortcomings:

* You still have to invent an operator name, even if you never
plan to use it in queries.  This is just cosmetic though.
It's not going to matter if the operator name is long or looks like
line noise, if you only need to use it a few times in setup DDL.

* We could not extend this to support index functions with more than
two arguments, a request we've heard once in a while in the past.
Our answer to that so far has been "make a function/operator with
one indexed argument and one composite-type argument", which is a
bit of an ugly workaround but seems to be serviceable enough.

On the whole I don't think these shortcomings are big enough
to justify all the work that would be involved in attaching
operator-like optimization information directly to functions.
(To mention just one nontrivial stumbling block: do you really
want to invent "shell functions" similar to the shell-operator
hack?  If not, how are you going to handle declaration of
commutator pairs?)

In the long run this might lead to thinking of pg_operator as
an extension of pg_proc in the same way that pg_aggregate is.
But we have not unified pg_aggregate into pg_proc, and I don't
think anyone wants to, because pg_proc rows are undesirably
wide already.  There's a similar objection to attaching
optimization fields directly to pg_proc.

You could imagine some follow-on internal cleanup like trying
to unify FuncExpr and OpExpr into a single node type (carrying
a function OID and optionally an operator OID).  But that need
not have any user-visible impact either; it'd mainly be good
for eliminating a lot of near-duplicate code.

regards, tom lane




Re: OpenSSL 3.0.0 vs old branches

2023-02-08 Thread Tom Lane
Erik Rijkers  writes:
> Op 08-02-2023 om 05:37 schreef Tom Lane:
>> Question: is anybody around here still testing with 0.9.8 (or 1.0.x)
>> at all?  The systems I had that had that version on them are dead.

> I've hoarded an old centos 6.1 system that I don't really use anymore 
> but sometimes (once every few weeks, I guess) start up and build master 
> on, for instance to test with postgres_fdw/replication. Such a build 
> would include a make check, and I think I would have noticed any fails.
> That system says:
> OpenSSL> OpenSSL 1.0.1e-fips 11 Feb 2013
> FWIW, just now I built & ran check-world for 15 and 16 with 
> PG_TEST_EXTRA=ssl (which I didn't use before).  Both finished ok.

Oh, that's good to know.  That means that the newer form of this
test works with 1.0.1, which means that we'd only lose test
compatibility with 0.9.x OpenSSL.  That bothers me not at all
in 2023.

regards, tom lane




Re: Why cann't simplify stable function in planning phase?

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 09:57:04 -0500, Tom Lane wrote:
> Tomas Vondra  writes:
> > Note: To be precise this is not about "executions" but about snapshots,
> > and we could probably simplify the function call with isolation levels
> > that maintain a single snapshot (e.g. REPEATABLE READ). But we don't.
> 
> We don't do that because, in fact, execution is *never* done with the same
> snapshot used for planning.  See comment in postgres.c:
> 
>  * While it looks promising to reuse the same snapshot for query
>  * execution (at least for simple protocol), unfortunately it causes
>  * execution to use a snapshot that has been acquired before locking
>  * any of the tables mentioned in the query.  This creates user-
>  * visible anomalies, so refrain.  Refer to
>  * https://postgr.es/m/flat/5075d8df.6050...@fuzzy.cz for details.
> 
> I'm not entirely sure that that locking argument still holds, but having
> been burned once I'm pretty hesitant to try that again.

Because we now avoid re-computing snapshots when there weren't any concurrent
commits/aborts, the gain would likely not be all that high anyway.

We should work on getting rid of the ProcArrayLock acquisition in case we can
reuse the snapshot, though. I think it's doable safely, but when working on
it, I didn't succeed at writing a concise description as to why it's safe, so
I decided that the rest of the wins were big enough to not focus on it then and
there.

Greetings,

Andres Freund




Re: meson: Non-feature feature options

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 11:45:05 +0100, Peter Eisentraut wrote:
> Most meson options (meson_options.txt) that enable an external dependency
> (e.g., icu, ldap) are of type 'feature'.  Most of these have a default value
> of 'auto', which means they are pulled in automatically if found.  Some have
> a default value of 'disabled' for specific reasons (e.g., selinux).  This is
> all good.
> 
> Two options deviate from this in annoying ways:
> 
> option('ssl', type : 'combo', choices : ['none', 'openssl'],
>   value : 'none',
>   description: 'use LIB for SSL/TLS support (openssl)')
> 
> option('uuid', type : 'combo', choices : ['none', 'bsd', 'e2fs', 'ossp'],
>   value : 'none',
>   description: 'build contrib/uuid-ossp using LIB')
> 
> These were moved over from configure like that.
>
> The problem is that these features now cannot be automatically enabled and
> behave annoyingly different from other feature options.

Oh, yes, this has been bothering me too.


> For the 'ssl' option, we have deprecated the --with-openssl option in
> configure and replaced it with --with-ssl, in anticipation of other SSL
> implementations.  None of that ever happened or is currently planned AFAICT.
> So I suggest that we semi-revert this, so that we can make 'openssl' an auto
> option in meson.

Hm. I'm inclined to leave it there - I do think it's somewhat likely that
we'll eventually end up with some platform native library. I think it's likely
the NSS patch isn't going anywhere, but I'm not sure that's true for
e.g. using the windows encryption library. IIRC Heikki had a patch at some
point.

I'd probably just add an 'auto' option, and manually make it behave like a
feature option.


> For the 'uuid' option, I'm not sure what the best way to address this would be.
> We could establish a search order of libraries that is used if no specific
> one is set (similar to libreadline, libedit, in a way).  So we'd have one
> option 'uuid' that is of type feature with default 'auto' and another
> option, say, 'uuid-library' of type 'combo'.

Or add 'auto' as a combo option, and handle the value of the auto_features
option ourselves?
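
For concreteness, a sketch of that second idea (hypothetical only;
meson.build would then have to map 'auto' to enabled/disabled against the
auto_features setting itself):

option('uuid', type : 'combo',
  choices : ['auto', 'none', 'bsd', 'e2fs', 'ossp'],
  value : 'auto',
  description: 'build contrib/uuid-ossp using LIB')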

Greetings,

Andres Freund




Re: recovery modules

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 16:23:34 +0900, Michael Paquier wrote:
> On Sat, Feb 04, 2023 at 10:14:36AM -0800, Nathan Bossart wrote:
> > On Sat, Feb 04, 2023 at 11:59:20AM +0900, Michael Paquier wrote:
> >> +   ArchiveModuleCallbacks struct filled with the callback function 
> >> pointers for
> >> This needs a structname markup.
> >> 
> >> +   can use state->private_data to store it.
> >> And here it would be structfield.
> > 
> > fixed
> 
> Andres, did you have the change to look at that?  I did look at it,
> but it may not address all the points you may have in mind.

Yes, I think this looks pretty good now.

One minor thing: I don't think we really need the AssertVariableIsOfType() for
anything but the Init() one?

Greetings,

Andres Freund




Re: Named Operators

2023-02-08 Thread Tom Lane
I wrote:
> This approach does have a couple of shortcomings:

> * You still have to invent an operator name, even if you never
> plan to use it in queries.  This is just cosmetic though.
> It's not going to matter if the operator name is long or looks like
> line noise, if you only need to use it a few times in setup DDL.

Oh, one other thought is that we could address that complaint
by allowing OPERATOR(identifier), so that your DDL could use
a meaningful name for the operator.  I see that we don't
actually support OPERATOR() right now in CREATE OPERATOR or
ALTER OPERATOR:

regression=# create operator operator(+) (function = foo);
ERROR:  syntax error at or near "("
LINE 1: create operator operator(+) (function = foo);
^

but I doubt that'd be hard to fix.

regards, tom lane




Re: when the startup process doesn't (logging startup delays)

2023-02-08 Thread Bharath Rupireddy
On Mon, Feb 6, 2023 at 9:39 PM Robert Haas  wrote:
>
> On Mon, Feb 6, 2023 at 11:07 AM Tom Lane  wrote:
> > Umm ... is this really the sort of patch to be committing on a
> > release wrap day?
>
> Oh, shoot, I wasn't thinking about that. Would you like me to revert
> it in v15 for now?

Thanks a lot Robert for taking care of this. The patch is committed on
HEAD and reverted on v15. Now that the minor version branches are
stamped, is it time for us to get this to v15? I can then close the CF
entry - https://commitfest.postgresql.org/42/4012/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: run pgindent on a regular basis / scripted manner

2023-02-08 Thread Jelte Fennema
With the new patch, --commit works as expected for me now. And it sounds
good to tidy up the script a bit afterwards.

On Wed, 8 Feb 2023 at 14:27, Andrew Dunstan  wrote:
>
>
> On 2023-02-08 We 07:41, Andrew Dunstan wrote:
>
>
> On 2023-02-07 Tu 12:21, Jelte Fennema wrote:
>
>
> Does the code-base flag still make sense if you can simply pass a
> directory as regular args now?
>
>
> Probably not. I'll look into removing it.
>
>
>
>
> What we should probably do is remove all the build stuff along with 
> $code_base. It dates back to the time when I developed this as an out of tree 
> replacement for the old pgindent, and is just basically wasted space now. 
> After I get done with the current round of enhancements I'll reorganize the 
> script and get rid of things we don't need any more.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com




Re: recovery modules

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 08:27:13AM -0800, Andres Freund wrote:
> One minor thing: I don't think we really need the AssertVariableIsOfType() for
> anything but the Init() one?

This is another part that was borrowed from logical decoding output
plugins.  I'm not sure this adds much since f2b73c8 ("Add central
declarations for dlsym()ed symbols").  Perhaps we should remove all of
these assertions for functions that now have central declarations.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: recovery modules

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 09:27:05 -0800, Nathan Bossart wrote:
> On Wed, Feb 08, 2023 at 08:27:13AM -0800, Andres Freund wrote:
> > One minor thing: I don't think we really need the AssertVariableIsOfType() 
> > for
> > anything but the Init() one?
> 
> This is another part that was borrowed from logical decoding output
> plugins.

I know :(. It was needed in an earlier version of the output plugin interface,
where all the different callbacks were looked up via dlsym(), but should have
been removed after that.


> I'm not sure this adds much since f2b73c8 ("Add central
> declarations for dlsym()ed symbols").  Perhaps we should remove all of
> these assertions for functions that now have central declarations.

Most of them weren't needed even before that.

And yes, I'd be for a patch to remove all of those assertions.

Greetings,

Andres Freund




Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

2023-02-08 Thread Peter Geoghegan
On Wed, Feb 8, 2023 at 5:08 AM Aleksander Alekseev
 wrote:
> > To me it's a pretty fundamental violation of how heap visibility works.
>
> I don't think this has much to do with heap visibility. It's true that
> generally a command doesn't see its own tuples. This is done in order
> to avoid the Halloween problem, which, however, can't happen in this
> particular case.
>
> Other than that, the heap doesn't care about visibility; it merely
> stores the tuples. The visibility is determined by xmin/xmax, the
> isolation level, etc.

I think that in a green field situation we would probably make READ
COMMITTED updates throw cardinality violations in the same way as ON
CONFLICT DO UPDATE, while not changing anything about ON CONFLICT DO
NOTHING. We made a deliberate trade-off with the design of DO NOTHING,
which won't lock conflicting rows, and so won't dirty any heap pages
that it doesn't insert on to.

I don't buy your argument about DO UPDATE needing to be brought into
line with DO NOTHING. In any case I'm pretty sure that Tom's remarks
in 2016 about a behavioral inconsistencies (which you cited) actually
called for making DO NOTHING more like DO UPDATE -- not the other way
around.

To me it seems as if allowing the same command to update the same row
more than once is just not desirable in general. It doesn't seem
necessary to bring low level arguments about cmin/cmax into it, nor
does it seem necessary to talk about things like the Halloween
problem. To me the best argument is also the simplest: who would want
us to allow it, and for what purpose?
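
For reference, the existing difference is easy to demonstrate (a sketch;
the table and values are invented, with one pre-existing row for id = 1):

CREATE TABLE t (id int PRIMARY KEY, v int);
INSERT INTO t VALUES (1, 0);

-- DO UPDATE refuses to affect the same row twice within one command:
INSERT INTO t VALUES (1, 1), (1, 2)
ON CONFLICT (id) DO UPDATE SET v = EXCLUDED.v;
-- ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time

-- DO NOTHING simply skips both conflicting rows, without locking them:
INSERT INTO t VALUES (1, 1), (1, 2)
ON CONFLICT (id) DO NOTHING;
-- INSERT 0 0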

I suppose that we might theoretically prefer to throw a cardinality
violation for DO NOTHING, but I don't see a way to do that without
locking rows and dirtying heap pages. If somebody were to argue that
we should make DO NOTHING lock rows and throw similar errors now then
I'd also disagree with them, but to a much lesser degree. I don't
think that this patch is a good idea.

-- 
Peter Geoghegan




Re: when the startup process doesn't (logging startup delays)

2023-02-08 Thread Tom Lane
Bharath Rupireddy  writes:
> Thanks a lot Robert for taking care of this. The patch is committed on
> HEAD and reverted on v15. Now that the minor version branches are
> stamped, is it time for us to get this to v15? I can then close the CF
> entry - https://commitfest.postgresql.org/42/4012/.

No objection to un-reverting from here.

regards, tom lane




Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 10:22:24AM -0500, Robert Haas wrote:
> I felt like the archive modules work was a step forward when we did,
> because basic_archive does some things that you're not likely to get
> right if you do it on your own. And a similar approach to
> restore_command might be also be valuable, at least in my opinion.
> However, the gains that we can get out of the archive module facility
> in its present form do seem to be somewhat limited, for exactly the
> kinds of reasons being discussed here.

I'm glad to hear that there is interest in taking this stuff to the next
level.  I'm currently planning to first get the basic API in place for
recovery modules like we have for archive modules, but I'm hoping to
position it so that it leads naturally to asynchronous, parallel, and/or
batching approaches down the road (v17?).

> I kind of wonder whether we ought to try to flip the model around. At
> present, the idea is that the archiver is doing its thing and it makes
> callbacks into the archive module. But what if we got rid of the
> archiver main loop altogether and put the main loop inside of the
> archive module, and have it call back to some functions that we
> provide? One function could be like char
> *pgarch_next_file_to_be_archived_if_there_is_one_ready(void) and the
> other could be like void
> pgarch_some_file_that_you_gave_me_previously_is_now_fully_archived(char
> *which_one). That way, we'd break the tight coupling where you have to
> get a unit of work and perform it in full before you can get the next
> unit of work. Some variant of this could work on the restore side,
> too, I think, although we have less certainty about how much it makes
> to prefetch for restore than we do about what needs to be archived.

I think this could be a good approach if we decide not to bake too much
into PostgreSQL itself (e.g., such as creating multiple archive workers
that each call out to the module).  Archive module authors would
effectively need to write their own archiver processes.  That sounds super
flexible, but it also sounds like it might be harder to get right.
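
To make the inverted model concrete, a rough sketch (the two entry points
are the hypothetical names from Robert's description; the other helpers are
likewise invented, not an existing API):

extern char *pgarch_next_file_to_be_archived_if_there_is_one_ready(void);
extern void pgarch_some_file_that_you_gave_me_previously_is_now_fully_archived(char *which_one);

static void
my_archiver_main_loop(void)
{
	for (;;)
	{
		char	   *next;
		char	   *done;

		/* queue up as many ready segments as core will hand us */
		while ((next = pgarch_next_file_to_be_archived_if_there_is_one_ready()) != NULL)
			start_asynchronous_copy(next);	/* hypothetical helper */

		/* acknowledge whatever finished since we last looked */
		while ((done = reap_one_finished_copy()) != NULL)	/* hypothetical */
			pgarch_some_file_that_you_gave_me_previously_is_now_fully_archived(done);

		wait_for_archiver_wakeup();	/* hypothetical latch wait */
	}
}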

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: HOT chain validation in verify_heapam()

2023-02-08 Thread Robert Haas
On Sun, Feb 5, 2023 at 3:57 AM Himanshu Upadhyaya
 wrote:
> Thanks, yes it's working fine with Prepared Transaction.
> Please find attached the v9 patch incorporating all the review comments.

I don't know quite how we're still going around in circles about this,
but this code makes no sense to me at all:

/*
 * Add data to the predecessor array even if the current or
 * successor's LP is not valid. We will not process/validate these
 * offset entries while looping over the predecessor array but
 * having all entries in the predecessor array will help in
 * identifying(and validating) the Root of a chain.
 */
if (!lp_valid[ctx.offnum] || !lp_valid[nextoffnum])
{
predecessor[nextoffnum] = ctx.offnum;
continue;
}

If the current offset number is not for a valid line pointer, then it
makes no sense to talk about the successor. An invalid redirected line
pointer is one that points off the end of the line pointer array, or
to before the beginning of the line pointer array, or to a line
pointer that is unused. An invalid line pointer that is LP_USED is one
which points to a location outside the page, or to a location inside
the page where no tuple could validly be stored. In none of these cases
does it make any sense to talk about
the next tuple. If the line pointer isn't valid, it's pointing to some
invalid location where there cannot possibly be a tuple. In other
words, if lp_valid[ctx.offnum] is false, then nextoffnum is a garbage
value, and therefore referencing predecessor[nextoffnum] is useless
and dangerous.

If the next offset number is not for a valid line pointer, we could in
theory still assign to the predecessor array, as you propose here. In
that case, the tuple or line pointer at ctx.offnum is pointing to the
line pointer at nextoffnum and that is all fine. But what is the
point? The comment claims that the point is that it will help us
identify and validate the root of the hot chain. But if the line
pointer at nextoffnum is not valid, it can't be the root of a hot
chain. When we're talking about the root of a HOT chain, we're
speaking about a tuple. If lp_valid[nextoffnum] is false, there is no
tuple. Instead of pointing to a tuple, that line pointer is pointing
to garbage.
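
In other words, a sketch of the control flow being argued for (variable
names follow the patch): both line pointers have to be valid before the
predecessor array is touched at all.

if (!lp_valid[ctx.offnum])
	continue;			/* nextoffnum would be garbage; nothing to record */

if (!lp_valid[nextoffnum])
	continue;			/* successor is not a tuple, so it cannot be a chain root */

predecessor[nextoffnum] = ctx.offnum;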

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: recovery modules

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 09:33:44AM -0800, Andres Freund wrote:
> And yes, I'd be for a patch to remove all of those assertions.

done

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 09fcca03d4a91be5757201cc311d3dad08463cd0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 Jan 2023 21:01:22 -0800
Subject: [PATCH v7 1/4] s/ArchiveContext/ArchiveCallbacks

---
 src/backend/postmaster/pgarch.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index e551af2905..065d7d1313 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -97,7 +97,7 @@ char	   *XLogArchiveLibrary = "";
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
-static ArchiveModuleCallbacks ArchiveContext;
+static ArchiveModuleCallbacks ArchiveCallbacks;
 
 
 /*
@@ -406,8 +406,8 @@ pgarch_ArchiverCopyLoop(void)
 			HandlePgArchInterrupts();
 
 			/* can't do anything if not configured ... */
-			if (ArchiveContext.check_configured_cb != NULL &&
-!ArchiveContext.check_configured_cb())
+			if (ArchiveCallbacks.check_configured_cb != NULL &&
+!ArchiveCallbacks.check_configured_cb())
 			{
 ereport(WARNING,
 		(errmsg("archive_mode enabled, yet archiving is not configured")));
@@ -508,7 +508,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = ArchiveContext.archive_file_cb(xlog, pathname);
+	ret = ArchiveCallbacks.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -814,7 +814,7 @@ HandlePgArchInterrupts(void)
 /*
  * LoadArchiveLibrary
  *
- * Loads the archiving callbacks into our local ArchiveContext.
+ * Loads the archiving callbacks into our local ArchiveCallbacks.
  */
 static void
 LoadArchiveLibrary(void)
@@ -827,7 +827,7 @@ LoadArchiveLibrary(void)
  errmsg("both archive_command and archive_library set"),
  errdetail("Only one of archive_command, archive_library may be set.")));
 
-	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+	memset(&ArchiveCallbacks, 0, sizeof(ArchiveModuleCallbacks));
 
 	/*
 	 * If shell archiving is enabled, use our special initialization function.
@@ -844,9 +844,9 @@ LoadArchiveLibrary(void)
 		ereport(ERROR,
 (errmsg("archive modules have to define the symbol %s", "_PG_archive_module_init")));
 
-	(*archive_init) (&ArchiveContext);
+	(*archive_init) (&ArchiveCallbacks);
 
-	if (ArchiveContext.archive_file_cb == NULL)
+	if (ArchiveCallbacks.archive_file_cb == NULL)
 		ereport(ERROR,
 (errmsg("archive modules must register an archive callback")));
 
@@ -859,6 +859,6 @@ LoadArchiveLibrary(void)
 static void
 pgarch_call_module_shutdown_cb(int code, Datum arg)
 {
-	if (ArchiveContext.shutdown_cb != NULL)
-		ArchiveContext.shutdown_cb();
+	if (ArchiveCallbacks.shutdown_cb != NULL)
+		ArchiveCallbacks.shutdown_cb();
 }
-- 
2.25.1

From 870999a50d3372d9ed33583b4d6eac05a2accada Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 Jan 2023 21:16:34 -0800
Subject: [PATCH v7 2/4] move archive module exports to dedicated headers

---
 contrib/basic_archive/basic_archive.c   |  2 +-
 src/backend/postmaster/pgarch.c |  2 ++
 src/backend/postmaster/shell_archive.c  |  3 +-
 src/backend/utils/misc/guc_tables.c |  1 +
 src/include/postmaster/archive_module.h | 47 +
 src/include/postmaster/pgarch.h | 39 
 src/include/postmaster/shell_archive.h  | 24 +
 7 files changed, 77 insertions(+), 41 deletions(-)
 create mode 100644 src/include/postmaster/archive_module.h
 create mode 100644 src/include/postmaster/shell_archive.h

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 3d29711a31..87bbb2174d 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -32,7 +32,7 @@
 
 #include "common/int.h"
 #include "miscadmin.h"
-#include "postmaster/pgarch.h"
+#include "postmaster/archive_module.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "utils/guc.h"
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 065d7d1313..281d9fd8b7 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -34,8 +34,10 @@
 #include "lib/binaryheap.h"
 #include "libpq/pqsignal.h"
 #include "pgstat.h"
+#include "postmaster/archive_module.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/pgarch.h"
+#include "postmaster/shell_archive.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/latch.h"
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index 806b81c3f2..35f88c1222 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c

Re: make_ctags: use -I option to ignore pg_node_attr macro

2023-02-08 Thread Fujii Masao



On 2023/02/08 20:17, Tatsuo Ishii wrote:

Attached is the v2 patch.


Thanks for the patch!

With the patch, I got the following error when executing make_etags:

$ ./src/tools/make_etags
etags: invalid option -- 'e'
Try 'etags --help' for a complete list of options.
sort: No such file or directory


Oops. Thank you for pointing it out. BTW, just out of curiosity, do
you have etags on your Mac?


Yes.

$ etags --version
etags (GNU Emacs 28.2)
Copyright (C) 2022 Free Software Foundation, Inc.
This program is distributed under the terms in ETAGS.README



This is a comment on commit d1e2a380cb. I found that make_etags with
an invalid option reported the following usage message mentioning
make_ctags (not make_etags). Isn't this confusing?

$ ./src/tools/make_etags -a
Usage: /.../make_ctags [-e][-n]


That's hard to fix without some code duplication. We decided that
make_etags is not a symlink to make_ctags but rather execs make_ctags. Of
course we could let make_etags perform the same option check, but I
doubt it's worth the trouble.


How about just applying the following into make_etags?

+if [ $# -gt 1 ] || ( [ $# -eq 1 ] && [ $1 != "-n" ] )
+then   echo "Usage: $0 [-n]"
+   exit 1
+fi



Anyway, attached is the v3 patch.


Thanks for updating the patch!

With the patch, make_etags caused the following error messages
on my macOS system.

$ ./src/tools/make_etags
No such file or directory
No such file or directory

To fix this error, probably we should get rid of the double-quotes
from "$FLAGS" "$IGNORE_IDENTIFIES" in the following command.

-   xargs ctags $MODE -a -f $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"
+   xargs $PROG $MODE $TAGS_OPT $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"



+	else	ctags --help 2>&1 | grep -- -e >/dev/null
+	# Note that "ctags --help" does not always work. Even certain ctags
+	# does not have the option.

This code seems to assume that there is a non-Exuberant ctags
supporting the -e option. But does such a ctags really exist?


I fixed the above issues and refactored the code.
Attached is the updated version of the patch. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 102881667b..aa7d7b573f 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -9,15 +9,19 @@ then  echo $usage
exit 1
 fi
 
-MODE=
+EMACS_MODE=
 NO_SYMLINK=
+IS_EXUBERANT=
+PROG="ctags"
+TAGS_OPT="-a -f"
 TAGS_FILE="tags"
+FLAGS=
+IGNORE_IDENTIFIES=
 
 while [ $# -gt 0 ]
 do
 	if [ $1 = "-e" ]
-	then	MODE="-e"
-		TAGS_FILE="TAGS"
+	then	EMACS_MODE="Y"
 	elif [ $1 = "-n" ]
 	then	NO_SYMLINK="Y"
 	else
@@ -27,15 +31,24 @@ do
shift
 done
 
-command -v ctags >/dev/null || \
+if [ ! "$EMACS_MODE" ]
+then   (command -v ctags >/dev/null) || \
{ echo "'ctags' program not found" 1>&2; exit 1; }
+fi
 
-trap "ret=$?; rm -rf /tmp/$$; exit $ret" 0 1 2 3 15
-rm -f ./$TAGS_FILE
-
-IS_EXUBERANT=""
 ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"
 
+if [ "$EMACS_MODE" ]
+then   TAGS_FILE="TAGS"
+   if [ "$IS_EXUBERANT" ]
+   then PROG="ctags -e"
+	else	(command -v etags >/dev/null) || \
+		{ echo "neither 'etags' nor exuberant 'ctags' program not found" 1>&2; exit 1; }
+   PROG="etags"
+   TAGS_OPT="-a -o"
+   fi
+fi
+
 # List of kinds supported by Exuberant Ctags 5.8
 # generated by ctags --list-kinds
 # --c-kinds was called --c-types before 2003
@@ -56,20 +69,23 @@ ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"
 
 if [ "$IS_EXUBERANT" ]
 then   FLAGS="--c-kinds=+dfmstuv"
-else   FLAGS="-dt"
+elif [ ! "$EMACS_MODE" ]
+then   FLAGS="-dt"
 fi
 
 # Use -I option to ignore a macro
 if [ "$IS_EXUBERANT" ]
 then   IGNORE_IDENTIFIES="-I pg_node_attr+"
-else   IGNORE_IDENTIFIES=
 fi
 
+trap "ret=$?; rm -rf /tmp/$$; exit $ret" 0 1 2 3 15
+rm -f ./$TAGS_FILE
+
 # this is outputting the tags into the file 'tags', and appending
 find `pwd`/ \( -name tmp_install -prune -o -name tmp_check -prune \) \
	-o \( -name "*.[chly]" -o -iname "*makefile*" -o -name "*.mk" -o -name "*.in" \
-o -name "*.sql" -o -name "*.p[lm]" \) -type f -print |
-   xargs ctags $MODE -a -f $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"
+   xargs $PROG $TAGS_OPT $TAGS_FILE $FLAGS $IGNORE_IDENTIFIES
 
 # Exuberant tags has a header that we cannot sort in with the other entries
 # so we skip the sort step
diff --git a/src/tools/make_etags b/src/tools/make_etags
index afc57e3e89..7e51bb4c4e 100755
--- a/src/tools/make_etags
+++ b/src/tools/make_etags
@@ -1,5 +1,11 @@
 #!/bin/sh
-# src/tools/make_etags
+
+# src/tools/make_etags [-n]
+
+if [ $# -gt 1 ] || ( [ $# -eq 1 ] && [ $1 != "-n" ] )
+then   echo "Usage: $0 [-n]"
+   exit 1
+fi
 
 cdir=`dirname $0`
 dir=`(cd $cdir && pwd)`


Re: Logical replication timeout problem

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 13:36:02 +0530, Amit Kapila wrote:
> On Wed, Feb 8, 2023 at 10:57 AM Andres Freund  wrote:
> >
> > On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > > I am planning to push this to HEAD sometime next week (by Wednesday).
> > > To backpatch this, we need to fix it in some non-standard way, like
> > > without introducing a callback which I am not sure is a good idea. If
> > > some other committers vote to get this in back branches with that or
> > > some different idea that can be backpatched then we can do that
> > > separately as well. I don't see this as a must-fix in back branches
> > > because we have a workaround (increase timeout) or users can use the
> > > streaming option (for >=14).
> >
> > I just saw the commit go in, and a quick scan over it makes me think neither
> > this commit, nor f95d53eded, which unfortunately was already backpatched, is
> > the right direction. The wrong direction likely started quite a bit earlier,
> > with 024711bb544.
> >
> > It feels quite fundamentally wrong that basically every output plugin needs
> > to call a special function in nearly every callback.
> >
> > In 024711bb544 there was just one call to OutputPluginUpdateProgress() in
> > pgoutput.c. Quite tellingly, it just updated pgoutput, without touching
> > test_decoding.
> >
> > Then a8fd13cab0b added two more calls. 63cf61cdeb7 yet another.
> >
> 
> I think the original commit 024711bb544 forgets to call it in
> test_decoding and the other commits followed the same and missed to
> update test_decoding.

I think that's a symptom of the wrong architecture having been chosen. This
should *never* have been the task of output plugins.


> > I don't think:
> > /*
> >  * Wait until there is no pending write. Also process replies from the other
> >  * side and check timeouts during that.
> >  */
> > static void
> > ProcessPendingWrites(void)
> >
> > Is really a good name. What are we processing?
> >
> 
> It is for sending the keep_alive message (if required). That is
> normally done when we skipped processing a transaction to ensure sync
> replication is not delayed.

But how is that "processing pending writes"? For me "processing" implies we're
doing some analysis on them or such.


If we want to write data in WalSndUpdateProgress(), shouldn't we move the
common code of WalSndWriteData() and WalSndUpdateProgress() into
ProcessPendingWrites()?


> It has been discussed previously [1][2] to
> extend the WalSndUpdateProgress() interface. Basically, as explained
> by Craig [2], this has to be done from the plugin, as it can do filtering
> or there could be other reasons why the output plugin skips all
> changes. We used the same interface for sending keep-alive message
> when we processed a lot of (DDL) changes without sending anything to
> plugin.
>
> [1] - 
> https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
> [2] - 
> https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com

I don't buy that this has to be done by the output plugin. The actual sending
out of data happens via the LogicalDecodingContext callbacks, so we very well
can know whether we recently did send out data or not.

This really is a concern of the LogicalDecodingContext; it has pretty much
nothing to do with output plugins.  We should remove all calls of
OutputPluginUpdateProgress() from pgoutput, and add the necessary calls to
LogicalDecodingContext->update_progress() to generic code.
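
Condensed to a sketch (the helper name is invented; the signature follows
the prototype attached later in this thread), the generic layer rather than
each plugin would report progress after invoking a callback:

static void
update_progress_after_callback(LogicalDecodingContext *ctx,
							   XLogRecPtr lsn, TransactionId xid,
							   bool finished_xact)
{
	if (ctx->update_progress)
		ctx->update_progress(ctx, lsn, xid, ctx->did_write, finished_xact);
	ctx->did_write = false;
}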

Additionally we should either rename WalSndUpdateProgress(), because it's now
doing *far* more than "updating progress", or alternatively, split it into two
functions.


I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
consider what happens if e.g. the origin filter filters out entire
transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
we'll be lucky because we'll return quickly to XLogSendLogical(), but not
reliably.


Greetings,

Andres Freund




Re: A bug with ExecCheckPermissions

2023-02-08 Thread Alvaro Herrera
On 2023-Feb-08, Amit Langote wrote:

> On Wed, Feb 8, 2023 at 16:19 Alvaro Herrera  wrote:

> > I think we should also patch ExecCheckPermissions to use forboth(),
> > scanning the RTEs as it goes over the perminfos, and make sure that the
> > entries are consistent.
> 
> Hmm, we can’t use forboth here, because not all RTEs have a corresponding
> RTEPermissionInfo; inheritance children RTEs, for example.

Doh, of course.

> Also, it doesn’t make much sense to reinstate the original loop over
> range table and fetch the RTEPermissionInfo for the RTEs with non-0
> perminfoindex, because the main goal of the patch was to make
> ExecCheckPermissions() independent of range table length.

Yeah, I'm thinking of a mechanism that would allow us to detect bugs in
development builds; no need to have it run in production builds.
However, I can't see any useful way to implement it.
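
(For what it's worth, one conceivable shape, as a sketch only: an
assert-build cross-check that every RTE carrying a perminfoindex resolves
to an RTEPermissionInfo for the same relation. Whether it would catch
enough bugs to be worth having is exactly the open question.)

#ifdef USE_ASSERT_CHECKING
	foreach(l, rangeTable)
	{
		RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);

		if (rte->perminfoindex != 0)
		{
			RTEPermissionInfo *perminfo =
				getRTEPermissionInfo(rteperminfos, rte);

			Assert(perminfo->relid == rte->relid);
		}
	}
#endif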

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)




Re: recovery modules

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 09:57:56 -0800, Nathan Bossart wrote:
> On Wed, Feb 08, 2023 at 09:33:44AM -0800, Andres Freund wrote:
> > And yes, I'd be for a patch to remove all of those assertions.
> 
> done

If you'd reorder it so that 0004 applies independently from the other changes,
I'd just push that now.


I was remembering additional AssertVariableIsOfType(), but it looks like we
actually did remember to take them out when redesigning the output plugin
interface...

Greetings,

Andres Freund




Re: Logical replication timeout problem

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> consider what happens if e.g. the origin filter filters out entire
> transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
> we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> reliably.

Is it actually the right thing to check SyncRepRequested() in that logic? It's
quite common to set up syncrep so that individual users or transactions opt
into syncrep, but to leave the default disabled.

I don't really see an alternative to making this depend solely on
sync_standbys_defined.

Greetings,

Andres Freund




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-08 Thread Peter Smith
On Wed, Feb 8, 2023 at 8:03 PM Hayato Kuroda (Fujitsu)
 wrote:
>
...
> > ==
> >
> > src/backend/replication/logical/worker.c
> >
> > 2. maybe_apply_delay
> >
> > + if (wal_receiver_status_interval > 0 &&
> > + diffms > wal_receiver_status_interval * 1000L)
> > + {
> > + WaitLatch(MyLatch,
> > +   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> > +   wal_receiver_status_interval * 1000L,
> > +   WAIT_EVENT_RECOVERY_APPLY_DELAY);
> > + send_feedback(last_received, true, false, true);
> > + }
> >
> > I felt that introducing another variable like:
> >
> > long statusinterval_ms = wal_receiver_status_interval * 1000L;
> >
> > would help here by doing 2 things:
> > 1) The condition would be easier to read because the ms units would be the 
> > same
> > 2) Won't need * 1000L repeated in two places.
> >
> > Only, do take care to assign this variable in the right place in this
> > loop in case the configuration is changed.
>
> Fixed. Calculations are done on two lines: the first is at the entrance
> of the loop, and the second is after SIGHUP is detected.
>

TBH, I expected you would write this as just a *single* variable
assignment before the condition like below:

SUGGESTION (tweaked comment and put single assignment before condition)
/*
 * Call send_feedback() to prevent the publisher from exiting by
 * timeout during the delay, when the status interval is greater than
 * zero.
 */
status_interval_ms = wal_receiver_status_interval * 1000L;
if (status_interval_ms > 0 && diffms > status_interval_ms)
{
...

~

I understand that in theory your code is more efficient, but in practice
I think the overhead of a single variable assignment per loop iteration
(which is doing WaitLatch anyway) is insignificant, whereas having one
assignment is simpler than having two IMO.

But, if you want to keep it the way you have then that is OK.

Otherwise, this patch v32 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: recovery modules

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 10:24:18AM -0800, Andres Freund wrote:
> If you'd reorder it so that 0004 applies independently from the other changes,
> I'd just push that now.

done

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 688566e23519f173a805696e4f00345f30d5d8e6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 8 Feb 2023 09:51:46 -0800
Subject: [PATCH v8 1/4] remove unnecessary assertions for functions with
 central declarations

---
 contrib/basic_archive/basic_archive.c   | 2 --
 contrib/test_decoding/test_decoding.c   | 2 --
 src/backend/postmaster/shell_archive.c  | 2 --
 src/backend/replication/pgoutput/pgoutput.c | 2 --
 4 files changed, 8 deletions(-)

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 3d29711a31..36b7a4814a 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -81,8 +81,6 @@ _PG_init(void)
 void
 _PG_archive_module_init(ArchiveModuleCallbacks *cb)
 {
-	AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
-
 	cb->check_configured_cb = basic_archive_configured;
 	cb->archive_file_cb = basic_archive_file;
 }
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e523d22eba..b7e6048647 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -127,8 +127,6 @@ _PG_init(void)
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
 {
-	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
-
 	cb->startup_cb = pg_decode_startup;
 	cb->begin_cb = pg_decode_begin_txn;
 	cb->change_cb = pg_decode_change;
diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c
index 806b81c3f2..7771b951b7 100644
--- a/src/backend/postmaster/shell_archive.c
+++ b/src/backend/postmaster/shell_archive.c
@@ -29,8 +29,6 @@ static void shell_archive_shutdown(void);
 void
 shell_archive_init(ArchiveModuleCallbacks *cb)
 {
-	AssertVariableIsOfType(&shell_archive_init, ArchiveModuleInit);
-
 	cb->check_configured_cb = shell_archive_configured;
 	cb->archive_file_cb = shell_archive_file;
 	cb->shutdown_cb = shell_archive_shutdown;
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 73b080060d..98377c094b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -248,8 +248,6 @@ static void pgoutput_column_list_init(PGOutputData *data,
 void
 _PG_output_plugin_init(OutputPluginCallbacks *cb)
 {
-	AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
-
 	cb->startup_cb = pgoutput_startup;
 	cb->begin_cb = pgoutput_begin_txn;
 	cb->change_cb = pgoutput_change;
-- 
2.25.1

From c09374708bdde1eab98fdcad320d5689c6764bc2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 Jan 2023 21:01:22 -0800
Subject: [PATCH v8 2/4] s/ArchiveContext/ArchiveCallbacks

---
 src/backend/postmaster/pgarch.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index e551af2905..065d7d1313 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -97,7 +97,7 @@ char	   *XLogArchiveLibrary = "";
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
-static ArchiveModuleCallbacks ArchiveContext;
+static ArchiveModuleCallbacks ArchiveCallbacks;
 
 
 /*
@@ -406,8 +406,8 @@ pgarch_ArchiverCopyLoop(void)
 			HandlePgArchInterrupts();
 
 			/* can't do anything if not configured ... */
-			if (ArchiveContext.check_configured_cb != NULL &&
-!ArchiveContext.check_configured_cb())
+			if (ArchiveCallbacks.check_configured_cb != NULL &&
+!ArchiveCallbacks.check_configured_cb())
 			{
 ereport(WARNING,
 		(errmsg("archive_mode enabled, yet archiving is not configured")));
@@ -508,7 +508,7 @@ pgarch_archiveXlog(char *xlog)
 	snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
 	set_ps_display(activitymsg);
 
-	ret = ArchiveContext.archive_file_cb(xlog, pathname);
+	ret = ArchiveCallbacks.archive_file_cb(xlog, pathname);
 	if (ret)
 		snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog);
 	else
@@ -814,7 +814,7 @@ HandlePgArchInterrupts(void)
 /*
  * LoadArchiveLibrary
  *
- * Loads the archiving callbacks into our local ArchiveContext.
+ * Loads the archiving callbacks into our local ArchiveCallbacks.
  */
 static void
 LoadArchiveLibrary(void)
@@ -827,7 +827,7 @@ LoadArchiveLibrary(void)
  errmsg("both archive_command and archive_library set"),
  errdetail("Only one of archive_command, archive_library may be set.")));
 
-	memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks));
+	memset(&ArchiveCallbacks, 0, sizeof(ArchiveModuleCallbacks));
 
 	/*
 	 * If shell archiving is enabled, use our special initialization function.
@@ -844,

Re: Deadlock between logrep apply worker and tablesync worker

2023-02-08 Thread Peter Smith
On Fri, Feb 3, 2023 at 6:58 PM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, February 2, 2023 7:21 PM Amit Kapila  
> wrote:
> >
> > On Thu, Feb 2, 2023 at 12:05 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, January 31, 2023 1:07 AM vignesh C 
> > wrote:
> > > > On Mon, 30 Jan 2023 at 17:30, vignesh C  wrote:
> > > >
> > >
> > > I also tried to test the time of "src/test/subscription/t/002_types.pl"
> > > before and after the patch (change the lock level) and Tom's patch
> > > (split transaction), like what Vignesh has shared on -hackers.
> > >
> > > I run about 100 times for each case. Tom's and the lock level patch
> > > behave similarly on my machines[1].
> > >
> > > HEAD: 3426 ~ 6425 ms
> > > HEAD + Tom: 3404 ~ 3462 ms
> > > HEAD + Vignesh: 3419 ~ 3474 ms
> > > HEAD + Tom + Vignesh: 3408 ~ 3454 ms
> > >
> > > Even apart from the testing time reduction, reducing the lock level
> > > and locking the specific object can also help reduce lock contention,
> > > which users (that use the exposed function), the table sync worker, and
> > > the apply worker can all benefit from. So, I think pushing the patch to
> > > change the lock level makes sense.
> > >
> > > And the patch looks good to me.
> > >
> >
> > Thanks for the tests. I also see a reduction in test time variability with 
> > Vignesh's
> > patch. I think we can release the locks in case the origin is concurrently
> > dropped as in the attached patch. I am planning to commit this patch
> > tomorrow unless there are more comments or objections.
> >
> > > While on it, after pushing the patch, I think there is another case
> > > that might also be worth improving: the table sync and apply workers
> > > try to drop the same origin, which might cause some delay. This is
> > > another case (different from the deadlock), so I feel we can try to
> > > improve this in another patch.
> > >
> >
> > Right, I think that case could be addressed by Tom's patch to some extent 
> > but
> > I am thinking we should also try to analyze if we can completely avoid the 
> > need
> > to remove origins from both processes. One idea could be to introduce
> > another relstate something like PRE_SYNCDONE and set it in a separate
> > transaction before we set the state as SYNCDONE and remove the slot and
> > origin in tablesync worker.
> > Now, if the tablesync worker errors out due to some reason during the second
> > transaction, it can remove the slot and origin after restart by checking 
> > the state.
> > However, it would add another relstate which may not be the best way to
> > address this problem. Anyway, that can be accomplished as a separate patch.
>
> Here is an attempt to achieve the same.
> Basically, the patch removes the code that drops the origin in the apply
> worker, and adds a new state PRE_SYNCDONE that is entered after
> synchronization has finished and sublsn is set, but before dropping the
> origin and other final cleanups. The tablesync worker will restart and
> redo the cleanup if it failed after reaching the new state. Besides,
> since changes can already be applied on the table in the PRE_SYNCDONE
> state, I also modified the check in should_apply_changes_for_rel(), and
> some other conditions for the origin drop in subscription commands were
> adjusted in this patch.
>


BTW, tablesync.c has a large file header comment which describes all
about the relstates, including some examples. So this patch needs to
include modifications to that comment.
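
As a sketch of the idea upthread (the state letter and the adjusted
condition are assumptions for illustration, not taken from the patch):

/* new relstate between FINISHEDCOPY and SYNCDONE */
#define SUBREL_STATE_PRE_SYNCDONE 'p'	/* synchronized, but slot/origin
										 * cleanup still pending */

/* should_apply_changes_for_rel() would then treat it like SYNCDONE: */
return (rel->state == SUBREL_STATE_READY ||
		((rel->state == SUBREL_STATE_PRE_SYNCDONE ||
		  rel->state == SUBREL_STATE_SYNCDONE) &&
		 rel->statelsn <= remote_final_lsn));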

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Logical replication timeout problem

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> > consider what happens if e.g. the origin filter filters out entire
> > transactions. We'll afaics never get to WalSndUpdateProgress(). In some 
> > cases
> > we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> > reliably.
>
> Is it actually the right thing to check SyncRepRequested() in that logic? It's
> quite common to set up syncrep so that individual users or transactions opt
> into syncrep, but to leave the default disabled.
>
> I don't really see an alternative to making this depend solely on
> sync_standbys_defined.

Hacking on a rough prototype how I think this should rather look, I had a few
questions / remarks:

- We probably need to call UpdateProgress from a bunch of places in decode.c
  as well? Indicating that we're lagging by a lot, just because all
  transactions were in another database seems decidedly suboptimal.

- Why should lag tracking only be updated at commit-like points? That seems
  like it adds odd discontinuities?

- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems
  somewhat odd. They have very overlapping meanings IMO.

- there's no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there
  should be? It's legit progress.

- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite
  confusing, naming-wise. IIUC "reading" is about receiving confirmation
  messages, "writing" about the time the record was generated.  ISTM that the
  current time is a quite poor approximation in XLogSendPhysical(), but pretty
  much meaningless in WalSndUpdateProgress()? Am I missing something?

- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
  WalSndWriteData() missing wal_sender_timeout <= 0 checks?

- I don't really understand why f95d53eded added !end_xact to the if
  condition for ProcessPendingWrites(). Is the theory that we'll end up in an
  outer loop soon?


Attached is a current, quite rough, prototype. It addresses some of the points
raised, but far from all. There are also several XXXs/FIXMEs in it.  I changed
the file-ending to .txt to avoid hijacking the CF entry.

Greetings,

Andres Freund
>From 1d89c84b1465b28ddef8c110500c3744477488df Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 8 Feb 2023 11:57:37 -0800
Subject: [PATCH v1] WIP: Initial sketch of progress update rework

---
 src/include/replication/logical.h |   7 +-
 src/include/replication/output_plugin.h   |   1 -
 src/include/replication/reorderbuffer.h   |  12 --
 src/backend/replication/logical/logical.c | 188 ++
 .../replication/logical/reorderbuffer.c   |  20 --
 src/backend/replication/pgoutput/pgoutput.c   |  10 -
 src/backend/replication/walsender.c   |  14 +-
 7 files changed, 116 insertions(+), 136 deletions(-)

diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 5f49554ea05..472f2a5b84c 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
 typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite;
 typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingContext *lr,
 														  XLogRecPtr Ptr,
 														  TransactionId xid,
-														  bool skipped_xact
-);
+														  bool did_write,
+														  bool finished_xact);
 
 typedef struct LogicalDecodingContext
 {
@@ -105,10 +105,9 @@ typedef struct LogicalDecodingContext
 	 */
 	bool		accept_writes;
 	bool		prepared_write;
+	bool		did_write;
 	XLogRecPtr	write_location;
 	TransactionId write_xid;
-	/* Are we processing the end LSN of a transaction? */
-	bool		end_xact;
 } LogicalDecodingContext;
 
 
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 2d89d26586e..b9358e15444 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -245,6 +245,5 @@ typedef struct OutputPluginCallbacks
 /* Functions in replication/logical/logical.c */
 extern void OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write);
 extern void OutputPluginWrite(struct LogicalDecodingContext *ctx, bool last_write);

Re: Move defaults toward ICU in 16?

2023-02-08 Thread Jeff Davis
On Thu, 2023-02-02 at 18:10 -0500, Tom Lane wrote:
> Yeah.  I would be resistant to making ICU a required dependency,
> but it doesn't seem unreasonable to start moving towards it being
> our default collation support.

Patch attached.

To get the default locale, the patch initializes a UCollator with NULL
for the locale name, and then queries it for the locale name. Then it's
converted to a language tag, which is consistent with the initial
collation import. I'm not sure that's the best way, but it seems
reasonable.

If it's a user-provided locale (--icu-locale=), then the patch leaves
it as-is, and does not convert it to a language tag (consistent with
CREATE COLLATION and CREATE DATABASE).

I opened another discussion about whether we want to try harder to
validate or canonicalize the locale name:

https://www.postgresql.org/message-id/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.ca...@j-davis.com

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 1b7d940c0f12062185b8b42bf8d3c0a6f05a74d4 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 8 Feb 2023 12:06:26 -0800
Subject: [PATCH v1] Use ICU by default at initdb time.

If the ICU locale is not specified, initialize the default collator
and retrieve the locale name from that.

Discussion: https://postgr.es/m/510d284759f6e943ce15096167760b2edcb2e700.ca...@j-davis.com
---
 src/bin/initdb/initdb.c | 74 +++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7a58c33ace..7321652db3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -133,7 +134,11 @@ static char *lc_monetary = NULL;
 static char *lc_numeric = NULL;
 static char *lc_time = NULL;
 static char *lc_messages = NULL;
+#ifdef USE_ICU
+static char locale_provider = COLLPROVIDER_ICU;
+#else
 static char locale_provider = COLLPROVIDER_LIBC;
+#endif
 static char *icu_locale = NULL;
 static const char *default_text_search_config = NULL;
 static char *username = NULL;
@@ -2024,6 +2029,72 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+/*
+ * Check that ICU accepts the locale name; or if not specified, retrieve the
+ * default ICU locale.
+ */
+static void
+check_icu_locale()
+{
+#ifdef USE_ICU
+	UCollator	*collator;
+	UErrorCode   status;
+
+	status = U_ZERO_ERROR;
+	collator = ucol_open(icu_locale, &status);
+	if (U_FAILURE(status))
+	{
+		if (icu_locale)
+			pg_fatal("ICU locale \"%s\" could not be opened: %s",
+	 icu_locale, u_errorName(status));
+		else
+			pg_fatal("default ICU locale could not be opened: %s",
+	 u_errorName(status));
+	}
+
+	/* if not specified, get locale from default collator */
+	if (icu_locale == NULL)
+	{
+		const char	*default_locale;
+
+		status = U_ZERO_ERROR;
+		default_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE,
+			  &status);
+		if (U_FAILURE(status))
+		{
+			ucol_close(collator);
+			pg_fatal("could not determine default ICU locale");
+		}
+
+		if (U_ICU_VERSION_MAJOR_NUM >= 54)
+		{
+			const bool	 strict = true;
+			char		*langtag;
+			int			 len;
+
+			len = uloc_toLanguageTag(default_locale, NULL, 0, strict, &status);
+			langtag = pg_malloc(len + 1);
+			status = U_ZERO_ERROR;
+			uloc_toLanguageTag(default_locale, langtag, len + 1, strict,
+			   &status);
+
+			if (U_FAILURE(status))
+			{
+ucol_close(collator);
+pg_fatal("could not determine language tag for default locale \"%s\": %s",
+		 default_locale, u_errorName(status));
+			}
+
+			icu_locale = langtag;
+		}
+		else
+			icu_locale = pg_strdup(default_locale);
+	}
+
+	ucol_close(collator);
+#endif
+}
+
 /*
  * set up the locale variables
  *
@@ -2077,8 +2148,7 @@ setlocales(void)
 
 	if (locale_provider == COLLPROVIDER_ICU)
 	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
+		check_icu_locale();
 
 		/*
 		 * In supported builds, the ICU locale ID will be checked by the
-- 
2.34.1



Re: SLRUs in the main buffer pool - Page Header definitions

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 20:04:52 +, Bagga, Rishu wrote:
> To summarize, our underlying effort is to move the SLRUs to the buffer
> cache. We were working with Thomas Munro off a patch he introduced here
> [1]. Munro’s patch moves SLRUs to the buffer cache by introducing the
> pseudo db id 9 to denote SLRU pages, but maintains the current “raw”
> data format of SLRU pages. The addition of page headers in our patch
> resolves this issue [2] that Munro mentions in this email [3].
>
> Heikki Linnakangas then introduced a patch on top of Munro’s patch that
> modularizes the storage manager, allowing SLRUs to use it [4]. Instead
> of using db id 9, SLRUs use spcOid 9, and each SLRU is its own relation.
> Here, Heikki simplifies the storage manager by having each struct be
> responsible for just one fork of a relation; thus increasing
> extensibility of the smgr API, including for SLRUs. [5] We integrated
> our changes introducing page headers for SLRU pages, and upgrade logic,
> into Heikki’s latest patch.

That doesn't explain the bulk of the changes here.  Why e.g. does any of the
above require RelationGetSmgr() to handle the fork as well? Why do we need
smgrtruncate_multi()? And why does all of this happens as one patch?

As is, with a lot of changes mushed together, without distinct explanations
of what is done and why, this patch is essentially unreviewable. It'll not make
progress in this form.

It doesn't help that much to reference prior discussions in the email I'm
responding to - the patches need to be mostly understandable on their own,
without reading several threads.  And there needs to be explanations in the
code as well, otherwise we'll have no chance to understand any of this in a
few years.


> > > Rebased patch as per latest community changes since last email.
>
> > This version doesn't actually build.
> > https://cirrus-ci.com/task/4512310190931968
> > [19:43:20.131] FAILED:
> > src/test/modules/test_slru/test_slru.so.p/test_slru.c.o
>
>
> As for the build failures, I was using make install to build, and make
> check for regression tests.

There are a *lot* more tests than the main regression tests. You need to make
sure that they all continue to work. Enable the TAP tests and use check-world.


> The build failures in this link are from unit tests dependent on custom
> SLRUs, which would no longer apply. I’ve attached another patch that removes
> these tests.

I think you'd need to fix those tests, rather than just removing them.

I suspect we're going to continue to want SLRU specific stats, but your patch
also seems to remove those.


Greetings,

Andres Freund




Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Jim Jones
while working on another item of the TODO list I realized that I should 
be using a PG_TRY() block in the xmlDocDumpFormatMemory call.


Fixed in v5.

Best regards, Jim
From f503b25c7fd8d984d29536e78577741e5e7c5e9f Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v5] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml  | 34 +++
 src/backend/utils/adt/xml.c | 68 +
 src/include/catalog/pg_proc.dat |  3 +
 src/test/regress/expected/xml.out   | 93 +
 src/test/regress/expected/xml_1.out | 45 ++
 src/test/regress/sql/xml.sql| 27 +
 6 files changed, 270 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..9c7f5c85cb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,74 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*xmlbuf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+StringInfoData buf;
+PgXmlErrorContext *xmlerrcxt;
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
+
+PG_TRY();
+{
+
+int nbytes;
+
+/**
+ * xmlDocDumpFormatMemory(
+ *   xmlDocPtr doc, # the XML document
+ *   xmlChar **xmlbuf,  # the memory pointer
+ *   int  *nbytes,  # the memory length
+ *   int   format   # 1 = node indenting
+ *)
+ */
+
+xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+if(!nbytes || xmlerrcxt->err_occurred) {
+xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
+"could not indent the given XML document");
+}
+
+initStringInfo(&buf);
+appendStringInfoString(&buf, (const char *)xmlbuf);
+
+}
+PG_CATCH();
+{
+
+if(doc!=NULL)
+xmlFreeDoc(doc);
+if(xmlbuf!=NULL)
+xmlFree(xmlbuf);
+
+pg_xml_done(xmlerrcxt, true);
+
+PG_RE_THROW();
+
+}
+PG_END_TRY();
+
+xmlFreeDoc(doc);
+xmlFree(xmlbuf);
+
+pg_xml_done(xmlerrcxt, false);
+
+PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..afaa83941b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,96 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | 
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('<breakfast_menu><food><name>Belgian Waffles</name><price>$5.95</price><description>Two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food></breakfast_menu>')::xml;
+                                            xmlpretty
+---------------------------------------------------------------------------------------------------
+ <breakfast_menu>                                                                                 +
+   <food>                                                                                         +
+     <name>Belgian Waffles</name>                                                                 +
+     <price>$5.95</price>                                                                         +
+     <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description> +
+     <calories>650</calories>                                                                     +
+   </food>                                                                                        +

Re: Can we do something to help stop users mistakenly using force_parallel_mode?

2023-02-08 Thread David Rowley
On Thu, 2 Feb 2023 at 01:24, John Naylor  wrote:
>
>
> On Wed, Feb 1, 2023 at 6:41 PM David Rowley  wrote:
> >
> > I don't really share Laurenz's worry [2] about compatibility break
> > from renaming this GUC. I think the legitimate usages of this setting
> > are probably far more rare than the illegitimate ones. I'm not overly
> > concerned about renaming if it helps stop people from making this
> > mistake. I believe the current name is just too conveniently named and
> > that users are likely just to incorrectly assume it does exactly what
> > they want because what else could it possibly do?!
> >
> > I think something like debug_parallel_query is much less likely to be 
> > misused.
>
> +1 on both points.

I've attached a patch which does the renaming to debug_parallel_query.
I've made it so the old name can still be used.  This is only intended
to temporarily allow backward compatibility until buildfarm member
owners can change their configs to use debug_parallel_query instead of
force_parallel_mode.

David
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d190be1925..9fe9fef1a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11091,17 +11091,17 @@ dynamic_library_path = 
'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
- 
-  force_parallel_mode (enum)
+ 
+  debug_parallel_query (enum)
   
-   force_parallel_mode configuration 
parameter
+   debug_parallel_query configuration 
parameter
   
   
   

 Allows the use of parallel queries for testing purposes even in cases
 where no performance benefit is expected.
-The allowed values of force_parallel_mode are
+The allowed values of debug_parallel_query are
 off (use parallel mode only when it is expected to 
improve
 performance), on (force parallel query for all 
queries
 for which it is thought to be safe), and regress 
(like
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 117d097390..a08c7a78af 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -370,7 +370,7 @@ make check LANG=C ENCODING=EUC_JP
 set in the PGOPTIONS environment variable (for settings
 that allow this):
 
-make check PGOPTIONS="-c force_parallel_mode=regress -c work_mem=50MB"
+make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"
 
 When running against a temporary installation, custom settings can also be
 set by supplying a pre-written postgresql.conf:
diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index 9e3ec0d5d8..37e3fcd03a 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1156,7 +1156,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, 
StringInfo msg)
 * because it causes test-result instability 
depending on
 * whether a parallel worker is actually used 
or not.)
 */
-   if (force_parallel_mode != 
FORCE_PARALLEL_REGRESS)
+   if (debug_parallel_query != 
FORCE_PARALLEL_REGRESS)
{
if (edata.context)
edata.context = 
psprintf("%s\n%s", edata.context,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbbf28cf06..e57bda7b62 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -756,8 +756,8 @@ ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc)
/*
 * Sometimes we mark a Gather node as "invisible", which means that it's
 * not to be displayed in EXPLAIN output.  The purpose of this is to 
allow
-* running regression tests with force_parallel_mode=regress to get the
-* same results as running the same tests with force_parallel_mode=off.
+* running regression tests with debug_parallel_query=regress to get the
+* same results as running the same tests with debug_parallel_query=off.
 * Such marking is currently only supported on a Gather at the top of 
the
 * plan.  We skip that node, and we must also hide per-worker detail 
data
 * further down in the plan tree.
diff --git a/src/backend/optimizer/plan/planmain.c 
b/src/backend/optimizer/plan/planmain.c
index 4c17407e5d..63009f6e87 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -114,12 +114,12 @@ query_planner(PlannerInfo *root,
 * Anything parallel-restricted in the query 
tlist will be
 * dealt with later.)  This is normally pretty 
silly, because
 * a Result-only plan would never be 
interesting to
-* para

Re: daitch_mokotoff module

2023-02-08 Thread Tomas Vondra



On 2/8/23 15:31, Dag Lem wrote:
> Alvaro Herrera  writes:
> 
>> On 2023-Jan-17, Dag Lem wrote:
>>
>>> + * Daitch-Mokotoff Soundex
>>> + *
>>> + * Copyright (c) 2021 Finance Norway
>>> + * Author: Dag Lem 
>>
>> Hmm, I don't think we accept copyright lines that aren't "PostgreSQL
>> Global Development Group".  Is it okay to use that, and update the year
>> to 2023?  (Note that answering "no" very likely means your patch is not
>> candidate for inclusion.)  Also, we tend not to have "Author:" lines.
>>
> 
> You'll have to forgive me for not knowing about this rule:
> 
>   grep -ER "Copyright.*[0-9]{4}" contrib/ | grep -v PostgreSQL
> 
> In any case, I have checked with the copyright owner, and it would be OK
> to assign the copyright to "PostgreSQL Global Development Group".
> 

I'm not entirely sure what's the rule either, and I'm a committer. My
guess is these cases are either old and/or adding a code that already
existed elsewhere (like some of the double metaphone, for example), or
maybe both. But I'd bet we'd prefer not adding more ...

> To avoid going back and forth with patches, how do you propose that the
> sponsor and the author of the contributed module should be credited?
> Woule something like this be acceptable?
> 

We generally credit contributors in two ways - by mentioning them in the
commit message, and by listing them in the release notes (for individual
features).

> /*
>  * Daitch-Mokotoff Soundex
>  *
>  * Copyright (c) 2023, PostgreSQL Global Development Group
>  *
>  * This module was sponsored by Finance Norway / Trafikkforsikringsforeningen
>  * and implemented by Dag Lem 
>  *
>  ...
> 
> [...]
> 
>>
>> We don't keep a separate copyright statement in the file; rather we
>> assume that all files are under the PostgreSQL license, which is in the
>> COPYRIGHT file at the top of the tree.  Changing it thus has the side
>> effect that these disclaim notes refer to the University of California
>> rather than "the Author".  IANAL.
> 
> OK, no problem. Note that you will again find counterexamples under
> contrib/ (and in some other places):
> 
>   grep -R "Permission to use" .
> 
>> I think we should add SPDX markers to all the files we distribute:
>> /* SPDX-License-Identifier: PostgreSQL */
>>
>> https://spdx.dev/ids/
>> https://spdx.org/licenses/PostgreSQL.html
> 
> As far as I can tell, this is not included in any file so far, and is
> thus better left to decide and implement by someone else.
> 

I don't think Alvaro was suggesting this patch should do that. It was
more a generic comment about what the project as a whole might do.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Robert Haas
On Wed, Feb 8, 2023 at 12:43 PM Nathan Bossart  wrote:
> I think this could be a good approach if we decide not to bake too much
> into PostgreSQL itself (e.g., such as creating multiple archive workers
> that each call out to the module).  Archive module authors would
> effectively need to write their own archiver processes.  That sounds super
> flexible, but it also sounds like it might be harder to get right.

Yep. That's a problem, and I'm certainly open to better ideas.

However, if we assume that the archive module is likely to be doing
something like juggling a bunch of file descriptors over which it is
speaking HTTP, what other model works, really? It might be juggling
those file descriptors indirectly, or it might be relying on an
intermediate library like curl or something from Amazon that talks to
S3 or whatever, but only it knows what resources it's juggling, or
what functions it needs to call to manage them. On the other hand, we
don't really need a lot from it. We need it to CHECK_FOR_INTERRUPTS()
and handle that without leaking resources or breaking the world in
some way, and we sort of need it to, you know, actually archive stuff,
but apart from that I guess it can do what it likes (unless I'm
missing some other important function of the archiver?).

It's probably a good idea if the archiver function returns when it's
fully caught up and there's no more work to do. Then we could handle
decisions about hibernation in the core code, rather than having every
archive module invent its own way of doing that. But when there's work
happening, as far as I can see, the archive module needs to have
control pretty nearly all the time, or it's not going to be able to do
anything clever.
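
To make that concrete, here's a rough sketch of the sort of
module-driven loop I have in mind (the helper names are made up for
illustration; only CHECK_FOR_INTERRUPTS() is the real macro):

/*
 * Hypothetical archiver loop owned by the module: it keeps control while
 * there is work, drives its own file descriptors, and returns once fully
 * caught up so the core code can decide about hibernation.
 */
static void
my_archiver_loop(void)
{
	char		xlogpath[MAXPGPATH];

	for (;;)
	{
		CHECK_FOR_INTERRUPTS();		/* must not leak resources on exit */

		if (!next_wal_file_to_archive(xlogpath))	/* made-up helper */
			break;				/* caught up: return control to core */

		queue_async_upload(xlogpath);	/* e.g. hand off to a curl multi handle */
		drive_pending_uploads();		/* service the juggled descriptors */
	}
}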

Always happy to hear if you see it differently

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-02-08 Thread Tomas Vondra
On 2/8/23 14:55, Hans Buschmann wrote:
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
> 
> 
> 1. Background
> 
> PG 15.1 on Windows x64 (OS seems no to have no meening here)
> 
> 
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
> 
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (one tenth of the real value). This is OK for estimation.
> 
> 
> These 4 CTEs are then used to combine the data by joining them.
> 
> 
> 2. Problem
> 
> The 4 CTEs are joined by left joins as shown below:
>
...
> 
> This case is really what led me to detect the problem!
> 
> The original query and data are not shown here, but the principle should
> be clear from the execution plans.
> 
> I think the planner shouldn't change the row estimations in further
> steps after left joins at all, and should be a bit more conservative on
> inner joins.

But the code should already do exactly that, see:

https://github.com/postgres/postgres/blob/dbe8a1726cfd5a09cf1ef99e76f5f89e2efada71/src/backend/optimizer/path/costsize.c#L5212

And in fact, the second part of the plans shows it's doing the trick:

 ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104)
(actual time=2.321..2.556 rows=1415 loops=1)
 Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND
 ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
 ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
 ->  Sort  (cost=651.57..666.11 rows=5816 width=72)

But notice the first join (with rows=33) doesn't say "Left". And I see
there's Append on top, so presumably the query is much more complex, and
there's a regular join of these CTEs in some other part.

We'll need to se the whole query, not just one chunk of it.

FWIW it seems you're using materialized CTEs - that's likely pretty bad
for the estimates, because we don't propagate statistics from the CTE.
So a join on CTEs can't see statistics from the underlying tables, and
that can easily produce really bad estimates.

I'm assuming you're not using AS MATERIALIZED explicitly, so I'd bet
this happens because the "cardinality" function is marked as volatile.
Perhaps it can be redefined as stable/immutable.

> This may be related to the fact that this case has 2 join-conditions
> (xx_season an xx_code).

That shouldn't affect outer join estimates this way (but as I explained
above, the join does not seem to be "left" per the explain).
Multi-column joins can cause issues, no doubt about it - but CTEs make
it worse because we can't e.g. see foreign keys.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: OpenSSL 3.0.0 vs old branches

2023-02-08 Thread Andrew Dunstan


On 2023-02-08 We 10:42, Peter Eisentraut wrote:

On 07.02.23 19:28, Tom Lane wrote:

I think Peter's misremembering the history, and OpenSSL 3*is*
supported in these branches.  There could be an argument for
not back-patching f0d2c65f17 on the grounds that pre-1.1.1 is
also supported there.  On the whole though, it seems more useful
today for that test to pass with 3.x than for it to pass with 0.9.8.


Ok, let's do it.



Done


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-08 Thread Peter Geoghegan
On Tue, Feb 7, 2023 at 6:47 PM Andres Freund  wrote:
> One thing I'm not quite sure what to do about is that we atm use a hardcoded
> DEBUG2 (not controlled by VERBOSE) in a bunch of places:
>
> ereport(DEBUG2,
> (errmsg("table \"%s\": removed %lld dead item 
> identifiers in %u pages",
> vacrel->relname, (long long) index, 
> vacuumed_pages)));
>
> ivinfo.message_level = DEBUG2;
>
> I find DEBUG2 hard to use to run the entire regression tests, it results in a
> lot of output. Lots of it far less important than these kinds of details
> here. So I'd like to use a different log level for them - but without further
> complications that'd mean they'd show up in VACUUUM VERBOSE.

I think that these DEBUG2 traces are of limited value, even for
experts. I personally never use them -- I just use VACUUM
VERBOSE/autovacuum logging for everything, since it's far easier to
read, and isn't missing something that the DEBUG2 traces have. In fact
I wonder if we should even have them at all.

I generally don't care about the details when VACUUM runs out of space
for dead_items -- these days the important thing is to just avoid it
completely. I also generally don't care about how many index tuples
were deleted by each index's ambulkdelete() call, since VACUUM VERBOSE
already shows me the number of LP_DEAD stubs encountered/removed in
the heap. I can see the size of indexes and information about page
deletion in VACUUM VERBOSE these days, too.

Don't get me wrong. It *would* be useful to see more information about
each index in VACUUM VERBOSE output -- just not the number of tuples
deleted. Tuples really don't matter much at this level. But seeing
something about the number of WAL records written while vacuuming each
index is another story. That's a cost that is likely to vary in
possibly-interesting ways amongst indexes on the table, unlike
IndexBulkDeleteResult.tuples_removed, which is very noisy, and
signifies almost nothing important on its own.

-- 
Peter Geoghegan




Re: WAL Insertion Lock Improvements

2023-02-08 Thread Nathan Bossart
+   pg_atomic_exchange_u64(valptr, val);

nitpick: I'd add a (void) at the beginning of these calls to
pg_atomic_exchange_u64() so that it's clear that we are discarding the
return value.

+   /*
+* Update the lock variable atomically first, without having to acquire
+* the wait list lock, so that anyone looking for the lock has a chance
+* to grab it a bit more quickly.
+*
+* NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+* pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64
+* is a full barrier, we're guaranteed that the subsequent atomic read of
+* the lock state to check if it has any waiters happens after we set the
+* lock variable to the new value here. Without a barrier, we could end
+* up missing waiters that otherwise should have been woken up.
+*/
+   pg_atomic_exchange_u64(valptr, val);
+
+   /*
+* Quick exit when there are no waiters. This avoids unnecessary
+* acquisition and release of the lwlock's wait list lock.
+*/
+   if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
+   return;

I think this makes sense.  A waiter could queue itself after the exchange,
but it'll recheck after queueing.  IIUC this is basically how this works
today.  We update the value and release the lock before waking up any
waiters, so the same principle applies.

Overall, I think this patch is in reasonable shape.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: First draft of back-branch release notes is done

2023-02-08 Thread Justin Pryzby
On Fri, Feb 03, 2023 at 02:32:39PM -0500, Tom Lane wrote:
> ... at
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f282b026787da69d88a35404cf62f1cc21cfbb7c
> 
> As usual, please send corrections/comments by Sunday.

It's of no concern, but I was curious why this one wasn't included:

commit 72aea955d49712a17c08748aa9abcbcf98c32fc5
Author: Thomas Munro 
Date:   Fri Jan 6 16:38:46 2023 +1300

Fix pg_truncate() on Windows.

Commit 57faaf376 added pg_truncate(const char *path, off_t length), but
"length" was ignored under WIN32 and the file was unconditionally
truncated to 0.

There was no live bug, since the only caller passes 0.

Fix, and back-patch to 14 where the function arrived.




Re: run pgindent on a regular basis / scripted manner

2023-02-08 Thread Andrew Dunstan


On 2023-02-08 We 12:06, Jelte Fennema wrote:

With the new patch --commit works as expected for me now. And sounds
good to up the script a bit afterwards.




Thanks, I have committed this. Still looking at Robert's other request.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: deadlock-hard flakiness

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-07 17:10:21 -0800, Andres Freund wrote:
> diff -U3 /tmp/cirrus-ci-build/src/test/isolation/expected/deadlock-hard.out 
> /tmp/cirrus-ci-build/build/testrun/isolation-running/isolation/results/deadlock-hard.out
> --- /tmp/cirrus-ci-build/src/test/isolation/expected/deadlock-hard.out
> 2023-02-07 05:32:34.536429000 +
> +++ 
> /tmp/cirrus-ci-build/build/testrun/isolation-running/isolation/results/deadlock-hard.out
>   2023-02-07 05:40:33.833908000 +
> @@ -25,10 +25,11 @@
>  step s6a7: <... completed>
>  step s6c: COMMIT;
>  step s5a6: <... completed>
> -step s5c: COMMIT;
+step s5c: COMMIT; <waiting ...>
>  step s4a5: <... completed>
>  step s4c: COMMIT;
>  step s3a4: <... completed>
> +step s5c: <... completed>
>  step s3c: COMMIT;
>  step s2a3: <... completed>
>  step s2c: COMMIT;
> 
> 
> Commit 741d7f1047f fixed a similar issue in deadlock-hard. But it looks like
> we need something more. But perhaps this isn't an output ordering issue:
> 
> How can we end up with s5c getting reported as waiting? I don't see how s5c
> could end up blocking on anything?

After looking through isolationtester's blocking detection logic I started to
suspect that what we're seeing is not being blocked by a heavyweight lock, but
by a snapshot. So I added logging to
pg_isolation_test_session_is_blocked(). Took a while to reproduce the issue,
but indeed:
https://cirrus-ci.com/task/4901334571286528
https://api.cirrus-ci.com/v1/artifact/task/4901334571286528/testrun/build/testrun/isolation-running/isolation/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/4901334571286528/testrun/build/testrun/runningcheck.log

indicates that we indeed were blocked by a snapshot:
2023-02-08 21:30:12.123 UTC [9276][client backend] 
[isolation/deadlock-hard/control connection][3/8971:0] LOG:  pid 9280 blocked 
due to snapshot by pid: 0
...
2023-02-08 21:30:12.155 UTC [9276][client backend] 
[isolation/deadlock-hard/control connection][3/8973:0] LOG:  pid 9278 blocked 
due to snapshot by pid: 0


Unclear why we end up without a pid. It looks like 2PC removes the pid from
the field?  In the problematic case the prepared_xacts test is indeed
scheduled concurrently:

2023-02-08 21:30:12.100 UTC [9397][client backend] 
[pg_regress/prepared_xacts][23/1296:39171] ERROR:  transaction identifier 
"foo3" is already in use
2023-02-08 21:30:12.100 UTC [9397][client backend] 
[pg_regress/prepared_xacts][23/1296:39171] STATEMENT:  PREPARE TRANSACTION 
'foo3';

foo3 for example does use SERIALIZABLE.


I don't really understand how GetSafeSnapshotBlockingPids() can end up finding
deadlock-hard's sessions being blocked by a safe snapshot. Afaict nothing uses
serializable in that test. How can SXACT_FLAG_DEFERRABLE_WAITING be set for
the sxact of a backend that never did serializable? Are we possibly forgetting
to clear it or such?


I don't think it should affect the reports here, but I did break something
when removing SHMQueue - GetSafeSnapshotBlockingPids() doesn't check
output_size anymore. Will fix.  Thomas, any chance you could do a pass through
96003717645 to see if I screwed up other things? I stared a lot at that
change, but I obviously did miss at least one thing.

Greetings,

Andres Freund




Re: deadlock-hard flakiness

2023-02-08 Thread Andres Freund
Hi,

On 2023-02-08 14:11:45 -0800, Andres Freund wrote:
> On 2023-02-07 17:10:21 -0800, Andres Freund wrote:
> I don't really understand how GetSafeSnapshotBlockingPids() can end up finding
> deadlock-hard's sessions being blocked by a safe snapshot. Afaict nothing uses
> serializable in that test. How can SXACT_FLAG_DEFERRABLE_WAITING be set for
> the sxact of a backend that never did serializable? Are we possibly forgetting
> to clear it or such?
> 
> 
> I don't think it should affect the reports here, but I did break something
> when removing SHMQueue - GetSafeSnapshotBlockingPids() doesn't check
> output_size anymore. Will fix.  Thomas, any chance you could do a pass through
> 96003717645 to see if I screwed up other things? I stared a lot at that
> change, but I obviously did miss at least one thing.

Argh, it's actually caused by 96003717645 as well: Previously loop iteration
without finding a matching pid ends with sxact == NULL, now it doesn't.
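
For anyone following along, here's a self-contained analogue of that
breakage (plain C, not the actual predicate.c code):

#include <stdio.h>
#include <stddef.h>

struct node { int pid; struct node *next; };

/* Old SHMQueue-style walk: the cursor is NULL when nothing matches. */
static struct node *
find_old_style(struct node *head, int pid)
{
	struct node *cur;

	for (cur = head; cur != NULL; cur = cur->next)
		if (cur->pid == pid)
			break;
	return cur;					/* NULL means "not found" */
}

/*
 * Post-conversion pattern: the cursor is reassigned every iteration, so
 * on a miss it ends up pointing at the last element instead of NULL.
 */
static struct node *
find_buggy(struct node *head, int pid)
{
	struct node *cur = NULL;

	for (struct node *it = head; it != NULL; it = it->next)
	{
		cur = it;
		if (it->pid == pid)
			break;
	}
	return cur;					/* non-NULL even when pid wasn't found! */
}

int
main(void)
{
	struct node b = {2, NULL}, a = {1, &b};

	printf("old: %p  buggy: %p\n",
		   (void *) find_old_style(&a, 99), (void *) find_buggy(&a, 99));
	return 0;
}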

Greetings,

Andres Freund




Re: Support logical replication of DDLs

2023-02-08 Thread Peter Smith
Hi Vignesh, thanks for addressing my v63-0002 review comments.

I confirmed most of the changes. Below is a quick follow-up for the
remaining ones.

On Mon, Feb 6, 2023 at 10:32 PM vignesh C  wrote:
>
> On Mon, 6 Feb 2023 at 06:47, Peter Smith  wrote:
> >
...
> >
> > 8.
> > + value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
> >
> > Should the code be checking or asserting value is not NULL?
> >
> > (IIRC I asked this a long time ago - sorry if it was already answered)
> >
>
> Yes, this was already answered by Zheng, quoting as "The null checking
> for value is done in the upcoming call of expand_one_jsonb_element()."
> in [1]

Thanks for the info. I saw that Zheng-san only wrote it is handled in
the “upcoming call of expand_one_jsonb_element”, but I don’t know if
that is sufficient. For example, if the execution heads down the other
path (expand_jsonb_array) with a NULL jsonarr then it is going to crash,
isn't it? So I still think some change may be needed here.
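
Something as small as the following would remove the doubt (a sketch
against the names in the patch, not a tested change):

value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
if (value == NULL)				/* the defensive check being suggested */
	elog(ERROR, "unexpected JSON format: key not found");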

> > 11.
> > +/*
> > + * Expand a JSON value as an operator name.
> > + */
> > +static void
> > +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval)
> >
> > Should this function comment be more like the comment for
> > expand_jsonval_dottedname by saying there can be an optional
> > "schemaname"?
>
> Modified

Is it really OK for the "objname" to be optional here? (Yes, I know the
code is currently implemented as though it is OK, but I am doubtful.)

That would mean everything can be optional and the buf result might be
nothing. It could also mean that if the "schemaname" is provided but the
"objname" is not, then the buf will have a trailing ".".

It doesn't sound quite right to me.

>
> > ~~~
> >
> > 12.
> > +static bool
> > +expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> > +{
> > + if (jsonval->type == jbvString)
> > + {
> > + appendBinaryStringInfo(buf, jsonval->val.string.val,
> > +jsonval->val.string.len);
> > + }
> > + else if (jsonval->type == jbvBinary)
> > + {
> > + json_trivalue present;
> > +
> > + present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> > +   "present");
> > +
> > + /*
> > + * If "present" is set to false, this element expands to empty;
> > + * otherwise (either true or absent), fall through to expand "fmt".
> > + */
> > + if (present == tv_false)
> > + return false;
> > +
> > + expand_fmt_recursive(jsonval->val.binary.data, buf);
> > + }
> > + else
> > + return false;
> > +
> > + return true;
> > +}
> >
> > I felt this could be simpler if there is a new 'expanded' variable
> > because then you can have just a single return point instead of 3
> > returns;
> >
> > If you choose to do this there is a minor tweak to the "fall through" 
> > comment.
> >
> > SUGGESTION
> > expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> > {
> > bool expanded = true;
> >
> > if (jsonval->type == jbvString)
> > {
> > appendBinaryStringInfo(buf, jsonval->val.string.val,
> >jsonval->val.string.len);
> > }
> > else if (jsonval->type == jbvBinary)
> > {
> > json_trivalue present;
> >
> > present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> >   "present");
> >
> > /*
> >  * If "present" is set to false, this element expands to empty;
> >  * otherwise (either true or absent), expand "fmt".
> >  */
> > if (present == tv_false)
> > expanded = false;
> > else
> > expand_fmt_recursive(jsonval->val.binary.data, buf);
> > }
> >
> > return expanded;
> > }
>
> I'm not sure if this change is required, as it will introduce a new
> variable and require it to be set; the variable would also have to be
> set to "expanded = false" in an else after the else if. Instead I
> preferred the existing code. I did not make any change for this,
> unless you are seeing some bigger optimization.
>

Sorry, I messed up the previous code suggestion. It should have said:

SUGGESTION
expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
{
bool expanded = false;

if (jsonval->type == jbvString)
{
appendBinaryStringInfo(buf, jsonval->val.string.val,
jsonval->val.string.len);
expanded = true;
}
else if (jsonval->type == jbvBinary)
{
json_trivalue present;
present =
find_bool_in_jsonbcontainer(jsonval->val.binary.data, "present");

/*
 * If "present" is set to false, this element expands to empty;
 * otherwise (either true or absent), expand "fmt".
 */
if (present != tv_false)
{
expand_fmt_recursive(jsonval->val.binary.data, buf);
expanded = true;
}
}
return expanded;
}

~

But I have no special "optimization" in mind. Only, IMO the code is
easier to understand, because:
- 1 return is simpler than 3 returns
- 1 else is simpler than 2 else's

YMMV.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.

Re: MacOS: xsltproc fails with "warning: failed to load external entity"

2023-02-08 Thread Tom Lane
I pushed the discussed documentation improvements, and changed the
behavior of "ninja docs" to only build the HTML docs.  However,
I've not done anything about documenting what is the minimum
ninja version.

regards, tom lane




Re: First draft of back-branch release notes is done

2023-02-08 Thread Tom Lane
Justin Pryzby  writes:
> It's of no concern, but I was curious why this one wasn't included:

> commit 72aea955d49712a17c08748aa9abcbcf98c32fc5
> Author: Thomas Munro 
> Date:   Fri Jan 6 16:38:46 2023 +1300

> Fix pg_truncate() on Windows.

> Commit 57faaf376 added pg_truncate(const char *path, off_t length), but
> "length" was ignored under WIN32 and the file was unconditionally
> truncated to 0.

> There was no live bug, since the only caller passes 0.

I concluded that due to the lack of live bug, this would not be of
interest to end users.  The back-patch was just for future-proofing.

regards, tom lane




Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 04:24:15PM -0500, Robert Haas wrote:
> On Wed, Feb 8, 2023 at 12:43 PM Nathan Bossart  
> wrote:
>> I think this could be a good approach if we decide not to bake too much
>> into PostgreSQL itself (e.g., such as creating multiple archive workers
>> that each call out to the module).  Archive module authors would
>> effectively need to write their own archiver processes.  That sounds super
>> flexible, but it also sounds like it might be harder to get right.
> 
> Yep. That's a problem, and I'm certainly open to better ideas.
> 
> However, if we assume that the archive module is likely to be doing
> something like juggling a bunch of file descriptors over which it is
> speaking HTTP, what other model works, really? It might be juggling
> those file descriptors indirectly, or it might be relying on an
> intermediate library like curl or something from Amazon that talks to
> S3 or whatever, but only it knows what resources it's juggling, or
> what functions it needs to call to manage them. On the other hand, we
> don't really need a lot from it. We need it to CHECK_FOR_INTERRUPTS()
> and handle that without leaking resources or breaking the world in
> some way, and we sort of need it to, you know, actually archive stuff,
> but apart from that I guess it can do what it likes (unless I'm
> missing some other important function of the archiver?).
> 
> It's probably a good idea if the archiver function returns when it's
> fully caught up and there's no more work to do. Then we could handle
> decisions about hibernation in the core code, rather than having every
> archive module invent its own way of doing that. But when there's work
> happening, as far as I can see, the archive module needs to have
> control pretty nearly all the time, or it's not going to be able to do
> anything clever.
> 
> Always happy to hear if you see it differently

These are all good points.  Perhaps there could be a base archiver
implementation that shell_archive uses (and that other modules could use if
desired, which might be important for backward compatibility with the
existing callbacks).  But if you want to do something fancier than
archiving sequentially, you could write your own.

In any case, I'm not really wedded to any particular approach at the
moment, so I am likewise open to better ideas.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Can we do something to help stop users mistakenly using force_parallel_mode?

2023-02-08 Thread Tom Lane
David Rowley  writes:
> I've attached a patch which does the renaming to debug_parallel_query.
> I've made it so the old name can still be used.

There's a better way to do that last, which is to add the translation to
map_old_guc_names[].  I am not very sure what happens if you have multiple
GUC entries pointing at the same underlying variable, but I bet that
it isn't great.
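
For reference, that table in guc.c currently holds the sort_mem and
vacuum_mem renames, so the translation should just be one more pair,
along these lines (a sketch of the addition, not a committed change):

/* guc.c: old GUC names mapped to their current spellings */
static const char *const map_old_guc_names[] = {
	"sort_mem", "work_mem",
	"vacuum_mem", "maintenance_work_mem",
	"force_parallel_mode", "debug_parallel_query",	/* proposed addition */
	NULL
};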

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-08 Thread Justin Pryzby
On Thu, Feb 02, 2023 at 09:18:07AM -0600, Justin Pryzby wrote:
> On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:
> > On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev  
> > wrote:
> > > 1 февр. 2023 г., в 20:27, Matthias van de Meent 
> > >  написал(а):
> > >
> > >> In HEAD we set TOTAL to whatever number partitioned table we're
> > >> currently processing has - regardless of whether we're the top level
> > >> statement.
> > >> With the patch we instead add the number of child relations to that
> > >> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
> > >> posted by Ilya. Approximately immediately after updating that count we
> > >> recurse to the child relations, and that only returns once it is done
> > >> creating the indexes, so both TOTAL and DONE go up as we process more
> > >> partitions in the hierarchy.
> > >
> > > The TOTAL in the patch is set only when processing the top-level parent 
> > > and it is not updated when we recurse, so yes, it is constant. From v3:
> > 
> > Ugh, I misread the patch, more specifically count_leaf_partitions and
> > the !OidIsValid(parentIndexId) condition changes.
> > 
> > You are correct, sorry for the noise.
> 
> That suggests that the comments could've been more clear.  I added a
> comment suggested by Tomas and adjusted some others and wrote a commit
> message.  I even ran pgindent for about the 3rd time ever.
> 
> 002 are my changes as a separate patch, which you could apply to your
> local branch.
> 
> And 003/4 are assertions that I wrote to demonstrate the problem and the
> verify the fixes, but not being proposed for commit.

That was probably a confusing way to present it - I should've sent the
relative diff as a .txt rather than as patch 002.

This squishes together 001/2 as the main patch.
I believe it's ready.

-- 
Justin
>From ed643acace062cad15cd6ea9dc76a663de9c97d4 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, and if the attached index is
partitioned, the counter is incremented to account for each of its leaf
partitions.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/commands/indexcmds.c  | 70 +--
 src/backend/utils/activity/backend_progress.c | 28 
 src/include/utils/backend_progress.h  |  1 +
 4 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b67..fa139dcece7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
@@ -6612,7 +6615,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of partitions on which the index has been created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..84c84c41acc 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;

Re: Can we do something to help stop users mistakenly using force_parallel_mode?

2023-02-08 Thread David Rowley
On Thu, 9 Feb 2023 at 11:26, Tom Lane  wrote:
>
> David Rowley  writes:
> > I've attached a patch which does the renaming to debug_parallel_query.
> > I've made it so the old name can still be used.
>
> There's a better way to do that last, which is to add the translation to
> map_old_guc_names[].  I am not very sure what happens if you have multiple
> GUC entries pointing at the same underlying variable, but I bet that
> it isn't great.

Thanks for pointing that out. That might mean we can keep the
translation long-term as it won't appear in pg_settings and \dconfig,
or we might want to remove it if we want to be more deliberate about
breaking things for users who are misusing it. We maybe could just
consider that if/when all buildfarm animals are all using the new
name.

Attached updated patch.

David
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d190be1925..9fe9fef1a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11091,17 +11091,17 @@ dynamic_library_path = 
'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
- 
-  force_parallel_mode (enum)
+ 
+  debug_parallel_query (enum)
   
-   force_parallel_mode configuration 
parameter
+   debug_parallel_query configuration 
parameter
   
   
   

 Allows the use of parallel queries for testing purposes even in cases
 where no performance benefit is expected.
-The allowed values of force_parallel_mode are
+The allowed values of debug_parallel_query are
 off (use parallel mode only when it is expected to 
improve
 performance), on (force parallel query for all 
queries
 for which it is thought to be safe), and regress 
(like
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 117d097390..a08c7a78af 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -370,7 +370,7 @@ make check LANG=C ENCODING=EUC_JP
 set in the PGOPTIONS environment variable (for settings
 that allow this):
 
-make check PGOPTIONS="-c force_parallel_mode=regress -c work_mem=50MB"
+make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"
 
 When running against a temporary installation, custom settings can also be
 set by supplying a pre-written postgresql.conf:
diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index 9e3ec0d5d8..37e3fcd03a 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1156,7 +1156,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, 
StringInfo msg)
 * because it causes test-result instability 
depending on
 * whether a parallel worker is actually used 
or not.)
 */
-   if (force_parallel_mode != 
FORCE_PARALLEL_REGRESS)
+   if (debug_parallel_query != 
FORCE_PARALLEL_REGRESS)
{
if (edata.context)
edata.context = 
psprintf("%s\n%s", edata.context,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbbf28cf06..e57bda7b62 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -756,8 +756,8 @@ ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc)
/*
 * Sometimes we mark a Gather node as "invisible", which means that it's
 * not to be displayed in EXPLAIN output.  The purpose of this is to 
allow
-* running regression tests with force_parallel_mode=regress to get the
-* same results as running the same tests with force_parallel_mode=off.
+* running regression tests with debug_parallel_query=regress to get the
+* same results as running the same tests with debug_parallel_query=off.
 * Such marking is currently only supported on a Gather at the top of 
the
 * plan.  We skip that node, and we must also hide per-worker detail 
data
 * further down in the plan tree.
diff --git a/src/backend/optimizer/plan/planmain.c 
b/src/backend/optimizer/plan/planmain.c
index 4c17407e5d..63009f6e87 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -114,12 +114,12 @@ query_planner(PlannerInfo *root,
 * Anything parallel-restricted in the query 
tlist will be
 * dealt with later.)  This is normally pretty 
silly, because
 * a Result-only plan would never be 
interesting to
-* parallelize.  However, if 
force_parallel_mode is on, then
+* parallelize.  However, if 
debug_parallel_query is on, then
 * we want to execute the Resul

Re: Improve logging when using Huge Pages

2023-02-08 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 04:53:37PM +0100, Alvaro Herrera wrote:
> Maybe I misread the code (actually I only read the patch), but -- does
> it set active=true when huge_pages=on?  I think the code only works when
> huge_pages=try.  That might be pretty confusing; I think it should say
> "on" in both cases.

+1

Also, while this is indeed a runtime-computed parameter, it won't be
initialized until after 'postgres -C' has been handled, so 'postgres -C'
will always report it as "off".  However, I'm not sure it makes sense to
check it with 'postgres -C' anyway since you want to know if the current
server is using huge pages.

At the moment, I think I'd vote for a new function instead of a GUC.
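
Such a function would be tiny; a sketch of what I mean (hypothetical
function name, and it assumes the flag the patch sets is made visible
to it):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(pg_huge_pages_active);

extern bool huge_pages_active;	/* set at shmem creation, as in the patch */

Datum
pg_huge_pages_active(PG_FUNCTION_ARGS)
{
	PG_RETURN_BOOL(huge_pages_active);
}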

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Peter Smith
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones  wrote:
>
> while working on another item of the TODO list I realized that I should
> be using a PG_TRY() block in the xmlDocDumpFormatMemory call.
>
> Fixed in v5.
>

I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: RFC: WAL infrastructure issues, updates and improvements

2023-02-08 Thread Matthias van de Meent
Hi,

Various users and threads on -hackers (incl. [0] specifically) have
complained about the overhead of our WAL format. One example of this
is that we write at least 44 bytes to register any changes to a single
relation's page: There is a 24-byte WAL record header, a 4-byte block
record header, plus 12 bytes for RelFileLocator, plus 4 bytes for
BlockNumber. This is a very significant overhead, which is threatening
to grow ever larger as we're considering moving to larger identifiers
in PostgreSQL: 56-bit relfilenodes, 64-bit XIDs, etc. I think we can
do significantly better than that in most, if not all, cases.
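
For the record, the 44 bytes break down as follows (a toy computation
using the sizes quoted above, not values read from the actual headers):

#include <stdio.h>

int
main(void)
{
	int		xlog_record_hdr = 24;	/* XLogRecord */
	int		block_hdr = 4;			/* per-block record header */
	int		relfilelocator = 12;	/* spcOid + dbOid + relNumber */
	int		blocknumber = 4;		/* BlockNumber */

	printf("minimum per-page WAL overhead: %d bytes\n",
		   xlog_record_hdr + block_hdr + relfilelocator + blocknumber);
	return 0;					/* prints 44 */
}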

For PostgreSQL 17, I'd like to propose an effort to improve our WAL
infrastructure. The tail of this mail contains identified areas for
improvement and feasibility analysis for improving on those items.
There are probably more issues and potential improvements, but these
are the ones I was aware of.

Note that I'm mentioning PostgreSQL 17 because I don't have a lot of
time for 16, and I think changing WAL formats and behavior all at once
reduces the total effort for keeping external tooling compatible (as
opposed to piecemeal changes); "to rip off the band-aid". I'm sending
this mail now to collect community feedback and to make the thread
discoverable in the archives - I can't find any other threads in the
-hackers archives that were created specifically to cover people's
gripes with WAL and solutions to those gripes.

Kind regards,

Matthias van de Meent

CC-ed Heikki, Robert, Andres and Dilip for their interest in the topic
shown in the 56-bit relfilenumber thread, and Koichi for his work on
parallel recovery.
(sorry for the duplicate mail, I misconfigured my sender address)



I am aware of the following ideas that exist, several of which came up
at the FOSDEM Developer Meeting last Thursday [1][2]:

Reducing the size of the XLog record header:


We currently store XIDs in every XLog record header, where only some need
them. E.g. most index records do not need XIDs for recovery or
conflict resolution.

The length field is 4 bytes, but many records are [...]

[0] https://www.postgresql.org/message-id/flat/CA%2BTgmoaa9Yc9O-FP4vS_xTKf8Wgy8TzHpjnjN56_ShKE%3DjrP-Q%40mail.gmail.com
[1] 
https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#XLog_Format
[2] 
https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#Page_Format
[3] https://wiki.postgresql.org/wiki/Parallel_Recovery




Re: Improve logging when using Huge Pages

2023-02-08 Thread Justin Pryzby
On Thu, Feb 02, 2023 at 04:53:37PM +0100, Alvaro Herrera wrote:
> On 2023-Jan-24, Justin Pryzby wrote:
> > On Mon, Jan 23, 2023 at 05:33:35PM -0800, Andres Freund wrote:
> > > I'm ok with this being a GUC, it doesn't feel elegant, but I suspect 
> > > there's
> > > no realistic elegant answer.
> 
> > Whether it's a GUC or a function, I propose to name it hugepages_active.
> > If there's no objections, I'll add to the CF.
> 
> Maybe I misread the code (actually I only read the patch), but -- does
> it set active=true when huge_pages=on?  I think the code only works when
> huge_pages=try.  That might be pretty confusing; I think it should say
> "on" in both cases.

Yes - I realized that too.   There's no reason this GUC should be
inaccurate when huge_pages=on.  (I had hoped there would be a conflict
needing resolution before anyone else noticed.)

I don't think it makes sense to run postgres -C huge_pages_active,
however, so I see no issue with it always returning "false" there.
If need be, maybe the documentation could say "indicates whether huge
pages are active for the running server".

Does anybody else want to vote for a function rather than a
RUNTIME_COMPUTED GUC ?

-- 
Justin
>From 217b01b7eddabb5d863d50b1d4ca8e1019d9413c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Jan 2023 18:33:51 -0600
Subject: [PATCH v2] add GUC: huge_pages_active

This is useful to show the current state of huge pages when
huge_pages=try.  The effective status is not otherwise visible without
OS level tools like gdb or /proc/N/smaps.
---
 doc/src/sgml/config.sgml| 17 -
 src/backend/port/sysv_shmem.c   |  5 -
 src/backend/port/win32_shmem.c  |  5 -
 src/backend/utils/misc/guc_tables.c | 13 +
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d190be1925d..fcef4e3ce8e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1708,7 +1708,8 @@ include_dir 'conf.d'
 server will try to request huge pages, but fall back to the default if
 that fails. With on, failure to request huge pages
 will prevent the server from starting up. With off,
-huge pages will not be requested.
+huge pages will not be requested.  The actual state of huge pages is
+indicated by the server variable .

 

@@ -10687,6 +10688,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  huge_pages_active (boolean)
+  
+   huge_pages_active configuration parameter
+  
+  
+  
+   
+Reports whether huge pages are in use by the current process.
+See  for more information.
+   
+  
+ 
+
  
   integer_datetimes (boolean)
   
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index eaba244bc9c..9b9b25d53ba 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -621,7 +621,10 @@ CreateAnonymousSegment(Size *size)
 		ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
    PG_MMAP_FLAGS | mmap_flags, -1, 0);
 		mmap_errno = errno;
-		if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
+		if (ptr != MAP_FAILED)
+			SetConfigOption("huge_pages_active", "on",
+	PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+		else if (huge_pages == HUGE_PAGES_TRY)
 			elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
  allocsize);
 	}
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 62e08074770..73472a18f25 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -290,7 +290,10 @@ retry:
  size_low,	/* Size Lower 32 bits */
  szShareMem);
 
-		if (!hmap)
+		if (hmap)
+			SetConfigOption("huge_pages_active", "on",
+	PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+		else
 		{
 			if (GetLastError() == ERROR_NO_SYSTEM_RESOURCES &&
 huge_pages == HUGE_PAGES_TRY &&
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index b21698934c8..1e8e67a4c54 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -571,6 +571,7 @@ static int	shared_memory_size_mb;
 static int	shared_memory_size_in_huge_pages;
 static int	wal_block_size;
 static bool data_checksums;
+static bool huge_pages_active = false; /* dynamically set */
 static bool integer_datetimes;
 
 #ifdef USE_ASSERT_CHECKING
@@ -1013,6 +1014,18 @@ struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+
+	{
+		{"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Indicates whether huge pages are in use."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_RUNTIME_COMPUTED
+		},
+		&huge_pages_active,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		/* Not for general use --- used by SET SESSION AUTHORIZATION */
 		{"is_superuser", PGC_INTERNAL, UNGR

Re: [PATCH] Add pretty-printed XML output option

2023-02-08 Thread Jim Jones

On 09.02.23 00:09, Peter Smith wrote:

I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?


Hi Peter,

My logic there was the following: if the program reached that part of the
code it means that the xml_parse() and xmlDocDumpFormatMemory() worked, 
which consequently means that the variables doc and xmlbuf are != NULL, 
therefore not needing to be checked. Am I missing something?


Thanks a lot for the review!

Best, Jim







Re: Weird failure with latches in curculio on v15

2023-02-08 Thread Michael Paquier
On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote:
> These are all good points.  Perhaps there could be a base archiver
> implementation that shell_archive uses (and that other modules could use if
> desired, which might be important for backward compatibility with the
> existing callbacks).  But if you want to do something fancier than
> archiving sequentially, you could write your own.

Which is basically the kind of thing you can already achieve with a
background worker and a module of your own?

> In any case, I'm not really wedded to any particular approach at the
> moment, so I am likewise open to better ideas.

I am not sure how much we should try to move from core into the
modules when it comes to the current archiver process, given how much
control you'd like to give to users.  It also looks to me like this is
the kind of problem where we would not have the correct callback
design until someone comes in and develops a solution that would shape
around it.  On top of that, this is the kind of thing achievable with
just a bgworker, and perhaps simpler as the parallel state can just be
maintained in it, which is what the archiver process is about in the
end?  Or were there some restrictions in the bgworker APIs that would
not fit with what an archiver process should do?  Perhaps these
restrictions, if any, are what we'd better try to lift first?
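
For what it's worth, the bgworker route needs little more than this at
module load time (a sketch; the names and timings are illustrative,
but the registration API is the standard one):

#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"

PG_MODULE_MAGIC;

void
_PG_init(void)
{
	BackgroundWorker worker;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = 10;	/* seconds before restart on crash */
	snprintf(worker.bgw_name, BGW_MAXLEN, "custom archiver");
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_archive_module");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_archiver_main");
	RegisterBackgroundWorker(&worker);
}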
--
Michael



