Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-02 Thread Andres Freund
On 2022-02-15 13:02:41 +0900, Michael Paquier wrote:
> >> +  @regress_command = (@regress_command, @extra_opts);
> >> +
> >> +  $oldnode->command_ok(@regress_command,
> >> +  'regression test run on old instance');
> > 
> > I also think this should take EXTRA_REGRESS_OPTS into account - test.sh did.
> 
> This is already taken into account, as of the @extra_opts bits.

But in a bad way, because EXTRA_REGRESS_OPTS now always wins, even for settings
we want to override. Note how test.sh explicitly specifies port, bindir, etc.
after the pre-existing EXTRA_REGRESS_OPTS.
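
Roughly, keeping test.sh's ordering would look like this in the new test (a
sketch only; the variable names here are hypothetical):

my @extra_opts = split(/\s+/, $ENV{EXTRA_REGRESS_OPTS} || "");
my @regress_command = (
	'pg_regress', @extra_opts,
	# test-controlled options come last, so they win over @extra_opts
	'--bindir=' . $oldbindir,
	'--port=' . $oldnode->port);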




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-02 Thread Andres Freund
Hi,

On 2022-03-02 15:57:23 +0900, Michael Paquier wrote:
> Do others have an opinion about a backpatch of the bugfix?  Nobody has
> complained about that since pg_upgrade exists, so I have just done the
> change on HEAD.

WFM.



> +++ b/src/bin/pg_upgrade/t/001_basic.pl
> @@ -0,0 +1,9 @@
> +use strict;
> +use warnings;
> +
> +use PostgreSQL::Test::Utils;
> +use Test::More tests => 8;

Outdated.

> +program_help_ok('pg_upgrade');
> +program_version_ok('pg_upgrade');
> +program_options_handling_ok('pg_upgrade');

Unrelated. But I kinda wish we'd do this in a saner manner than copying this
test into every binary. E.g. by ensuring that all tools installed in the temp
install are tested or such.
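
A sketch of that idea: a single TAP file that walks the temp install's bin
directory and applies the standard checks to every binary found.
PG_TEST_BINDIR is a hypothetical way to locate that directory, not an
existing variable:

use strict;
use warnings;
use PostgreSQL::Test::Utils;
use Test::More;

my $bindir = $ENV{PG_TEST_BINDIR};    # hypothetical: temp install's bin/
opendir(my $dh, $bindir) or die "could not open $bindir: $!";
foreach my $prog (sort grep { -f "$bindir/$_" && -x _ } readdir($dh))
{
	program_help_ok($prog);
	program_version_ok($prog);
	program_options_handling_ok($prog);
}
closedir($dh);
done_testing();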


> +# The test of pg_upgrade consists in setting up an instance.  This is the
> +# source instance used for the upgrade. Then a new and fresh instance is
> +# created, and is used as the target instance for the upgrade.

This seems a bit repetitive. Lots of "instance".

> Before
> +# running an upgrade, a logical dump of the old instance is taken, and a
> +# second logical dump of the new instance is taken after the upgrade.
> +# The upgrade test passes if there are no differences in these two dumps.
> +
> +# Testing upgrades with an older instance of PostgreSQL requires setting up
> +# two environment variables, as of:
> +# - "olddump", to point to a dump file that will be used to set
> +#   up the old instance to upgrade from, the dump being restored in the
> +#   old cluster.
> +# - "oldinstall", to point to the installation path of the old
> +#   instance.
> +if (   (defined($ENV{olddump}) && !defined($ENV{oldinstall}))
> + || (!defined($ENV{olddump}) && defined($ENV{oldinstall})))

Odd indentation. Spaces between parens?
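
Since the two variables must be supplied together, the check could also be
written without the unusual indentation (a sketch, not the committed code):

# defined() returns 1 or "", so "!=" gives us the exclusive-or we want
if (defined($ENV{olddump}) != defined($ENV{oldinstall}))
{
	die "olddump and oldinstall must be set together";
}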


> +$newnode->init(extra => [ '--wal-segsize', '1', '--allow-group-access' ]);

I'd copy the comments from test.sh wrt --wal-segsize,
--allow-group-access.


Greetings,

Andres Freund




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-02 Thread Michael Paquier
On Wed, Mar 02, 2022 at 12:01:17AM -0800, Andres Freund wrote:
> But in a bad way, because EXTRA_REGRESS_OPTS now always wins, even for stuff
> we want to override. Note how test.sh explicitly specifies port, bindir etc
> after the pre-existing EXTRA_REGRESS_OPTS.

Ah, right.  Will fix.
--
Michael




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-02 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 4:07 PM Kyotaro Horiguchi wrote:
>
> At Wed, 2 Mar 2022 14:39:54 +0900, Masahiko Sawada wrote in
> > On Wed, Mar 2, 2022 at 11:55 AM Amit Kapila wrote:
> > >
> > > On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada wrote:
> > > >
> > > > I've attached two patches: the first one changes
> > > > apply_error_callback() so that it uses complete sentences with if-else
> > > > blocks in order to have a translation work,
> > > >
> > >
> > > This is an improvement over what we have now but I think this is still
> > > not completely correct as per message translation rules:
> > > + else
> > > +     errcontext("processing remote data during \"%s\" in transaction %u at %s",
> > > +                logicalrep_message_type(errarg->command),
> > > +                errarg->remote_xid,
> > > +                (errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)");
> > >
> > > As per guidelines [1][2], we don't prefer to construct messages at
> > > run-time, i.e., we can do better for the following part:
> > > (errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)".
> > > I think we need to use if-else here to split it further. If you agree,
> > > then the same needs to be dealt with in other parts of the patch as well.
> >
> > I intended to use "(not-set)" as a value rather than a word in the
> > sentence so I think it doesn't violate the guidelines. We can split it
> > further as you suggested but we will end up having more if-else
> > branches.
>
> That seems to me exactly our usual way.  In the first place, I doubt
> "(not-set)" fits the place of a timestamp even in English.
>
> Moreover, if we (I?) translated the message into Japanese, it would
> look like:
>
> CONTEXT: (not-set)にトランザクション 2352314 内で "TRUNCATE" でのリモートデータの処理中
>
> I don't think that looks fine.  Translating "(not-set)" makes things
> even worse.
>
> CONTEXT: (非設定)にトランザクション 2352314 内で "TRUNCATE" でのリモートデータの処理中
>
> Yes, I can alleviate that strangeness a bit by modulating it, but it
> still looks odd.

Indeed. But the timestamp is removed in the latest version of the patch.

>
> CONTEXT: 時刻(非設定)、トランザクション 2352314 内で "TRUNCATE" でのリモートデータの処理中
>
> Rather, I'd prefer simply to list the attributes.
>
> CONTEXT: processing remote data during "MESSAGE". Transaction (unknown). Time 
> (unknown), replication target relation (unknown), column (unknown)
> CONTEXT: "MESSAGE"でのリモートデータの処理中. トランザクション (不明). 時刻 (不明), レプリケーション対象リレーション 
> (不明), column (不明)

Peter Smith also seems to prefer this style. Looking at existing error
context messages, we use this list style in logical.c:

static void
output_plugin_error_callback(void *arg)
{
    LogicalErrorCallbackState *state = (LogicalErrorCallbackState *) arg;

    /* not all callbacks have an associated LSN  */
    if (state->report_location != InvalidXLogRecPtr)
        errcontext("slot \"%s\", output plugin \"%s\", in the %s callback, associated LSN %X/%X",
                   NameStr(state->ctx->slot->data.name),
                   NameStr(state->ctx->slot->data.plugin),
                   state->callback_name,
                   LSN_FORMAT_ARGS(state->report_location));
    else
        errcontext("slot \"%s\", output plugin \"%s\", in the %s callback",
                   NameStr(state->ctx->slot->data.name),
                   NameStr(state->ctx->slot->data.plugin),
                   state->callback_name);
}
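
For comparison, the if-else splitting Amit suggests upthread would keep every
string complete, along these lines (a sketch, not the actual patch):

if (errarg->ts != 0)
    errcontext("processing remote data during \"%s\" in transaction %u at %s",
               logicalrep_message_type(errarg->command),
               errarg->remote_xid,
               timestamptz_to_str(errarg->ts));
else
    errcontext("processing remote data during \"%s\" in transaction %u",
               logicalrep_message_type(errarg->command),
               errarg->remote_xid);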

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-02 Thread Peter Eisentraut

On 01.03.22 23:05, Jacob Champion wrote:

On Tue, 2022-03-01 at 19:56 +0100, Peter Eisentraut wrote:

This patch contains no documentation.  I'm having a hard time
understanding what the name "session_authn_id" is supposed to convey.
The comment for the Port.authn_id field says this is the "system
username", which sounds like a clearer terminology.


"System username" may help from an internal development perspective,
especially as it relates to pg_ident.conf, but I don't think that's
likely to be a useful descriptor to an end user. (I don't think of a
client certificate's Subject Distinguished Name as a "system
username".) Does my attempt in v5 help?


Yeah, maybe there are better names.  But I have no idea what the letter 
combination "authn_id" is supposed to stand for.  Is it an 
"authentication identifier"?  What does it identify?  Maybe I'm missing 
something here, but I don't find it clear.





Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Kyotaro Horiguchi
At Wed, 2 Mar 2022 16:46:01 +0900, Michael Paquier wrote in
> Hi all,
> 
> In my hunt looking for incorrect SRFs, I have noticed a new case of a
> system function marked as proretset while it builds and returns only
> one record.  And this is a popular one: pg_stop_backup(), labelled
> v2.
> 
> This leads to a lot of unnecessary work, as the function creates a
> tuplestore it has no need for, with the usual set of checks related to
> SRFs.  The logic can be simplified as of the attached.
> 
> Thoughts?

That direction seems fine to me.

But the patch forgets to remove a useless variable.

>   /* Initialise attributes information in the tuple descriptor */
>   tupdesc = CreateTemplateTupleDesc(PG_STOP_BACKUP_V2_COLS);
>   TupleDescInitEntry(tupdesc, (AttrNumber) 1, "lsn",
>  PG_LSNOID, -1, 0);

I think we can use get_call_result_type here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Peter Eisentraut

On 01.03.22 22:17, Jonathan S. Katz wrote:
If you're moving to a newer version of PostgreSQL, you likely have to 
update your connection drivers anyway (rebuilt against new libpq, 
supporting any changes in the protocol, etc). I would prefer more data 
to support that argument, but this is generally what you need to do.


However, we may need to step towards it. We took one step last release 
with defaulting to SCRAM. Perhaps this release we add a warning for 
anything using md5 auth that "this will be removed in a future release." 
(or specifically v16). We should also indicate in the docs that md5 is 
deprecated and will be removed.


I find that a lot of people are still purposely using md5.  Removing it 
now or in a year would be quite a disruption.


It's also worth considering that keeping the code equipped to handle 
different kinds of password hashing would help it stay in shape if we 
ever need to add support for the next SHA after 256 or whatever.






RE: Failed transaction statistics to measure the logical replication progress

2022-03-02 Thread shiy.f...@fujitsu.com
Hi,

A comment on the v26 patch.

The following documentation for the pg_stat_subscription_stats view only
says "showing statistics about errors"; should we add something about
transactions here?

    pg_stat_subscription_stats
    One row per subscription, showing statistics about errors.
    See pg_stat_subscription_stats for details.


I noticed that the v24 patch has some changes to the description of this
view. Maybe we can modify it to "showing statistics about errors and transactions".

Regards,
Shi yu


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Peter Eisentraut

On 01.03.22 22:34, Andres Freund wrote:

The cases I've heard about are about centralizing auth across multiple cloud
services. Including secret management in some form. E.g. allowing an
application to auth to postgres, redis and having the secret provided by
infrastructure, rather than having to hardcode it in config or such.

I can't see application developers configuring kerberos and I don't think
LDAP, PAM, Radius are a great answer either, due to the plaintext requirement
alone? LDAP is pretty clearly dying technology, PAM is fragile complicated C
stuff that's not portable across OSs. Radius is probably the most realistic,
but at least as implemented doesn't seem flexible enough (e.g. no access to
group memberships etc).

Nor does baking stuff like that in seem realistic to me, it'll presumably be
too cloud provider specific.


Let's gather some more information on this.  PostgreSQL should support 
the authentication that many people want to use out of the box.  I don't 
think it would be good to be at a point where all the built-in methods 
are outdated and if you want to use the good stuff you have to hunt for 
plugins.  The number of different cloud APIs is effectively small.  I 
expect that there are a lot of similarities, like they probably all need 
support for http calls, they might need support for caching lookups, 
etc.  OIDC was mentioned elsewhere.  That's a standard.  Is that 
compatible with any cloud providers?  Would that be enough for many users?





Re: Two noncritical bugs of pg_waldump

2022-03-02 Thread Kyotaro Horiguchi
At Fri, 25 Feb 2022 10:48:47 -0800, Andres Freund wrote in
> Hi,
> 
> On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote:
> > > If I give an empty file to the tool, it complains as follows.
> > > 
> > > > pg_waldump: fatal: could not read file "hoge": No such file or directory
> > > 
> > > No, the file exists.  The cause is that it reads an uninitialized errno
> > > to detect errors from the system call.  read(2) is defined to set errno
> > > always when it returns -1 and not otherwise.  Thus it seems to me that
> > > it is better to check that the return value is less than zero than to
> > > clear errno before the call to read().
> > 
> > So I post a patch contains only the indisputable part.
> 
> Thanks for the report and fix. Pushed. This was surprisingly painful, all but
> one branch had conflicts...

Ah, I didn't expect that this would be committed so quickly.  I should have
created patches for all versions.  Anyway, thanks for committing this!
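
For reference, the pattern discussed above is roughly the following (a
sketch, not the committed fix; fd, buf, len and fname are placeholders):

ssize_t     r = read(fd, buf, len);

if (r < 0)
{
    /* errno is meaningful only when read() actually failed */
    fprintf(stderr, "could not read file \"%s\": %s\n",
            fname, strerror(errno));
    exit(1);
}
else if (r == 0)
{
    /* genuine EOF: errno was never set, so it must not be reported */
}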

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: more descriptive message for process termination due to max_slot_wal_keep_size

2022-03-02 Thread Kyotaro Horiguchi
At Wed, 02 Mar 2022 15:37:19 +0900 (JST), Kyotaro Horiguchi wrote in
> The CI was confused by the mixed patches for multiple PG versions. In
> this version the patchset for master are attached as .patch and that
> for PG13 as .txt.

Yes, of course the relevant check should be fixed as well.  The
attached v5 adjusts 019_replslot_limit.pl accordingly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 4909db2893f0b33ab6bca1ffc3ad802cd159c577 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v5 1/2] Make a message on process termination more descriptive

The message at process termination due to the slot limit doesn't provide
the reason.  In the common scenario the message is followed by another
message about slot invalidation, which shows the detail of the
termination.  However, the second message is missing if the slot is a
temporary one.

Augment the first message with the same reason as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie 
Author: Kyotaro Horiguchi 
Reviewed-by: Ashutosh Bapat 
Reviewed-by: Bharath Rupireddy 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index caa6b29756..92f19bcb35 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1300,8 +1300,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 ereport(LOG,
-		(errmsg("terminating process %d to release replication slot \"%s\"",
-				active_pid, NameStr(slotname))));
+		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+				active_pid, NameStr(slotname),
+				LSN_FORMAT_ARGS(restart_lsn))));
 
 (void) kill(active_pid, SIGTERM);
 last_signaled_pid = active_pid;
-- 
2.27.0

>From 7a9e571e8cf4e5ffc13d101733c1b6fc455b1aec Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 12:58:25 +0900
Subject: [PATCH v5 2/2] Add detailed information to slot-invalidation messages

The message says just "exceeds max_slot_wal_keep_size" but doesn't
tell the concrete LSN of the limit.  That information helps operators
understand the issue.

Author: Kyotaro Horiguchi 
Reviewed-by: Ashutosh Bapat 
Reviewed-by: Masahiko Sawada 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c            | 12 ++++++++----
 src/test/recovery/t/019_replslot_limit.pl |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 92f19bcb35..be21b62add 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1300,9 +1300,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 ereport(LOG,
-		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
 				active_pid, NameStr(slotname),
-				LSN_FORMAT_ARGS(restart_lsn))));
+				LSN_FORMAT_ARGS(restart_lsn),
+				LSN_FORMAT_ARGS(oldestLSN)),
+		 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 (void) kill(active_pid, SIGTERM);
 last_signaled_pid = active_pid;
@@ -1345,9 +1347,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			ReplicationSlotRelease();
 
 			ereport(LOG,
-	(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+	(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
 			NameStr(slotname),
-			LSN_FORMAT_ARGS(restart_lsn))));
+			LSN_FORMAT_ARGS(restart_lsn),
+			LSN_FORMAT_ARGS(oldestLSN)),
+	 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 			/* done with this slot for now */
 			break;
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 9bb71b62c0..bac496a9cf 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -188,7 +188,7 @@ for (my $i = 0; $i < 1; $i++)
 {
 	if (find_in_log(
 			$node_primary,
-			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size",
+			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit [0-9A-F/]+",
 			$logstart))
 	{
 		$invalidated = 1;
-- 
2.27.0



Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-02 Thread Tatsuo Ishii
Hi Yugo and Fabien,

It seems the patch is ready for committer except for the items below. Do
you guys want to do more on these?

>> # TESTS
>> 
>> I suggested to simplify the tests by using conditionals & sequences. You 
>> reported that you got stuck. Hmmm.
>> 
>> I tried my tests again; they worked fine when started with 2 clients,
>> otherwise they get stuck because the first client waits for the other one
>> which does not exist (the point is to generate deadlocks and other
>> errors). Maybe this is your issue?
> 
> That seems right. It got stuck when I used the -T option rather than -t;
> I guess that was because the number of transactions on each thread
> differed.
> 
>> Could you try with:
>> 
>>psql < deadlock_prep.sql
>>pgbench -t 4 -c 2 -f deadlock.sql
>># note: each deadlock detection takes 1 second
>> 
>>psql < deadlock_prep.sql
>>pgbench -t 10 -c 2 -f serializable.sql
>># very quick 50% serialization errors
> 
> That works. However, it still hangs when --max-tries = 2, so I am not
> sure we can use it for testing the retry feature.
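
For readers without the attachments, scripts in that spirit might look like
the following — a hypothetical sketch, not Fabien's actual files; the table
name is made up:

-- deadlock_prep.sql (fed to psql)
DROP TABLE IF EXISTS deadlock_t;
CREATE TABLE deadlock_t (id int PRIMARY KEY, v int);
INSERT INTO deadlock_t VALUES (1, 0), (2, 0);

-- deadlock.sql (fed to pgbench): with 2 clients, each locks the two rows
-- in opposite order, so one of them reliably deadlocks and is retried
\set a 1 + :client_id % 2
\set b 2 - :client_id % 2
BEGIN;
UPDATE deadlock_t SET v = v + 1 WHERE id = :a;
\sleep 500 ms
UPDATE deadlock_t SET v = v + 1 WHERE id = :b;
END;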

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: PG DOCS - logical replication filtering

2022-03-02 Thread Aleksander Alekseev
Hi Peter,

> PSA a PG docs patch that is associated with the logical replication
> Row Filters feature which was recently pushed [1].

The patch looks mostly OK, but I have several nitpicks.

```
By default, all data from all published tables will be replicated to the
appropriate subscribers.
[...]
By default, all operation types are replicated.
```

The second sentence seems to be redundant.

```
(This feature is available since PostgreSQL 15)
```

Please correct me if I'm wrong, but I don't think we say that in the docs.
When the user opens the documentation for version X he or she sees
everything that is available in this version.

```
31.3. Filtering
[...]
There are 3 different ways to filter what data gets replicated.
31.3.1. Operation Filters
[...]
31.3.2. Row Filters
[...]
```
It looks like there are 2 different ways after all.

I see that a large part of the documentation is commented out and marked as
TBA (Column Filters, Combining Different Kinds of Filters). Could you please
clarify whether it's a work-in-progress patch? If it's not, I believe the
commented-out part should be removed before committing.

-- 
Best regards,
Aleksander Alekseev


Re: PG DOCS - logical replication filtering

2022-03-02 Thread Aleksander Alekseev
Hi again,

> The second sentence seems to be redundant.

Actually, I'm wrong on this one.

-- 
Best regards,
Aleksander Alekseev


Re: PG DOCS - logical replication filtering

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev wrote:
>
>
> I see that a large part of the documentation is commented out and marked as
> TBA (Column Filters, Combining Different Kinds of Filters). Could you please
> clarify whether it's a work-in-progress patch? If it's not, I believe the
> commented-out part should be removed before committing.
>

I think we can remove any Column Filters related information
(placeholders); if that patch gets committed, we can always extend the
existing docs.

-- 
With Regards,
Amit Kapila.




Re: Add id's to various elements in protocol.sgml

2022-03-02 Thread Peter Eisentraut

On 01.03.22 20:50, Brar Piening wrote:

Patch is attached. I don't think it should get applied this way, though.
The fact that you only get links for section headers that have manually
assigned ids would be pretty surprising for users of the docs and in
some files (e.g. protocol-flow.html) only every other section has a
manually assigned id. It would be easy to emit a message (or even fail)
whenever the template fails to find an id and then manually assign ids
until they are everywhere (currently that means all varlistentries and
sections) but that would a) be quite some work and b) make the patch
quite heavy, so I wouldn't even start this before there is really
consensus that this is the right direction.


I have applied the part of your patch that adds the id's.  The 
discussion about the formatting aspect can continue.


I changed the id's for the protocol messages to mixed case, so that it 
matches how these are commonly referred to elsewhere.  It doesn't affect 
the output.






RE: Optionally automatically disable logical replication subscriptions on error

2022-03-02 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada wrote:
> After more thoughts, should we do both AbortOutOfAnyTransaction() and error
> message handling while holding interrupts? That is,
> 
> HOLD_INTERRUPTS();
> EmitErrorReport();
> FlushErrorState();
> AbortOutOfAnyTransaction();
> RESUME_INTERRUPTS();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
> 
> I think it's better that we do clean up first and then do other works such as
> sending the message to the stats collector and updating the catalog.
I agree. Fixed. Along with this change, I corrected the header comment of
DisableSubscriptionOnError, too.


> Here are some comments on v24 patch:
> 
> +/* Look up our subscription in the catalogs */
> +tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> +                          CStringGetDatum(MySubscription->name));
> 
> s/catalogs/catalog/
> 
> Why don't we use SUBSCRIPTIONOID with MySubscription->oid?
Changed.


> ---
> +if (!HeapTupleIsValid(tup))
> +    ereport(ERROR,
> +            errcode(ERRCODE_UNDEFINED_OBJECT),
> +            errmsg("subscription \"%s\" does not exist",
> +                   MySubscription->name));
> 
> I think we should use elog() here rather than ereport() since it's a
> should-not-happen error.
Fixed


> ---
> +/* Notify the subscription will be no longer valid */
> 
> I'd suggest rephrasing it to something like "Notify the subscription will
> be disabled". (The subscription is still valid actually, but just disabled.)
Fixed. Also, I've changed this sentence to past tense, because of the code
placement change below.

 
> ---
> +/* Notify the subscription will be no longer valid */
> +ereport(LOG,
> +        errmsg("logical replication subscription \"%s\" will be disabled due to an error",
> +               MySubscription->name));
> 
> I think we can report the log at the end of this function rather than during 
> the
> transaction.
Fixed. In this case, I needed to adjust the comment to indicate the processing
to disable the sub has *completed* as well.

> ---
> +my $cmd = qq(
> +CREATE TABLE tbl (i INT);
> +ALTER TABLE tbl REPLICA IDENTITY FULL;
> +CREATE INDEX tbl_idx ON tbl(i));
> 
> I think we don't need to set REPLICA IDENTITY FULL on this table since
> there is no update/delete.
> 
> Do we need tbl_idx?
Removed both the replica identity and tbl_idx.


> ---
> +$cmd = qq(
> +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> +sr.srsubstate IN ('s', 'r'));
> +$node_subscriber->poll_query_until('postgres', $cmd);
> 
> It seems better to add a condition of srrelid just in case.
Makes sense. Fixed.


> ---
> +$cmd = qq(
> +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s
> +WHERE s.subname = 'sub' AND s.subenabled IS FALSE);
> +$node_subscriber->poll_query_until('postgres', $cmd)
> +  or die "Timed out while waiting for subscriber to be disabled";
> 
> I think that it's more natural to directly check the subscription's 
> subenabled.
> For example:
> 
> SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';
Fixed. I modified another code similar to this for tablesync error as well.


> ---
> +$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
> +$node_subscriber->safe_psql('postgres', $cmd);
> +$cmd = q(SELECT COUNT(1) = 3 FROM tbl WHERE i = 3);
> +$node_subscriber->poll_query_until('postgres', $cmd)
> +  or die "Timed out while waiting for applying";
> 
> I think it's better to wait for the subscriber to catch up and check the query
> result instead of using poll_query_until() so that we can check the query 
> result
> in case where the test fails.
I modified the code to wait for the subscriber to catch up and deleted
poll_query_until. Also, considering the case where this test fails, as you
mentioned, it is better to expect and check a returned count equal to 3.
I expressed this point in the test as well, following your advice.


> ---
> +$cmd = qq(DROP INDEX tbl_unique);
> +$node_subscriber->safe_psql('postgres', $cmd);
> 
> In the newly added tap tests, all queries are consistently assigned to $cmd 
> and
> executed even when the query is used only once. It seems a different style 
> from
> the one in other tap tests. Is there any reason why we do this style for all 
> queries
> in the tap tests?
Fixed. I removed the 'cmd' variable itself.


Attached an updated patch v26.

Best Regards,
Takamichi Osumi



Attachment: v26-0001-Optionally-disable-subscriptions-on-error.patch


Re: Add id's to various elements in protocol.sgml

2022-03-02 Thread Peter Eisentraut

On 01.03.22 18:27, Brar Piening wrote:

Also I'm not sure how well the autogenerated ids are reproducible over
time/versions/builds, and if it is wise to use them as targets to link
to from somewhere else.


Autogenerated ids are stable across builds of the same source.  They 
would change if the document structure is changed, for example, a 
section is inserted.





Re: PG DOCS - logical replication filtering

2022-03-02 Thread Aleksander Alekseev
Hi hackers,

> > I see that a large part of the documentation is commented out and marked
> > as TBA (Column Filters, Combining Different Kinds of Filters). Could you
> > please clarify whether it's a work-in-progress patch? If it's not, I
> > believe the commented-out part should be removed before committing.
>
> I think we can remove any Column Filters related information
> (placeholders); if that patch gets committed, we can always extend the
> existing docs.
>

Here is an updated version of the patch.

-- 
Best regards,
Aleksander Alekseev


Attachment: v2-0001-Update-the-documentation-for-logical-replication.patch


Re: create_index test fails when synchronous_commit = off @ master

2022-03-02 Thread Aleksander Alekseev
Hi hackers,

> I don't see what else can be done either. Here is the corresponding patch.

Here is an updated patch that includes the commit message.

-- 
Best regards,
Aleksander Alekseev


Attachment: v2-0001-Use-synchronous_commit-on-in-test_setup.sql.patch


Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Michael Paquier
On Wed, Mar 02, 2022 at 05:22:35PM +0900, Kyotaro Horiguchi wrote:
> But the patch forgets to remove a useless variable.

Indeed.  I forgot to look at stderr.

>>  /* Initialise attributes information in the tuple descriptor */
>>  tupdesc = CreateTemplateTupleDesc(PG_STOP_BACKUP_V2_COLS);
>>  TupleDescInitEntry(tupdesc, (AttrNumber) 1, "lsn",
>> PG_LSNOID, -1, 0);
> 
> I think we can use get_call_result_type here.

Yes, I don't mind doing so here.
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bf88858171..d8e8715ed1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6275,9 +6275,9 @@
   proname => 'pg_stop_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => '', prosrc => 'pg_stop_backup' },
 { oid => '2739', descr => 'finish taking an online backup',
-  proname => 'pg_stop_backup', prorows => '1', proretset => 't',
-  provolatile => 'v', proparallel => 'r', prorettype => 'record',
-  proargtypes => 'bool bool', proallargtypes => '{bool,bool,pg_lsn,text,text}',
+  proname => 'pg_stop_backup', provolatile => 'v', proparallel => 'r',
+  prorettype => 'record', proargtypes => 'bool bool',
+  proallargtypes => '{bool,bool,pg_lsn,text,text}',
   proargmodes => '{i,i,o,o,o}',
   proargnames => '{exclusive,wait_for_archive,lsn,labelfile,spcmapfile}',
   prosrc => 'pg_stop_backup_v2' },
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 12e2bf4135..9f7a282ed2 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -165,43 +165,20 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 Datum
 pg_stop_backup_v2(PG_FUNCTION_ARGS)
 {
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+#define PG_STOP_BACKUP_V2_COLS 3
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	Datum		values[3];
-	bool		nulls[3];
+	Datum		values[PG_STOP_BACKUP_V2_COLS];
+	bool		nulls[PG_STOP_BACKUP_V2_COLS];
 
 	bool		exclusive = PG_GETARG_BOOL(0);
 	bool		waitforarchive = PG_GETARG_BOOL(1);
 	XLogRecPtr	stoppoint;
 	SessionBackupState status = get_backup_status();
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
+	/* Initialise attributes information in the tuple descriptor */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
 
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
-	MemoryContextSwitchTo(oldcontext);
-
 	MemSet(values, 0, sizeof(values));
 	MemSet(nulls, 0, sizeof(nulls));
 
@@ -251,9 +228,8 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	/* Stoppoint is included on both exclusive and nonexclusive backups */
 	values[0] = LSNGetDatum(stoppoint);
 
-	tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
 /*
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 758ab6e25a..81bac6f581 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -384,7 +384,7 @@ CREATE OR REPLACE FUNCTION
 CREATE OR REPLACE FUNCTION pg_stop_backup (
 exclusive boolean, wait_for_archive boolean DEFAULT true,
 OUT lsn pg_lsn, OUT labelfile text, OUT spcmapfile text)
-  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
+  RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
 CREATE OR REPLACE FUNCTION




Re: [PATCH] Add reloption for views to enable RLS

2022-03-02 Thread Dean Rasheed
On Tue, 1 Mar 2022 at 16:40, Christoph Heiss wrote:
>
> That is also the main reason I preferred naming it "security_invoker" -
> it is consistent with other databases and eases transition from such
> systems.
>
> I kept "check_permissions_owner" for now. Constantly changing it around
> with each iteration doesn't really bring any value IMHO, I'd rather have
> a final consensus on how to name the option and *then* change it for good.
>

Yes indeed, it's annoying to keep changing the name between patch
versions, so let's try to get a consensus now.

For my part, I find myself more and more convinced that
"security_invoker" is the right name, because it matches the
terminology used for functions, and in other database systems. I think
the parallels between security invoker functions and security invoker
views are quite strong.

There are a couple of additional considerations that lend weight to
that choice of name, though not uniquely to it:

1). There is a slight advantage to having an option that defaults to
false/off, like the existing "security_barrier" option -- it allows a
shorthand to turn the option on, because the system automatically
turns "WITH (security_barrier)" into "WITH (security_barrier=true)".

2). Grammatically, a name like this works better, because it serves
both as the name of the boolean option, and as an adjective that can
be used to describe and name the feature -- as in "security barrier
views are cool" -- making it easier to talk about the feature.

"check_permissions_owner=false" doesn't work as well in either regard,
and just feels much more clumsy.
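
To illustrate point 1 above: a boolean reloption named without a value
defaults to true, so (with a hypothetical view body):

CREATE VIEW emp_self WITH (security_invoker) AS
    SELECT * FROM employees WHERE user_name = current_user;
-- shorthand for WITH (security_invoker = true)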

When we come to write the release notes for this feature, saying that
this version of PG now supports security invoker views is going to
mean a lot more to people who already use that feature in other
databases.

What are other people's opinions?

Regards,
Dean




Changing "Hot Standby" to "hot standby"

2022-03-02 Thread Daniel Westermann (DWE)
Hi,

with reference to the discussion in docs: 
https://www.postgresql.org/message-id/flat/2221339.1645896597%40sss.pgh.pa.us#5a346c15ec2edbe8fcc93a1ffc2a7c7d

Here is a patch that changes "Hot Standby" to "hot standby" in 
high-availability.sgml, so we have a consistent wording.
Thoughts?

There are other places where hot standby is capitalized, but I guess we should 
start here.

Regards
Daniel

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b5b6042104..ec144489e5 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -548,7 +548,7 @@ protocol to make nodes agree on a serializable transactional order.
rollforward will take considerably longer, so that technique only
offers a solution for disaster recovery, not high availability.
A standby server can also be used for read-only queries, in which case
-   it is called a Hot Standby server. See  for
+   it is called a hot standby server. See  for
more information.
   
 
@@ -1032,7 +1032,7 @@ primary_slot_name = 'node_a_slot'

 

-Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
+hot standby feedback propagates upstream, whatever the cascaded arrangement.

 

@@ -1496,19 +1496,19 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
   
 
  
-  Hot Standby
+  hot standby
 
   
-   Hot Standby
+   hot standby
   
 

-Hot Standby is the term used to describe the ability to connect to
+hot standby is the term used to describe the ability to connect to
 the server and run read-only queries while the server is in archive
 recovery or standby mode. This
 is useful both for replication purposes and for restoring a backup
 to a desired state with great precision.
-The term Hot Standby also refers to the ability of the server to move
+The term hot standby also refers to the ability of the server to move
 from recovery through to normal operation while users continue running
 queries and/or keep their connections open.

@@ -1623,7 +1623,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
being executed during recovery.  This restriction applies even to
temporary tables, because table rows cannot be read or written without
assigning a transaction ID, which is currently not possible in a
-   Hot Standby environment.
+   hot standby environment.
   
  
  
@@ -1703,7 +1703,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 In normal operation, read-only transactions are allowed to
 use LISTEN and NOTIFY,
-so Hot Standby sessions operate under slightly tighter
+so hot standby sessions operate under slightly tighter
 restrictions than ordinary read-only sessions.  It is possible that some
 of these restrictions might be loosened in a future release.

@@ -1746,7 +1746,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 

-There are also additional types of conflict that can occur with Hot Standby.
+There are also additional types of conflict that can occur with hot standby.
 These conflicts are hard conflicts in the sense that queries
 might need to be canceled and, in some cases, sessions disconnected to resolve them.
 The user is provided with several ways to handle these
@@ -1947,8 +1947,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 If hot_standby is on in postgresql.conf
 (the default value) and there is a
 standby.signalstandby.signalfor hot standby
-file present, the server will run in Hot Standby mode.
-However, it may take some time for Hot Standby connections to be allowed,
+file present, the server will run in hot standby mode.
+However, it may take some time for hot standby connections to be allowed,
 because the server will not accept connections until it has completed
 sufficient recovery to provide a consistent state against which queries
 can run.  During this period,
@@ -2252,7 +2252,7 @@ HINT:  You can then restart the server after making the necessary configuration
   
 
   
-   Hot Standby Parameter Reference
+   hot standby Parameter Reference
 

 Various parameters have been mentioned above in
@@ -2282,7 +2282,7 @@ HINT:  You can then restart the server after making the necessary configuration
Caveats
 

-There are several limitations of Hot Standby.
+There are several limitations of hot standby.
 These can and probably will be fixed in future releases:
 
   
@@ -2299,7 +2299,7 @@ HINT:  You can then restart the server after making the necessary configuration
 
  Valid starting points for standby queries are generated at each
  checkpoint on the primary. If the standby is shut down while the primary
- is in a shutdown state, it might not be possible to re-enter Hot Standby
+ is in a shutdown state, it might not be possible to re-enter hot standby
  u

Re: psql: Make SSL info display more compact

2022-03-02 Thread Daniel Gustafsson
> On 1 Mar 2022, at 09:44, Peter Eisentraut wrote:

> I propose a reduced patch that just removes the "bits" display, since
> that is redundant with the "cipher".


No objections from me.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Aleksander Alekseev
Hi Michael,

```
 Datum
 pg_stop_backup_v2(PG_FUNCTION_ARGS)
 {
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+#define PG_STOP_BACKUP_V2_COLS 3
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	Datum		values[3];
-	bool		nulls[3];
+	Datum		values[PG_STOP_BACKUP_V2_COLS];
+	bool		nulls[PG_STOP_BACKUP_V2_COLS];
```

Declaring a macro inside the procedure body is a bit unconventional.
Since it doesn't seem to be used for anything except these two array
declarations, I suggest simply keeping "3" here.
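
If the goal is merely to avoid repeating the literal, lengthof() from c.h
is another common in-tree alternative (a sketch):

Datum       values[3];
bool        nulls[lengthof(values)];    /* stays in sync with values[] */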

-- 
Best regards,
Aleksander Alekseev




Re: Handle infinite recursion in logical replication setup

2022-03-02 Thread Amit Kapila
On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
>
...
...
> I have attached a basic patch for this, if the idea is accepted, I
> will work further to test more scenarios, add documentation, and test
> and post an updated patch.
> For the second problem, Table synchronization of table including local
> data and replicated data using copy command.
>
> Let us consider the following scenario:
> a) node1 publishing to node2 b) node2 publishing to node1. Here in
> this case node1 will have replicated data from node2 and vice versa.
>
> In the above if user wants to include node3 to subscribe data from
> node2. Users will have to create a subscription in node3 to get the
> data from node2. During table synchronization we send the complete
> table data from node2 to node3. Node2 will have local data from node2
> and also replicated data from node1. Currently we don't have an option
> to differentiate between the locally generated data and replicated
> data in the heap which will cause infinite recursion as described
> above.
>
> To handle this if user has specified only_local option, we could throw
> a warning or error out while creating subscription in this case, we
> could have a column srreplicateddata in pg_subscription_rel which
> could indicate if the table has any replicated data or not:
> postgres=# select * from pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn  | srreplicateddata
> ---------+---------+------------+-----------+------------------
>    16389 |   16384 | r          | 0/14A4640 | t
>    16389 |   16385 | r          | 0/14A4690 | f
> (2 rows)
>
> In the above example, srreplicateddata with true indicates that table t1,
> whose relid is 16384, has replicated data, and the other row having
> srreplicateddata as false indicates that table t2, whose relid is 16385,
> does not have replicated data.
> When creating a new subscription, the subscriber will connect to the
> publisher and check if the relation has replicated data by checking
> srreplicateddata in pg_subscription_rel table.
> If the table has any replicated data, log a warning or error for this.
>

If you want to give the error in this case, then I think we need to
provide an option to the user to allow copy. One possibility could be
to extend existing copy_data option as 'false', 'true', 'force'. For
'false', there shouldn't be any change, for 'true', if 'only_local'
option is also set and the new column indicates replicated data then
give an error, for 'force', we won't give an error even if the
conditions as mentioned for 'true' case are met, rather we will allow
copy in this case.
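
Under that proposal, the DDL might look like this (hypothetical syntax;
'force' is the suggested new value, and the connection string is made up):

CREATE SUBSCRIPTION sub_node2
    CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub1
    WITH (only_local = true, copy_data = force);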

> Also, we could document the steps on how to handle the initial sync like:
> a) Complete the ongoing transactions on this table in the replication
> setup nodes i.e. node1 and node2 in the above case,  so that the table
> data is consistent, b) Once there are no ongoing transaction, Copy the
> table data using copy command from any one of the nodes, c) create
> subscription with copy_data option as off d) Perform further
> transactions on the table e) All the further transactions performed
> will be handled by the walsender which will take care of skipping
> replicated data and sending only the local data. i.e. node2 will send
> the locally generated data to node3.
>
> I'm not sure if there is any other better way to handle this.
>

I could think of the below options for users to set up bi-directional
replication for the same table.

Option-1:
There is no pre-existing data in the tables that are going to
participate in bi-directional replication. In such a case, Users can
create pub/sub (with only_local option as proposed by you) on both
nodes before starting any writes on tables. This will allow
bi-directional replication for the required tables. Now, if the user
wants one of the nodes to join at a later point, then the strategy in
Option-2/3 could be used.

Option-2:
One of the nodes (say node-1) has some pre-existing data and another
node (say node-2) doesn't have any pre-existing data. In this case,
the user can set up pub/sub (with only_local and copy_data as 'false'
options) for node-1 first before any of the operations on node-2.
Then, it can set up pub/sub on node-2. This will allow bi-directional
replication for the required tables.

Option-3:
Both the nodes have some pre-existing data. I think the easiest option
could be to truncate data on one of the nodes and set up pub/sub on
both nodes. See, one way to achieve it among two nodes as below:

Node-1:
Table t1 has data
1, 2, 3, 4

Publication for t1, pub1: Create Publication pub1 For Table t1;

Node-2:
Table t1 has data
5, 6, 7, 8
Publication for t1, pub1_2: Create Publication pub1_2 For Table t1;

Now, Create Subscription for pub1 on node1: Create Subscription sub1_2
Connection '' Publication pub1 WITH (only_local =
true);

Node-1:
Begin;
# Disallow truncates to be published
Alter Publication pub1 Set (publish = 'insert, update, delete');
Truncate t1;
Create Subscription sub1

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-02 Thread Kyotaro Horiguchi
At Wed, 02 Mar 2022 16:59:09 +0900 (JST), Kyotaro Horiguchi wrote in
> At Thu, 20 Jan 2022 17:19:04 +0900 (JST), Kyotaro Horiguchi wrote in
> > At Thu, 20 Jan 2022 15:07:22 +0900 (JST), Kyotaro Horiguchi wrote in
> > CI now likes this version for all platforms.
> 
> An xlog.c refactoring happend recently hit this.
> Just rebased on the change.

A function added to Util.pm used perl2host, which has been removed
recently.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From bb714659adcde5265974c46b061966e5dfc556be Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 11 Nov 2021 20:42:00 +0900
Subject: [PATCH v17 1/3] Add tablespace support to TAP framework

The TAP framework doesn't support nodes that have tablespaces.  In
particular, backup and initialization from backups failed if the source
node has tablespaces.  This commit provides a simple way to create
tablespace directories and allows backup routines to handle tablespaces.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++-
 src/test/perl/PostgreSQL/Test/Utils.pm   |  42 
 2 files changed, 304 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index be05845248..15d57b9a71 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -298,6 +298,64 @@ sub archive_dir
 
 =pod
 
+=item $node->tablespace_storage([nocreate])
+
+Directory to store tablespace directories.
+If nocreate is true, returns undef if not yet created.
+
+=cut
+
+sub tablespace_storage
+{
+	my ($self, $nocreate) = @_;
+
+	if (!defined $self->{_tsproot})
+	{
+		# tablespace is not used, return undef if nocreate is specified.
+		return undef if ($nocreate);
+
+		# create and remember the tablespace root directory.
+		$self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short();
+	}
+
+	return $self->{_tsproot};
+}
+
+=pod
+
+=item $node->tablespaces()
+
+Returns a hash from tablespace OID to tablespace directory name.  For
+example, an oid 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as
+$hash{16384} = "ts1".
+
+=cut
+
+sub tablespaces
+{
+	my ($self) = @_;
+	my $pg_tblspc = $self->data_dir . '/pg_tblspc';
+	my %ret;
+
+	# return undef if no tablespace is used
+	return undef if (!defined $self->tablespace_storage(1));
+
+	# collect tablespace entries in pg_tblspc directory
+	opendir(my $dir, $pg_tblspc);
+	while (my $oid = readdir($dir))
+	{
+		next if ($oid !~ /^([0-9]+)$/);
+		my $linkpath = "$pg_tblspc/$oid";
+		my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath);
+		$ret{$oid} = File::Basename::basename($tsppath);
+	}
+	closedir($dir);
+
+	return %ret;
+}
+
+=pod
+
 =item $node->backup_dir()
 
 The output path for backups taken with $node->backup()
@@ -313,6 +371,77 @@ sub backup_dir
 
 =pod
 
+=item $node->backup_tablespace_storage_path(backup_name)
+
+Returns the tablespace location path for backup_name.
+Returns the parent directory if backup_name is not given.
+
+=cut
+
+sub backup_tablespace_storage_path
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_dir . '/__tsps';
+
+	$dir .= "/$backup_name" if (defined $backup_name);
+
+	return $dir;
+}
+
+=pod
+
+=item $node->backup_create_tablespace_storage(backup_name)
+
+Create the tablespace location directory for backup_name if it does not
+exist yet.  Create the parent tablespace storage that holds all location
+directories if backup_name is not supplied.
+
+=cut
+
+sub backup_create_tablespace_storage
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_tablespace_storage_path($backup_name);
+
+	File::Path::make_path $dir if (! -d $dir);
+}
+
+=pod
+
+=item $node->backup_tablespaces(backup_name)
+
+Returns a reference to hash from tablespace OID to tablespace
+directory name of tablespace directory that the specified backup has.
+For example, an oid 16384 pointing to ../tsps/backup1/ts1 is stored as
+$hash{16384} = "ts1".
+
+=cut
+
+sub backup_tablespaces
+{
+	my ($self, $backup_name) = @_;
+	my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc';
+	my %ret;
+
+	#return undef if this backup holds no tablespaces
+	return undef if (! -d $self->backup_tablespace_storage_path($backup_name));
+
+	# scan pg_tblspc directory of the backup
+	opendir(my $dir, $pg_tblspc);
+	while (my $oid = readdir($dir))
+	{
+		next if ($oid !~ /^([0-9]+)$/);
+		my $linkpath = "$pg_tblspc/$oid";
+		my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath);
+		$ret{$oid} = File::Basename::basename($tsppath);
+	}
+	closedir($dir);
+
+	return \%ret;
+}
+
+=pod
+
 =item $node->install_path()
 
 The configured install path (if any) for the node.
@@ -370,6 +499,7 @@ sub info
 	print $fh "Data directory: " . $self->data_dir . "\n";
 	print $fh "Backup directory: " . $self->backup_dir . "\n";
 	print $fh "Archive directory: " . $self->archive_dir . "\n";
+	print $fh "Tablespace directory: " . $self->tablespace_storage . "\n";
 	print $fh "Connection stri

Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 1:26 PM Andres Freund  wrote:
>
> > > While rebasing, I was wondering why pgstat_reset_subscription_counter() 
> > > has
> > > "all subscription counters" support. We don't have a function to reset all
> > > function stats or such either.
> > >
> >
> > We have a similar thing for SLRUs (pg_stat_reset_slru) and slots
> > (pg_stat_reset_replication_slot).
>
> Neither should have been added imo. We're already at 9 different reset
> functions.
>

As per [1], we have 7.

>
> And having pg_stat_reset_shared() with variable
> 'reset' systems but then also pg_stat_reset_slru() and
> pg_stat_reset_subscription_stats() is absurd.
>

I don't know. I feel that if for some subsystem we have a way to reset a
single counter, as for SLRUs or slots, one would prefer to use the same
function to reset all stats of that subsystem. For WAL, bgwriter, etc., we
don't want to reset any specific counter, so doing it via a shared function
is okay there, but not for the others.
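
For reference, the existing per-subsystem functions follow that pattern:
the argument selects one object, and NULL resets the whole subsystem.

SELECT pg_stat_reset_slru('Xact');              -- one SLRU cache
SELECT pg_stat_reset_slru(NULL);                -- all SLRU caches
SELECT pg_stat_reset_replication_slot('slot1'); -- one slot
SELECT pg_stat_reset_replication_slot(NULL);    -- all slots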

[1] - 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-STATS-FUNCTIONS

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add reloption for views to enable RLS

2022-03-02 Thread Wolfgang Walther

Dean Rasheed:

That is also the main reason I preferred naming it "security_invoker" -
it is consistent with other databases and eases transition from such
systems.

[...]

For my part, I find myself more and more convinced that
"security_invoker" is the right name, because it matches the
terminology used for functions, and in other database systems. I think
the parallels between security invoker functions and security invoker
views are quite strong.

[...]

When we come to write the release notes for this feature, saying that
this version of PG now supports security invoker views is going to
mean a lot more to people who already use that feature in other
databases.

What are other people's opinions?


All those points in favor of security_invoker are very good indeed. The
main objection was not the term invoker, though, but the implicit
association it creates, as in "security_invoker=false behaves like
security definer". But this is clearly wrong: the "security definer"
semantics as used for functions or in other databases just don't apply
as the default in PG.


I think renaming the reloption was a shortcut to avoid that association,
while the best way to deal with it would be explicit documentation.
Meanwhile, the patch has added a mention of CURRENT_USER, so that's a
first step. Maybe an explicit mention that security_invoker=false is NOT
the same as "security definer", and an explanation why, would already be
enough?


Best

Wolfgang





Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-03-02 Thread Nitin Jadhav
Thanks for reviewing.

> > > I suggested upthread to store the starting timeline instead.  This way 
> > > you can
> > > deduce whether it's a restartpoint or a checkpoint, but you can also 
> > > deduce
> > > other information, like what was the starting WAL.
> >
> > I don't understand why we need the timeline here to just determine
> > whether it's a restartpoint or checkpoint.
>
> I'm not saying it's necessary, I'm saying that for the same space usage we can
> store something a bit more useful.  If no one cares about having the starting
> timeline available for no extra cost then sure, let's just store the kind
> directly.

Fixed.

> 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
> directly instead of new function pg_stat_get_progress_checkpoint_kind?
> + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
> + (flags == 0) ? "unknown" : "",
> + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
> + (flags & CHECKPOINT_FORCE) ? "force " : "",
> + (flags & CHECKPOINT_WAIT) ? "wait " : "",
> + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
> + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
> + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");

Fixed.
---

> 5) Do we need a special phase for this checkpoint operation? I'm not
> sure in which cases it will take a long time, but it looks like
> there's a wait loop here.
> vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
> if (nvxids > 0)
> {
> do
> {
> pg_usleep(1L); /* wait for 10 msec */
> } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
> }

Yes. It is better to add a separate phase here.
---

> Also, how about special phases for SyncPostCheckpoint(),
> SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
> PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
> it might be increase in future (?)), TruncateSUBTRANS()?

SyncPreCheckpoint() is just incrementing a counter and
PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel
there is no need to add any phases for these as of now. We can add in
the future if necessary. Added phases for SyncPostCheckpoint(),
InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS().
---

> 6) SLRU (Simple LRU) isn't a phase here, you can just say
> PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.
> +
> + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
> + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES);
>  CheckPointPredicate();
>
> And :s/checkpointing SLRU pages/checkpointing predicate lock pages
>+  WHEN 9 THEN 'checkpointing SLRU pages'

Fixed.
---

> 7) 
> :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS

I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here, as
it describes the purpose in fewer words.

> And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN
> 'processing file sync requests'

Fixed.
---

> 8) :s/Finalizing/finalizing
> +  WHEN 14 THEN 'Finalizing'

Fixed.
---

> 9) :s/checkpointing snapshots/checkpointing logical replication snapshot files
> +  WHEN 3 THEN 'checkpointing snapshots'
> :s/checkpointing logical rewrite mappings/checkpointing logical
> replication rewrite mapping files
> +  WHEN 4 THEN 'checkpointing logical rewrite mappings'

Fixed.
---

> 10) I'm not sure if it's discussed, how about adding the number of
> snapshot/mapping files so far the checkpoint has processed in file
> processing while loops of
> CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
> be many logical snapshot or mapping files and users may be interested
> in knowing the so-far-processed-file-count.

I had thought about this while sharing the v1 patch and mentioned my
views upthread. I feel it won't give meaningful progress information
(it can be treated as statistics rather than progress). Hence it is not
included. Thoughts?

> > > As mentioned upthread, there can be multiple backends that request a
> > > checkpoint, so unless we want to store an array of pids we should store
> > > the number of backends that are waiting for a new checkpoint.
> >
> > Yeah, you are right. Let's not go that path and store an array of
> > pids. I don't see a strong use-case with the pid of the process
> > requesting checkpoint. If required, we can add it later once the
> > pg_stat_progress_checkpoint view gets in.
>
> I don't think that's really necessary to give the pid list.
>
> If you requested a new checkpoint, it doesn't matter if it's only your backend
> that triggered it, another backend or a few other dozen, the result will be 
> the
> same and you have the information that the request has been seen.  We could
> store just a bool for that but having a number instead also gives a bit more
> information and may allow you to detect some broken logic on your client code
> if it keeps increasing.

It's a good met

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-02 Thread Fabien COELHO



Hello Tatsuo-san,

> It seems the patch is ready for committer except below. Do you guys want
> to do more on below?

I'm planning a new review of this significant patch, possibly over the 
next week-end, or the next.


--
Fabien.




Re: row filtering for logical replication

2022-03-02 Thread Tomas Vondra
Hi,

While working on the column filtering patch, which touches about the
same places, I noticed two minor gaps in testing:

1) The regression tests do perform multiple ALTER PUBLICATION commands,
tweaking the row filter. But there are no checks that the row filter was
actually modified / stored in the catalog. It might be just thrown away
and no one would notice.

2) There are no pg_dump tests.


So attached are two trivial patches, addressing this. The first one adds
a couple \dRp and \d commands, to show what the catalogs contain. The
second one adds a simple pg_dump test.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From e781f840e38701c63d8b57ff36bd520f2cced6ad Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 26 Feb 2022 17:33:09 +0100
Subject: [PATCH 1/3] Verify changing WHERE condition for a publication

Commit 52e4f0cd47 added support for row filters in logical replication,
including regression tests with multiple ALTER PUBLICATION commands,
modifying the row filter. But the tests never verified that the row
filter was actually updated in the catalog. This adds a couple \d and
\dRp commands, to verify the catalog was updated.
---
 src/test/regress/expected/publication.out | 66 +++
 src/test/regress/sql/publication.sql  |  8 +++
 2 files changed, 74 insertions(+)

diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 3c382e520e4..227ce759486 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -395,15 +395,81 @@ LINE 1: ...ICATION testpub5 ADD TABLE testpub_rf_tbl1 WHERE (b < '2' CO...
 DETAIL:  User-defined collations are not allowed.
 -- ok - NULLIF is allowed
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl1 WHERE (NULLIF(1,2) = a);
+\d+ testpub_rf_tbl1
+  Table "public.testpub_rf_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
++-+---+--+-+--+--+-
+ a  | integer |   |  | | plain|  | 
+ b  | text|   |  | | extended |  | 
+Publications:
+"testpub5" WHERE (NULLIF(1, 2) = a)
+
+\dRp+ testpub5
+Publication testpub5
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+--++-+-+-+---+--
+ regress_publication_user | f  | t   | f   | f   | f | f
+Tables:
+"public.testpub_rf_tbl1" WHERE (NULLIF(1, 2) = a)
+
 -- ok - built-in operators are allowed
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl1 WHERE (a IS NULL);
+\d+ testpub_rf_tbl1
+  Table "public.testpub_rf_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
++-+---+--+-+--+--+-
+ a  | integer |   |  | | plain|  | 
+ b  | text|   |  | | extended |  | 
+Publications:
+"testpub5" WHERE (a IS NULL)
+
+\dRp+ testpub5
+Publication testpub5
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+--++-+-+-+---+--
+ regress_publication_user | f  | t   | f   | f   | f | f
+Tables:
+"public.testpub_rf_tbl1" WHERE (a IS NULL)
+
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl1 WHERE ((a > 5) IS FALSE);
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl1 WHERE (a IS DISTINCT FROM 5);
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl1 WHERE ((a, a + 1) < (2, 3));
 -- ok - built-in type coercions between two binary compatible datatypes are allowed
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl1 WHERE (b::varchar < '2');
+\d+ testpub_rf_tbl1
+  Table "public.testpub_rf_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
++-+---+--+-+--+--+-
+ a  | integer |   |  | | plain|  | 
+ b  | text|   |  | | extended |  | 
+Publications:
+"testpub5" WHERE (((b)::character varying)::text < '2'::text)
+
+\dRp+ testpub5
+Publication testpub5
+  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root 
+--++-+-+-+---+--
+ regress_publication_user | f  | t

Add CHECKPOINT_REQUESTED flag to the log message in LogCheckpointStart()

2022-03-02 Thread Nitin Jadhav
Hi,

I have noticed that the CHECKPOINT_REQUESTED flag information is not
present in the log message of the LogCheckpointStart() function. I would
like to understand whether it was missed or left out intentionally. The
log message describes all the possible checkpoint flags except the
CHECKPOINT_REQUESTED flag. I feel we should include it. Thoughts?

Please find the patch attached.

Thanks & Regards,
Nitin Jadhav


v1-0001-add-checkpoint_requested-flag-to-the-log-message.patch
Description: Binary data


Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-02 Thread Daniel Gustafsson
> On 1 Mar 2022, at 17:16, Greg Stark  wrote:

> Last November Daniel Gustafsson  did a patch triage. It took him three
> emails to get through the patches in the commitfest back then.

It should be noted that I only powered through the patches that had been in 3+
commitfests at the time..

> Since then we've had the November and the January commitfests so I was
> interested to see how many of these patches had advanced

..so there are new patches now that have crossed the (admittedly arbitrarily
chosen) breakpoint of 3+ CF's.

> So I'm going to post updates but I'm going to break it up into smaller
> batches because otherwise it'll take me a month before I post
> anything.

Thanks for picking it up and continuing with recent developments.  Let me know
if you want a hand in triaging patchsets.

--
Daniel Gustafsson   https://vmware.com/





Re: row filtering for logical replication

2022-03-02 Thread Euler Taveira
On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
> While working on the column filtering patch, which touches about the
> same places, I noticed two minor gaps in testing:
> 
> 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> tweaking the row filter. But there are no checks the row filter was
> actually modified / stored in the catalog. It might be just thrown away
> and no one would notice.
The test that the row filter was modified is available in a previous section.
The one that you modified (0001) is testing the supported objects.

ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
\dRp+ testpub5
ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
\dRp+ testpub5
-- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
\dRp+ testpub5

IIRC this test was written before the row filter information was added to
psql. We could add \d+ testpub_rf_tbl3 before and after the modification,
something like the sketch below.
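
\d+ testpub_rf_tbl3
ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
\d+ testpub_rf_tbl3

A row filter that was silently thrown away would then show up as a
difference in the expected output.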

> 2) There are no pg_dump tests.
WFM.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Add CHECKPOINT_REQUESTED flag to the log message in LogCheckpointStart()

2022-03-02 Thread Bharath Rupireddy
On Wed, Mar 2, 2022 at 5:41 PM Nitin Jadhav
 wrote:
>
> Hi,
>
> I have noticed that the CHECKPOINT_REQUESTED flag information is not
> present in the log message of LogCheckpointStart() function. I would
> like to understand if it was missed or left intentionally. The log
> message describes all the possible checkpoint flags except
> CHECKPOINT_REQUESTED flag. I feel we should support this. Thoughts?

I don't think that's useful. Being in LogCheckpointStart
(CreateCheckPoint or CreateRestartPoint) itself means that somebody
has requested a checkpoint. Having CHECKPOINT_REQUESTED doesn't add
any value.

I would suggest removing the CHECKPOINT_REQUESTED flag, as it's not
used anywhere; instead, a nonzero CheckpointerShmem->ckpt_flags is used
as the indication of a requested checkpoint in CheckpointerMain [1]. If
others don't agree to removing it (it doesn't cause any harm), then I
would add something like this for more readability:

if (((volatile CheckpointerShmemStruct *)
CheckpointerShmem)->ckpt_flags & CHECKPOINT_REQUESTED)
{
do_checkpoint = true;
PendingCheckpointerStats.m_requested_checkpoints++;
}

[1]
/*
 * Detect a pending checkpoint request by checking whether the flags
 * word in shared memory is nonzero.  We shouldn't need to acquire the
 * ckpt_lck for this.
 */
if (((volatile CheckpointerShmemStruct *)
CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
PendingCheckpointerStats.m_requested_checkpoints++;
}

Regards,
Bharath Rupireddy.




Re: Add 64-bit XIDs into PostgreSQL 15

2022-03-02 Thread Aleksander Alekseev
Hi hackers,

> In this part, I suppose you've found a definite bug. Thanks! There are a 
> couple
> of ways how it could be fixed:
>
> 1. If we enforce checkpoint at replica promotion then we force full-page 
> writes after each page modification afterward.
>
> 2. Maybe it's worth using a BufferDesc bit to mark the page as converted to 
> 64-bit xids but not yet written to disk? For example, one of the four bits 
> from BUF_USAGECOUNT.
> BM_MAX_USAGE_COUNT = 5, so 3 bits are enough to store it. This will change 
> the in-memory page representation but will not need WAL-logging, which is 
> impossible on a replica.
>
> What do you think about it?

I'm having difficulties merging and/or testing
v8-0002-Add-64bit-xid.patch since I'm not 100% sure which commit this
patch was targeting. Could you please submit a rebased patch and/or
share your development branch on GitHub?

I agree with Bruce it would be great to deliver this in PG15. Please
let me know if you believe it's unrealistic for any reason so I will
focus on testing and reviewing other patches.

For now, I'm changing the status of the patch to "Waiting on Author".

-- 
Best regards,
Aleksander Alekseev




Re: logical replication empty transactions

2022-03-02 Thread Ajin Cherian
On Wed, Mar 2, 2022 at 1:01 PM shiy.f...@fujitsu.com
 wrote:
>
> 4.
> @@ -1617,9 +1829,21 @@ pgoutput_stream_prepare_txn(LogicalDecodingContext 
> *ctx,
> ReorderBufferTXN *txn,
> XLogRecPtr 
> prepare_lsn)
>  {
> +   PGOutputTxnData *txndata = txn->output_plugin_private;
> +   boolsent_begin_txn = txndata->sent_begin_txn;
> +
> Assert(rbtxn_is_streamed(txn));
>
> -   OutputPluginUpdateProgress(ctx);
> +   pfree(txndata);
> +   txn->output_plugin_private = NULL;
> +
> +   if (!sent_begin_txn)
> +   {
> +   elog(DEBUG1, "Skipping replication of an empty transaction in 
> stream prepare");
> +   return;
> +   }
> +
> +   OutputPluginUpdateProgress(ctx, false);
> OutputPluginPrepareWrite(ctx, true);
> logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
> OutputPluginWrite(ctx, true);
>
> I notice that the patch skips a streamed prepared transaction; this would
> cause an error on the subscriber side when committing this transaction on
> the publisher side, so I think we'd better not do that.
>
> For example:
> (set logical_decoding_work_mem = 64kB, max_prepared_transactions = 10 in
> postgresql.conf)
>
> -- publisher
> create table test (a int, b text, primary key(a));
> create table test2 (a int, b text, primary key(a));
> create publication pub for table test;
>
> -- subscriber
> create table test (a int, b text, primary key(a));
> create table test2 (a int, b text, primary key(a));
> create subscription sub connection 'dbname=postgres port=5432' publication 
> pub with(two_phase=on, streaming=on);
>
> -- publisher
> begin;
> INSERT INTO test2 SELECT i, md5(i::text) FROM generate_series(1, 1000) s(i);
> prepare transaction 't';
> commit prepared 't';
>
> The error message in subscriber log:
> ERROR:  prepared transaction with identifier "pg_gid_16391_722" does not exist
>

Thanks for the test. I guess this mixed streaming + two-phase case runs into
the same problem that existed while skipping two-phase transactions. If the
eventual commit prepared comes after a restart, then there is no way of
knowing whether the original transaction was skipped, so we can't know if
the commit prepared needs to be sent. I tried not skipping the "stream
prepare", but that causes a crash in the apply worker as it tries to find
the non-existent streamed file. We could add logic to silently ignore a
spurious "stream prepare", but that might not be ideal. Any thoughts on how
to address this? Otherwise, we will need to avoid skipping streamed
transactions as well.

regards,
Ajin Cherian
Fujitsu Australia




Re: Changing "Hot Standby" to "hot standby"

2022-03-02 Thread Aleksander Alekseev
Hi Daniel,

> Here is a patch that changes "Hot Standby" to "hot standby" in
> high-availability.sgml, so we have consistent wording.
> Thoughts?

```
-   Hot Standby Parameter Reference
+   hot standby Parameter Reference
```

Pretty sure that for titles we should keep English capitalization rules.

-- 
Best regards,
Aleksander Alekseev


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2022-02-28 11:26:06 +0100, Peter Eisentraut wrote:
> > We already have a variety of authentication mechanisms that support central
> > management: LDAP, PAM, Kerberos, Radius.
> 
> LDAP, PAM and Radius all require cleartext passwords, so aren't a great
> solution based on the concerns voiced in this thread. IME Kerberos is
> operationally too complicated to really be used, unless it's already part of
> the operating environment.

... which it very, very, very often is, wrt Kerberos.

> > What other mechanisms are people thinking about implementing using these
> > hooks?
> 
> The cases I've heard about are about centralizing auth across multiple cloud
> services. Including secret management in some form. E.g. allowing an
> application to auth to postgres, redis and having the secret provided by
> infrastructure, rather than having to hardcode it in config or such.

This sounds a lot like OAUTH or similar, which might be useful to
consider adding if it can be done reasonably.

> I can't see application developers configuring kerberos and I don't think
> LDAP, PAM, Radius are a great answer either, due to the plaintext requirement
> alone? LDAP is pretty clearly dying technology, PAM is fragile complicated C
> stuff that's not portable across OSs. Radius is probably the most realistic,
> but at least as implemented doesn't seem flexible enough (e.g. no access to
> group memberships etc).

The great thing about Kerberos is that, in general, application developers
don't really need to do much to configure it.

> Nor does baking stuff like that in seem realistic to me, it'll presumably be
> too cloud provider specific.

I don't quite buy an argument that hinges on the idea of centralized
auth across multiple cloud services, as suggested above, while also
being too cloud provider specific to be something that could be baked
in, as said here.  Maybe I'm misunderstanding.  Also a bit concerned
that adding hooks on presumptions about what some cloud provider needs
isn't a good plan.

In general though, I'm certainly more supportive of the idea of adding
support for authentication mechanisms that are standardized and work
across multiple cloud providers and services in general, than about
adding things which are specific to one provider.  I don't particularly
care for the idea of adding hooks and then having cloud providers go off
and develop their own proprietary authentication system that aren't
standardized and which don't have public review, which seems like it's
the use-case for adding them.  I'm not dead-set against it, but it just
doesn't strike me as a good area to add hooks for folks to use.  Better
would be baked in code that follows a well defined and reviewed RFC that
anyone can look at and make sure follows the standard.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 01.03.22 22:34, Andres Freund wrote:
> >The cases I've heard about are about centralizing auth across multiple cloud
> >services. Including secret management in some form. E.g. allowing an
> >application to auth to postgres, redis and having the secret provided by
> >infrastructure, rather than having to hardcode it in config or such.
> >
> >I can't see application developers configuring kerberos and I don't think
> >LDAP, PAM, Radius are a great answer either, due to the plaintext requirement
> >alone? LDAP is pretty clearly dying technology, PAM is fragile complicated C
> >stuff that's not portable across OSs. Radius is probably the most realistic,
> >but at least as implemented doesn't seem flexible enough (e.g. no access to
> >group memberships etc).
> >
> >Nor does baking stuff like that in seem realistic to me, it'll presumably be
> >too cloud provider specific.
> 
> Let's gather some more information on this.  PostgreSQL should support the
> authentication that many people want to use out of the box.  I don't think
> it would be good to be at a point where all the built-in methods are
> outdated and if you want to use the good stuff you have to hunt for plugins.
> The number of different cloud APIs is effectively small.  I expect that
> there are a lot of similarities, like they probably all need support for
> http calls, they might need support for caching lookups, etc.  OIDC was
> mentioned elsewhere.  That's a standard.  Is that compatible with any cloud
> providers?  Would that be enough for many users?

I tend to agree with this, and to that end I'd argue that we should
probably be working to drop support for things like PAM, with
appropriate code replacing any useful capabilities that folks were using
it for (which would also mean it'd work across all the OS's we support,
which would be nice..).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Jonathan S. Katz

On 3/2/22 3:24 AM, Peter Eisentraut wrote:

On 01.03.22 22:17, Jonathan S. Katz wrote:
If you're moving to a newer version of PostgreSQL, you likely have to 
update your connection drivers anyway (rebuilding against the new libpq, 
supporting any changes in the protocol, etc.). I would prefer more data 
to support that argument, but this is generally what you need to do.


However, we may need to step towards it. We took one step last release 
with defaulting to SCRAM. Perhaps this release we add a warning for 
anything using md5 auth that "this will be removed in a future 
release." (or specifically v16). We should also indicate in the docs 
that md5 is deprecated and will be removed.


I find that a lot of people are still purposely using md5.  Removing it 
now or in a year would be quite a disruption.


What are the reasons they are still purposely using it? The ones I have 
seen/heard are:


- Using an older driver
- On a pre-v10 PG
- Unaware of SCRAM

What I'm proposing above is to start the process of deprecating it as an 
auth method, which also allows us to continue the education efforts to 
upgrade. Does that make sense?


It's also worth considering that keeping the code equipped to handle 
different kinds of password hashing would help it stay in shape if we 
ever need to add support for the next SHA after 256 or whatever.


I think it's fine to keep the hashing code. The end goal is to remove 
the md5 authentication mechanism.


Thanks,

Jonathan




OpenPGP_signature
Description: OpenPGP digital signature


Re: PG DOCS - logical replication filtering

2022-03-02 Thread Peter Eisentraut

On 02.03.22 05:47, Peter Smith wrote:

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(e.g. It is envisaged that a "Column Filters" section can be added
sometime in the future).


The pending feature to select a subset of table columns to replicate is 
not "column filtering".  The thread might still be still called that, 
but we have changed the patch to not use that terminology.


Filtering is a dynamic action based on actual values.  The row filtering 
feature does that.  The column list feature is a static DDL-time 
configuration.  It is no more filtering than specifying a list of tables 
in a publication is table filtering.


So please consider organizing the documentation differently to not 
create this confusion.






Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Robert Haas
On Wed, Mar 2, 2022 at 5:25 AM Aleksander Alekseev
 wrote:
> ```
>  Datum
>  pg_stop_backup_v2(PG_FUNCTION_ARGS)
>  {
> -ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +#define PG_STOP_BACKUP_V2_COLS 3
>  TupleDesctupdesc;
> -Tuplestorestate *tupstore;
> -MemoryContext per_query_ctx;
> -MemoryContext oldcontext;
> -Datumvalues[3];
> -boolnulls[3];
> +Datumvalues[PG_STOP_BACKUP_V2_COLS];
> +boolnulls[PG_STOP_BACKUP_V2_COLS];
> ```
>
> Declaring a macro inside the procedure body is a bit unconventional.
> Since it doesn't seem to be used for anything except these two array
> declarations I suggest keeping simply "3" here.

I think we do this kind of thing in various places in similar
situations, and I think it is good style. It makes it easier to catch
everything if you ever need to update the code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 2, 2022 at 5:25 AM Aleksander Alekseev
>  wrote:
>> Declaring a macro inside the procedure body is a bit unconventional.
>> Since it doesn't seem to be used for anything except these two array
>> declarations I suggest keeping simply "3" here.

> I think we do this kind of thing in various places in similar
> situations, and I think it is good style. It makes it easier to catch
> everything if you ever need to update the code.

Yeah, there's plenty of precedent for that coding if you look around.
I've not read the whole patch, but this snippet seems fine to me
if there's also an #undef at the end of the function.

regards, tom lane




Re: Allow root ownership of client certificate key

2022-03-02 Thread Tom Lane
Chris Bandy  writes:
> On 3/1/22 3:15 AM, Tom Lane wrote:
>> Anyway, I'd be happier about back-patching if we could document
>> actual requests to make it work like the server side does.

> PGO runs PostgreSQL 10 through 14 in Kubernetes, and we have to work
> around this issue when using certificates for system accounts.

Sold then, I'll make it so in a bit.

regards, tom lane




Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Aleksander Alekseev
Hi Tom.

> Yeah, there's plenty of precedent for that coding if you look around.
> I've not read the whole patch, but this snippet seems fine to me
> if there's also an #undef at the end of the function.

No, there is no #undef. With an #undef I don't mind it either.

-- 
Best regards,
Aleksander Alekseev


Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-02 Thread Ashutosh Sharma
Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

+--
+-- pg_get_wal_records_info()
+--
+CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
+IN end_lsn pg_lsn,
+IN wait_for_wal boolean DEFAULT false,
+OUT lsn pg_lsn,

What does the wait_for_wal flag mean here when one has already
specified the start and end LSN? AFAIU, if a user has specified a
start and stop LSN, it means that the user knows the extent to which
he/she wants to display the WAL records, in which case we need to stop
once the end LSN has been reached. So what is the meaning of the
wait_for_wal flag? Does it look sensible to have it here? To me
it doesn't.

==

+--
+-- pg_get_wal_records_info_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
+OUT lsn pg_lsn,
+OUT prev_lsn pg_lsn,
+OUT xid xid,

Why is this function required? Is pg_get_wal_records_info() alone not
enough? I think it is. See if we can make end_lsn optional in
pg_get_wal_records_info() and let's just have that one function. I think
it can do the job of the pg_get_wal_records_info_till_end_of_wal() function.
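
To illustrate the suggestion (a sketch of the proposed interface, not
what the patch currently implements), an optional end_lsn defaulting to
NULL could mean "until end of WAL", so both of these go through a single
function:

SELECT * FROM pg_get_wal_records_info('0/500');            -- until end of WAL
SELECT * FROM pg_get_wal_records_info('0/500', '0/5FF');   -- bounded range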

==

+--
+-- pg_get_wal_stats_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
+OUT resource_manager text,
+OUT count int8,

The above comment applies to this function as well. Isn't pg_get_wal_stats() enough?

==


+   if (loc <= read_upto)
+   break;
+
+   /* Let's not wait for WAL to be available if indicated */
+   if (loc > read_upto &&
+   state->private_data != NULL)
+   {

Why loc > read_upto? The first if condition is (loc <= read_upto),
followed by the second if condition (loc > read_upto). Isn't falling
through the first if condition (loc <= read_upto) already enough to
indicate that loc > read_upto?

==

+#define IsEndOfWALReached(state) \
+   (state->private_data != NULL && \
+   (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->no_wait == true) && \
+   (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->reached_end_of_wal == true))


I think we should use either state or xlogreader consistently. The first
line says state->private_data and the second line xlogreader->private_data.

==

+   (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->reached_end_of_wal == true))
+

There is a new patch coming to make the end-of-WAL messages less
scary. It introduces an EOW flag in XLogReaderState; maybe we can use
that instead of introducing new flags in the private area to represent
the end of WAL.

==

+/*
+ * XLogReaderRoutine->page_read callback for reading local xlog files
+ *
+ * This function is same as read_local_xlog_page except that it works in both
+ * wait and no wait mode. The callers can specify about waiting in private_data
+ * of XLogReaderState.
+ */
+int
+read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
+  int reqLen, XLogRecPtr
targetRecPtr, char *cur_page)
+{
+   XLogRecPtr  read_upto,

Do we really need this function? Can't we make use of an existing WAL
reader function - read_local_xlog_page()?

--
With Regards,
Ashutosh Sharma.

On Fri, Feb 25, 2022 at 4:33 PM Bharath Rupireddy
 wrote:
>
> On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma  wrote:
> > I don't think that's the use case of this patch. Unless there is some
> > other valid reason, I would suggest you remove it.
>
> Removed the function pg_verify_raw_wal_record. Robert and Greg also
> voted for removal upthread.
>
> > > > Should we add a function that returns the pointer to the first and
> > > > probably the last WAL record in the WAL segment? This would help users
> > > > to inspect the wal records in the entire wal segment if they wish to
> > > > do so.
> > >
> > > Good point. One can do this already with pg_get_wal_records_info and
> > > pg_walfile_name_offset. Usually, the LSN format itself can give an
> > > idea about the WAL file it is in.
> > >
> > > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > > pg_get_wal_records_info('0/500', '0/5FF') order by lsn asc
> > > limit 1;
> > > lsn|pg_walfile_name_offset
> > > ---+---
> > >  0/538 | (00010005,56)
> > > (1 row)
> > >
> > > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > > pg_get_wal_records_info('0/500', '0/5FF') order by lsn desc
> > > limit 1;
> > > lsn|   pg_walfile_name_offset
> > > ---+-
> > >  0/5C0 | (00010005,16777152)
> > > (1 row)
> > >
> >
> > The workaround you are suggesting is not very user-friendly, and FYI
> > pg_get_wal_records_info simply hangs at times when we specify the higher
> > and lower limit of LSN in a WAL file.
> >
> > To make things easier for the end users I would

Re: Add 64-bit XIDs into PostgreSQL 15

2022-03-02 Thread Pavel Borisov
Hi hackers!

> Hi! Here is the rebased version.
I'd like to add a description of what was done in v9:
- The patch is rebased on current master branch
- The in-memory tuple storage format was refactored, as promised, to have
pre-calculated 64-bit xmin and xmax, not just copies of pd_xid_base and
pd_multi_base.
- Fixed the bug reported by Andres Freund with lazy conversion of pages
upgraded from 32-bit to 64-bit xids, when the first tuple read (and
therefore the lazy conversion) is done in a read-only state (a read-only
xact or on a replica). In this case the in-memory buffer descriptor will
now be marked with the REGBUF_CONVERTED flag. When the cluster comes back
to a read-write state, this will lead to emitting a full-page-write xlog
record for this page.

Relevant changes in the README are also done.

We'd very much appreciate enthusiasm for having 64-bit xids in PG15 and any
effort to review and test this feature.

Alexander, thanks for your attention to the patchset. Your questions and
review are very much welcome!
The participation of other hackers is highly appreciated as always!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Bruce Momjian
On Tue, Mar  1, 2022 at 08:31:19AM -0500, Stephen Frost wrote:
> > The last time I played with this area is the recent error handling
> > improvement with cryptohashes but MD5 has actually helped here in
> > detecting the problem as a patched OpenSSL would complain if trying to
> > use MD5 as hash function when FIPS is enabled.
> 
> Having to continue to deal with md5 as an algorithm when it's known to
> be notably less secure and so much so that organizations essentially ban
> its use for exactly what we're using it for, in fact, another reason to

Really?  I thought it was publicly-visible MD5 hashes that were the
biggest problem.  Our 32-bit salt during the connection is a problem, of
course.

> remove it, not a reason to keep it.  Better code coverage testing of
> error paths is the answer to making sure that our error handling behaves
> properly.

What is the logic to removing md5 but keeping 'password'?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Mar  1, 2022 at 08:31:19AM -0500, Stephen Frost wrote:
> > > The last time I played with this area is the recent error handling
> > > improvement with cryptohashes but MD5 has actually helped here in
> > > detecting the problem as a patched OpenSSL would complain if trying to
> > > use MD5 as hash function when FIPS is enabled.
> > 
> > Having to continue to deal with md5 as an algorithm when it's known to
> > be notably less secure and so much so that organizations essentially ban
> > its use for exactly what we're using it for, in fact, another reason to
> 
> Really?  I thought it was publicly-visible MD5 hashes that were the
> biggest problem.  Our 32-bit salt during the connection is a problem, of
> course.

Neither are good.  Not sure that we really need to spend a lot of effort
trying to figure out which issue is the biggest problem.

> > remove it, not a reason to keep it.  Better code coverage testing of
> > error paths is the answer to making sure that our error handling behaves
> > properly.
> 
> What is the logic to removing md5 but keeping 'password'?

I don't think we should keep 'password'.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Tom Lane
Stephen Frost  writes:
> * Bruce Momjian (br...@momjian.us) wrote:
>> What is the logic to removing md5 but keeping 'password'?

> I don't think we should keep 'password'.

I don't see much point in that unless we deprecate *all* the
auth methods that transmit a cleartext password.

regards, tom lane




Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Bruce Momjian (br...@momjian.us) wrote:
> >> What is the logic to removing md5 but keeping 'password'?
> 
> > I don't think we should keep 'password'.
> 
> I don't see much point in that unless we deprecate *all* the
> auth methods that transmit a cleartext password.

I'm not sure that it's quite so simple.  Perhaps we should also drop
LDAP and I don't really think PAM was ever terribly good for us to have,
but at least PAM and RADIUS could possibly be used with OTP solutions
(and maybe LDAP?  Not sure, don't think I've seen that but perhaps..),
rendering sniffing of what's transmitted less valuable.  We don't
support that for 'password' itself or for 'md5' in any serious way
though.

We really should drop ident already though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Bruce Momjian
On Wed, Mar  2, 2022 at 10:09:31AM -0500, Stephen Frost wrote:
> I'm not sure that it's quite so simple.  Perhaps we should also drop
> LDAP and I don't really think PAM was ever terribly good for us to have,
> but at least PAM and RADIUS could possibly be used with OTP solutions
> (and maybe LDAP?  Not sure, don't think I've seen that but perhaps..),
> rendering sniffing of what's transmitted less valuable.  We don't
> support that for 'password' itself or for 'md5' in any serious way
> though.

I thought all the plain-password methods were already using SSL
(hopefully with certificate authentication) and they were therefore
safe.  Why would we remove something like LDAP if that is what the site
is already using?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: SQL/JSON: functions

2022-03-02 Thread Andrew Dunstan


On 3/1/22 16:41, Andrew Dunstan wrote:
On 2/1/22 14:11, I wrote:
>> 2. The new GUC "sql_json" is a bit of a worry. I understand what it's
>> trying to do, but I'm trying to convince myself it's not going to be a
>> fruitful source of error reports, especially if people switch it in the
>> middle of a session. Maybe it should be an initdb option instead of a GUC?
>>
>>
>
> So far my efforts have not borne fruit. Here's why:
>
>
> andrew=# set sql_json = jsonb;
> SET
> andrew=# create table abc (x text, y json);
> CREATE TABLE
> andrew=# \d abc
>    Table "public.abc"
>  Column | Type | Collation | Nullable | Default
> +--+---+--+-
>  x  | text |   |  |
>  y  | json |   |  |
>
> andrew=# insert into abc values ('a','{"q":1}');
> INSERT 0 1
> andrew=# select json_each(y) from abc;
> ERROR:  function json_each(json) does not exist
> LINE 1: select json_each(y) from abc;
>    ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.
> andrew=# select jsonb_each(y) from abc;
>  jsonb_each
> 
>  (q,1)
> (1 row)
>
>
> The description tells them the column is json, but the json_* functions
> don't work on the column and you need to use the jsonb functions. That
> seems to me a recipe for major confusion. It might be better if we set
> it at initdb time so it couldn't be changed, but even so it could be
> horribly confusing.
>
> This is certainly severable from the rest of these patches. I'm not sure
> how severable it is from the SQL/JSON Table patches.
>
>

I have confirmed that this is not required at all for the JSON_TABLE
patch set.


I'll submit new patch sets omitting it shortly. The GUC patch can be
considered separately, probably as release 16 material, but I think as
is it's at best quite incomplete.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Changing "Hot Standby" to "hot standby"

2022-03-02 Thread Daniel Westermann (DWE)
Hi Aleksander,

> Pretty sure that for titles we should keep English capitalization rules.

Done like that. Thanks for taking a look.

Regards
Danieldiff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b5b6042104..08eb1ad946 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -548,7 +548,7 @@ protocol to make nodes agree on a serializable transactional order.
rollforward will take considerably longer, so that technique only
offers a solution for disaster recovery, not high availability.
A standby server can also be used for read-only queries, in which case
-   it is called a Hot Standby server. See  for
+   it is called a hot standby server. See  for
more information.
   
 
@@ -1032,7 +1032,7 @@ primary_slot_name = 'node_a_slot'

 

-Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
+hot standby feedback propagates upstream, whatever the cascaded arrangement.

 

@@ -1499,16 +1499,16 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
   Hot Standby
 
   
-   Hot Standby
+   hot standby
   
 

-Hot Standby is the term used to describe the ability to connect to
+hot standby is the term used to describe the ability to connect to
 the server and run read-only queries while the server is in archive
 recovery or standby mode. This
 is useful both for replication purposes and for restoring a backup
 to a desired state with great precision.
-The term Hot Standby also refers to the ability of the server to move
+The term hot standby also refers to the ability of the server to move
 from recovery through to normal operation while users continue running
 queries and/or keep their connections open.

@@ -1623,7 +1623,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
being executed during recovery.  This restriction applies even to
temporary tables, because table rows cannot be read or written without
assigning a transaction ID, which is currently not possible in a
-   Hot Standby environment.
+   hot standby environment.
   
  
  
@@ -1703,7 +1703,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 In normal operation, read-only transactions are allowed to
 use LISTEN and NOTIFY,
-so Hot Standby sessions operate under slightly tighter
+so hot standby sessions operate under slightly tighter
 restrictions than ordinary read-only sessions.  It is possible that some
 of these restrictions might be loosened in a future release.

@@ -1746,7 +1746,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 

-There are also additional types of conflict that can occur with Hot Standby.
+There are also additional types of conflict that can occur with hot standby.
 These conflicts are hard conflicts in the sense that queries
 might need to be canceled and, in some cases, sessions disconnected to resolve them.
 The user is provided with several ways to handle these
@@ -1947,8 +1947,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 If hot_standby is on in postgresql.conf
 (the default value) and there is a
 standby.signalstandby.signalfor hot standby
-file present, the server will run in Hot Standby mode.
-However, it may take some time for Hot Standby connections to be allowed,
+file present, the server will run in hot standby mode.
+However, it may take some time for hot standby connections to be allowed,
 because the server will not accept connections until it has completed
 sufficient recovery to provide a consistent state against which queries
 can run.  During this period,
@@ -2282,7 +2282,7 @@ HINT:  You can then restart the server after making the necessary configuration
Caveats
 

-There are several limitations of Hot Standby.
+There are several limitations of hot standby.
 These can and probably will be fixed in future releases:
 
   
@@ -2299,7 +2299,7 @@ HINT:  You can then restart the server after making the necessary configuration
 
  Valid starting points for standby queries are generated at each
  checkpoint on the primary. If the standby is shut down while the primary
- is in a shutdown state, it might not be possible to re-enter Hot Standby
+ is in a shutdown state, it might not be possible to re-enter hot standby
  until the primary is started up, so that it generates further starting
  points in the WAL logs.  This situation isn't a problem in the most
  common situations where it might happen. Generally, if the primary is


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Peter Eisentraut



On 02.03.22 15:16, Jonathan S. Katz wrote:
I find that a lot of people are still purposely using md5.  Removing 
it now or in a year would be quite a disruption.


What are the reasons they are still purposely using it? The ones I have 
seen/heard are:


- Using an older driver
- On a pre-v10 PG
- Unaware of SCRAM


I'm not really sure, but it seems like they are content with what they 
have and don't want to bother with the new fancy stuff.


What I'm proposing above is to start the process of deprecating it as an 
auth method, which also allows to continue the education efforts to 
upgrae. Does that make sense?


I'm not in favor of starting a process that will result in removal of 
the md5 method at this time.





Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar  2, 2022 at 10:09:31AM -0500, Stephen Frost wrote:
> > I'm not sure that it's quite so simple.  Perhaps we should also drop
> > LDAP and I don't really think PAM was ever terribly good for us to have,
> > but at least PAM and RADIUS could possibly be used with OTP solutions
> > (and maybe LDAP?  Not sure, don't think I've seen that but perhaps..),
> > rendering sniffing of what's transmitted less valuable.  We don't
> > support that for 'password' itself or for 'md5' in any serious way
> > though.
> 
> I thought all the plain-password methods were already using SSL
> (hopefully with certificate authentication) and they were therefore
> safe.  Why would we remove something like LDAP if that is what the site
> is already using?

We don't require SSL to be used with them..?  Further, as already
discussed on this thread, SSL only helps with on-the-wire, doesn't
address the risk of a compromised server.  LDAP, in particular, is
terrible in this regard because it's a centralized password system,
meaning that one compromised server will lead to an attacker gaining
full access to the victim's account throughout the enterprise.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 02.03.22 15:16, Jonathan S. Katz wrote:
> >>I find that a lot of people are still purposely using md5.  Removing it
> >>now or in a year would be quite a disruption.
> >
> >What are the reasons they are still purposely using it? The ones I have
> >seen/heard are:
> >
> >- Using an older driver
> >- On a pre-v10 PG
> >- Unaware of SCRAM
> 
> I'm not really sure, but it seems like they are content with what they have
> and don't want to bother with the new fancy stuff.

There were lots and lots of folks who were comfortable with
recovery.conf, yet we removed that without any qualms from one major
version to the next.  md5 will have had 5 years of overlap with scram.

> >What I'm proposing above is to start the process of deprecating it as an
> >auth method, which also allows us to continue the education efforts to
> >upgrade. Does that make sense?
> 
> I'm not in favor of starting a process that will result in removal of the
> md5 method at this time.

I am.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PoC/RFC] Multiple passwords, interval expirations

2022-03-02 Thread Joshua Brindle
On Wed, Mar 2, 2022 at 9:58 AM Joshua Brindle
 wrote:
>
> This is not intended for PG15.
>
> Attached is a proof-of-concept patchset to implement multiple valid
> passwords, which have independent expirations, set by a GUC or SQL
> using an interval.
>



> postgres=# select * from pg_auth_password ;
>  roleid |  name   | password | expiration
> --------+---------+----------+--------------------------------
>      10 | __def__ | SCRAM-SHA-256$4096:yGiHIYPwc2az7xj/7TIyTA==$OQL/AEcEY1yOCNbrZEj4zDvNnOLpIqltOW1uQvosLvc=:9VRRppuIkSrwhiBN5ePy8wB1yzDa/2uX0WUx6gXi93E= |
>   16384 | __def__ | SCRAM-SHA-256$4096:AA==$1Ivp4d+vAWxowpuGEn05KR9lxyGOms3yy85k3D7XpBg=:k8xUjU6xrJG17PMGa/Zya6pAE/M7pEDaoIFmWvNIEUg= | 2022-03-02 06:52:31.217193-08
>   16384 | newone  | SCRAM-SHA-256$4096:AA==$WK3+41CCGDognSnZrtpHhv00z9LuVUjHR1hWq8T1+iE=:w2C5GuhgiEB7wXqPxYfxBKB+ehm4h6Oeif1uzpPIFVk= | 2022-03-03 06:47:53.728159-08
> (3 rows)

There's obviously a salt problem here that I'll need to fix (it
apparently snuck in at the last rebase), but this brings up one aspect
of the patchset I didn't mention in the original email:

For the SCRAM protocol to work as is with existing clients the salt
for each password must be the same. Right now ALTER USER will find and
reuse the salt, but a user passing in a pre-computed SCRAM secret
currently has no way to get the salt.

For \password (we'll need a new variant that takes a password name) I was
thinking libpq could hold onto the salt that was used to log in, but
for outside computation we'll need some way for the client to request
it.

None of that is done yet.
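
For what it's worth, given the verifier format shown above
(SCRAM-SHA-256$<iter>:<salt>$<StoredKey>:<ServerKey>), the stored salt
can already be pulled out in SQL; a rough sketch against this patchset's
pg_auth_password catalog, using the example role above:

SELECT split_part(split_part(password, '$', 2), ':', 2) AS salt_b64
FROM pg_auth_password
WHERE roleid = 16384 AND name = '__def__';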




Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Joshua Brindle
On Wed, Mar 2, 2022 at 10:29 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Wed, Mar  2, 2022 at 10:09:31AM -0500, Stephen Frost wrote:
> > > I'm not sure that it's quite so simple.  Perhaps we should also drop
> > > LDAP and I don't really think PAM was ever terribly good for us to have,
> > > but at least PAM and RADIUS could possibly be used with OTP solutions
> > > (and maybe LDAP?  Not sure, don't think I've seen that but perhaps..),
> > > rendering sniffing of what's transmitted less valuable.  We don't
> > > support that for 'password' itself or for 'md5' in any serious way
> > > though.
> >
> > I thought all the plain-password methods were already using SSL
> > (hopefully with certificate authentication) and they were therefore
> > safe.  Why would we remove something like LDAP if that is what the site
> > is already using?
>
> We don't require SSL to be used with them..?  Further, as already
> discussed on this thread, SSL only helps with on-the-wire, doesn't
> address the risk of a compromised server.  LDAP, in particular, is
> terrible in this regard because it's a centralized password system,
> meaning that one compromised server will lead to an attacker gaining
> full access to the victim's account throughout the enterprise.

I want to add my support for the deprecation move, and I think/hope that
my multi-password patchset can help make the transition less painful.

I also want to throw out another existing issue with MD5, namely that
using the role name as the salt is fundamentally problematic. Even
outside of trivial pass-the-hash attacks, one could build a rainbow
table today that uses 'postgres' as the salt and recover admin
credentials that can be used for credential stuffing attacks against
other enterprise assets. This means PG is actively enabling lateral
movement through enterprise systems if MD5 passwords are used.
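
To make that concrete: the stored md5 verifier is just
md5(password || rolename), so the "salt" is fully predictable. A sketch
with a hypothetical password:

-- What gets stored for role 'postgres' with password 'secretpw'; a
-- rainbow table keyed on this construction recovers the password directly.
SELECT 'md5' || md5('secretpw' || 'postgres') AS stored_verifier;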




Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Jonathan S. Katz

On 3/2/22 10:30 AM, Stephen Frost wrote:

Greetings,

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:

On 02.03.22 15:16, Jonathan S. Katz wrote:

I find that a lot of people are still purposely using md5.  Removing it
now or in a year would be quite a disruption.


What are the reasons they are still purposely using it? The ones I have
seen/heard are:

- Using an older driver
- On a pre-v10 PG
- Unaware of SCRAM


I'm not really sure, but it seems like they are content with what they have
and don't want to bother with the new fancy stuff.


By that argument, we should have kept "password" (plain) as an 
authentication method.


The specific use-cases I've presented are all solvable issues. The 
biggest challenge with existing users is the upgrade process, which is 
why I'd rather we begin a deprecation process and see if there are any 
ways we can make the md5 => SCRAM transition easier.



There were lots and lots of folks who were comfortable with
recovery.conf, yet we removed that without any qualms from one major
version to the next.  md5 will have had 5 years of overlap with scram.


I do agree with Stephen in principle here. I encountered upgrade 
challenges with this myself, and challenges with updating automation to 
handle this change.



What I'm proposing above is to start the process of deprecating it as an
auth method, which also allows us to continue the education efforts to
upgrade. Does that make sense?


I'm not in favor of starting a process that will result in removal of the
md5 method at this time.


I am.


+1 for starting this process. It may still take a few more years, but we 
should help our users move away from an auth method with known issues.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Bruce Momjian
On Wed, Mar  2, 2022 at 10:29:45AM -0500, Stephen Frost wrote:
> We don't require SSL to be used with them..?  Further, as already
> discussed on this thread, SSL only helps with on-the-wire, doesn't
> address the risk of a compromised server.  LDAP, in particular, is
> terrible in this regard because it's a centralized password system,
> meaning that one compromised server will lead to an attacker gaining
> full access to the victim's account throughout the enterprise.

Yes, but the site chose LDAP, and I don't think it is our place to tell
them what to use.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-02 Thread Robert Haas
On Tue, Mar 1, 2022 at 9:05 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I agree. My question is: why shouldn't every case where we can deduce
> > an implied inequality be reasonably likely to show a benefit?
>
> Maybe it will be, if we can deal with the issue you already mentioned
> about not misestimating the resulting partially-redundant conditions.

OK.

> > Second, it looks to me like the patch takes the rather naive strategy
> > of enforcing the derived clauses everywhere that they can legally be
> > put, which seems certain not to be optimal.
>
> I'm not sure about that ... it's basically what we do with derived
> equalities.  However, there's enough structure in the equivalence-class
> case that we don't end up enforcing redundant quals.  It's not clear
> to me whether the same can be said here.

I mean, to go back to the example of a.x < 42 and a.x = b.x, there are
three possible choices as to where to enforce the qual (a, b, both).
That's a meaningful choice, independent of any estimation issue. I
think it is reasonably common to have cases where a.x < 42 is very
selective and b.x < 42 hardly filters out anything at all, or the
other way around. Certainly, that kind of situation came up a lot in
PostgreSQL-based applications that I wrote myself back in the day. If
we're just talking about btree operators, *maybe* we can say it's
cheap enough that we don't care, but color me a tad skeptical.

> > I don't know whether attaching something to the equivalence class data
> > structure is the right idea or not. Presumably, we don't want to make
> > an extra pass over the query tree to gather the information needed for
> > this kind of optimization, and it feels like we need to know which
> > vars are EMs before we try to derive alternate/additional quals.
>
> Yeah, we don't want to make an additional pass over the tree, and
> we also would rather not add an additional set of per-operator
> catalog lookups.  We might be able to generalize the code that looks
> for equality operators so that it looks for "any btree operator"
> with the same number of lookups, and then have it feed the results
> down either the EquivalenceClass path or the inequality path
> as appropriate.  At the end, after we've formed all the ECs, we
> could have a go at matching up the inequality structures with the
> ECs.

Interesting idea.

> But I don't agree that ECs are a necessary prerequisite.
> Here are a couple of other patterns that might be worth looking for:
>
> * "a > b AND b > c" allows deducing "a > c", whether or not any
> of those values appears in an EC.
>
> * "a > const1 AND a > const2" can be simplified to either "a > const1"
> or "a > const2" depending on which constant is larger.  (The predicate
> proof mechanism already has a form of this, but we don't typically
> apply it in a way that would result in dropping the redundant qual.)
>
> It's entirely possible that one or both of these patterns is not
> worth looking for.  But I would say that it's equally unproven
> that deriving "a > c" from "a = b AND b > c" is worth the cycles.
> I'll grant that it's most likely going to be a win if we can use
> any of these patterns to generate a restriction clause from what
> had been join clauses.  Beyond that it's much less clear.

Pretty much all of the cases that I've run across involve an equijoin
plus an inequality, so if somebody asked me which problem we ought to
put most effort into solving, I'd say that one. Cases like "a>1 and
a>2" or a same-table case like "a=b and b>3" haven't been as common in
my experience, and haven't caused as much trouble when they do happen.
Part of that is because if you have something like "a>1 and a>2" in
your query, it may be easy for you to just tweak the query generation
to avoid it, and if "a=b and b>3" is coming up a lot, you may choose
to adjust your data model (e.g. choose to store NULL in b to indicate
same-as-a), whereas if you have something like
"orders.orderno=order_lines.orderno and order_lines.orderno<1,"
what are you going to do to avoid that exactly? If you normalize your
order data and then want to find the old orders, this problem arises
ineluctably.
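
To make the shape concrete (the constant is a hypothetical stand-in):

-- With orders.orderno = order_lines.orderno AND order_lines.orderno < 1000,
-- deriving orders.orderno < 1000 would let the planner restrict the
-- orders side before the join instead of only afterward.
SELECT o.orderno, l.*
FROM orders o
JOIN order_lines l ON o.orderno = l.orderno
WHERE l.orderno < 1000;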

But having said that, I'm not *against* doing something about those
cases if it's cheap or falls out naturally. If we could detect for
free that the user had written a>1 and a>2, it'd certainly be
beneficial to drop the former qual and keep only the latter. If the
user writes a>b and b>c and all those columns are in one table I don't
see how it helps to derive a>c, because we're still going to need to
check the other two quals anyway so we've just created more work. But
if those columns are not all in the same table then I'd say chances
are really pretty good. Like, suppose it's x.a>y.b and y.b>x.c. Well,
like I say, I don't really see people writing queries like that
myself, but if they do, it seems pretty obvious that deriving x.a>x.c
has the potential to save a LOT of trouble. If it's x.a>y.b and
y.b>z.c I don't f

Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar  2, 2022 at 10:29:45AM -0500, Stephen Frost wrote:
> > We don't require SSL to be used with them..?  Further, as already
> > discussed on this thread, SSL only helps with on-the-wire, doesn't
> > address the risk of a compromised server.  LDAP, in particular, is
> > terrible in this regard because it's a centralized password system,
> > meaning that one compromised server will lead to an attacker gaining
> > full access to the victim's account throughout the enterprise.
> 
> Yes, but the site chose LDAP, and I don't think it is our place to tell
> them what to use.

It's our decision what we want to support and maintain in the code base
and what we don't.  Folks often ask for things that we don't or won't
support and this isn't any different from that.  We also remove things
on a rather regular basis even when they're being used- generally
because we have something better, as we do here.  I disagree that an
argument of 'some people use it so we can't remove it' holds any weight
here.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Bruce Momjian
On Wed, Mar  2, 2022 at 10:54:27AM -0500, Stephen Frost wrote:
> It's our decision what we want to support and maintain in the code base
> and what we don't.  Folks often ask for things that we don't or won't
> support and this isn't any different from that.  We also remove things
> on a rather regular basis even when they're being used- generally
> because we have something better, as we do here.  I disagree that an
> argument of 'some people use it so we can't remove it' holds any weight
> here.

I disagree.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-02 Thread Tom Lane
Robert Haas  writes:
> So the questions in my mind here are all
> about whether we can detect this stuff cheaply and whether anybody
> wants to do the work to make it happen, not whether we'd get a benefit
> in the cases where it kicks in.

Right, my worries are mostly about the first point.

regards, tom lane




Re: Handle infinite recursion in logical replication setup

2022-03-02 Thread vignesh C
On Tue, Mar 1, 2022 at 4:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vignesh,
>
> > In logical replication, currently Walsender sends the data that is
> > generated locally and the data that are replicated from other
> > instances. This results in infinite recursion in circular logical
> > replication setup.
>
> Thank you for the good explanation. I understand that this fix can be
> used for bidirectional replication.
>
> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
>
> So you wanted to solve these two problems and are currently focused on
> the first one, right? We can check them one by one.
>
> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> Sounds good, but I cannot tell whether that assumption will always hold.
>
> I played with your patch, but it could not be applied to current master.
> I tested from bd74c40 and confirmed that the infinite loop did not appear.
Rebased the patch on top of HEAD.

> only_local could not be set from the ALTER SUBSCRIPTION command.
> Is that expected?
Modified

Thanks for the comments, the attached patch has the changes for the same.

Regards,
Vignesh
From 7c67cc23584e1106fbf2011c8c6658442125e48f Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Wed, 2 Mar 2022 20:40:34 +0530
Subject: [PATCH v2] Skip replication of non local data.

Add an option only_local which will subscribe only to the locally
generated data in the publisher node. If the subscriber is created with
this option, the publisher will skip publishing data that it had itself
subscribed from other nodes. It can be created using the following syntax:
ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=' PUBLICATION pub1 with (only_local = on);
---
 contrib/test_decoding/test_decoding.c |  13 +++
 doc/src/sgml/ref/alter_subscription.sgml  |   3 +-
 doc/src/sgml/ref/create_subscription.sgml |  12 ++
 src/backend/catalog/pg_subscription.c |   1 +
 src/backend/catalog/system_views.sql  |   3 +-
 src/backend/commands/subscriptioncmds.c   |  29 -
 .../libpqwalreceiver/libpqwalreceiver.c   |  18 ++-
 src/backend/replication/logical/decode.c  |  36 --
 src/backend/replication/logical/logical.c |  35 ++
 src/backend/replication/logical/tablesync.c   |   2 +-
 src/backend/replication/logical/worker.c  |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  25 
 src/backend/replication/slot.c|   4 +-
 src/backend/replication/slotfuncs.c   |  18 ++-
 src/backend/replication/walreceiver.c |   2 +-
 src/backend/replication/walsender.c   |  21 +++-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/catalog/pg_subscription.h |   3 +
 src/include/replication/logical.h |   4 +
 src/include/replication/output_plugin.h   |   7 ++
 src/include/replication/pgoutput.h|   1 +
 src/include/replication/slot.h|   5 +-
 src/include/replication/walreceiver.h |   8 +-
 src/test/regress/expected/rules.out   |   5 +-
 src/test/regress/expected/subscription.out|   4 +
 src/test/regress/sql/subscription.sql |   4 +
 src/test/subscription/t/029_circular.pl   | 108 ++
 src/tools/pgindent/typedefs.list  |   1 +
 29 files changed, 345 insertions(+), 39 deletions(-)
 create mode 100644 src/test/subscription/t/029_circular.pl

diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index ea22649e41..58bc5dbc1c 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -73,6 +73,8 @@ static void pg_decode_truncate(LogicalDecodingContext *ctx,
 			   ReorderBufferChange *change);
 static bool pg_decode_filter(LogicalDecodingContext *ctx,
 			 RepOriginId origin_id);
+static bool pg_decode_filter_remotedata(LogicalDecodingContext *ctx,
+		RepOriginId origin_id);
 static void pg_decode_message(LogicalDecodingContext *ctx,
 			  ReorderBufferTXN *txn, XLogRecPtr message_lsn,
 			  bool transactional, const char *prefix,
@@ -148,6 +150,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
 	cb->truncate_cb = pg_de

Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar  2, 2022 at 10:54:27AM -0500, Stephen Frost wrote:
> > It's our decision what we want to support and maintain in the code base
> > and what we don't.  Folks often ask for things that we don't or won't
> > support and this isn't any different from that.  We also remove things
> > on a rather regular basis even when they're being used- generally
> > because we have something better, as we do here.  I disagree that an
> > argument of 'some people use it so we can't remove it' holds any weight
> > here.
> 
> I disagree.

With... which?  We removed recovery.conf without any warning between
major releases, yet it was used by every single PG file-based backup and
restore solution out there and by every single organization that had
ever done a restore of PG since it was introduced in 8.0.  Passing
around cleartext passwords with the LDAP authentication method is
clearly bad from a security perspective and it's a bunch of code to
support that, along with it being quite complicated to configure and get
set up (arguably harder than Kerberos, if you want my 2c).  If you want
to say that it's valuable for us to continue to maintain that code
because it's good and useful and might even be the only option for some
people, fine, though I disagree, but I don't think my argument that we
shouldn't keep it just because *someone* is using it is any different
from our general project policy about features.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Add 64-bit XIDs into PostgreSQL 15

2022-03-02 Thread Justin Pryzby
On Wed, Mar 02, 2022 at 06:43:11PM +0400, Pavel Borisov wrote:
> Hi hackers!
> 
> Hi! Here is the rebased version.

The patch doesn't apply - I suppose the patch is relative to a forked
postgres which already has other patches.

http://cfbot.cputube.org/pavel-borisov.html

Note also that I mentioned an issue with pg_upgrade.  Handling that well
is probably the most important part of the patch.

-- 
Justin




Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Julien Rouhaud
On Wed, Mar 02, 2022 at 05:40:00PM +0300, Aleksander Alekseev wrote:
> Hi Tom.
>
> > Yeah, there's plenty of precedent for that coding if you look around.
> > I've not read the whole patch, but this snippet seems fine to me
> > if there's also an #undef at the end of the function.
>
> No, there is no #undef. With #undef I don't mind it either.

From some naive grepping, I don't see strong evidence for that pattern
being widely used:

#define for such use without undef:
POSTGRES_FDW_GET_CONNECTIONS_COLS
HEAP_TUPLE_INFOMASK_COLS
CONNECTBY_NCOLS
DBLINK_NOTIFY_COLS
PG_STAT_STATEMENTS_COLS
PG_STAT_STATEMENTS_INFO_COLS
HEAPCHECK_RELATION_COLS
PG_PARTITION_TREE_COLS
PG_STAT_GET_ACTIVITY_COLS
PG_STAT_GET_WAL_COLS
PG_STAT_GET_SLRU_COLS
PG_STAT_GET_REPLICATION_SLOT_COLS
PG_STAT_GET_SUBSCRIPTION_STATS_COLS
PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
PG_GET_SHMEM_SIZES_COLS
PG_GET_REPLICATION_SLOTS_COLS
READ_REPLICATION_SLOT_COLS
PG_STAT_GET_WAL_SENDERS_COLS
PG_STAT_GET_SUBSCRIPTION_COLS

With an undef:
REPLICATION_ORIGIN_PROGRESS_COLS




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-02 Thread Greg Stark
On Wed, 2 Mar 2022 at 07:12, Daniel Gustafsson  wrote:

> Thanks for picking it up and continuing with recent developments.  Let me know
> if you want a hand in triaging patchsets.

While I have the time, there may be patches where I need help coming to
the right conclusions about what actions to take.

I think the main thing I can do to help is to take patches that have
no chance of making this release and take them off our collective
plates -- hopefully after they've received feedback, but as this is
the last commitfest of the release, that's a secondary concern.

But I'm unclear exactly what the consequences in the commitfest app
are of specific state changes. As I understand it there are basically
two alternatives:

1) Returned with feedback -- does this make it harder for an author to
resume work next release? Can they simply reactivate the CF entry or do
they need to start a new one and then lose history in the app?

2) Moved to next commitfest -- this seems to just drag the pain on.
Then it has to get triaged again next commitfest and if it's actually
stalled (or progressing fine without needing feedback) that's just
extra work for nothing.

Do I have this right? What is the right state to put a patch in that
means "this patch doesn't need to be triaged again unless the author
actually feels progress has been made and needs new feedback or thinks
it's committable"?

-- 
greg




Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread Chapman Flack
On 03/02/22 02:46, Michael Paquier wrote:
> system function marked as proretset while it builds and returns only
> one record.  And this is a popular one: pg_stop_backup(), labelled
> v2.

I had just recently noticed that while reviewing [0], but shrugged,
as I didn't know what the history was.

Is this best handled as a separate patch, or folded into [0], which is
going to be altering and renaming that function anyway?


On 03/02/22 09:31, Robert Haas wrote:
> On Wed, Mar 2, 2022 at 5:25 AM Aleksander Alekseev
>> Since it doesn't seem to be used for anything except these two array
>> declarations I suggest keeping simply "3" here.
>
> I think we do this kind of thing in various places in similar
> situations, and I think it is good style. It makes it easier to catch
> everything if you ever need to update the code.


I've been known (in other projects) to sometimes accomplish the same
thing with, e.g.,

Datum   values[3];
bool    nulls[sizeof values / sizeof *values];


Doesn't win any beauty contests, but just one place to change the length
if it needs changing. I see we define a lengthof in c.h, so could use:

Datum   values[3];
bool    nulls[lengthof(values)];
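
If memory serves, the definition there is just the classic array-length
idiom (worth double-checking in c.h itself):

    /* Number of elements in a statically-sized array */
    #define lengthof(array) (sizeof (array) / sizeof ((array)[0]))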

Regards,
-Chap


[0] https://commitfest.postgresql.org/37/3436/




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-02 Thread Bharath Rupireddy
On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma  wrote:
>
> Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

Thanks for reviewing.

> +--
> +-- pg_get_wal_records_info()
> +--
> +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> +IN end_lsn pg_lsn,
> +IN wait_for_wal boolean DEFAULT false,
> +OUT lsn pg_lsn,
>
> What does the wait_for_wal flag mean here when one has already
> specified the start and end lsn? AFAIU, If a user has specified a
> start and stop LSN, it means that the user knows the extent to which
> he/she wants to display the WAL records in which case we need to stop
> once the end lsn has reached . So what is the meaning of wait_for_wal
> flag? Does it look sensible to have the wait_for_wal flag here? To me
> it doesn't.

Users can always specify a future end_lsn and set wait_for_wal to
true, and then the pg_get_wal_records_info/pg_get_wal_stats functions
can wait for the WAL. IMO, this is useful. If you remember, you were
okay with wait/nowait versions for some of the functions upthread [1].
I'm now going to retain this behaviour for both
pg_get_wal_records_info/pg_get_wal_stats, as it is similar to
pg_waldump's --follow option.

> ==
>
> +--
> +-- pg_get_wal_records_info_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> +OUT lsn pg_lsn,
> +OUT prev_lsn pg_lsn,
> +OUT xid xid,
>
> Why is this function required? Is pg_get_wal_records_info() alone not
> enough? I think it is. See if we can make end_lsn optional in
> pg_get_wal_records_info() and lets just have it alone. I think it can
> do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> ==
>
> +--
> +-- pg_get_wal_stats_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> +OUT resource_manager text,
> +OUT count int8,
>
> Above comment applies to this function as well. Isn't pg_get_wal_stats() 
> enough?

I'm doing the following input validations for these functions so that
invalid LSNs don't cause any issues. If I were to have the default value
of end_lsn as 0/0, I couldn't perform these input validations, right?
That is the reason I'm having the separate {pg_get_wal_records_info,
pg_get_wal_stats}_till_end_of_wal() versions.

    /* Validate input. */
    if (XLogRecPtrIsInvalid(start_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid WAL record start LSN")));

    if (XLogRecPtrIsInvalid(end_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid WAL record end LSN")));

    if (start_lsn >= end_lsn)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("WAL record start LSN must be less than end LSN")));

> ==
>
>
> +   if (loc <= read_upto)
> +   break;
> +
> +   /* Let's not wait for WAL to be available if
> indicated */
> +   if (loc > read_upto &&
> +   state->private_data != NULL)
> +   {
>
> Why loc > read_upto? The first if condition is (loc <= read_upto)
> followed by the second if condition - (loc > read_upto). Is the first
> if condition (loc <= read_upto) not enough to indicate that loc >
> read_upto?

Yeah, that's unnecessary. I improved the comment there and removed the
"loc > read_upto" check.

> ==
>
> +#define IsEndOfWALReached(state) \
> +   (state->private_data != NULL && \
> +   (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->no_wait == true) && \
> +   (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
>
> I think we should either use state or xlogreader. First line says
> state->private_data and second line xlogreader->private_data.

I've changed it to use state instead of xlogreader.
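
So it now looks roughly like this (a sketch of the shape, not the exact
code in the attached version):

    #define IsEndOfWALReached(state) \
        ((state)->private_data != NULL && \
         ((ReadLocalXLOGPage2Private *) (state)->private_data)->no_wait && \
         ((ReadLocalXLOGPage2Private *) (state)->private_data)->reached_end_of_wal)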

> ==
>
> +   (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
> +
>
> There is a new patch coming to make the end-of-WAL messages less
> scary. It introduces an EOW flag in XLogReaderState; maybe we can use
> that instead of introducing new flags in the private area to represent
> the end of WAL.

Yeah, that would be great, but we never know which one gets committed
first. Until then it's not good to have dependencies on two "on-going"
patches. We can change it later.

> ==
>
> +/*
> + * XLogReaderRoutine->page_read callback for reading local xlog files
> + *
> + * This function is same as read_local_xlog_page except that it works in both
> + * wait and no wait mode. The callers can specify about waiting in 
> private_data
> + * of XLogReaderState.
> + */
> +int
> +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> +  int reqLen, XLogRecPtr
> targetRecPtr, char *cur_page)
> 

Re: Allow root ownership of client certificate key

2022-03-02 Thread David Steele

On 3/2/22 08:40, Tom Lane wrote:

Chris Bandy  writes:

On 3/1/22 3:15 AM, Tom Lane wrote:

Anyway, I'd be happier about back-patching if we could document
actual requests to make it work like the server side does.



PGO runs PostgreSQL 10 through 14 in Kubernetes, and we have to work
around this issue when using certificates for system accounts.


Sold then, I'll make it so in a bit.


Thank you! I think the containers community is really going to 
appreciate this.


Regards,
-David




Re: pg_stop_backup() v2 incorrectly marked as proretset

2022-03-02 Thread David Steele

On 3/2/22 11:04, Chapman Flack wrote:

On 03/02/22 02:46, Michael Paquier wrote:

system function marked as proretset while it builds and returns only
one record.  And this is a popular one: pg_stop_backup(), labelled
v2.


I had just recently noticed that while reviewing [0], but shrugged,
as I didn't know what the history was.

Is this best handled as a separate patch, or folded into [0], which is
going to be altering and renaming that function anyway?


On 03/02/22 09:31, Robert Haas wrote:

On Wed, Mar 2, 2022 at 5:25 AM Aleksander Alekseev

Since it doesn't seem to be used for anything except these two array
declarations I suggest keeping simply "3" here.


I think we do this kind of thing in various places in similar
situations, and I think it is good style. It makes it easier to catch
everything if you ever need to update the code.



I've been known (in other projects) to sometimes accomplish the same
thing with, e.g.,

Datum   values[3];
bool    nulls[sizeof values / sizeof *values];


I also use this pattern, though I would generally write it as:

bool nulls[sizeof(values) / sizeof(Datum)];

Chap's way makes it possible to use a macro, though, so that's a plus.

Regards,
-David




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-02 Thread Tom Lane
Greg Stark  writes:
> Do I have this right? What is the right state to put a patch in that
> means "this patch doesn't need to be triaged again unless the author
> actually feels progress has been made and needs new feedback or thinks
> it's committable"?

But that's not really the goal, is it?  ISTM what you want to do is
identify patches that we're not going to try to get into v15, and
then push them out to the next CF so that we don't spend more time
on them this month.  But that determination should not preclude them
from being looked at on the normal basis once the next CF arrives.
So I'd say just push them forward with status "Needs review" or
"Waiting on author", whichever seems more appropriate.

If a patch seems to have stalled to the point where neither of
those statuses is appropriate, then closing it RWF would be the
thing to do; but that's not special to the last-CF situation.

regards, tom lane




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-02 Thread Julien Rouhaud
On Wed, Mar 02, 2022 at 11:58:28AM -0500, Greg Stark wrote:
>
> But I'm unclear exactly what the consequences in the commitfest app
> are of specific state changes. As I understand it there are basically
> two alternatives:
>
> 1) Returned with feedback -- does this make it harder for an author to
> resume work next release? Can they simply reactivate the CF entry or do
> they need to start a new one and then lose history in the app?

As far as I know they would need to create a new entry, and thus lose the
history.

> 2) Moved to next commitfest -- this seems to just drag the pain on.
> Then it has to get triaged again next commitfest and if it's actually
> stalled (or progressing fine without needing feedback) that's just
> extra work for nothing.
>
> Do I have this right? What is the right state to put a patch in that
> means "this patch doesn't need to be triaged again unless the author
> actually feels progress has been made and needs new feedback or thinks
> it's committable"?

I don't think that 2) means having to triage again.  If a patch gets moved to
the next commitfest now, then clearly it's not ready and should also be
switched to Waiting on Author.

In the next commitfest, if the author doesn't address the problems raised
during review, the patch will still be in Waiting on Author, and the only
triaging needed would be to close as Returned with Feedback the patches that
look abandoned.  For now the arbitrary "abandoned" definition is usually
"patch in Waiting on Author for at least 2 weeks at the end of the commitfest
with no sign of activity from the author".




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-02 Thread Robert Haas
On Wed, Mar 2, 2022 at 11:09 AM Tom Lane  wrote:
> Robert Haas  writes:
> > So the questions in my mind here are all
> > about whether we can detect this stuff cheaply and whether anybody
> > wants to do the work to make it happen, not whether we'd get a benefit
> > in the cases where it kicks in.
>
> Right, my worries are mostly about the first point.

OK, cool.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: using extended statistics to improve join estimates

2022-03-02 Thread Justin Pryzby
On Wed, Jan 19, 2022 at 06:18:09PM +0800, Julien Rouhaud wrote:
> On Tue, Jan 04, 2022 at 03:55:50PM -0800, Andres Freund wrote:
> > On 2022-01-01 18:21:06 +0100, Tomas Vondra wrote:
> > > Here's an updated patch, rebased and fixing a couple typos reported by
> > > Justin Pryzby directly.
> > 
> > FWIW, cfbot reports a few compiler warnings:
> 
> Also the patch doesn't apply anymore:
> 
> http://cfbot.cputube.org/patch_36_3055.log
> === Applying patches on top of PostgreSQL commit ID 
> 74527c3e022d3ace648340b79a6ddec3419f6732 ===
> === applying patch 
> ./0001-Estimate-joins-using-extended-statistics-20220101.patch
> patching file src/backend/optimizer/path/clausesel.c
> patching file src/backend/statistics/extended_stats.c
> Hunk #1 FAILED at 30.
> Hunk #2 succeeded at 102 (offset 1 line).
> Hunk #3 succeeded at 2619 (offset 9 lines).
> 1 out of 3 hunks FAILED -- saving rejects to file 
> src/backend/statistics/extended_stats.c.rej

Rebased over 269b532ae and muted compiler warnings.

Tomas - is this patch viable for pg15, or should it move to the next CF?

In case it's useful, I ran this on cirrus with my branch for code coverage.
https://cirrus-ci.com/task/5816731397521408
https://api.cirrus-ci.com/v1/artifact/task/5816731397521408/coverage/coverage/00-index.html

statext_find_matching_mcv() has poor coverage.
statext_clauselist_join_selectivity() has poor coverage for the "stats2" case.

In mcv.c: mcv_combine_extended() and mcv_combine_simple() have poor coverage
for the "else if" cases (does it matter?)

Not related to this patch:
build_attnums_array() isn't being hit.

Same at statext_is_compatible_clause_internal()
1538   0 : *exprs = lappend(*exprs, clause);

statext_mcv_[de]serialize() aren't being hit for cstrings.

-- 
Justin




Re: using extended statistics to improve join estimates

2022-03-02 Thread Justin Pryzby
On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote:
> Rebased over 269b532ae and muted compiler warnings.

And attached.
>From 587a5e9fe87c26cdcd9602fc349f092da95cc580 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 13 Dec 2021 14:05:17 +0100
Subject: [PATCH] Estimate joins using extended statistics

Use extended statistics (MCV) to improve join estimates. In general this
is similar to how we use regular statistics - we search for extended
statistics (with MCV) covering all join clauses, and if we find such MCV
on both sides of the join, we combine those two MCVs.

Extended statistics allow a couple of additional improvements - e.g. if
there are baserel conditions, we can use them to restrict the part of
the MCVs being combined. This means we're building a conditional
probability distribution and calculating the conditional probability

P(join clauses | baserel conditions)

instead of just P(join clauses).

The patch also allows combining regular and extended MCV - we don't need
extended MCVs on both sides. This helps when one of the tables does not
have extended statistics (e.g. because there are no correlations).
---
 src/backend/optimizer/path/clausesel.c|  63 +-
 src/backend/statistics/extended_stats.c   | 805 ++
 src/backend/statistics/mcv.c  | 757 
 .../statistics/extended_stats_internal.h  |  20 +
 src/include/statistics/statistics.h   |  12 +
 src/test/regress/expected/stats_ext.out   | 167 
 src/test/regress/sql/stats_ext.sql|  66 ++
 7 files changed, 1889 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 06f836308d0..1b2227321a2 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -50,6 +50,9 @@ static Selectivity clauselist_selectivity_or(PlannerInfo *root,
 			 JoinType jointype,
 			 SpecialJoinInfo *sjinfo,
 			 bool use_extended_stats);
+static inline bool treat_as_join_clause(PlannerInfo *root,
+		Node *clause, RestrictInfo *rinfo,
+		int varRelid, SpecialJoinInfo *sjinfo);
 
 /
  *		ROUTINES TO COMPUTE SELECTIVITIES
@@ -129,12 +132,53 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	RangeQueryClause *rqlist = NULL;
 	ListCell   *l;
 	int			listidx;
+	bool		single_clause_optimization = true;
+
+	/*
+	 * The optimization of skipping to clause_selectivity_ext for single
+	 * clauses means we can't improve join estimates with a single join
+	 * clause but additional baserel restrictions. So we disable it when
+	 * estimating joins.
+	 *
+	 * XXX Not sure if this is the right way to do it, but more elaborate
+	 * checks would mostly negate the whole point of the optimization.
+	 * The (Var op Var) patch has the same issue.
+	 *
+	 * XXX An alternative might be making clause_selectivity_ext smarter
+	 * and make it use the join extended stats there. But that seems kinda
+	 * against the whole point of the optimization (skipping expensive
+	 * stuff) and it's making other parts more complex.
+	 *
+	 * XXX Maybe this should check if there are at least some restrictions
+	 * on some base relations, which seems important. But then again, that
+	 * seems to go against the idea of this check to be cheap. Moreover, it
+	 * won't work for OR clauses, which may have multiple parts but we still
+	 * see them as a single BoolExpr clause (it doesn't work later, though).
+	 */
+	if (list_length(clauses) == 1)
+	{
+		Node *clause = linitial(clauses);
+		RestrictInfo *rinfo = NULL;
+
+		if (IsA(clause, RestrictInfo))
+		{
+			rinfo = (RestrictInfo *) clause;
+			clause = (Node *) rinfo->clause;
+		}
+
+		single_clause_optimization
+			= !treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo);
+	}
 
 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selectivity_ext(). None of what we might do below is relevant.
+	 *
+	 * XXX This means we won't try using extended stats on OR-clauses (which
+	 * are a single BoolExpr clause at this point), although we'll do that
+	 * later (once we look at the arguments).
 	 */
-	if (list_length(clauses) == 1)
+	if ((list_length(clauses) == 1) && single_clause_optimization)
 		return clause_selectivity_ext(root, (Node *) linitial(clauses),
 	  varRelid, jointype, sjinfo,
 	  use_extended_stats);
@@ -157,6 +201,23 @@ clauselist_selectivity_ext(PlannerInfo *root,
 			&estimatedclauses, false);
 	}
 
+	/*
+	 * Try applying extended statistics to joins. There's not much we can
+	 * do to detect when this makes sense, but we can check that there are
+	 * join clauses, and that at least some of the rels have stats.
+	 *
+	 * XXX Isn't this mutually exclusive with the preceding block which
+	 * calculates estimates for a single relation?
+	 */
+	if (use_extended_stats &&
+		statext_try_join_estimat

Re: Add id's to various elements in protocol.sgml

2022-03-02 Thread Brar Piening

On 02.03.2022 at 10:37, Peter Eisentraut wrote:

I have applied the part of your patch that adds the id's.  The
discussion about the formatting aspect can continue.


Thank you!

I've generated some data by outputting the element name whenever a
section or varlistentry lacks an id. That's how the situation in the
docs currently looks like:

   element    | count
--------------+-------
 sect2        |   275
 sect3        |    94
 sect4        |    20
 simplesect   |    20
 varlistentry |  3976

Looking at this, I think that manually assigning an id to all ~400
sections currently lacking one to make them referable in a consistent
way is a bit of work but feasible.

Once we consistently have stable ids on section headers, IMHO it makes
sense to also expose them as links. I'd probably also make the
stylesheet emit a non-terminating message/comment whenever it finds a
section without an id, in order to help keep the layout consistent over
time.

With regard to varlistentry I'd suggest deciding whether to add ids or
not on a case-by-case basis. I already offered to add ids to long lists
upon request, but I wouldn't want to blindly add ~4k ids that nobody
cares about. That part can also grow over time, with people adding ids
as they deem them useful.

Any objections/thoughts?





Re: Add id's to various elements in protocol.sgml

2022-03-02 Thread Chapman Flack
On 03/02/22 12:46, Brar Piening wrote:
> With regard to varlistentry I'd suggest to decide whether to add ids or
> not on a case by case base. I already offered to add ids to long lists
> upon request but I wouldn't want to blindly add ~4k ids that nobody

Perhaps there are a bunch of variablelists where no one cares about
linking to any of the entries.

So maybe a useful non-terminating message to add eventually would
be one that identifies any varlistentry lacking an id within a
variablelist where at least one other entry has an id.

Regards,
-Chap




Re: [Proposal] Global temporary tables

2022-03-02 Thread Adam Brusselback
>In my observation, very few users require an accurate query plan for
temporary tables to
perform manual analyze.

Absolutely not true in my observations or personal experience. It's one of
the main reasons I have needed to use (local) temporary tables rather than
just materializing a CTE when decomposing queries that are too complex for
Postgres to handle.

I wish I could use GTT to avoid the catalog bloat in those instances, but
that will only be possible if the query plans are accurate.


Re: [Proposal] Global temporary tables

2022-03-02 Thread Pavel Stehule
On Wed, Mar 2, 2022 at 19:02, Adam Brusselback 
wrote:

> >In my observation, very few users require an accurate query plan for
> temporary tables to
> perform manual analyze.
>
> Absolutely not true in my observations or personal experience. It's one of
> the main reasons I have needed to use (local) temporary tables rather than
> just materializing a CTE when decomposing queries that are too complex for
> Postgres to handle.
>
> I wish I could use GTT to avoid the catalog bloat in those instances, but
> that will only be possible if the query plans are accurate.
>

This strongly depends on usage.  Very common patterns from MSSQL don't need
statistics. But on second thought, sometimes the query has to be divided
and temp tables are used for storing intermediate results. In that case, you
cannot get by without statistics. In the first case, the temp tables can be
replaced by arrays; in the second case, the temp tables are not replaceable.

Regards

Pavel


Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-03-02 Thread Bharath Rupireddy
On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav
 wrote:
> > Also, how about special phases for SyncPostCheckpoint(),
> > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
> > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
> > it might be increase in future (?)), TruncateSUBTRANS()?
>
> SyncPreCheckpoint() is just incrementing a counter and
> PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel
> there is no need to add any phases for these as of now. We can add in
> the future if necessary. Added phases for SyncPostCheckpoint(),
> InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS().

Okay.

> > 10) I'm not sure if it's discussed, how about adding the number of
> > snapshot/mapping files so far the checkpoint has processed in file
> > processing while loops of
> > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
> > be many logical snapshot or mapping files and users may be interested
> > in knowing the so-far-processed-file-count.
>
> I had thought about this while sharing the v1 patch and mentioned my
> views upthread. I feel it won't give meaningful progress information
> (It can be treated as statistics). Hence not included. Thoughts?

Okay. If there are any complaints about it we can always add them later.

> > > > As mentioned upthread, there can be multiple backends that request a
> > > > checkpoint, so unless we want to store an array of pid we should store 
> > > > a number
> > > > of backend that are waiting for a new checkpoint.
> > >
> > > Yeah, you are right. Let's not go that path and store an array of
> > > pids. I don't see a strong use-case with the pid of the process
> > > requesting checkpoint. If required, we can add it later once the
> > > pg_stat_progress_checkpoint view gets in.
> >
> > I don't think that's really necessary to give the pid list.
> >
> > If you requested a new checkpoint, it doesn't matter if it's only your 
> > backend
> > that triggered it, another backend or a few other dozen, the result will be 
> > the
> > same and you have the information that the request has been seen.  We could
> > store just a bool for that but having a number instead also gives a bit more
> > information and may allow you to detect some broken logic on your client 
> > code
> > if it keeps increasing.
>
> It's a good metric to show in the view but the information is not
> readily available. Additional code is required to calculate the number
> of requests. Is it worth doing that? I feel this can be added later if
> required.

Yes, we can always add it later if required.

> Please find the v4 patch attached and share your thoughts.

I reviewed v4 patch, here are my comments:

1) Can we convert the below into a single pgstat_progress_update_multi_param()
call, just to avoid repeated function calls?
pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo);
pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
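
Something like this, perhaps (an untested sketch; the second argument is
cut off above, so "phase" below is just a stand-in for whatever that call
passes):

    const int   index[] = {PROGRESS_CHECKPOINT_LSN, PROGRESS_CHECKPOINT_PHASE};
    const int64 val[] = {checkPoint.redo, phase};

    pgstat_progress_update_multi_param(2, index, val);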

2) Why are we not having a special phase for CheckPointReplicationOrigin,
as it does a good bunch of work (writing to disk, XLogFlush,
durable_rename), especially when max_replication_slots is large?

3) I don't think "requested" is necessary here, as it doesn't add any
value and isn't a checkpoint kind or such; you can remove it.

4) s:/'recycling old XLOG files'/'recycling old WAL files'
+  WHEN 16 THEN 'recycling old XLOG files'

5) Can we place the CREATE VIEW pg_stat_progress_checkpoint AS definition
next to pg_stat_progress_copy in system_views.sql? It looks like all
the progress reporting views are next to each other.

6) How about the shutdown and end-of-recovery checkpoints? Are you planning
to have an ereport_startup_progress mechanism as in 0002?

7) I think you don't need to call checkpoint_progress_start,
pgstat_progress_update_param, or any other progress reporting function
for the shutdown and end-of-recovery checkpoints, right?

8) This doesn't hold for all kinds of checkpoints, right?
pg_stat_progress_checkpoint can't show a progress report for the shutdown
and end-of-recovery checkpoints; I think you need to specify that here in
wal.sgml and checkpoint.sgml.
+   command CHECKPOINT. The checkpointer process running the
+   checkpoint will report its progress in the
+   pg_stat_progress_checkpoint view. See
+for details.

9) Can you add a test case for the pg_stat_progress_checkpoint view? I
think it's good to add one. See below for reference:
-- Add a trigger to catch and print the contents of the catalog view
-- pg_stat_progress_copy during data insertion.  This allows to test
-- the validation of some progress reports for COPY FROM where the trigger
-- would fire.
create function notice_after_tab_progress_reporting() returns trigger AS
$$
declare report record;

10) Typo: it's not "is happens"
+   The checkpoint is happens without delays.

11) Can you be specific about which "some operations" forced a
checkpoint? Maybe basebackup, createdb or something?
+   The checkpoint is started because some operation forced a 

Re: [PATCH] Add reloption for views to enable RLS

2022-03-02 Thread Laurenz Albe
On Wed, 2022-03-02 at 10:10 +, Dean Rasheed wrote:
> > I kept "check_permissions_owner" for now. Constantly changing it around
> > with each iteration doesn't really bring any value IMHO, I'd rather have
> > a final consensus on how to name the option and *then* change it for good.
> 
> Yes indeed, it's annoying to keep changing the name between patch
> versions, so let's try to get a consensus now.
> 
> For my part, I find myself more and more convinced that
> "security_invoker" is the right name [...]
> 
> What are other people's opinions?

I am fine with "security_invoker".  If there are other databases that use the
same term for the same thing, that is a strong argument.

I also agree that having "off" for the default setting is nicer.

My main worry is that other people misunderstand it in the same way that
Walter did, namely that this behaves just like security invoker functions.
But if the behavior is well documented, I think that is ok.

Yours,
Laurenz Albe





Re: Support for grabbing multiple consecutive values with nextval()

2022-03-02 Thread Jille Timmermans

On 2022-02-28 11:13, Peter Eisentraut wrote:

On 27.02.22 10:42, Jille Timmermans wrote:
I wanted to be able to allocate a bunch of numbers from a sequence at 
once. Multiple people seem to be struggling with this 
(https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).


I propose to add an extra argument to nextval() that specifies how 
many numbers you want to allocate (default 1).


What is the use of this?

I note that the stackoverflow question wanted to return multiple
sequence values as a result set, whereas your implementation just
skips a number of values and returns the last one.  At least we should
be clear about what we are trying to achieve.
Both would work for me, actually. I'm using COPY FROM to insert many rows
and need to know their ids, and COPY FROM doesn't support RETURNING.


I implemented this approach because:
- smaller diff
- maybe someone benefits from them being consecutive
- less data to send between client/server

The obvious downside is that people can make mistakes in whether the 
returned number is the first or last number of the series.
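
To make that contract explicit (a sketch, assuming the two-argument
nextval() returns the *last* value of the allocated block; returned_value
and n are placeholders):

    /* hypothetical client-side bookkeeping after nextval('seq', n) */
    int64 last  = returned_value;
    int64 first = last - (n - 1);   /* ids first..last are now reserved */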





Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-02 Thread Chapman Flack
On 03/01/22 20:03, Nathan Bossart wrote:
> Here is a new version of the patch with the following changes:

I did not notice this earlier (sorry), but there seems to remain in
backup.sgml a programlisting example that shows a psql invocation
for pg_backup_start, then a tar command, then another psql invocation
for pg_backup_stop.

I think that was only workable for the exclusive mode, and now it is
necessary to issue pg_backup_start and pg_backup_stop in the same session.

(The 'touch backup_in_progress' business seems a bit bogus now too,
suggesting an exclusivity remembered from bygone days.)

I am not sure what a workable, simple example ought to look like.
Maybe a single psql script issuing the pg_backup_start and the
pg_backup_stop, with a tar command in between with \! ?

Several bricks shy of production-ready, but it would give the idea.

Regards,
-Chap




Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:

2022-03-02 Thread Robert Haas
On Tue, Feb 22, 2022 at 4:40 AM Andres Freund  wrote:
> On 2022-02-22 01:11:21 -0800, Andres Freund wrote:
> > I've started to work on a few debugging aids to find problem like
> > these. Attached are two WIP patches:
>
> Forgot to attach. Also importantly includes a tap test for several of these
> issues

Hi,

Just a few very preliminary comments:

- I am having some trouble understanding clearly what 0001 is doing.
I'll try to study it further.

- 0002 seems like it's generally a good idea. I haven't yet dug into
why the call sites for AssertFileNotDeleted() are where they are, or
whether that's a complete set of places to check.

- In general, 0003 makes a lot of sense to me.

+   /*
+* Finally tell the kernel to write the data to
storage. Don't smgr if
+* previously closed, otherwise we could end up evading fd-reuse
+* protection.
+*/

- I think the above hunk is missing a word, because it uses smgr as a
verb. I also think that it's easy to imagine this explanation being
insufficient for some future hacker to understand the issue.

- While 0004 seems useful for testing, it's an awfully big hammer. I'm
not sure we should be thinking about committing something like that,
or at least not as a default part of the build. But ... maybe?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:

2022-03-02 Thread Andres Freund
Hi,

On 2022-03-02 14:52:01 -0500, Robert Haas wrote:
> - I am having some trouble understanding clearly what 0001 is doing.
> I'll try to study it further.

It tests for the various scenarios I could think of that could lead to FD
reuse, to state the obvious ;). Anything particularly unclear?


> - In general, 0003 makes a lot of sense to me.
> 
> +   /*
> +* Finally tell the kernel to write the data to
> storage. Don't smgr if
> +* previously closed, otherwise we could end up evading 
> fd-reuse
> +* protection.
> +*/
> 
> - I think the above hunk is missing a word, because it uses smgr as a
> verb. I also think that it's easy to imagine this explanation being
> insufficient for some future hacker to understand the issue.

Yea, it's definitely not sufficient or grammatically correct. I think I wanted
to send something out, but was very tired by that point...


> - While 0004 seems useful for testing, it's an awfully big hammer. I'm
> not sure we should be thinking about committing something like that,
> or at least not as a default part of the build. But ... maybe?

Agreed. I think it's racy anyway, because further files could get deleted
(e.g. segments > 0) after the barrier has been processed.


What I am stuck on is what we can do for the released branches. Data
corruption after two consecutive ALTER DATABASE SET TABLESPACEs seems like
something we need to address.

Greetings,

Andres Freund




Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Andres Freund
Hi,

On 2022-03-02 09:32:26 +0100, Peter Eisentraut wrote:
> On 01.03.22 22:34, Andres Freund wrote:
> > The cases I've heard about are about centralizing auth across multiple cloud
> > services. Including secret management in some form. E.g. allowing an
> > application to auth to postgres, redis and having the secret provided by
> > infrastructure, rather than having to hardcode it in config or such.
> > 
> > I can't see application developers configuring kerberos and I don't think
> > LDAP, PAM, Radius are a great answer either, due to the plaintext 
> > requirement
> > alone? LDAP is pretty clearly dying technology, PAM is fragile complicated C
> > stuff that's not portable across OSs. Radius is probably the most realistic,
> > but at least as implemented doesn't seem flexible enough (e.g. no access to
> > group memberships etc).
> > 
> > Nor does baking stuff like that in seem realistic to me, it'll presumably be
> > too cloud provider specific.
> 
> Let's gather some more information on this.  PostgreSQL should support the
> authentication that many people want to use out of the box.  I don't think
> it would be good to be at a point where all the built-in methods are
> outdated and if you want to use the good stuff you have to hunt for plugins.
> The number of different cloud APIs is effectively small.  I expect that
> there are a lot of similarities, like they probably all need support for
> http calls, they might need support for caching lookups, etc.  OIDC was
> mentioned elsewhere.  That's a standard.  Is that compatible with any cloud
> providers?  Would that be enough for many users?

I'm not opposed to putting something into the source tree eventually, if we
can figure out an overlapping set of capabilities that's useful enough. But I
don't see that as a reason to not make auth extensible? If anything, the
contrary - it's going to be much easier to figure out what this should look
like if it can be iteratively worked on with unmodified postgres.

Even if we were to put something in core, it seems that contrib/ would be a
better place than auth.c for something like it.

I think we should consider moving pam, ldap, radius to contrib
eventually. That way we'd not need to entirely remove them, but a "default"
postgres wouldn't have support for auth methods requiring plaintext secret
transmission.

Greetings,

Andres Freund




Re: trigger example for plsample

2022-03-02 Thread Mark Wong
On Fri, Feb 25, 2022 at 06:39:39PM +, Chapman Flack wrote:
> This patch is straightforward, does what it says, and passes the tests.
> 
> Regarding the duplication of code between plsample_func_handler and
> plsample_trigger_handler, perhaps that's for the best for now, as 3554 in
> the same commitfest also touches plsample, so merge conflicts may be
> minimized by not doing more invasive refactoring.
> 
> That would leave low-hanging fruit for a later patch that could refactor
> plsample to reduce the duplication (maybe adding a validator at the same
> time? That would also duplicate some of the checks in the existing handlers.)
> 
> I am not sure that structuring the trigger handler with separate compile and
> execute steps is worth the effort for a simple example like plsample. The main
> plsample_func_handler is not so structured.
> 
> It's likely that many real PLs will have some notion of compilation separate 
> from
> execution. But those will also have logic to do the compilation only once, and
> somewhere to cache the result of that for reuse across calls, and those kinds 
> of
> details might make plsample's basic skeleton more complex than needed.
> 
> I know that in just looking at expected/plsample.out, I was a little 
> distracted by
> seeing multiple "compile" messages for the same trigger function in the same
> session and wondering why that was.
> 
> So maybe it would be simpler and less distracting to assume that the PL 
> targeted
> by plsample is one that just has a simple interpreter that works from the 
> source text.

I've attached v2, which reduces the output:

* Removed the notices for the text body and the "compile" message.
* Replaced the "compile" notice with a comment as a placeholder for
  where compiling code or checking a cache may go.
* Reduced the number of rows inserted into the table, thus reducing
  the number of notice messages about which code path is taken.


I think that reduces the repetitiveness of the output...

Regards,
Mark
diff --git a/src/test/modules/plsample/expected/plsample.out b/src/test/modules/plsample/expected/plsample.out
index a0c318b6df..a7912c7897 100644
--- a/src/test/modules/plsample/expected/plsample.out
+++ b/src/test/modules/plsample/expected/plsample.out
@@ -6,9 +6,6 @@ AS $$
   Example of source with text result.
 $$ LANGUAGE plsample;
 SELECT plsample_result_text(1.23, 'abc', '{4, 5, 6}');
-NOTICE:  source text of function "plsample_result_text": 
-  Example of source with text result.
-
 NOTICE:  argument: 0; name: a1; value: 1.23
 NOTICE:  argument: 1; name: a2; value: abc
 NOTICE:  argument: 2; name: a3; value: {4,5,6}
@@ -25,12 +22,57 @@ AS $$
   Example of source with void result.
 $$ LANGUAGE plsample;
 SELECT plsample_result_void('{foo, bar, hoge}');
-NOTICE:  source text of function "plsample_result_void": 
-  Example of source with void result.
-
 NOTICE:  argument: 0; name: a1; value: {foo,bar,hoge}
  plsample_result_void 
 --
  
 (1 row)
 
+CREATE FUNCTION my_trigger_func() RETURNS trigger AS $$
+if TD_event == "INSERT"
+return TD_NEW
+elseif TD_event == "UPDATE"
+return TD_NEW
+else
+return "OK"
+end
+$$ language plsample;
+CREATE TABLE my_table (num integer, description text);
+CREATE TRIGGER my_trigger_func BEFORE INSERT OR UPDATE ON my_table
+   FOR EACH ROW EXECUTE FUNCTION my_trigger_func();
+CREATE TRIGGER my_trigger_func2 AFTER INSERT OR UPDATE ON my_table
+   FOR EACH ROW EXECUTE FUNCTION my_trigger_func(8);
+INSERT INTO my_table (num, description)
+VALUES (1, 'first');
+NOTICE:  trigger name: my_trigger_func
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by INSERT
+NOTICE:  triggered BEFORE
+NOTICE:  triggered per row
+NOTICE:  trigger name: my_trigger_func2
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by INSERT
+NOTICE:  triggered AFTER
+NOTICE:  triggered per row
+NOTICE:  trigger arg[0]: 8
+UPDATE my_table
+SET description = 'first, modified once'
+WHERE num = 1;
+NOTICE:  trigger name: my_trigger_func
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by UPDATE
+NOTICE:  triggered BEFORE
+NOTICE:  triggered per row
+NOTICE:  trigger name: my_trigger_func2
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by UPDATE
+NOTICE:  triggered AFTER
+NOTICE:  triggered per row
+NOTICE:  trigger arg[0]: 8
+DROP TRIGGER my_trigger_func ON my_table;
+DROP TRIGGER my_trigger_func2 ON my_table;
+DROP FUNCTION my_trigger_func;
diff --git a/src/test/modules/plsample/plsample.c b/src/test/modules/plsample/plsample.c
index 6fc33c728c..fea266f522 100644
--- a/src/test/modules/plsample/plsample.c
+++ b/src/test/modules/plsample/plsample.c
@@ -19,6 +19,7 @@
 #include "catalog/pg_type.h"
 #include "commands/event_trigger.h"
 #include "commands/trigger.h"
+#include "executor/sp

Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2022-03-02 09:32:26 +0100, Peter Eisentraut wrote:
> > On 01.03.22 22:34, Andres Freund wrote:
> > > The cases I've heard about are about centralizing auth across multiple 
> > > cloud
> > > services. Including secret management in some form. E.g. allowing an
> > > application to auth to postgres, redis and having the secret provided by
> > > infrastructure, rather than having to hardcode it in config or such.
> > > 
> > > I can't see application developers configuring kerberos and I don't think
> > > LDAP, PAM, Radius are a great answer either, due to the plaintext 
> > > requirement
> > > alone? LDAP is pretty clearly dying technology, PAM is fragile 
> > > complicated C
> > > stuff that's not portable across OSs. Radius is probably the most 
> > > realistic,
> > > but at least as implemented doesn't seem flexible enough (e.g. no access 
> > > to
> > > group memberships etc).
> > > 
> > > Nor does baking stuff like that in seem realistic to me, it'll presumably 
> > > be
> > > too cloud provider specific.
> > 
> > Let's gather some more information on this.  PostgreSQL should support the
> > authentication that many people want to use out of the box.  I don't think
> > it would be good to be at a point where all the built-in methods are
> > outdated and if you want to use the good stuff you have to hunt for plugins.
> > The number of different cloud APIs is effectively small.  I expect that
> > there are a lot of similarities, like they probably all need support for
> > http calls, they might need support for caching lookups, etc.  OIDC was
> > mentioned elsewhere.  That's a standard.  Is that compatible with any cloud
> > providers?  Would that be enough for many users?
> 
> I'm not opposed to putting something into the source tree eventually, if we
> can figure out an overlapping set of capabilities that's useful enough. But I
> don't see that as a reason to not make auth extensible? If anything, the
> contrary - it's going to be much easier to figure out what this should look
> like if it can be iteratively worked on with unmodified postgres.
> 
> Even if we were to put something in core, it seems that contrib/ would be a
> better place than auth.c for something like it.
> 
> I think we should consider moving pam, ldap, radius to contrib
> eventually. That way we'd not need to entirely remove them, but a "default"
> postgres wouldn't have support for auth methods requiring plaintext secret
> transmission.

Part of the point, for my part anyway, of dropping support for plaintext
transmission would be to remove support for that from libpq as well;
otherwise a compromised server could still potentially convince a client
to send it a plaintext password.

I also just generally disagree with the idea that it makes sense for
these things to be in contrib.  We should be dropping them because
they're insecure - moving them to contrib doesn't change the fact that
we're distributing authentication solutions that pass plaintext passwords
around (whether through an encrypted tunnel or not; neither is good).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread samay sharma
Hi,

On Wed, Mar 2, 2022 at 12:32 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 01.03.22 22:34, Andres Freund wrote:
> > The cases I've heard about are about centralizing auth across multiple
> cloud
> > services. Including secret management in some form. E.g. allowing an
> > application to auth to postgres, redis and having the secret provided by
> > infrastructure, rather than having to hardcode it in config or such.
> >
> > I can't see application developers configuring kerberos and I don't think
> > LDAP, PAM, Radius are a great answer either, due to the plaintext
> requirement
> > alone? LDAP is pretty clearly dying technology, PAM is fragile
> complicated C
> > stuff that's not portable across OSs. Radius is probably the most
> realistic,
> > but at least as implemented doesn't seem flexible enough (e.g. no access
> to
> > group memberships etc).
> >
> > Nor does baking stuff like that in seem realistic to me, it'll
> presumably be
> > too cloud provider specific.
>
> Let's gather some more information on this.  PostgreSQL should support
> the authentication that many people want to use out of the box.  I don't
> think it would be good to be at a point where all the built-in methods
> are outdated and if you want to use the good stuff you have to hunt for
> plugins.  The number of different cloud APIs is effectively small.  I
> expect that there are a lot of similarities, like they probably all need
> support for http calls, they might need support for caching lookups,
> etc.  OIDC was mentioned elsewhere.  That's a standard.  Is that
> compatible with any cloud providers?  Would that be enough for many users?
>

I think we are discussing two topics in this thread which in my opinion are
orthogonal.


(a) Should we make authentication methods pluggable by exposing these
hooks? - These will allow users to add plugins of their own to support
whatever auth method they like. One immediate use case (and what prompted
me to start looking at this) is Azure Active Directory integration, which is
a common request from Azure customers. We could also, over time, move some
of our existing auth methods into extensions if we don’t want to maintain
them in core.


Please note that these hooks do not restrict auth providers to
plaintext-only auth methods. We could do SCRAM with secrets which are stored
outside of Postgres using this auth plugin too. I can modify the
test_auth_provider sample extension to do something like that as Andres
suggested.
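
For illustration, such a provider could look roughly like the sketch
below. The RegisterAuthProvider() entry point is the hypothetical hook
registration from this proposal (not a committed API), and
fetch_secret_externally() is a placeholder for a call into an external
secret store; CheckSASLAuth() and pg_be_scram_mech are, as far as I can
tell, the backend's existing SCRAM entry points:

#include "postgres.h"

#include "fmgr.h"
#include "libpq/auth.h"
#include "libpq/sasl.h"
#include "libpq/scram.h"

PG_MODULE_MAGIC;

void _PG_init(void);

/* Placeholder: a real provider would call out to a secret store here. */
static char *
fetch_secret_externally(const char *role_name)
{
	return NULL;
}

/* Run the regular SCRAM exchange against an externally stored secret. */
static int
external_scram_check(Port *port)
{
	const char *logdetail = NULL;
	char	   *secret = fetch_secret_externally(port->user_name);

	if (secret == NULL)
		return STATUS_ERROR;

	return CheckSASLAuth(&pg_be_scram_mech, port, secret, &logdetail);
}

void
_PG_init(void)
{
	/* Hypothetical registration API from the proposed patch. */
	RegisterAuthProvider("external_scram", external_scram_check);
}

Note that the wire protocol stays SCRAM end to end; only the secret
lookup moves out of pg_authid.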


> I don't see much point in that unless we deprecate *all* the
> auth methods that transmit a cleartext password.


(b) Should we allow plaintext auth methods (and should we deprecate a few
currently supported auth methods which use plaintext exchange)? - I think
this is also a very important discussion but has many factors to consider
independent of the hooks. Whatever decision is made here, we can impose
that in auth.c later for plugins. For example, we could allow plugins to
do plaintext exchanges only with SSL enabled (by checking
Port->ssl_in_use), or, if we remove AUTH_REQ_PASSWORD, plugins can't use
a plaintext exchange with the client anyway.
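
A minimal sketch of what such a guard could look like in auth.c,
assuming we wanted to gate plugin-driven plaintext exchanges on SSL
(illustrative only, not from any actual patch):

/*
 * Refuse to run a plaintext password exchange on behalf of an auth
 * plugin unless the connection is SSL-protected.
 */
static void
reject_plaintext_without_ssl(Port *port)
{
	if (!port->ssl_in_use)
		ereport(FATAL,
				(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
				 errmsg("plaintext password exchange is only allowed over SSL")));
}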


So, overall, if we are open to allowing plugins for auth methods, I can
proceed to modify the test_auth_provider extension to use SCRAM and allow
registering multiple providers for this specific change.


Regards,

Samay


Re: Adding CI to our tree

2022-03-02 Thread Justin Pryzby
On Mon, Feb 28, 2022 at 02:58:02PM -0600, Justin Pryzby wrote:
> I still think that if "Build Docs" is a separate cirrus task, it should 
> rebuild
> docs on every CI run, even if they haven't changed, for any patch that touches
> docs/.  It'll be confusing if cfbot shows 5 green circles and 4 of them were
> built 1 day ago, and 1 was built 3 weeks ago.  Docs are the task that runs
> quickest, so I don't think it's worth doing anything special there (especially
> without understanding the behavior of changesInclude()).
> 
> Also, to allow users to view the built HTML docs, cfbot would need to 1) keep
> track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow
> showing artifacts from the previous run.  If it's not already done, I think 
> the
> first half is a good idea on its own.  But the 2nd part doesn't seem 
> desirable.

Maybe changesInclude() could work if we use this URL (from cirrus'
documentation), which uses the artifacts from the last successful build.
https://api.cirrus-ci.com/v1/artifact/github/justinpryzby/postgres/Documentation/html_docs/html_docs/00-doc.html?branch=citest-cirrus2

That requires knowing the file being modified, so we'd have to generate an
index of changed files - which I've started doing here.

> However, I realized that we can filter on cfbot with either of these:
> | $CIRRUS_CHANGE_TITLE =~ '^\[CF...'
> | git log -1 |grep '^Author: Commitfest Bot '
> If we can assume that cfbot will continue submitting branches as a single
> patch, this resolves the question of a "base branch", for cfbot.

I don't know what you think of that idea, but I think I want to amend my
proposal: show HTML and coverage artifacts for HEAD~1, unless set otherwise by
an environment var.  Today, that'd do the right thing for cfbot, and also for
any 1-patch commits.

> These patches implement that idea, and make "code coverage" and "HTML diffs"
> stuff only run for cfbot commits.  This still needs another round of testing,
> though.

The patch was missing a file due to an issue while rebasing - oops.

BTW (regarding the last patch), I just noticed that -Og optimization can cause
warnings with gcc-4.8.5-39.el7.x86_64.

be-fsstubs.c: In function 'be_lo_export':
be-fsstubs.c:522:24: warning: 'fd' may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (CloseTransientFile(fd) != 0)
                         ^
trigger.c: In function 'ExecCallTriggerFunc':
trigger.c:2400:2: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
  return (HeapTuple) DatumGetPointer(result);
  ^
xml.c: In function 'xml_pstrdup_and_free':
xml.c:1205:2: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
  return result;
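
These look like the usual -Wmaybe-uninitialized false positives from
older gcc at -Og: the compiler can't prove the variable is assigned on
every path that reaches the use. A standalone illustration of the
pattern (not PostgreSQL code):

#include <stdio.h>

static int
demo(int do_open, int use_it)
{
	int			fd;				/* gcc 4.8 at -Og may warn here */

	if (do_open)
		fd = 42;				/* stands in for a successful open() */

	/* Callers guarantee use_it implies do_open, but gcc can't see that. */
	if (use_it)
		return fd;

	return -1;
}

int
main(void)
{
	printf("%d\n", demo(1, 1));
	return 0;
}

Initializing such variables up front (e.g. fd = -1) is the usual way to
silence the warning.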

-- 
Justin
>From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9f..1b7c36283e9 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +240,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_os_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -384,6 +387,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_os_script: |
+REM choco install -y ...
+
   configure_script:
 # copy errors out when using forward slashes
 - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
@@ -479,6 +485,10 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_os_script: |
+#apt-get update
+#apt-get -y install ...
+
   ###
  # Test that code can be built with gcc/clang

Re: Proposal: Support custom authentication methods using hooks

2022-03-02 Thread Andres Freund
Hi,

On 2022-03-02 15:26:32 -0500, Stephen Frost wrote:
> Part of the point, for my part anyway, of dropping support for plaintext
> transmission would be to remove support for that from libpq; otherwise a
> compromised server could still potentially convince a client to send it
> a plaintext password.

IMO that's an argument for an opt-in option to permit plaintext, not an
argument for removal of the code altogether. Even that will need a long
transition time, because it's effectively a form of an ABI break. Upgrading
libpq will suddenly cause applications to stop working. So adding an opt-out
option to disable plaintext is the next step...
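
For illustration, the libpq side of such an opt-in could take roughly
this shape; the allow_plaintext_password connection option and the
function here are invented, not from any patch:

/*
 * Hypothetical check before answering an AUTH_REQ_PASSWORD challenge.
 * conn->allow_plaintext_password is an invented connection option.
 */
static int
pg_check_plaintext_allowed(PGconn *conn)
{
	if (conn->allow_plaintext_password == NULL ||
		strcmp(conn->allow_plaintext_password, "1") != 0)
	{
		appendPQExpBufferStr(&conn->errorMessage,
							 libpq_gettext("server requested a plaintext password, "
										   "which is disabled by default\n"));
		return STATUS_ERROR;
	}
	return STATUS_OK;
}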

I don't think it makes sense to discuss this topic as part of this thread
really. It seems wholly independent of making authentication pluggable.


> I also just generally disagree with the idea that it makes sense for
> these things to be in contrib.  We should be dropping them because
> they're insecure- moving them to contrib doesn't change the issue that
> we're distributing authentication solutions that send (either through an
> encrypted tunnel, or not, neither is good) that pass plaintext passwords
> around.

Shrug. I don't think it's a good idea to just leave people hanging without a
replacement. It's OK to make it a bit harder and require explicit
configuration, but dropping support for reasonable configurations IMO is
something we should be very hesitant to do.

Greetings,

Andres Freund




Re: [Proposal] Global temporary tables

2022-03-02 Thread Andres Freund
Hi,

On 2022-02-27 06:09:54 +0100, Pavel Stehule wrote:
> ne 27. 2. 2022 v 5:13 odesílatel Andres Freund  napsal:
> > On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
> > > Without this, the GTT will be terribly slow like current temporary tables
> > > with a lot of problems with bloating of pg_class, pg_attribute and
> > > pg_depend tables.
> >
> > I think it's not a great idea to solve multiple complicated problems at
> > once...

> I thought about this issue for a very long time, and I didn't find anything
> better (without more significant rewriting of pg storage). In a lot of
> projects that I know, temporary tables are strictly prohibited due to
> their possible devastating impact: system catalog bloat.  It is a serious
> problem. So any implementation of GTT should solve these questions: a) how to
> reduce catalog bloat, b) how to allow session-related statistics for
> GTT. I agree that an implementation of GTT as template-based LTT (local
> temporary tables) can be very simple (it is possible via an extension), but
> it would have the same unhappy performance impacts.

> I don't say the current design should be accepted without any discussion
> or changes. Maybe GTT based on LTT can be better than nothing
> (what we have now), and can be good enough for a lot of projects where the
> load is not too high (and almost all projects have low load).

I think there's just no way that it can be merged with anything close to the
current design - it's unmaintainable. The need for the feature doesn't change
that.

That's not to say it's impossible to come up with a workable design. But it's
definitely not easy. If I were to work on this - which I am not planning to -
I'd try to solve the problems of "LTT" first, with an eye towards using the
infrastructure for GTT.

I think you'd basically have to come up with a generic design for partitioning
catalog tables into local / non-local storage, without needing explicit code
for each catalog. That could also be used to store the default catalog
contents separately from user defined ones (e.g. pg_proc is pretty large).
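
To give a taste of the shape of that idea (everything below is
hypothetical; the types and the catalog_search() helper are invented):
a catalog lookup would consult a backend-local partition first and fall
back to the shared one.

/* Invented sketch of a local/non-local catalog split. */
typedef struct PartitionedCatalog
{
	Relation	shared_part;	/* ordinary, globally visible rows */
	Relation	local_part;		/* backend-local rows, e.g. for GTT */
} PartitionedCatalog;

static HeapTuple
partitioned_catalog_lookup(PartitionedCatalog *cat, ScanKey keys, int nkeys)
{
	HeapTuple	tup = catalog_search(cat->local_part, keys, nkeys);

	if (!HeapTupleIsValid(tup))
		tup = catalog_search(cat->shared_part, keys, nkeys);
	return tup;
}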

Greetings,

Andres Freund




Re: Add 64-bit XIDs into PostgreSQL 15

2022-03-02 Thread Andres Freund
Hi,

On 2022-03-02 16:25:33 +0300, Aleksander Alekseev wrote:
> I agree with Bruce it would be great to deliver this in PG15.

> Please let me know if you believe it's unrealistic for any reason so I will
> focus on testing and reviewing other patches.

I don't see 15 as a realistic target for this patch. There's huge amounts of
work left, it has gotten very little review.

I encourage trying to break down the patch into smaller incrementally useful
pieces. E.g. making all the SLRUs 64bit would be a substantial and
independently committable piece.
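
To sketch what the SLRU piece involves (hedged: the widened xid type is
the patch's assumption, not today's code, where TransactionId is
uint32): page arithmetic derived from a 64-bit xid stops wrapping, so
the page-number helpers need 64-bit variants, e.g.:

#define CLOG_XACTS_PER_PAGE 32768	/* clog.c's value at the default BLCKSZ */

typedef uint64 TransactionId64;		/* hypothetical widened xid */

static inline int64
TransactionIdToPage64(TransactionId64 xid)
{
	return (int64) (xid / (uint64) CLOG_XACTS_PER_PAGE);
}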

Greetings,

Andres Freund




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-02 Thread Andres Freund
Hi,

On 2022-03-01 08:35:27 -0500, Stephen Frost wrote:
> I'm not really sure why we're arguing about this, but clearly the authn
> ID of the leader process is what should be used because that's the
> authentication under which the parallel worker is running, just as much
> as the effective role is the authorization.  Having this be available in
> worker processes would certainly be good as it would allow more query
> plans to be considered when these functions are used.  At this time, I
> don't think that outweighs the complications around having it and I'm
> not suggesting that Jacob needs to go do that, but surely it would be
> better.

I don't think we should commit this without synchronizing the authn between
worker / leader (in a separate commit). Too likely that some function that's
marked parallel ok queries the authn_id, opening up a security/monitoring hole
or such because of a bogus return value.
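
Concretely, that could follow the pattern used for other leader state
carried through the parallel DSM segment. A rough sketch (the TOC key
and function name are invented, and the shm_toc_estimate() step is
omitted):

#define PARALLEL_KEY_AUTHN_ID	UINT64CONST(0xFFFFFFFFFFFF000C)	/* invented */

/* Serialize the leader's authn_id so workers can restore it. */
static void
SerializeAuthnId(ParallelContext *pcxt, const char *authn_id)
{
	Size		len = authn_id ? strlen(authn_id) + 1 : 1;
	char	   *ptr = shm_toc_allocate(pcxt->toc, len);

	if (authn_id)
		memcpy(ptr, authn_id, len);
	else
		ptr[0] = '\0';
	shm_toc_insert(pcxt->toc, PARALLEL_KEY_AUTHN_ID, ptr);
}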

Greetings,

Andres Freund



