Skipping schema changes in publication

2022-03-22 Thread vignesh C
Hi,

This feature adds an option to skip the changes of all tables in specified
schemas while creating a publication.
It is helpful for use cases where the user wants to subscribe to all
changes except those present in a few schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;

A new column pnskip is added to the catalog table "pg_publication_namespace"
to record the schemas whose changes the user wants to skip publishing through
the publication. The output plugin (pgoutput) is modified to skip publishing
the changes if the relation is part of a skip schema publication.
As a continuation of this, I will work on implementing skipping tables from
all-tables-in-schema publications and from all-tables publications.
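For readers following along, the intended behaviour can be sketched roughly as
follows. This is an illustrative Python model, not the patch's C
implementation, and all the names in it are made up:

```python
# Illustrative model of a FOR ALL TABLES ... SKIP ALL TABLES IN SCHEMA
# publication: a change is published unless the table's schema is skipped.
def should_publish(schema_name: str, publication: dict) -> bool:
    """Decide whether a change on a table in schema_name is sent downstream."""
    if not publication["for_all_tables"]:
        return False  # non-FOR-ALL-TABLES publications are out of scope here
    return schema_name not in publication["skip_schemas"]

# Models: CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1, s2;
pub1 = {"for_all_tables": True, "skip_schemas": {"s1", "s2"}}
```

In this model, a change to a table in schema "public" is published, while
changes to tables in s1 or s2 are filtered out by the output plugin.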

The attached patch has the implementation for this.
This feature is targeted for PG 16.
Thoughts?

Regards,
Vignesh
From 153e033a78ace66bfbe1cd37db2f3de506740bcb Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Fri, 18 Mar 2022 10:41:35 +0530
Subject: [PATCH v1] Skip publishing the tables of schema.

A new option "SKIP ALL TABLES IN SCHEMA" in CREATE/ALTER PUBLICATION allows
one or more skip schemas to be specified; the publisher will skip sending the
data of the tables present in the skip schemas to the subscriber.

The new syntax allows specifying schemas. For example:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;

A new column pnskip is added to the catalog table "pg_publication_namespace"
to record the schemas whose changes the user wants to skip publishing through
the publication. The output plugin (pgoutput) is modified to skip publishing
the changes if the relation is part of a skip schema publication.

Updates pg_dump to identify and dump skip schema publications. Updates the \d
family of commands to display skip schema publications; the \dRp+ variant will
now display the associated skip schemas, if any.
---
 doc/src/sgml/catalogs.sgml|   9 ++
 doc/src/sgml/logical-replication.sgml |   7 +-
 doc/src/sgml/ref/alter_publication.sgml   |  28 +++-
 doc/src/sgml/ref/create_publication.sgml  |  28 +++-
 doc/src/sgml/ref/psql-ref.sgml|   5 +-
 src/backend/catalog/pg_publication.c  |  66 +++---
 src/backend/commands/publicationcmds.c| 123 +++---
 src/backend/commands/tablecmds.c  |   2 +-
 src/backend/nodes/copyfuncs.c |  14 ++
 src/backend/nodes/equalfuncs.c|  14 ++
 src/backend/parser/gram.y |  99 +-
 src/backend/replication/pgoutput/pgoutput.c   |  24 +---
 src/backend/utils/cache/relcache.c|  22 +++-
 src/bin/pg_dump/pg_dump.c |  33 -
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/pg_dump/pg_dump_sort.c|   7 +
 src/bin/pg_dump/t/002_pg_dump.pl  |  30 +
 src/bin/psql/describe.c   |  17 +++
 src/bin/psql/tab-complete.c   |  25 ++--
 src/include/catalog/pg_publication.h  |  20 ++-
 .../catalog/pg_publication_namespace.h|   1 +
 src/include/commands/publicationcmds.h|   3 +-
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/publication.out |  84 +++-
 src/test/regress/sql/publication.sql  |  41 +-
 .../t/030_rep_changes_skip_schema.pl  |  96 ++
 src/tools/pgindent/typedefs.list  |   1 +
 28 files changed, 678 insertions(+), 124 deletions(-)
 create mode 100644 src/test/subscription/t/030_rep_changes_skip_schema.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2a8cd02664..18e3cf82aa 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6281,6 +6281,15 @@ SCRAM-SHA-256$:&l
Reference to schema
   
  
+
+
+  
+   pnskip bool
+  
+  
   True if the schema is a skip schema
+  
+ 
 

   
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 555fbd749c..e2a4b89226 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -599,9 +599,10 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
 
   
To add tables to a publication, the user must have ownership rights on the
-   table. To add all tables in schema to a publication, the user must be a
-   superuser. To create a publication that publishes all tables or all tables in
-   schema automatically, the user must be a superuser.
+   table. To add all tables in schema or skip all tables in schema to a
+   publication, the user must be a superuser. To create a publication that
+   publishes all tables or all tables in schema automatically, the user must be a superuser.

Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-22 Thread Imseih (AWS), Sami
>BTW have we discussed another idea I mentioned before that we have the
>leader process periodically check the number of completed indexes and
>advertise it in its progress information? I'm not sure which one is
>better but this idea would require only changes of vacuum code and
>probably simpler than the current idea.

>Regards,


If I understand correctly, to accomplish this we will need to have the leader
check the number of indexes completed in the ambulkdelete or amvacuumcleanup
callbacks. These routines do not know about PVIndStats, and they are called
by both parallel and non-parallel vacuums.

From what I can see, PVIndStats will need to be passed down to these routines,
or a NULL passed for non-parallel vacuums.

Sami




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-22 Thread Aleksander Alekseev
Hi Peter,

> I think there needs to be a bit more soul searching here on how to
> handle that in the future and how to transition it.  I don't think
> targeting this patch for PG15 is useful at this point.

The patches can be reordered so that we are still able to deliver SLRU
refactoring in PG15.

> As a more general point, I don't like plastering these bulky casts all
> over the place.  Casts hide problems.

Regarding the casts, I don't like them either. I agree that it could be a
good idea to invest a little more time into figuring out whether this
transition can be handled in a better way, and to deliver this change in the
next CF. However, if no one is able to suggest a better alternative, I think
we should continue making progress here. The 32-bit XIDs are a major
inconvenience for many users.

-- 
Best regards,
Aleksander Alekseev




Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-22 Thread Masahiko Sawada
On Tue, Mar 22, 2022 at 4:27 PM Imseih (AWS), Sami  wrote:
>
> >BTW have we discussed another idea I mentioned before that we have the
> >leader process periodically check the number of completed indexes and
> >advertise it in its progress information? I'm not sure which one is
> >better but this idea would require only changes of vacuum code and
> >probably simpler than the current idea.
>
> >Regards,
>
>
> If I understand correctly, to accomplish this we will need to have the leader
> check the number of indexes completed in the ambulkdelete or amvacuumcleanup
> callbacks. These routines do not know about PVIndStats, and they are called
> by both parallel and non-parallel vacuums.
>
> From what I can see, PVIndStats will need to be passed down to these routines,
> or a NULL passed for non-parallel vacuums.
>

Can the leader pass a callback that checks PVIndStats to the ambulkdelete
and amvacuumcleanup callbacks? In the passed callback, the leader would check
the number of processed indexes and update its progress information if the
current progress needs to be updated.
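The division of labour being discussed could be sketched like this. This is a
hypothetical Python model of the idea, not actual vacuum code, and PVIndStats
is reduced to a plain counter:

```python
# Hypothetical sketch: the leader hands a progress-check callback to the
# index AM entry points, so ambulkdelete/amvacuumcleanup need no knowledge
# of PVIndStats; a non-parallel vacuum simply passes None.
class Leader:
    def __init__(self):
        self.completed_indexes = 0   # stand-in for the shared PVIndStats counter
        self.advertised = 0          # last value published in progress info

    def progress_cb(self):
        # advertise only when the shared counter has moved
        if self.completed_indexes != self.advertised:
            self.advertised = self.completed_indexes

def ambulkdelete(progress_cb=None):
    # ... index vacuuming work would happen here ...
    if progress_cb is not None:      # None in a non-parallel vacuum
        progress_cb()
```

The point of the sketch is that the AM-side change is minimal: it only calls
an opaque callback when one was supplied.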

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: pgcrypto: Remove internal padding implementation

2022-03-22 Thread Peter Eisentraut

On 22.03.22 00:51, Nathan Bossart wrote:

On Wed, Mar 16, 2022 at 11:12:06AM +0100, Peter Eisentraut wrote:

Right, the previous behaviors were clearly faulty.  I have updated the
commit message to call out the behavior change more clearly.

This patch is now complete from my perspective.


I took a look at this patch and found nothing of concern.


committed




Re: logical decoding and replication of sequences

2022-03-22 Thread Peter Eisentraut

On 21.03.22 22:54, Tomas Vondra wrote:

On 3/21/22 14:05, Peter Eisentraut wrote:

On 20.03.22 23:55, Tomas Vondra wrote:

Attached is a rebased patch, addressing most of the remaining issues.


This looks okay to me, if the two FIXMEs are addressed.  Remember to
also update protocol.sgml if you change LOGICAL_REP_MSG_SEQUENCE.


Thanks. Do we want to use a different constant for the sequence message?
I've used 'X' for the WIP patch, but maybe there's a better value?


I would do small 's'.  Alternatively, 'Q'/'q' is still available, too.




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-22 Thread Yugo NAGATA
On Tue, 22 Mar 2022 09:08:15 +0900
Yugo NAGATA  wrote:

> Hi Ishii-san,
> 
> On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
> Tatsuo Ishii  wrote:
> 
> > Hi Yugo,
> > 
> > I have looked into the patch and I noticed that  > linkend=... endterm=...> is used in pgbench.sgml. e.g.
> > 
> > 
> > 
> > AFAIK this is the only place where "endterm" is used. In other places
> > "link" tag is used instead:
> 
> Thank you for pointing it out. 
> 
> I've checked other places using  referring to , and found
> that "xreflabel"s are used in such  tags. So, I'll fix it 
> in this style.

I attached the updated patch. I also fixed the following paragraph which I had
forgotten to fix in the previous patch.

 The first seven lines report some of the most important parameter settings.
 The sixth line reports the maximum number of tries for transactions with
 serialization or deadlock errors

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From b9f993e81836c4379478809da0e690023c319038 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v18 1/2] Pgbench errors: use the Variables structure for
 client variables

This is most important when it is used to reset client variables when
repeating transactions after serialization/deadlock failures.

Don't allocate Variable structs one by one. Instead, grow the array by a
constant margin each time it overflows.
---
 src/bin/pgbench/pgbench.c | 163 +++---
 1 file changed, 100 insertions(+), 63 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..ab2c5dfc5f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+/*
+ * We don't want to allocate variables one by one; for efficiency, grow the
+ * array by a constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN	8
+
 /*
  * Variable definitions.
  *
@@ -306,6 +312,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables   variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
 	return (Variable *) bsearch((void *) &key,
-(void *) st->variables,
-st->nvariables,
+(void *) variables->vars,
+variables->nvars,
 sizeof(Variable),
 compareVariableNames);
 }
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
 static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
 {
 	Variable   *var;
 	char		stringform[64];
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 		return NULL;			/* not found */
 
@@ -1562,21 +1584,37 @@ valid_variable_name(const char *name)
 	return true;
 }
 
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+	/* total number of variables required now */
+	needed += variables->nvars;
+
+	if (variables->max_vars < needed)
+	{
+		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+		variables->vars = (Variable *)
+			pg_realloc(variables->vars, variables->max_vars * siz
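The allocation strategy introduced by the patch can be sketched like this
(illustrative Python; the patch's C code applies the same rule to a
pg_realloc'd array):

```python
VARIABLES_ALLOC_MARGIN = 8  # same constant as in the patch

class Variables:
    """Toy model of the patch's Variables struct; nvars is len(self.vars)."""
    def __init__(self):
        self.vars = []
        self.max_vars = 0  # allocated capacity of the underlying array

    def enlarge(self, needed: int) -> None:
        """Ensure room for `needed` more variables, growing with a margin
        so that repeated single additions don't reallocate every time."""
        needed += len(self.vars)  # total capacity required now
        if self.max_vars < needed:
            self.max_vars = needed + VARIABLES_ALLOC_MARGIN
```

With a margin of 8, adding one variable to an empty set allocates space for
9, so the next several additions cost nothing.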

Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-22 Thread Pavel Borisov
>
> > I think there needs to be a bit more soul searching here on how to
> > handle that in the future and how to transition it.  I don't think
> > targeting this patch for PG15 is useful at this point.
>
> The patches can be reordered so that we are still able to deliver SLRU
> refactoring in PG15.
>
Sure.

> > As a more general point, I don't like plastering these bulky casts all
> > over the place.  Casts hide problems.
>
> Regarding the casts, I don't like them either. I agree that it could
> be a good idea to invest a little more time into figuring out if this
> transit can be handled in a better way and deliver this change in the
> next CF. However, if no one will be able to suggest a better
> alternative, I think we should continue making progress here. The
> 32-bit XIDs are a major inconvenience for many users.
>

I'd like to add that the initial approach to shifting to 64-bit was based on
an XID_FMT print-format template, which could be changed from 32-bit to
64-bit when shifting to 64-bit XIDs later. But this template is not
localizable, so hackers recommended using %lld/%llu with (long
long)/(unsigned long long) casts, which is the current best practice
elsewhere in the code (e.g. the recent 1f8bc448680bf93a9). So I suppose we
already have a good enough way to stick to.

This approach in 0001 inherently handles both 32-bit and 64-bit XIDs and
should not need to change when 64-bit XIDs are committed later (
https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
)

The thing that will need to change then is suppressing the output of the
epoch. That should be done when 64-bit XIDs are committed, and it is done by
patch 0006 in the mentioned thread. Until then, I've left the epoch in the
output.
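For illustration, the relationship between a 32-bit XID, its epoch, and the
single 64-bit value that the %llu/(unsigned long long) style prints in one
piece can be sketched as (purely illustrative Python):

```python
# A 64-bit XID carries the epoch in the high 32 bits and the 32-bit XID in
# the low 32 bits; once 64-bit XIDs are committed, printing the epoch
# separately becomes redundant.
def full_xid(epoch: int, xid32: int) -> int:
    assert 0 <= xid32 < 2**32
    return (epoch << 32) | xid32

def split_xid(xid64: int) -> tuple:
    """Recover (epoch, xid32) from a 64-bit XID."""
    return xid64 >> 32, xid64 & 0xFFFFFFFF
```

For example, epoch 1 with a wrapped-around 32-bit XID of 0 corresponds to the
64-bit value 4294967296.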

Big thanks for your considerations!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: On login trigger: take three

2022-03-22 Thread Daniel Gustafsson
> On 22 Mar 2022, at 04:48, Andres Freund  wrote:

> docs fail to build: 
> https://cirrus-ci.com/task/5556234047717376?logs=docs_build#L349

Ugh, that one was on me and not the original author. Fixed.

--
Daniel Gustafsson   https://vmware.com/



v29-0001-Provide-a-GUC-for-temporarily-ignoring-event-tri.patch
Description: Binary data


v29-0002-Add-a-new-login-event-and-login-event-trigger-su.patch
Description: Binary data


Re: Tab completion for ALTER MATERIALIZED VIEW ... SET ACCESS METHOD

2022-03-22 Thread Yugo NAGATA
On Sat, 19 Mar 2022 19:31:59 +0900
Michael Paquier  wrote:

> On Fri, Mar 18, 2022 at 03:13:05PM +0900, Michael Paquier wrote:
> > Thanks.  This looks rather sane to me.  I'll split things into 3
> > commits in total, as of the psql completion, SET TABLESPACE and SET
> > ACCESS METHOD.  The first and third patches are only for HEAD, while
> > the documentation hole with SET TABLESPACE should go down to v10.
> > Backpatching the tests of ALTER MATERIALIZED VIEW ALL IN TABLESPACE
> > would not hurt, either, as there is zero coverage for that now.
> 
> I have applied the set, after splitting things mostly as mentioned
> upthread:
> - The doc change for SET TABLESPACE on ALTER MATVIEW has been
> backpatched.
> - The regression tests for SET TABLESPACE have been applied on HEAD,
> as these are independent of the rest, good on their own.
> - All the remaining parts for SET ACCESS METHOD (psql tab completion,
> tests and docs) have been merged together on HEAD.  I could not
> understand why the completion done after SET ACCESS METHOD was not
> grouped with the other parts for ALTER MATVIEW, so I have moved the
> new entry there, and I have added a test checking after an error for
> multiple subcommands, while on it.
> 
> Thanks, Nagata-san!

Thank you!

-- 
Yugo NAGATA 




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-22 Thread Dilip Kumar
On Tue, Mar 22, 2022 at 10:28 AM Dilip Kumar  wrote:
>
>
> I think this make sense.  I haven't changed the original patch as you
> told you were improving on some comments, so in order to avoid
> conflict I have created this add on patch.
>

In my previous patch I mistakenly used src_dboid instead of dest_dboid;
fixed in this version. For the destination db I have used AccessShareLock
as the lock mode. Access-wise we don't want anyone else accessing that db,
but that is anyway protected because the new database is not visible to
anyone else. So I think AccessShareLock should be correct here, because we
are taking this lock only because we are accessing pages of this database's
relations in shared buffers.
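The per-relation locking order described above could be sketched as follows
(a hypothetical Python model of the copy loop, not the dbcommands.c code):

```python
# For each relation copied, the patch takes AccessShareLock on both the
# source relation id and the destination relation id before copying.
def copy_database(relation_oids, src_dboid, dst_dboid, lock_relation):
    for reloid in relation_oids:
        lock_relation(src_dboid, reloid)  # lock the source relation
        lock_relation(dst_dboid, reloid)  # lock the (invisible) destination one
        # ... copy the relation block by block via shared buffers ...
```

The source lock is taken first for each relation, then the destination lock,
before any pages of that relation are touched.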


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 9636688..49fe104 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -460,12 +460,6 @@ CreateDatabaseUsingWalLog(Oid src_dboid, Oid dst_dboid, Oid src_tsid, Oid dst_ts
 	Assert(rnodelist != NIL);
 
 	/*
-	 * Database id is common for all the relation so set it before entering to
-	 * the loop.
-	 */
-	relid.dbId = src_dboid;
-
-	/*
 	 * Iterate over each relfilenode and copy the relation data block by block
 	 * from source database to the destination database.
 	 */
@@ -488,7 +482,15 @@ CreateDatabaseUsingWalLog(Oid src_dboid, Oid dst_dboid, Oid src_tsid, Oid dst_ts
 		dstrnode.dbNode = dst_dboid;
 		dstrnode.relNode = srcrnode.relNode;
 
-		/* Acquire the lock on relation before start copying. */
+		/*
+		 * Acquire relation lock on the source and the destination relation id
+		 * before start copying.
+		 */
+		relid.dbId = src_dboid;
+		relid.relId = relinfo->reloid;
+		LockRelationId(&relid, AccessShareLock);
+
+		relid.dbId = dst_dboid;
 		relid.relId = relinfo->reloid;
 		LockRelationId(&relid, AccessShareLock);
 
@@ -1218,6 +1220,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	InvokeObjectPostCreateHook(DatabaseRelationId, dboid, 0);
 
 	/*
+	 * Acquire a lock on the target database, although this is a new database
+	 * and no one else should be able to see it.  But if we are using wal log
+	 * strategy then we are going to access the relation pages using shared
+	 * buffers.  Therefore, it would be better to take the database lock.  And,
+	 * later we would acquire the relation lock as and when we would access the
+	 * individual relations' shared buffers.
+	 */
+	if (dbstrategy == CREATEDB_WAL_LOG)
+		LockSharedObject(DatabaseRelationId, dboid, 0, AccessShareLock);
+
+	/*
 	 * Once we start copying subdirectories, we need to be able to clean 'em
 	 * up if we fail.  Use an ENSURE block to make sure this happens.  (This
 	 * is not a 100% solution, because of the possibility of failure during
@@ -1342,6 +1355,10 @@ createdb_failure_callback(int code, Datum arg)
 	{
 		DropDatabaseBuffers(fparms->dest_dboid);
 		ForgetDatabaseSyncRequests(fparms->dest_dboid);
+
+		/* Release lock on the target database. */
+		UnlockSharedObject(DatabaseRelationId, fparms->dest_dboid, 0,
+		   AccessShareLock);
 	}
 
 	/*


Re: Supply restore_command to pg_rewind via CLI argument

2022-03-22 Thread Alexey Kondratov
Hi,

On Tue, Mar 22, 2022 at 3:32 AM Andres Freund  wrote:
>
> Doesn't apply once more: http://cfbot.cputube.org/patch_37_3213.log
>

Thanks for the reminder, a rebased version is attached.


Regards
-- 
Alexey Kondratov
From df56b5c7b882e781fdc0b92e7a83331f0baab094 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 29 Jun 2021 17:17:47 +0300
Subject: [PATCH v4] Allow providing restore_command as a command line option
 to pg_rewind

This could be useful when postgres is usually run with
-c config_file=..., so the actual configuration and restore_command
is not inside $PGDATA/postgresql.conf.
---
 doc/src/sgml/ref/pg_rewind.sgml  | 19 +
 src/bin/pg_rewind/pg_rewind.c| 45 ++-
 src/bin/pg_rewind/t/001_basic.pl |  1 +
 src/bin/pg_rewind/t/RewindTest.pm| 95 ++--
 src/test/perl/PostgreSQL/Test/Cluster.pm |  5 +-
 5 files changed, 106 insertions(+), 59 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..af75f35867 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,25 @@ PostgreSQL documentation
   
  
 
+ 
+  -C restore_command
+  --target-restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieving
+WAL files from the WAL archive if these files are no longer available
+in the pg_wal directory of the target cluster.
+   
+   
+If restore_command is already set in
+postgresql.conf, you can provide the
+--restore-target-wal option instead. If both options
+are provided, then --target-restore-command
+will be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index b39b5c1aac..9aca041425 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -85,21 +85,22 @@ usage(const char *progname)
 	printf(_("%s resynchronizes a PostgreSQL cluster with another copy of the cluster.\n\n"), progname);
 	printf(_("Usage:\n  %s [OPTION]...\n\n"), progname);
 	printf(_("Options:\n"));
-	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
-			 " retrieve WAL files from archives\n"));
-	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
-	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
-	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
-	printf(_("  -n, --dry-run  stop before modifying anything\n"));
-	printf(_("  -N, --no-sync  do not wait for changes to be written\n"
-			 " safely to disk\n"));
-	printf(_("  -P, --progress write progress messages\n"));
-	printf(_("  -R, --write-recovery-conf  write configuration for replication\n"
-			 " (requires --source-server)\n"));
-	printf(_("  --debugwrite a lot of debug messages\n"));
-	printf(_("  --no-ensure-shutdown   do not automatically fix unclean shutdown\n"));
-	printf(_("  -V, --version  output version information, then exit\n"));
-	printf(_("  -?, --help show this help, then exit\n"));
+	printf(_("  -c, --restore-target-wal  use restore_command in target configuration to\n"
+			 "retrieve WAL files from archives\n"));
+	printf(_("  -C, --target-restore-command=COMMAND  target WAL restore_command\n"));
+	printf(_("  -D, --target-pgdata=DIRECTORY existing data directory to modify\n"));
+	printf(_("  --source-pgdata=DIRECTORY source data directory to synchronize with\n"));
+	printf(_("  --source-server=CONNSTR   source server to synchronize with\n"));
+	printf(_("  -n, --dry-run stop before modifying anything\n"));
+	printf(_("  -N, --no-sync do not wait for changes to be written\n"
+			 "safely to disk\n"));
+	printf(_("  -P, --progresswrite progress messages\n"));
+	printf(_("  -R, --write-recovery-conf write configuration for replication\n"
+			 "(requires --source-server)\n"));
+	printf(_("  --debug   write a lot of debug messages\n"));
+	printf(_("  --no-ensure-shutdown  do not automatically fix unclean shutdown\n"));
+	printf(_("  -V, --version output version information, then exit\n"));
+	printf(_("  -?, --helpshow this help, then exit\n"));
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
 	printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL)
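The precedence between the two options, as described in the patch's new
documentation, amounts to the following (illustrative Python; the function
name is made up):

```python
# If both -C/--target-restore-command and -c/--restore-target-wal are given,
# the explicit command wins, per the patch's documentation.
def choose_restore_command(cli_command, use_target_config, config_command):
    if cli_command is not None:          # -C was given on the command line
        return cli_command
    if use_target_config:                # -c: read it from postgresql.conf
        return config_command
    return None                          # WAL must already be in pg_wal
```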

Re: [PATCH] Add native windows on arm64 support

2022-03-22 Thread Niyas Sait
Thanks, Thomas for reviewing the patch!

Yes, we cannot turn off ASLR for Arm64 targets with MSVC. I haven't seen
any issues so far on win/arm64 with this option turned off. I guess it
should be okay. If we really need ASLR turned off, then I guess we might
have to use clang.

Yes, we could look into providing a build machine. Do you have any
reference to what the CI system looks like now for PostgreSQL and how to
add new workers, etc.?

Niyas

On Fri, 18 Mar 2022 at 20:50, Thomas Munro  wrote:

> On Wed, Feb 23, 2022 at 11:09 PM Niyas Sait  wrote:
> > I have created a patch that adds support for building Postgres for the
> windows/arm64 platform using the MSVC toolchain. Following changes have
> been included
> >
> > 1. Extend MSVC scripts to handle ARM64 platform.
> > 2. Add arm64 definition of spin_delay function.
> > 3. Exclude arm_acle.h import with MSVC compiler.
> >
> > Compilation steps are consistent and similar to other windows platforms.
> >
> > The change has been tested on windows/x86_64 and windows/arm64 and all
> regression tests passes on both platforms.
>
> +# arm64 linker only supports dynamic base address
> +my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' :
> 'false';
> ...
> +  $cfgrandbaseaddress
>
> Does that mean that you can't turn off ASLR on this arch?  Does it
> cause random backend failures due to inability to map shared memory at
> the expected address?  Our experience with EXEC_BACKEND on various
> Unix systems tells us that you have to turn off ASLR if you want 100%
> success rate on starting backends, but I guess it depends on the
> details of how the randomisation is done.
>
> Any interest in providing a build farm animal, so that we could know
> that this works?
>


Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-22 Thread Tatsuo Ishii
>> I've checked other places using  referring to , and found
>> that "xreflabel"s are used in such  tags. So, I'll fix it 
>> in this style.
> 
> I attached the updated patch. I also fixed the following paragraph which I had
> forgotten to fix in the previous patch.
> 
>  The first seven lines report some of the most important parameter settings.
>  The sixth line reports the maximum number of tries for transactions with
>  serialization or deadlock errors

Thank you for the updated patch. I think the patches look good and now
it's ready for commit. If there's no objection, I would like to
commit/push the patches.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Matthias Kurz
Hi everyone!

I have been watching this thread for quite a while, eagerly waiting for this
feature to finally land in PostgreSQL.
Given that in around 2 weeks PostgreSQL 15 will go into feature freeze (in
recent years that has usually happened around the 8th of April, AFAIK), is
there any chance this will be committed? As far as I understand, the patches
are almost ready.

Sorry for the noise; I just wanted to draw attention to the fact that there
are people out there looking forward to JSON_TABLE ;)

Thanks everyone for your fantastic work!
Matthias


On Sun, 13 Mar 2022 at 22:22, Andrew Dunstan  wrote:

>
> On 2/9/22 08:22, Himanshu Upadhyaya wrote:
> > On Wed, Feb 2, 2022 at 12:44 AM Andrew Dunstan 
> wrote:
> >>
> >> rebased with some review comments attended to.
> > I am in process of reviewing these patches, initially, have started
> > with 0002-JSON_TABLE-v55.patch.
> > Tested many different scenarios with various JSON messages and these
> > all are working as expected. Just one question on the below output.
> >
> > ‘postgres[1406146]=#’SELECT * FROM JSON_TABLE(jsonb '1', '$' COLUMNS
> > (a int PATH '$.a' ERROR ON EMPTY)) jt;
> >  a
> > ---
> >
> > (1 row)
> >
> > ‘postgres[1406146]=#’SELECT * FROM JSON_TABLE(jsonb '1', '$' COLUMNS
> > (a int PATH '$.a' ERROR ON ERROR)) jt;
> >  a
> > ---
> >
> > (1 row)
> >
> > is not "ERROR ON ERROR" is expected to give error?
>
>
> I think I understand what's going on here. In the first example 'ERROR
> ON EMPTY' causes an error condition, but as the default action for an
> error condition is to return null that's what happens. To get an error
> raised you would need to say 'ERROR ON EMPTY ERROR ON ERROR'. I don't
> know if that's according to spec. It seems kinda screwy, arguably a POLA
> violation, although that would hardly be a first for the SQL Standards
> body.  But I'm speculating here, I'm not a standards lawyer.
>
> In the second case it looks like there isn't really an error. There
> would be if you used 'strict' in the path expression.
>
>
> This whole area needs more documentation.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>
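Andrew's speculative reading above can be modelled like this (purely
illustrative Python mirroring his interpretation, not the actual executor
logic, and explicitly not a statement of what the SQL standard requires):

```python
# Per the reading above: ERROR ON EMPTY turns an empty result into an error
# condition, but the default action for an error condition is still to
# return NULL; only ERROR ON ERROR makes the error condition raise.
def evaluate_column(matches, on_empty="NULL", on_error="NULL"):
    """matches is the list of jsonpath results for one JSON_TABLE column."""
    if not matches:
        if on_empty == "ERROR":
            if on_error == "ERROR":
                raise ValueError("no match for jsonpath expression")
            return None  # error condition's default action is NULL
        return None      # ON EMPTY itself also defaults to NULL
    return matches[0]
```

Under this model, only 'ERROR ON EMPTY ERROR ON ERROR' actually raises, which
matches the surprising behaviour observed in the examples.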


Re: Identify missing publications from publisher while create/alter subscription.

2022-03-22 Thread vignesh C
On Tue, Mar 22, 2022 at 5:29 AM Andres Freund  wrote:
>
> On 2022-02-13 19:34:05 +0530, vignesh C wrote:
> > Thanks for the comments, the attached v14 patch has the changes for the 
> > same.
>
> The patch needs a rebase, it currently fails to apply:
> http://cfbot.cputube.org/patch_37_2957.log

The attached v15 patch is rebased on top of HEAD.

Regards,
Vignesh
From fa175c7c823dc9fbcca7676ddec944430da81022 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 22 Mar 2022 15:09:13 +0530
Subject: [PATCH v15] Identify missing publications from publisher while
 create/alter subscription.

Creating/altering a subscription currently succeeds even when we specify a
publication which does not exist on the publisher. This patch checks whether
the specified publications are present on the publisher and throws an error
if any of the publications is missing.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy, Japin Li, Dilip Kumar, Euler Taveira, Ashutosh Sharma
Discussion: https://www.postgresql.org/message-id/flat/20220321235957.i4jtjn4wyjucex6b%40alap3.anarazel.de#b846fd4ef657cfaa8c9890f044e4
---
 doc/src/sgml/ref/alter_subscription.sgml  |  13 ++
 doc/src/sgml/ref/create_subscription.sgml |  20 ++-
 src/backend/commands/subscriptioncmds.c   | 201 +++---
 src/bin/psql/tab-complete.c   |  15 +-
 src/test/subscription/t/007_ddl.pl|  54 ++
 5 files changed, 270 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index ac2db249cb..995c1f270d 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -172,6 +172,19 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publications doesn't exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index b701752fc9..f2e7e8744d 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -110,12 +110,14 @@ CREATE SUBSCRIPTION subscription_nametrue.  Setting this to
   false will force the values of
-  create_slot, enabled and
-  copy_data to false.
+  create_slot, enabled,
+  copy_data and
+  validate_publication to false.
   (You cannot combine setting connect
   to false with
   setting create_slot, enabled,
-  or copy_data to true.)
+  copy_data or
+  validate_publication to true.)
  
 
  
@@ -170,6 +172,18 @@ CREATE SUBSCRIPTION subscription_name
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publications doesn't exist. The default is
+  false.
+ 
+
+   
   
  
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index e16f04626d..6c066a1dfc 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -64,6 +64,7 @@
 #define SUBOPT_TWOPHASE_COMMIT		0x0200
 #define SUBOPT_DISABLE_ON_ERR		0x0400
 #define SUBOPT_LSN	0x0800
+#define SUBOPT_VALIDATE_PUB			0x1000
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -86,6 +87,7 @@ typedef struct SubOpts
 	bool		streaming;
 	bool		twophase;
 	bool		disableonerr;
+	bool		validate_publication;
 	XLogRecPtr	lsn;
 } SubOpts;
 
@@ -137,6 +139,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->twophase = false;
 	if (IsSet(supported_opts, SUBOPT_DISABLE_ON_ERR))
 		opts->disableonerr = false;
+	if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB))
+		opts->validate_publication = false;
 
 	/* Parse options */
 	foreach(lc, stmt_options)
@@ -292,6 +296,15 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_LSN;
 			opts->lsn = lsn;
 		}
+		else if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB) &&
+ strcmp(defel->defname, "validate_publication") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_VALIDATE_PUB))
+errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_VALIDATE_PUB;
+			opts->validate_publication = defGetBoolean(defel);
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
@@ -327,10 +340,19 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		

Re: [PATCH] Add native windows on arm64 support

2022-03-22 Thread Julien Rouhaud
Hi,

Please don't top-post here.  See
https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics.

On Tue, Mar 22, 2022 at 09:37:46AM +, Niyas Sait wrote:
>
> Yes, we could look into providing a build machine. Do you have any
> reference to what the CI system looks like now for PostgreSQL and how to
> add new workers etc.?

It's all documented at
https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto.




Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-03-22 Thread Bharath Rupireddy
On Tue, Mar 22, 2022 at 12:20 AM Andres Freund  wrote:
> > Something like attached
> > v6-0001-Deduplicate-checkpoint-restartpoint-complete-mess.patch?
>
> Mostly. I don't see a reason for the use of the stringinfo. And I think
> LogCheckpointStart() should be dealt with similarly.
>
> I'd just make it a restartpoint ? _("restartpoint") : _("checkpoint") or such
> in the argument? Then translators don't need to translate the two messages
> separately.

Do you mean like this?
ereport(LOG,
        /* translator: the placeholders show checkpoint options */
        (errmsg("%s starting:%s%s%s%s%s%s%s%s",
                restartpoint ? _("restartpoint") : _("checkpoint"),
                (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
                (flags & CHECKPOINT_FORCE) ? " force" : "",
                (flags & CHECKPOINT_WAIT) ? " wait" : "",
                (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "")));

Are we allowed to use _("foo") with errmsg? Isn't "foo" going to get
translated twice that way?

> Or we could just not translate restartpoint/checkpoint - after all we don't
> translate the flags in LogCheckpointStart() either. But on balance I'd lean
> towards the above.

I'm not sure if it's correct to translate some parts of the message
and leave others. What exactly is the reason for this? I think the
reason in this case might be that some flag names with hyphens and
spaces before words may not have the right/matching words in all
languages. What happens if we choose to translate/not translate the
entire message?
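For reference, the substitution pattern under discussion can be sketched standalone. The CKPT_* constants, the identity _() macro, and format_start_msg() below are stand-ins of mine for the real CHECKPOINT_* flags, gettext, and LogCheckpointStart(); only the noun is passed through _() while the flag fragments stay untranslated, which is exactly the asymmetry being questioned:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* stand-ins for the real CHECKPOINT_* flag bits */
#define CKPT_IS_SHUTDOWN 0x01
#define CKPT_IMMEDIATE   0x02
#define CKPT_FORCE       0x04

/* stand-in for gettext's _(); a real build would return a translation */
#define _(msg) (msg)

/*
 * Build a "checkpoint starting: ..." line the way the proposed code does:
 * the noun is substituted as an argument, so translators handle the two
 * words once instead of translating two nearly identical format strings.
 */
static void
format_start_msg(int restartpoint, int flags, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%s starting:%s%s%s",
             restartpoint ? _("restartpoint") : _("checkpoint"),
             (flags & CKPT_IS_SHUTDOWN) ? " shutdown" : "",
             (flags & CKPT_IMMEDIATE) ? " immediate" : "",
             (flags & CKPT_FORCE) ? " force" : "");
}
```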

How about just doing this for LogCheckpointStart?
StringInfoData logmsg;

initStringInfo(&logmsg);
appendStringInfo(&logmsg,
                 /* translator: the placeholders show checkpoint options */
                 restartpoint ? _("restartpoint starting:%s%s%s%s%s%s%s%s") :
                                _("checkpoint starting:%s%s%s%s%s%s%s%s"),
                 (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                 (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                 (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
                 (flags & CHECKPOINT_FORCE) ? " force" : "",
                 (flags & CHECKPOINT_WAIT) ? " wait" : "",
                 (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                 (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                 (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "");
ereport(LOG, errmsg_internal("%s", logmsg.data));
pfree(logmsg.data);

Similarly, for LogCheckpointEnd:
StringInfoData logmsg;

initStringInfo(&logmsg);
appendStringInfo(&logmsg,
                 restartpoint ? _("restartpoint complete: ") :
                                _("checkpoint complete: "));
appendStringInfo(&logmsg,
                 /* translator: the placeholders show restartpoint/checkpoint stats */
                 _("wrote %d buffers (%.1f%%); "
                   "%d WAL file(s) added, %d removed, %d recycled; "
                   "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
                   "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
                   "distance=%d kB, estimate=%d kB"),
                 CheckpointStats.ckpt_bufs_written,
                 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                 CheckpointStats.ckpt_segs_added,
                 CheckpointStats.ckpt_segs_removed,
                 CheckpointStats.ckpt_segs_recycled,
                 write_msecs / 1000, (int) (write_msecs % 1000),
                 sync_msecs / 1000, (int) (sync_msecs % 1000),
                 total_msecs / 1000, (int) (total_msecs % 1000),
                 CheckpointStats.ckpt_sync_rels,
                 longest_msecs / 1000, (int) (longest_msecs % 1000),
                 average_msecs / 1000, (int) (average_msecs % 1000),
                 (int) (PrevCheckPointDistance / 1024.0),
                 (int) (CheckPointDistanceEstimate / 1024.0));

if (CheckpointStats.repl_snap_files_rmvd_cnt > 0)
{
    long t_msecs;

    t_msecs = TimestampDifferenceMilliseconds(CheckpointStats.repl_snap_start_t,
                                              CheckpointStats.repl_snap_end_t);

    appendStringInfo(&logmsg,
                     _("; logical decoding snapshot file(s) removed=%lu, "
                       "time=%ld.%03d s, cutoff LSN=%X/%X"),
                     CheckpointStats.repl_snap_files_rmvd_cnt,
                     t_msecs / 1000, (int) (t_msecs % 1000),
                     LSN_FORMAT_ARGS(CheckpointStats.repl_snap_cutoff_lsn));
}
if 
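
As an aside, the "%ld.%03d s" idiom used throughout these messages splits integer milliseconds into whole seconds and a zero-padded fractional part. A minimal standalone sketch (the function name is mine, not from the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Render integer milliseconds as "S.mmm s", the way the checkpoint log
 * messages format write_msecs / sync_msecs / total_msecs: integer
 * division gives the seconds, modulo gives the zero-padded millisecond
 * remainder.
 */
static void
format_msecs(long msecs, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%ld.%03d s", msecs / 1000, (int) (msecs % 1000));
}
```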

Re: [PATCH] pg_stat_toast v9

2022-03-22 Thread Gunnar "Nick" Bluth
Am 22.03.22 um 02:17 schrieb Andres Freund:
> Hi,
> 
> On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote:
>> v8 (applies cleanly to today's HEAD/master) attached.
> 
> This doesn't apply anymore, likely due to my recent pgstat changes - which
> you'd need to adapt to...

Now, that's been quite an overhaul... kudos!


> http://cfbot.cputube.org/patch_37_3457.log
> 
> Marked as waiting on author.

v9 attached.

TBTH, I don't fully understand all the external/static stuff, but it
applies to HEAD/master, compiles and passes all tests, so... ;-)

Best regards,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato

 doc/src/sgml/config.sgml  |  26 
 doc/src/sgml/monitoring.sgml  | 163 ++
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  40 +++
 src/backend/catalog/system_views.sql  |  20 
 src/backend/postmaster/pgstat.c   | 161 -
 src/backend/utils/activity/Makefile   |   1 +
 src/backend/utils/activity/pgstat_toast.c | 157 +
 src/backend/utils/adt/pgstatfuncs.c   |  72 
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 
 src/include/pgstat.h  | 110 -
 src/include/utils/pgstat_internal.h   |   1 +
 src/test/regress/expected/rules.out   |  17 +++
 src/test/regress/expected/stats.out   |  62 ++
 src/test/regress/sql/stats.sql|  28 +
 17 files changed, 897 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a48973b3c..64b4219ded 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7892,6 +7892,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for a column.
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 35b2923c5e..7bbacd67fb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4942,6 +4953,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of the relation
+  
+ 
+
+ 
+  
+   attname name
+  
+  
+   Name of the attribute (column)
+  
+ 
+
+ 
+  
+   storagemethod char
+  
+  
+   Storage method of the attribute
+  
+ 
+
+ 
+  
+   externalized bigint
+  
+  
+   Number of ti

Re: [PATCH] Add reloption for views to enable RLS

2022-03-22 Thread Dean Rasheed
On Mon, 21 Mar 2022 at 09:47, Laurenz Albe  wrote:
>
> Thanks for your diligent work on this, and the patch looks good to me.

Thanks for looking again. Pushed.

Regards,
Dean




Re: logical replication empty transactions

2022-03-22 Thread Amit Kapila
On Tue, Mar 22, 2022 at 7:25 AM houzj.f...@fujitsu.com
 wrote:
>
> > On Monday, March 21, 2022 6:01 PM Amit Kapila 
> > wrote:
>
> Oh, sorry, I posted the wrong patch, here is the correct one.
>

The test change looks good to me. I think additionally we can verify
that the record is not reflected in the subscriber table. Apart from
that, I had made minor changes mostly in the comments in the attached
patch. If those look okay to you, please include those in the next
version.


-- 
With Regards,
Amit Kapila.


v28-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


v28_diff_amit.1.patch
Description: Binary data


Re: Patch proposal - parameter to limit amount of FPW because of hint bits per second

2022-03-22 Thread Michail Nikolaev
Hello, Peter.

Thanks for your comments.

> There is one FPI per checkpoint for any leaf page that is modified
> during that checkpoint. The difference between having that happen once
> or twice per leaf page and having that happen many more times per leaf
> page could be very large.

Yes, I am almost sure proactively calling of _bt_simpledel_pass() will
positively impact the system on many workloads. But also I am almost
sure it will not change the behavior of the incident I mention -
because it is not related to multiple checkpoints.

> Of course it's true that that might not make that much difference. Who
> knows? But if you're not willing to measure it then we'll never know.
> What version are you using here? How frequently were checkpoints
> occurring in the period in question, and how does that compare to
> normal? You didn't even include this basic information.

Yes, I probably had to provide more details. Downtime is pretty short
(you could see network peak on telemetry image from the first letter)
- so, just 1-3 minutes. Checkpoints are about each 30 min.
It is just an issue with super-high WAL traffic caused by tons of FPI
traffic after a long transaction commit. The issue resolved fast on
its own, but downtime still happens.

> Many things have changed in this area already, and it's rather unclear
> how much just upgrading to Postgres 14 would help.

Version is 11. Yes, many things have changed but AFAIK nothing's
changed related to FPI mechanics (LP_DEAD and other hint bits,
including HEAP).

I could probably try to reproduce the issue, but I'm not sure how to
do it in a fast and reliable way (it is hard to wait for a day for
each test). Probably it may be possible by some temporary crutch in
postgres source (to emulate old transaction commit somehow).

> The main reason that this can be so complex is that FPIs are caused by
> more frequent checkpoints, but *also* cause more frequent checkpoints
> in turn. So you could have a "death spiral" with FPIs -- the effect is
> nonlinear, which has the potential to lead to pathological, chaotic
> behavior. The impact on response time is *also* nonlinear and chaotic,
> in turn.

Could you please explain "death spiral" mechanics related to FPIs?

> What do you do when there were too many FPIs for a long time, but also too 
> much
> avoiding them earlier on? It's very complicated.

Yes, it could cause at least performance degradation in case of too
aggressive avoiding the FPI. I am 100% sure such settings should be
disabled by default. It is more about the physical limits of servers.
Personally I would like to set it to about 75% of resources.

Also, there are some common things between checkpoints and vacuum -
they are processes which are required to be done regularly (but not
right now) and they are limited in resources. Setting LP_DEAD (and
other hint bits, especially in HEAP) is also something required to be
done regularly (but not right now). But it is not limited by
resources.

BTW, probably new index creation is something with the same nature.

Best regards,
Michail.




Re: logical decoding and replication of sequences

2022-03-22 Thread Amit Kapila
On Mon, Mar 21, 2022 at 4:25 AM Tomas Vondra
 wrote:
>
> Hi,
>
> Attached is a rebased patch, addressing most of the remaining issues.
>

It appears that on the apply side, the patch always creates a new
relfilenode irrespective of whether the sequence message is
transactional or not. Is it required to create a new relfilenode for
non-transactional messages? If not that could be costly?

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-03-22 Thread Petr Jelinek


> On 22. 3. 2022, at 13:09, Amit Kapila  wrote:
> 
> On Mon, Mar 21, 2022 at 4:25 AM Tomas Vondra
>  wrote:
>> 
>> Hi,
>> 
>> Attached is a rebased patch, addressing most of the remaining issues.
>> 
> 
> It appears that on the apply side, the patch always creates a new
> relfilenode irrespective of whether the sequence message is
> transactional or not. Is it required to create a new relfilenode for
> non-transactional messages? If not that could be costly?
> 


That's a good catch, I think we should just write the page in the 
non-transactional case, no need to mess with relnodes.


Petr



Re: Allow file inclusion in pg_hba and pg_ident files

2022-03-22 Thread Aleksander Alekseev
Hi hackers,

> The cfbot says that the patch doesn't apply anymore, so here's a v3 with
the
> changes mentioned below.

I came across this thread while looking for the patches that need review.

The v3-0001 patch LGTM.

Since v3-0002 adds a new view and alters pg_proc.dat shouldn't it also
increase CATALOG_VERSION_NO? Not sure if we generally do this in the
patches or expect the committer to make the change manually.

Same question regarding v3-0003. Other than that the patch looks OK, but
doesn't seem to add any tests for the new functionality. Do you think it
would be possible to test-cover the file inclusion? Personally I don't
think it's that critical to have these particular tests, but if they can be
added, I think we should do so.

I didn't review v3-0004 since it's marked as PoC and seems to be a separate
feature that is not targeting PG15. I suggest excluding it from the
patchset in order to keep the focus. Consider creating a new thread and a
new CF entry after we deal with v3-0001...v3-0003.

All in all, the patchset seems to be in good shape and I don't have
anything but some little nitpicks. It passes `make installcheck` and I
verified manually that the file inclusion 1) works 2) write proper error
messages to the logfile when the included file doesn't exist or has wrong
permissions.

-- 
Best regards,
Aleksander Alekseev


Re: Partial aggregates pushdown

2022-03-22 Thread Tomas Vondra
On 3/22/22 01:49, Andres Freund wrote:
> On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote:
>> Alexander Pyhalov писал 2022-01-17 15:26:
>>> Updated patch.
>>
>> Sorry, missed attachment.
> 
> Needs another update: http://cfbot.cputube.org/patch_37_3369.log
> 
> Marked as waiting on author.
> 

TBH I'm still not convinced this is the right approach. I've voiced this
opinion before, but to reiterate the main arguments:

1) It's not clear to me how this could get extended to aggregates with
more complex aggregate states, to support e.g. avg() and similar fairly
common aggregates.
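(For illustration: the struct and function names below are mine, not from the patch, but they sketch why avg() needs a composite partial state. The remote side must ship (sum, count), and the local side must combine and only then finalize; shipping a bare per-node average would give the wrong answer whenever the groups are unevenly sized.)

```c
#include <assert.h>

/* Hypothetical partial-aggregate state for avg(); PostgreSQL's real
 * avg() transition state is similar in spirit: a running sum plus a
 * row count. */
typedef struct AvgState
{
    double sum;
    long   count;
} AvgState;

/* Merge a partial state received from a remote node into the local one. */
static void
avg_combine(AvgState *local, const AvgState *remote)
{
    local->sum += remote->sum;
    local->count += remote->count;
}

/* Finalize: only at this point can the average itself be computed. */
static double
avg_final(const AvgState *state)
{
    return state->sum / (double) state->count;
}
```

Note that averaging the two per-node averages here (2.0 and 14.0/3) would not give 4.0, which is why the composite state has to travel.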

2) I'm not sure relying on aggpartialpushdownsafe without any version
checks etc. is sufficient. I mean, how would we know the remote node has
the same idea of representing the aggregate state. I wonder how this
aligns with assumptions we do e.g. for functions etc.

Aside from that, there's a couple review comments:

1) should not remove the comment in foreign_expr_walker

2) comment in deparseAggref is obsolete/inaccurate

3) comment for partial_agg_ok should probably explain when we consider
aggregate OK to be pushed down

4) I'm not sure why get_rcvd_attinmeta comment talks about "return type
bytea" and "real input type".

5) Talking about "partial" aggregates is a bit confusing, because that
suggests this is related to actual "partial aggregates". But it's not.

6) Can add_foreign_grouping_paths do without the new 'partial'
parameter? Clearly, it can be deduced from extra->patype, no?

7) There's no docs for PARTIALCONVERTERFUNC / PARTIAL_PUSHDOWN_SAFE in
CREATE AGGREGATE sgml docs.

8) I don't think "serialize" in the converter functions is the right
term, considering those functions are not "serializing" anything. If
anything, it's the remote node that is serializing the agg state and the
local not is deserializing it. Or maybe I just misunderstand where are
the converter functions executed?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allow file inclusion in pg_hba and pg_ident files

2022-03-22 Thread Aleksander Alekseev
Hi hackers,

> It passes `make installcheck` ...

`make installcheck-world`, of course. Sorry for the confusion.

-- 
Best regards,
Aleksander Alekseev


Re: generic plans and "initial" pruning

2022-03-22 Thread Amit Langote
On Tue, Mar 15, 2022 at 3:19 PM Amit Langote  wrote:
> On Tue, Mar 15, 2022 at 5:06 AM Robert Haas  wrote:
> > On Mon, Mar 14, 2022 at 3:38 PM Tom Lane  wrote:
> > > Also, while I've not spent much time at all reading this patch,
> > > it seems rather desperately undercommented, and a lot of the
> > > new names are unintelligible.  In particular, I suspect that the
> > > patch is significantly redesigning when/where run-time pruning
> > > happens (unless it's just letting that be run twice); but I don't
> > > see any documentation or name changes suggesting where that
> > > responsibility is now.
> >
> > I am sympathetic to that concern. I spent a while staring at a
> > baffling comment in 0001 only to discover it had just been moved from
> > elsewhere. I really don't feel that things in this are as clear as
> > they could be -- although I hasten to add that I respect the people
> > who have done work in this area previously and am grateful for what
> > they did. It's been a huge benefit to the project in spite of the
> > bumps in the road. Moreover, this isn't the only code in PostgreSQL
> > that needs improvement, or the worst. That said, I do think there are
> > problems. I don't yet have a position on whether this patch is making
> > that better or worse.
>
> Okay, I'd like to post a new version with the comments edited to make
> them a bit more intelligible.  I understand that the comments around
> the new invocation mode(s) of runtime pruning are not as clear as they
> should be, especially as the changes that this patch wants to make to
> how things work are not very localized.

Actually, another area where the comments may not be as clear as they
should have been is the changes that the patch makes to the
AcquireExecutorLocks() logic that decides which relations are locked
to safeguard the plan tree for execution, which are those given by
RTE_RELATION entries in the range table.

Without the patch, they are found by actually scanning the range table.

With the patch, it's the same set of RTEs if the plan doesn't contain
any pruning nodes, though instead of the range table, what is scanned
is a bitmapset of their RT indexes that is made available by the
planner in the form of PlannedStmt.lockrels.  When the plan does
contain a pruning node (PlannedStmt.containsInitialPruning), the
bitmapset is constructed by calling ExecutorGetLockRels() on the plan
tree, which walks it to add RT indexes of relations mentioned in the
Scan nodes, while skipping any nodes that are pruned after performing
initial pruning steps that may be present in their containing parent
node's PartitionPruneInfo.  Also, the RT indexes of partitioned tables
that are present in the PartitionPruneInfo itself are also added to
the set.
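(A toy stand-in for the bitmapset of RT indexes described above. PostgreSQL's real Bitmapset is a variable-length word array; this single-word sketch, with names of my own, just illustrates the add/membership operations such a lockrels set relies on, limited to RT indexes 0..63.)

```c
#include <assert.h>
#include <stdint.h>

/* Toy set of range-table indexes 0..63; a stand-in for Bitmapset. */
typedef uint64_t RtiSet;

/* Record that the relation with this RT index must be locked. */
static void
rtiset_add(RtiSet *set, int rti)
{
    *set |= UINT64_C(1) << rti;
}

/* Test whether an RT index was collected by the plan-tree walk. */
static int
rtiset_is_member(RtiSet set, int rti)
{
    return (int) ((set >> rti) & 1);
}
```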

While expanding comments added by the patch to make this clear, I
realized that there are two problems, one of them quite glaring:

* Planner's constructing this bitmapset and its copying along with the
PlannedStmt is pure overhead in the cases that this patch has nothing
to do with, which is the kind of thing that Andres cautioned against
upthread.

* Not all partitioned tables that would have been locked without the
patch to come up with a Append/MergeAppend plan may be returned by
ExecutorGetLockRels().  For example, if none of the query's
runtime-prunable quals were found to match the partition key of an
intermediate partitioned table and thus that partitioned table not
included in the PartitionPruneInfo.  Or if an Append/MergeAppend
covering a partition tree doesn't contain any PartitionPruneInfo to
begin with, in which case, only the leaf partitions and none of
partitioned parents would be accounted for by the
ExecutorGetLockRels() logic.

The 1st one seems easy to fix by not inventing PlannedStmt.lockrels
and just doing what's being done now: scan the range table if
(!PlannedStmt.containsInitialPruning).

The only way perhaps to fix the second one is to reconsider the
decision we made in the following commit:

commit 52ed730d511b7b1147f2851a7295ef1fb5273776
Author: Tom Lane 
Date:   Sun Oct 7 14:33:17 2018 -0400

Remove some unnecessary fields from Plan trees.

In the wake of commit f2343653f, we no longer need some fields that
were used before to control executor lock acquisitions:

* PlannedStmt.nonleafResultRelations can go away entirely.

* partitioned_rels can go away from Append, MergeAppend, and ModifyTable.
However, ModifyTable still needs to know the RT index of the partition
root table if any, which was formerly kept in the first entry of that
list.  Add a new field "rootRelation" to remember that.  rootRelation is
partly redundant with nominalRelation, in that if it's set it will have
the same value as nominalRelation.  However, the latter field has a
different purpose so it seems best to keep them distinct.

That is, add back the partitioned_rels field, at least to Append and
MergeAppend, to store the RT indexes of partitioned tab

Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 21, 2022, at 10:03 PM, Thomas Munro  wrote:
> 
> On Fri, Mar 18, 2022 at 4:22 PM Mark Dilger
>  wrote:
>> (FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl 
>> when testing v1-0001.  I'm not sure yet what that is about.)
> 
> Doesn't look like 0001 has anything to do with that...   Are you on a
> Mac?

Yes, macOS Catalina, currently 10.15.7.
  
>  Did it look like this recent failure from CI?
> 
> https://cirrus-ci.com/task/4686108033286144
> https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/regress_log_013_crash_restart
> https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/013_crash_restart_primary.log

I no longer have the logs for comparison.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-22 Thread Aleksander Alekseev
Hi hackers,

> Here is v23. As was suggested by Alexander above, I've changed the order of 
> the patches and improved the commit message. Now, SLRU patch is the first.

Many thanks!

> There were proposals to make use of SLRU pages numbers that are in fact 
> unsigned and change from int to uint64. I fully support this, but I'm not 
> sure this big SLRU refactoring should be done in this patchset.

If it takes a lot of effort and doesn't bring us any closer to 64-bit
XIDs, I suggest not doing this in v23-0001. I can invest some time
into this refactoring in April and create a separate CF entry, if
someone will second the idea.

> In general, I consider this patchset is ready to commit. It would be great to 
> deliver it in PG15.

+1.

v23-0002 seems to have two extra sentences in the commit message that
are outdated, but this is a minor issue. The commit message should be:

"""
Replace the %u formatting string for XIDs with %llu and cast to
unsigned long long. While actually XIDs are still 32 bit, this patch
completely supports both 32 and 64 bit.
"""

Since Peter expressed some concerns regarding v23-0002, maybe we
should discuss it a bit more. Although personally I doubt that we can
do much better than that, and as I recall this particular change was
explicitly requested by several people.

-- 
Best regards,
Aleksander Alekseev




Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/22/22 01:03, Thomas Munro wrote:
> On Fri, Mar 18, 2022 at 4:22 PM Mark Dilger
>  wrote:
>> (FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl 
>> when testing v1-0001.  I'm not sure yet what that is about.)
> Doesn't look like 0001 has anything to do with that...   Are you on a
> Mac?  Did it look like this recent failure from CI?


Probably not connected. It's working fine for me on Ubuntu.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Oleg Bartunov
On Tue, Mar 22, 2022 at 12:53 PM Matthias Kurz  wrote:

> Hi everyone!
>
> I am watching this thread since quite a while and I am waiting eagerly a
> long time already that this feature finally lands in PostgreSQL.
> Given that in around 2 weeks PostgreSQL 15 will go into feature freeze (in
> the last years that usually happened around the 8th of April AFAIK), is
> there any chance this will be committed? As far as I understand the patches
> are almost ready.
>

We are waiting too :)


>
> Sorry for the noise, I just wanted to draw attention that there are people
> out there looking forward to JSON_TABLE ;)
>

IS JSON is also cool in light of the work on JSON Schema
https://github.com/json-schema-org/vocab-database/blob/main/database.md,
which opens a lot of useful features and optimizations like  json
dictionary compression.


>
> Thanks everyone for your fantastic work!
> Matthias
>
>
> On Sun, 13 Mar 2022 at 22:22, Andrew Dunstan  wrote:
>
>>
>> On 2/9/22 08:22, Himanshu Upadhyaya wrote:
>> > On Wed, Feb 2, 2022 at 12:44 AM Andrew Dunstan 
>> wrote:
>> >>
>> >> rebased with some review comments attended to.
>> > I am in process of reviewing these patches, initially, have started
>> > with 0002-JSON_TABLE-v55.patch.
>> > Tested many different scenarios with various JSON messages and these
>> > all are working as expected. Just one question on the below output.
>> >
>> > ‘postgres[1406146]=#’SELECT * FROM JSON_TABLE(jsonb '1', '$' COLUMNS
>> > (a int PATH '$.a' ERROR ON EMPTY)) jt;
>> >  a
>> > ---
>> >
>> > (1 row)
>> >
>> > ‘postgres[1406146]=#’SELECT * FROM JSON_TABLE(jsonb '1', '$' COLUMNS
>> > (a int PATH '$.a' ERROR ON ERROR)) jt;
>> >  a
>> > ---
>> >
>> > (1 row)
>> >
>> > is not "ERROR ON ERROR" is expected to give error?
>>
>>
>> I think I understand what's going on here. In the first example 'ERROR
>> ON EMPTY' causes an error condition, but as the default action for an
>> error condition is to return null that's what happens. To get an error
>> raised you would need to say 'ERROR ON EMPTY ERROR ON ERROR'. I don't
>> know if that's according to spec. It seems kinda screwy, arguably a POLA
>> violation, although that would hardly be a first for the SQL Standards
>> body.  But I'm speculating here, I'm not a standards lawyer.
>>
>> In the second case it looks like there isn't really an error. There
>> would be if you used 'strict' in the path expression.
>>
>>
>> This whole area needs more documentation.
>>
>>
>> cheers
>>
>>
>> andrew
>>
>> --
>> Andrew Dunstan
>> EDB: https://www.enterprisedb.com
>>
>>
>>
>>

-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Allow file inclusion in pg_hba and pg_ident files

2022-03-22 Thread Julien Rouhaud
Hi,

On Tue, Mar 22, 2022 at 03:21:20PM +0300, Aleksander Alekseev wrote:
>
> The v3-0001 patch LGTM.
>
> Since v3-0002 adds a new view and alters pg_proc.dat shouldn't it also
> increase CATALOG_VERSION_NO? Not sure if we generally do this in the
> patches or expect the committer to make the change manually.

It's better to not include any catversion bump, otherwise the patches will
rot very fast.  Author can mention the need for a catversion bump in the
commit message, and I usually do so but I apparently forgot.

> Same question regarding v3-0003. Other than that the patch looks OK, but
> doesn't seem to add any tests for the new functionality. Do you think it
> would be possible to test-cover the file inclusion? Personally I don't
> think it's that critical to have these particular tests, but if they can be
> added, I think we should do so.

Yes, as I mentioned in the first email I'm willing to add test coverage.  But
this will require TAP tests, and it's likely going to be a bit annoying to make
sure it has decent coverage and works on all platforms, so I'd rather do it
once I know there's at least some basic agreement on the feature and/or the
approach.  Unfortunately, until now there has been no feedback on that part
despite the activity on the thread.  In my experience that's a sign that this
will get rejected soon, so I haven't written tests yet.

> I didn't review v3-0004 since it's marked as PoC and seems to be a separate
> feature that is not targeting PG15. I suggest excluding it from the
> patchset in order to keep the focus. Consider creating a new thread and a
> new CF entry after we deal with v3-0001...v3-0003.

This feature was discussed in an old thread when file inclusion first came up
almost a decade ago, and in my understanding it was part of the "must
have" (along with 0002) in order to accept the feature.  That's why I added
it, since I'm also willing to work on that if needed, whether before or after
the file inclusion itself.

> All in all, the patchset seems to be in good shape and I don't have
> anything but some little nitpicks. It passes `make installcheck` and I
> verified manually that the file inclusion 1) works 2) writes proper error
> messages to the logfile when the included file doesn't exist or has wrong
> permissions.

Thanks!




Re: LogwrtResult contended spinlock

2022-03-22 Thread Alvaro Herrera
On 2022-Mar-21, Andres Freund wrote:

> This patch currently doesn't apply: http://cfbot.cputube.org/patch_37_2716.log

Updated.

> Are you aiming this for v15? Otherwise I'd like to move the entry to the next
> CF. Marked as waiting-on-author.

I'd like to get 0001 pushed to pg15, yes.  I'll let 0002 sit here for
discussion, but I haven't seen any evidence that we need it.  If others
vouch for it, I can push that one too, but I'd rather have it be a
separate thing.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)
>From 4d5a8d915b497b79bbe62e18352f7b9d8c1b9bca Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 2 Feb 2021 14:03:43 -0300
Subject: [PATCH v6 1/2] Change XLogCtl->LogwrtResult to use atomic ops
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Currently, access to LogwrtResult is protected by a spinlock.  This
becomes severely contended in some scenarios, such as with a largish
replication flock: walsenders all calling GetFlushRecPtr repeatedly
cause the processor to heat up to the point where eggs can be fried on top.

This can be reduced to a non-problem by replacing XLogCtl->LogwrtResult
with a struct containing a pair of atomically accessed variables.

In addition, this commit splits the process-local copy of these values
(kept in the freestanding LogwrtResult struct) into two separate
LogWriteResult and LogFlushResult.  This is not strictly necessary, but
it makes it clearer that these are updated separately, each on their own
schedule.

Author: Álvaro Herrera 
Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql
---
 src/backend/access/transam/xlog.c | 195 +++---
 src/include/port/atomics.h|  29 +
 src/tools/pgindent/typedefs.list  |   1 +
 3 files changed, 125 insertions(+), 100 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4ac3871c74..311a8a1192 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -285,16 +285,14 @@ static bool doPageWrites;
  *
  * LogwrtRqst indicates a byte position that we need to write and/or fsync
  * the log up to (all records before that point must be written or fsynced).
- * LogwrtResult indicates the byte positions we have already written/fsynced.
- * These structs are identical but are declared separately to indicate their
- * slightly different functions.
+ * LogWrtResult indicates the byte positions we have already written/fsynced.
+ * These structs are similar but are declared separately to indicate their
+ * slightly different functions; in addition, the latter is read and written
+ * using atomic operations.
  *
- * To read XLogCtl->LogwrtResult, you must hold either info_lck or
- * WALWriteLock.  To update it, you need to hold both locks.  The point of
- * this arrangement is that the value can be examined by code that already
- * holds WALWriteLock without needing to grab info_lck as well.  In addition
- * to the shared variable, each backend has a private copy of LogwrtResult,
- * which is updated when convenient.
+ * In addition to the shared variable, each backend has a private copy of
+ * each member of LogwrtResult (LogWriteResult and LogFlushResult), each of
+ * which is separately updated when convenient.
  *
  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
  * (protected by info_lck), but we don't need to cache any copies of it.
@@ -317,18 +315,18 @@ static bool doPageWrites;
  *--
  */
 
+typedef struct XLogwrtAtomic
+{
+	pg_atomic_uint64 Write;		/* last byte + 1 of write position */
+	pg_atomic_uint64 Flush;		/* last byte + 1 of flush position */
+} XLogwrtAtomic;
+
 typedef struct XLogwrtRqst
 {
 	XLogRecPtr	Write;			/* last byte + 1 to write out */
 	XLogRecPtr	Flush;			/* last byte + 1 to flush */
 } XLogwrtRqst;
 
-typedef struct XLogwrtResult
-{
-	XLogRecPtr	Write;			/* last byte + 1 written out */
-	XLogRecPtr	Flush;			/* last byte + 1 flushed */
-} XLogwrtResult;
-
 /*
  * Inserting to WAL is protected by a small fixed number of WAL insertion
  * locks. To insert to the WAL, you must hold one of the locks - it doesn't
@@ -480,6 +478,7 @@ typedef struct XLogCtlData
 {
 	XLogCtlInsert Insert;
 
+	XLogwrtAtomic LogwrtResult; /* uses atomics */
 	/* Protected by info_lck: */
 	XLogwrtRqst LogwrtRqst;
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
@@ -497,12 +496,6 @@ typedef struct XLogCtlData
 	pg_time_t	lastSegSwitchTime;
 	XLogRecPtr	lastSegSwitchLSN;
 
-	/*
-	 * Protected by info_lck and WALWriteLock (you must hold either lock to
-	 * read it, but both to update)
-	 */
-	XLogwrtResult LogwrtResult;
-
 	/*
 	 * Latest initialized page in the cache (last byte position + 1).
 	 *
@@ -622,10 +615,11 @@ static ControlFileData *ControlFile = NULL;
 static int	UsableBytesInSegment;
 
 /*
- * Private

Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-22 Thread Michail Nikolaev
Hello, Andres.

> Fails to apply at the moment: http://cfbot.cputube.org/patch_37_2947.log

Thanks for notifying me. BTW, some kind of automatic email in case of
status change could be very helpful.

> Marked as waiting for author.

New version is attached and the build is passing
(https://cirrus-ci.com/build/5599876384817152), so I'm moving it back to
"ready for committer".

Best regards,
Michail.
From 9ecb33a54971cfa1c766ed9d129c6abb44e39f98 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev 
Date: Sat, 15 Jan 2022 16:21:51 +0300
Subject: [PATCH v10 1/3] code

---
 src/backend/access/common/bufmask.c  | 25 
 src/backend/access/gist/gistget.c| 43 +++--
 src/backend/access/gist/gistxlog.c   | 15 +
 src/backend/access/hash/hash.c   |  4 +-
 src/backend/access/hash/hash_xlog.c  | 17 +
 src/backend/access/hash/hashsearch.c | 18 --
 src/backend/access/hash/hashutil.c   | 33 +-
 src/backend/access/heap/heapam.c | 42 +---
 src/backend/access/heap/heapam_handler.c |  5 +-
 src/backend/access/index/genam.c | 20 +++---
 src/backend/access/index/indexam.c   | 81 +---
 src/backend/access/nbtree/nbtinsert.c| 22 +--
 src/backend/access/nbtree/nbtree.c   |  4 +-
 src/backend/access/nbtree/nbtsearch.c| 14 +++-
 src/backend/access/nbtree/nbtutils.c | 33 +-
 src/backend/access/nbtree/nbtxlog.c  | 16 +
 src/backend/access/table/tableam.c   |  4 +-
 src/backend/access/transam/rmgr.c|  4 +-
 src/backend/access/transam/xlogutils.c   |  6 ++
 src/backend/storage/ipc/standby.c|  6 ++
 src/bin/pg_rewind/parsexlog.c|  2 +-
 src/bin/pg_waldump/rmgrdesc.c|  2 +-
 src/include/access/bufmask.h |  1 +
 src/include/access/gist.h|  5 ++
 src/include/access/gistxlog.h|  1 +
 src/include/access/hash.h|  2 +
 src/include/access/hash_xlog.h   |  1 +
 src/include/access/heapam.h  |  2 +-
 src/include/access/nbtree.h  |  2 +
 src/include/access/nbtxlog.h |  1 +
 src/include/access/relscan.h | 15 -
 src/include/access/rmgr.h|  2 +-
 src/include/access/rmgrlist.h| 44 ++---
 src/include/access/tableam.h | 14 ++--
 src/include/access/xlog_internal.h   |  4 ++
 35 files changed, 421 insertions(+), 89 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 4e953bfd61..22026482ad 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -128,3 +128,28 @@ mask_page_content(Page page)
memset(&((PageHeader) page)->pd_upper, MASK_MARKER,
   sizeof(uint16));
 }
+
+/*
+ * mask_lp_dead
+ *
+ * In some index AMs, line pointer flags can be modified without emitting any
+ * WAL record.  It is therefore sometimes necessary to mask LP_DEAD flags set
+ * on the primary so the standby can set its own values.
+ */
+void
+mask_lp_dead(Page page)
+{
+   OffsetNumber offnum,
+maxoff;
+
+   maxoff = PageGetMaxOffsetNumber(page);
+   for (offnum = FirstOffsetNumber;
+offnum <= maxoff;
+offnum = OffsetNumberNext(offnum))
+   {
+   ItemId  itemId = PageGetItemId(page, offnum);
+
+   if (ItemIdHasStorage(itemId) && ItemIdIsDead(itemId))
+   itemId->lp_flags = LP_NORMAL;
+   }
+}
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index adbf622c83..1905c04c51 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/bufmask.h"
 #include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/relscan.h"
@@ -49,6 +50,7 @@ gistkillitems(IndexScanDesc scan)
Assert(so->curBlkno != InvalidBlockNumber);
Assert(!XLogRecPtrIsInvalid(so->curPageLSN));
Assert(so->killedItems != NULL);
+   Assert(so->numKilled > 0);
 
buffer = ReadBuffer(scan->indexRelation, so->curBlkno);
if (!BufferIsValid(buffer))
@@ -62,8 +64,13 @@ gistkillitems(IndexScanDesc scan)
	 * If page LSN differs it means that the page was modified since the last
 * read. killedItems could be not valid so LP_DEAD hints applying is not
 * safe.
+*
+* Another case - standby was promoted after start of current transaction.
+* It is not required for correctness, but it is better to just skip
+* everything.
 */
-   if (BufferGetLSNAtomic(buffer) != so->curPageLSN)
+   if ((BufferGetLSNAtomic(buffer) != so->curPageLSN) ||
+   (scan->xactStartedInRecovery && !RecoveryInProgress()))
{
UnlockReleaseBuffer(buffer);
so->numKilled = 0;  /* rese

Re: LogwrtResult contended spinlock

2022-03-22 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Mar-21, Andres Freund wrote:
>> Are you aiming this for v15? Otherwise I'd like to move the entry to the next
>> CF. Marked as waiting-on-author.

> I'd like to get 0001 pushed to pg15, yes.  I'll let 0002 sit here for
> discussion, but I haven't seen any evidence that we need it.  If others
> vouch for it, I can push that one too, but I'd rather have it be a
> separate thing.

I looked briefly at 0001, and I've got to say that I disagree with
your decision to rearrange the representation of the local LogwrtResult
copy.  It clutters the patch tremendously and makes it hard to
understand what the actual functional change is.  Moreover, I'm
not entirely convinced that it's a notational improvement in the
first place.

Perhaps it'd help if you split 0001 into two steps, one to do the
mechanical change of the representation and then a second patch that
converts the shared variable to atomics.  Since you've moved around
the places that read the shared variable, that part is subtler than
one could wish and really needs to be studied on its own.
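The conversion under discussion — reading and monotonically advancing a shared 64-bit WAL position with atomics instead of a spinlock — can be sketched as follows. This is illustrative only: it uses C11 atomics rather than PostgreSQL's `pg_atomic_uint64` wrappers, and the names are invented.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared write/flush positions, readable without any lock. */
typedef struct XLogPosAtomic
{
	_Atomic uint64_t Write;
	_Atomic uint64_t Flush;
} XLogPosAtomic;

/*
 * Advance *pos to target unless some other backend has already advanced
 * it further.  On compare-exchange failure, cur is reloaded with the
 * current value, so the loop terminates as soon as cur >= target.
 */
static void
advance_position(_Atomic uint64_t *pos, uint64_t target)
{
	uint64_t	cur = atomic_load(pos);

	while (cur < target &&
		   !atomic_compare_exchange_weak(pos, &cur, target))
		;						/* retry with the reloaded value */
}
```

Each backend would still keep a private copy of the last values it read, refreshing from the shared struct only when needed — which is the part Tom suggests splitting into its own mechanical patch.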

regards, tom lane




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-03-22 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Mar 22, 2022 at 4:13 PM Tom Lane  wrote:
>> OK.  You want me to push 75674c7e to HEAD?

> Thanks, yes, please do.

Done.

regards, tom lane




Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Andrew Dunstan


On 3/22/22 09:28, Oleg Bartunov wrote:
>
>
> On Tue, Mar 22, 2022 at 12:53 PM Matthias Kurz 
> wrote:
>
> Hi everyone!
>
> I have been watching this thread for quite a while and have long
> been waiting eagerly for this feature to finally land in
> PostgreSQL.
> Given that in around 2 weeks PostgreSQL 15 will go into feature
> freeze (in the last years that usually happened around the 8th of
> April AFAIK), is there any chance this will be committed? As far
> as I understand the patches are almost ready.
>
>
> We are waiting too :)


I'm planning on pushing the functions patch set this week and json-table
next week.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/21/22 19:08, Mark Dilger wrote:
>
>> On Mar 21, 2022, at 1:30 PM, Andrew Dunstan  wrote:
>>
>> To the best of my knowledge .control files are only used by extensions,
>> not by other modules. They are only referenced in
>> src/backend/commands/extension.c in the backend code. For example,
>> auto_explain, which is a loadable module but not an extension, does not
>> have one, and I bet if you remove it you'll find this will work just fine.
> Fixed, also with adjustments to Joshua's function comments.
>

Pushed with slight adjustments - the LOAD was unnecessary as was the
setting of client_min_messages - the latter would have made buildfarm
animals unhappy.


Now you need to re-submit your GUCs patch I think.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-03-22 Thread Andres Freund
Hi,

On 2022-03-22 16:32:24 +0530, Bharath Rupireddy wrote:
> On Tue, Mar 22, 2022 at 12:20 AM Andres Freund  wrote:
> > > Something like attached
> > > v6-0001-Deduplicate-checkpoint-restartpoint-complete-mess.patch?
> >
> > Mostly. I don't see a reason for the use of the stringinfo. And I think
> > LogCheckpointStart() should be dealt with similarly.
> >
> > I'd just make it a restartpoint ? _("restartpoint") : _("checkpoint") or such
> > in the argument? Then translators don't need to translate the two messages
> > separately.
> 
> Do you mean like this?
> ereport(LOG,
> /* translator: the placeholders show checkpoint options */
> (errmsg("%s starting:%s%s%s%s%s%s%s%s",
> restartpoint ? _("restartpoint") : _("checkpoint"),
> (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
> (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
> (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
> (flags & CHECKPOINT_FORCE) ? " force" : "",
> (flags & CHECKPOINT_WAIT) ? " wait" : "",
> (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
> (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
> (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "")));

Yes.


> Are we allowed to use _("foo") with errmsg? Isn't "foo" going to get
> translated twice that way?

The format string gets translated, arguments don't. Choosing the German
translation (don't speak other languages :(), you can for example see:

#. translator: the placeholders show checkpoint options
#: access/transam/xlog.c:8692
#, c-format
msgid "checkpoint starting:%s%s%s%s%s%s%s%s"
msgstr "Checkpoint beginnt:%s%s%s%s%s%s%s%s"

you can see that the first words are translated, but the arguments are the
same.  So there shouldn't be any double translation.
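The mechanism Andres describes — only the whole format string goes through the message catalog, while the `%s` arguments are substituted afterwards — can be illustrated with a toy stand-in for gettext. This is not the real gettext API, and the catalog contents are made up:

```c
#include <stdio.h>
#include <string.h>

/* Toy message catalog: only complete msgids are looked up. */
static const char *
toy_gettext(const char *msgid)
{
	if (strcmp(msgid, "%s starting:%s%s") == 0)
		return "%s beginnt:%s%s";	/* German entry for the format string */
	if (strcmp(msgid, "checkpoint") == 0)
		return "Checkpoint";
	if (strcmp(msgid, "restartpoint") == 0)
		return "Restartpoint";
	return msgid;				/* untranslated strings pass through */
}

static void
log_start(char *buf, size_t len, int restartpoint, int shutdown, int force)
{
	/*
	 * Only the format string and the checkpoint/restartpoint word go
	 * through the catalog; the flag words are plain C literals, so they
	 * appear as-is in every language.
	 */
	snprintf(buf, len, toy_gettext("%s starting:%s%s"),
			 restartpoint ? toy_gettext("restartpoint")
			 : toy_gettext("checkpoint"),
			 shutdown ? " shutdown" : "",
			 force ? " force" : "");
}
```

Nothing is translated twice: the argument words are looked up once each, and the already-translated format string never re-enters the catalog.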


> > Or we could just not translate restartpoint/checkpoint - after all we don't
> > translate the flags in LogCheckpointStart() either. But on balance I'd lean
> > towards the above.
> 
> I'm not sure if it's correct to translate some parts of the message
> and leave others. What exactly is the reason for this?

Yea, it's a bit odd. I'd still separate combining the messages (and thus
translating "restartpoint" and "checkpoint" separately) from translating
flags.


> I think the reason in this case might be that some flag names with hyphens
> and spaces before words may not have the right/matching words in all
> languages. What happens if we choose to translate/not translate the entire
> message?

If individual words aren't translated the "original" word would be used.


> How about just doing this for LogCheckpointStart?
> StringInfoData logmsg;
> initStringInfo(&logmsg);
> appendStringInfo(&logmsg,
> /* translator: the placeholders show checkpoint options */
>  restartpoint ? _("restartpoint starting:%s%s%s%s%s%s%s%s") :
> _("checkpoint starting:%s%s%s%s%s%s%s%s"),
>  (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
>  (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
>  (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
>  (flags & CHECKPOINT_FORCE) ? " force" : "",
>  (flags & CHECKPOINT_WAIT) ? " wait" : "",
>  (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
>  (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
>  (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "");
> ereport(LOG, errmsg_internal("%s", logmsg.data));
> pfree(logmsg.data);

Why do you insist on using stringinfos? It doesn't add *anything* here over
just a plain ereport()? And is considerably more verbose?




> > Both seem still very long. I still am doubtful this level of detail is
> > appropriate. Seems more like a thing for a tracepoint or such. How about 
> > just
> > printing the time for the logical decoding operations in aggregate, without
> > breaking down into files, adding LSNs etc?
> 
> The distinction that the patch makes right now is for snapshot and
> rewrite mapping files and it makes sense to have them separately.

-1. The line also needs to be readable...

Greetings,

Andres Freund




Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Matthias Kurz
On Tue, 22 Mar 2022 at 15:31, Andrew Dunstan  wrote:

>
> I'm planning on pushing the functions patch set this week and json-table
> next week.
>

Great! Thank you very much!


Re: psql - add SHOW_ALL_RESULTS option

2022-03-22 Thread Peter Eisentraut

On 17.03.22 19:04, Fabien COELHO wrote:


Hello Peter,


See attached v16 which removes the libpq workaround.


I suppose this depends on

https://www.postgresql.org/message-id/flat/ab4288f8-be5c-57fb-2400-e3e857f53e46%40enterprisedb.com 



getting committed, because right now this makes the psql TAP tests 
fail because of the duplicate error message.


How should we handle that?


Ok, it seems I got the patch wrong.

Attached v17 is another try. The point is to record the current status, 
whatever it is, buggy or not, and to update the test when libpq fixes 
things, whenever this is done.


Your patch contains this test case:

+# Test voluntary crash
+my ($ret, $out, $err) = $node->psql(
+   'postgres',
+   "SELECT 'before' AS running;\n" .
+   "SELECT pg_terminate_backend(pg_backend_pid());\n" .
+   "SELECT 'AFTER' AS not_running;\n");
+
+is($ret, 2, "server stopped");
+like($out, qr/before/, "output before crash");
+ok($out !~ qr/AFTER/, "no output after crash");
+is($err, 'psql::2: FATAL:  terminating connection due to administrator command
+psql::2: FATAL:  terminating connection due to administrator command
+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+psql::2: fatal: connection to server was lost', "expected error message");


The expected output (which passes) contains this line twice:

psql::2: FATAL:  terminating connection due to administrator command
psql::2: FATAL:  terminating connection due to administrator command

If I paste this test case into current master without your patch, I only 
get this line once.  So your patch is changing this output.  The whole 
point of the libpq fixes was to not have this duplicate output.  So I 
think something is still wrong somewhere.





Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Alvaro Herrera
On 2022-Mar-22, Andrew Dunstan wrote:

> I'm planning on pushing the functions patch set this week and json-table
> next week.

I think it'd be a good idea to push the patches one by one and let a day
between each.  That would make it easier to chase buildfarm issues
individually, and make sure they are fixed before the next step.
Each patch in each series is already big enough.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Daniel Gustafsson
> On 22 Mar 2022, at 16:31, Andrew Dunstan  wrote:

> I'm planning on pushing the functions patch set this week and json-table
> next week.

My comments from 30827b3c-edf6-4d41-bbf1-298181874...@yesql.se have yet to be
addressed (or even responded to) in this patchset.  I'll paste the ones which
still apply to make it easier:

+ into both child and parrent columns for all missing values.
s/parrent/parent/

-   objectname = "xmltable";
+   objectname = rte->tablefunc ?
+   rte->tablefunc->functype == TFT_XMLTABLE ?
+   "xmltable" : "json_table" : NULL;
In which case can rte->tablefunc be NULL for a T_TableFuncScan? Also, nested
ternary operators are confusing at best, I think this should be rewritten as
plain if statements.

In general when inspecting functype I think it's better to spell it out with if
statements rather than ternary since it allows for grepping the code easier.
Having to grep for TFT_XMLTABLE to find where TFT_JSON_TABLE could be used
isn't all that convenient.
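The rewrite Daniel suggests could look like this — a sketch with mocked-up stand-ins for the parse-tree types in the quoted hunk (the real RangeTblEntry and TableFunc are PostgreSQL nodes with many more fields):

```c
#include <stddef.h>

/* Mocked-up stand-ins for the types in the quoted hunk. */
typedef enum TableFuncType
{
	TFT_XMLTABLE,
	TFT_JSON_TABLE
} TableFuncType;

typedef struct TableFunc
{
	TableFuncType functype;
} TableFunc;

typedef struct RangeTblEntry
{
	TableFunc  *tablefunc;
} RangeTblEntry;

/* Plain if statements instead of a nested ternary: each functype is
 * spelled out, so grepping for TFT_JSON_TABLE finds this site directly. */
static const char *
table_func_object_name(const RangeTblEntry *rte)
{
	if (rte->tablefunc == NULL)
		return NULL;
	if (rte->tablefunc->functype == TFT_XMLTABLE)
		return "xmltable";
	return "json_table";
}
```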

+ errmsg("JSON_TABLE() is not yet implemented for json type"),
I can see this being potentially confusing to many; an errhint with a reference
to jsonb seems like a good idea.

+/* Recursively register column name in the path name list. */
+static void
+registerJsonTableColumn(JsonTableContext *cxt, char *colname)
This comment is IMO misleading since the function isn't actually recursive,
but a helper function for a recursive function.

+   switch (get_typtype(typid))
+   {
+   case TYPTYPE_COMPOSITE:
+   return true;
+
+   case TYPTYPE_DOMAIN:
+   return typeIsComposite(getBaseType(typid));
+   }
switch statements without a default run the risk of attracting unwanted
compiler warnings, or making static analyzers angry. This one can
easily be rewritten with two if-statements on a cached get_typtype()
return value.
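A sketch of that suggested rewrite, calling get_typtype() exactly once per level; the catalog lookups (get_typtype, getBaseType) are mocked here for illustration:

```c
#include <stdbool.h>

typedef unsigned int Oid;

#define TYPTYPE_BASE      'b'
#define TYPTYPE_COMPOSITE 'c'
#define TYPTYPE_DOMAIN    'd'

/* Mocked catalog: OID 1 is composite, OID 2 is a domain over OID 1,
 * anything else is a plain base type. */
static char
get_typtype(Oid typid)
{
	if (typid == 1)
		return TYPTYPE_COMPOSITE;
	if (typid == 2)
		return TYPTYPE_DOMAIN;
	return TYPTYPE_BASE;
}

static Oid
getBaseType(Oid typid)
{
	return typid == 2 ? 1 : typid;
}

static bool
typeIsComposite(Oid typid)
{
	char		typtype = get_typtype(typid);	/* looked up once */

	if (typtype == TYPTYPE_COMPOSITE)
		return true;
	if (typtype == TYPTYPE_DOMAIN)
		return typeIsComposite(getBaseType(typid));
	return false;				/* every other typtype falls through */
}
```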

+ * Returned false at the end of a scan, true otherwise.
s/Returned/Returns/ (applies at two places)

+ /* state->ordinal--; */ /* skip current outer row, reset counter */
Is this dead code to be removed, or left in there as a reminder to fix
something?

--
Daniel Gustafsson   https://vmware.com/





Re: psql - add SHOW_ALL_RESULTS option

2022-03-22 Thread Peter Eisentraut

On 17.03.22 19:04, Fabien COELHO wrote:

Indeed! The missing part was put back in v17.


Some unrelated notes on this v17 patch:

-use Test::More;
+use Test::More tests => 41;

The test counts are not needed/wanted anymore.


+
+\set ECHO none
+

This seems inappropriate.


+--
+-- autocommit
+--

+--
+-- test ON_ERROR_ROLLBACK
+--

This test file already contains tests for autocommit and 
ON_ERROR_ROLLBACK.  If you want to change those, please add yours into 
the existing sections, not make new ones.  I'm not sure if your tests 
add any new coverage, or if they are just duplicates.





Re: Use generation context to speed up tuplesorts

2022-03-22 Thread David Rowley
On Wed, 23 Feb 2022 at 15:25, Andres Freund  wrote:
> On 2022-02-18 12:10:51 +1300, David Rowley wrote:
> > The other way I thought to fix it was by changing the logic for when
> > generation blocks are freed.  In the problem case mentioned above, the
> > block being freed is the current block (which was just allocated).  I
> > made some changes to adjust this behaviour so that we no longer free
> > the block when the final chunk is pfree()'d. Instead, that now lingers
> > and can be reused by future allocations, providing they fit inside it.
>
> That makes sense to me, as long as we keep just one such block.

I've rewritten the patch in such a way that it no longer matters which
block becomes free.  What I did was add a new field to the
GenerationContext named "freeblock".  We now simply store up to 1
recently emptied block there instead of calling free() on it.  When
we're allocating new memory, we'll try and make use of the "freeblock"
when we don't have enough space in the current block.

I feel like this is quite a good change as it saves continual
free/malloc calls for FIFO pattern users of the generation context.
Since that's pretty much the workload that this context is intended
for, that seems like a very good change to make.

I did consider only recording a block as free once it reaches the
maximum size.  As I have the code currently, we'll continually reuse
all blocks, including the initial sized ones.  It will end up filling
blocks quicker as we'll be reusing those smaller initial blocks
continually with FIFO workloads. I'm not entirely sure if that
matters. I just wanted to point out that I thought about it.

Aside from the freeblock, I've also added code for adding a "keeper"
block.  This is allocated in the same malloc'd chunk as the Generation
context itself. One thing I'm not too sure about is if the keeper
block change is worth the trouble.  If we pfree() all of the chunks
out of the keeper block, there's a special case in GenerationFree() to
empty that block rather than free() it.  This also means we need a
special case in GenerationAlloc() so we try to see if the keeper block
has any space before we go and allocate a new block.   My current
thoughts are that the keeper block is really only useful for
Generation contexts that remain very small and don't ever require
allocation of additional blocks.  This will mean all the memory can be
allocated along with the context, which saves a malloc and free call.
Does anyone have any thoughts on this?
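The "freeblock" idea can be sketched as a toy allocator (this is not the real generation.c code; the structures and sizes are simplified). Instead of free()ing a block once its last chunk is released, at most one such block is parked and reused for a later allocation:

```c
#include <stdlib.h>

typedef struct Block
{
	size_t		size;			/* usable bytes */
	size_t		used;			/* bytes handed out so far */
	char		data[];			/* chunk storage */
} Block;

typedef struct GenCtx
{
	Block	   *current;		/* block new chunks come from */
	Block	   *freeblock;		/* at most one recycled empty block */
} GenCtx;

static Block *
new_block(size_t size)
{
	Block	   *b = malloc(sizeof(Block) + size);

	b->size = size;
	b->used = 0;
	return b;
}

/* Called when every chunk in b has been pfree'd: park it, don't free it. */
static void
block_became_empty(GenCtx *ctx, Block *b)
{
	if (ctx->freeblock == NULL)
	{
		b->used = 0;
		ctx->freeblock = b;		/* keep one spare for reuse */
	}
	else
		free(b);				/* already holding a spare */
}

static void *
gen_alloc(GenCtx *ctx, size_t size)
{
	void	   *p;

	if (ctx->current == NULL || ctx->current->used + size > ctx->current->size)
	{
		/* Prefer the parked block over malloc when it is big enough. */
		if (ctx->freeblock && size <= ctx->freeblock->size)
		{
			ctx->current = ctx->freeblock;
			ctx->freeblock = NULL;
		}
		else
			ctx->current = new_block(size > 8192 ? size : 8192);
	}
	p = ctx->current->data + ctx->current->used;
	ctx->current->used += size;
	return p;
}
```

For a FIFO consumer this turns the steady free/malloc churn at block boundaries into pointer swaps, which is the saving David describes.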

> > The problem I see with this method is that there still could be some
> > pathological case that causes us to end up storing just a single tuple per
> > generation block.
>
> Crazy idea: Detect the situation, and recompact. Create a new context, copy
> all the tuples over, delete the old context. That could be a win even in less
> adversarial situations than "a single tuple per generation block".

I think that would need some new MemoryContext API functions in which
callers could use to determine the fragmentation and do something
about it on the consumer side. Otherwise, as Tomas says, if we did it
from within the context code itself we'd have no visibility of what is
pointing to the memory we're moving around.

If nobody has any objections to the 0001 patch then I'd like to move
ahead with it in the next few days.  For the 0002 patch, I'm currently
feeling like we shouldn't be using the Generation context for bounded
sorts. The only way I can think to do that is with an API change to
tuplesort. I feel 0001 is useful even without 0002.

David
From 546ac7a3f5ad993c76576e09dd031a453a70c692 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Wed, 16 Feb 2022 10:23:32 +1300
Subject: [PATCH v4 1/2] Improve the generation memory allocator

Here we make a series of improvements to the generation memory
allocator:

1. Allow generation contexts to have a minimum, initial and maximum block
sizes. The standard allocator allows this already but when the generation
context was added, it only allowed fixed-sized blocks.  The problem with
fixed-sized blocks is that it's difficult to choose how large to make the
blocks.  If the chosen size is too small then we'd end up with a large
number of blocks and a large number of malloc calls. If the block size is
made too large, then memory is wasted.

2. Add support for "keeper" blocks.  This is a special block that is
allocated along with the context itself but is never freed.  Instead,
when the last chunk in the keeper block is freed, we simply mark the block
as empty to allow new allocations to make use of it.

3. Add facility to "recycle" newly empty blocks instead of freeing them
and having to later malloc an entire new block again.  We do this by
recording a single GenerationBlock which has become empty of any chunks.
When we run out of space in the current block, we check to see if there is
a "freeblock" and use that if it contains enough space for the allocation.

Author: David Rowley, Tomas Vondra
Discussion: 

Re: New Object Access Type hooks

2022-03-22 Thread Julien Rouhaud
Hi,

On Tue, Mar 22, 2022 at 10:41:05AM -0400, Andrew Dunstan wrote:
> 
> Pushed with slight adjustments - the LOAD was unnecessary as was the
> setting of client_min_messages - the latter would have made buildfarm
> animals unhappy.

For the record this just failed on my buildfarm animal:
https://brekka.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2022-03-22%2014%3A40%3A10&stg=misc-check.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-22 Thread Robert Haas
On Tue, Mar 22, 2022 at 5:00 AM Dilip Kumar  wrote:
> In my previous patch I mistakenly used src_dboid instead of
> dest_dboid.  Fixed in this version.  For the destination db I have used
> AccessShareLock as the lock mode.  Logically, access-wise we
> don't want anyone else to be accessing that db, but that is anyway
> protected because it is not visible to anyone else.  So I think
> AccessShareLock should be correct here, because we are only taking
> this lock because we are accessing pages in shared buffers from this
> database's relations.

Here's my worked-over version of your previous patch. I haven't tried
to incorporate your incremental patch that you just posted.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Add-new-block-by-block-strategy-for-CREATE-DATABA.patch
Description: Binary data


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-22 Thread Robert Haas
On Tue, Mar 22, 2022 at 11:23 AM Robert Haas  wrote:
> Here's my worked-over version of your previous patch. I haven't tried
> to incorporate your incremental patch that you just posted.

Also, please have a look at the XXX comments that I added in a few
places where I think you need to make further changes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Oleg Bartunov
On Tue, Mar 22, 2022 at 5:32 PM Andrew Dunstan  wrote:

>
> On 3/22/22 09:28, Oleg Bartunov wrote:
> >
> >
> > On Tue, Mar 22, 2022 at 12:53 PM Matthias Kurz 
> > wrote:
> >
> > Hi everyone!
> >
> > I have been watching this thread for quite a while and have long
> > been waiting eagerly for this feature to finally land in
> > PostgreSQL.
> > Given that in around 2 weeks PostgreSQL 15 will go into feature
> > freeze (in the last years that usually happened around the 8th of
> > April AFAIK), is there any chance this will be committed? As far
> > as I understand the patches are almost ready.
> >
> >
> > We are waiting too :)
>
>
> I'm planning on pushing the functions patch set this week and json-table
> next week.
>
>
Great, Andrew !


>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>

-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 8:14 AM, Julien Rouhaud  wrote:
> 
> Hi,
> 
> On Tue, Mar 22, 2022 at 10:41:05AM -0400, Andrew Dunstan wrote:
>> 
>> Pushed with slight adjustments - the LOAD was unnecessary as was the
>> setting of client_min_messages - the latter would have made buildfarm
>> animals unhappy.
> 
> For the record this just failed on my buildfarm animal:
> https://brekka.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2022-03-22%2014%3A40%3A10&stg=misc-check.

culicidae is complaining:

==~_~===-=-===~_~== 
pgsql.build/src/test/modules/test_oat_hooks/log/postmaster.log 
==~_~===-=-===~_~==
2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  starting PostgreSQL 
15devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 11.2.0-18) 11.2.0, 
64-bit
2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  listening on Unix 
socket "/tmp/pg_regress-RiE7x8/.s.PGSQL.6280"
2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.202 UTC [2167006][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.203 UTC [2167009][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  checkpointer 
process (PID 2167006) exited with exit code 1
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  terminating any 
other active server processes
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  shutting down 
because restart_after_crash is off
2022-03-22 14:53:27.206 UTC [2166986][postmaster][:0] LOG:  database system is 
shut down
==~_~===-=-===~_~== pgsql.build/src/test/modules/test_rls_hooks/log/initdb.log 
==~_~===-=-===~_~==


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Andrew Dunstan


On 3/22/22 10:53, Alvaro Herrera wrote:
> On 2022-Mar-22, Andrew Dunstan wrote:
>
>> I'm planning on pushing the functions patch set this week and json-table
>> next week.
> I think it'd be a good idea to push the patches one by one and let a day
> between each.  That would make it easier to chase buildfarm issues
> individually, and make sure they are fixed before the next step.
> Each patch in each series is already big enough.



OK, can do it that way. First one will be later today then.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Andrew Dunstan


On 3/22/22 10:55, Daniel Gustafsson wrote:
>> On 22 Mar 2022, at 16:31, Andrew Dunstan  wrote:
>> I'm planning on pushing the functions patch set this week and json-table
>> next week.
> My comments from 30827b3c-edf6-4d41-bbf1-298181874...@yesql.se are yet to be
> addressed (or responded to at all) in this patchset.  I'll paste the ones
> which still apply to make it easier:


Thanks for reminding me. I will make sure these are attended to.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Broken make dependency somewhere near win32ver.o?

2022-03-22 Thread Andres Freund
Hi,

On 2022-03-22 18:09:08 +1300, Thomas Munro wrote:
> On Tue, Mar 22, 2022 at 4:14 PM Andres Freund  wrote:
> > The problem looks to be that pg_dumpall doesn't have a dependency on OBJs,
> > which in turn is what contains the dependency on WIN32RES, which in turn
> > contains win32ver.o. So the build succeeds if pg_dump/restores's 
> > dependencies
> > are built first, but not if pg_dumpall starts to be built before that...
> >
> > Seems we just need to add $(WIN32RES) to pg_dumpall: ?
> 
> Ah, yeah, that looks right.  I don't currently have a mingw setup to
> test, but clearly $(WIN32RES) is passed to $(CC) despite not being
> listed as a dependency.

Pushed a fix for that. Ended up doing it for all branches, although I was
debating with myself about doing so.

I did a quick search and didn't find other cases of this problem.

Greetings,

Andres Freund




Re: refactoring basebackup.c (zstd workers)

2022-03-22 Thread Robert Haas
On Mon, Mar 21, 2022 at 2:41 PM Dagfinn Ilmari Mannsåker
 wrote:
> This is no longer accurate. How about something like "Specifies
> details of the chosen compression method"?

Good catch. v5 attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v5-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: [PATCH] pgbench: add multiconnect option

2022-03-22 Thread David Christensen
On Sat, Mar 19, 2022 at 11:43 AM Fabien COELHO  wrote:

>
> Hi Sami,
>
> > Pgbench is a simple benchmark tool by design, and I wonder if adding
> > a multiconnect feature will cause pgbench to be used incorrectly.
>
> Maybe, but I do not see how it would be worse than what pgbench already
> allows.
>

I agree that pgbench is simple; perhaps really too simple when it comes to
being able to measure much more than basic query flows.  What pgbench does
have in its favor is being distributed with the core distribution.

I think there is definitely space for a more complicated benchmarking tool
that exercises more scenarios and more realistic query patterns and
scenarios.  Whether that is distributed with the core is another question.

David


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-22 Thread Andres Freund
Hi,

On 2022-03-22 11:23:16 -0400, Robert Haas wrote:
> From 116bcdb6174a750b7ef7ae05ef6f39cebaf9bcf5 Mon Sep 17 00:00:00 2001
> From: Robert Haas 
> Date: Tue, 22 Mar 2022 11:22:26 -0400
> Subject: [PATCH v1] Add new block-by-block strategy for CREATE DATABASE.

I might have missed it because I just skimmed the patch. But I still think it
should contain a comment detailing why accessing catalogs from another
database is safe in this instance, and perhaps a comment or three in places
that could break it (e.g. snapshot computation, horizon stuff).

Greetings,

Andres Freund




Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/22/22 11:26, Mark Dilger wrote:
>
>> On Mar 22, 2022, at 8:14 AM, Julien Rouhaud  wrote:
>>
>> Hi,
>>
>> On Tue, Mar 22, 2022 at 10:41:05AM -0400, Andrew Dunstan wrote:
>>> Pushed with slight adjustments - the LOAD was unnecessary as was the
>>> setting of client_min_messages - the latter would have made buildfarm
>>> animals unhappy.
>> For the record this just failed on my buildfarm animal:
>> https://brekka.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2022-03-22%2014%3A40%3A10&stg=misc-check.
> culicidae is complaining:
>
> ==~_~===-=-===~_~== 
> pgsql.build/src/test/modules/test_oat_hooks/log/postmaster.log 
> ==~_~===-=-===~_~==
> 2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  starting 
> PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 11.2.0-18) 
> 11.2.0, 64-bit
> 2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  listening on Unix 
> socket "/tmp/pg_regress-RiE7x8/.s.PGSQL.6280"
> 2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  
> test_oat_hooks must be loaded via shared_preload_libraries
> 2022-03-22 14:53:27.202 UTC [2167006][not initialized][:0] FATAL:  
> test_oat_hooks must be loaded via shared_preload_libraries
> 2022-03-22 14:53:27.203 UTC [2167009][not initialized][:0] FATAL:  
> test_oat_hooks must be loaded via shared_preload_libraries
> 2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  checkpointer 
> process (PID 2167006) exited with exit code 1
> 2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  terminating any 
> other active server processes
> 2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  shutting down 
> because restart_after_crash is off
> 2022-03-22 14:53:27.206 UTC [2166986][postmaster][:0] LOG:  database system 
> is shut down
> ==~_~===-=-===~_~== 
> pgsql.build/src/test/modules/test_rls_hooks/log/initdb.log ==~_~===-=-===~_~==
>
>


That seems quite weird. I'm not sure how it's getting loaded at all if
not via shared_preload_libraries.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-03-22 Thread Robert Haas
On Mon, Mar 21, 2022 at 8:52 PM Andres Freund  wrote:
> I noticed this still has an open CF entry: 
> https://commitfest.postgresql.org/37/3296/
> I assume it can be marked as committed?

Yeah, done now. But don't forget that we still need to do something on
the "wrong fds used for refilenodes after pg_upgrade relfilenode
changes Reply-To:" thread, and I think the ball is in your court
there.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-22 Thread Robert Haas
On Tue, Mar 22, 2022 at 11:42 AM Andres Freund  wrote:
> I might have missed it because I just skimmed the patch. But I still think it
> should contain a comment detailing why accessing catalogs from another
> database is safe in this instance, and perhaps a comment or three in places
> that could break it (e.g. snapshot computation, horizon stuff).

Please see the function header comment for ScanSourceDatabasePgClass.
I don't quite see how changes in those places would break this, but if
you want to be more specific perhaps I will see the light?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mingw task for Cirrus CI

2022-03-22 Thread Melih Mutlu
Hi Andres,

Rebased it.
I also removed the temp installation task and
used the NoDefaultCurrentDirectoryInExePath env variable instead.

Best,
Melih
From e8a1dae0ec10efd8a967070e0d412d2bf875fa93 Mon Sep 17 00:00:00 2001
From: Melih Mutlu 
Date: Mon, 21 Feb 2022 14:46:05 +0300
Subject: [PATCH] Added Windows with MinGW environment in Cirrus CI

---
 .cirrus.yml | 79 +++--
 1 file changed, 65 insertions(+), 14 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index e5335fede7..1ed40347cf 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -23,7 +23,6 @@ env:
   CHECKFLAGS: -Otarget
   PROVE_FLAGS: --timer
   PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
-  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
   PG_TEST_EXTRA: kerberos ldap ssl
 
 
@@ -334,13 +333,30 @@ task:
 cores_script: src/tools/ci/cores_backtrace.sh macos "${HOME}/cores"
 
 
+WINDOWS_ENVIRONMENT_BASE: &WINDOWS_ENVIRONMENT_BASE
+env:
+  # Half the allowed per-user CPU cores
+  CPUS: 4
+  # The default working dir is in a directory msbuild complains about
+  CIRRUS_WORKING_DIR: "c:/cirrus"
+  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
+
+  # Avoids port conflicts between concurrent tap test runs
+  PG_TEST_USE_UNIX_SOCKETS: 1
+
+only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
+
+sysinfo_script: |
+  chcp
+  systeminfo
+  powershell -Command get-psdrive -psprovider filesystem
+  set
+
 task:
+  << : *WINDOWS_ENVIRONMENT_BASE
   name: Windows - Server 2019, VS 2019
 
   env:
-# Half the allowed per-user CPU cores
-CPUS: 4
-
 # Our windows infrastructure doesn't have test concurrency above the level
 # of a single vcregress test target. Due to that, it's useful to run prove
 # with multiple jobs. For the other tasks it isn't, because two sources
@@ -350,15 +366,11 @@ task:
 # likely can be improved upon further.
 PROVE_FLAGS: -j10 --timer
 
-# The default cirrus working dir is in a directory msbuild complains about
-CIRRUS_WORKING_DIR: "c:/cirrus"
 # Avoid re-installing over and over
 NO_TEMP_INSTALL: 1
 # git's tar doesn't deal with drive letters, see
 # https://postgr.es/m/b6782dc3-a7b0-ed56-175f-f8f54cb08d67%40dunslane.net
 TAR: "c:/windows/system32/tar.exe"
-# Avoids port conflicts between concurrent tap test runs
-PG_TEST_USE_UNIX_SOCKETS: 1
 PG_REGRESS_SOCK_DIR: "c:/cirrus/"
 # -m enables parallelism
 # verbosity:minimal + Summary reduce verbosity, while keeping a summary of
@@ -389,12 +401,6 @@ task:
 cpu: $CPUS
 memory: 4G
 
-  sysinfo_script: |
-chcp
-systeminfo
-powershell -Command get-psdrive -psprovider filesystem
-set
-
   setup_additional_packages_script: |
 REM choco install -y --no-progress ...
 
@@ -454,6 +460,51 @@ task:
   path: "crashlog-*.txt"
   type: text/plain
 
+task:
+  << : *WINDOWS_ENVIRONMENT_BASE
+  name: Windows - Server 2019, MinGW64
+  windows_container:
+image: $CONTAINER_REPO/windows_ci_mingw64:latest
+cpu: $CPUS
+memory: 4G
+  env:
+CCACHE_DIR: C:/msys64/ccache
+BUILD_DIR: "%CIRRUS_WORKING_DIR%/build"
+
+  ccache_cache:
+folder: ${CCACHE_DIR}
+
+  mingw_info_script:
+- C:\msys64\usr\bin\bash.exe -lc "where gcc"
+- C:\msys64\usr\bin\bash.exe -lc "gcc --version"
+- C:\msys64\usr\bin\bash.exe -lc "where perl"
+- C:\msys64\usr\bin\bash.exe -lc "perl --version"
+
+  configure_script:
+- C:\msys64\usr\bin\bash.exe -lc "mkdir %BUILD_DIR% &&
+  cd %BUILD_DIR% &&
+  %CIRRUS_WORKING_DIR%/configure
+--host=x86_64-w64-mingw32
+--enable-cassert
+--enable-tap-tests
+--with-icu
+--with-libxml
+--with-libxslt
+--with-lz4
+--enable-debug
+CC='ccache gcc'
+CXX='ccache g++'"
+
+  build_script:
+C:\msys64\usr\bin\bash.exe -lc "cd %BUILD_DIR% && make -s world-bin -j${CPUS}"
+
+  upload_caches: ccache
+
+  tests_script:
+  - set "NoDefaultCurrentDirectoryInExePath=0"
+  - C:\msys64\usr\bin\bash.exe -lc "cd %BUILD_DIR% && make -s ${CHECK} ${CHECKFLAGS} -j${CPUS} TMPDIR=%BUILD_DIR%/tmp_install"
+
+  on_failure: *on_failure
 
 task:
   name: CompilerWarnings
-- 
2.35.1.windows.2



Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Andrew Dunstan  writes:
> That seems quite weird. I'm not sure how it's getting loaded at all if
> not via shared_preload_libraries

Some other animals are showing this:

diff -U3 
/home/postgres/pgsql/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
 /home/postgres/pgsql/src/test/modules/test_oat_hooks/results/test_oat_hooks.out
--- 
/home/postgres/pgsql/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
2022-03-22 11:57:40.224991011 -0400
+++ 
/home/postgres/pgsql/src/test/modules/test_oat_hooks/results/test_oat_hooks.out 
2022-03-22 11:59:59.998983366 -0400
@@ -48,6 +48,8 @@
 SELECT * FROM regress_test_table;
 NOTICE:  in executor check perms: superuser attempting execute
 NOTICE:  in executor check perms: superuser finished execute
+NOTICE:  in executor check perms: superuser attempting execute
+NOTICE:  in executor check perms: superuser finished execute
  t 
 ---
 (0 rows)
@@ -95,6 +97,8 @@
   ^
 NOTICE:  in executor check perms: non-superuser attempting execute
 NOTICE:  in executor check perms: non-superuser finished execute
+NOTICE:  in executor check perms: non-superuser attempting execute
+NOTICE:  in executor check perms: non-superuser finished execute
  t 
 ---
 (0 rows)
@@ -168,6 +172,8 @@
   ^
 NOTICE:  in executor check perms: superuser attempting execute
 NOTICE:  in executor check perms: superuser finished execute
+NOTICE:  in executor check perms: superuser attempting execute
+NOTICE:  in executor check perms: superuser finished execute
  t 
 ---
 (0 rows)


I can duplicate that by adding "force_parallel_mode = regress"
to test_oat_hooks.conf, so a fair bet is that the duplication
comes from executing the same hook in both leader and worker.

regards, tom lane




Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/22/22 12:02, Tom Lane wrote:
> Andrew Dunstan  writes:
>> That seems quite weird. I'm not sure how it's getting loaded at all if
>> not via shared_preload_libraries
> Some other animals are showing this:
>
> diff -U3 
> /home/postgres/pgsql/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
>  
> /home/postgres/pgsql/src/test/modules/test_oat_hooks/results/test_oat_hooks.out
> --- 
> /home/postgres/pgsql/src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
>   2022-03-22 11:57:40.224991011 -0400
> +++ 
> /home/postgres/pgsql/src/test/modules/test_oat_hooks/results/test_oat_hooks.out
>2022-03-22 11:59:59.998983366 -0400
> @@ -48,6 +48,8 @@
>  SELECT * FROM regress_test_table;
>  NOTICE:  in executor check perms: superuser attempting execute
>  NOTICE:  in executor check perms: superuser finished execute
> +NOTICE:  in executor check perms: superuser attempting execute
> +NOTICE:  in executor check perms: superuser finished execute
>   t 
>  ---
>  (0 rows)
> @@ -95,6 +97,8 @@
>^
>  NOTICE:  in executor check perms: non-superuser attempting execute
>  NOTICE:  in executor check perms: non-superuser finished execute
> +NOTICE:  in executor check perms: non-superuser attempting execute
> +NOTICE:  in executor check perms: non-superuser finished execute
>   t 
>  ---
>  (0 rows)
> @@ -168,6 +172,8 @@
>^
>  NOTICE:  in executor check perms: superuser attempting execute
>  NOTICE:  in executor check perms: superuser finished execute
> +NOTICE:  in executor check perms: superuser attempting execute
> +NOTICE:  in executor check perms: superuser finished execute
>   t 
>  ---
>  (0 rows)
>
>
> I can duplicate that by adding "force_parallel_mode = regress"
> to test_oat_hooks.conf, so a fair bet is that the duplication
> comes from executing the same hook in both leader and worker.
>
>   



OK, thanks. My test didn't include that one setting :-(


If I can't come up with a very quick fix I'll revert it.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 9:09 AM, Andrew Dunstan  wrote:
> 
> If I can't come up with a very quick fix I'll revert it.

The problem is coming from REGRESS_exec_check_perms, which was included in
the patch to demonstrate when the other hooks fired relative to the
ExecutorCheckPerms_hook, but since it is causing problems, I can submit a patch
with that removed.  Give me a couple of minutes.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: LogwrtResult contended spinlock

2022-03-22 Thread Alvaro Herrera
On 2022-Mar-22, Tom Lane wrote:

> I looked briefly at 0001, and I've got to say that I disagree with
> your decision to rearrange the representation of the local LogwrtResult
> copy.  It clutters the patch tremendously and makes it hard to
> understand what the actual functional change is.  Moreover, I'm
> not entirely convinced that it's a notational improvement in the
> first place.
> 
> Perhaps it'd help if you split 0001 into two steps, one to do the
> mechanical change of the representation and then a second patch that
> converts the shared variable to atomics.  Since you've moved around
> the places that read the shared variable, that part is subtler than
> one could wish and really needs to be studied on its own.

Hmm, I did it the other way around: first the change to use atomics, then
the mechanical change.  I think that makes the usefulness of the change
more visible, because before the atomics change, the use of the combined
struct as a unit remains sensible.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
From dd9b53878faeedba921ea7027e98ddbb433e8647 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 2 Feb 2021 14:03:43 -0300
Subject: [PATCH v7 1/3] Change XLogCtl->LogwrtResult to use atomic ops
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Currently, access to LogwrtResult is protected by a spinlock.  This
becomes severely contended in some scenarios, such as with a largish
replication flock: walsenders all calling GetFlushRecPtr repeatedly
cause the processor to heat up to the point where eggs can be fried on top.

This can be reduced to a non-problem by replacing XLogCtl->LogwrtResult
with a struct containing a pair of atomically accessed variables.

In addition, this commit splits the process-local copy of these values
(kept in the freestanding LogwrtResult struct) into two separate
LogWriteResult and LogFlushResult.  This is not strictly necessary, but
it makes it clearer that these are updated separately, each on their own
schedule.

Author: Álvaro Herrera 
Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql
---
 src/backend/access/transam/xlog.c | 85 +++
 src/include/port/atomics.h| 29 +++
 src/tools/pgindent/typedefs.list  |  1 +
 3 files changed, 72 insertions(+), 43 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4ac3871c74..6f2eb494fe 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -285,16 +285,13 @@ static bool doPageWrites;
  *
  * LogwrtRqst indicates a byte position that we need to write and/or fsync
  * the log up to (all records before that point must be written or fsynced).
- * LogwrtResult indicates the byte positions we have already written/fsynced.
- * These structs are identical but are declared separately to indicate their
- * slightly different functions.
+ * LogWrtResult indicates the byte positions we have already written/fsynced.
+ * These structs are similar but are declared separately to indicate their
+ * slightly different functions; in addition, the latter is read and written
+ * using atomic operations.
  *
- * To read XLogCtl->LogwrtResult, you must hold either info_lck or
- * WALWriteLock.  To update it, you need to hold both locks.  The point of
- * this arrangement is that the value can be examined by code that already
- * holds WALWriteLock without needing to grab info_lck as well.  In addition
- * to the shared variable, each backend has a private copy of LogwrtResult,
- * which is updated when convenient.
+ * In addition to the shared variable, each backend has a private copy of
+ * LogwrtResult, each member of which is separately updated when convenient.
  *
  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
  * (protected by info_lck), but we don't need to cache any copies of it.
@@ -317,6 +314,12 @@ static bool doPageWrites;
  *--
  */
 
+typedef struct XLogwrtAtomic
+{
+	pg_atomic_uint64 Write;		/* last byte + 1 written out */
+	pg_atomic_uint64 Flush;		/* last byte + 1 flushed */
+} XLogwrtAtomic;
+
 typedef struct XLogwrtRqst
 {
 	XLogRecPtr	Write;			/* last byte + 1 to write out */
@@ -480,6 +483,7 @@ typedef struct XLogCtlData
 {
 	XLogCtlInsert Insert;
 
+	XLogwrtAtomic LogwrtResult; /* uses atomics */
 	/* Protected by info_lck: */
 	XLogwrtRqst LogwrtRqst;
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
@@ -497,12 +501,6 @@ typedef struct XLogCtlData
 	pg_time_t	lastSegSwitchTime;
 	XLogRecPtr	lastSegSwitchLSN;
 
-	/*
-	 * Protected by info_lck and WALWriteLock (you must hold either lock to
-	 * read it, but both to update)
-	 */
-	XLogwrtResult LogwrtResult;
-
 	/*
 	 * Latest initialized page in the cache (last byte position + 1).
 	 *
@@ -622,7 +620,7

Re: Partial aggregates pushdown

2022-03-22 Thread Alexander Pyhalov

Tomas Vondra wrote on 2022-03-22 15:28:

On 3/22/22 01:49, Andres Freund wrote:

On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote:

Alexander Pyhalov wrote on 2022-01-17 15:26:

Updated patch.


Sorry, missed attachment.


Needs another update: http://cfbot.cputube.org/patch_37_3369.log

Marked as waiting on author.



TBH I'm still not convinced this is the right approach. I've voiced this
opinion before, but to reiterate the main arguments:

1) It's not clear to me how could this get extended to aggregates with
more complex aggregate states, to support e.g. avg() and similar fairly
common aggregates.


Hi.
Yes, I'm also not sure how to proceed with aggregates with complex state.
Likely it needs a separate function to export their state, but then we
should somehow ensure that this function exists and that our 'importer'
can handle its result. Note that for now we have no mechanism in
postgres_fdw to find out the remote server version at planning stage.


2) I'm not sure relying on aggpartialpushdownsafe without any version
checks etc. is sufficient. I mean, how would we know the remote node has
the same idea of representing the aggregate state? I wonder how this
aligns with assumptions we make e.g. for functions etc.


This doesn't seem to be a problem to me, as for now we don't care about
the remote node's internal aggregate state representation.
We currently get just the aggregate result from the remote node. For
aggregates with 'internal' stype we call the converter locally, and it
converts the external result from the aggregate return type to the local
node's internal representation.



Aside from that, there's a couple review comments:

1) should not remove the comment in foreign_expr_walker


Fixed.



2) comment in deparseAggref is obsolete/inaccurate


Fixed.



3) comment for partial_agg_ok should probably explain when we consider
aggregate OK to be pushed down


Expanded comment.


4) I'm not sure why get_rcvd_attinmeta comment talks about "return type
bytea" and "real input type".


Expanded comment.  The tupdesc can be retrieved from
node->ss.ss_ScanTupleSlot, and so we expect to see bytea (as should be
produced by partial aggregation).
But when we scan data, we get the aggregate output type (which matches
the converter input type), so attinmeta should be fixed.
If we deal with an aggregate which doesn't have a converter,
partial_agg_ok() ensures that the agg->aggfnoid return type matches
agg->aggtranstype.



5) Talking about "partial" aggregates is a bit confusing, because that
suggests this is related to actual "partial aggregates". But it's not.


What should we call them, then? This is about pushing "Partial count()" or
"Partial sum()" to the remote server, so why is it not related to partial
aggregates? Do you mean that it's not about parallel aggregate processing?



6) Can add_foreign_grouping_paths do without the new 'partial'
parameter? Clearly, it can be deduced from extra->patype, no?


Fixed this.



7) There's no docs for PARTIALCONVERTERFUNC / PARTIAL_PUSHDOWN_SAFE in
CREATE AGGREGATE sgml docs.


Added documentation. I'd appreciate advice on how it should be extended.



8) I don't think "serialize" in the converter functions is the right
term, considering those functions are not "serializing" anything. If
anything, it's the remote node that is serializing the agg state and the
local node is deserializing it. Or maybe I just misunderstand where the
converter functions are executed?


The converter function transforms the aggregate result into the serialized
internal representation that is expected from a partial aggregate. I mean,
it converts the aggregate result type to the internal representation and
then effectively executes the serialization code
(i.e. converter(x) == serialize(to_internal(x))).

--
Best regards,
Alexander Pyhalov,
Postgres Professional
From 7ad4eacf017a4fd3793f0949ef43ccc8292bf3f6 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Thu, 14 Oct 2021 17:30:34 +0300
Subject: [PATCH] Partial aggregates push down

---
 contrib/postgres_fdw/deparse.c|  63 +-
 .../postgres_fdw/expected/postgres_fdw.out| 185 -
 contrib/postgres_fdw/postgres_fdw.c   | 187 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  27 ++-
 doc/src/sgml/ref/create_aggregate.sgml|  27 +++
 src/backend/catalog/pg_aggregate.c|  28 ++-
 src/backend/commands/aggregatecmds.c  |  23 ++-
 src/backend/utils/adt/numeric.c   |  96 +
 src/bin/pg_dump/pg_dump.c |  20 +-
 src/include/catalog/pg_aggregate.dat  | 110 ++-
 src/include/catalog/pg_aggregate.h|  10 +-
 src/include/catalog/pg_proc.dat   |   6 +
 src/test/regress/expected/oidjoins.out|   1 +
 13 files changed, 702 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..272e2391d7f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -197,6 +197,7 @@ s

Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger


> On Mar 22, 2022, at 9:11 AM, Mark Dilger  wrote:
> 
> Give me a couple minutes



v1-0001-Fix-build-farm-failures-in-test_oat_hooks.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Mark Dilger  writes:
> The problem is coming from the REGRESS_exec_check_perms, which was included 
> in the patch to demonstrate when the other hooks fired relative to the 
> ExecutorCheckPerms_hook, but since it is causing problems, I can submit a 
> patch with that removed.  Give me a couple minutes

Maybe better to suppress the audit messages if in a parallel worker?

regards, tom lane




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-22 Thread Bharath Rupireddy
On Sat, Mar 19, 2022 at 5:18 AM Andres Freund  wrote:
>
> Hi,
>
> First look at this patch, so I might be repeating stuff already commented on /
> discussed.

Thanks for taking a look at the patch.

> On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote:
> > +--
> > +-- pg_get_raw_wal_record()
>
> What is raw about the function?

Right now it gives the data starting from the output of XLogReadRecord,
up to XLogRecGetTotalLen(xlogreader) bytes in length. Given that
XLogReadRecord returns a pointer to the decoded record's header, I'm not
sure it's the right choice. Actually, this function's intention (not an
immediate use-case, though) is to feed the WAL record to another function
and then, say, repair a corrupted page given a base data page.

As I said upthread, I'm open to removing this function for now; when a
realistic need comes we can add it back. It also raised some concerns
around security and permissions. Thoughts?

> Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
> NULL lsn, does it?  Also, that's the default, why is it restated here?

pg_get_wal_records_info needed that option (with end_lsn defaulting to
NULL, it provides WAL info up to the end of WAL). Also, we can emit a
better error message ("invalid WAL start LSN") instead of a generic one.
I wanted to keep the error message and code the same across all the
functions, hence the CALLED ON NULL INPUT option for pg_get_raw_wal_record.

> > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
> > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;
>
> I don't think it's appropriate for pg_monitor to see all the data in the WAL.

How about pg_read_server_files or some other?

> > +-- pg_get_wal_stats()
>
> This seems like an exceedingly expensive way to compute this. Not just because
> of doing the grouping, window etc, but also because it's serializing the
> "data" field from pg_get_wal_records_info() just to never use it. With any
> appreciable amount of data the return value of pg_get_wal_records_info() will
> be serialized into an on-disk tuplestore.
>
> This is probably close to an order of magnitude slower than pg_waldump
> --stats. Which imo renders this largely useless.

Yeah, that's true. Do you suggest making pg_get_wal_stats() a
C function like in the v8 patch [1]?

See some numbers at [2] comparing pg_get_wal_stats using
pg_get_wal_records_info against pg_get_wal_records_info as a direct
C function like in the v8 patch [1]. A direct C function always fares
better (84 msec vs 1400 msec).

> The column names don't seem great either. "tfpil"?

"total fpi length" - tfpil wanted to keep it short - it's just an
internal column name isn't it? The actual column name the user sees is
fpi_length.

> > +void
> > +_PG_init(void)
>
> > +void
> > +_PG_fini(void)
>
> Why have this stuff if it's not used?

I kept it as a placeholder for future code additions; for instance,
test_decoding.c and ssl_passphrase_func.c have empty _PG_init(),
_PG_fini(). If okay, I can add a note there like "placeholder for now";
otherwise I can remove it.

> > +static XLogRecPtr
> > +ValidateInputLSN(XLogRecPtr lsn)
>
> > +static XLogRecPtr
> > +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn)
> > +{
>
> These two functions are largely redundant, that doesn't seem great.

I will modify it in the next version.

> > +Datum
> > +pg_get_raw_wal_record(PG_FUNCTION_ARGS)
>
> Most of this has another copy in pg_get_wal_record_info(). Can more of this be
> deduplicated?

I will, once we decide whether or not to have the pg_get_raw_wal_record
function at all. Please see my comments above.

> > + initStringInfo(&temp);
> > + desc->rm_desc(&temp, record);
> > + appendStringInfo(&rec_desc, "%s", temp.data);
> > + pfree(temp.data);
> > + initStringInfo(&rec_blk_ref);
>
> This seems unnecessarily wasteful. You serialize into one stringinfo, just to
> then copy that stringinfo into another stringinfo. Just to then allocate yet
> another stringinfo.

Yeah, I will remove it. It looks like all the rm_desc callbacks append to
the passed-in buffer and don't reset it, so we should be good.

> > + /* Block references (detailed format). */
>
> This comment seems copied from pg_waldump, but doesn't make sense here,
> because there's no short format.

Yes, I will remove it.

> > + for (block_id = 0; block_id <= record->max_block_id; block_id++)
> > + {
>
> To me duplicating this much code from waldump seems like a bad idea from a
> maintainability POV.

Even if we were to put the above code from pg_walinspect and
pg_waldump into, say, walutils.c or some other existing file, we would
have to do an if (pg_walinspect) appendStringInfo() else if (pg_waldump)
printf() sort of thing there; isn't that clumsy? Also, unnecessary if
conditions would need to be executed for every record. For maintainability,
I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
things in both places (of course this might sound like the dumbest way of
doing it, but IMHO it's sensible, given the if(pg_walinspect)-else
if(pg_waldump) sorts of code that we would need in the common functions).
Thoughts?

Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 3/22/22 11:26, Mark Dilger wrote:
>> culicidae is complaining:
>> 2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  test_oat_hooks must be loaded via shared_preload_libraries

> That seems quite weird. I'm not sure how it's getting loaded at all if
> not via shared_preload_libraries

After checking culicidae's config, I've duplicated this failure
by building with EXEC_BACKEND defined.  So I'd opine that there
is something broken about the method test_oat_hooks uses to
decide if it was loaded via shared_preload_libraries or not.
(Note that the failures appear to be coming out of auxiliary
processes such as the checkpointer.)

As a quick-n-dirty fix to avoid reverting the entire test module,
perhaps just delete this error check for now.

regards, tom lane




Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger


> On Mar 22, 2022, at 9:33 AM, Tom Lane  wrote:
> 
> Andrew Dunstan  writes:
>> On 3/22/22 11:26, Mark Dilger wrote:
>>> culicidae is complaining:
>>> 2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  test_oat_hooks must be loaded via shared_preload_libraries
> 
>> That seems quite weird. I'm not sure how it's getting loaded at all if
>> not via shared_preload_libraries
> 
> After checking culicidae's config, I've duplicated this failure
> by building with EXEC_BACKEND defined.  So I'd opine that there
> is something broken about the method test_oat_hooks uses to
> decide if it was loaded via shared_preload_libraries or not.
> (Note that the failures appear to be coming out of auxiliary
> processes such as the checkpointer.)
> 
> As a quick-n-dirty fix to avoid reverting the entire test module,
> perhaps just delete this error check for now.

Ok, done as you suggest:



v2-0001-Fix-buildfarm-test-failures-in-test_oat_hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Mark Dilger  writes:
> On Mar 22, 2022, at 9:33 AM, Tom Lane  wrote:
>> As a quick-n-dirty fix to avoid reverting the entire test module,
>> perhaps just delete this error check for now.

> Ok, done as you suggest:

I only suggested removing the error check in _PG_init, not
changing the way the test works.

regards, tom lane




Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/22/22 12:58, Tom Lane wrote:
> Mark Dilger  writes:
>> On Mar 22, 2022, at 9:33 AM, Tom Lane  wrote:
>>> As a quick-n-dirty fix to avoid reverting the entire test module,
>>> perhaps just delete this error check for now.
>> Ok, done as you suggest:
> I only suggested removing the error check in _PG_init, not
> changing the way the test works.
>
>   



Mark and I discussed this offline, and decided there was no requirement
for the module to be preloaded. Do you have a different opinion?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 9:58 AM, Tom Lane  wrote:
> 
>> Ok, done as you suggest:
> 
> I only suggested removing the error check in _PG_init, not
> changing the way the test works.

I should have been more explicit and said, "done as y'all suggest".  The "To" 
line of that email was to you and Andrew.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 3/22/22 12:58, Tom Lane wrote:
>> I only suggested removing the error check in _PG_init, not
>> changing the way the test works.

> Mark and I discussed this offline, and decided there was no requirement
> for the module to be preloaded. Do you have a different opinion?

No, I was actually about to make the same point: it seems to me there
are arguable use-cases for loading it shared, loading it per-session
(perhaps via ALTER USER SET or ALTER DATABASE SET to target particular
users/DBs), or even manually LOADing it.  So the module code should
not be prejudging how it's used.

On reflection, I withdraw my complaint about changing the way the
test script loads the module.  Getting rid of the need for a custom
.conf file simplifies the test module, and that seems good.
So I'm on board with Mark's patch now.

regards, tom lane




LockAcquireExtended() dontWait vs weaker lock levels than already held

2022-03-22 Thread Andres Freund
Hi,

When LockAcquireExtended(dontWait=false) acquires a lock where we already hold
stronger lock and somebody else is also waiting for that lock, it goes through
a fairly circuitous path to acquire the lock:

A conflicting lock is detected: if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
LockAcquireExtended() -> WaitOnLock() -> ProcSleep()
ProcSleep() follows this special path:
 * Special case: if I find I should go in front of some waiter, check to
 * see if I conflict with already-held locks or the requests before that
 * waiter.  If not, then just grant myself the requested lock immediately.
and grants the lock.


However, in dontWait mode, there's no such path. Which means that
LockAcquireExtended() returns LOCKACQUIRE_NOT_AVAIL despite the fact that 
dontWait=false
would succeed in granting us the lock.

This seems decidedly suboptimal.

For one, the code flow to acquire a lock we already hold in some form is
unnecessarily hard to understand and expensive. There's no comment in
LockAcquireExtended() explaining that WaitOnLock() might immediately grant us
the lock in that case, we emit bogus TRACE_POSTGRESQL_LOCK_WAIT_START() etc.

For another, LockAcquireExtended(dontWait=true) spuriously reporting the lock
as unavailable is quite confusing behaviour, and could quite plausibly cause
bugs in fairly random places.


Not planning to do anything about this for now, but I did want something I can
find if I hit such a problem in the future...


Greetings,

Andres Freund




Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/22/22 13:08, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 3/22/22 12:58, Tom Lane wrote:
>>> I only suggested removing the error check in _PG_init, not
>>> changing the way the test works.
>> Mark and I discussed this offline, and decided there was no requirement
>> for the module to be preloaded. Do you have a different opinion?
> No, I was actually about to make the same point: it seems to me there
> are arguable use-cases for loading it shared, loading it per-session
> (perhaps via ALTER USER SET or ALTER DATABASE SET to target particular
> users/DBs), or even manually LOADing it.  So the module code should
> not be prejudging how it's used.
>
> On reflection, I withdraw my complaint about changing the way the
> test script loads the module.  Getting rid of the need for a custom
> .conf file simplifies the test module, and that seems good.
> So I'm on board with Mark's patch now.
>
>   



OK, I have pushed that.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-22 Thread Andres Freund
Hi,

On 2022-03-22 21:57:51 +0530, Bharath Rupireddy wrote:
> On Sat, Mar 19, 2022 at 5:18 AM Andres Freund  wrote:
> > On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote:
> > > +--
> > > +-- pg_get_raw_wal_record()
> >
> > What is raw about the function?
> 
> Right now it gives data starting from the output of XLogReadRecord,
> up to XLogRecGetTotalLen(xlogreader) bytes. Given that XLogReadRecord
> returns a pointer to the decoded record's header, I'm not sure it's
> the right choice. Actually, this function's intention (not an immediate
> use-case though) is to feed the WAL record to another function and
> then, say, repair a corrupted page given a base data page.
> 
> As I said upthread, I'm open to removing this function for now, when a
> realistic need comes we can add it back. It also raised some concerns
> around the security and permissions. Thoughts?

I'm ok with having it with appropriate permissions, I just don't like the
name.


> > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
> > NULL lsn, does it?  Also, that's the default, why is it restated here?
> 
> pg_get_wal_records_info needed that option (with end_lsn defaulting to
> NULL, it provides WAL info up to the end of WAL). Also, we can emit a
> better error message ("invalid WAL start LSN") instead of a generic one.
> I wanted to keep the error message and code the same across all the
> functions, hence the CALLED ON NULL INPUT option for pg_get_raw_wal_record.

I think it should be strict if it behaves strict. I fail to see what
consistency in error messages is worth here. And I'd probably just create two
different functions for begin and begin & end LSN and mark those as strict as
well.


> > > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
> > > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;
> >
> > I don't think it's appropriate for pg_monitor to see all the data in the WAL.
> 
> How about pg_read_server_files or some other?

That seems more appropriate.


> > > +-- pg_get_wal_stats()
> >
> > This seems like an exceedingly expensive way to compute this. Not just
> > because of doing the grouping, window etc, but also because it's serializing the
> > "data" field from pg_get_wal_records_info() just to never use it. With any
> > appreciable amount of data the return value of pg_get_wal_records_info() will
> > be serialized into an on-disk tuplestore.
> >
> > This is probably close to an order of magnitude slower than pg_waldump
> > --stats. Which imo renders this largely useless.
> 
> Yeah that's true. Do you suggest having pg_get_wal_stats() a
> c-function like in v8 patch [1]?

Yes.


> SEe some numbers at [2] with pg_get_wal_stats using
> pg_get_wal_records_info and pg_get_wal_records_info as a direct
> c-function like in v8 patch [1]. A direct c-function always fares
> better (84 msec vs 1400msec).

That indeed makes the view as is pretty much useless. And it'd probably be
worse in a workload with longer records / many FPIs.


> > > +void
> > > +_PG_init(void)
> >
> > > +void
> > > +_PG_fini(void)
> >
> > Why have this stuff if it's not used?
> 
> I kept it as a placeholder for future code additions, for instance
> test_decoding.c and ssl_passphrase_func.c has empty _PG_init(),
> _PG_fini(). If okay, I can mention there like "placeholder for now",
> otherwise I can remove it.

That's not comparable, the test_decoding case has it as a placeholder because
it serves as a template to create further output plugins. Something not the
case here. So please remove.


> > > + for (block_id = 0; block_id <= record->max_block_id; block_id++)
> > > + {
> >
> > To me duplicating this much code from waldump seems like a bad idea from a
> > maintainability POV.
> 
> Even if we were to put the above code from pg_walinspect and
> pg_waldump into, say, walutils.c or some other existing file, there we
> had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
> printf() sort of thing, isn't it clumsy?

Why is that needed? Just use the stringinfo in both places? You're outputting
the exact same thing in both places right now. There's already a stringinfo in
XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogdump was
written), so you could just convert at least the relevant printfs in
XLogDumpDisplayRecord().


> Also, unnecessary if
> conditions need to be executed for every record. For maintainability,
> I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
> things in both places (of course this might sound dumbest way of doing
> it, IMHO, it's sensible, given the if(pg_walinspect)-else
> if(pg_waldump) sorts of code that we need to do in the common
> functions). Thoughts?

IMO we shouldn't merge this with as much duplication as there is right now,
the notes don't change that it's a PITA to maintain.


> Yeah. It is to handle some edge cases to print the WAL up to end_lsn
> and avoid waiting in read_local_xlog_page. I will chan

Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Andrew Dunstan  writes:
> OK, I have pushed that.

It seems like you could remove the NO_INSTALLCHECK restriction
too.  You already removed the comment defending it, and it
seems to work fine as an installcheck now if I remove that
locally.

Other nitpicks:

* the IsParallelWorker test could use a comment

* I notice a typo "permisisons" in test_oat_hooks.sql

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2022-03-22 Thread Robert Haas
On Mon, Mar 21, 2022 at 9:32 PM Mark Dilger
 wrote:
> [ new patch version ]

This patch adds three new arguments to processSQLNamePattern() and
documents one of them. It adds three new parameters to
patternToSQLRegex() as well, and documents none of them. I think that
the text of the comment might need some updating too, in particular
the sentence "Additional dots in the name portion are not treated as
special."

There are no comments explaining the left_is_literal stuff. It appears
that your intention here is that if the pattern string supplied by the
user contains any of *?|+()[]{}.^\ not surrounded by double-quotes, we
signal the caller. Some callers then use this to issue a complaint
that the database name must be a literal. To me, this behavior doesn't
really make sense. If something is a literal, that means we're not
going to interpret the special characters that it contains. Here, we
are interpreting the special characters just so we can complain that
they exist. It seems to me that a simpler solution would be to not
interpret them at all. I attach a patch showing what I mean by that.
It just rips out the dbname_is_literal stuff in favor of doing nothing
at all. To put the whole thing another way, if the user types "\d
}.public.ft", your code wants to complain about the fact that the user
is trying to use regular expression characters in a place where they
are not allowed to do that. I argue that we should instead just be
comparing "}" against the database name and see whether it happens to
match.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0002-Remove-dbname_is_literal-stuff.patch
Description: Binary data


0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch
Description: Binary data


Re: [PATCH] Proof of concept for GUC improvements

2022-03-22 Thread David Christensen


> On Mar 21, 2022, at 7:53 PM, Tom Lane  wrote:
> 
> Andres Freund  writes:
>> My impression is that there's not a lot of enthusiasm for the concept? If
>> that's true we maybe ought to mark the CF entry as rejected?
> 
> Yeah, I'm kind of leaning that way too.  I don't see how we can
> incorporate the symbolic values into any existing display paths
> without breaking applications that expect the old output.
> That being the case, it seems like we'd have "two ways to do it"
> indefinitely, which would add enough confusion that I'm not
> sure there's a net gain.  In particular, I foresee novice questions
> along the lines of "I set foo to disabled, why is it showing
> as zero?"

Yeah, my main motivation here was to have fewer special values in the config
files, but I guess it would effectively be making everything an enum while
still relying on knowing just what the specific magic values are, so not
really a net gain in this department.

For the record, I thought this would have a fairly low chance of getting in, 
was mainly curious what level of effort it would take to get something like 
this going and get some feedback on the actual idea. 

> If we'd done it like this from the beginning, it'd have been
> great, but retrofitting it now is a lot less appealing.

Yeah, agreed on this. As far as I’m concerned we can reject. 

Thanks,

David



Re: LockAcquireExtended() dontWait vs weaker lock levels than already held

2022-03-22 Thread Robert Haas
On Tue, Mar 22, 2022 at 1:43 PM Andres Freund  wrote:
> When LockAcquireExtended(dontWait=false) acquires a lock where we already hold
> stronger lock and somebody else is also waiting for that lock, it goes through
> a fairly circuitous path to acquire the lock:
>
> A conflicting lock is detected: if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
> LockAcquireExtended() -> WaitOnLock() -> ProcSleep()
> ProcSleep() follows this special path:
>  * Special case: if I find I should go in front of some waiter, check to
>  * see if I conflict with already-held locks or the requests before that
>  * waiter.  If not, then just grant myself the requested lock immediately.
> and grants the lock.

I think this happens because lock.c is trying to imagine a world in
which we don't know anything a priori about which locks are stronger
or weaker than others and everything is deduced from the conflict
matrix. I think at some point in time someone believed that we might
use different conflict matrixes for different lock types. With an
arbitrary conflict matrix, "stronger" and "weaker" aren't even
necessarily well-defined ideas: A could conflict with B, B with C, and
C with A, or something crazy like that. It seems rather unlikely to me
that we'd ever do such a thing at this point. In fact, there are a lot
of things in lock.c that we'd probably do differently if we were doing
that work over.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: LogwrtResult contended spinlock

2022-03-22 Thread Alvaro Herrera
So I've been wondering about this block at the bottom of XLogWrite:

/*
 * Make sure that the shared 'request' values do not fall behind the
 * 'result' values.  This is not absolutely essential, but it saves some
 * code in a couple of places.
 */
{
SpinLockAcquire(&XLogCtl->info_lck);
if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
SpinLockRelease(&XLogCtl->info_lck);
}

I just noticed that my 0001 makes the comment a lie: it is now quite
possible that 'result' is advanced beyond 'request'.  Before the patch
that never happened because they were both advanced in the region locked
by the spinlock.

I think we could still maintain this promise if we just moved this
entire block before the first pg_atomic_monotonic_advance_u64 setting
XLogCtl->LogwrtResult.Write.  Or we could split the block in two, and put
one acquire/test/set/release stanza before each monotonic increase of
the corresponding variable.


However, I wonder if this is still necessary.  This code was added in
4d14fe0048c (March 2001) and while everything else was quite different
back then, this hasn't changed at all.  I can't quite figure out what
are those "couple of places" that would need additional code if this
block is just removed.  I tried running the tests (including
wal_consistency_checking), and nothing breaks.  Reading the code
surrounding the other accesses of XLogCtl->LogwrtRqst, there's nothing
that looks to me like it depends on these values not lagging behind
LogwrtResult.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."   (New York Times, about Microsoft PowerPoint)




Re: LockAcquireExtended() dontWait vs weaker lock levels than already held

2022-03-22 Thread Andres Freund
Hi,

On 2022-03-22 14:20:55 -0400, Robert Haas wrote:
> On Tue, Mar 22, 2022 at 1:43 PM Andres Freund  wrote:
> > When LockAcquireExtended(dontWait=false) acquires a lock where we already hold
> > stronger lock and somebody else is also waiting for that lock, it goes through
> > a fairly circuitous path to acquire the lock:
> >
> > A conflicting lock is detected: if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
> > LockAcquireExtended() -> WaitOnLock() -> ProcSleep()
> > ProcSleep() follows this special path:
> >  * Special case: if I find I should go in front of some waiter, check to
> >  * see if I conflict with already-held locks or the requests before that
> >  * waiter.  If not, then just grant myself the requested lock immediately.
> > and grants the lock.
>
> I think this happens because lock.c is trying to imagine a world in
> which we don't know anything a priori about which locks are stronger
> or weaker than others and everything is deduced from the conflict
> matrix. I think at some point in time someone believed that we might
> use different conflict matrixes for different lock types. With an
> arbitrary conflict matrix, "stronger" and "weaker" aren't even
> necessarily well-defined ideas: A could conflict with B, B with C, and
> C with A, or something crazy like that. It seems rather unlikely to me
> that we'd ever do such a thing at this point. In fact, there are a lot
> of things in lock.c that we'd probably do differently if we were doing
> that work over.

We clearly already know how to compute whether a lock is "included" in
something we already hold - after all ProcSleep() successfully does so.

Isn't it a pretty trivial test? Seems like it'd boil down to something like

acquireMask = lockMethodTable->conflictTab[lockmode];
if ((MyProc->heldLocks & acquireMask) == acquireMask)
/* already hold lock conflicting with it, grant the new lock to myself) */
else
/* current behaviour */

LockCheckConflicts() mostly knows how to deal with this. It's just that we don't
even use LockCheckConflicts() if a lock acquisition conflicts with waitMask:

/*
 * If lock requested conflicts with locks requested by waiters, must join
 * wait queue.  Otherwise, check for conflict with already-held locks.
 * (That's last because most complex check.)
 */
if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
found_conflict = true;
else
found_conflict = LockCheckConflicts(lockMethodTable, lockmode,
lock, proclock);

Yes, there's more deadlocks that can be solved by queue reordering, but the
simple cases that ProcSleep() handles don't seem problematic to solve in
lock.c directly either...

Greetings,

Andres Freund




Re: [PATCH] Proof of concept for GUC improvements

2022-03-22 Thread Andres Freund
On 2022-03-22 13:15:34 -0500, David Christensen wrote:
> > On Mar 21, 2022, at 7:53 PM, Tom Lane  wrote:
> > If we'd done it like this from the beginning, it'd have been
> > great, but retrofitting it now is a lot less appealing.
>
> Yeah, agreed on this. As far as I’m concerned we can reject.

Updated CF entry to say so...




Re: LockAcquireExtended() dontWait vs weaker lock levels than already held

2022-03-22 Thread Robert Haas
On Tue, Mar 22, 2022 at 3:01 PM Andres Freund  wrote:
> We clearly already know how to compute whether a lock is "included" in
> something we already hold - after all ProcSleep() successfully does so.
>
> Isn't it a pretty trivial test? Seems like it'd boil down to something like

I don't mind you fixing the behavior. I just couldn't pass up an
opportunity to complain about the structure of lock.c.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2022-03-22 Thread Alvaro Herrera
On 2022-Mar-19, Tomas Vondra wrote:

> @@ -174,7 +182,13 @@ ALTER PUBLICATION noinsert SET (publish = 'update, 
> delete');
>
> Add some tables to the publication:
>  
> -ALTER PUBLICATION mypublication ADD TABLE users, departments;
> +ALTER PUBLICATION mypublication ADD TABLE users (user_id, firstname), 
> departments;
> +
> +
> +  
> +   Change the set of columns published for a table:
> +
> +ALTER PUBLICATION mypublication SET TABLE users (user_id, firstname, 
> lastname), TABLE departments;
>  
>  
>

Hmm, it seems to me that if you've removed the feature to change the set
of columns published for a table, then the second example should be
removed as well.

> +/*
> + * Transform the publication column lists expression for all the relations
> + * in the list.
> + *
> + * XXX The name is a bit misleading, because we don't really transform
> + * anything here - we merely check the column list is compatible with the
> + * definition of the publication (with publish_via_partition_root=false)
> + * we only allow column lists on the leaf relations. So maybe rename it?
> + */
> +static void
> +TransformPubColumnList(List *tables, const char *queryString,
> +bool pubviaroot)
> +{

I agree with renaming this function.  Maybe CheckPubRelationColumnList() ?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"This is a foot just waiting to be shot"(Andrew Dunstan)




Re: MDAM techniques and Index Skip Scan patch

2022-03-22 Thread Dmitry Dolgov
> On Mon, Mar 21, 2022 at 06:34:09PM -0700, Andres Freund wrote:
>
> > I don't mind having all of the alternatives under the same CF item, only
> > one being tested seems to be only a small limitation of cfbot.
>
> IMO it's pretty clear that having "duelling" patches below one CF entry is a
> bad idea. I think they should be split, with inactive approaches marked as
> returned with feedback or whatnot.

On the other hand even for patches with dependencies (i.e. the patch A
depends on the patch B) different CF items cause a lot of confusion for
reviewers. I guess for various flavours of the same patch it would be
even worse. But I don't have a strong opinion here.

> Either way, currently this patch fails on cfbot due to a new GUC:
> https://api.cirrus-ci.com/v1/artifact/task/5134905372835840/log/src/test/recovery/tmp_check/regression.diffs
> https://cirrus-ci.com/task/5134905372835840

This seems to be easy to solve.
From 5bae9fdf8b74e5996b606e78f8b2a5fb327e011b Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Mon, 17 May 2021 11:47:07 +0200
Subject: [PATCH v41 1/6] Unique expressions

Extend the PlannerInfo and Path structures with the list of relevant unique
expressions. It specifies which keys must be unique at the query
level, and allows leveraging this at the path level. At the moment
only distinctClause makes use of this mechanism, which enables potential
use of index skip scan.

Originally proposed by David Rowley, based on the UniqueKey patch
implementation from Andy Fan; contains a few bits from previous versions
by Jesper Pedersen and Floris Van Nee.
---
 src/backend/nodes/list.c| 31 +
 src/backend/optimizer/path/Makefile |  3 +-
 src/backend/optimizer/path/pathkeys.c   | 62 +
 src/backend/optimizer/path/uniquekeys.c | 92 +
 src/backend/optimizer/plan/planner.c| 36 +-
 src/backend/optimizer/util/pathnode.c   | 32 ++---
 src/include/nodes/pathnodes.h   |  5 ++
 src/include/nodes/pg_list.h |  1 +
 src/include/optimizer/pathnode.h|  1 +
 src/include/optimizer/paths.h   |  9 +++
 10 files changed, 261 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index f843f861ef..a53a50f372 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -1653,3 +1653,34 @@ list_oid_cmp(const ListCell *p1, const ListCell *p2)
 		return 1;
 	return 0;
 }
+
+/*
+ * Return true iff every entry in "members" list is also present
+ * in the "target" list.
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+found = true;
+break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 86a35cdef1..e2be1fbf90 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@
 #include "utils/lsyscache.h"
 
 
+static bool pathkey_is_unique(PathKey *new_pathkey, List *pathkeys);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 			 RelOptInfo *partrel,
@@ -96,6 +97,29 @@ make_canonical_pathkey(PlannerInfo *root,
 	return pk;
 }
 
+/*
+ * pathkey_is_unique
+ *		Checks if the new pathkey's equivalence class is the same as that of
+ *		any existing member of the pathkey list.
+ */
+static bool
+pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
+{
+	EquivalenceClass *new_ec = new_pathkey->pk_eclass;
+	ListCell   *lc;
+
	/* If the same EC is already in the list, then not unique */
+	foreach(lc, pathkeys)
+	{
+		PathKey*old_pathkey = (PathKey *) lfirst(lc);
+
+		if (new_ec == old_pathkey->pk_eclass)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * pathkey_is_redundant
  *	   Is a pathkey redundant with one already in the given list?
@@ -1152,6 +1176,44 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	return pathkeys;
 }
 
+/*
+ * make_pathkeys_for_uniquekeyclauses
+ *		Generate a pathkeys list to be used for uniquekey clauses
+ */
+List *
+make_pathkeys_for_uniquekeys(PlannerInfo *root,
+			 List *sortclauses,
+			 List 

Re: Optimize external TOAST storage

2022-03-22 Thread Robert Haas
On Thu, Mar 17, 2022 at 4:05 PM Nathan Bossart  wrote:
> On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> > Right, so perhaps the ultimate thing here would be a more fine-grained
> > knob than SET STORAGE EXTERNAL -- something that allows you to specify
> > that you want to compress only when it really helps. While some people
> > might find that useful, I think the current patch is less ambitious,
> > and I think that's OK. It just wants to save something in the cases
> > where it's basically free. Unfortunately we've learned that it's never
> > *entirely* free because making the last TOAST chunk longer can always
> > cost you something, even if it gets longer by only 1 byte. But for
> > larger values it's hard for that to be significant.
>
> I guess I think we should be slightly more ambitious.  One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95.  This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed.  Like
> default_toast_compression, this could also be overridden at the column
> level with something like
>
> ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;
>
> If the current "basically free" patch is intended for v15, then maybe all
> this extra configurability stuff could wait for a bit, especially if we can
> demonstrate a decent read performance boost with a more conservative
> setting.  However, I don't see anything terribly complicated about the
> proposed configurability changes (assuming my proposal makes some amount of
> sense to you and others).

We seem to have a shortage of "others" showing up with opinions on
this topic, but I guess I'm not very confident about the general
utility of such a setting. Just to be clear, I'm also not very
confident about the usefulness of the existing settings for
controlling TOAST. Why is it useful default behavior to try to get
rows down to 2kB by default, rather than 1.8kB or 3kB? Even more, why
don't we try to compress primarily based on the length of individual
attributes and then compress further only if the resulting tuple
doesn't fit into a page at all? There doesn't seem to be anything
magic about fitting tuples into a quarter-page, yet the code acts as
though that's the primary goal - and then, when that didn't turn out
to work well in all cases, we added a user-settable parameter
(toast_tuple_target) to let you say you really want tuples in table X
to fit into a third of a page or a fifth of a page instead of a
quarter. And there's some justification for that: a proposal to
fundamentally change the algorithm would likely have gotten bogged
down for years, and the parameter no doubt lets you solve some
problems. Yet if the whole algorithm is wrong, and I think maybe it
is, then being able to change the constants is not really getting us
where we need to be.

Then, too, I'm not very confident about the usefulness of EXTENDED,
EXTERNAL, and MAIN. I think it's useful to be able to categorically
disable compression (as EXTERNAL does), but you can't categorically
disable out-of-line storage because the value can be bigger than the
page, so MAIN vs. EXTENDED is just changing the threshold for the use
of out-of-line storage. However, it does so in a way that IMHO is not
particularly intuitive, which goes back to my earlier point about the
algorithm seeming wacky, and it's in any case unclear why we should
offer exactly two possible thresholds and not any of the intermediate
values.

I think I'm prepared to concede that the setting you are proposing
here is easier to understand than either of those. Saying "store this
value compressed only if you can thereby reduce the size by at least
10%" is an understandable directive which I think will make sense even
to people who know very little about PostgreSQL or databases
generally, as long as they have some vague idea how compression works.
However, I guess I am not confident that the setting would get much
use, or that it is the best way to measure what we care about. For
instance, if my value is 19kB in length, saving 10% isn't very
impactful, because it's still the same number of chunks, and the only
benefit I gain is that the last chunk is smaller, which is probably
not very impactful because I'm still going to have 8 full chunks. I
will gain more if the value is bigger or smaller. Going up, if I
increase the size to 19MB or 190MB, I'm really saving a very
significant amount of cache space and, potentially, I/O. If I was
happy with a 10% savings at 19kB, I will likely be happy with a
considerably smaller economy here. Going down, if I decrease the size
to 1.9kB, I'm not storing any full-sized chunks any more, so reducing
the size of the one partial chunk that I'm storing seems like it could
have a real impact, provided of course that other toasted values in
the same table are similarly small. If the whole TOAST table is full
of just partial chunks, th

Re: MDAM techniques and Index Skip Scan patch

2022-03-22 Thread Thomas Munro
On Tue, Mar 22, 2022 at 2:34 PM Andres Freund  wrote:
> IMO it's pretty clear that having "duelling" patches below one CF entry is a
> bad idea. I think they should be split, with inactive approaches marked as
> returned with feedback or whatnot.

I have the impression that this thread is getting some value from
having a CF entry, as a multi-person collaboration where people are
trading ideas and also making progress that no one wants to mark as
returned, but it's vexing for people managing the CF because it's not
really proposed for 15.  Perhaps what we lack is a new status, "Work
In Progress" or something?




Re: New Object Access Type hooks

2022-03-22 Thread Andrew Dunstan


On 3/22/22 14:01, Tom Lane wrote:
> Andrew Dunstan  writes:
>> OK, I have pushed that.
> It seems like you could remove the NO_INSTALLCHECK restriction
> too.  You already removed the comment defending it, and it
> seems to work fine as an installcheck now if I remove that
> locally.
>
> Other nitpicks:
>
> * the IsParallelWorker test could use a comment
>
> * I notice a typo "permisisons" in test_oat_hooks.sql
>
>   



Fixed


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Accept IP addresses in server certificate SANs

2022-03-22 Thread Jacob Champion
On Tue, 2022-03-22 at 13:32 +0900, Kyotaro Horiguchi wrote:
> At Fri, 18 Mar 2022 16:38:57 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > 
> > fe-secure-common.c doesn't need netinet/in.h.
> > 
> > 
> > +++ b/src/include/utils/inet.h
> > ..
> > +#include "common/inet-common.h"
> > 
> > I'm not sure about the project policy on #include practice, but I
> > think it is the common practice not to include headers that are not
> > required by the file itself.  In this case, fe-secure-common.h itself
> > doesn't need the include.  Instead, fe-secure-openssl.c and
> > fe-secure-common.c needs the include.

Thanks, looks like I had some old header dependencies left over from
several versions ago. Fixed in v9.

> I noticed that this doesn't contain doc changes.
> 
> https://www.postgresql.org/docs/current/libpq-ssl.html
> 
> > In verify-full mode, the host name is matched against the
> > certificate's Subject Alternative Name attribute(s), or against the
> > Common Name attribute if no Subject Alternative Name of type dNSName
> > is present. If the certificate's name attribute starts with an
> > asterisk (*), the asterisk will be treated as a wildcard, which will
> > match all characters except a dot (.). This means the certificate will
> > not match subdomains. If the connection is made using an IP address
> > instead of a host name, the IP address will be matched (without doing
> > any DNS lookups).
> 
> This refers to dNSName, so we should revise this so that it describes
> the new behavior.

v9 contains the bare minimum but I don't think it's quite enough. How
much of the behavior (and edge cases) do you think we should detail
here? All of it?

Thanks,
--Jacob
commit 6278176f7f417f0afecceb71807eb480e20769ef
Author: Jacob Champion 
Date:   Tue Mar 22 13:21:48 2022 -0700

squash! libpq: add OAUTHBEARER SASL mechanism

feedback:
- clean up header deps
- add a mention in the docs

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3998b1781b..bdbfed1aa5 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -8343,7 +8343,8 @@ 
ldap://ldap.acme.com/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*)
   
In verify-full mode, the host name is matched against the
certificate's Subject Alternative Name attribute(s), or against the
-   Common Name attribute if no Subject Alternative Name of type 
dNSName is
+   Common Name attribute if no Subject Alternative Name of type 
dNSName
+   (or iPAddress, if the connection uses an IP address) is
present.  If the certificate's name attribute starts with an asterisk
(*), the asterisk will be treated as
a wildcard, which will match all characters except a 
dot
diff --git a/src/interfaces/libpq/fe-secure-common.c 
b/src/interfaces/libpq/fe-secure-common.c
index 4d78715756..9c408df369 100644
--- a/src/interfaces/libpq/fe-secure-common.c
+++ b/src/interfaces/libpq/fe-secure-common.c
@@ -19,9 +19,9 @@
 
 #include "postgres_fe.h"
 
-#include 
 #include 
 
+#include "common/inet-common.h"
 #include "fe-secure-common.h"
 
 #include "libpq-int.h"
diff --git a/src/interfaces/libpq/fe-secure-common.h 
b/src/interfaces/libpq/fe-secure-common.h
index 20ff9ba5db..d18db7138c 100644
--- a/src/interfaces/libpq/fe-secure-common.h
+++ b/src/interfaces/libpq/fe-secure-common.h
@@ -16,7 +16,6 @@
 #ifndef FE_SECURE_COMMON_H
 #define FE_SECURE_COMMON_H
 
-#include "common/inet-common.h"
 #include "libpq-fe.h"
 
 extern int pq_verify_peer_name_matches_certificate_name(PGconn *conn,
diff --git a/src/interfaces/libpq/fe-secure-openssl.c 
b/src/interfaces/libpq/fe-secure-openssl.c
index 06b166b7fd..b70bd2eb19 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+#include "common/inet-common.h"
 #include "libpq-fe.h"
 #include "fe-auth.h"
 #include "fe-secure-common.h"
From 84a107da33b7d901cb318f8c2fd140644fbc5100 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 24 Nov 2021 14:46:11 -0800
Subject: [PATCH v9 1/3] Move inet_net_pton() to src/port

This will be helpful for IP address verification in libpq.
---
 src/backend/utils/adt/Makefile  |  1 -
 src/include/port.h  |  3 +++
 src/include/utils/builtins.h|  4 
 src/port/Makefile   |  1 +
 src/{backend/utils/adt => port}/inet_net_pton.c | 16 +++-
 src/tools/msvc/Mkvcbuild.pm |  2 +-
 6 files changed, 20 insertions(+), 7 deletions(-)
 rename src/{backend/utils/adt => port}/inet_net_pton.c (96%)

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-22 Thread Robert Haas
On Mon, Mar 21, 2022 at 2:23 PM Robert Haas  wrote:
> On Mon, Mar 21, 2022 at 11:21 AM Dilip Kumar  wrote:
> > I tried to debug the case but I realized that somehow
> > CHECK_FOR_INTERRUPTS() is not calling the
> > AcceptInvalidationMessages() and I could not find the same while
> > looking into the code as well.   While debugging I noticed that
> > AcceptInvalidationMessages() is called multiple times but that is only
> > through LockRelationId() but while locking the relation we had already
> > closed the previous smgr because at a time we keep only one smgr open.
> > And that's the reason it is not hitting the issue which we think it
> > could. Is there any condition under which it will call
> > AcceptInvalidationMessages() through CHECK_FOR_INTERRUPTS() ? because
> > I could not see while debugging as well as in code.
>
> Yeah, I think the reason you can't find it is that it's not there. I
> was confused in what I wrote earlier. I think we only process sinval
> catchups when we're idle, not at every CHECK_FOR_INTERRUPTS(). And I
> think the reason for that is precisely that it would be hard to write
> correct code otherwise, since invalidations might then get processed
> in a lot more places. So ... I guess all we really need to do here is
> avoid assuming that the results of smgropen() are valid across any
> code that might acquire relation locks. Which I think the code already
> does.

So I talked to Andres and Thomas about this and they told me that I
was right to worry about this problem. Over on the thread about "wrong
fds used for refilenodes after pg_upgrade relfilenode changes
Reply-To:" there is a plan to make use ProcSignalBarrier to make smgr
objects disappear, and ProcSignalBarrier can be processed at any
CHECK_FOR_INTERRUPTS(), so then we'd have a problem here. Commit
f10f0ae420ee62400876ab34dca2c09c20dcd030 established a policy that you
should always re-fetch the smgr object instead of reusing one you've
already got, and even before that it was known to be unsafe to keep
them around for any period of time, because anything that opened a
relation, including a syscache lookup, could potentially accept
invalidations. So most of our code is already hardened against the
possibility of smgr objects disappearing. I have a feeling there may
be some that isn't, but it would be good if this patch didn't
introduce more such code at the same time that patch is trying to
introduce more ways to get rid of smgr objects. It was suggested to me
that what this patch ought to be doing is calling
CreateFakeRelcacheEntry() and then using RelationGetSmgr(fakerel)
every time we need the SmgrRelation, without ever keeping it around
for any amount of code. That way, if the smgr relation gets closed out
from under us at a CHECK_FOR_INTERRUPTS(), we'll just recreate it at
the next RelationGetSmgr() call.

Andres also noted that he thinks the patch performs redundant cleanup,
because of the fact that it uses RelationCreateStorage. That will
arrange to remove files on abort, but createdb() also has its own
mechanism for that. It doesn't seem like a thing to do twice in two
different ways.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: MDAM techniques and Index Skip Scan patch

2022-03-22 Thread Tom Lane
Peter Geoghegan  writes:
> Like many difficult patches, the skip scan patch is not so much
> troubled by problems with the implementation as it is troubled by
> *ambiguity* about the design. Particularly concerning how skip scan
> meshes with existing designs, as well as future designs --
> particularly designs for other MDAM techniques. I've started this
> thread to have a big picture conversation about how to think about
> these things.

Peter asked me off-list to spend some time thinking about the overall
direction we ought to be pursuing here.  I have done that, and here
are a few modest suggestions.

1. Usually I'm in favor of doing this sort of thing in an index AM
agnostic way, but here I don't see much point.  All of the ideas at
stake rely fundamentally on having a lexicographically-ordered multi
column index; but we don't have any of those except btree, nor do
I think we're likely to get any soon.  This motivates the general
tenor of my remarks below, which is "do it in access/nbtree/ not in
the planner".

2. The MDAM paper Peter cited is really interesting.  You can see
fragments of those ideas in our existing btree code, particularly in
the scan setup stuff that detects redundant or contradictory keys and
determines a scan start strategy.  The special handling we implemented
awhile ago for ScalarArrayOp index quals is also a subset of what they
were talking about.  It seems to me that if we wanted to implement more
of those ideas, the relevant work should almost all be done in nbtree
proper.  The planner would need only minor adjustments: btcostestimate
would have to be fixed to understand the improvements, and there are
some rules in indxpath.c that prevent us from passing "too complicated"
sets of indexquals to the AM, which would need to be relaxed or removed
altogether.

3. "Loose" indexscan (i.e., sometimes re-descend from the tree root
to find the next index entry) is again something that seems like it's
mainly nbtree's internal problem.  Loose scan is interesting if we
have index quals for columns that are after the first column that lacks
an equality qual, otherwise not.  I've worried in the past that we'd
need planner/statistical support to figure out whether a loose scan
is likely to be useful compared to just plowing ahead in the index.
However, that seems to be rendered moot by the idea used in the current
patchsets, ie scan till we find that we'll have to step off the current
page, and re-descend at that point.  (When and if we find that that
heuristic is inadequate, we could work on passing some statistical data
forward.  But we don't need any in the v1 patch.)  Again, we need some
work in btcostestimate to understand how the index scan cost will be
affected, but I still don't see any pressing need for major planner
changes or plan tree contents changes.

4. I find each of the above ideas to be far more attractive than
optimizing SELECT-DISTINCT-that-matches-an-index, so I don't really
understand why the current patchsets seem to be driven largely
by that single use-case.  I wouldn't even bother with that for the
initial patch.  When we do get around to it, it still doesn't need
major planner support, I think --- again fixing the cost estimation
is the bulk of the work.  Munro's original 2014 patch showed that we
don't really need all that much to get the planner to build such a
plan; the problem is to convince it that that plan will be cheap.

In short: I would throw out just about all the planner infrastructure
that's been proposed so far.  It looks bulky, expensive, and
drastically undercommented, and I don't think it's buying us anything
of commensurate value.  The part of the planner that actually needs
serious thought is btcostestimate, which has been woefully neglected in
both of the current patchsets.

BTW, I've had a bee in my bonnet for a long time about whether some of
nbtree's scan setup work could be done once during planning, rather than
over again during each indexscan start.  This issue might become more
pressing if the work becomes significantly more complicated/expensive,
which these ideas might cause.  But it's a refinement that could be
left for later --- and in any case, the responsibility would still
fundamentally be nbtree's.  I don't think the planner would do more
than call some AM routine that could add decoration to an IndexScan
plan node.

Now ... where did I put my flameproof vest?

regards, tom lane




Re: SQL/JSON: JSON_TABLE

2022-03-22 Thread Andrew Dunstan


On 3/22/22 11:27, Andrew Dunstan wrote:
> On 3/22/22 10:53, Alvaro Herrera wrote:
>> On 2022-Mar-22, Andrew Dunstan wrote:
>>
>>> I'm planning on pushing the functions patch set this week and json-table
>>> next week.
>> I think it'd be a good idea to push the patches one by one and let a day
>> between each.  That would make it easier to chase buildfarm issues
>> individually, and make sure they are fixed before the next step.
>> Each patch in each series is already big enough.
>
>
> OK, can do it that way. First one will be later today then.
>
>

OK, pushed the first of the functions patches. That means the cfbot will
break on these two CF items, but it's way too much trouble to have to
remake the patch sets every day, so we can just live without the cfbot
on these for a bit. I will of course test before committing and fix any
bitrot.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Optimize external TOAST storage

2022-03-22 Thread Nathan Bossart
On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
> We seem to have a shortage of "others" showing up with opinions on
> this topic, but I guess I'm not very confident about the general
> utility of such a setting. Just to be clear, I'm also not very
> confident about the usefulness of the existing settings for
> controlling TOAST. Why is it useful default behavior to try to get
> rows down to 2kB by default, rather than 1.8kB or 3kB? Even more, why
> don't we try to compress primarily based on the length of individual
> attributes and then compress further only if the resulting tuple
> doesn't fit into a page at all? There doesn't seem to be anything
> magic about fitting tuples into a quarter-page, yet the code acts as
> though that's the primary goal - and then, when that didn't turn out
> to work well in all cases, we added a user-settable parameter
> (toast_tuple_target) to let you say you really want tuples in table X
> to fit into a third of a page or a fifth of a page instead of a
> quarter. And there's some justification for that: a proposal to
> fundamentally change the algorithm would likely have gotten bogged
> down for years, and the parameter no doubt lets you solve some
> problems. Yet if the whole algorithm is wrong, and I think maybe it
> is, then being able to change the constants is not really getting us
> where we need to be.
> 
> Then, too, I'm not very confident about the usefulness of EXTENDED,
> EXTERNAL, and MAIN. I think it's useful to be able to categorically
> disable compression (as EXTERNAL does), but you can't categorically
> disable out-of-line storage because the value can be bigger than the
> page, so MAIN vs. EXTENDED is just changing the threshold for the use
> of out-of-line storage. However, it does so in a way that IMHO is not
> particularly intuitive, which goes back to my earlier point about the
> algorithm seeming wacky, and it's in any case unclear why we should
> offer exactly two possible thresholds and not any of the intermediate
> values.

I agree with all of this.  Adding configurability for each constant might
help folks in the short term, but using these knobs probably requires quite
a bit of expertise in Postgres internals as well as a good understanding of
your data.  I think we ought to revisit TOAST configurability from a user
perspective.  IOW what can be chosen automatically, and how do we enable
users to effectively configure the settings that cannot be chosen
automatically?  IMO this is a worthwhile conversation to have as long as it
doesn't stray too far into the "let's rewrite TOAST" realm.  I think there
is always going to be some level of complexity with stuff like TOAST, but
there are probably all sorts of ways to simplify/improve it also.

> Maybe the conclusion here is that more thought is needed before
> changing anything in this area

You've certainly got me thinking more about this.  If the scope of this
work is going to expand, a few months before the first v16 commitfest is
probably the right time for it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-03-22 Thread samay sharma
Hi Jacob,

Thank you for porting this on top of the pluggable auth methods API. I've
addressed the feedback around other backend changes in my latest patch, but
the client side changes still remain. I had a few questions to understand
them better.

(a) What specifically do the client side changes in the patch implement?
(b) Are the changes you made on the client side specific to OAUTH or are
they about making SASL more generic? As an additional question, if someone
wanted to implement something similar on top of your patch, would they
still have to make client side changes?

Regards,
Samay

On Fri, Mar 4, 2022 at 11:13 AM Jacob Champion  wrote:

> Hi all,
>
> v3 rebases this patchset over the top of Samay's pluggable auth
> provider API [1], included here as patches 0001-3. The final patch in
> the set ports the server implementation from a core feature to a
> contrib module; to switch between the two approaches, simply leave out
> that final patch.
>
> There are still some backend changes that must be made to get this
> working, as pointed out in 0009, and obviously libpq support still
> requires code changes.
>
> --Jacob
>
> [1]
> https://www.postgresql.org/message-id/flat/CAJxrbyxTRn5P8J-p%2BwHLwFahK5y56PhK28VOb55jqMO05Y-DJw%40mail.gmail.com
>


Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-22 Thread Imseih (AWS), Sami
>Can the leader pass a callback that checks PVIndStats to ambulkdelete
>and amvacuumcleanup callbacks? I think that in the passed callback, the
>leader checks if the number of processed indexes and updates its
>progress information if the current progress needs to be updated.

Thanks for the suggestion.

I looked at this option a bit today and found that passing the callback
will also require signature changes to the ambulkdelete and
amvacuumcleanup routines.

This will also require us to check after x pages have been scanned
inside vacuumscan and vacuumcleanup. After x pages, the callback can
then update the leader's progress. I am not sure if adding additional
complexity to the scan/cleanup path is justified for what this patch
is attempting to do.

There will also be a lag in the leader updating the progress, as it
must scan x pages before updating. Obviously, the more pages there are
to scan, the longer the lag will be.

Would like to hear your thoughts on the above.

Regards,

Sami Imseih
Amazon Web Services.







Re: SQL/JSON: functions

2022-03-22 Thread Andrew Dunstan


On 3/5/22 09:39, Andrew Dunstan wrote:
>
> here's a new set of patches, omitting the GUC patch and with the
> beginnings of some message cleanup - there's more work to do there.
>
>
>
> This patchset restores the RETURNING clause for JSON() and JSON_SCALAR()
> but without the GUC
>
>


I have committed the first of these. That will break the cfbot for this
and the json_table patches. The remainder should be committed in the
following days.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: New Object Access Type hooks

2022-03-22 Thread Tom Lane
Andrew Dunstan  writes:
> Fixed

Now that we got past the hard failures, we can see that the test
falls over with (some?) non-default encodings, as for instance
here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2022-03-22%2020%3A23%3A13

I can replicate that by running the test under LANG=en_US.iso885915.
What I think is happening is:

(1) Rather unwisely, the relevant InvokeNamespaceSearchHook calls
appear in recomputeNamespacePath.  That means that their timing
depends heavily on accidents of caching.

(2) If we decide that we need an encoding conversion to talk to
the client, there'll be a lookup for the conversion function
early during session startup.  That will cause the namespace
search path to get computed then, before the test module has been
loaded and certainly before the audit GUC has been turned on.

(3) At the point where the test expects some audit notices
to come out, nothing happens because the search path is
already validated.

I'm inclined to think that (1) is a seriously bad idea,
not only because of this instability, but because

(a) the namespace cache logic is unlikely to cause the search-path
cache to get invalidated when something happens that might cause an
OAT hook to wish to change its decision, and

(b) this placement means that the hook is invoked during cache loading
operations that are likely to be super-sensitive to any additional
catalog accesses a hook might wish to do.  (I await the results of the
CLOBBER_CACHE_ALWAYS animals with trepidation.)

Now, if our attitude to the OAT hooks is that we are going to
sprinkle some at random and whether they are useful is someone
else's problem, then maybe these are not interesting concerns.

regards, tom lane



