Re: Refactoring of compression options in pg_basebackup
(Added Jeevan in CC, for awareness) On Mon, Jan 03, 2022 at 03:35:57PM +, gkokola...@pm.me wrote: > I propose to initialize streamer to NULL when declared at the top of > CreateBackupStreamer(). Yes, that may be noisy. Done this way. > You can see that after the check_pg_config() test, only 3 tests follow, > namely: > * $node->command_ok() > * is(scalar(), ...) > * is($gzip_is_valid, ...) Indeed. The CF bot was complaining about that, actually. Thinking more about this stuff, pg_basebackup --compress is an option that exists already for a couple of years, and that's independent of the backend-side compression that Robert and Jeevan are working on, so I'd like to move on this code cleanup. We can always figure out the LZ4 part for pg_basebackup after, if necessary. Attached is an updated patch. The CF bot should improve with that. -- Michael From a216d5569fa6d963c2f846d9f5539d5ea7ddbd62 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 5 Jan 2022 16:58:00 +0900 Subject: [PATCH v2] Refactor options of pg_basebackup for compression level and methods --- src/bin/pg_basebackup/pg_basebackup.c| 81 +++- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 47 +++- src/bin/pg_basebackup/walmethods.c | 47 doc/src/sgml/ref/pg_basebackup.sgml | 22 +- 4 files changed, 156 insertions(+), 41 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1739ac6382..7f29200832 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -113,6 +113,7 @@ static bool showprogress = false; static bool estimatesize = true; static int verbose = 0; static int compresslevel = 0; +static WalCompressionMethod compression_method = COMPRESSION_NONE; static IncludeWal includewal = STREAM_WAL; static bool fastcheckpoint = false; static bool writerecoveryconf = false; @@ -359,7 +360,9 @@ usage(void) printf(_(" -X, --wal-method=none|fetch|stream\n" " include required WAL files with specified method\n")); printf(_(" -z, --gzip compress tar output\n")); - printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n")); + printf(_(" -Z, --compress=1-9 compress tar output with given compression level\n")); + printf(_(" --compression-method=METHOD\n" + " method to compress data\n")); printf(_("\nGeneral options:\n")); printf(_(" -c, --checkpoint=fast|spread\n" " set fast or spread checkpointing\n")); @@ -524,7 +527,7 @@ LogStreamerMain(logstreamer_param *param) stream.do_sync); else stream.walmethod = CreateWalTarMethod(param->xlog, - COMPRESSION_NONE, /* ignored */ + compression_method, compresslevel, stream.do_sync); @@ -975,7 +978,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation, bool is_recovery_guc_supported, bool expect_unterminated_tarfile) { - bbstreamer *streamer; + bbstreamer *streamer = NULL; bbstreamer *manifest_inject_streamer = NULL; bool inject_manifest; bool must_parse_archive; @@ -1034,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation, archive_file = NULL; } + if (compression_method == COMPRESSION_NONE) + streamer = bbstreamer_plain_writer_new(archive_filename, + archive_file); #ifdef HAVE_LIBZ - if (compresslevel != 0) + else if (compression_method == COMPRESSION_GZIP) { strlcat(archive_filename, ".gz", sizeof(archive_filename)); streamer = bbstreamer_gzip_writer_new(archive_filename, archive_file, compresslevel); } - else #endif - streamer = bbstreamer_plain_writer_new(archive_filename, - archive_file); - + else + { + Assert(false); /* not reachable */ + } /* * If 
we need to parse the archive for whatever reason, then we'll @@ -1765,6 +1771,7 @@ main(int argc, char **argv) {"no-manifest", no_argument, NULL, 5}, {"manifest-force-encode", no_argument, NULL, 6}, {"manifest-checksums", required_argument, NULL, 7}, + {"compression-method", required_argument, NULL, 8}, {NULL, 0, NULL, 0} }; int c; @@ -1872,14 +1879,10 @@ main(int argc, char **argv) do_sync = false; break; case 'z': -#ifdef HAVE_LIBZ -compresslevel = Z_DEFAULT_COMPRESSION; -#else -compresslevel = 1; /* will be rejected below */ -#endif +compression_method = COMPRESSION_GZIP; break; case 'Z': -if (!option_parse_int(optarg, "-Z/--compress", 0, 9, +if (!option_parse_int(optarg, "-Z/--compress", 1, 9, &compresslevel)) exit(1); break; @@ -1941,6 +1944,18 @@ main(int argc, char **argv) case 7: manifest_checksums = pg_strdup(optarg); break; + case 8: +if (pg_strcasecmp(optarg, "gzip") == 0) + compress
Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
On Thu, Dec 16, 2021 at 11:51:55AM +0900, Michael Paquier wrote: > I may have missed one thing or two, but I think that's pretty much > what we should be looking for to do the switch to TAP in terms of > coverage. Rebased patch to cool down the CF bot, as per the addition of --no-sync to pg_upgrade. -- Michael From f7af03ec5e5c6a685796d58b6d82eb9f603565de Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 5 Jan 2022 17:11:12 +0900 Subject: [PATCH v5] Switch tests of pg_upgrade to use TAP --- src/bin/pg_upgrade/.gitignore| 5 + src/bin/pg_upgrade/Makefile | 23 +- src/bin/pg_upgrade/TESTING | 33 ++- src/bin/pg_upgrade/t/001_basic.pl| 9 + src/bin/pg_upgrade/t/002_pg_upgrade.pl | 275 ++ src/bin/pg_upgrade/test.sh | 279 --- src/test/perl/PostgreSQL/Test/Cluster.pm | 25 ++ src/tools/msvc/vcregress.pl | 94 +--- 8 files changed, 355 insertions(+), 388 deletions(-) create mode 100644 src/bin/pg_upgrade/t/001_basic.pl create mode 100644 src/bin/pg_upgrade/t/002_pg_upgrade.pl delete mode 100644 src/bin/pg_upgrade/test.sh diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore index 2d3bfeaa50..3b64522ab6 100644 --- a/src/bin/pg_upgrade/.gitignore +++ b/src/bin/pg_upgrade/.gitignore @@ -7,3 +7,8 @@ /loadable_libraries.txt /log/ /tmp_check/ + +# Generated by pg_regress +/sql/ +/expected/ +/results/ diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index 44d06be5a6..35b6c123a5 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -28,6 +28,12 @@ OBJS = \ override CPPFLAGS := -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) +# required for 002_pg_upgrade.pl +REGRESS_SHLIB=$(abs_top_builddir)/src/test/regress/regress$(DLSUFFIX) +export REGRESS_SHLIB +REGRESS_OUTPUTDIR=$(abs_top_builddir)/src/bin/pg_upgrade +export REGRESS_OUTPUTDIR + all: pg_upgrade pg_upgrade: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils @@ -49,17 +55,8 @@ clean distclean maintainer-clean: pg_upgrade_dump_globals.sql \ pg_upgrade_dump_*.custom pg_upgrade_*.log -# When $(MAKE) is present, make automatically infers that this is a -# recursive make. which is not actually what we want here, as that -# e.g. prevents output synchronization from working (as make thinks -# that the subsidiary make knows how to deal with that itself, but -# we're invoking a shell script that doesn't know). Referencing -# $(MAKE) indirectly avoids that behaviour. -# See https://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html#MAKE-Variable -NOTSUBMAKEMAKE=$(MAKE) +check: + $(prove_check) -check: test.sh all temp-install - MAKE=$(NOTSUBMAKEMAKE) $(with_temp_install) bindir=$(abs_top_builddir)/tmp_install/$(bindir) EXTRA_REGRESS_OPTS="$(EXTRA_REGRESS_OPTS)" $(SHELL) $< - -# installcheck is not supported because there's no meaningful way to test -# pg_upgrade against a single already-running server +installcheck: + $(prove_installcheck) diff --git a/src/bin/pg_upgrade/TESTING b/src/bin/pg_upgrade/TESTING index 78b9747908..43a71566e2 100644 --- a/src/bin/pg_upgrade/TESTING +++ b/src/bin/pg_upgrade/TESTING @@ -2,21 +2,30 @@ THE SHORT VERSION - On non-Windows machines, you can execute the testing process -described below by running +described below by running the following command in this directory: make check -in this directory. This will run the shell script test.sh, performing -an upgrade from the version in this source tree to a new instance of -the same version. 
-To test an upgrade from a different version, you must have a built -source tree for the old version as well as this version, and you -must have done "make install" for both versions. Then do: +This will run the TAP tests to run pg_upgrade, performing an upgrade +from the version in this source tree to a new instance of the same +version. + +To test an upgrade from a different version, there are two options +available: + +1) You have a built source tree for the old version as well as this +version's binaries. Then set up the following variables before +launching the test: export oldsrc=...somewhere/postgresql (old version's source tree) -export oldbindir=...otherversion/bin (old version's installed bin dir) -export bindir=...thisversion/bin (this version's installed bin dir) -export libdir=...thisversion/lib (this version's installed lib dir) -sh test.sh +export oldinstall=...otherversion/ (old version's install base path) + +2) You have a dump that can be used to set up the old version, as well +as this version's binaries. Then set up the following variables: +export olddump=...somewhere/dump.sql (old version's dump) +export oldinstall=...otherversion/ (old version's install base path) + +Finally, the tests can be done by
Re: row filtering for logical replication
On Tue, Dec 28, 2021 at 6:33 PM houzj.f...@fujitsu.com wrote: > > On Mon, Dec 27, 2021 9:19 PM Hou Zhijie wrote: > > On Mon, Dec 27, 2021 9:16 PM houzj.f...@fujitsu.com > > wrote: > > > On Thur, Dec 23, 2021 4:28 PM Peter Smith wrote: > > > > Here is the v54* patch set: > > > > > > Attach the v55 patch set which add the following testcases in 0002 patch. > > When reviewing the row filter patch, I found few things that could be > improved. > 1) We could transform the same row filter expression twice when >ALTER PUBLICATION ... SET TABLE WHERE (...). Because we invoke >GetTransformedWhereClause in both AlterPublicationTables() and >publication_add_relation(). I was thinking it might be better if we only >transform the expression once in AlterPublicationTables(). > > 2) When transforming the expression, we didn’t set the correct p_sourcetext. >Since we need to transforming serval expressions which belong to different >relations, I think it might be better to pass queryString down to the > actual >transform function and set p_sourcetext to the actual queryString. > I have tried the following few examples to check the error_position and it seems to be showing correct position without your 0004 patch. postgres=# create publication pub for table t1 where (10); ERROR: argument of PUBLICATION WHERE must be type boolean, not type integer LINE 1: create publication pub for table t1 where (10); ^ Also, transformPubWhereClauses() seems to be returning the same list as it was passed to it. Do we really need to return anything from transformPubWhereClauses()? -- With Regards, Amit Kapila.
Re: Refactoring of compression options in pg_basebackup
On Wednesday, January 5th, 2022 at 9:00 AM, Michael Paquier wrote: > On Mon, Jan 03, 2022 at 03:35:57PM +, gkokola...@pm.me wrote: > > I propose to initialize streamer to NULL when declared at the top of > > CreateBackupStreamer(). > > Yes, that may be noisy. Done this way. Great. > > You can see that after the check_pg_config() test, only 3 tests follow, > > namely: > > * $node->command_ok() > > * is(scalar(), ...) > > * is($gzip_is_valid, ...) > > Indeed. The CF bot was complaining about that, actually. Great. > Thinking more about this stuff, pg_basebackup --compress is an option > that exists already for a couple of years, and that's independent of > the backend-side compression that Robert and Jeevan are working on, so > I'd like to move on this code cleanup. We can always figure out the > LZ4 part for pg_basebackup after, if necessary. I agree that the cleanup in itself is helpful. It feels awkward to have two utilities under the same path, with distinct options for the same functionality. When the backend-side compression is completed, will there really be a need for client-side compression? If yes, then it seems logical to have distinct options for them and this cleanup makes sense. If not, then it seems logical to maintain the current options list and 'simply' change the internals of the code, and this cleanup makes sense. > Attached is an updated patch. The CF bot should improve with that. +1 > -- > Michael Cheers, //Georgios
Re: refactoring basebackup.c
On 1/4/22 8:07 PM, Robert Haas wrote: Before sending an email like this, it would be a good idea to read the documentation for the --server-compression option. Sure, thanks Robert. One scenario where I feel the error message is confusing, and where it needs to be a little clearer if the combination is not supported at all: if we use -z (or -Z) with -t, we get this error [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/test0 -Xfetch -z pg_basebackup: error: only tar mode backups can be compressed Try "pg_basebackup --help" for more information. but after removing the -z option the backup is in tar mode only edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/test0 -Xfetch [edb@centos7tushar bin]$ ls /tmp/test0 backup_manifest base.tar -- regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: row filtering for logical replication
On Wed, Jan 5, 2022 at 2:45 PM Amit Kapila wrote: > > On Tue, Dec 28, 2021 at 6:33 PM houzj.f...@fujitsu.com > wrote: > > > > On Mon, Dec 27, 2021 9:19 PM Hou Zhijie wrote: > > > On Mon, Dec 27, 2021 9:16 PM houzj.f...@fujitsu.com > > > > > > wrote: > > > > On Thur, Dec 23, 2021 4:28 PM Peter Smith wrote: > > > > > Here is the v54* patch set: > > > > > > > > Attach the v55 patch set which add the following testcases in 0002 > > > > patch. > > > > When reviewing the row filter patch, I found few things that could be > > improved. > > 1) We could transform the same row filter expression twice when > >ALTER PUBLICATION ... SET TABLE WHERE (...). Because we invoke > >GetTransformedWhereClause in both AlterPublicationTables() and > >publication_add_relation(). I was thinking it might be better if we only > >transform the expression once in AlterPublicationTables(). > > > > 2) When transforming the expression, we didn’t set the correct p_sourcetext. > >Since we need to transforming serval expressions which belong to > > different > >relations, I think it might be better to pass queryString down to the > > actual > >transform function and set p_sourcetext to the actual queryString. > > > > I have tried the following few examples to check the error_position > and it seems to be showing correct position without your 0004 patch. > postgres=# create publication pub for table t1 where (10); > ERROR: argument of PUBLICATION WHERE must be type boolean, not type integer > LINE 1: create publication pub for table t1 where (10); > > ^ > I understand why the error position could vary even though it is showing the correct location in the above example after reading another related email [1]. > Also, transformPubWhereClauses() seems to be returning the same list > as it was passed to it. Do we really need to return anything from > transformPubWhereClauses()? > One more point about this function: the patch seems to be doing some work even when where clause is not specified which can be avoided. Another minor comment: +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype, Do we need to specify the 'enum' type before changetype parameter? [1] - https://www.postgresql.org/message-id/1513381.1640626456%40sss.pgh.pa.us -- With Regards, Amit Kapila.
Re: Proposal: remove obsolete hot-standby testing infrastructure
On 03.01.22 22:50, Tom Lane wrote: The attached proposed patch removes some ancient infrastructure for manually testing hot standby. I doubt anyone has used this in years, because AFAICS there is nothing here that's not done better by the src/test/recovery TAP tests. (Or if there is, we ought to migrate it into the TAP tests.) I looked into this some time ago and concluded that this test contains a significant amount of testing that isn't obviously done anywhere else. I don't have the notes anymore, and surely some things have progressed since, but I wouldn't just throw the old test suite away without actually checking.
[PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module]
Tom Lane writes: > Dag Lem writes: > >> Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that >> two other modules are using UTF8 characters in tests - citext and >> unaccent. > > Yeah, neither of those have been upgraded to said best practice. > (If you feel like doing the legwork to improve that situation, > that'd be great.) > Please find attached a patch to run the previously commented-out UTF8-dependent tests for citext, according to best practice. For now I don't dare to touch the unaccent module, which seems to be UTF8-only anyway. Best regards Dag Lem diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile index a7de52928d..789932fe36 100644 --- a/contrib/citext/Makefile +++ b/contrib/citext/Makefile @@ -11,7 +11,7 @@ DATA = citext--1.4.sql \ citext--1.0--1.1.sql PGFILEDESC = "citext - case-insensitive character string data type" -REGRESS = citext +REGRESS = citext citext_utf8 ifdef USE_PGXS PG_CONFIG = pg_config diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out index 3bac0534fb..48b4de8993 100644 --- a/contrib/citext/expected/citext.out +++ b/contrib/citext/expected/citext.out @@ -48,29 +48,6 @@ SELECT 'a'::citext <> 'ab'::citext AS t; t (1 row) --- Multibyte sanity tests. Uncomment to run. --- SELECT 'À'::citext = 'À'::citext AS t; --- SELECT 'À'::citext = 'à'::citext AS t; --- SELECT 'À'::text = 'à'::text AS f; -- text wins. --- SELECT 'À'::citext <> 'B'::citext AS t; --- Test combining characters making up canonically equivalent strings. --- SELECT 'Ä'::text <> 'Ä'::text AS t; --- SELECT 'Ä'::citext <> 'Ä'::citext AS t; --- Test the Turkish dotted I. The lowercase is a single byte while the --- uppercase is multibyte. This is why the comparison code can't be optimized --- to compare string lengths. --- SELECT 'i'::citext = 'İ'::citext AS t; --- Regression. --- SELECT 'láska'::citext <> 'laská'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive; --- SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative; -- Test > and >= SELECT 'B'::citext > 'a'::citext AS t; t diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out index 57fc863f7a..8ab4d4224e 100644 --- a/contrib/citext/expected/citext_1.out +++ b/contrib/citext/expected/citext_1.out @@ -48,29 +48,6 @@ SELECT 'a'::citext <> 'ab'::citext AS t; t (1 row) --- Multibyte sanity tests. Uncomment to run. --- SELECT 'À'::citext = 'À'::citext AS t; --- SELECT 'À'::citext = 'à'::citext AS t; --- SELECT 'À'::text = 'à'::text AS f; -- text wins. --- SELECT 'À'::citext <> 'B'::citext AS t; --- Test combining characters making up canonically equivalent strings. --- SELECT 'Ä'::text <> 'Ä'::text AS t; --- SELECT 'Ä'::citext <> 'Ä'::citext AS t; --- Test the Turkish dotted I. The lowercase is a single byte while the --- uppercase is multibyte. This is why the comparison code can't be optimized --- to compare string lengths. 
--- SELECT 'i'::citext = 'İ'::citext AS t; --- Regression. --- SELECT 'láska'::citext <> 'laská'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t; --- SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero; --- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive; --- SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative; -- Test > and >= SELECT 'B'::citext > 'a'::citext AS t; t diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out new file mode 100644 index 00..1f4fa79aff --- /dev/null +++ b/contrib/citext/expected/citext_utf8.out @@ -0,0 +1,119 @@ +/* + * This test must be run in a database with UTF-8 encoding, + * because other encodings don't support all the characters used. + */ +SELECT getdatabaseencoding() <> 'UTF8' + AS skip_test \gset +\if :skip_test +\quit +\endif +set client_encoding
RE: Optionally automatically disable logical replication subscriptions on error
On Tuesday, December 28, 2021 11:53 AM Wang, Wei/王 威 wrote: > On Thursday, December 16, 2021 8:51 PM osumi.takami...@fujitsu.com > wrote: > > Attached the updated patch v14. > > A comment to the timing of printing a log: Thank you for your review ! > After the log[1] was printed, I altered subscription's option > (DISABLE_ON_ERROR) from true to false before invoking > DisableSubscriptionOnError to disable subscription. Subscription was not > disabled. > [1] "LOG: logical replication subscription "sub1" will be disabled due to an > error" > > I found this log is printed in function WorkerErrorRecovery: > + ereport(LOG, > + errmsg("logical replication subscription \"%s\" will > be disabled due to an error", > +MySubscription->name)); > This log is printed here, but in DisableSubscriptionOnError, there is a check > to > confirm subscription's disableonerr field. If disableonerr is found changed > from > true to false in DisableSubscriptionOnError, subscription will not be > disabled. > > In this case, "disable subscription" is printed, but subscription will not be > disabled actually. > I think it is a little confused to user, so what about moving this message > after > the check which is mentioned above in DisableSubscriptionOnError? Makes sense. I moved the log print to after the check of whether the subscription needs to be disabled. I've also scrutinized and refined the new TAP test as part of this refactoring. As a result, I simplified wait_for_subscriptions() by removing some extra code, such as an escaped variable and one part of the WHERE clause. Another change is to replace two calls of wait_for_subscriptions() with polling_query_until() for the subscriber, to make the tests better suited to their purposes. Again, for this refinement, I've run a tight loop test to check for timing issues and found no problem. Best Regards, Takamichi Osumi v15-0001-Optionally-disable-subscriptions-on-error.patch Description: v15-0001-Optionally-disable-subscriptions-on-error.patch
Re: row filtering for logical replication
BTW I think it's not great to commit with the presented split. We would have non-trivial short-lived changes for no good reason (0002 in particular). I think this whole series should be a single patch, with the commit message being a fusion of messages explaining in full what the functional change is, listing all the authors together. Having a commit message like in 0001 where all the distinct changes are explained in separate sections with each section listing its own author, does not sound very useful or helpful. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end." (2nd Commandment for C programmers)
Re: Pluggable toaster
On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev wrote: > We are working on custom toaster for JSONB [1], because current TOAST is > universal for any data type and because of that it has some disadvantages: > - "one toast fits all" may be not the best solution for particular > type or/and use cases > - it doesn't know the internal structure of data type, so it cannot > choose an optimal toast strategy > - it can't share common parts between different rows and even > versions of rows Agreed, Oleg has made some very clear analysis of the value of having a higher degree of control over toasting from within the datatype. In my understanding, we want to be able to 1. Access data from a toasted object one slice at a time, by using knowledge of the structure 2. If toasted data is updated, then update a minimum number of slice(s), without rewriting the existing slices 3. If toasted data is expanded, then allow new slices to be appended to the object without rewriting the existing slices > Modification of current toaster for all tasks and cases looks too > complex, moreover, it will not works for custom data types. Postgres > is an extensible database, why not to extent its extensibility even > further, to have pluggable TOAST! We propose an idea to separate > toaster from heap using toaster API similar to table AM API etc. > Following patches are applicable over patch in [1] ISTM that we would want the toast algorithm to be associated with the datatype, not the column? Can you explain your thinking? We already have the Expanded toast format, in-memory, which was designed specifically to allow us to access sub-structure of the datatype in-memory. So I was expecting to see an Expanded, on-disk, toast format that roughly matched that concept, since Tom has already shown us the way (varatt_expanded). This would be usable by both JSON and PostGIS. Some other thoughts: I imagine the data type might want to keep some kind of dictionary inside the main toast pointer, so we could make allowance for some optional datatype-specific private area in the toast pointer itself, allowing a mix of inline and out-of-line data, and/or a table of contents to the slices. I'm thinking we could also tackle these things at the same time: * We want to expand TOAST to 64-bit pointers, so we can have more pointers in a table * We want to avoid putting the data length into the toast pointer, so we can allow the toasted data to be expanded without rewriting everything (to avoid O(N^2) cost) -- Simon Riggs http://www.EnterpriseDB.com/
Re: refactoring basebackup.c
On Wed, Jan 5, 2022 at 5:11 AM tushar wrote: > One scenario where I feel error message is confusing and if it is not > supported at all then error message need to be a little bit more clear > > if we use -z (or -Z ) with -t , we are getting this error > [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/test0 -Xfetch -z > pg_basebackup: error: only tar mode backups can be compressed > Try "pg_basebackup --help" for more information. > > but after removing -z option backup is in tar mode only > > edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/test0 -Xfetch > [edb@centos7tushar bin]$ ls /tmp/test0 > backup_manifest base.tar OK, fair enough, I can adjust the error message for that case. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Refactoring of compression options in pg_basebackup
On Mon, Dec 20, 2021 at 7:43 PM Michael Paquier wrote: > Yeah, consistency would be good. For the client-side compression of > LZ4, we have shaped things around the existing --compress option, and > there is 6f164e6 that offers an API to parse that at option-level, > meaning less custom error strings. I'd like to think that splitting > the compression level and the compression method is still the right > choice, except if --server-compression combined with a client-side > compression is a legal combination. This would not really make sense, > though, no? So we'd better block this possibility from the start? Right. It's blocked right now, but Tushar noticed on the other thread that the error message isn't as good as it could be, so I'll improve that a bit. Still the issue wasn't overlooked. > > If they don't decompress the backup and run pg_verifybackup on it then > > I'm not sure how much they help. Yet, I don't know how to do that > > portably. > > They help in checking that an environment does not use a buggy set of > GZIP, at least. Using pg_verifybackup on a base backup with > --format='t' could be tweaked with $ENV{TAR} for the tarballs > generation, for example, as we do in some other tests. Option sets > like "xvf" or "zxvf" should be rather portable across the buildfarm, > no? I'd like to think that this is not a requirement for adding > checks in the compression path, as a first step, though, but I agree > that it could be extended more. Oh, well, if we have a working tar available, then it's not so bad. I was thinking we couldn't really count on that, especially on Windows. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Refactoring of compression options in pg_basebackup
Robert Haas writes: > Oh, well, if we have a working tar available, then it's not so bad. I > was thinking we couldn't really count on that, especially on Windows. I think the existing precedent is to skip the test if tar isn't there, cf pg_basebackup/t/010_pg_basebackup.pl. But certainly the majority of buildfarm animals have it. regards, tom lane
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 5, 2022 at 4:17 AM wrote: > When the backend-side compression is completed, will there really be a need > for > client-side compression? If yes, then it seems logical to have distinct > options > for them and this cleanup makes sense. If not, then it seems logical to > maintain > the current options list and 'simply' change the internals of the code, and > this > cleanup makes sense. I think we're going to want to offer both options. We can't know whether the user prefers to consume CPU cycles on the server or on the client. Compressing on the server has the advantage of potentially saving transfer bandwidth, but the server is also often the busiest part of the whole system, and users are often keen to offload as much work as possible. Given that, I'd like us to be thinking about what the full set of options looks like once we have (1) compression on either the server or the client and (2) multiple compression algorithms and (3) multiple compression levels. Personally, I don't really like the decision made by this proposed patch. In this patch's view of the world, -Z is a way of providing the compression level for whatever compression algorithm you happen to have selected, but I think of -Z as being the upper-case version of -z, which I think of as selecting specifically gzip. It's not particularly intuitive to me that in a command like pg_basebackup --compress=NUMBER, the NUMBER is a compression level rather than an algorithm. So what I would propose is probably something like: pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER] pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER] And then make -z short for --compress=gzip and -Z NUMBER short for --compress=gzip --compression-level=NUMBER. That would be a backward-incompatible change to the definition of --compress, but as long as -Z works the same as today, I don't think many people will notice. If we like, we can notice if the argument to --compress is an integer and suggest using either -Z or --compress=gzip --compression-level=NUMBER instead. In the proposed patch, you end up with pg_basebackup --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I find that quite odd, though as with all such things, opinions may vary. In my proposal, that would be an error, because it would be equivalent to --compress=lz4 --compress=gzip --compression-level=2, and would thus involve conflicting compression method specifications. -- Robert Haas EDB: http://www.enterprisedb.com
Re: SQL:2011 application time
On 21.11.21 02:51, Paul A Jungwirth wrote: Here are updated patches. They are rebased and clean up some of my TODOs. This patch set looks very interesting. It's also very big, so it's difficult to see how to get a handle on it. I did a pass through it to see if there were any obvious architectural or coding style problems. I also looked at some of your TODO comments to see if I had something to contribute there. I'm confused about how to query tables based on application time periods. Online, I see examples using AS OF, but in the SQL standard I only see this used for system time, which we are not doing here. What is your understanding of that? v10-0001-Add-PERIODs.patch src/backend/commands/tablecmds.c Might be worth explaining somewhere why AT_PASS_ADD_PERIOD needs to be its own pass. -- Ah, this is explained in ATPrepCmd(). Maybe that is okay, but I would tend to prefer a comprehensive explanation here rather than sprinkled around. make_period_not_backward(): Hardcoding the name of the operator as "<" is not good. You should perhaps lookup the less-than operator in the type cache. Look around for TYPECACHE_LT_OPR for how this is usually done. validate_period(): Could use an explanatory comment. There are a bunch of output arguments, and it's not clear what all of this is supposed to do, and what "validating" is in this context. MergeAttributes(): I would perhaps initially just prohibit inheritance situations that involve periods on either side. (It should work for partitioning, IMO, but that should be easier to arrange.) AlterTableGetLockLevel(): The choice of AccessExclusiveLock looks correct. I think the whole thing can also be grouped with some of the other "affects concurrent SELECTs" cases? Maybe the node type Period could have a slightly more specific name, perhaps PeriodDef, analogous to ColumnDef? I didn't follow why indexes would have periods, for example, the new period field in IndexStmt. Is that explained anywhere? While reading this patch I kept wondering whether it would be possible to fold periods into pg_attribute, perhaps with negative attribute numbers. Have you looked into something like that? No doubt it's also complicated, but it might simplify some things, like the name conflict checking. v10-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch src/backend/catalog/Catalog.pm: I see you use this change in the subsequent patches, but I would recommend skipping all this. The comments added are kind of redundant with the descr fields anyway. transformIndexConstraint(): As above, we can't look up the && operator by name. In this case, I suppose we should look it up through the index AM support operators. Further, the additions to this function are very complicated and not fully explained. I'm suspicious about things like findNewOrOldColumn() -- generally we should look up columns by number not name. Perhaps you can add a header comment or split out the code further into smaller functions. pg_dump.c getIndexes() has been refactored since to make version-specific additions easier. But your patch is now failing to apply because of this. Of course, the main problem in this patch is that for most uses it requires btree_gist. I think we should consider moving that into core, or at least the support for types that are most relevant to this functionality, specifically the date/time types. Aside from user convenience, this would also allow writing more realistic test cases. 
v10-0003-Add-UPDATE-DELETE-FOR-PORTION-OF.patch Use of MINVALUE and MAXVALUE for unbounded seems problematic to me. (If it is some value, it is not really larger than any value.) We have the keyword UNBOUNDED, which seems better suited. src/backend/access/brin/brin_minmax_multi.c These renaming changes seem unrelated (but still seem like a good idea). Should they be progressed separately? Again, some hardcoded operator name lookup in this patch. I don't understand why a temporal primary key is required for doing UPDATE FOR PORTION OF. I don't see this in the standard. v10-0004-Add-temporal-FOREIGN-KEYs.patch Do we really need different trigger names depending on whether the foreign key is temporal? range_as_string() doesn't appear to be used anywhere. I ran out of steam on this patch, it's very big. But it seems sound in general. How to proceed. I suppose we could focus on committing 0001 and 0002 first. That would be a sensible feature set even if the remaining patches did not make a release. I do feel we need to get btree_gist into core. That might be a big job by itself. I'm also bemused why btree_gist is so bloated compared to btree_gin. btree_gin uses macros to eliminate duplicate code where btree_gist is full of copy-and-paste. So there are some opportunities there to make things more compact. Is there anything else you think we can do as preparatory work to make the main patches more manageable?
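For reference, the type-cache lookup suggested above for make_period_not_backward() usually looks roughly like this (a generic sketch, not taken from the patch; element_typeid stands in for the period's element type):

    TypeCacheEntry *typentry;

    /* look up the default btree "<" operator for the period's element type */
    typentry = lookup_type_cache(element_typeid, TYPECACHE_LT_OPR);
    if (!OidIsValid(typentry->lt_opr))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_FUNCTION),
                 errmsg("could not identify a less-than operator for type %s",
                        format_type_be(element_typeid))));

    /* typentry->lt_opr can then be used instead of a hardcoded "<" lookup */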
Re: Converting WAL to SQL
On Tue, Jan 4, 2022 at 10:47:47AM -0300, Fabrízio de Royes Mello wrote: > > On Tue, Jan 4, 2022 at 9:22 AM Michael Paquier wrote: > > > > On Wed, Dec 29, 2021 at 08:50:23AM -0300, Fabrízio de Royes Mello wrote: > > > Try this: > > > https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw > > > > You may want to be careful with this, and I don't know if anybody is > > using that for serious cases so some spots may have been missed. > > > > I used it in the past during a major upgrade process from 9.2 to 9.6. > > What we did was decode the 9.6 wal files and apply transactions to the > old 9.2 to keep it in sync with the new promoted version. This was our > "rollback" strategy if something went wrong with the new 9.6 version. How did you deal with the issue that SQL isn't granular enough (vs. row-level changes) to reproduce the result reliably, as outlined here? https://momjian.us/main/blogs/pgblog/2019.html#March_6_2019 -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Index-only scan for btree_gist turns bpchar to char
On Wed, 05 Jan 2022 at 03:19, Tom Lane wrote: > Alexander Lakhin writes: >> While testing the index-only scan fix, I've discovered that replacing >> the index-only scan with the index scan changes contrib/btree_gist >> output because index-only scan for btree_gist returns a string without >> padding. > > Ugh, yeah. This seems to be because gbt_bpchar_compress() strips > trailing spaces (using rtrim1) before storing the value. The > idea evidently is to simplify gbt_bpchar_consistent, but it's not > acceptable if the opclass is supposed to support index-only scan. > > I see two ways to fix this: > > * Disallow index-only scan, by removing the fetch function for this > opclass. This'd require a module version bump, so people wouldn't > get that fix automatically. > > * Change gbt_bpchar_compress to not trim spaces (it becomes just > like gbt_text_compress), and adapt gbt_bpchar_consistent to cope. > This does nothing for the problem immediately, unless you REINDEX > affected indexes --- but over time an index's entries would get > replaced with untrimmed versions. > > I also wondered if we could make the fetch function reconstruct the > padding, but it doesn't seem to have access to the necessary info. > If we fix this In the second way, the range query has the same results in both seq scan and index only scan. However, it will incur other problems. For the following query: SELECT *, octet_length(a) FROM chartmp WHERE a = '31b0'; Currently, we can get a | octet_length --+-- 31b0 |4 After fixed, we cannot get any result. For the equal condition, we must put the extra spaces to make it work. Here is a patch for POC testing. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c index 8019d11281..5fd425047f 100644 --- a/contrib/btree_gist/btree_text.c +++ b/contrib/btree_gist/btree_text.c @@ -121,16 +121,7 @@ gbt_bpchar_compress(PG_FUNCTION_ARGS) } if (entry->leafkey) - { - - Datum d = DirectFunctionCall1(rtrim1, entry->key); - GISTENTRY trim; - - gistentryinit(trim, d, - entry->rel, entry->page, - entry->offset, true); - retval = gbt_var_compress(&trim, &tinfo); - } + retval = gbt_var_compress(entry, &tinfo); else retval = entry; @@ -179,7 +170,6 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS) boolretval; GBT_VARKEY *key = (GBT_VARKEY *) DatumGetPointer(entry->key); GBT_VARKEY_R r = gbt_var_key_readable(key); - void *trim = (void *) DatumGetPointer(DirectFunctionCall1(rtrim1, PointerGetDatum(query))); /* All cases served by this function are exact */ *recheck = false; @@ -189,7 +179,7 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS) tinfo.eml = pg_database_encoding_max_length(); } - retval = gbt_var_consistent(&r, trim, strategy, PG_GET_COLLATION(), + retval = gbt_var_consistent(&r, query, strategy, PG_GET_COLLATION(), GIST_LEAF(entry), &tinfo, fcinfo->flinfo); PG_RETURN_BOOL(retval); }
Re: Index-only scan for btree_gist turns bpchar to char
Japin Li writes: > Here is a patch for POC testing. This is certainly not right. You've made gbt_bpchar_consistent work identically to gbt_text_consistent, but it needs to implement a test equivalent to bpchareq, ie ignore trailing spaces in both inputs. The minimum-effort fix would be to apply rtrim1 to both strings in gbt_bpchar_consistent, but I wonder if we can improve on that by pushing the ignore-trailing-spaces behavior further down. I didn't look yet at whether gbt_var_consistent can support any type-specific behavior. regards, tom lane
Re: refactoring basebackup.c
On Tue, Dec 28, 2021 at 1:12 PM Jeevan Ladhe wrote: > Hi Tushar, > > You need to apply Robert's v10 version patches 0002, 0003 and 0004, before > applying the lz4 patch(v8 version). > Please let me know if you still face any issues. > Thanks, Jeevan. I tested the --server-compression option using various other options of pg_basebackup, and also checked that -t/--server-compression from the v15 pg_basebackup will throw an error if the server version is v14 or below. Things are looking good to me. Two open issues - 1) the lz4 value is missing for --server-compression in pg_basebackup --help 2) error messages need to improve if using -t server with -z/-Z regards,
Re: SQL/JSON: functions
On 1/5/22 00:51, Himanshu Upadhyaya wrote: > On Thu, Dec 9, 2021 at 7:34 PM Himanshu Upadhyaya > wrote: >> 3) >> Is not that result of the two below queries should match because both are >> trying to retrieve the information from the JSON object. >> >> postgres=# SELECT JSON_OBJECT('track' VALUE '{ >> "segments": [ >> { >> "location": [ 47.763, 13.4034 ], >> "start time": "2018-10-14 10:05:14", >> "HR": 73 >> }, >> { >> "location": [ 47.706, 13.2635 ], >> "start time": "2018-10-14 101:39:21", >> "HR": 135 >> } >> ] >> } >> }')->'track'->'segments'; >> ?column? >> -- >> >> (1 row) >> >> postgres=# select '{ >> "track": { >> "segments": [ >> { >> "location": [ 47.763, 13.4034 ], >> "start time": "2018-10-14 10:05:14", >> "HR": 73 >> }, >> { >> "location": [ 47.706, 13.2635 ], >> "start time": "2018-10-14 10:39:21", >> "HR": 135 >> } >> ] >> } >> }'::jsonb->'track'->'segments'; >> >> ?column? >> --- >> [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 >> 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": >> "2018-10-14 10:39:21"}] >> (1 row) >> > just wanted to check your opinion on the above, is this an expected behaviour? Your VALUE clause is actually not legal JSON - it has one too many braces at the end. The reason postgres didn't complain about it is that JSON_OBJECT is treating it as a string. If you correct the JSON and cast it as jsonb you get the desired result: andrew=# SELECT JSON_OBJECT('track' VALUE '{ "segments": [ { "location": [ 47.763, 13.4034 ], "start time": "2018-10-14 10:05:14", "HR": 73 }, { "location": [ 47.706, 13.2635 ], "start time": "2018-10-14 101:39:21", "HR": 135 } ] }'::jsonb)->'track'->'segments'; ?column? [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 101:39:21"}] (1 row) >> Few comments For 0002-SQL-JSON-constructors-v59.patch: > Also, any thoughts on this? I will look at that separately. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: make tuplestore helper function
Thanks for the detailed review! On Fri, Dec 17, 2021 at 3:04 PM Justin Pryzby wrote: > > On Mon, Dec 13, 2021 at 05:27:52PM -0500, Melanie Plageman wrote: > > Done in attached v4 > > Thanks. > > I think these comments can be removed from the callers: > /* it's a simple type, so don't use get_call_result_type */ Done in attached v5 > You removed one call to tuplestore_donestoring(), but not the others. > I guess you could remove them all (optionally as a separate patch). I removed it in that one location because I wanted to get rid of the local variable it used. I am fine with removing the other occurrences, but I thought there might be some reason why instead of removing it, they made it into a no-op. > The rest of these are questions more than comments, and I'm not necessarily > suggesting to change the patch: > > This isn't new in your patch, but for me, it's more clear if the callers have > a > separate boolean for randomAccess, rather than having the long expression in > the function call: > > + tupstore = MakeFuncResultTuplestore(fcinfo, NULL, > + rsinfo->allowedModes & SFRM_Materialize_Random); > > vs > > randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0; > - tupstore = tuplestore_begin_heap(randomAccess, false, work_mem); > + tupstore = MakeFuncResultTuplestore(fcinfo, NULL, randomAccess); > > One could also argue for passing rsinfo->allowedModes, instead of > (rsinfo->allowedModes & SFRM_Materialize_Random). I've changed the patch to have MakeFuncResultTuplestore() check rsinfo->allowedModes and pass in the resulting boolean to tuplestore_begin_heap(). Because I modified the MakeFuncResultTuplestore() function signature to accommodate this, I had to collapse the two patches into one, as I couldn't maintain the call sites which passed in a constant. > There's a couples places that you're checking expectedDesc where it wasn't > being checked before. Is that some kind of live bug ? > pg_config() text_to_table() Yes, expectedDesc was accessed in these locations without checking that it is not NULL. Should that be a separate patch? > It'd be nice if the "expected tuple format" error didn't need to be duplicated > for each SRFs. I suppose the helper function could just take a boolean > determining whether or not to throw an error (passing "expectedDesc==NULL"). > You'd have to call the helper before CreateTupleDescCopy() - which you're > already doing in a couple places for similar reasons. So, I don't think it makes sense to move that error message into MakeFuncResultTuplestore() for the cases in which NULL is passed in for the tuple descriptor. expectedDesc is not accessed at all in these cases inside the function (since get_call_result_type()) is not called. For the cases when a tuple descriptor is passed in, only two callers actually check for expectedDesc -- each_worker_jsonb and each_worker. Therefore, I don't think it makes sense to add another parameter just to move that check inside for two callers. > I noticed that tuplestore.h isn't included most places. I suppose it's > included via execnodes.h. Your patch doesn't change that, arguably it > should've always been included. I've now included it in funcapi.c. - Melanie From 8ea7f90162f73049541fb6f53e714298a5ecfc41 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Tue, 2 Nov 2021 12:20:55 -0400 Subject: [PATCH v5] Add helper to make tuplestore And use it to refactor out existing duplicate code. 
Note that for func-api callers previously calling tuplestore_begin_heap() with a constant value for the randomAccess parameter, by using MakeFuncResultTuplestore(), they are now passing true or false based on whether or not SFRM_Materialize_Random is set in rsinfo->allowedModes. This is consistent with the instructions in src/backend/utils/fmgr/README, which state that tuplestores "must be created with randomAccess = true if SFRM_Materialize_Random is set in allowedModes, but it can (and preferably should) be created with randomAccess = false if not". Author: Melanie Plageman Reviewed-by: Justin Pryzby Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs%2BPxeHw1CWJeXKofkE6TuZg%40mail.gmail.com --- src/backend/access/transam/xlogfuncs.c| 16 +--- src/backend/commands/event_trigger.c | 32 +-- src/backend/commands/extension.c | 48 +-- src/backend/commands/prepare.c| 15 +--- src/backend/foreign/foreign.c | 51 +-- src/backend/libpq/hba.c | 19 + src/backend/replication/logical/launcher.c| 16 +--- .../replication/logical/logicalfuncs.c| 18 +--- src/backend/replication/logical/origin.c | 19 + src/backend/replication/slotfuncs.c | 16 +--- src/backend/replication/walsender.c | 16 +--- src/backend/storage/ipc/shmem.c | 16 +--- src/backend/utils/adt/datetime.c | 1
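Following the description above, the helper's shape is roughly as below (a sketch only; the exact signature and error handling are in the attached patch, and any differences here are mine):

    Tuplestorestate *
    MakeFuncResultTuplestore(FunctionCallInfo fcinfo, TupleDesc *tupdesc)
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        bool        randomAccess;

        /* per src/backend/utils/fmgr/README, honor what allowedModes asks for */
        randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;

        /* callers returning a composite type pass a tupdesc to fill in */
        if (tupdesc != NULL &&
            get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "return type must be a row type");

        return tuplestore_begin_heap(randomAccess, false, work_mem);
    }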
Re: Converting WAL to SQL
On Thu, Jan 6, 2022 at 12:19 AM Bruce Momjian wrote: > > On Tue, Jan 4, 2022 at 10:47:47AM -0300, Fabrízio de Royes Mello wrote: > > > > > > What we did was decode the 9.6 wal files and apply transactions to the > > old 9.2 to keep it in sync with the new promoted version. This was our > > "rollback" strategy if something went wrong with the new 9.6 version. > > How did you deal with the issue that SQL isn't granular enough (vs. > row-level changes) to reproduce the result reliably, as outlined here? This is a logical decoding plugin, so it's SQL containing decoded row-level changes. It will behave the same as a publication/subscription (apart from being far less performant, due to being plain SQL of course).
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Hi, On 2021-12-29 11:31:51 -0800, Andres Freund wrote: > That's pretty much the same - XLogInsert() can trigger an > XLogWrite()/Flush(). > > I think it's a complete no-go to add throttling to these places. It's quite > possible that it'd cause new deadlocks, and it's almost guaranteed to have > unintended consequences (e.g. replication falling back further because > XLogFlush() is being throttled). I thought of another way to implement this feature. What if we checked the current distance somewhere within XLogInsert(), but only set InterruptPending=true, XLogDelayPending=true. Then in ProcessInterrupts() we check if XLogDelayPending is true and sleep the appropriate time. That way the sleep doesn't happen with important locks held / within a critical section, but we still delay close to where we went over the maximum lag. And the overhead should be fairly minimal. I'm doubtful that implementing the waits on a transactional level provides a meaningful enough amount of control - there's just too much WAL that can be generated within a transaction. Greetings, Andres Freund
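To make the shape of that idea a bit more concrete, here is a minimal sketch; XLogDelayPending, the lag check, and the delay computation are hypothetical names invented for illustration, while InterruptPending, ProcessInterrupts() and pg_usleep() are the existing mechanisms:

    /*
     * In XLogInsert(): only set flags, so nothing expensive happens while
     * locks are held or inside a critical section.
     */
    if (ReplicationLagExceedsLimit())       /* hypothetical check */
    {
        InterruptPending = true;
        XLogDelayPending = true;            /* hypothetical new flag */
    }

    /* In ProcessInterrupts(): perform the actual sleep at the next safe point. */
    if (XLogDelayPending)
    {
        XLogDelayPending = false;
        pg_usleep(ComputeWalThrottleDelay());   /* hypothetical delay policy */
    }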
Re: Add 64-bit XIDs into PostgreSQL 15
On Thu, 30 Dec 2021 at 13:19, Maxim Orlov wrote: > > Hi, hackers! > > Long time wraparound was a really big pain for highly loaded systems. One > source of performance degradation is the need to vacuum before every > wraparound. > And there were several proposals to make XIDs 64-bit like [1], [2], [3] and > [4] to name a few. Very good to see this revived. > PFA updated working patch v6 for PG15 development cycle. > It is based on a patch by Alexander Korotkov version 5 [5] with a few fixes, > refactoring and was rebased to PG15. > > Main changes: > - Change TransactionId to 64bit This sounds like a good route to me. > - Disk tuple format (HeapTupleHeader) is unchanged: xmin and xmax > remains 32bit > -- 32bit xid is named ShortTransactionId now. > -- Exception: see "double xmax" format below. > - Heap page format is changed to contain xid and multixact base value, > tuple's xmin and xmax are offsets from. > -- xid_base and multi_base are stored as a page special data. PageHeader >remains unmodified. > -- If after upgrade page has no free space for special data, tuples are >converted to "double xmax" format: xmin became virtual >FrozenTransactionId, xmax occupies the whole 64bit. >Page converted to new format when vacuum frees enough space. > - In-memory tuples (HeapTuple) were enriched with copies of xid_base and > multi_base from a page. I think we need more Overview of Behavior than is available with this patch, perhaps in the form of a README, such as in src/backend/access/heap/README.HOT. Most people's comments are about what the opportunities and problems caused, and mine are definitely there also. i.e. explain the user visible behavior. Please explain the various new states that pages can be in and what the effects are, My understanding is this would be backwards compatible, so we can upgrade to it. Please confirm. Thanks -- Simon Riggshttp://www.EnterpriseDB.com/
Re: [Proposal] Add foreign-server health checks infrastructure
On Tue, Jan 4, 2022 at 10:21 PM Shinya Kato wrote: > Thank you for the new patch! > > On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote: > > Dear Kato-san, > > > > Thank you for giving comments! And sorry for late reply. > > I rebased my patches. > > > >> Even for local-only transaction, I thought it useless to execute > >> CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I > >> make it so that it determines outside whether it contains SQL to the > >> remote or not? > > > > Yeah, remote-checking timeout will be enable even ifa local > > transaction is opened. > > In my understanding, postgres cannot distinguish whether opening > > transactions > > are using only local object or not. > > My first idea was that turning on the timeout when GetFdwRoutineXXX > > functions > > were called, but in some cases FDWs may not use remote connection even > > if > > they call such a handler function. e.g. postgresExplainForeignScan(). > > Hence I tried another approach that unregister all checking callback > > the end of each transactions. Only FDWs can notice that transactions > > are remote or local, > > so they must register callback when they really connect to other > > database. > > This implementation will avoid calling remote checking in the case of > > local transaction, > > but multiple registering and unregistering may lead another overhead. > > I attached which implements that. > > > It certainly incurs another overhead, but I think it's better than the > previous one. > So far, I haven't encountered any problems, but I'd like other people's > opinions. > > >> The following points are minor. > >> 1. In foreign.c, some of the lines are too long and should be broken. > >> 2. In pqcomm.c, the newline have been removed, what is the intention > >> of > >> this? > >> 3. In postgres.c, > >> 3-1. how about inserting a comment between lines 2713 and 2714, > >> similar > >> to line 2707? > >> 3-2. the line breaks in line 3242 and line 3243 should be aligned. > >> 3-3. you should change > >> /* > >> * Skip checking foreign servers while reading > >> messages. > >> */ > >> to > >> /* > >> * Skip checking foreign servers while reading > >> messages. > >> */ > >> 4. In connection.c, There is a typo in line 1684, so "fucntion" should > >> be changed to "function". > > > > Maybe all of them were fixed. Thanks! > Thank you, and it looks good to me. > > > Hi, + UnregisterAllCheckingRemoteServersCallback(); UnregisterAllCheckingRemoteServersCallback -> UnregisterAllCheckingRemoteServersCallbacks + CheckingRemoteServersCallbackItem *item; + item = fdw_callbacks; The above two lines can be combined. +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback callback, + void *arg) Is the above func called anywhere ? + if (item->callback == callback && item->arg == arg) Is comparing argument pointers stable ? What if the same argument is cloned ? + CallCheckingRemoteServersCallbacks(); + + if (remote_servers_connection_check_interval > 0) + enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT, + remote_servers_connection_check_interval); Should the call to CallCheckingRemoteServersCallbacks() be placed under the if block checking remote_servers_connection_check_interval ? If remote_servers_connection_check_interval is 0, it seems the callbacks shouldn't be called. Cheers
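For clarity, the rearrangement suggested in the last comment would look roughly like this (names taken from the quoted patch; this is only a sketch of the suggestion, not patch text):

    if (remote_servers_connection_check_interval > 0)
    {
        CallCheckingRemoteServersCallbacks();
        enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
                             remote_servers_connection_check_interval);
    }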
Re: [PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module]
Dag Lem writes: > Please find attached a patch to run the previously commented-out > UTF8-dependent tests for citext, according to best practice. For now I > don't dare to touch the unaccent module, which seems to be UTF8-only > anyway. I tried this on a bunch of different locale settings and concluded that we need to restrict the locale to avoid failures: it falls over with locale C. With that, it passes on all UTF8 LANG settings on RHEL8 and FreeBSD 12, and all except am_ET.UTF-8 on current macOS. I'm not sure what the deal is with am_ET, but macOS has a long and sad history of wonky UTF8 locales, so I was actually expecting worse. If the buildfarm shows more problems, we can restrict it further --- I won't be too upset if we end up restricting to just Linux systems, like collate.linux.utf8. Anyway, pushed to see what happens. regards, tom lane
Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings
Hi, On 2021-12-28 15:49:00 +, Jelte Fennema wrote: > The first patch is a cleaned up version of my previous patch. I think I > addressed > all feedback on the previous version in that patch (e.g. removed windows > code, > fixed formatting). To me it seems a bit problematic to introduce a divergence between windows / everything else here. Isn't that just going to lead to other complaints just like this thread, where somebody discovered the hard way that there's platform dependent behaviour here? > The second patch is a new one, it implements honouring of the connect_timeout > connection option in PQcancel. This patch requires the first patch to also be > applied, > but since it seemed fairly separate and the code is not trivial I didn't want > the first > patch to be blocked on this. > > Finally, I would love it if once these fixes are merged the would also be > backpatched to > previous versions of libpq. Does that seem possible? As far as I can tell it > would be fine, > since it doesn't really change any of the public APIs. The only change is > that the pg_cancel > struct now has a few additional fields. But since that struct is defined in > libpq-int.h, so that > struct should not be used by users of libpq directly, right?. I'm not really convinced this is a good patch to backpatch. There does seem to be some potential for subtle breakage - code in signal handlers is notoriously finnicky, it's a rarely exercised code path, etc. It's also not fixing something that previously worked. > + * NOTE: These socket options are currently not set for Windows. The > + * reason is that signal safety in this function is very important, and > it > + * was not clear to if the functions required to set the socket options > on > + * Windows were signal-safe. > + */ > +#ifndef WIN32 > + if (!IS_AF_UNIX(cancel->raddr.addr.ss_family)) > + { > +#ifdef TCP_USER_TIMEOUT > + if (cancel->pgtcp_user_timeout >= 0) > + { > + if (setsockopt(tmpsock, IPPROTO_TCP, TCP_USER_TIMEOUT, > +(char *) > &cancel->pgtcp_user_timeout, > + > sizeof(cancel->pgtcp_user_timeout)) < 0) > + { > + strlcpy(errbuf, "PQcancel() -- > setsockopt(TCP_USER_TIMEOUT) failed: ", errbufsize); > + goto cancel_errReturn; > + } > + } > +#endif > + > + if (cancel->keepalives != 0) > + { > + int on = 1; > + > + if (setsockopt(tmpsock, > +SOL_SOCKET, SO_KEEPALIVE, > +(char *) &on, sizeof(on)) < > 0) > + { > + strlcpy(errbuf, "PQcancel() -- > setsockopt(SO_KEEPALIVE) failed: ", errbufsize); > + goto cancel_errReturn; > + } > + } This is very repetitive - how about introducing a helper function for this? > @@ -4467,8 +4601,8 @@ retry3: > > crp.packetlen = pg_hton32((uint32) sizeof(crp)); > crp.cp.cancelRequestCode = (MsgType) pg_hton32(CANCEL_REQUEST_CODE); > - crp.cp.backendPID = pg_hton32(be_pid); > - crp.cp.cancelAuthCode = pg_hton32(be_key); > + crp.cp.backendPID = pg_hton32(cancel->be_pid); > + crp.cp.cancelAuthCode = pg_hton32(cancel->be_key); Others might differ, but I'd separate changing the type passed to internal_cancel() into its own commit. Greetings, Andres Freund
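To flesh out the helper-function suggestion above, a minimal sketch could look like the following; the function name and signature are hypothetical (not from the posted patch), but it keeps the strlcpy()-into-errbuf error reporting the patch already uses:

/*
 * Hypothetical helper: set one integer-valued socket option, reporting
 * failure through the caller's error buffer.
 */
static bool
optional_setsockopt(int fd, int level, int optname, int optval,
                    const char *failmsg, char *errbuf, int errbufsize)
{
    if (setsockopt(fd, level, optname, (char *) &optval, sizeof(optval)) < 0)
    {
        strlcpy(errbuf, failmsg, errbufsize);
        return false;
    }
    return true;
}

Each repetitive block in PQcancel() could then collapse to a single call, e.g. for SO_KEEPALIVE:

    if (cancel->keepalives != 0 &&
        !optional_setsockopt(tmpsock, SOL_SOCKET, SO_KEEPALIVE, 1,
                             "PQcancel() -- setsockopt(SO_KEEPALIVE) failed: ",
                             errbuf, errbufsize))
        goto cancel_errReturn;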
Are we missing a dot in initdb's output?
Hi, this is more a cosmetic concern, but anyway: running initdb gives this output: ... creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... ok syncing data to disk ... ok initdb: warning: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb. ... Shouldn't there be a "." after "authentication for local connections"? Probably it should be like this: initdb: warning: Enabling "trust" authentication for local connections. initdb's output a few lines earlier gives this, which all close with a "dot" and start with upper case: "The database cluster will be initialized with locale "en_US.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled." Regards Daniel
Re: Converting WAL to SQL
On Wed, Jan 5, 2022 at 2:19 PM Julien Rouhaud wrote: > > On Thu, Jan 6, 2022 at 12:19 AM Bruce Momjian wrote: > > > > On Tue, Jan 4, 2022 at 10:47:47AM -0300, Fabrízio de Royes Mello wrote: > > > > > > > > > What we did was decode the 9.6 wal files and apply transactions to the > > > old 9.2 to keep it in sync with the new promoted version. This was our > > > "rollback" strategy if something went wrong with the new 9.6 version. > > > > How did you deal with the issue that SQL isn't granular enough (vs. > > row-level changes) to reproduce the result reliably, as outlined here? > > This is a logical decoding plugin, so it's SQL containing decoded > row-level changes. It will behave the same as a > publication/suscription (apart from being far less performant, due to > being plain SQL of course). Exactly! -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings
Andres Freund writes: > On 2021-12-28 15:49:00 +, Jelte Fennema wrote: >> Finally, I would love it if once these fixes are merged the would also be >> backpatched to >> previous versions of libpq. > I'm not really convinced this is a good patch to backpatch. There does seem to > be some potential for subtle breakage - code in signal handlers is notoriously > finnicky, it's a rarely exercised code path, etc. It's also not fixing > something that previously worked. IMO, this is a new feature not a bug fix, and as such it's not backpatch material ... especially if we don't have 100.00% confidence in it. regards, tom lane
Re: Are we missing a dot in initdb's output?
"Daniel Westermann (DWE)" writes: > initdb: warning: enabling "trust" authentication for local connections > You can change this by editing pg_hba.conf or using the option -A, or > --auth-local and --auth-host, the next time you run initdb. > Shouldn't there be a "." after "authentication for local connections"? Meh, I think this is mimicking our coding style for primary vs. detail messages. regards, tom lane
Re: daitch_mokotoff module
Dag Lem writes: > Thomas Munro writes: > >> On Wed, Jan 5, 2022 at 2:49 AM Dag Lem wrote: >>> However I guess this won't make any difference wrt. actually running the >>> tests, as long as there seems to be an encoding problem in the cfbot >> >> Fixed -- I told it to pull down patches as binary, not text. Now it >> makes commits that look healthier, and so far all the Unix systems >> have survived CI: >> >> https://github.com/postgresql-cfbot/postgresql/commit/79700efc61d15c2414b8450a786951fa9308c07f >> http://cfbot.cputube.org/dag-lem.html >> > > Great! > > Dag > > After this I did the mistake of including a patch for citext in this thread, which is now picked up by cfbot instead of the Daitch-Mokotoff patch. Attaching the original patch again in order to hopefully fix my mistake. Best regards Dag Lem diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile index 0704894f88..1d5bd84be8 100644 --- a/contrib/fuzzystrmatch/Makefile +++ b/contrib/fuzzystrmatch/Makefile @@ -3,14 +3,15 @@ MODULE_big = fuzzystrmatch OBJS = \ $(WIN32RES) \ + daitch_mokotoff.o \ dmetaphone.o \ fuzzystrmatch.o EXTENSION = fuzzystrmatch -DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.0--1.1.sql +DATA = fuzzystrmatch--1.2.sql fuzzystrmatch--1.1--1.2.sql fuzzystrmatch--1.0--1.1.sql PGFILEDESC = "fuzzystrmatch - similarities and distance between strings" -REGRESS = fuzzystrmatch +REGRESS = fuzzystrmatch fuzzystrmatch_utf8 ifdef USE_PGXS PG_CONFIG = pg_config @@ -22,3 +23,8 @@ top_builddir = ../.. include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk endif + +daitch_mokotoff.o: daitch_mokotoff.h + +daitch_mokotoff.h: daitch_mokotoff_header.pl + perl $< > $@ diff --git a/contrib/fuzzystrmatch/daitch_mokotoff.c b/contrib/fuzzystrmatch/daitch_mokotoff.c new file mode 100644 index 00..1b7263c349 --- /dev/null +++ b/contrib/fuzzystrmatch/daitch_mokotoff.c @@ -0,0 +1,593 @@ +/* + * Daitch-Mokotoff Soundex + * + * Copyright (c) 2021 Finance Norway + * Author: Dag Lem + * + * This implementation of the Daitch-Mokotoff Soundex System aims at high + * performance. + * + * - The processing of each phoneme is initiated by an O(1) table lookup. + * - For phonemes containing more than one character, a coding tree is traversed + * to process the complete phoneme. + * - The (alternate) soundex codes are produced digit by digit in-place in + * another tree structure. + * + * References: + * + * https://www.avotaynu.com/soundex.htm + * https://www.jewishgen.org/InfoFiles/Soundex.html + * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex + * https://stevemorse.org/census/soundex.html (dmlat.php, dmsoundex.php) + * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java) + * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm) + * + * A few notes on other implementations: + * + * - All other known implementations have the same unofficial rules for "UE", + * these are also adapted by this implementation (0, 1, NC). + * - The only other known implementation which is capable of generating all + * correct soundex codes in all cases is the JOS Soundex Calculator at + * https://www.jewishgen.org/jos/jossound.htm + * - "J" is considered (only) a vowel in dmlat.php + * - The official rules for "RS" are commented out in dmlat.php + * - Identical code digits for adjacent letters are not collapsed correctly in + * dmsoundex.php when double digit codes are involved. E.g. "BESST" yields + * 744300 instead of 743000 as for "BEST". 
+ * - "J" is considered (only) a consonant in DaitchMokotoffSoundex.java + * - "Y" is not considered a vowel in DaitchMokotoffSoundex.java + * + * Permission to use, copy, modify, and distribute this software and its + * documentation for any purpose, without fee, and without a written agreement + * is hereby granted, provided that the above copyright notice and this + * paragraph and the following two paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR + * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, + * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY + * AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS + * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO + * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. +*/ + +#include "daitch_mokotoff.h" + +#include "postgres.h" +#include "utils/builtins.h" +#include "mb/pg_wchar.h" + +#include +#include + +/* Internal C implementation */ +static char *_daitch_mokotoff(char *word, char *soundex, size_t n); + + +PG_FUN
Re: Collecting statistics about contents of JSONB columns
On Sat, Jan 1, 2022 at 11:07 AM Tomas Vondra wrote: > 0006-Add-jsonb-statistics-20211230.patch Hi Tomas, -CREATE OR REPLACE FUNCTION explain_jsonb(sql_query text) +CREATE OR REPLACE FUNCTION explain_jsonb(sql_query text) https://cirrus-ci.com/task/6405547984420864 It looks like there is a Unicode BOM sequence in there, which is upsetting our Windows testing but not the 3 Unixes (not sure why). Probably added by an editor.
Re: Logical insert/update/delete WAL records for custom table AMs
On Wed, 10 Nov 2021 at 03:17, Amit Kapila wrote: > > On Tue, Nov 9, 2021 at 5:12 AM Jeff Davis wrote: > > > > On Fri, 2021-11-05 at 10:00 +0530, Amit Kapila wrote: > > > I am not talking about decoding plugin but rather decoding itself, > > > basically, the work we do in reorderbuffer.c, decode.c, etc. The two > > > things I remember were tuple format and transaction ids as mentioned > > > in my previous email. > > > > If it's difficult to come up with something that will work for all > > table AMs, then it suggests that we might want to go towards fully > > extensible rmgr, and have a decoding method in the rmgr. > > > > I started a thread (with a patch) here: > > > > > > https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.ca...@j-davis.com > > > > > I think we should try to find a solution for > > > tuple format as the current decoding code relies on it if we want > > > decoding to deal with another table AMs transparently. > > > > Agreed, but it seems like we need basic extensibility first. For now, > > we'll need to convert to a heap tuple, > > > > Okay, but that might have a cost because we need to convert it before > WAL logging it, and then we probably also need to convert back to the > original table AM format during recovery/standby apply. I spoke with Jeff in detail about this patch in NYC Dec 21, and I now think it is worth pursuing. It seems much more likely that this would be acceptable than fully extensible rmgr. Amit asks a good question: should we be writing a message that seems to presume the old heap tuple format? My answer is that we clearly need it to be written in *some* common format, and the current heap format currently works, so de facto it is the format we should use. Right now, TAMs have to reformat back into this same format, so it is the way the APIs work. Put it another way: I don't see any other format that makes sense to use, now, but that could change in the future. So I'm signing up as a reviewer and we'll see if this is good to go. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: Collecting statistics about contents of JSONB columns
On Fri, 31 Dec 2021 at 22:07, Tomas Vondra wrote: > The patch does something far more > elegant - it simply uses stavalues to store an array of JSONB documents, > each describing stats for one path extracted from the sampled documents. Sounds good. > I'm sure there's plenty open questions - for example I think we'll need > some logic to decide which paths to keep, otherwise the statistics can > get quite big, if we're dealing with large / variable documents. We're > already doing something similar for MCV lists. > > One of Nikita's patches not included in this thread allow "selective" > statistics, where you can define in advance a "filter" restricting which > parts are considered interesting by ANALYZE. That's interesting, but I > think we need some simple MCV-like heuristics first anyway. > > Another open question is how deep the stats should be. Imagine documents > like this: > >{"a" : {"b" : {"c" : {"d" : ... > > The current patch build stats for all possible paths: > > "a" > "a.b" > "a.b.c" > "a.b.c.d" > > and of course many of the paths will often have JSONB documents as > values, not simple scalar values. I wonder if we should limit the depth > somehow, and maybe build stats only for scalar values. The user interface for designing filters sounds hard, so I'd hope we can ignore that for now. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: biblio.sgml dead link
> On 5 Jan 2022, at 02:26, Michael Paquier wrote: > > On Tue, Jan 04, 2022 at 06:10:07PM +0100, Erik Rijkers wrote: >> The replacement is a ~200 euro pdf (2021). I'd be thankful if someone would >> send the pdf to me; maybe I can update my JSON tests. >> >> And we should remove that entry from the bibliography (or have it point to >> the new page [1]). > > Removing the entry seems a bit overdoing it to me, and updating to a > paywall does not sound completely right to me either. Another thing > that we could do is to just remove the link, but keep its reference. > > Any thoughts from others? We definitely shouldn't remove it, it's referenced from functions-json.html and that IMO adds value. I think we should remove the link, not because it costs money but because it can be purchased from several places, and choosing one to "favor" seems to invite criticism. Kind of how we don't link to an online store for buying the other books. -- Daniel Gustafsson https://vmware.com/
Re: pg_stat_statements and "IN" conditions
> On Tue, Jan 04, 2022 at 06:02:43PM -0500, Tom Lane wrote: > We can debate whether the rules proposed here are good for > pg_stat_statements or not, but it seems inevitable that they will be a > disaster for some other consumers of the query hash. Hm, which consumers do you mean here, potential extensions? Doesn't the ability to use an external module to compute the queryid make this situation possible anyway? > do you really think that a query with two int > parameters is equivalent to one with five float parameters for all > query-identifying purposes? Nope, and it will be hard to figure this out no matter which approach we're talking about, because it mostly depends on the context and type of queries, I guess. Instead, such functionality should allow some reasonable configuration. To be clear, the use case I have in mind here is not four or five constants, but rather a couple of hundred, where the chances that the whole construction was generated automatically by an ORM are higher than normal. > I can see the merits of allowing different numbers of IN elements > to be considered equivalent for pg_stat_statements, but this patch > seems to go far beyond that basic idea, and I fear the side-effects > will be very bad. Not sure why it goes far beyond that, but there were indeed two approaches under consideration, as I stated in the first message. I don't remember all the details anymore, but the other one revolved around doing similar things in a more limited fashion in transformAExprIn. The problem then would be carrying the information necessary to represent the act of "merging" some number of queryids together. Any thoughts here? The idea of keeping the original queryid untouched and adding another type of id instead sounds interesting, but it would add too much overhead for quite a small use case, I guess.
Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
On Sun, Jun 13, 2021 at 8:25 PM Thomas Munro wrote: > Just as an FYI: this entry currently fails with "Timed out!" on cfbot > because of an oversight in the master branch[1], AFAICS. It should > pass again once that's fixed. > > [1] > https://www.postgresql.org/message-id/CA%2BhUKGLah2w1pWKHonZP_%2BEQw69%3Dq56AHYwCgEN8GDzsRG_Hgw%40mail.gmail.com That's fixed now. So what should we do about this patch? This is a bug, so it would be nice to do *something*. I don't really like the fact that this makes the behavior contingent on USE_ASSERT_CHECKING, and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE which by default gets defined on WIN32, but can be defined elsewhere if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h). Furthermore, I can't see back-patching this, given that it would be the very first use of the barrier machinery. But I think it would be good to get something into master, because then we'd actually be using this procsignalbarrier stuff for something. On a good day we've fixed a bug. On a bad day we'll learn something new about how procsignalbarrier needs to work. -- Robert Haas EDB: http://www.enterprisedb.com
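For reference, the pg_config_manual.h treatment being suggested might look roughly like this (the symbol name comes from the mail above; the stanza itself is only an assumption modeled on how EXEC_BACKEND is handled, not code from any posted patch):

/*
 * Use the ProcSignalBarrier-based release of smgr file handles (e.g.
 * during DROP TABLESPACE).  Defined by default on Windows, where it is
 * needed; uncomment to force it on other platforms for testing.
 */
/* #define USE_BARRIER_SMGR_RELEASE */

#if !defined(USE_BARRIER_SMGR_RELEASE) && defined(WIN32)
#define USE_BARRIER_SMGR_RELEASE
#endif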
Re: Make relfile tombstone files conditional on WAL level
On Mon, Aug 2, 2021 at 6:38 PM Andres Freund wrote: > I guess there's a somewhat hacky way to get somewhere without actually > increasing the size. We could take 3 bytes from the fork number and use that > to get to a 7 byte relfilenode portion. 7 bytes are probably enough for > everyone. > > It's not like we can use those bytes in a useful way, due to alignment > requirements. Declaring that the high 7 bytes are for the relNode portion and > the low byte for the fork would still allow efficient comparisons and doesn't > seem too ugly. I think this idea is worth more consideration. It seems like 2^56 relfilenodes ought to be enough for anyone, recalling that you can only ever have 2^64 bytes of WAL. So if we do this, we can eliminate a bunch of code that is there to guard against relfilenodes being reused. In particular, we can remove the code that leaves a 0-length tombstone file around until the next checkpoint to guard against relfilenode reuse. On Windows, we still need https://commitfest.postgresql.org/36/2962/ because of the problem that Windows won't remove files from the directory listing until they are both unlinked and closed. But in general this seems like it would lead to cleaner code. For example, GetNewRelFileNode() needn't loop. If it allocates the smallest unsigned integer that the cluster (or database) has never previously assigned, the file should definitely not exist on disk, and if it does, an ERROR is appropriate, as the database is corrupted. This does assume that allocations from this new 56-bit relfilenode counter are properly WAL-logged. I think this would also solve a problem Dilip mentioned to me today: suppose you make ALTER DATABASE SET TABLESPACE WAL-logged, as he's been trying to do. Then suppose you do "ALTER DATABASE foo SET TABLESPACE used_recently_but_not_any_more". You might get an error complaining that "some relations of database \"%s\" are already in tablespace \"%s\"" because there could be tombstone files in that database. With this combination of changes, you could just use the barrier mechanism from https://commitfest.postgresql.org/36/2962/ to wait for those files to disappear, because they've got to be previously-unlinked files that Windows is still returning because they're still open -- or else they could be a sign of a corrupted database, but there are no other possibilities. I think, anyway. -- Robert Haas EDB: http://www.enterprisedb.com
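As a rough illustration of the layout Andres describes (purely a sketch; none of these names exist in any posted patch), packing the 56-bit relNode and the fork number into a single integer might look like:

/*
 * Hedged sketch of the "high 7 bytes for relNode, low byte for fork"
 * packing discussed above.  A plain uint64 comparison then orders first
 * by relNode and then by fork, which is what keeps comparisons cheap.
 */
typedef uint64 PackedRelFileNode;       /* hypothetical type name */

static inline PackedRelFileNode
pack_relnode_fork(uint64 relNode, uint8 forkNum)
{
    Assert(relNode < ((uint64) 1 << 56));   /* 2^56 relfilenodes max */
    return (relNode << 8) | forkNum;
}

static inline uint64
packed_relnode(PackedRelFileNode val)
{
    return val >> 8;
}

static inline uint8
packed_fork(PackedRelFileNode val)
{
    return (uint8) (val & 0xFF);
}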
Re: SQL:2011 application time
On Wed, Jan 5, 2022 at 11:07 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 21.11.21 02:51, Paul A Jungwirth wrote: > > Here are updated patches. They are rebased and clean up some of my > > TODOs. > > This patch set looks very interesting. It's also very big, so it's > difficult to see how to get a handle on it. I did a pass through it > to see if there were any obvious architectural or coding style > problems. I also looked at some of your TODO comments to see if I had > something to contribute there. > > I'm confused about how to query tables based on application time > periods. Online, I see examples using AS OF, but in the SQL standard > I only see this used for system time, which we are not doing here. > What is your understanding of that? > Paul has previously supplied me with this document https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf and that formed the basis of a lot of my questions a few months earlier. There was similar work being done for system periods, which are a bit simpler but require a side (history) table to be created. I was picking people's brains about some aspects of system versioning to see if I could help bringing that into this already very large patchset, but haven't yet felt like I had done enough research to post it. It is my hope that we can at least get the syntax for both application and system versioning committed, even if it's just stubbed in with not-yet-supported errors.
Re: Pre-allocating WAL files
On 12/30/21, 3:52 AM, "Maxim Orlov" wrote: > I did check the patch too and found it to be ok. Check and check-world are > passed. > Overall idea seems to be good in my opinion, but I'm not sure where is the > optimal place to put the pre-allocation. > > On Thu, Dec 30, 2021 at 2:46 PM Pavel Borisov wrote: >> I've checked the patch v7. It applies cleanly, code is good, check-world >> tests passed without problems. >> I think it's ok to use checkpointer for this and the overall patch can be >> committed. But the seen performance gain makes me think again before adding >> this feature. I did tests myself a couple of months ago and got similar >> results. >> Really don't know whether is it worth the effort. Thank you both for your review. Nathan
Bugs in pgoutput.c
Commit 6ce16088b caused me to look at pgoutput.c's handling of cache invalidations, and I was pretty appalled by what I found. * rel_sync_cache_relation_cb does the wrong thing when called for a cache flush (i.e., relid == 0). Instead of invalidating all RelationSyncCache entries as it should, it will do nothing. * When rel_sync_cache_relation_cb does invalidate an entry, it immediately zaps the entry->map structure, even though that might still be in use (as per the adjacent comment that carefully explains why this isn't safe). I'm not sure if this could lead to a dangling-pointer core dump, but it sure seems like it could lead to failing to translate tuples that are about to be sent. * Similarly, rel_sync_cache_publication_cb is way too eager to reset the pubactions flags, which would likely lead to failing to transmit changes that we should transmit. The attached patch fixes these things, but I'm still pretty unhappy with the general design of the data structures in pgoutput.c, because there is this weird random mishmash of static variables along with a palloc'd PGOutputData struct. This cannot work if there are ever two active LogicalDecodingContexts in the same process. I don't think serial use of LogicalDecodingContexts (ie, destroy one and then make another) works very well either, because pgoutput_shutdown is a mere fig leaf that ignores all the junk the module previously made (in CacheMemoryContext no less). So I wonder whether either of those scenarios is possible/supported/ expected to be needed in future. Also ... maybe I'm not looking in the right place, but I do not see anything anywhere in logical decoding that is taking any lock on the relation being processed. How can that be safe? In particular, how do we know that the data collected by get_rel_sync_entry isn't already stale by the time we return from the function? Setting replicate_valid = true at the bottom of the function would overwrite any notification we might have gotten from a syscache callback while reading catalog data. regards, tom lane diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index a08da859b4..f7e991cd16 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -108,11 +108,13 @@ typedef struct RelationSyncEntry { Oid relid; /* relation oid */ + bool replicate_valid; /* overall validity flag for entry */ + bool schema_sent; List *streamed_txns; /* streamed toplevel transactions with this * schema */ - bool replicate_valid; + /* are we publishing this rel? */ PublicationActions pubactions; /* @@ -903,7 +905,9 @@ LoadPublications(List *pubnames) } /* - * Publication cache invalidation callback. + * Publication syscache invalidation callback. + * + * Called for invalidations on pg_publication. 
*/ static void publication_invalidation_cb(Datum arg, int cacheid, uint32 hashvalue) @@ -1130,13 +1134,12 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) HASH_ENTER, &found); Assert(entry != NULL); - /* Not found means schema wasn't sent */ + /* initialize entry, if it's new */ if (!found) { - /* immediately make a new entry valid enough to satisfy callbacks */ + entry->replicate_valid = false; entry->schema_sent = false; entry->streamed_txns = NIL; - entry->replicate_valid = false; entry->pubactions.pubinsert = entry->pubactions.pubupdate = entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; entry->publish_as_relid = InvalidOid; @@ -1166,13 +1169,40 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) { oldctx = MemoryContextSwitchTo(CacheMemoryContext); if (data->publications) + { list_free_deep(data->publications); - +data->publications = NIL; + } data->publications = LoadPublications(data->publication_names); MemoryContextSwitchTo(oldctx); publications_valid = true; } + /* + * Reset schema_sent status as the relation definition may have + * changed. Also reset pubactions to empty in case rel was dropped + * from a publication. Also free any objects that depended on the + * earlier definition. + */ + entry->schema_sent = false; + list_free(entry->streamed_txns); + entry->streamed_txns = NIL; + entry->pubactions.pubinsert = false; + entry->pubactions.pubupdate = false; + entry->pubactions.pubdelete = false; + entry->pubactions.pubtruncate = false; + if (entry->map) + { + /* + * Must free the TupleDescs contained in the map explicitly, + * because free_conversion_map() doesn't. + */ + FreeTupleDesc(entry->map->indesc); + FreeTupleDesc(entry->map->outdesc); + free_conversion_map(entry->map); + } + entry->map = NULL; + /* * Build publication cache. We can't use one provided by relcache as * relcache considers all publications given relation is in, but here @@ -1212,16 +1242,18 @@ get_rel_sync_entr
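For clarity, here is a sketch of how the relid == 0 (cache flush) case described at the top of this mail could be handled; this is an illustration of the idea only, not the attached patch, and it assumes entries are invalidated by clearing replicate_valid, as the patch does:

static void
rel_sync_cache_relation_cb(Datum arg, Oid relid)
{
    RelationSyncEntry *entry;

    /* nothing to do if the cache hasn't been created yet */
    if (RelationSyncCache == NULL)
        return;

    if (OidIsValid(relid))
    {
        /* invalidate the entry for this relation, if we have one */
        entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
                                                  &relid, HASH_FIND, NULL);
        if (entry != NULL)
            entry->replicate_valid = false;
    }
    else
    {
        /* complete cache flush: mark every entry invalid */
        HASH_SEQ_STATUS status;

        hash_seq_init(&status, RelationSyncCache);
        while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
            entry->replicate_valid = false;
    }
}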
Re: a misbehavior of partition row movement (?)
Pushed 0001. I had to adjust the query used in pg_dump; you changed the attribute name in the query used for pg15, but left unchanged the one for older branches, so pg_dump failed for all branches other than 15. Also, psql's describe.c required a small tweak to a version number test. https://github.com/alvherre/postgres/commit/3451612e0fa082d3ec953551c6d25432bd725502 Thanks! What was 0002 is attached, to keep cfbot happy. It's identical to your v11-0002. I have pushed it thinking that we would not backpatch any of this fix. However, after running the tests and realizing that I didn't need an initdb for either patch, I wonder if maybe the whole series *is* backpatchable. There is one possible problem, which is that psql and pg_dump would need testing to verify that they work decently (i.e. no crash, no misbehavior) with partitioned tables created with the original code. But there are few ABI changes, maybe we can cope and get all branches fixes instead of just 15. What do you think? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Los dioses no protegen a los insensatos. Éstos reciben protección de otros insensatos mejor dotados" (Luis Wu, Mundo Anillo) >From d1b067ad541f80191763e329577e0d3f62d00d82 Mon Sep 17 00:00:00 2001 From: amitlan Date: Mon, 11 Oct 2021 14:57:19 +0900 Subject: [PATCH v12] Enforce foreign key correctly during cross-partition updates When an update on a partitioned table referenced in foreign keys constraint causes a row to move from one partition to another, which is implemented by deleting the old row from the source leaf partition followed by inserting the new row into the destination leaf partition, firing the foreign key triggers on that delete event can result in surprising outcomes for those keys. For example, a given foreign key's delete trigger which implements the ON DELETE CASCADE clause of that key will delete any referencing rows, although it should not, because the referenced row is simply being moved into another partition. This commit teaches trigger.c to skip queuing such delete trigger events on the leaf partitions in favor of an UPDATE event fired on the "root" target relation. Doing so makes sense because both the old and the new tuple "logically" belong to the latter. The after trigger event queuing interface now allows passing the source and the destination partitions of a particular cross-partition update when registering the update event for the root partitioned table. Along with the 2 ctids of the old and the new tuple, an after trigger event now also stores the OIDs of those partitions. The tuples fetched from the source and the destination partitions are converted into the root table format before they are passed to the trigger function. The implementation currently has a limitation that only the foreign keys pointing into the query's target relation are considered, not those of its sub-partitioned partitions. That seems like a reasonable limitation, it sounds rare to have distinct foreign keys pointing into sub-partitioned partitions, but not into the root table. 
--- doc/src/sgml/ref/update.sgml | 7 + src/backend/commands/trigger.c| 322 +++--- src/backend/executor/execMain.c | 19 +- src/backend/executor/execReplication.c| 5 +- src/backend/executor/nodeModifyTable.c| 187 - src/backend/utils/adt/ri_triggers.c | 6 + src/include/commands/trigger.h| 4 + src/include/executor/executor.h | 3 +- src/include/nodes/execnodes.h | 3 + src/test/regress/expected/foreign_key.out | 204 +- src/test/regress/sql/foreign_key.sql | 135 - 11 files changed, 840 insertions(+), 55 deletions(-) diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 3fa54e5f70..3ba13010e7 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -316,6 +316,13 @@ UPDATE count partition (provided the foreign data wrapper supports tuple routing), they cannot be moved from a foreign-table partition to another partition. + + + An attempt of moving a row from one partition to another will fail if a + foreign key is found to directly reference a non-root partitioned table + in the partition tree, unless that table is also directly mentioned + in the UPDATEquery. + diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 452b743f21..1ed6dd1b38 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -94,7 +94,11 @@ static HeapTuple ExecCallTriggerFunc(TriggerData *trigdata, FmgrInfo *finfo, Instrumentation *instr, MemoryContext per_tuple_context); -static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo, +static void AfterTriggerSaveEvent(EState *estate, + ModifyTableState *mtstate, + ResultRelInfo *re
Re: row filtering for logical replication
FYI - v58 is currently known to be broken due to a recent commit [1]. I plan to post a v59* later today to address this as well as other recent review comments. -- [1] https://github.com/postgres/postgres/commit/6ce16088bfed97f982f66a9dc17b8364df289e4d Kind Regards, Peter Smith. Fujitsu Australia.
Re: Emit "checkpoint skipped because system is idle" message at LOG level if log_checkpoints is set
On Wed, Jan 05, 2022 at 10:45:06AM +0530, Dilip Kumar wrote: > On Wed, Jan 5, 2022 at 10:24 AM Bharath Rupireddy > wrote: > > > > Postgres server emits a message at DEBUG1 level when it skips a > > checkpoint. At times, developers might be surprised after figuring out > > from server logs that there were no checkpoints happening at all > > during a certain period of time when DEBUG1 messages aren't captured. > > How about emitting the message at LOG level if log_checkpoints is set? > > Patch attached. > > +1 to convert to LOG when log_checkpoints is set. I think it would be odd to write logs of increased severity for the case where we did not do anything. I think it really is a debug log. I don't think the log level should be changed to avoid "developer" confusion, as you said (I'm not sure if you mean a postgres developer or an application developer, though). Is there any evidence that this has caused user confusion in the last 4 years? |commit 6ef2eba3f57f17960b7cd4958e18aa79e357de2f |Author: Andres Freund |Date: Thu Dec 22 11:31:50 2016 -0800 | |Skip checkpoints, archiving on idle systems. Note that logging a message may not be benign; I think it could cause disks that would otherwise have been in power-saving mode to spin up, especially if you log to syslog, which can issue fsync. Also, part of the argument for enabling log_checkpoints by default was that a small, quiescent instance would not write logs every 5 minutes. -- Justin
Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint
On 12/23/21, 3:17 AM, "Bharath Rupireddy" wrote: > Currently the end-of-recovery checkpoint can be much slower, impacting > the server availability, if there are many replication slot files > .snap or map- to be enumerated and deleted. How about skipping > the .snap and map- file handling during the end-of-recovery > checkpoint? It makes the server available faster and the next regular > checkpoint can deal with these files. If required, we can have a GUC > (skip_replication_slot_file_handling or some other better name) to > control this default being the existing behavior. I suggested something similar as a possibility in the other thread where these tasks are being discussed [0]. I think it is worth considering, but IMO it is not a complete solution to the problem. If there are frequently many such files to delete and regular checkpoints are taking longer, the shutdown/end-of-recovery checkpoint could still take a while. I think it would be better to separate these tasks from checkpointing instead. Nathan [0] https://postgr.es/m/A285A823-0AF2-4376-838E-847FA4710F9A%40amazon.com
Re: Report checkpoint progress in server logs
On Wed, Dec 29, 2021 at 10:40:59AM -0500, Tom Lane wrote: > Magnus Hagander writes: > >> Therefore, reporting the checkpoint progress in the server logs, much > >> like [1], seems to be the best way IMO. > > > I find progress reporting in the logfile to generally be a terrible > > way of doing things, and the fact that we do it for the startup > > process is/should be only because we have no other choice, not because > > it's the right choice. > > I'm already pretty seriously unhappy about the log-spamming effects of > 64da07c41 (default to log_checkpoints=on), and am willing to lay a side > bet that that gets reverted after we have some field experience with it. > This proposal seems far worse from that standpoint. Keep in mind that > our out-of-the-box logging configuration still doesn't have any log > rotation ability, which means that the noisier the server is in normal > operation, the sooner you fill your disk. I think we are looking at three potential observable behaviors people might care about: * the current activity/progress of checkpoints * the historical reporting of checkpoint completion, mixed in with other log messages for later analysis * the aggregate behavior of checkpoint operation I think it is clear that checkpoint progress activity isn't useful for the server logs because that information has little historical value, but does fit for a progress view. As Tom already expressed, we will have to wait to see if non-progress checkpoint information in the logs has sufficient historical value. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Add 64-bit XIDs into PostgreSQL 15
On Tue, Jan 4, 2022 at 10:22:50PM +, Finnerty, Jim wrote: > I'm concerned about the maintainability impact of having 2 new > on-disk page formats. It's already complex enough with XIDs and > multixact-XIDs. > > If the lack of space for the two epochs in the special data area is > a problem only in an upgrade scenario, why not resolve the problem > before completing the upgrade process like a kind of post-process > pg_repack operation that converts all "double xmax" pages to > the "double-epoch" page format? i.e. maybe the "double xmax" > representation is needed as an intermediate representation during > upgrade, but after upgrade completes successfully there are no pages > with the "double-xmax" representation. This would eliminate a whole > class of coding errors and would make the code dealing with 64-bit > XIDs simpler and more maintainable. Well, yes, we could do this, and it would avoid the complexity of having to support two XID representations, but we would need to accept that fast pg_upgrade would be impossible in such cases, since every page would need to be checked and potentially updated. You might try to do this while the server is first started and running queries, but I think we found out from the online checkpoint patch that having the server in an intermediate state while running queries is very complex --- it might be simpler to just accept two XID formats all the time than enabling the server to run with two formats for a short period. My big point is that this needs more thought. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: GUC flags
On Wed, Jan 05, 2022 at 02:17:11PM +0900, Michael Paquier wrote: > On Tue, Jan 04, 2022 at 09:06:48PM -0600, Justin Pryzby wrote: > > I think pg_get_guc_flags() may be best, but I'm interested to hear other > > opinions. > > My opinion on this matter is rather close to what you have here with > handling things through one extra attribute. But I don't see the > point of using an extra function where users would need to do a manual > mapping of the flag bits back to a a text representation of them. If it were implemented as a function, this would be essentially internal and left undocumented. Only exposed for the purpose of re-implementing check_guc. > I would suggest to just add one text[] to pg_show_all_settings Good idea to use the backing function without updating the view. pg_settings is currently defined with "SELECT *". Is it fine to enumerate a list of columns instead ? >From 713046d82668c4e189b78e9e274b272bccaaa480 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 5 Dec 2021 23:22:04 -0600 Subject: [PATCH v4 1/2] check_guc: fix absurd number of false positives Avoid false positives; Avoid various assumptions; Avoid list of exceptions; Simplify shell/awk/sed/grep; This requires GNU awk for RS as a regex. --- src/backend/utils/misc/check_guc | 69 +--- 1 file changed, 10 insertions(+), 59 deletions(-) diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc index b171ef0e4ff..323ca13191b 100755 --- a/src/backend/utils/misc/check_guc +++ b/src/backend/utils/misc/check_guc @@ -1,78 +1,29 @@ -#!/bin/sh +#! /bin/sh +set -e -## currently, this script makes a lot of assumptions: +## this script makes some assumptions about the formatting of files it parses. ## in postgresql.conf.sample: ## 1) the valid config settings may be preceded by a '#', but NOT '# ' ## (we use this to skip comments) -## 2) the valid config settings will be followed immediately by ' =' -## (at least one space preceding the '=') -## in guc.c: -## 3) the options have PGC_ on the same line as the option -## 4) the options have '{' on the same line as the option - -## Problems -## 1) Don't know what to do with TRANSACTION ISOLATION LEVEL - -## if an option is valid but shows up in only one file (guc.c but not -## postgresql.conf.sample), it should be listed here so that it -## can be ignored -INTENTIONALLY_NOT_INCLUDED="debug_deadlocks in_hot_standby \ -is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \ -pre_auth_delay role seed server_encoding server_version server_version_num \ -session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \ -trace_notify trace_userlocks transaction_isolation transaction_read_only \ -zero_damaged_pages" ### What options are listed in postgresql.conf.sample, but don't appear ### in guc.c? 
-# grab everything that looks like a setting and convert it to lower case -SETTINGS=`grep ' =' postgresql.conf.sample | -grep -v '^# ' | # strip comments -sed -e 's/^#//' | -awk '{print $1}'` - -SETTINGS=`echo "$SETTINGS" | tr 'A-Z' 'a-z'` +# grab everything that looks like a setting +SETTINGS=`sed '/^#[[:alnum:]]/!d; s/^#//; s/ =.*//; /^include/d' postgresql.conf.sample` for i in $SETTINGS ; do - hidden=0 ## it sure would be nice to replace this with an sql "not in" statement - ## it doesn't seem to make sense to have things in .sample and not in guc.c -# for hidethis in $INTENTIONALLY_NOT_INCLUDED ; do -#if [ "$hidethis" = "$i" ] ; then -# hidden=1 -#fi -# done - if [ "$hidden" -eq 0 ] ; then -grep -i '"'$i'"' guc.c > /dev/null -if [ $? -ne 0 ] ; then - echo "$i seems to be missing from guc.c"; -fi; - fi + grep -i "\"$i\"" guc.c >/dev/null || +echo "$i seems to be missing from guc.c"; done ### What options are listed in guc.c, but don't appear ### in postgresql.conf.sample? # grab everything that looks like a setting and convert it to lower case - -SETTINGS=`grep '{.* PGC_' guc.c | awk '{print $1}' | \ - sed -e 's/{//g' -e 's/"//g' -e 's/,//'` - -SETTINGS=`echo "$SETTINGS" | tr 'A-Z' 'a-z'` - +SETTINGS=`gawk -F '[",]' 'BEGIN{RS="\n\t{\n"} /",[[:space:]]*PGC_.*.*gettext_noop/ && !/NOT_IN_SAMPLE/{print tolower($2)}' guc.c` for i in $SETTINGS ; do - hidden=0 - ## it sure would be nice to replace this with an sql "not in" statement - for hidethis in $INTENTIONALLY_NOT_INCLUDED ; do -if [ "$hidethis" = "$i" ] ; then - hidden=1 -fi - done - if [ "$hidden" -eq 0 ] ; then -grep -i '#'$i' ' postgresql.conf.sample > /dev/null -if [ $? -ne 0 ] ; then - echo "$i seems to be missing from postgresql.conf.sample"; -fi - fi + grep "#$i " postgresql.conf.sample >/dev/null || +echo "$i seems to be missing from postgresql.conf.sample"; done -- 2.17.1 >From d4773ee874ddc6aaa1238e4d1ae7ec6f8e20f36e Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 8 Dec 2021 12:06:35 -0600
Remove trailing comma from enums
I saw one (and then went looking and found some more) enum with a trailing comma. These are quite rare in the PG src, so I doubt they are intentional. PSA a patch to remove the trailing commas for all that I found. -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Remove-trailing-commas-from-enums.patch Description: Binary data
Re: Add 64-bit XIDs into PostgreSQL 15
On Thu, Dec 30, 2021 at 03:15:16PM +0300, Maxim Orlov wrote: > PFA updated working patch v6 for PG15 development cycle. > It is based on a patch by Alexander Korotkov version 5 [5] with a few fixes, > refactoring and was rebased to PG15. > > Main changes: > - Change TransactionId to 64bit > - Disk tuple format (HeapTupleHeader) is unchanged: xmin and xmax > remains 32bit > -- 32bit xid is named ShortTransactionId now. > -- Exception: see "double xmax" format below. > - Heap page format is changed to contain xid and multixact base value, > tuple's xmin and xmax are offsets from. > -- xid_base and multi_base are stored as a page special data. PageHeader > remains unmodified. > -- If after upgrade page has no free space for special data, tuples are > converted to "double xmax" format: xmin became virtual > FrozenTransactionId, xmax occupies the whole 64bit. > Page converted to new format when vacuum frees enough space. I think it is a great idea to allow the 64-XID to span the 32-bit xmin and xmax fields when needed. It would be nice if we can get focus on this feature so we are sure it gets into PG 15. Can we add this patch incrementally so people can more easily analyze it? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
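For readers skimming the thread, the per-page bases quoted above boil down to a special-area struct along these lines (a sketch only; the field names and exact layout in the actual patch may differ), with each tuple's 32-bit xmin/xmax interpreted as offsets from these values:

/*
 * Hedged sketch of the page special data described above.  Because it
 * lives in the special space, PageHeaderData itself stays unmodified.
 */
typedef struct HeapPageSpecialData
{
    TransactionId pd_xid_base;      /* base for tuple xmin/xmax offsets */
    TransactionId pd_multi_base;    /* base for multixact id offsets */
} HeapPageSpecialData;

typedef HeapPageSpecialData *HeapPageSpecial;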
Re: [PATCH] Accept IP addresses in server certificate SANs
On Mon, 2022-01-03 at 16:19 +, Jacob Champion wrote: > On Fri, 2021-12-17 at 15:40 +0900, Kyotaro Horiguchi wrote: > > > > + inet_net_pton_ipv4(const char *src, u_char *dst) > > (calls inet_net_pton_ipv4_internal(src, dst, true)) > > + inet_pton_ipv4(const char *src, u_char *dst) > > (calls inet_net_pton_ipv4_internal(src, dst, false)) > > Sounds good, I will make that change. Thanks for the feedback! v3 implements a pg_inet_pton(), but for IPv6 instead of IPv4 as presented above (since we only need inet_pton() for IPv6 in this case). It's split into a separate patch (0003) for ease of review. Thanks! --Jacob From 3081b1e0f7af60241978c77bba71e806ba5c25f6 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Wed, 24 Nov 2021 14:46:11 -0800 Subject: [PATCH v3 1/3] Move inet_net_pton() to src/port This will be helpful for IP address verification in libpq. --- src/backend/utils/adt/Makefile | 1 - src/include/port.h | 3 +++ src/include/utils/builtins.h| 4 src/port/Makefile | 1 + src/{backend/utils/adt => port}/inet_net_pton.c | 16 +++- src/tools/msvc/Mkvcbuild.pm | 2 +- 6 files changed, 20 insertions(+), 7 deletions(-) rename src/{backend/utils/adt => port}/inet_net_pton.c (96%) diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 41b486bcef..d173d52157 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -43,7 +43,6 @@ OBJS = \ geo_selfuncs.o \ geo_spgist.o \ inet_cidr_ntop.o \ - inet_net_pton.o \ int.o \ int8.o \ json.o \ diff --git a/src/include/port.h b/src/include/port.h index fd9c9d6f94..cca29839bc 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -518,6 +518,9 @@ extern int pg_codepage_to_encoding(UINT cp); extern char *pg_inet_net_ntop(int af, const void *src, int bits, char *dst, size_t size); +/* port/inet_net_pton.c */ +extern int pg_inet_net_pton(int af, const char *src, void *dst, size_t size); + /* port/pg_strong_random.c */ extern void pg_strong_random_init(void); extern bool pg_strong_random(void *buf, size_t len); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index b07eefaf1e..a5a18e62ba 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -92,10 +92,6 @@ extern int xidComparator(const void *arg1, const void *arg2); extern char *pg_inet_cidr_ntop(int af, const void *src, int bits, char *dst, size_t size); -/* inet_net_pton.c */ -extern int pg_inet_net_pton(int af, const char *src, - void *dst, size_t size); - /* network.c */ extern double convert_network_to_scalar(Datum value, Oid typid, bool *failure); extern Datum network_scan_first(Datum in); diff --git a/src/port/Makefile b/src/port/Makefile index b3754d8940..7191cd6633 100644 --- a/src/port/Makefile +++ b/src/port/Makefile @@ -43,6 +43,7 @@ OBJS = \ bsearch_arg.o \ chklocale.o \ inet_net_ntop.o \ + inet_net_pton.o \ noblock.o \ path.o \ pg_bitutils.o \ diff --git a/src/backend/utils/adt/inet_net_pton.c b/src/port/inet_net_pton.c similarity index 96% rename from src/backend/utils/adt/inet_net_pton.c rename to src/port/inet_net_pton.c index d3221a1313..bae50ba67e 100644 --- a/src/backend/utils/adt/inet_net_pton.c +++ b/src/port/inet_net_pton.c @@ -14,14 +14,18 @@ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT * OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
* - * src/backend/utils/adt/inet_net_pton.c + * src/port/inet_net_pton.c */ #if defined(LIBC_SCCS) && !defined(lint) static const char rcsid[] = "Id: inet_net_pton.c,v 1.4.2.3 2004/03/17 00:40:11 marka Exp $"; #endif +#ifndef FRONTEND #include "postgres.h" +#else +#include "postgres_fe.h" +#endif #include #include @@ -29,9 +33,19 @@ static const char rcsid[] = "Id: inet_net_pton.c,v 1.4.2.3 2004/03/17 00:40:11 m #include #include +#ifndef FRONTEND #include "utils/builtins.h" /* pgrminclude ignore */ /* needed on some * platforms */ #include "utils/inet.h" +#else +/* + * In a frontend build, we can't include inet.h, but we still need to have + * sensible definitions of these two constants. Note that pg_inet_net_ntop() + * assumes that PGSQL_AF_INET is equal to AF_INET. + */ +#define PGSQL_AF_INET (AF_INET + 0) +#define PGSQL_AF_INET6 (AF_INET + 1) +#endif static int inet_net_pton_ipv4(const char *src, u_char *dst); diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm index 404c45a6f3..fa87511e04 100644 --- a/src/tools/msvc/Mkvcbuild.pm +++ b/src/tools/msvc/Mkvcbuild.pm @@ -100,7 +100,7 @@ sub mkvcbuild our @pgportfiles = qw( chklocale.c explicit_bzero.c fls.c getpeereid.c getrusage.c inet_aton.c - getaddrinfo.c gettimeofday.c inet_net_ntop.c kill.c open.c + getaddrinfo.c gettimeofday.c inet_net_ntop.c inet_net_pton.c kill.c
Re: Add 64-bit XIDs into PostgreSQL 15
On Tue, Jan 4, 2022 at 05:49:07PM +, Finnerty, Jim wrote: > Hi Maxim, > I’m glad to see that you’re trying to carry the 64-bit XID work forward. > I > had not noticed that my earlier patch (also derived from Alexander Kortkov’s > patch) was responded to back in September. Perhaps we can merge some of the > code cleanup that it contained, such as using XID_FMT everywhere and creating > a > type for the kind of page returned by TransactionIdToPage() to make the code > cleaner. > > Is your patch functionally the same as the PostgresPro implementation? If > so, I think it would be helpful for everyone’s understanding to read the > PostgresPro documentation on VACUUM. See in particular section “Forced > shrinking pg_clog and pg_multixact” > > https://postgrespro.com/docs/enterprise/9.6/routine-vacuuming# > vacuum-for-wraparound Good point --- we still need vacuum freeze. It would be good to understand how much value we get in allowing vacuum freeze to be done less often --- how big can pg_xact/pg_multixact get before they are problems? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Add 64-bit XIDs into PostgreSQL 15
On Wed, Jan 05, 2022 at 06:51:37PM -0500, Bruce Momjian wrote: > On Tue, Jan 4, 2022 at 10:22:50PM +, Finnerty, Jim wrote: > > I'm concerned about the maintainability impact of having 2 new > > on-disk page formats. It's already complex enough with XIDs and > > multixact-XIDs. > > > > If the lack of space for the two epochs in the special data area is > > a problem only in an upgrade scenario, why not resolve the problem > > before completing the upgrade process like a kind of post-process > > pg_repack operation that converts all "double xmax" pages to > > the "double-epoch" page format? i.e. maybe the "double xmax" > > representation is needed as an intermediate representation during > > upgrade, but after upgrade completes successfully there are no pages > > with the "double-xmax" representation. This would eliminate a whole > > class of coding errors and would make the code dealing with 64-bit > > XIDs simpler and more maintainable. > > Well, yes, we could do this, and it would avoid the complexity of having > to support two XID representations, but we would need to accept that > fast pg_upgrade would be impossible in such cases, since every page > would need to be checked and potentially updated. > > You might try to do this while the server is first started and running > queries, but I think we found out from the online checkpoint patch that I think you meant the online checksum patch. Which this reminded me of, too. https://commitfest.postgresql.org/31/2611/ > having the server in an intermediate state while running queries is very > complex --- it might be simpler to just accept two XID formats all the > time than enabling the server to run with two formats for a short > period. My big point is that this needs more thought. -- Justin
Re: Add 64-bit XIDs into PostgreSQL 15
On Wed, Jan 5, 2022 at 06:12:26PM -0600, Justin Pryzby wrote: > On Wed, Jan 05, 2022 at 06:51:37PM -0500, Bruce Momjian wrote: > > On Tue, Jan 4, 2022 at 10:22:50PM +, Finnerty, Jim wrote: > > > I'm concerned about the maintainability impact of having 2 new > > > on-disk page formats. It's already complex enough with XIDs and > > > multixact-XIDs. > > > > > > If the lack of space for the two epochs in the special data area is > > > a problem only in an upgrade scenario, why not resolve the problem > > > before completing the upgrade process like a kind of post-process > > > pg_repack operation that converts all "double xmax" pages to > > > the "double-epoch" page format? i.e. maybe the "double xmax" > > > representation is needed as an intermediate representation during > > > upgrade, but after upgrade completes successfully there are no pages > > > with the "double-xmax" representation. This would eliminate a whole > > > class of coding errors and would make the code dealing with 64-bit > > > XIDs simpler and more maintainable. > > > > Well, yes, we could do this, and it would avoid the complexity of having > > to support two XID representations, but we would need to accept that > > fast pg_upgrade would be impossible in such cases, since every page > > would need to be checked and potentially updated. > > > > You might try to do this while the server is first started and running > > queries, but I think we found out from the online checkpoint patch that > > I think you meant the online checksum patch. Which this reminded me of, too. > > https://commitfest.postgresql.org/31/2611/ Sorry, yes, I have checkpoint on my mind. ;-) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: make tuplestore helper function
On Wed, Jan 05, 2022 at 12:09:16PM -0500, Melanie Plageman wrote: > On Fri, Dec 17, 2021 at 3:04 PM Justin Pryzby wrote: > > The rest of these are questions more than comments, and I'm not necessarily > > suggesting to change the patch: > > > > This isn't new in your patch, but for me, it's more clear if the callers > > have a > > separate boolean for randomAccess, rather than having the long expression in > > the function call: > > > > + tupstore = MakeFuncResultTuplestore(fcinfo, NULL, > > + rsinfo->allowedModes & SFRM_Materialize_Random); > > > > vs > > > > randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != > > 0; > > - tupstore = tuplestore_begin_heap(randomAccess, false, work_mem); > > + tupstore = MakeFuncResultTuplestore(fcinfo, NULL, randomAccess); > > > > One could also argue for passing rsinfo->allowedModes, instead of > > (rsinfo->allowedModes & SFRM_Materialize_Random). > > I've changed the patch to have MakeFuncResultTuplestore() check > rsinfo->allowedModes and pass in the resulting boolean to > tuplestore_begin_heap(). Because I modified the > MakeFuncResultTuplestore() function signature to accommodate this, I had > to collapse the two patches into one, as I couldn't maintain the call > sites which passed in a constant. > > There's a couples places that you're checking expectedDesc where it wasn't > > being checked before. Is that some kind of live bug ? > > pg_config() text_to_table() > > Yes, expectedDesc was accessed in these locations without checking that > it is not NULL. Should that be a separate patch? I don't know. The patch is already easy to review, since it's mostly limited to removing code and fixing inconsistencies (NULL check) and possible inefficiencies (randomAccess). If the expectedDesc NULL check were an 0001 patch, then 0002 (the main patch) would be even easier to review. Only foreign.c is different. > > You removed one call to tuplestore_donestoring(), but not the others. > > I guess you could remove them all (optionally as a separate patch). > > I removed it in that one location because I wanted to get rid of the > local variable it used. What local variable ? I see now that logicalfuncs.c never had a local variable called tupstore. I'm sure it was intended since 2014 (b89e15105) to say tupestore_donestoring(p->tupstore). But donestoring(typo) causes no error, since the define is a NOP. src/include/utils/tuplestore.h:#define tuplestore_donestoring(state) ((void) 0) > I am fine with removing the other occurrences, > but I thought there might be some reason why instead of removing it, > they made it into a no-op. I assume Tom left it (actually, added it back in dd04e958c) to avoid breaking extensions for no reason. And maybe to preserve the possbility of at some point in the future doing something again during the "done" step. I'd leave it for input from a committer about those: - remove tuplestore_donestoring() calls ? - split expectedDesc NULL check to an 0001 patch ? - anything other opened questions ? I'm marking this as RFC, with the caveat that the newline before MakeFuncResultTuplestore went missing again :) -- Justin
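For context, the helper shape being discussed is roughly the following sketch (the tuple-descriptor handling and the expectedDesc checks mentioned above are omitted, and the real signature in the patch differs; this only illustrates moving the allowedModes test inside the helper):

Tuplestorestate *
MakeFuncResultTuplestore(FunctionCallInfo fcinfo)
{
    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    bool        randomAccess;

    /* callers are assumed to have validated rsinfo already */
    randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;

    return tuplestore_begin_heap(randomAccess, false, work_mem);
}

Callers then no longer spell out the SFRM_Materialize_Random test themselves.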
Re: Remove trailing comma from enums
On Thu, Jan 6, 2022 at 12:56 PM Peter Smith wrote: > I saw one (and then went looking and found some more) enum with a > trailing comma. > > These are quite rare in the PG src, so I doubt they are intentional. > > PSA a patch to remove the trailing commas for all that I found. -1. I don't see the problem with C99 trailing commas. They avoid noisy diff lines when patches add/remove items.
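To make the diff-noise argument concrete, here is a tiny illustrative enum (not taken from the tree):

typedef enum DemoColor
{
    DEMO_RED,
    DEMO_GREEN,
    DEMO_BLUE,      /* C99 trailing comma: appending a new member later
                     * adds one line without touching this one */
} DemoColor;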
Re: Proposal: remove obsolete hot-standby testing infrastructure
Peter Eisentraut writes: > On 03.01.22 22:50, Tom Lane wrote: >> The attached proposed patch removes some ancient infrastructure for >> manually testing hot standby. > I looked into this some time ago and concluded that this test contains a > significant amount of testing that isn't obviously done anywhere else. > I don't have the notes anymore, and surely some things have progressed > since, but I wouldn't just throw the old test suite away without > actually checking. Fair enough ... so I looked, and there's not much at all that I'm worried about. hs_standby_allowed: This is basically checking that the standby can see data from the primary, which we surely have covered. Although it does also cover propagation of nextval, which AFAICS is not tested in src/test/recovery, so perhaps that's worth troubling over. There are also some checks that particular commands are allowed on the standby, which seem to me to be not too helpful; see also comments on the next file. hs_standby_disallowed: Inverse of the above: check that some commands are disallowed. We check some of these in 001_stream_rep.pl, and given the current code structure in utility.c (ClassifyUtilityCommandAsReadOnly etc), I do not see much point in adding more test cases of the same sort. The only likely new bug in that area would be misclassification of some new command, and no amount of testing of existing cases will catch that. There are also tests that particular functions are disallowed, which isn't something that goes through ClassifyUtilityCommandAsReadOnly. Nonetheless, adding more test cases here wouldn't help catch future oversights of that type, so I remain unexcited. hs_standby_functions: Mostly also checking that things are disallowed. There's also a test of pg_cancel_backend, which is cute but probably suffers from timing instability (i.e., delayed arrival of the signal might change the output). Moreover, pg_cancel_backend is already covered in the isolation tests, and I see no reason to think it'd operate differently on a standby. hs_standby_check: Checks pg_is_in_recovery(), which is checked far more thoroughly by pg_ctl/t/003_promote.pl. hs_primary_extremes: Checks that we can cope with deep subtransaction nesting. Maybe this is worth preserving, but I sort of doubt it --- the standby doesn't even see the nesting does it? Also checks that the standby can cope with 257 exclusive locks at once, which corresponds to no interesting limit that I know of. So basically, I'd be inclined to add a couple of tests of sequence-update propagation to src/test/recovery and call it good. regards, tom lane
Re: Remove trailing comma from enums
Thomas Munro writes: > On Thu, Jan 6, 2022 at 12:56 PM Peter Smith wrote: >> These are quite rare in the PG src, so I doubt they are intentional. >> PSA a patch to remove the trailing commas for all that I found. > -1. I don't see the problem with C99 trailing commas. They avoid > noisy diff lines when patches add/remove items. I think they're rare because up till very recently we catered to pre-C99 compilers that wouldn't accept them. There's not much point in insisting on that now, though. Personally I'm less excited than Thomas about trailing commas being good for reducing diff noise, mainly because I think that "add new entries at the end" is an anti-pattern, and if you put new items where they logically belong then the problem is much rarer. But I'm not going to argue against committers who want to do it like that, either. regards, tom lane
Re: Map WAL segment files on PMEM as WAL buffers
Rebased. On Fri, Nov 5, 2021 at 3:47 PM Takashi Menjo wrote: > > Hi Daniel, > > The issue you told has been fixed. I attach the v5 patchset to this email. > > The v5 has all the patches in the v4, and in addition, has the > following two new patches: > > - (v5-0002) Support build with MSVC on Windows: Please have > src\tools\msvc\config.pl as follows to "configure --with-libpmem:" > > $config->{pmem} = 'C:\path\to\pmdk\x64-windows'; > > - (v5-0006) Compatible to Windows: This patch resolves conflicting > mode_t typedefs and libpmem API variants (U or W, like Windows API). > > Best regards, > Takashi > > On Thu, Nov 4, 2021 at 5:46 PM Takashi Menjo wrote: > > > > Hello Daniel, > > > > Thank you for your comment. I had the following error message with > > MSVC on Windows. It looks the same as what you told me. I'll fix it. > > > > | > cd src\tools\msvc > > | > build > > | (..snipped..) > > | Copying pg_config_os.h... > > | Generating configuration headers... > > | undefined symbol: HAVE_LIBPMEM at src/include/pg_config.h line 347 > > at C:/Users/menjo/Documents/git/postgres/src/tools/msvc/Mkvcbuild.pm > > line 860. > > > > Best regards, > > Takashi > > > > > > On Wed, Nov 3, 2021 at 10:04 PM Daniel Gustafsson wrote: > > > > > > > On 28 Oct 2021, at 08:09, Takashi Menjo wrote: > > > > > > > Rebased, and added the patches below into the patchset. > > > > > > Looks like the 0001 patch needs to be updated to support Windows and > > > MSVC. See > > > src/tools/msvc/Mkvcbuild.pm and Solution.pm et.al for inspiration on how > > > to add > > > the MSVC equivalent of --with-libpmem. Currently the patch fails in the > > > "Generating configuration headers" step in Solution.pm. > > > > > > -- > > > Daniel Gustafsson https://vmware.com/ > > > > > > > > > -- > > Takashi Menjo > > > > -- > Takashi Menjo -- Takashi Menjo v6-0003-Add-wal_pmem_map-to-GUC.patch Description: Binary data v6-0001-Add-with-libpmem-option-for-PMEM-support.patch Description: Binary data v6-0004-Export-InstallXLogFileSegment.patch Description: Binary data v6-0002-Support-build-with-MSVC-on-Windows.patch Description: Binary data v6-0005-Map-WAL-segment-files-on-PMEM-as-WAL-buffers.patch Description: Binary data v6-0008-Let-wal_pmem_map-be-constant-unless-with-libpmem.patch Description: Binary data v6-0006-Compatible-to-Windows.patch Description: Binary data v6-0009-Ensure-WAL-mappings-before-assertion.patch Description: Binary data v6-0007-WAL-statistics-in-cases-of-wal_pmem_map-true.patch Description: Binary data v6-0010-Update-document.patch Description: Binary data v6-0011-Preallocate-and-initialize-more-WAL-if-wal_pmem_m.patch Description: Binary data
Re: Remove trailing comma from enums
On Thu, Jan 6, 2022 at 12:23 PM Tom Lane wrote: > > Thomas Munro writes: > > On Thu, Jan 6, 2022 at 12:56 PM Peter Smith wrote: > >> These are quite rare in the PG src, so I doubt they are intentional. > >> PSA a patch to remove the trailing commas for all that I found. > > > -1. I don't see the problem with C99 trailing commas. They avoid > > noisy diff lines when patches add/remove items. > > I think they're rare because up till very recently we catered to > pre-C99 compilers that wouldn't accept them. There's not much > point in insisting on that now, though. > > Personally I'm less excited than Thomas about trailing commas > being good for reducing diff noise, mainly because I think > that "add new entries at the end" is an anti-pattern, and > if you put new items where they logically belong then the > problem is much rarer. But I'm not going to argue against > committers who want to do it like that, either. FWIW, the background of this was that one of these examples overlapped with a feature currently in development and it just caused a waste of everyone's time by firstly "fixing" (removing) the extra comma and then getting multiple code reviews saying the change was unrelated to that feature and so having to remove that fix again. So I felt removing all such commas at HEAD not only makes all the enums consistent, but it may prevent similar time-wasting for others in the future. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: row filtering for logical replication
On Thu, Jan 6, 2022 at 1:10 AM Alvaro Herrera wrote: > > BTW I think it's not great to commit with the presented split. We would > have non-trivial short-lived changes for no good reason (0002 in > particular). I think this whole series should be a single patch, with Yes, we know that eventually these parts will be combined and committed as a single patch. What you see now is still a work-in-progress. The current separation has been mostly for helping multiple people collaborate without too much clashing. e.g., the 0002 patch has been kept separate just to help do performance testing of that part in isolation. > the commit message being a fusion of messages explaining in full what > the functional change is, listing all the authors together. Having a > commit message like in 0001 where all the distinct changes are explained > in separate sections with each section listing its own author, does not > sound very useful or helpful. > Yes, the current v58-0001 commit message is just a combination of previous historical patch comments as each of them got merged back into the main patch. This message format was just a quick/easy way to ensure that no information was accidentally lost along the way. We understand that prior to the final commit this will all need to be fused together just like you are suggesting. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: biblio.sgml dead link
On Wed, Jan 05, 2022 at 09:56:26PM +0100, Daniel Gustafsson wrote: > I think we should remove the link, not because it costs money but because it > can be purchased from several places, and choosing one to "favor" seems to > invite criticism. Kind of how we don't link to an online store for buying the > other books. Yeah, that's my feeling as well. So done this way across the board. -- Michael
Re: Add 64-bit XIDs into PostgreSQL 15
Hi! On Thu, Jan 6, 2022 at 3:02 AM Bruce Momjian wrote: > > On Thu, Dec 30, 2021 at 03:15:16PM +0300, Maxim Orlov wrote: > > PFA updated working patch v6 for PG15 development cycle. > > It is based on a patch by Alexander Korotkov version 5 [5] with a few fixes, > > refactoring and was rebased to PG15. > > > > Main changes: > > - Change TransactionId to 64bit > > - Disk tuple format (HeapTupleHeader) is unchanged: xmin and xmax > > remains 32bit > > -- 32bit xid is named ShortTransactionId now. > > -- Exception: see "double xmax" format below. > > - Heap page format is changed to contain xid and multixact base value, > > tuple's xmin and xmax are offsets from. > > -- xid_base and multi_base are stored as a page special data. PageHeader > >remains unmodified. > > -- If after upgrade page has no free space for special data, tuples are > >converted to "double xmax" format: xmin became virtual > >FrozenTransactionId, xmax occupies the whole 64bit. > >Page converted to new format when vacuum frees enough space. > > I think it is a great idea to allow the 64-XID to span the 32-bit xmin > and xmax fields when needed. It would be nice if we can get focus on > this feature so we are sure it gets into PG 15. Can we add this patch > incrementally so people can more easily analyze it? I see at least the following major issues/questions in this patch. 1) Current code relies on the fact that TransactionId can be atomically read from/written to shared memory. With 32-bit systems and 64-bit TransactionId, that's not true anymore. Therefore, the patch has concurrency issues on 32-bit systems. We need to carefully review these issues and provide a fallback for 32-bit systems. I suppose nobody is thinking about dropping 32-bit systems, right? Also, I wonder how critical the overhead for 32-bit systems is for us. They are going to become legacy, so overhead isn't so critical, right? 2) With this patch we still need to freeze to cut SLRUs. This is especially problematic with Multixacts, because systems heavily using row-level locks can consume an enormous amount of multixacts. That is especially problematic when we have 2x bigger multixacts. We probably can provide an alternative implementation for multixact vacuum, which doesn't require scanning all the heaps. That is a fair amount of work though. The clog is much smaller and we can cut it more rarely. Perhaps we could tolerate freezing to cut clog, couldn't we? 3) The 2x bigger in-memory representation of TransactionId has overheads. In particular, it could mitigate the effect of recent advancements from Andres Freund. I'm not exactly sure we should/can do something with this. But I think this should be at least carefully analyzed. 4) SP-GiST index stores TransactionId on pages. Could we tolerate dropping SP-GiST indexes on a major upgrade? Or do we have to invent something smarter? 5) The 32-bit limitation within the page mentioned upthread by Stephen Frost should also be carefully considered. Ideally, we should get rid of it, but I don't have particular ideas in this field for now. At least, we should make sure we did our best at error reporting and monitoring capabilities. I think the realistic goal for the PG 15 development cycle would be agreement on a roadmap for all the items above (and probably some initial implementations). -- Regards, Alexander Korotkov
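To visualize the on-page scheme being reviewed here, a rough sketch of the xid_base idea follows. The function name and the ShortTransactionId typedef below are illustrative assumptions, not the patch's code, and since the patch widens TransactionId itself to 64 bits, a plain uint64 is used for the result:

#include "postgres.h"
#include "access/transam.h"

typedef uint32 ShortTransactionId;  /* 32-bit xid as kept in the tuple header */

/*
 * Sketch: tuples keep 32-bit xmin/xmax, the page's special area keeps a
 * 64-bit xid_base, and the full value is rebuilt by addition.  Special
 * XIDs below FirstNormalTransactionId are not offset.
 */
static inline uint64
FullXidFromPageBase(uint64 xid_base, ShortTransactionId xid)
{
    if (xid < FirstNormalTransactionId)
        return (uint64) xid;
    return xid_base + xid;
}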
Re: row filtering for logical replication
On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila wrote: > ... > Another minor comment: > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype, > > Do we need to specify the 'enum' type before changetype parameter? > That is because there is currently no typedef for the enum ReorderBufferChangeType. Of course, it is easy to add a typedef and then this 'enum' is not needed in the signature, but I wasn't sure if adding a new typedef strictly belonged as part of this Row-Filter patch or not. -- Kind Regards. Peter Smith. Fujitsu Australia.
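For clarity, the change being talked about is just this kind of thing (members trimmed for the example; the real enum in reorderbuffer.h has more, and the 0002 patch may do it slightly differently):

typedef enum ReorderBufferChangeType
{
    REORDER_BUFFER_CHANGE_INSERT,
    REORDER_BUFFER_CHANGE_UPDATE,
    REORDER_BUFFER_CHANGE_DELETE
    /* ... remaining members elided ... */
} ReorderBufferChangeType;

/* With the typedef in place, the 'enum' keyword can be dropped from the
 * function signature (other parameters omitted in this sketch): */
static bool pgoutput_row_filter(ReorderBufferChangeType changetype);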
RE: Optionally automatically disable logical replication subscriptions on error
On Wednesday, January 5, 2022 8:53 PM osumi.takami...@fujitsu.com wrote: > > On Tuesday, December 28, 2021 11:53 AM Wang, Wei/王 威 > wrote: > > On Thursday, December 16, 2021 8:51 PM osumi.takami...@fujitsu.com > > wrote: > > > Attached the updated patch v14. > > > > A comment to the timing of printing a log: > Thank you for your review ! > > > After the log[1] was printed, I altered subscription's option > > (DISABLE_ON_ERROR) from true to false before invoking > > DisableSubscriptionOnError to disable subscription. Subscription was not > > disabled. > > [1] "LOG: logical replication subscription "sub1" will be disabled due to > > an > > error" > > > > I found this log is printed in function WorkerErrorRecovery: > > + ereport(LOG, > > + errmsg("logical replication subscription \"%s\" will > > be disabled due to an error", > > + MySubscription->name)); > > This log is printed here, but in DisableSubscriptionOnError, there is a > > check to > > confirm subscription's disableonerr field. If disableonerr is found changed > > from > > true to false in DisableSubscriptionOnError, subscription will not be > > disabled. > > > > In this case, "disable subscription" is printed, but subscription will not > > be > > disabled actually. > > I think it is a little confused to user, so what about moving this message > > after > > the check which is mentioned above in DisableSubscriptionOnError? > Makes sense. I moved the log print after > the check of the necessity to disable the subscription. > > Also, I've scrutinized and refined the new TAP test as well for refactoring. > As a result, I fixed wait_for_subscriptions() > so that some extra codes that can be simplified, > such as escaped variable and one part of WHERE clause, are removed. > Other change I did is to replace two calls of wait_for_subscriptions() > with polling_query_until() for the subscriber, in order to > make the tests better and more suitable for the test purposes. > Again, for this refinement, I've conducted a tight loop test > to check no timing issue and found no problem. > Thanks for updating the patch. Here are some comments: 1) + /* +* We would not be here unless this subscription's disableonerr field was +* true when our worker began applying changes, but check whether that +* field has changed in the interim. +*/ + if (!subform->subdisableonerr) + { + /* +* Disabling the subscription has been done already. No need of +* additional work. +*/ + heap_freetuple(tup); + table_close(rel, RowExclusiveLock); + CommitTransactionCommand(); + return; + } I don't understand what "Disabling the subscription has been done already" means; I think we only get here when subdisableonerr was changed in the interim. Should we modify this comment? Or remove it because there are already some explanations before. 2) + /* Set the subscription to disabled, and note the reason. */ + values[Anum_pg_subscription_subenabled - 1] = BoolGetDatum(false); + replaces[Anum_pg_subscription_subenabled - 1] = true; I didn't see the code corresponding to "note the reason". Should we modify the comment? 3) + booldisableonerr; /* Whether errors automatically disable */ This comment is hard to understand. Maybe it can be changed to: Indicates if the subscription should be automatically disabled when subscription workers detect any errors. Regards, Tang
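For context when reading the review points above, the catalog update under discussion boils down to something like the sketch below. This is only an approximation of the patch's DisableSubscriptionOnError() (the real function also handles subscription locking, error reporting, and transaction state), and subdisableonerr is a column introduced by the patch, not present in current sources:

#include "postgres.h"
#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/indexing.h"
#include "catalog/pg_subscription.h"
#include "utils/rel.h"
#include "utils/syscache.h"

static void
DisableSubscriptionSketch(Oid subid)
{
    Relation    rel;
    HeapTuple   tup;
    Form_pg_subscription subform;
    Datum       values[Natts_pg_subscription];
    bool        nulls[Natts_pg_subscription];
    bool        replaces[Natts_pg_subscription];

    rel = table_open(SubscriptionRelationId, RowExclusiveLock);
    tup = SearchSysCacheCopy1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for subscription %u", subid);
    subform = (Form_pg_subscription) GETSTRUCT(tup);

    /* Skip the update if disable_on_error was switched off in the meantime. */
    if (subform->subdisableonerr)
    {
        memset(values, 0, sizeof(values));
        memset(nulls, false, sizeof(nulls));
        memset(replaces, false, sizeof(replaces));

        /* Only subenabled changes; all other columns are kept as-is. */
        values[Anum_pg_subscription_subenabled - 1] = BoolGetDatum(false);
        replaces[Anum_pg_subscription_subenabled - 1] = true;

        tup = heap_modify_tuple(tup, RelationGetDescr(rel),
                                values, nulls, replaces);
        CatalogTupleUpdate(rel, &tup->t_self, tup);
    }

    heap_freetuple(tup);
    table_close(rel, RowExclusiveLock);
}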
Re: ICU for global collation
On Tue, Jan 04, 2022 at 05:03:10PM +0100, Peter Eisentraut wrote: > On 04.01.22 03:21, Julien Rouhaud wrote: > > > > - if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID) > > > - mylocale = pg_newlocale_from_collation(collid); > > > + if (!lc_collate_is_c(collid)) > > > + { > > > + if (collid != DEFAULT_COLLATION_OID) > > > + mylocale = pg_newlocale_from_collation(collid); > > > + else if (default_locale.provider == COLLPROVIDER_ICU) > > > + mylocale = &default_locale; > > > + } > > > > There are really a lot of places with this new code. Maybe it could be some > > new function/macro to wrap that for the normal case (e.g. not formatting.c)? > > Right, we could just put this into pg_newlocale_from_collation(), but the > comment there says > > * In fact, they shouldn't call this function at all when they are dealing > * with the default locale. That can save quite a bit in hotspots. > > I don't know how to assess that. > > We could pack this into a macro or inline function if we are concerned about > this. Yes, that was my idea: just have a new function (an inline function or a macro, then, since pg_newlocale_from_collation() clearly warns about performance concerns) that has the whole is-not-c-collation-and-is-default-collation-or-icu-collation logic and calls pg_newlocale_from_collation() only when needed.
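A minimal sketch of such a wrapper, assuming the shape of the quoted hunk: "default_locale" is the variable added by the patch under review (it does not exist in current sources), the helper name is an assumption, and callers would still check lc_collate_is_c() first, as in the hunk above:

#include "postgres.h"
#include "catalog/pg_collation.h"
#include "utils/pg_locale.h"

static inline pg_locale_t
pg_newlocale_from_collation_or_default(Oid collid)
{
    if (collid != DEFAULT_COLLATION_OID)
        return pg_newlocale_from_collation(collid);
    if (default_locale.provider == COLLPROVIDER_ICU)
        return &default_locale;
    return (pg_locale_t) 0;     /* default libc locale: no pg_locale_t needed */
}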
Re: Map WAL segment files on PMEM as WAL buffers
The cfbot showed issues compiling on linux and windows. http://cfbot.cputube.org/takashi-menjo.html https://cirrus-ci.com/task/6125740327436288 [02:30:06.538] In file included from xlog.c:38: [02:30:06.538] ../../../../src/include/access/xlogpmem.h:32:42: error: unknown type name ‘tli’ [02:30:06.538]32 | PmemXLogEnsurePrevMapped(XLogRecPtr ptr, tli) [02:30:06.538] | ^~~ [02:30:06.538] xlog.c: In function ‘GetXLogBuffer’: [02:30:06.538] xlog.c:1959:19: warning: implicit declaration of function ‘PmemXLogEnsurePrevMapped’ [-Wimplicit-function-declaration] [02:30:06.538] 1959 |openLogSegNo = PmemXLogEnsurePrevMapped(endptr, tli); https://cirrus-ci.com/task/6688690280857600?logs=build#L379 [02:33:25.752] c:\cirrus\src\include\access\xlogpmem.h(33,1): error C2081: 'tli': name in formal parameter list illegal (compiling source file src/backend/access/transam/xlog.c) [c:\cirrus\postgres.vcxproj] I'm attaching a probable fix. Unfortunately, for patches like this, most of the functionality isn't exercised unless the library is installed and compilation and runtime are enabled by default. In 0009: recaluculated => recalculated 0010-Update-document should be squished with 0003-Add-wal_pmem_map-to-GUC (and maybe 0002 and 0001). I believe the patches after 0005 are more WIP, so it's fine if they're not squished yet. I'm not sure what the point is of this one: 0008-Let-wal_pmem_map-be-constant-unl + ereport(ERROR, + (errcode_for_file_access(), +errmsg("could not pmem_map_file \"%s\": %m", path))); => The outer parenthesis are not needed since e3a87b4. >From e5614f2ea3ff6aaf016343f81f74366440e18f6f Mon Sep 17 00:00:00 2001 From: Takashi Menjo Date: Tue, 23 Mar 2021 13:32:27 +0900 Subject: [PATCH 01/13] Add --with-libpmem option for PMEM support --- configure | 99 ++ configure.ac | 17 +++ src/include/pg_config.h.in | 6 +++ 3 files changed, 122 insertions(+) diff --git a/configure b/configure index 3b19105328d..22c364fac4f 100755 --- a/configure +++ b/configure @@ -699,6 +699,7 @@ with_gnu_ld LD LDFLAGS_SL LDFLAGS_EX +with_libpmem LZ4_LIBS LZ4_CFLAGS with_lz4 @@ -868,6 +869,7 @@ with_libxslt with_system_tzdata with_zlib with_lz4 +with_libpmem with_gnu_ld with_ssl with_openssl @@ -1576,6 +1578,7 @@ Optional Packages: use system time zone data in DIR --without-zlib do not use Zlib --with-lz4 build with LZ4 support + --with-libpmem build with PMEM support --with-gnu-ld assume the C compiler uses GNU ld [default=no] --with-ssl=LIB use LIB for SSL/TLS support (openssl) --with-openssl obsolete spelling of --with-ssl=openssl @@ -9033,6 +9036,41 @@ fi done fi +# +# libpmem +# +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with PMEM support" >&5 +$as_echo_n "checking whether to build with PMEM support... " >&6; } + + + +# Check whether --with-libpmem was given. +if test "${with_libpmem+set}" = set; then : + withval=$with_libpmem; + case $withval in +yes) + +$as_echo "#define USE_LIBPMEM 1" >>confdefs.h + + ;; +no) + : + ;; +*) + as_fn_error $? "no argument expected for --with-libpmem option" "$LINENO" 5 + ;; + esac + +else + with_libpmem=no + +fi + + +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_libpmem" >&5 +$as_echo "$with_libpmem" >&6; } + + # # Assignments # @@ -13504,6 +13542,56 @@ fi fi +if test "$with_libpmem" = yes; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for pmem_memcpy in -lpmem" >&5 +$as_echo_n "checking for pmem_memcpy in -lpmem... 
" >&6; } +if ${ac_cv_lib_pmem_pmem_memcpy+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_check_lib_save_LIBS=$LIBS +LIBS="-lpmem $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +#ifdef __cplusplus +extern "C" +#endif +char pmem_memcpy (); +int +main () +{ +return pmem_memcpy (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + ac_cv_lib_pmem_pmem_memcpy=yes +else + ac_cv_lib_pmem_pmem_memcpy=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_pmem_pmem_memcpy" >&5 +$as_echo "$ac_cv_lib_pmem_pmem_memcpy" >&6; } +if test "x$ac_cv_lib_pmem_pmem_memcpy" = xyes; then : + cat >>confdefs.h <<_ACEOF +#define HAVE_LIBPMEM 1 +_ACEOF + + LIBS="-lpmem $LIBS" + +else + as_fn_error $? "library 'libpmem' (version >= 1.5) is required for PMEM support" "$LINENO" 5 +fi + +fi + ## ## H
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Wed, Jan 5, 2022 at 11:16 PM Andres Freund wrote: > > Hi, > > On 2021-12-29 11:31:51 -0800, Andres Freund wrote: > > That's pretty much the same - XLogInsert() can trigger an > > XLogWrite()/Flush(). > > > > I think it's a complete no-go to add throttling to these places. It's quite > > possible that it'd cause new deadlocks, and it's almost guaranteed to have > > unintended consequences (e.g. replication falling back further because > > XLogFlush() is being throttled). > > I thought of another way to implement this feature. What if we checked the > current distance somewhere within XLogInsert(), but only set > InterruptPending=true, XLogDelayPending=true. Then in ProcessInterrupts() we > check if XLogDelayPending is true and sleep the appropriate time. > > That way the sleep doesn't happen with important locks held / within a > critical section, but we still delay close to where we went over the maximum > lag. And the overhead should be fairly minimal. +1, this sounds like a really good idea to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Bugs in pgoutput.c
On Thu, Jan 6, 2022 at 3:42 AM Tom Lane wrote: > > Commit 6ce16088b caused me to look at pgoutput.c's handling of > cache invalidations, and I was pretty appalled by what I found. > > * rel_sync_cache_relation_cb does the wrong thing when called for > a cache flush (i.e., relid == 0). Instead of invalidating all > RelationSyncCache entries as it should, it will do nothing. > > * When rel_sync_cache_relation_cb does invalidate an entry, > it immediately zaps the entry->map structure, even though that > might still be in use (as per the adjacent comment that carefully > explains why this isn't safe). I'm not sure if this could lead > to a dangling-pointer core dump, but it sure seems like it could > lead to failing to translate tuples that are about to be sent. > > * Similarly, rel_sync_cache_publication_cb is way too eager to > reset the pubactions flags, which would likely lead to failing > to transmit changes that we should transmit. > > The attached patch fixes these things, but I'm still pretty > unhappy with the general design of the data structures in > pgoutput.c, because there is this weird random mishmash of > static variables along with a palloc'd PGOutputData struct. > This cannot work if there are ever two active LogicalDecodingContexts > in the same process. I don't think serial use of LogicalDecodingContexts > (ie, destroy one and then make another) works very well either, > because pgoutput_shutdown is a mere fig leaf that ignores all the > junk the module previously made (in CacheMemoryContext no less). > So I wonder whether either of those scenarios is possible/supported/ > expected to be needed in future. > > Also ... maybe I'm not looking in the right place, but I do not > see anything anywhere in logical decoding that is taking any lock > on the relation being processed. How can that be safe? > We don't need to acquire a lock on relation while decoding changes from WAL because it uses a historic snapshot to build a relcache entry and all the later changes to the rel are absorbed while decoding WAL. It is important to not acquire a lock on user-defined relations during decoding otherwise it could lead to deadlock as explained in the email [1]. * Would it be better if we move all the initialization done by patch in get_rel_sync_entry() to a separate function as I expect future patches might need to reset more things? * * logical decoding callback calls - but invalidation events can come in - * *during* a callback if we access the relcache in the callback. Because - * of that we must mark the cache entry as invalid but not remove it from - * the hash while it could still be referenced, then prune it at a later - * safe point. - * - * Getting invalidations for relations that aren't in the table is - * entirely normal, since there's no way to unregister for an invalidation - * event. So we don't care if it's found or not. + * *during* a callback if we do any syscache or table access in the + * callback. As we don't take locks on tables, can invalidation events be accepted during table access? I could be missing something but I see relation.c accepts invalidation messages only when lock mode is not 'NoLock'. [1] - https://www.postgresql.org/message-id/CAA4eK1Ks%2Bp8wDbzhDr7yMYEWDbWFRJAd_uOY-moikc%2Bzr9ER%2Bg%40mail.gmail.com -- With Regards, Amit Kapila.
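To spell out the first point, the cache-flush case (relid == 0) has to invalidate every entry. A minimal sketch of that behaviour, as it would sit inside pgoutput.c (RelationSyncCache and RelationSyncEntry are that file's private hash table and entry struct), is below; the committed fix may well look up a single entry when relid is valid instead of scanning the whole table:

static void
rel_sync_cache_relation_cb(Datum arg, Oid relid)
{
    HASH_SEQ_STATUS status;
    RelationSyncEntry *entry;

    if (RelationSyncCache == NULL)
        return;

    hash_seq_init(&status, RelationSyncCache);
    while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
    {
        /* InvalidOid means "flush everything"; otherwise match the relation. */
        if (!OidIsValid(relid) || entry->relid == relid)
            entry->replicate_valid = false;     /* mark only; free later, at a safe point */
    }
}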
Re: row filtering for logical replication
On Thu, Jan 6, 2022 at 8:43 AM Peter Smith wrote: > > On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila wrote: > > > ... > > > Another minor comment: > > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype, > > > > Do we need to specify the 'enum' type before changetype parameter? > > > > That is because there is currently no typedef for the enum > ReorderBufferChangeType. > But I see that the 0002 patch is already adding the required typedef. > Of course, it is easy to add a typedef and then this 'enum' is not > needed in the signature, but I wasn't sure if adding a new typedef > strictly belonged as part of this Row-Filter patch or not. > I don't see any harm in doing so. -- With Regards, Amit Kapila.
Re: In-placre persistance change of a relation
At Tue, 4 Jan 2022 16:05:08 -0800, Andres Freund wrote in > The tap tests seems to fail on all platforms. See > https://cirrus-ci.com/build/4911549314760704 > > E.g. the linux failure is > > [16:45:15.569] > [16:45:15.569] # Failed test 'inserted' > [16:45:15.569] # at t/027_persistence_change.pl line 121. > [16:45:15.569] # Looks like you failed 1 test of 25. > [16:45:15.569] [16:45:15] t/027_persistence_change.pl .. > [16:45:15.569] Dubious, test returned 1 (wstat 256, 0x100) > [16:45:15.569] Failed 1/25 subtests > [16:45:15.569] [16:45:15] > [16:45:15.569] > [16:45:15.569] Test Summary Report > [16:45:15.569] --- > [16:45:15.569] t/027_persistence_change.pl(Wstat: 256 Tests: 25 > Failed: 1) > [16:45:15.569] Failed test: 18 > [16:45:15.569] Non-zero exit status: 1 > [16:45:15.569] Files=27, Tests=315, 220 wallclock secs ( 0.14 usr 0.03 sys + > 48.94 cusr 17.13 csys = 66.24 CPU) > > https://api.cirrus-ci.com/v1/artifact/task/4785083130314752/tap/src/test/recovery/tmp_check/log/regress_log_027_persistence_change Thank you very much. It still doesn't fail on my devlopment environment (CentOS8), but I found a silly bug of the test script. I'm still not sure the reason the test item failed but I repost the updated version then watch what the CI says. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From e2deae1bef19827803e0e8f85b1e45e3fcd88505 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 11 Nov 2020 21:51:11 +0900 Subject: [PATCH v14 1/2] In-place table persistence change Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data rewriting, currently it runs heap rewrite which causes large amount of file I/O. This patch makes the command run without heap rewrite. Addition to that, SET LOGGED while wal_level > minimal emits WAL using XLOG_FPI instead of massive number of HEAP_INSERT's, which should be smaller. Also this allows for the cleanup of files left behind in the crash of the transaction that created it. 
--- src/backend/access/rmgrdesc/smgrdesc.c| 52 ++ src/backend/access/transam/README | 8 + src/backend/access/transam/xact.c | 7 + src/backend/access/transam/xlog.c | 17 + src/backend/catalog/storage.c | 545 +- src/backend/commands/tablecmds.c | 266 +++-- src/backend/replication/basebackup.c | 3 +- src/backend/storage/buffer/bufmgr.c | 88 +++ src/backend/storage/file/fd.c | 4 +- src/backend/storage/file/reinit.c | 344 +++ src/backend/storage/smgr/md.c | 93 ++- src/backend/storage/smgr/smgr.c | 32 + src/backend/storage/sync/sync.c | 20 +- src/bin/pg_rewind/parsexlog.c | 24 + src/common/relpath.c | 47 +- src/include/catalog/storage.h | 3 + src/include/catalog/storage_xlog.h| 42 +- src/include/common/relpath.h | 9 +- src/include/storage/bufmgr.h | 2 + src/include/storage/fd.h | 1 + src/include/storage/md.h | 8 +- src/include/storage/reinit.h | 10 +- src/include/storage/smgr.h| 17 + src/test/recovery/t/027_persistence_change.pl | 247 24 files changed, 1707 insertions(+), 182 deletions(-) create mode 100644 src/test/recovery/t/027_persistence_change.pl diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c index 773d57..d251f22207 100644 --- a/src/backend/access/rmgrdesc/smgrdesc.c +++ b/src/backend/access/rmgrdesc/smgrdesc.c @@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record) xlrec->blkno, xlrec->flags); pfree(path); } + else if (info == XLOG_SMGR_UNLINK) + { + xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec; + char *path = relpathperm(xlrec->rnode, xlrec->forkNum); + + appendStringInfoString(buf, path); + pfree(path); + } + else if (info == XLOG_SMGR_MARK) + { + xl_smgr_mark *xlrec = (xl_smgr_mark *) rec; + char *path = GetRelationPath(xlrec->rnode.dbNode, + xlrec->rnode.spcNode, + xlrec->rnode.relNode, + InvalidBackendId, + xlrec->forkNum, xlrec->mark); + char *action; + + switch (xlrec->action) + { + case XLOG_SMGR_MARK_CREATE: +action = "CREATE"; +break; + case XLOG_SMGR_MARK_UNLINK: +action = "DELETE"; +break; + default: +action = ""; +break; + } + + appendStringInfo(buf, "%s %s", action, path); + pfree(path); + } + else if (info == XLOG_SMGR_BUFPERSISTENCE) + { + xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec; + char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM); + + appendStringInfoString(buf, path); + appendStringInfo(buf, " persistence %d", xlrec->persistence); + pfree(path); + } }
Re: In-placre persistance change of a relation
Hi, On January 5, 2022 8:30:17 PM PST, Kyotaro Horiguchi wrote: >At Tue, 4 Jan 2022 16:05:08 -0800, Andres Freund wrote in >> The tap tests seems to fail on all platforms. See >> https://cirrus-ci.com/build/4911549314760704 >> >> E.g. the linux failure is >> >> [16:45:15.569] >> [16:45:15.569] # Failed test 'inserted' >> [16:45:15.569] # at t/027_persistence_change.pl line 121. >> [16:45:15.569] # Looks like you failed 1 test of 25. >> [16:45:15.569] [16:45:15] t/027_persistence_change.pl .. >> [16:45:15.569] Dubious, test returned 1 (wstat 256, 0x100) >> [16:45:15.569] Failed 1/25 subtests >> [16:45:15.569] [16:45:15] >> [16:45:15.569] >> [16:45:15.569] Test Summary Report >> [16:45:15.569] --- >> [16:45:15.569] t/027_persistence_change.pl(Wstat: 256 Tests: 25 >> Failed: 1) >> [16:45:15.569] Failed test: 18 >> [16:45:15.569] Non-zero exit status: 1 >> [16:45:15.569] Files=27, Tests=315, 220 wallclock secs ( 0.14 usr 0.03 sys >> + 48.94 cusr 17.13 csys = 66.24 CPU) >> >> https://api.cirrus-ci.com/v1/artifact/task/4785083130314752/tap/src/test/recovery/tmp_check/log/regress_log_027_persistence_change > >Thank you very much. It still doesn't fail on my devlopment >environment (CentOS8), but I found a silly bug of the test script. >I'm still not sure the reason the test item failed but I repost the >updated version then watch what the CI says. Fwiw, you can now test the same way as cfbot does with a lower turnaround time, as explained in src/tools/ci/README -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Map WAL segment files on PMEM as WAL buffers
On Thu, Jan 6, 2022 at 5:00 PM Justin Pryzby wrote: > I'm attaching a probable fix. Unfortunately, for patches like this, most of > the functionality isn't exercised unless the library is installed and > compilation and runtime are enabled by default. By the way, you could add a separate patch marked not-for-commit that adds, say, an apt-get command to the Linux task in the .cirrus.yml file, and whatever --with-blah stuff you might need to the configure part, if that'd be useful to test this code. Eventually, if we wanted to support that permanently for all CI testing, we'd want to push package installation down to the image building scripts (not in the pg source tree) so that CI starts with everything we need pre-installed, but one of the goals of the recent CI work was to make it possible for patches that include dependency changes to be possible (for example the alternative SSL library threads).
Re: Skipping logical replication transactions on subscriber side
On Wed, Jan 5, 2022 at 9:48 AM Dilip Kumar wrote: > > On Wed, Jan 5, 2022 at 9:01 AM Amit Kapila wrote: > > > > On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada > > wrote: > > > > Do you mean to say that you want to omit it even when we are > > committing the changes? > > > > > Apart from that, I'm vaguely concerned that the logic seems to be > > > getting complex. Probably it comes from the fact that we store > > > skip_xid in the catalog and update the catalog to clear/set the > > > skip_xid. It might be worth revisiting the idea of storing skip_xid on > > > shmem (e.g., ReplicationState)? > > > > > > > IIRC, the problem with that idea was that we won't remember skip_xid > > information after server restart and the user won't even know that it > > has to set it again. > > > I agree, that if we don't keep it in the catalog then after restart if > the transaction replayed again then the user has to set the skip xid > again and that would be pretty inconvenient because the user might > have to analyze the failure again and repeat the same process he did > before restart. But OTOH the combination of restart and the skip xid > might not be very frequent so this might not be a very bad option. > Basically, I am in favor of storing it in a catalog as that solution > looks cleaner at least from the user pov but if we think there are a > lot of complexities from the implementation pov then we might analyze > the approach of storing in shmem as well. > Fair point, but I think it is better to see the patch or the problems that can't be solved if we pursue storing it in catalog. Even, if we decide to store it in shmem, we need to invent some way to inform the user that we have not honored the previous setting of skip_xid and it needs to be reset again. -- With Regards, Amit Kapila.
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 05, 2022 at 10:33:38AM -0500, Robert Haas wrote: > I think we're going to want to offer both options. We can't know > whether the user prefers to consume CPU cycles on the server or on the > client. Compressing on the server has the advantage of potentially > saving transfer bandwidth, but the server is also often the busiest > part of the whole system, and users are often keen to offload as much > work as possible. Yeah. There are cases for both. I just got to wonder whether it makes sense to allow both server-side and client-side compression to be used at the same time. That would be a rather strange case, but well, with the correct set of options that could be possible. > Given that, I'd like us to be thinking about what the full set of > options looks like once we have (1) compression on either the server > or the client and (2) multiple compression algorithms and (3) multiple > compression levels. Personally, I don't really like the decision made > by this proposed patch. In this patch's view of the world, -Z is a way > of providing the compression level for whatever compression algorithm > you happen to have selected, but I think of -Z as being the upper-case > version of -z which I think of as selecting specifically gzip. It's > not particularly intuitive to me that in a command like pg_basebackup > --compress=, is a compression level rather than > an algorithm. So what I would propose is probably something like: > > pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER] > pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER] > > And then make -z short for --compress=gzip and -Z short for > --compress=gzip --compression-level=. That would be a > backward-incompatible change to the definition of --compress, but as > long as -Z works the same as today, I don't think many people will > notice. If we like, we can notice if the argument to --compress is an > integer and suggest using either -Z or --compress=gzip > --compression-level= instead. My view of things is slightly different, aka I'd rather keep --compress to mean a compression level with an integer option, but introduce a --compression-method={lz4,gzip,none}, with -Z being a synonym of --compression-method=gzip. That's at least the path we chose for pg_receivewal. I don't mind sticking with one way or another, as what you are proposing is basically the same thing I have in mind, but both tools ought to use the same set of options. Hmm. Perhaps at the end the problem is with --compress, where we don't know if it means a compression level or a compression method? For me, --compress means the former, and for you the latter. So a third way of seeing things is to drop completely --compress, but have one --compression-method and one --compression-level. That would bring a clear split. Or just one --compression-method for the client-side compression as you are proposing for the server-side compression, however I'd like to think that a split between the method and level is more intuitive. > In the proposed patch, you end up with pg_basebackup > --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I > find that quite odd, though as with all such things, opinions may > vary. In my proposal, that would be an error, because it would be > equivalent to --compress=lz4 --compress=gzip --compression-level=2, > and would thus involve conflicting compression method specifications. It seems to me that you did not read the patch closely enough. 
The attached patch does not add support for LZ4 in pg_basebackup on the client-side yet. Once it gets added, though, the idea is that using --compress with LZ4 would result in an error. That's what happens with pg_receivewal on HEAD, for one. The patch just shapes things to plug LZ4 more easily in the existing code of pg_basebackup.c, and walmethods.c. So.. As of now, it is actually possible to cut the pie in three parts. There are no real objections to the cleanup of walmethods.c and the addition of some conditional TAP tests with pg_basebackup and client-side compression, as far as I can see, only to the option renaming part. Attached are two patches, then. 0001 is the cleanup of walmethods.c to rely on the compression method, with more tests (tests that could be moved into their own patch, as well). 0002 is the addition of the options I suggested upthread, but we may change that depending on what gets used for the server-side compression for consistency so I am not suggesting to merge that until we agree on the full picture. The point of this thread was mostly about 0001, so I am fine to discard 0002. Thoughts? -- Michael From b2beb69c23ea8346a684e8c7ae5a5b60bade2066 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 6 Jan 2022 13:46:56 +0900 Subject: [PATCH v3 1/2] Refactor walmethods.c's tar method to use compression method Some TAP tests are added for pg_basebackup --compress, while on it. --- src/bin/pg_basebackup/pg_baseback
Re: Delay the variable initialization in get_rel_sync_entry
On Wed, Jan 5, 2022 at 10:31 AM Michael Paquier wrote: > On Fri, Dec 24, 2021 at 01:27:26PM +, houzj.f...@fujitsu.com wrote: > > Here is the perf result of pgoutput_change after applying the patch. > > I didn't notice something else that stand out. > > > > |--2.99%--pgoutput_change > > |--1.80%--get_rel_sync_entry > > |--1.56%--hash_search > > > > Also attach complete profiles. > > Thanks. I have also done my own set of measurements, and the > difference is noticeable in the profiles I looked at. So, applied > down to 13. Thanks. I agree the variables were being defined in the wrong place before this patch. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Map WAL segment files on PMEM as WAL buffers
On Thu, Jan 06, 2022 at 05:52:01PM +1300, Thomas Munro wrote: > On Thu, Jan 6, 2022 at 5:00 PM Justin Pryzby wrote: > > I'm attaching a probable fix. Unfortunately, for patches like this, most of > > the functionality isn't exercised unless the library is installed and > > compilation and runtime are enabled by default. > > By the way, you could add a separate patch marked not-for-commit that > adds, say, an apt-get command to the Linux task in the .cirrus.yml > file, and whatever --with-blah stuff you might need to the configure > part, if that'd be useful to test this code. In general, I think that's more or less essential. But in this case it really doesn't work :( running bootstrap script ... 2022-01-05 23:17:30.244 CST [12088] FATAL: file not on PMEM: path "pg_wal/00010001" -- Justin
Re: GUC flags
On Wed, Jan 05, 2022 at 05:55:17PM -0600, Justin Pryzby wrote: > pg_settings is currently defined with "SELECT *". Is it fine to enumerate a > list of columns instead ? I'd like to think that this is a better practice when it comes to documenting the columns, but I don't see an actual need for this extra complication here. > + initStringInfo(&ret); > + appendStringInfoChar(&ret, '{'); > + > + if (flags & GUC_NO_SHOW_ALL) > + appendStringInfo(&ret, "NO_SHOW_ALL,"); > + if (flags & GUC_NO_RESET_ALL) > + appendStringInfo(&ret, "NO_RESET_ALL,"); > + if (flags & GUC_NOT_IN_SAMPLE) > + appendStringInfo(&ret, "NOT_IN_SAMPLE,"); > + if (flags & GUC_EXPLAIN) > + appendStringInfo(&ret, "EXPLAIN,"); > + if (flags & GUC_RUNTIME_COMPUTED) > + appendStringInfo(&ret, "RUNTIME_COMPUTED,"); > + > + /* Remove trailing comma, if any */ > + if (ret.len > 1) > + ret.data[--ret.len] = '\0'; The way of building the text array is incorrect here. See heap_tuple_infomask_flags() in pageinspect as an example with all the HEAP_* flags. I think that you should allocate an array of Datums, use CStringGetTextDatum() to assign each array element, wrapping the whole with construct_array() to build the final value for the parameter tuple. -- Michael
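Concretely, the suggested approach looks something like the sketch below; this is not the final patch, the helper name is an assumption, and only a few of the flag bits are shown:

#include "postgres.h"
#include "catalog/pg_type.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/guc.h"

static ArrayType *
guc_flags_to_text_array(int flags)
{
    Datum       flagdatums[16];
    int         nflags = 0;

    if (flags & GUC_NO_SHOW_ALL)
        flagdatums[nflags++] = CStringGetTextDatum("NO_SHOW_ALL");
    if (flags & GUC_NOT_IN_SAMPLE)
        flagdatums[nflags++] = CStringGetTextDatum("NOT_IN_SAMPLE");
    if (flags & GUC_EXPLAIN)
        flagdatums[nflags++] = CStringGetTextDatum("EXPLAIN");
    /* ... and so on for the other GUC_* bits of interest ... */

    /* text is a varlena type: elmlen -1, not by-value, int alignment */
    return construct_array(flagdatums, nflags, TEXTOID, -1, false, TYPALIGN_INT);
}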
Re: Refactoring of compression options in pg_basebackup
On Wed, Jan 05, 2022 at 10:22:06AM -0500, Tom Lane wrote: > I think the existing precedent is to skip the test if tar isn't there, > cf pg_basebackup/t/010_pg_basebackup.pl. But certainly the majority of > buildfarm animals have it. Even Windows environments should be fine, aka recent edc2332. -- Michael signature.asc Description: PGP signature
Re: GUC flags
At Tue, 4 Jan 2022 21:06:48 -0600, Justin Pryzby wrote in > On Wed, Jan 05, 2022 at 11:47:57AM +0900, Kyotaro Horiguchi wrote: > > At Tue, 28 Dec 2021 20:32:40 -0600, Justin Pryzby > > wrote in > In that case, *all* the flags should be exposed. There's currently 20, which > means it may not work well after all - it's already too long, and could get > longer, and/or overflow the alphabet... Yeah, if we show all 20 properties, the string is too long, and not every property can have a sensible abbreviation character. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: GUC flags
On Thu, Jan 06, 2022 at 02:19:08PM +0900, Michael Paquier wrote: > On Wed, Jan 05, 2022 at 05:55:17PM -0600, Justin Pryzby wrote: > > pg_settings is currently defined with "SELECT *". Is it fine to enumerate a > > list of columns instead ? > > I'd like to think that this is a better practice when it comes > documenting the columns, but I don't see an actual need for this extra > complication here. The reason is to avoid showing the flags in the pg_settings view, which should not be bloated just so we can retire check_guc. > > + initStringInfo(&ret); > > + appendStringInfoChar(&ret, '{'); > > + > > + if (flags & GUC_NO_SHOW_ALL) > > + appendStringInfo(&ret, "NO_SHOW_ALL,"); > > + if (flags & GUC_NO_RESET_ALL) > > + appendStringInfo(&ret, "NO_RESET_ALL,"); > > + if (flags & GUC_NOT_IN_SAMPLE) > > + appendStringInfo(&ret, "NOT_IN_SAMPLE,"); > > + if (flags & GUC_EXPLAIN) > > + appendStringInfo(&ret, "EXPLAIN,"); > > + if (flags & GUC_RUNTIME_COMPUTED) > > + appendStringInfo(&ret, "RUNTIME_COMPUTED,"); > > + > > + /* Remove trailing comma, if any */ > > + if (ret.len > 1) > > + ret.data[--ret.len] = '\0'; > > The way of building the text array is incorrect here. See > heap_tuple_infomask_flags() in pageinspect as an example with all the > HEAP_* flags. I think that you should allocate an array of Datums, > use CStringGetTextDatum() to assign each array element, wrapping the > whole with construct_array() to build the final value for the > parameter tuple. I actually did it that way last night ... however GetConfigOptionByNum() is expecting it to return a text string, not an array. -- Justin
Re: SQL:2011 application time
On Wed, Jan 5, 2022 at 8:07 AM Peter Eisentraut wrote: > > This patch set looks very interesting. Thank you for the review! I'll work on your feedback but in the meantime here are replies to your questions: > I'm confused about how to query tables based on application time > periods. Online, I see examples using AS OF, but in the SQL standard > I only see this used for system time, which we are not doing here. Correct, the standard only gives it for system time. I think application time is intended to be more "in user space" so it's fine to use regular operators in your WHERE condition against the time columns, whereas system time is more of a managed thing---automatic, read-only, possibly stored in a separate table. Having a special syntax cue lets the RDBMS know it needs to involve the historical records. > validate_period(): Could use an explanatory comment. There are a > bunch of output arguments, and it's not clear what all of this is > supposed to do, and what "validating" is in this context. I'm not too happy with that function, but a previous reviewer asked me to factor out what was shared between the CREATE TABLE and ALTER TABLE cases. It does some sanity checks on the columns you've chosen, and along the way it collects info about those columns that we'll need later. But yeah all those out parameters are pretty ugly. I'll see if I can come up with a stronger abstraction for it, and at the very least I'll add some comments. > MergeAttributes(): I would perhaps initially just prohibit inheritance > situations that involve periods on either side. (It should work for > partitioning, IMO, but that should be easier to arrange.) Okay. I'm glad to hear you think partitioning won't be too hard. It is one of the last things, but to me it's a bit intimidating. > I didn't follow why indexes would have periods, for example, the new > period field in IndexStmt. Is that explained anywhere? When you create a primary key or a unique constraint (which are backed by a unique index), you can give a period name to make it a temporal constraint. We create the index first and then create the constraint as a side-effect of that (e.g. index_create calls index_constraint_create). The analysis phase generates an IndexStmt. So I think this was mostly a way to pass the period info down to the constraint. It probably doesn't actually need to be stored on pg_index though. Maybe it does for index_concurrently_create_copy. I'll add some comments, but if you think it's the wrong approach let me know. > While reading this patch I kept wondering whether it would be possible > to fold periods into pg_attribute, perhaps with negative attribute > numbers. Have you looked into something like that? No doubt it's > also complicated, but it might simplify some things, like the name > conflict checking. Hmm, I thought that sort of thing would be frowned upon. :-) But also it seems like periods really do have a bunch of details they need beyond what other attributes have (e.g. the two source attributes, the matching range type, the period type (application-vs-system), maybe some extra things for table inheritance. Also are you sure we aren't already using negative attnums somewhere already? I thought I saw something like that. > Of course, the main problem in this patch is that for most uses it > requires btree_gist. I think we should consider moving that into > core, or at least the support for types that are most relevant to this > functionality, specifically the date/time types. 
> Aside from user convenience, this would also allow writing more realistic test cases. I think this would be great too. How realistic do you think it is? I figured since exclusion constraints are also pretty useless without btree_gist, it wasn't asking too much to have people install the extension, but still it'd be better if it were all built in. > src/backend/access/brin/brin_minmax_multi.c > > These renaming changes seem unrelated (but still seem like a good > idea). Should they be progressed separately? I can pull this out into a separate patch. I needed to do it because when I added an `#include ` somewhere, these conflicted with the range_{de,}serialize functions declared there. > I don't understand why a temporal primary key is required for doing > UPDATE FOR PORTION OF. I don't see this in the standard. You're right, it's not in the standard. I'm doing that because creating the PK is when we add the triggers to implement UPDATE FOR PORTION OF. I thought it was acceptable since we also require a PK/unique constraint as the referent of a foreign key. But we could avoid it if I went back to the executor-based FOR PORTION OF implementation, since that doesn't depend on triggers. What do you think? Also: I noticed recently that you can't use FOR PORTION OF against an updatable view. I'm working on a new patch set to fix that. But the main reason is this PK check. So that's maybe another reason to go back to the executor implementation.
RE: Optionally automatically disable logical replication subscriptions on error
On Thursday, January 6, 2022 12:17 PM Tang, Haiying/唐 海英 wrote: > Thanks for updating the patch. Here are some comments: Thank you for your review ! > 1) > + /* > + * We would not be here unless this subscription's disableonerr field > was > + * true when our worker began applying changes, but check whether > that > + * field has changed in the interim. > + */ > + if (!subform->subdisableonerr) > + { > + /* > + * Disabling the subscription has been done already. No need > of > + * additional work. > + */ > + heap_freetuple(tup); > + table_close(rel, RowExclusiveLock); > + CommitTransactionCommand(); > + return; > + } > > I don't understand what does "Disabling the subscription has been done > already" > mean, I think we only run here when subdisableonerr is changed in the interim. > Should we modify this comment? Or remove it because there are already some > explanations before. Removed. The description you pointed out was redundant. > 2) > + /* Set the subscription to disabled, and note the reason. */ > + values[Anum_pg_subscription_subenabled - 1] = > BoolGetDatum(false); > + replaces[Anum_pg_subscription_subenabled - 1] = true; > > I didn't see the code corresponding to "note the reason". Should we modify the > comment? Fixed the comment by removing the part. We come here when an error occurred and the reason is printed as log so no need to note more reason. > 3) > + booldisableonerr; /* Whether errors automatically > disable */ > > This comment is hard to understand. Maybe it can be changed to: > > Indicates if the subscription should be automatically disabled when > subscription workers detect any errors. Agreed. Fixed. Kindly have a look at the attached v16. Best Regards, Takamichi Osumi v16-0001-Optionally-disable-subscriptions-on-error.patch Description: v16-0001-Optionally-disable-subscriptions-on-error.patch
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Wed, Jan 5, 2022 at 9:46 AM Andres Freund wrote: > Hi, > > On 2021-12-29 11:31:51 -0800, Andres Freund wrote: > > That's pretty much the same - XLogInsert() can trigger an > > XLogWrite()/Flush(). > > > > I think it's a complete no-go to add throttling to these places. It's > quite > > possible that it'd cause new deadlocks, and it's almost guaranteed to > have > > unintended consequences (e.g. replication falling back further because > > XLogFlush() is being throttled). > > I thought of another way to implement this feature. What if we checked the > current distance somewhere within XLogInsert(), but only set > InterruptPending=true, XLogDelayPending=true. Then in ProcessInterrupts() > we > check if XLogDelayPending is true and sleep the appropriate time. > > That way the sleep doesn't happen with important locks held / within a > critical section, but we still delay close to where we went over the > maximum > lag. And the overhead should be fairly minimal. > +1 to the idea, this way we can fairly throttle large and smaller transactions the same way. I will work on this model and share the patch. Please note that the lock contention still exists in this case. > I'm doubtful that implementing the waits on a transactional level provides > a > meaningful enough amount of control - there's just too much WAL that can be > generated within a transaction. > > > Greetings, > > Andres Freund >
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Thu, Jan 6, 2022 at 11:27 AM SATYANARAYANA NARLAPURAM wrote: > On Wed, Jan 5, 2022 at 9:46 AM Andres Freund wrote: >> >> Hi, >> >> On 2021-12-29 11:31:51 -0800, Andres Freund wrote: >> > That's pretty much the same - XLogInsert() can trigger an >> > XLogWrite()/Flush(). >> > >> > I think it's a complete no-go to add throttling to these places. It's quite >> > possible that it'd cause new deadlocks, and it's almost guaranteed to have >> > unintended consequences (e.g. replication falling back further because >> > XLogFlush() is being throttled). >> >> I thought of another way to implement this feature. What if we checked the >> current distance somewhere within XLogInsert(), but only set >> InterruptPending=true, XLogDelayPending=true. Then in ProcessInterrupts() we >> check if XLogDelayPending is true and sleep the appropriate time. >> >> That way the sleep doesn't happen with important locks held / within a >> critical section, but we still delay close to where we went over the maximum >> lag. And the overhead should be fairly minimal. > > > +1 to the idea, this way we can fairly throttle large and small transactions the same way. I will work on this model and share the patch. Please note that the lock contention still exists in this case. Generally, while checking for interrupts we should not be holding any LWLock, which means we don't risk holding any buffer locks, so any other reader can continue to read from those buffers. We will only be holding some heavyweight locks, like relation/tuple locks, but those will not impact anyone except writers trying to update the same tuple or DDL that wants to modify the table definition, so I don't think we have any issue here, because we don't want other writers to continue anyway. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
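To illustrate the division of labour being agreed on here (flag-setting at the WAL insertion site, sleeping only at interrupt-processing time), a rough sketch follows. XLogDelayPending and both helper names are assumptions made up for illustration, not code from any posted patch; InterruptPending and pg_usleep() are the existing facilities:

#include "postgres.h"
#include <signal.h>
#include "miscadmin.h"

static volatile sig_atomic_t XLogDelayPending = false;

/* Called from within XLogInsert(): only set flags, never sleep here. */
static void
MaybeRequestXLogThrottle(uint64 replica_lag_bytes, uint64 max_lag_bytes)
{
    if (replica_lag_bytes > max_lag_bytes)
    {
        XLogDelayPending = true;
        InterruptPending = true;
    }
}

/* Called from ProcessInterrupts(), i.e. outside critical sections and LWLocks. */
static void
ProcessXLogThrottle(void)
{
    if (XLogDelayPending)
    {
        XLogDelayPending = false;
        pg_usleep(10000L);      /* 10ms; a real patch would scale this with the lag */
    }
}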
Re: 回复:Re: Re: 回复:Re: Is it worth pushing conditions to sublink/subplan?
I corrected it according to your suggestion. Thanks. Wenjing. Zhihong Yu wrote on Saturday, December 25, 2021 at 02:26: > > > On Thu, Dec 23, 2021 at 3:52 AM 曾文旌(义从) > wrote: > >> >> Fixed a bug found during testing. >> >> >> Wenjing >> >> Hi, > + if (condition_is_safe_pushdown_to_sublink(rinfo, > expr_info->outer)) > + { > + /* replace qual expr from outer var = const to var = const > and push down to sublink query */ > + sublink_query_push_qual(subquery, (Node > *)copyObject(rinfo->clause), expr_info->outer, expr_info->inner); > > Since sublink_query_push_qual() is always guarded > by condition_is_safe_pushdown_to_sublink(), it seems > sublink_query_push_qual() can be folded into > condition_is_safe_pushdown_to_sublink(). > > For generate_base_implied_equalities(): > > + if (ec->ec_processed) > + { > + ec_index++; > + continue; > + } > + else if (list_length(ec->ec_members) > 1) > > Minor comment: the keyword `else` can be omitted (due to `continue` above). > > +* Since there may be an unexpanded sublink in the targetList, > +* we'll skip it for now. > > Since there may be an -> If there is an > > + {"lazy_process_sublink", PGC_USERSET, QUERY_TUNING_METHOD, > + gettext_noop("enable lazy process sublink."), > > Looking at existing examples from src/backend/utils/misc/guc.c, > enable_lazy_sublink_processing seems to be consistent with existing guc > variable naming. > > +lazy_process_sublinks(PlannerInfo *root, bool single_result_rte) > > lazy_process_sublinks -> lazily_process_sublinks > > + else > + { > /* There shouldn't be any OJ info to translate, as yet */ > Assert(subroot->join_info_list == NIL); > > Indentation for the else block is off. > > + if (istop) > + f->quals = preprocess_expression_ext(root, f->quals, > EXPRKIND_QUAL, false); > + else > + f->quals = preprocess_expression_ext(root, f->quals, > EXPRKIND_QUAL, true); > > The above can be written as: > > + f->quals = preprocess_expression_ext(root, f->quals, > EXPRKIND_QUAL, !istop); > > For find_equal_conditions_contain_uplevelvar_in_sublink_query(): > + context.has_unexpected_expr == false && > `!context.has_unexpected_expr` should suffice > > equal_expr_safety_check -> is_equal_expr_safe > > Cheers > > 0001-poc-pushdown-qual-to-sublink-v5.patch Description: Binary data
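As a rough illustration of the class of queries this optimization targets (the schema and query below are hypothetical, not taken from the patch): when the outer relation carries an equality on the correlation column, the same constant can be pushed into the sublink's subquery, so the subquery scans far fewer rows.

    -- Hypothetical tables for illustration only.
    CREATE TABLE orders   (customer_id int, order_total numeric);
    CREATE TABLE payments (customer_id int, amount numeric);

    -- The outer qual o.customer_id = 42 combines with the correlation
    -- p.customer_id = o.customer_id, so p.customer_id = 42 could be pushed
    -- into the sublink's subquery instead of aggregating all customers.
    SELECT *
    FROM   orders o
    WHERE  o.customer_id = 42
      AND  o.order_total = (SELECT sum(p.amount)
                            FROM   payments p
                            WHERE  p.customer_id = o.customer_id);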
Re: sqlsmith: ERROR: XX000: bogus varno: 2
On Tue, Dec 21, 2021 at 6:20 AM Tom Lane wrote: > Robert Haas writes: > > OK ... but my point is that dump and restore does work. So whatever > > cases pg_get_expr() doesn't work must be cases that aren't needed for > > that to happen. Otherwise this problem would have been found long ago. > > pg_get_expr doesn't (or didn't) depend on expression_tree_walker, > so there wasn't a problem there before. I am worried that there > might be other code paths, now or in future, that could try to apply > expression_tree_walker/mutator to relpartbound trees, which is > why I think it's a reasonable idea to teach them about such trees. > > > I realize that's a deep hole out of which we're unlikely to be able to > > climb in the short or even medium term, but we don't have to keep > > digging. We either make a rule that pg_get_expr() can apply to > > everything stored in the catalogs and produce sensible answers, which > > seems to be what you prefer, or we make it return nice errors for the > > cases that it can't handle nicely, or some combination of the two. And > > whatever we decide, we also document and enforce everywhere. > > I think having pg_get_expr throw an error for a query, as opposed to an > expression, is fine. What I don't want to do is subdivide things a lot > more finely than that; thus lumping "relpartbound" into "expression" > seems like a reasonable thing to do. Especially since we already did it > six years ago. I admit that it was an oversight on my part that relpartbound trees are not recognized by nodeFuncs.c. :-( Thanks for addressing that in the patch you posted. I guess fixing only expression_tree_walker/mutator() suffices for now, but curious to know if it was intentional that you decided not to touch the following sites: exprCollation(): it would perhaps make sense to return the collation assigned to the 1st element of listdatums/lowerdatums/upperdatums, especially given that transformPartitionBoundValue() does assign a collation to the values in those lists based on the parent's partition key specification. exprType(): could be handled similarly queryjumble.c: JumbleExpr(): whose header comment says: * expression_tree_walker() does, and therefore it's coded to be as parallel * to that function as possible. * ... * Note: the reason we don't simply use expression_tree_walker() is that the * point of that function is to support tree walkers that don't care about * most tree node types, but here we care about all types. We should complain * about any unrecognized node type. or maybe not, because relpartbound contents ought never reach queryjumble.c? -- Amit Langote EDB: http://www.enterprisedb.com
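For concreteness, handling relpartbound trees in expression_tree_walker() takes roughly the following shape; this is only a sketch of the idea, not the committed change, and the walker-on-list pattern mirrors what nodeFuncs.c already does for other list-valued fields:

    case T_PartitionBoundSpec:
        {
            PartitionBoundSpec *pbs = (PartitionBoundSpec *) node;

            /* the datum lists are the expression-bearing fields */
            if (walker((Node *) pbs->listdatums, context))
                return true;
            if (walker((Node *) pbs->lowerdatums, context))
                return true;
            if (walker((Node *) pbs->upperdatums, context))
                return true;
        }
        break;
    case T_PartitionRangeDatum:
        {
            PartitionRangeDatum *prd = (PartitionRangeDatum *) node;

            if (walker(prd->value, context))
                return true;
        }
        break;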
Re: generic plans and "initial" pruning
On Fri, Dec 31, 2021 at 7:56 AM Amit Langote wrote: > > On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat > wrote: >> >> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote wrote: >> > >> > Executing generic plans involving partitions is known to become slower >> > as partition count grows due to a number of bottlenecks, with >> > AcquireExecutorLocks() showing at the top in profiles. >> > >> > Previous attempt at solving that problem was by David Rowley [1], >> > where he proposed delaying locking of *all* partitions appearing under >> > an Append/MergeAppend until "initial" pruning is done during the >> > executor initialization phase. A problem with that approach that he >> > has described in [2] is that leaving partitions unlocked can lead to >> > race conditions where the Plan node belonging to a partition can be >> > invalidated when a concurrent session successfully alters the >> > partition between AcquireExecutorLocks() saying the plan is okay to >> > execute and then actually executing it. >> > >> > However, using an idea that Robert suggested to me off-list a little >> > while back, it seems possible to determine the set of partitions that >> > we can safely skip locking. The idea is to look at the "initial" or >> > "pre-execution" pruning instructions contained in a given Append or >> > MergeAppend node when AcquireExecutorLocks() is collecting the >> > relations to lock and consider relations from only those sub-nodes >> > that survive performing those instructions. I've attempted >> > implementing that idea in the attached patch. >> > >> >> In which cases, we will have "pre-execution" pruning instructions that >> can be used to skip locking partitions? Can you please give a few >> examples where this approach will be useful? > > > This is mainly to be useful for prepared queries, so something like: > > prepare q as select * from partitioned_table where key = $1; > > And that too when execute q(…) uses a generic plan. Generic plans are > problematic because it must contain nodes for all partitions (without any > plan time pruning), which means CheckCachedPlan() has to spend time > proportional to the number of partitions to determine that the plan is still > usable / has not been invalidated; most of that is AcquireExecutorLocks(). > > Other bottlenecks, not addressed in this patch, pertain to some executor > startup/shutdown subroutines that process the range table of a PlannedStmt in > its entirety, whose length is also proportional to the number of partitions > when the plan is generic. > >> The benchmark is showing good results, indeed. > Indeed. Here are few comments for v1 patch: + /* Caller error if we get here without contains_init_steps */ + Assert(pruneinfo->contains_init_steps); - prunedata = prunestate->partprunedata[i]; - pprune = &prunedata->partrelprunedata[0]; - /* Perform pruning without using PARAM_EXEC Params */ - find_matching_subplans_recurse(prunedata, pprune, true, &result); + if (parentrelids) + *parentrelids = NULL; You got two blank lines after Assert. -- + /* Set up EState if not in the executor proper. */ + if (estate == NULL) + { + estate = CreateExecutorState(); + estate->es_param_list_info = params; + free_estate = true; } ... [Skip] + if (free_estate) + { + FreeExecutorState(estate); + estate = NULL; } I think this work should be left to the caller. -- /* * Stuff that follows matches exactly what ExecCreatePartitionPruneState() * does, except we don't need a PartitionPruneState here, so don't call * that function. * * XXX some refactoring might be good. 
*/ +1, while doing it would be nice if foreach_current_index() is used instead of the i & j sequence in the respective foreach() block, IMO. -- + while ((i = bms_next_member(validsubplans, i)) >= 0) + { + Plan *subplan = list_nth(subplans, i); + + context->relations = + bms_add_members(context->relations, + get_plan_scanrelids(subplan)); + } I think instead of get_plan_scanrelids() the GetLockableRelations_worker() can be used; if so, then no need to add get_plan_scanrelids() function. -- /* Nodes containing prunable subnodes. */ + case T_MergeAppend: + { + PlannedStmt *plannedstmt = context->plannedstmt; + List *rtable = plannedstmt->rtable; + ParamListInfo params = context->params; + PartitionPruneInfo *pruneinfo; + Bitmapset *validsubplans; + Bitmapset *parentrelids; ... if (pruneinfo && pruneinfo->contains_init_steps) { int i; ... return false; } } break; Most of the declarations need t
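For readers who want to reproduce the bottleneck being discussed, a minimal setup could look like the following; plan_cache_mode is only used here to force the generic-plan path, and the table and partition names are illustrative:

    -- A partitioned table with many partitions makes the effect visible.
    CREATE TABLE partitioned_table (key int, val text) PARTITION BY HASH (key);
    DO $$
    BEGIN
      FOR i IN 0..999 LOOP
        EXECUTE format(
          'CREATE TABLE partitioned_table_%s PARTITION OF partitioned_table
             FOR VALUES WITH (MODULUS 1000, REMAINDER %s)', i, i);
      END LOOP;
    END $$;

    PREPARE q AS SELECT * FROM partitioned_table WHERE key = $1;

    -- Force the generic plan so that CheckCachedPlan()/AcquireExecutorLocks()
    -- must process a plan that still contains every partition.
    SET plan_cache_mode = force_generic_plan;
    EXECUTE q(42);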
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Wed, Jan 5, 2022 at 10:05 PM Dilip Kumar wrote: > On Thu, Jan 6, 2022 at 11:27 AM SATYANARAYANA NARLAPURAM > wrote: > > > On Wed, Jan 5, 2022 at 9:46 AM Andres Freund wrote: > >> > >> Hi, > >> > >> On 2021-12-29 11:31:51 -0800, Andres Freund wrote: > >> > That's pretty much the same - XLogInsert() can trigger an > >> > XLogWrite()/Flush(). > >> > > >> > I think it's a complete no-go to add throttling to these places. It's > quite > >> > possible that it'd cause new deadlocks, and it's almost guaranteed to > have > >> > unintended consequences (e.g. replication falling back further because > >> > XLogFlush() is being throttled). > >> > >> I thought of another way to implement this feature. What if we checked > the > >> current distance somewhere within XLogInsert(), but only set > >> InterruptPending=true, XLogDelayPending=true. Then in > ProcessInterrupts() we > >> check if XLogDelayPending is true and sleep the appropriate time. > >> > >> That way the sleep doesn't happen with important locks held / within a > >> critical section, but we still delay close to where we went over the > maximum > >> lag. And the overhead should be fairly minimal. > > > > > > +1 to the idea, this way we can fairly throttle large and smaller > transactions the same way. I will work on this model and share the patch. > Please note that the lock contention still exists in this case. > > Generally while checking for the interrupt we should not be holding > any lwlock that means we don't have the risk of holding any buffer > locks, so any other reader can continue to read from those buffers. > We will only be holding some heavyweight locks like relation/tuple > lock but that will not impact anyone except the writers trying to > update the same tuple or the DDL who want to modify the table > definition so I don't think we have any issue here because anyway we > don't want other writers to continue. > Yes, it should be ok. I was just making it explicit on Andres' previous comment on holding the heavyweight locks when wait before the commit. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com >
Re: pl/pgsql feature request: shorthand for argument and local variable references
Hi Pavel, On Wed, Mar 24, 2021 at 08:09:13AM +0100, Pavel Stehule wrote: > > I am continuing to talk with Hannu about syntax. I think so using the > compile option is a good direction, but maybe renaming from "routine_label" > to "topscope_label" (proposed by Hannu) or maybe "outerscope_label" can be > a good idea. It's not clear to me what is the status of this patch. You sent some updated patch since with some typo fixes but I don't see any conclusion about the direction this patch should go and/or how to name the new option. Should the patch be reviewed or switched to Waiting on Author?
Re: a misbehavior of partition row movement (?)
Hi Alvaro, On Thu, Jan 6, 2022 at 7:27 AM Alvaro Herrera wrote: > Pushed 0001. Thank you. > I had to adjust the query used in pg_dump; you changed the attribute > name in the query used for pg15, but left unchanged the one for older > branches, so pg_dump failed for all branches other than 15. Oops, should've tested that. > Also, > psql's describe.c required a small tweak to a version number test. Ah, yes -- 15, not 14 anymore. > https://github.com/alvherre/postgres/commit/3451612e0fa082d3ec953551c6d25432bd725502 > > Thanks! What was 0002 is attached, to keep cfbot happy. It's identical > to your v11-0002. > > I have pushed it thinking that we would not backpatch any of this fix. > However, after running the tests and realizing that I didn't need an > initdb for either patch, I wonder if maybe the whole series *is* > backpatchable. Interesting thought. We do lack help from trigger.c in the v12 branch in that there's no Trigger.tgisclone, which is used in a couple of places in the fix. I haven't checked how big of a deal it would be to back-port Trigger.tgisclone to v12, but maybe that's doable. > There is one possible problem, which is that psql and pg_dump would need > testing to verify that they work decently (i.e. no crash, no > misbehavior) with partitioned tables created with the original code. I suppose you mean checking if the psql and pg_dump after applying *0001* work sanely with partitioned tables defined without 0001? Will test that. > But there are few ABI changes, maybe we can cope and get all branches > fixes instead of just 15. > > What do you think? Yeah, as long as triggers are configured as required by the fix, and that would be ensured if we're able to back-patch 0001 down to v12, I suppose it would be nice to get this fixed down to v12. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Emit "checkpoint skipped because system is idle" message at LOG level if log_checkpoints is set
At Wed, 5 Jan 2022 17:18:06 -0600, Justin Pryzby wrote in > On Wed, Jan 05, 2022 at 10:45:06AM +0530, Dilip Kumar wrote: > > On Wed, Jan 5, 2022 at 10:24 AM Bharath Rupireddy > > wrote: > > > > > > Postgres server emits a message at DEBUG1 level when it skips a > > > checkpoint. At times, developers might be surprised after figuring out > > > from server logs that there were no checkpoints happening at all > > > during a certain period of time when DEBUG1 messages aren't captured. > > > How about emitting the message at LOG level if log_checkpoints is set? > > > Patch attached. > > > > +1 to convert to LOG when log_checkpoints is set. > > I think it would be odd to write logs of increased severity, for the case where > we did not do anything. I think it really is a debug log. > > I don't think the log level should be changed to avoid "developer" confusion, > as you said (I'm not sure if you mean a postgres developer or an application > developer, though). > > Is there any evidence that this has caused user confusion in the last 4 years ? > > |commit 6ef2eba3f57f17960b7cd4958e18aa79e357de2f > |Author: Andres Freund > |Date: Thu Dec 22 11:31:50 2016 -0800 > | > |Skip checkpoints, archiving on idle systems. > > Note that logging a message may not be benign ; I think it could cause the > disks to spin up, that would othewise have been in power saving mode, > especially if you log to syslog, which can issue fsync. Also, part of the > argument for enabling log_checkpoint by default was that a small, quiescent > instance would not write logs every 5 minutes. Agreed. -1 to just raising the elevel of the message. If someone is keen to see certain debug messages, that is already achievable for arbitrary messages by lowering log_min_messages and then installing a custom filter as emit_log_hook. That adds some overhead to processes that don't care about the message, but the overhead would be avoidable with a *bit* of a dirty hack in the filter. We might want to discuss a more convenient or cleaner way to get the same result. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
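To make that suggestion concrete, a minimal sketch of such a filter module might look like the following; the module and function names are hypothetical, and the string matched against depends on the exact wording of the DEBUG1 message in the server version at hand:

    /* Hypothetical emit_log_hook filter -- names are illustrative only. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/elog.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);

    static emit_log_hook_type prev_emit_log_hook = NULL;

    static void
    checkpoint_skip_filter(ErrorData *edata)
    {
        /*
         * With log_min_messages lowered to DEBUG1, suppress all sub-LOG
         * messages except the "checkpoint skipped" one we want to see.
         */
        if (edata->elevel < LOG &&
            (edata->message == NULL ||
             strstr(edata->message, "checkpoint skipped") == NULL))
            edata->output_to_server = false;

        if (prev_emit_log_hook)
            prev_emit_log_hook(edata);
    }

    void
    _PG_init(void)
    {
        prev_emit_log_hook = emit_log_hook;
        emit_log_hook = checkpoint_skip_filter;
    }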
Re: In-placre persistance change of a relation
At Wed, 05 Jan 2022 20:42:32 -0800, Andres Freund wrote in > Hi, > > On January 5, 2022 8:30:17 PM PST, Kyotaro Horiguchi > wrote: > >I'm still not sure the reason the test item failed but I repost the > >updated version then watch what the CI says. > > Fwiw, you can now test the same way as cfbot does with a lower turnaround > time, as explained in src/tools/ci/README Fantastic! I'll give it a try. Thanks! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Make relfile tombstone files conditional on WAL level
On Thu, Jan 6, 2022 at 3:07 AM Robert Haas wrote: > > On Mon, Aug 2, 2021 at 6:38 PM Andres Freund wrote: > > I guess there's a somewhat hacky way to get somewhere without actually > > increasing the size. We could take 3 bytes from the fork number and use that > > to get to a 7 byte relfilenode portion. 7 bytes are probably enough for > > everyone. > > > > It's not like we can use those bytes in a useful way, due to alignment > > requirements. Declaring that the high 7 bytes are for the relNode portion > > and > > the low byte for the fork would still allow efficient comparisons and > > doesn't > > seem too ugly. > > I think this idea is worth more consideration. It seems like 2^56 > relfilenodes ought to be enough for anyone, recalling that you can > only ever have 2^64 bytes of WAL. So if we do this, we can eliminate a > bunch of code that is there to guard against relfilenodes being > reused. In particular, we can remove the code that leaves a 0-length > tombstone file around until the next checkpoint to guard against > relfilenode reuse. +1 > > I think this would also solve a problem Dilip mentioned to me today: > suppose you make ALTER DATABASE SET TABLESPACE WAL-logged, as he's > been trying to do. Then suppose you do "ALTER DATABASE foo SET > TABLESPACE used_recently_but_not_any_more". You might get an error > complaining that “some relations of database \“%s\” are already in > tablespace \“%s\“” because there could be tombstone files in that > database. With this combination of changes, you could just use the > barrier mechanism from https://commitfest.postgresql.org/36/2962/ to > wait for those files to disappear, because they've got to be > previously-unliked files that Windows is still returning because > they're still opening -- or else they could be a sign of a corrupted > database, but there are no other possibilities. Yes, this approach will solve the problem for the WAL-logged ALTER DATABASE we are facing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
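Purely to illustrate the bit layout described above (this is not code from any patch): with the high 7 bytes holding the relfilenode counter and the low byte the fork number, comparisons stay a single 8-byte operation, and 2^56 relfilenodes can be allocated before wrapping.

    #include <stdint.h>

    /* Hypothetical packed representation: 56-bit relfilenode + 8-bit fork. */
    typedef uint64_t PackedRelFork;

    static inline PackedRelFork
    pack_relnode_fork(uint64_t relnode, uint8_t forknum)
    {
        /* relnode is assumed to fit in 56 bits */
        return (relnode << 8) | forknum;
    }

    static inline uint64_t
    packed_relnode(PackedRelFork packed)
    {
        return packed >> 8;
    }

    static inline uint8_t
    packed_fork(PackedRelFork packed)
    {
        return (uint8_t) (packed & 0xFF);
    }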
Re: Support tab completion for upper character inputs in psql
On 10.09.21 15:50, tanghy.f...@fujitsu.com wrote: (A test case for the enum case should be doable easily.) Test added. The enum test is failing on *some* platforms: t/010_tab_completion.pl .. 26/? # Failed test 'complete enum values' # at t/010_tab_completion.pl line 211. # Actual output was "ALTER TYPE mytype1 RENAME VALUE '\a\r\n'BLUE' 'bLACK' 'green' \r\npostgres=# ALTER TYPE mytype1 RENAME VALUE '" # Did not match "(?^:'bLACK' + 'BLUE' + 'green')" So the ordering of the suggested completions is different. I don't know offhand how that ordering is determined. Perhaps it's dependent on locale, readline version, or operating system. In any case, we need to figure this out to make this test stable.
Re: pl/pgsql feature request: shorthand for argument and local variable references
On Thu, Jan 6, 2022 at 8:02 AM, Julien Rouhaud wrote: > Hi Pavel, > > On Wed, Mar 24, 2021 at 08:09:13AM +0100, Pavel Stehule wrote: > > > > I am continuing to talk with Hannu about syntax. I think so using the > > compile option is a good direction, but maybe renaming from > "routine_label" > > to "topscope_label" (proposed by Hannu) or maybe "outerscope_label" can > be > > a good idea. > > It's not clear to me what is the status of this patch. You sent some > updated > patch since with some typo fixes but I don't see any conclusion about the > direction this patch should go and/or how to name the new option. > > Should the patch be reviewed or switched to Waiting on Author? > I am not sure there is enough agreement on, or enough necessity for, this feature. In this discussion there were more general questions about the future development of plpgsql (about possible configurations and compiler parametrizations, and how to switch the configuration at the global and local levels). This is not fully resolved yet, which is probably the reason why the discussion about this specific (and very simple) feature has not reached a clear end. Generally, this is a recurring problem in Postgres. Personally, I don't want to write a complicated patch for this relatively trivial (and not too important) feature; of course, that means it cannot answer the more general questions. I think the plpgsql option can work well, maybe with the keyword "top_label" or "root_label". Some enhancement of the DECLARE statement could probably work too (if it is done well), but it requires more work, because the DECLARE statement has block scope, and then the alias of the top label would have to have block scope too. Theoretically we could expose it as a GUC as well, like plpgsql.variable_conflict. This simple design can work, in my opinion, but I understand that it is not a general answer for people who want more control over the behavior of plpgsql. I think that is a different issue and should be solved separately (and it is a pretty complex problem, because Postgres together with PL/pgSQL mixes different access models - access rights, search path, local scope - and we lack possible parametrization at the schema level, which would make the most sense for PL/pgSQL). Regards Pavel
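To show the shape of the proposal, here is a sketch under the assumption that the compile option ends up spelled #routine_label (or top_label / root_label as discussed above); none of this syntax is committed, and the function itself is purely illustrative:

    CREATE FUNCTION some_rather_long_function_name(param1 int, val int)
    RETURNS int AS $$
    #routine_label r   -- hypothetical: the proposed option, final name not settled
    DECLARE
        val int := 10;  -- shadows the parameter of the same name
    BEGIN
        -- Today the parameter must be qualified with the full routine name:
        --   some_rather_long_function_name.val
        -- With the proposal, the short alias refers to the top scope instead:
        RETURN r.param1 + r.val;
    END;
    $$ LANGUAGE plpgsql;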