Python-3.11 patch

2022-01-18 Thread Horvath, Miklos
Hi,

This patch is important for postgresql 13.5 and 14.1 if you want to use
them with Python-3.11.

Miklos
Description: Build fix for Python 3.11: Python >= 3.11 removed the eval.h header file.
These files must not be included directly, as they are already included in Python.h.
Reference : https://docs.python.org/3.11/whatsnew/3.11.html
* Charles K Barcza - blackPanther OS (www.blackpanther.hu) - kbar...@blackpanther.hu
* Miklos Horvath - blackPanther Project Python Developer - hm...@blackpanther.hu
diff -ruN postgresql-14.1.orig/src/pl/plpython/plpython.h postgresql-14.1/src/pl/plpython/plpython.h
--- postgresql-14.1.orig/src/pl/plpython/plpython.h	2021-11-08 23:00:24.0 +0100
+++ postgresql-14.1/src/pl/plpython/plpython.h	2022-01-17 16:11:50.392671554 +0100
@@ -95,7 +95,7 @@
 #define TEXTDOMAIN PG_TEXTDOMAIN("plpython")
 
 #include <compile.h>
-#include <eval.h>
+//removed eval.h for python >= 3.11
 
 /* put back our *printf macros ... this must match src/include/port.h */
 #ifdef vsnprintf


Re: generic plans and "initial" pruning

2022-01-18 Thread Amit Langote
Hi Simon,

On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
 wrote:
> On Tue, 11 Jan 2022 at 16:22, Robert Haas  wrote:
> > This is just a relatively simple example and I think there are
> > probably a bunch of others. There are a lot of kinds of DDL that could
> > be performed on a partition that gets pruned away: DROP INDEX is just
> > one example.
>
> I haven't followed this in any detail, but this patch and its goal of
> reducing the O(N) drag effect on partition execution time is very
> important. Locking a long list of objects that then get pruned is very
> wasteful, as the results show.
>
> Ideally, we want an O(1) algorithm for single partition access and DDL
> is rare. So perhaps that is the starting point for a safe design -
> invent a single lock or cache that allows us to check if the partition
> hierarchy has changed in any way, and if so, replan, if not, skip
> locks.

Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity, as Robert mentioned in a related email thread a
couple of years ago:

https://www.postgresql.org/message-id/CA%2BTgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ%40mail.gmail.com

Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans.  Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe.  I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Push down time-related SQLValue functions to foreign server

2022-01-18 Thread Alexander Pyhalov

Hi.

Tom Lane wrote on 2022-01-18 02:08:

Alexander Pyhalov  writes:

Perhaps in a MACRO?



Changed this check to a macro, also fixed condition in
is_foreign_param() and added test for it.
Also fixed comment in prepare_query_params().


I took a quick look at this.  I'm unconvinced that you need the
TIME_RELATED_SQLVALUE_FUNCTION macro, mainly because I think testing
that in is_foreign_param() is pointless.  The only way we'll be seeing
a SQLValueFunction in is_foreign_param() is if we decided it was
shippable, so you really don't need two duplicate tests.
(In the same vein, I would not bother with including a switch in
deparseSQLValueFunction that knows about these opcodes explicitly.
Just use the typmod field; exprTypmod() does.)


Yes, sure, is_foreign_param() is called only when is_foreign_expr() is true.

Simplified this part.



I also find it pretty bizarre that contain_unsafe_functions
isn't placed beside its one caller.  Maybe that indicates that
is_foreign_expr is mis-located and should be in shippable.c.

More generally, it's annoying that you had to copy-and-paste
all of contain_mutable_functions to make this.  That creates
a rather nasty maintenance hazard for future hackers, who will
need to keep these widely-separated functions in sync.  Not
sure what to do about that though.  Do we want to extend
contain_mutable_functions itself to cover this use-case?


I've moved logic to contain_mutable_functions_skip_sqlvalues(), it
uses the same subroutines as contain_mutable_functions().
Should we instead just add one more parameter to 
contain_mutable_functions()?
I'm not sure that it's a good idea given that
contain_mutable_functions() seems to be an external interface.



The test cases seem a bit overkill --- what is the point of the
two nigh-identical PREPARE tests, or the GROUP BY test?  If
anything is broken about GROUP BY, surely it's not specific
to this patch.


I've removed PREPARE tests, but the GROUP BY test checks the
foreign_grouping_ok()/is_foreign_param() path, so I think it's useful.



I'm not really convinced by the premise of 0002, particularly
this bit:

 static bool
-contain_mutable_functions_checker(Oid func_id, void *context)
+contain_unsafe_functions_checker(Oid func_id, void *context)
 {
-   return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE);
+	/* now() is stable, but we can ship it as it's replaced by parameter */
+	return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE || func_id == F_NOW);
 }

The point of the check_functions_in_node callback API is to verify
the mutability of functions that are embedded in various sorts of
expression nodes ... but they might not be in a plain FuncExpr node,
which is the only case you'll deparse correctly.  It might be that
now() cannot legally appear in any of the other node types that
check_functions_in_node knows about, but I'm not quite convinced
of that, and even less convinced that that'll stay true as we add
more expression node types.  Also, if we commit this, for sure
some poor soul will try to expand the logic to some other stable
function that *can* appear in those contexts, and that'll be broken.

The implementation of converting now() to CURRENT_TIMESTAMP
seems like an underdocumented kluge, too.

On the whole I'm a bit inclined to drop 0002; I'm not sure it's
worth the trouble.



OK. Let's drop it for now.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 9a67c52b57e0b50a3702598aa0b3e8af89569a9c Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Thu, 29 Jul 2021 11:45:28 +0300
Subject: [PATCH] SQLValue functions pushdown

current_timestamp, localtimestamp and similar SQLValue functions
can be computed locally and sent to remote side as parameters values.
---
 contrib/postgres_fdw/deparse.c| 83 -
 .../postgres_fdw/expected/postgres_fdw.out| 88 +++
 contrib/postgres_fdw/postgres_fdw.c   |  9 +-
 contrib/postgres_fdw/postgres_fdw.h   |  1 +
 contrib/postgres_fdw/shippable.c  |  3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql | 22 +
 src/backend/optimizer/util/clauses.c  | 27 +-
 src/include/optimizer/optimizer.h |  1 +
 8 files changed, 226 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..a398e1b2174 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -109,6 +109,15 @@ typedef struct deparse_expr_cxt
 		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
 #define SUBQUERY_REL_ALIAS_PREFIX	"s"
 #define SUBQUERY_COL_ALIAS_PREFIX	"c"
+#define TIME_RELATED_SQLVALUE_FUNCTION(s)	\
+		(s->op == SVFOP_CURRENT_TIMESTAMP || \
+		 s->op == SVFOP_CURRENT_TIMESTAMP_N || \
+		 s->op == SVFOP_CURRENT_TIME || \
+		 s->op == SVFOP_CURRENT_TIME_N || \
+		 s->op == SVFOP_LOCALTIMESTAMP || \
+		 s->op == SVFOP_LOCALTIMESTAMP_N || \
+		 s->op == SVFOP_LOCALTIME || \
		 s->op == SVFOP_LOCALTIME_N)

[PATCH] allow src/tools/msvc/clean.bat script to be called from the root of the source tree

2022-01-18 Thread Juan José Santamaría Flecha
In [1] capacity for $SUBJECT was added to most of the batch scripts, but
clean.bat was not included. I propose to do so with the attached patch.

By the way, are pgbison.bat and pgflex.bat directly called anywhere?

[1]
https://www.postgresql.org/message-id/2b7a674b-5fb0-d264-75ef-ecc7a31e5...@postgrespro.ru

Regards,

Juan José Santamaría Flecha


0001-msvc-clean-bat-path-fix.patch
Description: Binary data


Re: Extensible Rmgr for Table AMs

2022-01-18 Thread Julien Rouhaud
Hi,

On Mon, Jan 17, 2022 at 12:42:06AM -0800, Jeff Davis wrote:
> On Mon, 2021-11-08 at 15:36 -0800, Jeff Davis wrote:
> > The attached patch (against v14, so it's easier to test columnar) is
> > somewhat like a simplified version of [3] combined with refactoring
> > to
> > make decoding a part of the rmgr.
> 
> New patches attached (v3). Essentially the same content as v2, but split
> into 3 patches and rebased.
> 
> Review on patch 0001 would be helpful. It makes decoding just another
> method of rmgr, which makes a lot of sense to me from a code
> organization standpoint regardless of the other patches. Is there any
> reason not to do that?

I think that it's a clean and sensible approach, so +1 for me.

There's a bit of 0003 present in 0002:

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..612b1b3723 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -738,7 +738,7 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
  (uint32) SizeOfXLogRecord, record->xl_tot_len);
return false;
}
-   if (record->xl_rmid > RM_MAX_ID)
+   if (record->xl_rmid > RM_MAX_ID && record->xl_rmid < RM_CUSTOM_MIN_ID)
{

> 
> The other patches then make rmgr extensible, which in turn makes
> decoding extensible and solves the logical replication problem for
> custom table AMs. The most obvious way to make rmgr extensible would be
> to expand the rmgr table, but I was concerned about making that dynamic
> (right now the structure is entirely constant and perhaps that's
> important for some optimizations?). So, at the cost of complexity I
> made a separate, dynamic rmgr table to hold the custom entries.

I'm not sure about the performance overhead, but allowing extensions to access the
main table directly seems a bit scary.  We definitely accepted some overhead
with the various extensible support, and this new GetRmgr() doesn't look like
it's going to cost a lot for the builtin case, especially compared to the cost
of any of the code associated with the rmgr.

Also, to answer your initial email I think it's a better way to go compared to
your previous approach, given the limitations and performance of the generic
xlog infrastructure, and hopefully index AMs should be able to rely on this
too.

A few random comments on 0003:

 #define RM_MAX_ID  (RM_NEXT_ID - 1)
+#define RM_CUSTOM_MIN_ID   128
+#define RM_CUSTOM_MAX_ID   UINT8_MAX

It would be a good idea to add a StaticAssertStmt here to make sure that
there's no overlap in the ranges.

+
+/*
+ * RmgrId to use for extensions that require an RmgrId, but are still in
+ * development and have not reserved their own unique RmgrId yet. See:
+ * https://wiki.postgresql.org/wiki/ExtensibleRmgr
+ */
+#define RM_EXPERIMENTAL_ID 128

I'm a bit dubious about the whole "register your ID in the Wiki" approach.  It
might not be a problem since there probably won't be hundreds of users, and I
don't have any better suggestion since it has to be consistent across nodes.

+   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
+rmgr->rm_name, rmid);

Should it be a DEBUG message instead?  Also s/customer/custom/

+RmgrData
+GetCustomRmgr(RmgrId rmid)
+{
+   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
+   elog(PANIC, "custom rmgr id %d out of range", rmid);

Should this be an assert?

+#define GetBuiltinRmgr(rmid) RmgrTable[(rmid)]
+#define GetRmgr(rmid) ((rmid < RM_CUSTOM_MIN_ID) ? \
+  GetBuiltinRmgr(rmid) : GetCustomRmgr(rmid))

rmid should be protected in the macro.

> The custom rmgr API would probably be marked "experimental" for a
> while, and I don't expect lots of people to use it given the challenges
> of a production-quality table AM. But it's still important, because
> without it a table AM has no chance to participate in logical
> replication.

How do you plan to mark it experimental?




Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-18 Thread Peter Eisentraut

On 09.12.21 14:31, Tom Lane wrote:

Peter Eisentraut  writes:

On 02.12.21 08:20, Peter Eisentraut wrote:

Buildfarm impact:
gaur and prairiedog use Python 2.6 and would need to be upgraded.



Tom, are you planning to update the Python version on these build farm
members?  I realize these are very slow machines and this might take
some time; I'm just wondering if this had registered.


I can do that when it becomes necessary.  I've got one eye on the meson
conversion discussion, which will kill those two animals altogether;
so it seems possible that updating their Pythons now would just be
wasted effort depending on what lands first.


I saw that the Python installations on gaur and prairiedog had been 
updated, so I committed this patch.  As the buildfarm shows, various 
platforms have problems with this, in particular because they point to 
the wrong place for the include directory.  AFAICT, in most cases this 
appears to have been fixed in more recent editions of those platforms 
(e.g., Debian unstable members pass but older releases don't), so at 
least the approach was apparently not wrong in principle.  But 
obviously, this leaves us in a mess.  I will revert this patch in a bit, 
after gathering a few more hours of data.


Also, considering the failure on prairiedog, I do see now on 
 that the sysconfig 
module is "New in version 3.2".  I had interpreted its existence in 
version 2.7 as meaning that it is included in all higher versions, but 
obviously there were multiple branches involved, so that was a mistaken 
assumption.





Re: Python-3.11 patch

2022-01-18 Thread Peter Eisentraut

On 17.01.22 20:58, Horvath, Miklos wrote:
This patch is important for postgresql 13.5 and 14.1 if you want to use 
them with Python-3.11.


Thanks, this was already done and will be in the next minor releases for 
all branches.





Re: generic plans and "initial" pruning

2022-01-18 Thread Simon Riggs
On Tue, 18 Jan 2022 at 08:10, Amit Langote  wrote:
>
> Hi Simon,
>
> On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
>  wrote:
> > On Tue, 11 Jan 2022 at 16:22, Robert Haas  wrote:
> > > This is just a relatively simple example and I think there are
> > > probably a bunch of others. There are a lot of kinds of DDL that could
> > > be performed on a partition that gets pruned away: DROP INDEX is just
> > > one example.
> >
> > I haven't followed this in any detail, but this patch and its goal of
> > reducing the O(N) drag effect on partition execution time is very
> > important. Locking a long list of objects that then get pruned is very
> > wasteful, as the results show.
> >
> > Ideally, we want an O(1) algorithm for single partition access and DDL
> > is rare. So perhaps that is the starting point for a safe design -
> > invent a single lock or cache that allows us to check if the partition
> > hierarchy has changed in any way, and if so, replan, if not, skip
> > locks.
>
> Rearchitecting partition locking to be O(1) seems like a project of
> non-trivial complexity as Robert mentioned in a related email thread
> couple of years ago:
>
> https://www.postgresql.org/message-id/CA%2BTgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ%40mail.gmail.com

I agree, completely redesigning locking is a major project. But that
isn't what I suggested, which was to find an O(1) algorithm to solve
the safety issue. I'm sure there is an easy way to check one lock,
maybe a new one/new kind, rather than N.

Why does the safety issue exist? Why is it important to be able to
concurrently access parts of the hierarchy with DDL? Those are not
critical points.

If we asked them, most users would trade a 10x performance gain for
some restrictions on DDL. If anyone cares, make it an option, but most
people will use it.

Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Column Filtering in Logical Replication

2022-01-18 Thread Peter Eisentraut

On 15.01.22 04:45, Amit Kapila wrote:

I think another issue w.r.t. the column filter patch is that even while
creating a publication (even for 'insert' publications) it should check
that all primary key columns are part of the published columns;
otherwise, it can fail while applying on the subscriber as it will try to
insert NULL for the primary key column.


I'm not so sure about the primary key aspects, actually; keep in mind
that the replica can have a different table definition, and it might
have even a completely different primary key.  I think this part is up
to the user to set up correctly; we have enough with just trying to make
the replica identity correct.


But OTOH, the primary key is also considered the default replica identity,
so I think users will expect it to work. You are right that this problem
can also happen if the user defined a different primary key on the
replica, but that is a problem even in HEAD (simple inserts will fail).
What I am worried about is the case where both the publisher and
subscriber have the same primary key, as that works in HEAD.


This would seem to be a departure from the current design of logical 
replication.  It's up to the user to arrange things so that data can be 
applied in general.  Otherwise, if the default assumption is that the 
schema is the same on both sides, then column filtering shouldn't exist 
at all, since that will necessarily break that assumption.


Maybe there could be a strict mode or something that has more checks, 
but that would be a separate feature.  The existing behavior is that you 
can publish anything you want and it's up to you to make sure the 
receiving side can store it.





Re: support for MERGE

2022-01-18 Thread Peter Eisentraut

On 13.01.22 13:43, Alvaro Herrera wrote:

Apologies, there was a merge failure and I failed to notice.  Here's the
correct patch.


I looked through this a bit.  I wonder why there is a check for 
"unreachable WHEN clause".  I don't see this in the SQL standard.


On the other hand, there is a requirement in the SQL standard that the 
correlation names exposed by the source and target tables are different. 
 This is currently caught indirectly with a message like


ERROR:  table name "t" specified more than once

because of the way everything ends up in a join tree.  But that seems 
implementation-dependent, and a more explicit check might be better.





Re: Map WAL segment files on PMEM as WAL buffers

2022-01-18 Thread Takashi Menjo
Hi Justin,

Thanks for your help. I'm making an additional patch for Cirrus CI.

I'm also trying to reproduce the "make check-world" error you
reported, on my Linux environment that has neither a real PMem nor an
emulated one, with PMEM_IS_PMEM_FORCE=1. I'll keep you updated.

Regards,
Takashi

On Mon, Jan 17, 2022 at 4:34 PM Justin Pryzby  wrote:
>
> On Thu, Jan 06, 2022 at 10:43:37PM -0600, Justin Pryzby wrote:
> > On Fri, Jan 07, 2022 at 12:50:01PM +0900, Takashi Menjo wrote:
> > > > But in this case it really doesn't work :(
> > > >
> > > > running bootstrap script ... 2022-01-05 23:17:30.244 CST [12088] FATAL:  file not on PMEM: path "pg_wal/00010001"
> > >
> > > Do you have a real PMEM device such as NVDIMM-N or Intel Optane PMem?
> >
> > No - the point is that we'd like to have a way to exercise this patch on the
> > cfbot.  Particularly the new code introduced by this patch, not just the
> > --without-pmem case...
> ..
> > I think you should add a patch which does what Thomas suggested: 1) add to
> > ./.cirrus.yaml installation of the libpmem package for 
> > debian/bsd/mac/windows;
> > 2) add setenv to main(), as above; 3) change configure.ac and guc.c to 
> > default
> > to --with-libpmem and wal_pmem_map=on.  This should be the last patch, for
> > cfbot only, not meant to be merged.
>
> I was able to get the cirrus CI to compile on linux and bsd with the below
> changes.  I don't know if there's an easy package installation for mac OSX.  I
> think it's okay if mac CI doesn't use --enable-pmem for now.
>
> > You can test that the package installation part works before mailing 
> > patches to
> > the list with the instructions here:
> >
> > src/tools/ci/README:
> > Enabling cirrus-ci in a github repository..
>
> I ran the CI under my own github account.
> Linux crashes in the recovery check.
> And freebsd has been stuck for 45min.
>
> I'm not sure, but maybe those are legitimate consequences of using
> PMEM_IS_PMEM_FORCE (?)  If so, maybe the recovery check would need to be
> disabled for this patch to run on CI...  Or maybe my suggestion to enable it 
> by
> default for CI doesn't work for this patch.  It would need to be specially
> tested with real hardware.
>
> https://cirrus-ci.com/task/6245151591890944
>
> https://cirrus-ci.com/task/6162551485497344?logs=test_world#L3941
> #2  0x55ff43c6edad in ExceptionalCondition (conditionName=0x55ff43d18108 
> "!XLogRecPtrIsInvalid(missingContrecPtr)", errorType=0x55ff43d151c4 
> "FailedAssertion", fileName=0x55ff43d151bd "xlog.c", lineNumber=8297) at 
> assert.c:69
>
> commit 15533794e465a381eb23634d67700afa809a0210
> Author: Justin Pryzby 
> Date:   Thu Jan 6 22:53:28 2022 -0600
>
> tmp: enable pmem by default, for CI
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 677bdf0e65e..0cb961c8103 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -81,6 +81,7 @@ task:
>  mkdir -m 770 /tmp/cores
>  chown root:postgres /tmp/cores
>  sysctl kern.corefile='/tmp/cores/%N.%P.core'
> +pkg install -y devel/pmdk
>
># NB: Intentionally build without --with-llvm. The freebsd image size is
># already large enough to make VM startup slow, and even without llvm
> @@ -99,6 +100,7 @@ task:
>  --with-lz4 \
>  --with-pam \
>  --with-perl \
> +--with-libpmem \
>  --with-python \
>  --with-ssl=openssl \
>  --with-tcl --with-tclconfig=/usr/local/lib/tcl8.6/ \
> @@ -138,6 +140,7 @@ LINUX_CONFIGURE_FEATURES: &LINUX_CONFIGURE_FEATURES >-
>--with-lz4
>--with-pam
>--with-perl
> +  --with-libpmem
>--with-python
>--with-selinux
>--with-ssl=openssl
> @@ -188,6 +191,9 @@ task:
>  mkdir -m 770 /tmp/cores
>  chown root:postgres /tmp/cores
>  sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
> +echo 'deb http://deb.debian.org/debian bullseye universe' >>/etc/apt/sources.list
> +apt-get update
> +apt-get -y install libpmem-dev
>
>configure_script: |
>  su postgres <<-EOF
> @@ -267,6 +273,7 @@ task:
>make \
>openldap \
>openssl \
> +  pmem \
>python \
>tcl-tk
>
> @@ -301,6 +308,7 @@ task:
>--with-libxslt \
>--with-lz4 \
>--with-perl \
> +  --with-libpmem \
>--with-python \
>--with-ssl=openssl \
>--with-tcl --with-tclconfig=${brewpath}/opt/tcl-tk/lib/ \
> diff --git a/src/backend/main/main.c b/src/backend/main/main.c
> index 9124060bde7..b814269675d 100644
> --- a/src/backend/main/main.c
> +++ b/src/backend/main/main.c
> @@ -69,6 +69,7 @@ main(int argc, char *argv[])
>  #endif
>
> progname = get_progname(argv[0]);
> +   setenv("PMEM_IS_PMEM_FORCE", "1", 0);
>
> /*
>  * Platform-specific startup hacks
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index ffc55f33e86..32d650cb9b2 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -1354,7 +

Re: Showing applied extended statistics in explain

2022-01-18 Thread Tatsuro Yamada
Hi Tomas!

>> What exact use case do you have in mind?
>Well, my goal was to help users investigating the plan/estimates,
>because once you create multiple "candidate" statistics objects it may
>get quite tricky to determine which of them were applied, in what order,
>etc. It's not all that straightforward, given the various heuristics we
>use to pick the "best" statistics, apply dependencies last, etc. And I
>don't quite want to teach the users those rules, because I consider them
>mostly implementation details that we may want to tweak in the future.

You are right. This feature will be very useful for plan tuning with
extended statistics. Therefore, I would like to help develop it. :-D


>5) It includes just statistics name + clauses, but maybe we should
>include additional info (e.g estimate for that combination of clauses).

I thought the above sentence was about what level to aim for in the initial
version. Ideally, we would like to include additional information. However,
it is clear that the more things we have, the longer it will take to
develop. So, I think it is better to commit the initial version at a
minimal level to provide it to users quickly.

As long as the extended stats names are displayed in the EXPLAIN result, it
is possible to figure out which extended statistics were used. That
information alone is valuable to the user.


> 4) The info is collected always, but I guess we should do that only when
>in explain mode. Not sure how expensive it is.

In the case of pg_plan_advsr that I created, I used ProcessUtility_hook to
detect the EXPLAIN command. It searches all queries to find T_ExplainStmt
and sets a flag. I guess we can't use this method in Postgres core, right?

If we have to collect the extra info for all query executions, we need to
reduce the overhead, i.e. collect only the minimum necessary. To do that, I
think it would be better to display less extended stats info in EXPLAIN
results. For example, displaying only extended stats names would be fine, I
guess. (I haven't understood your patch yet, so if I say something wrong,
sorry.)


>6) The clauses in the grouping query are transformed to AND list, which
>is wrong. This is easy to fix, I was lazy to do that in a PoC patch.

6) is related to 5).
If we agree to remove showing the quals of extended stats in the EXPLAIN
result, we can solve this problem by removing that code. Is it okay?


>2) The deparsing is modeled (i.e. copied) from how we deal with index
>quals, but it's having issues with nested OR clauses, because there are
>nested RestrictInfo nodes and the deparsing does not expect that.
>
>3) It does not work for functional dependencies, because we effectively
>"merge" all functional dependencies and apply the entries. Not sure how
>to display this, but I think it should show the individual dependencies
>actually applied.
>
>7) It does not show statistics for individual expressions. I suppose
>examine_variable could add it to the rel somehow, and maybe we could do
>that with index expressions too?

I'm not sure about 2), 3) and 7) above, so I'd like to see some concrete
examples of queries. I would like to include them in the test patterns for
regression testing.


To be fixed:

* StatisticExtInfo should have a _copy etc. node functionality now,
 right? I've hit "unrecognized node type" with a prepared statement
 because it's missing.

* Is there any particular reason behind choosing only some scan nodes to
 display extended stats? E.g. BitmapHeapScan is missing.

* (New) In the case of Group by, we should put Extended stats info under the
 Hash/Group Aggregate node (not under Scan node).

* (New) We have to create a regression test including the above test
patterns.


Attached patch:

It is a rebased version on the head of the master branch, because there
were many failed hunks when I applied the previous patch on master.

I'll create regression tests firstly.

Regards,
Tatsuro Yamada


0001-show-stats-in-explain-rebased.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-18, Amit Langote wrote:

> Would you like me to update the patch with the above renumbering or
> are you working on a new version yourself?

I have a few very minor changes apart from that.  Let me see if I can
get this pushed soon; if I'm unable to (there are parts I don't fully
grok yet), I'll post what I have.

Thank you!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: row filtering for logical replication

2022-01-18 Thread Amit Kapila
On Mon, Jan 17, 2022 at 8:58 PM houzj.f...@fujitsu.com
 wrote:
>
> Attach the V66 patch set which addressed Amit, Peter and Greg's comments.
>

Thanks, some more comments, and suggestions:

1.
/*
+ * If no record in publication, check if the table is the partition
+ * of a published partitioned table. If so, the table has no row
+ * filter.
+ */
+ else if (!pub->pubviaroot)
+ {
+ List*schemarelids;
+ List*relids;
+
+ schemarelids = GetAllSchemaPublicationRelations(pub->oid,
+ PUBLICATION_PART_LEAF);
+ relids = GetPublicationRelations(pub->oid,
+ PUBLICATION_PART_LEAF);
+
+ if (list_member_oid(schemarelids, entry->publish_as_relid) ||
+ list_member_oid(relids, entry->publish_as_relid))
+ pub_no_filter = true;
+
+ list_free(schemarelids);
+ list_free(relids);
+
+ if (!pub_no_filter)
+ continue;
+ }

As far as I understand this handling is required only for partition
tables but it seems to be invoked for non-partition tables as well.
Please move the comment inside the else if block and expand it a bit more to
say why it is necessary to not directly set pub_no_filter here. Note
that I think this can be improved (avoid cache lookups) if we maintain
a list of pubids in relsyncentry but I am not sure that is required
because this is a rare case and needs to be done only one time.

2.
 static HTAB *OpClassCache = NULL;

-
 /* non-export function prototypes */

Spurious line removal. I have added back in the attached top-up patch.

Apart from the above, I have made some modifications to other comments.

-- 
With Regards,
Amit Kapila.


v65_changes_amit_1.patch
Description: Binary data


Re: row filtering for logical replication

2022-01-18 Thread Amit Kapila
On Tue, Jan 18, 2022 at 6:05 PM Amit Kapila  wrote:
>
> On Mon, Jan 17, 2022 at 8:58 PM houzj.f...@fujitsu.com
>  wrote:
>
> Spurious line removal. I have added back in the attached top-up patch.
>
> Apart from the above, I have made some modifications to other comments.
>

Sorry, attached the wrong patch earlier.

-- 
With Regards,
Amit Kapila.


v66_changes_amit_1.patch
Description: Binary data


Re: ICU for global collation

2022-01-18 Thread Peter Eisentraut

On 18.01.22 05:02, Julien Rouhaud wrote:

If this is applied, then in my estimation all these hunks will completely
disappear from the global ICU patch.  So this would be a significant win.

Agreed, the patch will be quite smaller and also easier to review.  Are you
planning to apply it on its own or add it to the default ICU patchset?


My plan is to apply this patch in the next few days.




Re: Adding CI to our tree

2022-01-18 Thread Peter Eisentraut

On 17.01.22 22:13, Andrew Dunstan wrote:

Well, the logic we use for TAP tests is we run them for the following if
they have a t subdirectory, subject to configuration settings like
PG_TEST_EXTRA:

  src/test/bin/*
  contrib/*
  src/test/modules/*
  src/test/*

As long as any new TAP test gets place in such a location nothing new is
required in the buildfarm client.


But if I wanted to add TAP tests to libpq, then I'd still be stuck.  Why 
not change the above list to "anywhere"?






Re: missing indexes in indexlist with partitioned tables

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-18, Julien Rouhaud wrote:

>  SET enable_partitionwise_join = on;
>  EXPLAIN (COSTS OFF)
>  SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
> -  QUERY PLAN
> 
> +   QUERY PLAN
> +-
>   Limit
> -   ->  Merge Append
> - Sort Key: x.id
> - ->  Merge Left Join
> -   Merge Cond: (x_1.id = y_1.id)
> -   ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
> -   ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
> - ->  Merge Left Join
> -   Merge Cond: (x_2.id = y_2.id)
> -   ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
> -   ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
> -(11 rows)
> +   ->  Append
> + ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
> + ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
> +(4 rows)

Hmm, these plan changes look valid to me.  A left self-join using the
primary key column?  That looks optimizable all right.

I suspect that the author of partition-wise joins would want to change
these queries so that whatever was being tested by these self-joins is
tested by some other means (maybe just create an identical partitioned
table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM
fract_t).  But at the same time, the author of this patch should a) make
sure that the submitted patch updates these test results so that the
test pass, and also b) add some test cases to verify that his desired
behavior is tested somewhere, not just in a test case that's
incidentally broken by his patch.

What I still don't know is whether this patch is actually desirable or
not.  If the only cases it affects is self-joins, is there much actual
value?

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-17, Robert Haas wrote:

> On Mon, Jan 17, 2022 at 4:34 PM Alvaro Herrera  
> wrote:

> > Maybe it would work to have a single LSN in shared memory, as an atomic
> > variable, which uses monotonic advance[1] to be updated.  Whether this is
> > updated or not would depend on a new GUC, maybe track_latest_commit_lsn.
> > Causing performance pain during transaction commit is not great, but at
> least this way it shouldn't be *too* large a hit.
> 
> I don't know if it would or not, but it's such a hot path that I find
> the idea a bit worrisome. Atomics aren't free - especially inside of a
> loop.

I think the aspect to worry about the most is what happens when the
feature is disabled.  The cost for that should be just one comparison,
which I think can be optimized by the compiler fairly well.  That should
be cheap enough.  People who enable it would have to pay the cost of the
atomics, which is of course much higher.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 12:14 AM Peter Geoghegan  wrote:
> I quite clearly said that you'll only get an anti-wraparound VACUUM
> with the patch applied when the only factor that *ever* causes *any*
> autovacuum worker to VACUUM the table (assuming the workload is
> stable) is the anti-wraparound/autovacuum_freeze_max_age cutoff. With
> a table like this, even increasing autovacuum_freeze_max_age to its
> absolute maximum of 2 billion would not make it any more likely that
> we'd get a non-aggressive VACUUM -- it would merely make the
> anti-wraparound VACUUMs less frequent. No big change should be
> expected with a table like that.

Sure, I don't disagree with any of that. I don't see how I could. But
I don't see how it detracts from the points I was trying to make
either.

> Also, since the patch is not magic, and doesn't even change the basic
> invariants for relfrozenxid, it's still true that any scenario in
> which it's fundamentally impossible for VACUUM to keep up will also
> have anti-wraparound VACUUMs. But that's the least of the user's
> trouble -- in the long run we're going to have the system refuse to
> allocate new XIDs with such a workload.

Also true. But again, it's just about making sure that the patch
doesn't make other decisions that make things worse for people in that
situation. That's what I was expressing uncertainty about.

> The claim that I have made is 100% testable. Even if it was flat out
> incorrect, not getting anti-wraparound VACUUMs per se is not the
> important part. The important part is that the work is managed
> intelligently, and the burden is spread out over time. I am
> particularly concerned about the "freezing cliff" we get when many
> pages are all-visible but not also all-frozen. Consistently avoiding
> an anti-wraparound VACUUM (except with very particular workload
> characteristics) is really just a side effect -- it's something that
> makes the overall benefit relatively obvious, and relatively easy to
> measure. I thought that you'd appreciate that.

I do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Robert Haas
On Mon, Jan 17, 2022 at 8:39 PM James Coleman  wrote:
> I wondered about that, but commit_ts already does more than commit
> timestamps by recording the xid of the last commit.

Well, if you're maintaining an SLRU, you do kind of need to know where
the leading and lagging ends are.

> For that matter, keeping a cache of last commit metadata in shared
> memory is arguably not obviously implied by "track_commit_timestamps",
> which leads to the below...

I suppose that's true in the strictest sense, but tracking information
does seem to imply having a way to look it up.

> I'm curious, though: I realize it's in the hot path, and I realize
> that there's an accretive cost to even small features, but given we're
> already paying the lock cost and updating memory in what is presumably
> the same cache line, would you expect this cost to be clearly
> measurable?

If you'd asked me ten years ago, I would have said "no, can't matter,"
but Andres has subsequently demonstrated that a lot of things that I
thought were well-optimized were actually able to be optimized a lot
better than I thought possible, and some of them were in this area.
Still, I think it's unlikely that your patch would have a measurable
effect for the reasons that you state. Wouldn't hurt to test, though.
As far as performance goes, I'm more concerned about Alvaro's patch.
My concern with this one is more around whether it's too much of a
kludge.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-18 Thread Jeevan Ladhe
Hi,

Similar to LZ4 server-side compression, I have also tried to add ZSTD
server-side compression in the attached patch. I have done some initial
testing and things seem to be working.

Example run:
pg_basebackup -t server:/tmp/data_zstd -Xnone --server-compression=zstd

The patch surely needs some grooming, but I am expecting some initial
review, especially in the area where we are trying to close the zstd stream
in bbsink_zstd_end_archive(). We need to tell the zstd library to end the
compression by calling ZSTD_compressStream2() with the
ZSTD_e_end flag. But this also needs some input, which, per
example [1] line #686, I have taken as an empty ZSTD_inBuffer.
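
For reference, ending a ZSTD stream generally looks like the loop below — a hedged sketch only, not the actual bbsink_zstd_end_archive() code (cctx, out_buf, out_buf_size and write_to_sink are placeholders): with ZSTD_e_end, ZSTD_compressStream2() returns the number of bytes still buffered internally, so it is called with an empty ZSTD_inBuffer until it returns 0.

```c
/*
 * Sketch of finishing a ZSTD compression stream (placeholders, not the
 * patch code).  With ZSTD_e_end, ZSTD_compressStream2() returns how much
 * data is still buffered inside the library; loop until that reaches 0.
 */
ZSTD_inBuffer inbuf = {NULL, 0, 0};		/* empty input, as in example [1] */
size_t		remaining;

do
{
	ZSTD_outBuffer outbuf = {out_buf, out_buf_size, 0};

	remaining = ZSTD_compressStream2(cctx, &outbuf, &inbuf, ZSTD_e_end);
	if (ZSTD_isError(remaining))
		elog(ERROR, "could not end zstd stream: %s",
			 ZSTD_getErrorName(remaining));
	write_to_sink(outbuf.dst, outbuf.pos);	/* flush produced bytes */
} while (remaining != 0);
```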

Thanks, Tushar for testing the LZ4 patch. I have added the LZ4 option in
the pg_basebackup help now.

Note: Before applying these patches please apply Robert's v10 version
of patches 0002, 0003, and 0004.

[1]
https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/tools/zbi/zbi.cc

Regards,
Jeevan Ladhe

On Wed, Jan 5, 2022 at 10:24 PM tushar 
wrote:

>
>
> On Tue, Dec 28, 2021 at 1:12 PM Jeevan Ladhe <
> jeevan.la...@enterprisedb.com> wrote:
>
>> Hi Tushar,
>>
>> You need to apply Robert's v10 version patches 0002, 0003 and 0004,
>> before applying the lz4 patch(v8 version).
>> Please let me know if you still face any issues.
>>
>
> Thanks, Jeevan.
> I tested --server-compression option using different other options of
> pg_basebackup, also checked -t/--server-compression from pg_basebackup of
> v15 will
> throw an error if the server version is v14 or below. Things are looking
> good to me.
> Two open  issues -
> 1)lz4 value is missing for --server-compression in pg_basebackup --help
> 2)Error messages need to improve if using -t server with -z/-Z
>
> regards,
>


v9-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


v9-0002-Add-a-ZSTD-compression-method-for-server-side-compre.patch
Description: Binary data


Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 9:25 AM Robert Haas  wrote:
>
> On Mon, Jan 17, 2022 at 8:39 PM James Coleman  wrote:
> > I wondered about that, but commit_ts already does more than commit
> > timestamps by recording the xid of the last commit.
>
> Well, if you're maintaining an SLRU, you do kind of need to know where
> the leading and lagging ends are.

As far as I can tell the data in commitTsShared is used purely as an
optimization for the path looking up the timestamp for an arbitrary
xid when that xid happens to be the most recent one so that we don't
have to look up in the SLRU for that specific case. Maybe I'm missing
something else you're seeing?

> > For that matter, keeping a cache of last commit metadata in shared
> > memory is arguably not obviously implied by "track_commit_timestamps",
> > which leads to the below...
>
> I suppose that's true in the strictest sense, but tracking information
> does seem to imply having a way to look it up.

Looking up for an arbitrary commit, sure, (that's how I understand the
commit timestamps feature anyway) but it seems to me that the "most
recent" is distinct. Reading the code it seems the only usage (besides
the boolean activation status also stored there) is in
TransactionIdGetCommitTsData, and the only consumers of that in core
appear to be the SQL callable functions to get the latest commit info.
It is in commit_ts.h though, so I'm guessing someone is using this
externally (and maybe that's why the feature has the shape it does).

> > I'm curious, though: I realize it's in the hot path, and I realize
> > that there's an accretive cost to even small features, but given we're
> > already paying the lock cost and updating memory in what is presumably
> > the same cache line, would you expect this cost to be clearly
> > measurable?
>
> If you'd asked me ten years ago, I would have said "no, can't matter,"
> but Andres has subsequently demonstrated that a lot of things that I
> thought were well-optimized were actually able to be optimized a lot
> better than I thought possible, and some of them were in this area.
> Still, I think it's unlikely that your patch would have a measurable
> effect for the reasons that you state. Wouldn't hurt to test, though.

If we get past your other main concern I'd be happy to spin something
up to prove that out.

> As far as performance goes, I'm more concerned about Alvaro's patch.
> My concern with this one is more around whether it's too much of a
> kludge.

As far as the kludginess factor: do you think additional GUCs would
help clarify that? And/or are the earlier comments on the right path?

Thanks,
James Coleman




Re: generic plans and "initial" pruning

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 3:10 AM Amit Langote  wrote:
> Pursuing that kind of a project would perhaps have been more
> worthwhile if the locking issue had affected more than just this
> particular case, that is, the case of running prepared statements over
> partitioned tables using generic plans.  Addressing this by
> rearchitecting run-time pruning (and plancache to some degree) seemed
> like it might lead to this getting fixed in a bounded timeframe.  I
> admit that the concerns that Robert has raised about the patch make me
> want to reconsider that position, though maybe it's too soon to
> conclude.

I wasn't trying to say that your approach was dead in the water. It
does create a situation that can't happen today, and such things are
scary and need careful thought. But redesigning the locking mechanism
would need careful thought, too ... maybe even more of it than sorting
this out.

I do also agree with Simon that this is an important problem to which
we need to find some solution.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-01-18 Thread Teodor Sigaev

Hi!

Maybe doing that kind of compression in TOAST is somehow simpler, but I 
don't see it.

It seems that, in an ideal world, compression should be inside the toaster.



2. The current TOAST storage stores chunks in the heap access method and,
to provide fast access by toast id, builds an index. Ideas:

   - store chunks directly in a btree; pgsql's btree already has
     INCLUDE columns, so chunks and visibility data would be stored only
     in leaf pages. Obviously this reduces the number of disk accesses
     for "untoasting".
   - use another access method for chunk storage



Maybe, but that probably requires more thought - e.g. btree requires the 
values to be less than 1/3 page, so I wonder how would that play with 
toasting of values.
That's ok, because the chunk size is 2000 bytes right now and it could be
saved.




Seems you'd need a mapping table, to allow M:N mapping between types and 
toasters, linking it to all "compatible" types. It's not clear to me how 
would this work with custom data types, domains etc.
If a toaster looks into the internal structure, then it must know the
type's binary format. So, new custom types have little chance of working
with an old custom toaster. The default toaster works with any type.


Also, what happens to existing values when you change the toaster? What 
if the toasters don't use the same access method to store the chunks 
(heap vs. btree)? And so on.


varatt_custom contains the OID of the toaster, and a toaster is not allowed
to be deleted (at least, in the suggested patches). So, if a column's toaster
has been changed then old values will be detoasted by the toaster pointed to
in the varatt_custom structure, not the one in the column definition. This is
very similar to how the storage attribute works: when we alter the storage
attribute, only new values will be stored with the new storage type.





More thoughts:
Postgres now has two options for a column, storage and compression, and
now we add a toaster. That seems too redundant to me. Storage should be
a binary value: in-place (plain, as now) or toastable. All other
variations, such as the toast limit, compression enabling, and compression
kind, should be per-column options of the toaster (that's why we suggest
a valid toaster OID for any column with a varlena/toastable datatype). It
looks like a good abstraction but we will have a problem with backward
compatibility and I'm afraid I can't implement it very quickly.




So you suggest we move all of this to toaster? I'd say -1 to that, 
because it makes it much harder to e.g. add custom compression method, etc.
Hmm, I suggested leaving only the toaster at the upper level. The
compression kind could be chosen in the toaster's options (not implemented
yet), or we could even make an API interface to compression to make it
configurable. Right now, a module developer cannot implement a module with
a new compression method, and that is a disadvantage.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 9:47 AM James Coleman  wrote:
> > Well, if you're maintaining an SLRU, you do kind of need to know where
> > the leading and lagging ends are.
>
> As far as I can tell the data in commitTsShared is used purely as an
> optimization for the path looking up the timestamp for an arbitrary
> xid when that xid happens to be the most recent one so that we don't
> have to look up in the SLRU for that specific case. Maybe I'm missing
> something else you're seeing?

I wasn't looking at the code, but that use also seems closer to the
purpose of committs than your proposal.

> > As far as performance goes, I'm more concerned about Alvaro's patch.
> > My concern with this one is more around whether it's too much of a
> > kludge.
>
> As far as the kludginess factor: do you think additional GUCs would
> help clarify that? And/or are the earlier comments on the right path?

To be honest, I'm sort of keen to hear what other people think. I'm
shooting from the hip a little bit here...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Refactoring of compression options in pg_basebackup

2022-01-18 Thread Robert Haas
On Mon, Jan 17, 2022 at 8:36 PM Michael Paquier  wrote:
> On Mon, Jan 17, 2022 at 12:48:12PM -0500, Robert Haas wrote:
> > Alvaro's proposal is fine with me. I don't see any value in replacing
> > --compress with --compression. It's longer but not superior in any way
> > that I can see. Having both seems worst of all -- that's just
> > confusing.
>
> Okay, that looks like a consensus, then.  Robert, would it be better
> to gather all that on the thread that deals with the server-side
> compression?  Doing that here would be fine by me, with the option to
> only specify the client.  Now it would be a bit weird to do things
> with only the client part and not the server part :)

I think it could make sense for you implement
--compress=METHOD[:LEVEL], keeping -z and -Z N as synonyms for
--compress=gzip and --compress=gzip:N, and with --compress=N being
interpreted as --compress=gzip:N. Then I'll generalize that to
--compress=[{client|server}-]METHOD[:LEVEL] on the other thread. I
think we should leave it where, if you don't say either client or
server, you get client, because that's the historical behavior.

If that doesn't work for you, please let me know what you would prefer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add reloption for views to enable RLS

2022-01-18 Thread Christoph Heiss

Hi Laurenz,

thanks for the review!
I've attached a v2 where I addressed the things you mentioned.

On 1/11/22 19:59, Laurenz Albe wrote:

[..]

You made that an enum with only a single value.
What other values could you imagine in the future?

I think that this should be a boolean reloption, for example "security_definer".
If unset or set to "off", you would get the current behavior.


A boolean option would have been indeed the better choice, I agree.
I haven't thought of any other specific values for this enum; it was 
rather a decision following an off-list discussion.


I've changed the option to be boolean and renamed it to 
"security_invoker". This puts it in line with how other systems (e.g. 
MySQL) name their equivalent feature, so I think this should be an 
appropriate choice.





Finally, patch 0003 updates the documentation for this new reloption.


[..]

Please avoid long lines like that.  


Fixed.


Also, I don't think that the documentation on
RLS policies is the correct place for this.  It should be on a page dedicated 
to views
or permissions.

The CREATE VIEW page already has a paragraph about this, starting with
"Access to tables referenced in the view is determined by permissions of the view 
owner."
This looks like the best place to me (and it would need to be adapted anyway).
It makes sense to put it there, thanks for the pointer! I wasn't really 
that sure where to put the documentation to start with, and this seems 
like a more appropriate place.


Please review further.

Thanks,
Christoph HeissFrom 25267e6b8a2ffd81f14acbee95ef08d9edf3d31c Mon Sep 17 00:00:00 2001
From: Christoph Heiss 
Date: Tue, 18 Jan 2022 15:42:58 +0100
Subject: [PATCH 1/3] [PATCH v2 1/3] Add new boolean reloption security_invoker
 to views

When this reloption is set to true, all references to the underlying tables will
be checked against the invoking user rather than the view owner, as is currently
implemented.
Row level security must be enabled on the tables for this to take effect.

Signed-off-by: Christoph Heiss 
---
 src/backend/access/common/reloptions.c| 11 
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/nodes/outfuncs.c  |  1 +
 src/backend/nodes/readfuncs.c |  1 +
 src/backend/optimizer/plan/subselect.c|  1 +
 src/backend/optimizer/prep/prepjointree.c |  1 +
 src/backend/rewrite/rewriteHandler.c  |  1 +
 src/backend/utils/cache/relcache.c| 63 +--
 src/include/nodes/parsenodes.h|  1 +
 src/include/utils/rel.h   | 11 
 11 files changed, 65 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index b5602f5323..3c84982fda 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -140,6 +140,15 @@ static relopt_bool boolRelOpts[] =
 		},
 		false
 	},
+	{
+		{
+			"security_invoker",
+			"View subquery in invoked within the current security context.",
+			RELOPT_KIND_VIEW,
+			AccessExclusiveLock
+		},
+		false
+	},
 	{
 		{
 			"vacuum_truncate",
@@ -1996,6 +2005,8 @@ view_reloptions(Datum reloptions, bool validate)
 	static const relopt_parse_elt tab[] = {
 		{"security_barrier", RELOPT_TYPE_BOOL,
 		offsetof(ViewOptions, security_barrier)},
+		{"security_invoker", RELOPT_TYPE_BOOL,
+		offsetof(ViewOptions, security_invoker)},
 		{"check_option", RELOPT_TYPE_ENUM,
 		offsetof(ViewOptions, check_option)}
 	};
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df0b747883..6efaa07523 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2464,6 +2464,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_NODE_FIELD(tablesample);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
+	COPY_SCALAR_FIELD(security_invoker);
 	COPY_SCALAR_FIELD(jointype);
 	COPY_SCALAR_FIELD(joinmergedcols);
 	COPY_NODE_FIELD(joinaliasvars);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index cb7ddd463c..7f0401fa84 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2766,6 +2766,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
 	COMPARE_NODE_FIELD(tablesample);
 	COMPARE_NODE_FIELD(subquery);
 	COMPARE_SCALAR_FIELD(security_barrier);
+	COMPARE_SCALAR_FIELD(security_invoker);
 	COMPARE_SCALAR_FIELD(jointype);
 	COMPARE_SCALAR_FIELD(joinmergedcols);
 	COMPARE_NODE_FIELD(joinaliasvars);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 91a89b6d51..7dee550856 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3260,6 +3260,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 		case RTE_SUBQUERY:
 			WRITE_NODE_FIELD(subquery);
 			WRITE_BOOL_FIELD(security_barrier);
+			WRITE_BOOL_FIELD(security_invoker);
 			break;
 		case RTE

Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-18 Thread Tom Lane
Peter Eisentraut  writes:
> Also, considering the failure on prairiedog, I do see now on 
>  that the sysconfig 
> module is "New in version 3.2".  I had interpreted the fact that it 
> exists in version 2.7 that that includes all higher versions, but 
> obviously there were multiple branches involved, so that was a mistaken 
> assumption.

Hm.  I installed 3.1 because we claim support for that.  I don't mind
updating to 3.2 (as long as we adjust the docs to match), but it seems
kinda moot unless you figure out a solution for the include-path
issue.  I see that platforms as recent as Debian 10 are failing,
so I don't think we can dismiss that as not needing fixing.

regards, tom lane




Re: 32TB relation size make mdnblocks overflow

2022-01-18 Thread Tom Lane
Julien Rouhaud  writes:
> On Tue, Jan 18, 2022 at 02:21:14PM +0800, 陈佳昕(步真) wrote:
>> We know that PostgreSQL doesn't support a single relation size over 32TB,
>> limited by the MaxBlockNumber. But if we just 'insert into' one relation over
>> 32TB, it will get an error message 'unexpected data beyond EOF in block 0 of
>> relation' in ReadBuffer_common.

> I didn't try it but this is supposed to be caught by mdextend():
> ...
> Didn't you hit this?

Probably not, if the OP was testing something predating 8481f9989,
ie anything older than the latest point releases.

(This report does seem to validate my comment in the commit log
that "I think it might confuse ReadBuffer's logic for data-past-EOF
later on".  I'd not bothered to build a non-assert build to check
that, but this looks about like what I guessed would happen.)

regards, tom lane




Re: In-placre persistance change of a relation

2022-01-18 Thread Tom Lane
Julien Rouhaud  writes:
> The cfbot is failing on all OS with this version of the patch.  Apparently
> v16-0002 introduces some usage of "testtablespace" client-side variable that's
> never defined, e.g.

That test infrastructure got rearranged very recently, see d6d317dbf.

regards, tom lane




Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-18 Thread Fujii Masao




On 2021/12/15 15:40, kuroda.hay...@fujitsu.com wrote:

Yeah, the remote-checking timeout will be enabled even if a local transaction is 
opened.
In my understanding, postgres cannot distinguish whether open transactions
are using only local objects or not.
My first idea was to turn on the timeout when the GetFdwRoutineXXX functions
were called,


What about starting the timeout in GetConnection(), instead?


I attached a patch which implements that.


v05_0004_add_tests.patch failed to apply to master. Could you rebase 
it?


-This option is currently available only on systems that support the
-non-standard POLLRDHUP extension to the
-poll system call, including Linux.
+This option relies on kernel events exposed by Linux, macOS, illumos
+and the BSD family of operating systems, and is not currently available
+on other systems.

The above change is included in both 
v5-0003-Use-WL_SOCKET_CLOSED-for-client_connection_check_.patch and 
v05_0002_add_doc.patch. If it should be in the former patch, it should be 
removed from your patch v05_0002_add_doc.patch.


There seems to be no user of UnregisterCheckingRemoteServersCallback(). So how about 
removing it?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: 32TB relation size make mdnblocks overflow

2022-01-18 Thread 陈佳昕(步真)
I think we must have hit some corner case in our storage. The relation has 
32TB of blocks, so 'mdnblocks' gets an unexpected value; we will check it again.
Thanks a lot.

Re: New developer papercut - Makefile references INSTALL

2022-01-18 Thread Daniel Gustafsson
> On 17 Jan 2022, at 13:26, Daniel Gustafsson  wrote:
> 
>> On 17 Jan 2022, at 11:25, Magnus Hagander  wrote:
> 
>> That said, I'm not sure we're actually gaining anything by *not*
>> referring to the website as well. TBH, I bet the majority of users
>> will actually prefer to read them there. So I'd suggest always
>> including the reference to the website as well, per the suggestion
>> from Tim.
> 
> Fair point, I'll go ahead and do that in a bit unless anyone objects.

I plan on applying the attached which address the feedback given.

--
Daniel Gustafsson   https://vmware.com/



install_v2.diff
Description: Binary data


Re: New developer papercut - Makefile references INSTALL

2022-01-18 Thread Peter Eisentraut



On 18.01.22 16:51, Daniel Gustafsson wrote:

On 17 Jan 2022, at 13:26, Daniel Gustafsson  wrote:


On 17 Jan 2022, at 11:25, Magnus Hagander  wrote:



That said, I'm not sure we're actually gaining anything by *not*
referring to the website as well. TBH, I bet the majority of users
will actually prefer to read them there. So I'd suggest always
including the reference to the website as well, per the suggestion
from Tim.


Fair point, I'll go ahead and do that in a bit unless anyone objects.


I plan on applying the attached which address the feedback given.


The indentation of the two INSTRUCTIONS= lines uses a different mix of 
tabs and spaces, so it looks a bit weird depending on how you view it.


It's also a bit strange that the single quotes are part of the value of 
$INSTRUCTIONS rather than part of the fixed text.


The URL links to the "devel" version of the installation instructions, 
which will not remain appropriate after release.  I don't know how to 
fix that without creating an additional maintenance point.  Since 
README.git already contains that link, I would leave off the web site 
business and just make the change of the dynamically chosen file name.





Re: Pluggable toaster

2022-01-18 Thread Tomas Vondra



On 1/18/22 15:56, Teodor Sigaev wrote:
> Hi!
> 
>> Maybe doing that kind of compression in TOAST is somehow simpler, but
>> I don't see it.
> Seems, in ideal world, compression should be inside toaster.
> 

I'm not convinced that's universally true. Yes, I'm sure certain TOAST
implementations would benefit from tighter control over compression, but
does that imply compression and toast are redundant? I doubt that,
because we compress non-toasted types too, for example. And layering has
a value too, as makes it easier to replace the pieces.

>>
>>> 2 Current toast storage stores chunks in heap accesses method and to
>>> provide fast access by toast id it makes an index. Ideas:
>>>    - store chunks directly in btree tree, pgsql's btree already has an
>>>  INCLUDE columns, so, chunks and visibility data will be stored only
>>>  in leaf pages. Obviously it reduces number of disk's access for
>>>  "untoasting".
>>>    - use another access method for chunk storage
>>>
>>
>> Maybe, but that probably requires more thought - e.g. btree requires
>> the values to be less than 1/3 page, so I wonder how would that play
>> with toasting of values.
> That's ok, because chunk size is 2000 bytes right now and it could be
> saved.
>>

Perhaps. My main point is that we should not be making too many radical
changes at once - it makes it much harder to actually get anything done.
So yeah, doing TOAST through IOT might be interesting, but I'd leave
that for a separate patch.

> 
>> Seems you'd need a mapping table, to allow M:N mapping between types
>> and toasters, linking it to all "compatible" types. It's not clear to
>> me how would this work with custom data types, domains etc.
> If toaster will look into internal structure then it should know type's
> binary format. So, new custom types have a little chance to work with
> old custom toaster. Default toaster works with any types.

The question is what happens when you combine data type with a toaster
that is not designed for that type. I mean, imagine you have a JSONB
toaster and you set it for a bytea column. Naive implementation will
just crash, because it'll try to process bytea as if it was JSONB.

It seems better to prevent such incompatible combinations and restrict
each toaster to just compatible data types, and the mapping table
(linking toaster and data types) seems a way to do that.

However, it seems toasters are either generic (agnostic to data types,
treating everything as bytea) or specialized. I doubt any specialized
toaster can reasonably support multiple data types, so maybe each
toaster can have just one "compatible type" OID. If it's invalid, it'd
be "generic" and otherwise it's useful for that type and types derived
from it (e.g. domains).

So you'd have the toaster OID in two places:

pg_type.toaster_oid  - default toaster for the type
pg_attribute.toaster_oid - current toaster for this column

and then you'd have

pg_toaster.typid - type this toaster handles (or InvalidOid for generic)


>>
>> Also, what happens to existing values when you change the toaster?
>> What if the toasters don't use the same access method to store the
>> chunks (heap vs. btree)? And so on.
> 
> vatatt_custom contains an oid of toaster and toaster is not allowed to
> delete (at least, in suggested patches). So, if column's toaster has
> been changed then old values will be detoasted  by toaster pointed in
> varatt_custom structure, not in column definition. This is very similar
> to storage attribute works: we we alter storage attribute only new
> values will be stored with pointed storage type.
> 

IIRC we do this for compression methods, right?

>>
>>> More thought:
>>> Now postgres has two options for a column, storage and compression,
>>> and now we are adding toaster. To me that seems redundant. It seems
>>> storage should be a binary value: inplace (plain, as now) or
>>> toastable. All other variations, such as the toast limit, compression
>>> enabling, and compression kind, should be per-column options for the
>>> toaster (that's why we suggest a valid toaster oid for any column
>>> with a varlena/toastable datatype). It looks like a good abstraction,
>>> but we will have a problem with backward compatibility and I'm afraid
>>> I can't implement it very fast.
>>>
>>
>> So you suggest we move all of this to toaster? I'd say -1 to that,
>> because it makes it much harder to e.g. add custom compression method,
>> etc.
> Hmm, I suggested leaving only the toaster at the upper level. The
> compression kind could be chosen in the toaster's options (not
> implemented yet), or we could even make an API interface to compression
> to make it configurable. Right now, a module developer cannot implement
> a module with a new compression method, and that is a disadvantage.

If you have to implement a custom toaster to implement a custom
compression method, doesn't that make things more complex? You'd have
to solve all the issues for custom compression methods and also all the
issues for a custom toaster. Also, what if you want

Re: New developer papercut - Makefile references INSTALL

2022-01-18 Thread Tom Lane
Daniel Gustafsson  writes:
> I plan on applying the attached which address the feedback given.

+1

regards, tom lane




Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Tom Lane
Julien Rouhaud  writes:
> @@ -133,7 +133,7 @@
>  SELECT pg_relation_size('reloptions_test') = 0;
>   ?column?
>  --
> - t
> + f
>  (1 row)

Some machines have been showing that on HEAD too, so I doubt
it's the fault of this patch.  That reloptions test isn't stable
yet seemingly.

regards, tom lane




Re: Adding CI to our tree

2022-01-18 Thread Andrew Dunstan


On 1/18/22 08:06, Peter Eisentraut wrote:
> On 17.01.22 22:13, Andrew Dunstan wrote:
>> Well, the logic we use for TAP tests is we run them for the following if
>> they have a t subdirectory, subject to configuration settings like
>> PG_TEST_EXTRA:
>>
>>   src/test/bin/*
>>   contrib/*
>>   src/test/modules/*
>>   src/test/*
>>
>> As long as any new TAP test gets place in such a location nothing new is
>> required in the buildfarm client.
>
> But if I wanted to add TAP tests to libpq, then I'd still be stuck. 
> Why not change the above list to "anywhere"?



Sure, very doable, although we will still need some special logic for
src/test - there are security implications from running the ssl, ldap and
kerberos tests by default. See its Makefile.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Support for NSS as a libpq TLS backend

2022-01-18 Thread Joshua Brindle
On Tue, Jan 18, 2022 at 7:43 AM Daniel Gustafsson  wrote:
>
> > On 18 Jan 2022, at 07:36, Julien Rouhaud  wrote:
>
> > On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote:
> >>
> >> I must've fat-fingered the "git add -p" for v50 as the fix was in 
> >> configure.ac
> >> but not configure.  Fixed now.
> >
> > Thanks!  Apparently this version now fails on all OS, e.g.:
>
> Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it
> in testing.

LGTM +1




Re: psql - add SHOW_ALL_RESULTS option

2022-01-18 Thread Peter Eisentraut

On 15.01.22 10:00, Fabien COELHO wrote:
The reason this test constantly fails on cfbot windows is a
use-after-free bug.


Indeed! Thanks a lot for the catch and the debug!

The ClearOrSaveResult function is quite annoying because it may or may 
not clear the result as a side effect.


Attached v14 moves the status extraction before the possible clear. I've 
added a couple of results = NULL after such calls in the code.


In the psql.sql test file, the test I previously added concluded with 
\set ECHO none, which was a mistake that I have now fixed.  As a result, 
the tests that you added after that point didn't show their input lines, 
which was weird and not intentional.  So the tests will now show a 
different output.


I notice that this patch has recently gained a new libpq function.  I 
gather that this is to work around the misbehaviors in libpq that we 
have discussed.  But I think if we are adding a libpq API function to 
work around a misbehavior in libpq, we might as well fix the misbehavior 
in libpq to begin with.  Adding a new public libpq function is a 
significant step, needs documentation, etc.  It would be better to do 
without.  Also, it makes one wonder how others are supposed to use this 
multiple-results API properly, if even psql can't do it without 
extending libpq.  Needs more thought.





Re: Support for NSS as a libpq TLS backend

2022-01-18 Thread Jacob Champion
On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
> I've attached a v50 which fixes the issues found by Joshua upthread, as well 
> as
> rebases on top of all the recent SSL and pgcrypto changes.

I'm currently tracking down a slot leak. When opening and closing large
numbers of NSS databases, at some point we appear to run out of slots
and then NSS starts misbehaving, even though we've closed all of our
context handles.

I don't have anything more helpful to share yet, but I wanted to make a
note of it here in case anyone else had seen it or has ideas on what
may be causing it. My next move will be to update the version of NSS
I'm running.

--Jacob


Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-18 Thread Peter Eisentraut

On 18.01.22 16:24, Tom Lane wrote:

Peter Eisentraut  writes:

Also, considering the failure on prairiedog, I do see now on
 that the sysconfig
module is "New in version 3.2".  I had interpreted the fact that it
exists in version 2.7 as meaning that it exists in all higher versions,
but obviously there were multiple branches involved, so that was a
mistaken assumption.


Hm.  I installed 3.1 because we claim support for that.  I don't mind
updating to 3.2 (as long as we adjust the docs to match), but it seems
kinda moot unless you figure out a solution for the include-path
issue.  I see that platforms as recent as Debian 10 are failing,
so I don't think we can dismiss that as not needing fixing.


I have reverted this for now.

I don't have a clear idea how to fix this in the long run.  We would 
perhaps need to determine at which points the various platforms had 
fixed this issue in their Python installations and select between the 
old and the new approach based on that.  Seems messy.





Re: [PATCH] allow src/tools/msvc/clean.bat script to be called from the root of the source tree

2022-01-18 Thread Andrew Dunstan

On 1/18/22 04:41, Juan José Santamaría Flecha wrote:
> In [1] capacity for $SUBJECT was added to most of the batch scripts,
> but clean.bat was not included. I propose to do so with the attached
> patch.


That looks a bit ugly. How about this (untested) instead?


>
> By the way, are pgbison.bat and pgflex.bat directly called anywhere?



Not to my knowledge. One of the things that's annoying about them is
that the processor names are hardcoded, so if you install winflexbison
as I usually do you have to rename the executables (or rename the
chocolatey shims) or the scripts won't work.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/msvc/clean.bat b/src/tools/msvc/clean.bat
index 0cc91e7d6c..42b3d1b87f 100755
--- a/src/tools/msvc/clean.bat
+++ b/src/tools/msvc/clean.bat
@@ -4,8 +4,9 @@ REM src/tools/msvc/clean.bat
 set DIST=0
 if "%1"=="dist" set DIST=1
 
-set D=%CD%
-if exist ..\msvc if exist ..\..\..\src cd ..\..\..
+setlocal
+
+cd "%~dp0\..\..\.."
 
 if exist debug rd /s /q debug
 if exist release rd /s /q release
@@ -133,7 +134,7 @@ REM Clean up datafiles built with contrib
 REM cd contrib
 REM for /r %%f in (*.sql) do if exist %%f.in del %%f
 
-cd %D%
+cd "%~dp0"
 
 REM Clean up ecpg regression test files
 msbuild ecpg_regression.proj /NoLogo /v:q %MSBFLAGS% /t:clean


Re: Push down time-related SQLValue functions to foreign server

2022-01-18 Thread Tom Lane
Alexander Pyhalov  writes:
> [ updated patch ]

Thanks for updating the patch.  (BTW, please attach version numbers
to new patch versions, to avoid confusion.)

However, before we proceed any further with this patch, I think we
really ought to stop and think about the question I raised last
night: why are we building a one-off feature for SQLValueFunction?
Wouldn't the same parameter-substitution mechanism work for *any*
stable expression that doesn't contain remote Vars?  That would
subsume the now() case as well as plenty of others.

So far the only counterexample I've been able to come up with is
that shipping values of reg* types might not be too safe, because
the remote side might not have the same objects.  For example
consider these two potential quals:
WHERE remote_oid_column = CURRENT_ROLE::regrole
WHERE remote_text_column = CURRENT_ROLE::text
Say we're running as user 'joe' and that role doesn't exist on the
remote server.  Then executing the first WHERE locally is fine, but
shipping it to the remote would cause a failure because the remote's
regrolein() will fail to convert the parameter value.  But the second
case is quite non-problematic, because what will be sent over is just
some uninterpreted text.

In point of fact, this hazard doesn't have anything to do with stable
or not-stable subexpressions --- for example,
WHERE remote_oid_column = 'joe'::regrole
is just as unsafe, even though the value under consideration is a
*constant*.  Maybe there is something in postgres_fdw that would stop
it from shipping this qual, but I don't recall seeing it, so I wonder
if there's a pre-existing bug here.

So it seems like we need a check to prevent generating remote Params
that are of "unsafe" types, but this is a type issue not an expression
issue --- as long as an expression is stable and does not yield an
unsafe-to-ship data type, why can't we treat it as a Param?
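The decision rule sketched above — local evaluation requires stability plus a type-level safety check — might look roughly like this. This is a Python model of the planner-side decision, with a hypothetical set of unsafe type names; the real code would test pg_type OIDs, not strings:

```python
from collections import namedtuple

# Stand-in for a planner expression node: volatility ('i'mmutable,
# 's'table, 'v'olatile), result type name, and whether it references
# columns of the remote table.
Expr = namedtuple("Expr", "volatility result_type contains_remote_vars")

# Hypothetical list of catalog-dependent reg* types; their text form is
# resolved against the *local* catalogs, so the remote side may fail to
# re-parse the shipped value.
UNSAFE_TO_SHIP = {
    "regproc", "regprocedure", "regoper", "regoperator", "regclass",
    "regtype", "regrole", "regnamespace", "regconfig", "regdictionary",
}

def shippable_as_param(expr):
    if expr.contains_remote_vars:
        return False  # must be evaluated remotely, can't become a local Param
    if expr.volatility == "v":
        return False  # a volatile value can't be frozen into a one-shot Param
    return expr.result_type not in UNSAFE_TO_SHIP
```

Note that the `'joe'::regrole` constant case falls out naturally: the expression is immutable, but its result type is still unsafe.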

regards, tom lane




Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-18 Thread Tom Lane
Peter Eisentraut  writes:
> I don't have a clear idea how to fix this in the long run.  We would 
> perhaps need to determine at which points the various platforms had 
> fixed this issue in their Python installations and select between the 
> old and the new approach based on that.  Seems messy.

Are we sure it's an issue within Python, rather than something we
could dodge by invoking sysconfig differently?  It's hard to believe
that sysconfig could be totally unfit for the purpose of finding out
the include path and would remain so for multiple years.
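For reference, the replacement API in question looks like this; whether `sysconfig.get_path("include")` agrees with the deprecated `distutils.sysconfig.get_python_inc()` depends on how a given distribution patched its Python installation, which is exactly the portability question being raised:

```python
import sysconfig

# Scheme-based lookup, the replacement API ("New in version 3.2"):
inc = sysconfig.get_path("include")         # where Python.h should live
platinc = sysconfig.get_path("platinclude") # platform-specific headers

# The reverted patch swapped distutils.sysconfig.get_python_inc() for
# the calls above.  On distributions that patch one module but not the
# other, the two can disagree, which is the buildfarm failure mode.
print(inc, platinc)
```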

regards, tom lane




Re: Per-table storage parameters for TableAM/IndexAM extensions

2022-01-18 Thread Sadhuprasad Patro
On Mon, Jan 17, 2022 at 4:47 PM Jelte Fennema  wrote:
>
> Big +1, this is a great addition!
>
> I think it would be very useful if there were some tests for this new
> feature. Something similar to the tests for storage parameters for
> index AMs in src/test/modules/dummy_index_am.
>
Sure, I will refer to the index AM test and add the test cases needed.

> Apart from that I think the documentation for table storage parameters
> needs to be updated in doc/src/sgml/ref/create_table.sgml. It now
> needs to indicate that these parameters are different for each table
> access method. Similar to this paragraph in the create index storage
> parameter section of the docs:

Sure, I will add the documentation part for this.

As of now, I have fixed the comments from Dilip & Rushabh and have
done some more changes after internal testing and review. Please find
the latest patch attached.

Thanks & Regards
SadhuPrasad
www.EnterpriseDB.com


v2-0001-PATCH-v2-Per-table-storage-parameters-for-TableAM.patch
Description: Binary data


Re: Adding CI to our tree

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote:
> Sure, very doable, although we will still need some special logic for
> src/test - there are security implications from running the ssl, ldap and
> kerberos tests by default. See its Makefile.

ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd
not need to duplicate them in the make / msvc world and we'd see them as
skipped tests when not enabled.
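Moving the PG_TEST_EXTRA check into the tests themselves would amount to something like the following, sketched in Python rather than the Perl TAP framework actually used; the environment-variable semantics (a space-separated list of opt-in suites) are the existing ones:

```python
import os

def extra_suite_enabled(suite):
    """True if a security-sensitive suite ('ssl', 'ldap', 'kerberos') is
    explicitly enabled via the space-separated PG_TEST_EXTRA variable."""
    return suite in os.environ.get("PG_TEST_EXTRA", "").split()

def run_or_skip(suite):
    """What each test file would decide for itself, instead of the
    Makefile deciding whether to invoke it at all."""
    if not extra_suite_enabled(suite):
        return f"skip: {suite} not listed in PG_TEST_EXTRA"
    return f"run: {suite}"
```

The payoff is exactly what Andres describes: a disabled suite reports itself as skipped rather than silently never running.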

Greetings,

Andres Freund




Re: slowest tap tests - split or accelerate?

2022-01-18 Thread Robert Haas
On Mon, Jan 17, 2022 at 2:57 PM Andres Freund  wrote:
> > pg_basebackup's 010_pg_basebackup.pl looks like it could be split up,
> > though. That one, at least to me, looks like people have just kept
> > adding semi-related things into the same test file.
>
> Yea.

Here's a patch that splits up that file. Essentially the first half of
the file is concerned with testing that a backup ends up in the state
it expects, while the second half is concerned with checking that
various options to pg_basebackup work. So I split it that way, plus I
moved some of the really basic stuff to a completely separate file
with a very brief runtime. The test results are interesting.

Unpatched:

[12:33:33] t/010_pg_basebackup.pl ... ok 16161 ms ( 0.02 usr  0.00
sys +  2.07 cusr  7.80 csys =  9.89 CPU)
[12:33:49] t/020_pg_receivewal.pl ... ok 4115 ms ( 0.00 usr  0.00
sys +  0.89 cusr  1.73 csys =  2.62 CPU)
[12:33:53] t/030_pg_recvlogical.pl .. ok 1857 ms ( 0.01 usr  0.01
sys +  0.63 cusr  0.73 csys =  1.38 CPU)
[12:33:55]
All tests successful.
Files=3, Tests=177, 22 wallclock secs ( 0.04 usr  0.02 sys +  3.59
cusr 10.26 csys = 13.91 CPU)

Patched:

[12:32:03] t/010_pg_basebackup_basic.pl .. ok  192 ms ( 0.01
usr  0.00 sys +  0.10 cusr  0.05 csys =  0.16 CPU)
[12:32:03] t/011_pg_basebackup_integrity.pl .. ok 5530 ms ( 0.00
usr  0.00 sys +  0.87 cusr  2.51 csys =  3.38 CPU)
[12:32:09] t/012_pg_basebackup_options.pl  ok 13117 ms ( 0.00
usr  0.00 sys +  1.87 cusr  6.31 csys =  8.18 CPU)
[12:32:22] t/020_pg_receivewal.pl  ok 4314 ms ( 0.01
usr  0.00 sys +  0.97 cusr  1.77 csys =  2.75 CPU)
[12:32:26] t/030_pg_recvlogical.pl ... ok 1908 ms ( 0.00
usr  0.00 sys +  0.64 cusr  0.77 csys =  1.41 CPU)
[12:32:28]
All tests successful.
Files=5, Tests=177, 25 wallclock secs ( 0.04 usr  0.02 sys +  4.45
cusr 11.41 csys = 15.92 CPU)

Sadly, we've gained about 2.5 seconds of runtime as the price of
splitting the test. Arguably the options part could be split up a lot
more finely than this, but that would drive up the runtime even more,
basically because we'd need more initdbs. So I don't know whether it's
better to leave things as they are, split them this much, or split
them more. I think this amount of splitting might be justified simply
in the interests of clarity, but I'm reluctant to go further unless we
get some nifty initdb-caching system.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Split-010_pg_basebackup.pl.patch
Description: Binary data


Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-17, James Coleman wrote:

> I'd be happy to make it a separate GUC, though it seems adding an
> additional atomic access is worse (assuming we can convince ourselves
> putting this into the commit timestamps infrastructure is acceptable)
> given here we're already under a lock.

I was thinking it'd not be under any locks ... and I don't think it
belongs under commit timestamps either.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)




Re: docs: pg_replication_origin_oid() description does not match behaviour

2022-01-18 Thread Bossart, Nathan
On 1/17/22, 5:24 PM, "Michael Paquier"  wrote:
> On Tue, Jan 18, 2022 at 10:19:41AM +0900, Ian Lawrence Barwick wrote:
>> Given that the code has remained unchanged since the function was
>> introduced in 9.5, it seems reasonable to change the documentation
>> to match the function behaviour rather than the other way round.
>
> Obviously.  I'll go fix that as you suggest, if there are no
> objections.

+1

Nathan



Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

2022-01-18 Thread Tom Lane
Jelte Fennema  writes:
> It seems the man page of TCP_USER_TIMEOUT does not align with 
> reality then. When I use it on my local machine it is effectively used
> as a connection timeout too.

Huh, I should have thought to try that.  I confirm this behavior
on RHEL8 (kernel 4.18.0).  Not the first mistake I've seen in
Linux man pages :-(.

Doesn't seem to help on macOS, but AFAICT that platform doesn't
have TCP_USER_TIMEOUT, so no surprise there.

Anyway, that removes my objection, so I'll proceed with committing.
Thanks for working on this patch!

regards, tom lane




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 12:50 PM Alvaro Herrera  wrote:
>
> On 2022-Jan-17, James Coleman wrote:
>
> > I'd be happy to make it a separate GUC, though it seems adding an
> > additional atomic access is worse (assuming we can convince ourselves
> > putting this into the commit timestamps infrastructure is acceptable)
> > given here we're already under a lock.
>
> I was thinking it'd not be under any locks ... and I don't think it
> belongs under commit timestamps either.

I'm not sure if you saw the other side of this thread with Robert, but
my argument is basically that the commit_ts infrastructure already
does more than just record commit timestamps for future use;
it also includes what looks to me like a more general "last commit
metadata" facility (which is not actually necessary for the
storing of commit timestamps). It might make sense to refactor this
somewhat so that that's more obvious, but I'd like to know if it looks
that way to you as well, and, if so, does that make it make more sense
to rely on the existing infrastructure rather than inventing a new
facility?

Thanks,
James Coleman




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-18 Thread Peter Geoghegan
On Tue, Jan 18, 2022 at 6:11 AM Robert Haas  wrote:
> On Tue, Jan 18, 2022 at 12:14 AM Peter Geoghegan  wrote:
> > I quite clearly said that you'll only get an anti-wraparound VACUUM
> > with the patch applied when the only factor that *ever* causes *any*
> > autovacuum worker to VACUUM the table (assuming the workload is
> > stable) is the anti-wraparound/autovacuum_freeze_max_age cutoff. With
> > a table like this, even increasing autovacuum_freeze_max_age to its
> > absolute maximum of 2 billion would not make it any more likely that
> > we'd get a non-aggressive VACUUM -- it would merely make the
> > anti-wraparound VACUUMs less frequent. No big change should be
> > expected with a table like that.
>
> Sure, I don't disagree with any of that. I don't see how I could. But
> I don't see how it detracts from the points I was trying to make
> either.

You said "...the reasoning that says - anti-wraparound vacuums just
aren't going to happen any more - or - relfrozenxid will advance
continuously seems like dangerous wishful thinking to me". You then
proceeded to attack a straw man -- a view that I couldn't possibly
hold. This certainly surprised me, because my actual claims seemed
well within the bounds of what is possible, and in any case can be
verified with a fairly modest effort.

That's what I was reacting to -- it had nothing to do with any
concerns you may have had. I wasn't thinking about long-idle cursors
at all. I was defending myself, because I was put in a position where
I had to defend myself.

> > Also, since the patch is not magic, and doesn't even change the basic
> > invariants for relfrozenxid, it's still true that any scenario in
> > which it's fundamentally impossible for VACUUM to keep up will also
> > have anti-wraparound VACUUMs. But that's the least of the user's
> > trouble -- in the long run we're going to have the system refuse to
> > allocate new XIDs with such a workload.
>
> Also true. But again, it's just about making sure that the patch
> doesn't make other decisions that make things worse for people in that
> situation. That's what I was expressing uncertainty about.

I am not just trying to avoid making things worse when users are in
this situation. I actually want to give users every chance to avoid
being in this situation in the first place. In fact, almost everything
I've said about this aspect of things was about improving things for
these users. It was not about covering myself -- not at all. It would
be easy for me to throw up my hands, and change nothing here (keep the
behavior that makes FreezeLimit derived from the vacuum_freeze_min_age
GUC), since it's all incidental to the main goals of this patch
series.

I still don't understand why you think that my idea (not yet
implemented) of making FreezeLimit into a backstop (making it
autovacuum_freeze_max_age/2 or something) and relying on the new
"early freezing" criteria for almost everything is going to make the
situation worse in this scenario with long idle cursors. It's intended
to make it better.

Why do you think that the current vacuum_freeze_min_age-based
FreezeLimit isn't actually the main problem in these scenarios? I
think that the way that that works right now (in particular during
aggressive VACUUMs) is just an accident of history. It's all path
dependence -- each incremental step may have made sense, but what we
have now doesn't seem to. Waiting for a cleanup lock might feel like
the diligent thing to do, but that doesn't make it so.

My sense is that there are very few apps that are hopelessly incapable
of advancing relfrozenxid from day one. I find it much easier to
believe that users that had this experience got away with it for a
very long time, until their luck ran out, somehow. I would like to
minimize the chance of that ever happening, to the extent that that's
possible within the confines of the basic heapam/vacuumlazy.c
invariants.

-- 
Peter Geoghegan




Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

2022-01-18 Thread Andrey Borodin



> On 18 Jan 2022, at 03:54, Björn Harrtell wrote:
> 
> There might be some deep reason in the architecture that I'm unaware of that 
> could make it difficult to affect the node size but regardless, I believe 
> there could be a substantial win if node size could be controlled.

That's kind of orthogonal development path. Some years ago I had posted "GiST 
intrapage indexing" patch [0], that was aiming to make a tree with fanout that 
is Sqrt(Items on page). But for now decreasing fillfactor == wasting a lot of 
space, both in shared_buffers and on disk...

Thank you for raising this topic, I think I should rebase and refresh that 
patch too...

Best regards, Andrey Borodin.


[0] 
https://www.postgresql.org/message-id/flat/7780A07B-4D04-41E2-B228-166B41D07EEE%40yandex-team.ru



Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-18, James Coleman wrote:

> Reading the code it seems the only usage (besides
> the boolean activation status also stored there) is in
> TransactionIdGetCommitTsData, and the only consumers of that in core
> appear to be the SQL callable functions to get the latest commit info.
> It is in commit_ts.h though, so I'm guessing someone is using this
> externally (and maybe that's why the feature has the shape it does).

Logical replication is the intended consumer of that info, for the
purposes of conflict handling.  I suppose pglogical uses it, but I don't
know that code myself.

[ ... greps ... ]

Yeah, that function is called from pglogical.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: refactoring basebackup.c

2022-01-18 Thread Robert Haas
On Tue, Nov 16, 2021 at 4:47 PM Robert Haas  wrote:
> Here's a new patch set.

And here's another one.

I've committed the first two patches from the previous set, the second
of those just today, and so we're getting down to the meat of the
patch set.

0001 adds "server" and "blackhole" as backup targets. It now has some
tests. This might be more or less ready to ship, unless somebody else
sees a problem, or I find one.

0002 adds server-side gzip compression. This one hasn't got tests yet.
Also, it's going to need some adjustment based on the parallel
discussion on the new options structure.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v11-0001-Support-base-backup-targets.patch
Description: Binary data


v11-0002-Server-side-gzip-compression.patch
Description: Binary data


Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 1:52 PM Alvaro Herrera  wrote:
>
> On 2022-Jan-18, James Coleman wrote:
>
> > Reading the code it seems the only usage (besides
> > the boolean activation status also stored there) is in
> > TransactionIdGetCommitTsData, and the only consumers of that in core
> > appear to be the SQL callable functions to get the latest commit info.
> > It is in commit_ts.h though, so I'm guessing someone is using this
> > externally (and maybe that's why the feature has the shape it does).
>
> Logical replication is the intended consumer of that info, for the
> purposes of conflict handling.  I suppose pglogical uses it, but I don't
> know that code myself.
>
> [ ... greps ... ]
>
> Yeah, that function is called from pglogical.

That's interesting, because my use case for the lsn is also logical
replication (monitoring).

James Coleman




Re: Adding CI to our tree

2022-01-18 Thread Andrew Dunstan


On 1/18/22 12:44, Andres Freund wrote:
> Hi,
>
> On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote:
>> Sure, very doable, although we will still need some special logic for
>> src/test - there are security implications from running the ssl, ldap and
>> kerberos tests by default. See its Makefile.
> ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd
> not need to duplicate them in the make / msvc world and we'd see them as
> skipped tests when not enabled.
>

Yeah, good idea. Especially if we can backpatch that. The buildfarm
client would also get simpler, so it would be doubleplusgood.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Push down time-related SQLValue functions to foreign server

2022-01-18 Thread Alexander Pyhalov

Tom Lane wrote on 2022-01-18 19:53:

Alexander Pyhalov  writes:

[ updated patch ]


Thanks for updating the patch.  (BTW, please attach version numbers
to new patch versions, to avoid confusion.)

However, before we proceed any further with this patch, I think we
really ought to stop and think about the question I raised last
night: why are we building a one-off feature for SQLValueFunction?
Wouldn't the same parameter-substitution mechanism work for *any*
stable expression that doesn't contain remote Vars?  That would
subsume the now() case as well as plenty of others.



Hi.

I think I can extend it to allow any stable function (not just
immutable/SQLValue ones) in is_foreign_expr(), but I'm not so sure about
"expression".


Perhaps, at the top of deparseExpr(), we can check that an expression
doesn't contain vars or params but does contain a stable function, and
deparse it as a param.

This means we'll translate something like

explain select * from t where d > now() - '1 day'::interval;

to

select * from t where d > $1;

The question is how we will reliably determine its typmod, given what I
read in the exprTypmod() comment:
"returns the type-specific modifier of the expression's result type, if
it can be determined.  In many cases, it can't".


What do we save if we don't ship the whole expression as a param, but
only stable functions? Allowing them seems more straightforward.

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Push down time-related SQLValue functions to foreign server

2022-01-18 Thread Tom Lane
Alexander Pyhalov  writes:
> Tom Lane wrote on 2022-01-18 19:53:
>> However, before we proceed any further with this patch, I think we
>> really ought to stop and think about the question I raised last
>> night: why are we building a one-off feature for SQLValueFunction?
>> Wouldn't the same parameter-substitution mechanism work for *any*
>> stable expression that doesn't contain remote Vars?  That would
>> subsume the now() case as well as plenty of others.

> This means we'll translate something like
> explain select * from t where d > now() - '1 day'::interval;
> to
> select * from t where d > $1;

Right.

> The question is how will we reliably determine its typmod (given that I 
> read in exprTypmod() comment
> "returns the type-specific modifier of the expression's result type, if 
> it can be determined.  In many cases, it can't".

I don't think we need to.  If exprTypmod() says the typmod is -1,
then that's what it is.

> What do we save if we don't ship whole expression as param, but only 
> stable functions? Allowing them seems to be more straightforward.

How so?  Right off the bat, you get rid of the need for your own
version of contain_mutable_function.  ISTM this approach can probably
be implemented in a patch that's noticeably smaller than what you
have now.  It'll likely be touching entirely different places in
postgres_fdw, of course.

regards, tom lane




Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-18 Thread Bossart, Nathan
On 1/14/22, 11:26 PM, "Bharath Rupireddy" 
 wrote:
> On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan  wrote:
>> I'd personally like to avoid creating two code paths for the same
>> thing.  Are there really cases when this one extra auxiliary process
>> would be too many?  And if so, how would a user know when to adjust
>> this GUC?  I understand the point that we should introduce new
>> processes sparingly to avoid burdening low-end systems, but I don't
>> think we should be afraid to add new ones when it is needed.
>
> IMO, having a GUC for enabling/disabling this new worker and its
> related code would be a better idea. The reason is that if
> postgres has no replication slots at all (which is quite possible in
> real stand-alone production environments), or if the file enumeration
> (directory traversal and file removal) is fast enough on the servers,
> there's no point in having this new worker; the checkpointer itself can
> take care of the work as it is doing today.

IMO introducing a GUC wouldn't be doing users many favors.  Their
cluster might work just fine for a long time before they begin
encountering problems during startups/checkpoints.  Once the user
discovers the underlying reason, they have to then find a GUC for
enabling a special background worker that makes this problem go away.
Why not just fix the problem for everybody by default?

I've been thinking about what other approaches we could take besides
creating more processes.  The root of the problem seems to be that
there are a number of tasks that are performed synchronously that can
take a long time.  The process approach essentially makes these tasks
asynchronous so that they do not block startup and checkpointing.  But
perhaps this can be done in an existing process, possibly even the
checkpointer.  Like the current WAL pre-allocation patch, we could do
this work when the checkpointer isn't checkpointing, and we could also
do small amounts of work in CheckpointWriteDelay() (or a new function
called in a similar way).  In theory, this would help avoid delaying
checkpoints too long while doing cleanup at every opportunity to lower
the chances it falls far behind.
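The "small amounts of work at every opportunity" pattern could be sketched like this. It is a pure-Python illustration of the scheduling idea, not actual checkpointer code, and all names (IncrementalCleanup, the snapshot-file items) are invented:

```python
from collections import deque

class IncrementalCleanup:
    """A queue of pending cleanup items (say, stale snapshot or mapping
    files), drained a few at a time so no single call blocks for long."""
    def __init__(self, items, batch_size=4):
        self.pending = deque(items)
        self.batch_size = batch_size

    def do_some_work(self):
        """Called from a CheckpointWriteDelay()-style hook: process at
        most batch_size items, then hand control back to the caller."""
        done = []
        for _ in range(min(self.batch_size, len(self.pending))):
            done.append(self.pending.popleft())  # e.g. unlink one file
        return done

cleanup = IncrementalCleanup([f"snap_{i}" for i in range(10)], batch_size=4)
rounds = []
while cleanup.pending:
    rounds.append(cleanup.do_some_work())
```

The point of the bounded batch is that checkpointing progress is delayed by at most one small batch per call, regardless of how large the backlog has grown.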

Nathan



Re: Push down time-related SQLValue functions to foreign server

2022-01-18 Thread Tom Lane
I wrote:
> Alexander Pyhalov  writes:
>> This means we'll translate something like
>> explain select * from t where d > now() - '1 day'::interval;
>> to
>> select * from t where d > $1;

> Right.

After thinking about that a bit more, I see that this will result
in a major redefinition of what is "shippable".  Right now, we do not
consider the above WHERE clause to be shippable, not only because of
now() but because the timestamptz-minus-interval operator is dependent
on the timezone setting, which might be different at the remote.
But if we evaluate that operator locally and send its result as a
parameter, the objection vanishes.  In fact, I don't think we even
need to require the subexpression to contain only built-in functions.
Its result still has to be of a built-in type, but that's a much
weaker restriction.

So this is going to require significant restructuring of both
is_foreign_expr and deparse.c, which I realize may be more than
you bargained for.  But if you want to tackle it, I think what
we want to do is divide potentially-shippable expressions into
three sorts of components:

1. Remote Vars, which obviously ship as-is.

2. Locally-evaluatable subexpressions, which must contain no
remote Vars, must be stable or immutable, and must have a result
type that we consider safe to ship.  If such a subexpression
is just a Const, we ship it as a constant, otherwise we evaluate
the value at runtime and ship a parameter.

3. Superstructure, which consists of all expression nodes having
at least one remote Var below them.  These nodes have to be
shippable according to the existing definition, so that we know
that the remote server will evaluate them just as we would.

If we keep the existing division of labor between is_foreign_expr
and deparse.c, I fear that there's going to be a lot of duplicated
code as well as duplicative planning effort.  I wonder if it'd be
wise to combine those operations into a single expression-scanning
process that determines whether an expression is safe to ship and
produces the translated string immediately if it is.

regards, tom lane




Re: [PATCH] allow src/tools/msvc/clean.bat script to be called from the root of the source tree

2022-01-18 Thread Juan José Santamaría Flecha
On Tue, Jan 18, 2022 at 5:49 PM Andrew Dunstan  wrote:

>
> On 1/18/22 04:41, Juan José Santamaría Flecha wrote:
> > In [1] capacity for $SUBJECT was added to most of the batch scripts,
> > but clean.bat was not included. I propose to do so with the attached
> > patch.
>
>
> That looks a bit ugly. How about this (untested) instead?

It is WFM, so I am fine with it.

>
> > By the way, are pgbison.bat and pgflex.bat directly called anywhere?
>
> Not to my knowledge. One of the things that's annoying about them is
> that the processor names are hardcoded, so if you install winflexbison
> as I usually do you have to rename the executables (or rename the
> chocolatey shims) or the scripts won't work.
>
We could use those batches to get to the hardcoded name, but that would
probably be as annoying as renaming. If those batches serve no purpose
right now, it should be fine to remove them from the tree. I use the
executables from a MinGW installation, and they keep their actual name.

Regards,

Juan José Santamaría Flecha


Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

2022-01-18 Thread sergei sh.

Hi,

I've addressed Andrey Borodin's concerns about v2 of this patch by
Aliaksandr Kalenik in the attached version. Change list:

* Number of pages to collect moved to GUC parameter
  "gist_sorted_build_page_buffer_size".
* GistSortedBuildPageState type renamed to GistSortedBuildLevelState.
* Comments added.

Sorted build remains deterministic as long as the picksplit implementation
for a given opclass is, which seems to be true for built-in types, so
setting a random seed is not required for testing.

Andrey Borodin's GiST support patch for amcheck was used to verify the
built indexes:
https://commitfest.postgresql.org/25/1800/
PSA modified version working with current Postgres code (btree functions
removed).
diff --git a/contrib/pageinspect/expected/gist.out b/contrib/pageinspect/expected/gist.out
index 86c9e9caa9..8cb390feed 100644
--- a/contrib/pageinspect/expected/gist.out
+++ b/contrib/pageinspect/expected/gist.out
@@ -33,14 +33,15 @@ COMMIT;
 SELECT * FROM gist_page_items(get_raw_page('test_gist_idx', 0), 'test_gist_idx');
  itemoffset |   ctid| itemlen | dead |   keys
 +---+-+--+---
-  1 | (1,65535) |  40 | f| (p)=((166,166))
-  2 | (2,65535) |  40 | f| (p)=((332,332))
-  3 | (3,65535) |  40 | f| (p)=((498,498))
-  4 | (4,65535) |  40 | f| (p)=((664,664))
-  5 | (5,65535) |  40 | f| (p)=((830,830))
-  6 | (6,65535) |  40 | f| (p)=((996,996))
-  7 | (7,65535) |  40 | f| (p)=((1000,1000))
-(7 rows)
+  1 | (1,65535) |  40 | f| (p)=((125,125))
+  2 | (2,65535) |  40 | f| (p)=((250,250))
+  3 | (3,65535) |  40 | f| (p)=((375,375))
+  4 | (4,65535) |  40 | f| (p)=((500,500))
+  5 | (5,65535) |  40 | f| (p)=((625,625))
+  6 | (6,65535) |  40 | f| (p)=((750,750))
+  7 | (7,65535) |  40 | f| (p)=((875,875))
+  8 | (8,65535) |  40 | f| (p)=((1000,1000))
+(8 rows)
 
 SELECT * FROM gist_page_items(get_raw_page('test_gist_idx', 1), 'test_gist_idx') LIMIT 5;
  itemoffset | ctid  | itemlen | dead |keys 
@@ -64,6 +65,7 @@ SELECT itemoffset, ctid, itemlen FROM gist_page_items_bytea(get_raw_page('test_g
   5 | (5,65535) |  40
   6 | (6,65535) |  40
   7 | (7,65535) |  40
-(7 rows)
+  8 | (8,65535) |  40
+(8 rows)
 
 DROP TABLE test_gist;
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 9854116fca..239fd08495 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -59,6 +59,11 @@
  */
 #define BUFFERING_MODE_TUPLE_SIZE_STATS_TARGET 4096
 
+/**
+ * Number of pages to collect during sorted build before applying split operation
+ */
+int GistSortedBuildPageBufferSize = 8;
+
 /*
  * Strategy used to build the index. It can change between the
  * GIST_BUFFERING_* modes on the fly, but if the Sorted method is used,
@@ -115,14 +120,34 @@ typedef struct
 
 /*
  * In sorted build, we use a stack of these structs, one for each level,
- * to hold an in-memory buffer of the rightmost page at the level. When the
- * page fills up, it is written out and a new page is allocated.
+ * to hold an in-memory buffer of last pages at the level.
+ *
+ * Sorting GiST build requires good linearization of the sort opclass. This is
+ * not always the case in multidimensional data. To fight the anomalies, we
+ * buffer index tuples and apply picksplit that can be multidimension-aware.
  */
-typedef struct GistSortedBuildPageState
+typedef struct GistSortedBuildLevelState
 {
-	Page		page;
-	struct GistSortedBuildPageState *parent;	/* Upper level, if any */
-} GistSortedBuildPageState;
+	int current_page;
+	BlockNumber last_blkno;
+	struct GistSortedBuildLevelState *parent;	/* Upper level, if any */
+	uint16 item_num_total;
+	Size page_max_num;
+	Page pages[FLEXIBLE_ARRAY_MEMBER];
+} GistSortedBuildLevelState;
+
+/*
+ * Max. number of items to apply gistSplit to is limited by OffsetNumber type
+ * used in GIST_SPLITVEC.
+ */
+#define GistSortedBuildLevelStateIsMaxItemNum(levelstate) \
+	(levelstate->item_num_total + 1 == PG_UINT16_MAX)
+
+#define GistSortedBuildLevelStateRequiredSize(page_max_num) \
+( \
+	AssertMacro(page_max_num > 0), \
+	(offsetof(GistSortedBuildLevelState, pages) + page_max_num * sizeof(Page)) \
+)
 
 /* prototypes for private functions */
 
@@ -130,11 +155,11 @@ static void gistSortedBuildCallback(Relation index, ItemPointer tid,
 	Datum *values, bool *isnull,
 	bool tupleIsAlive, void *state);
 static void gist_indexsortbuild(GISTBuildState *state);
-static void gist_indexsortbuild_pagestate_add(GISTBuildState *state,
-			  GistSortedBuildPageState *pagestate,
+static void gist_indexsortbuild_levelstate_add(GISTBuildState *state,
+			  GistSortedBuildLevelState *levelst

Re: Allow root ownership of client certificate key

2022-01-18 Thread Tom Lane
David Steele  writes:
> [ client-key-perm-002.patch ]

I took a quick look at this and agree with the proposed behavior
change, but also with your self-criticisms:

> We may want to do the same on the server side to make the code blocks 
> look more similar.
>
> Also, on the server side the S_ISREG() check gets its own error and that 
> might be a good idea on the client side as well. As it is, the error 
> message on the client is going to be pretty confusing in this case.

Particularly, I think the S_ISREG check should happen before any
ownership/permissions checks; it just seems saner that way.

The only other nitpick I have is that I'd make the cross-references be
to the two file names, ie like "Note that similar checks are performed
in fe-secure-openssl.c ..."  References to the specific functions seem
likely to bit-rot in the face of future code rearrangements.
I suppose filename references could become obsolete too, but it
seems less likely.

regards, tom lane




Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

2022-01-18 Thread Komяpa
Hello hackers,

On Tue, Jan 18, 2022 at 11:26 PM sergei sh.  wrote:

> Hi,
>
> I've addressed Andrey Borodin's concerns about v2 of this patch by
> Aliaksandr
> Kalenik in attached version.
>

[snip]

This patchset got some attention in the PostGIS development channel, as it
is important to really enable the fast GiST build there for the end user.
The reviews are positive, it saves build time and performs as well as
original non-sorting build on tested workloads.

Test by Giuseppe Broccolo:
https://lists.osgeo.org/pipermail/postgis-devel/2022-January/029330.html

Test by Regina Obe:
https://lists.osgeo.org/pipermail/postgis-devel/2022-January/029335.html

I believe this patch is an important addition to Postgres 15 and will make
a lot of GIS users happier.

Thanks,
Darafei.


Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 1:51 PM Robert Haas  wrote:
> Modify pg_basebackup to use a new COPY subprotocol for base backups.

Andres pointed out to me that longfin is sad:

2022-01-18 14:52:35.484 EST [82470:4] LOG:  server process (PID 82487)
was terminated by signal 4: Illegal instruction: 4
2022-01-18 14:52:35.484 EST [82470:5] DETAIL:  Failed process was
running: BASE_BACKUP ( LABEL 'pg_basebackup base backup',  PROGRESS,
CHECKPOINT 'fast',  MANIFEST 'yes',  TARGET 'client')

Unfortunately, I can't reproduce this locally, even with COPT=-Wall
-Werror -fno-omit-frame-pointer -fsanitize-trap=alignment
-Wno-deprecated-declarations -DWRITE_READ_PARSE_PLAN_TREES
-DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.

Tom, any chance you can get a stack trace?

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding CI to our tree

2022-01-18 Thread Justin Pryzby
On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> I think it might still be worth adding stopgap way of running all tap tests on
> windows though. Having a vcregress.pl function to find all directories with t/
> and run the tests there, shouldn't be a lot of code...

I started doing that, however it makes CI/windows even slower.  I think it'll
be necessary to run prove with all the tap tests to parallelize them, rather
than looping around directories, many of which have only a single file, and are
run serially.

https://github.com/justinpryzby/postgres/commits/citest-cirrus
This has the link to a recent cirrus result if you'd want to look.
I suppose I should start a new thread.  

There's a bunch of other stuff for cirrus in there (and bits and pieces of
other branches).

 .  cirrus: spell ccache_maxsize
 .  cirrus: avoid unnecessary double star **
 .  vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
 .  vcregress: style
 .  wip: vcsregress: add alltaptests
 .  wip: run upgrade tests with data-checksums
 .  pg_regress --initdb-opts
 .  wip: pg_upgrade: show list of files copied only in vebose mode
 .  wip: cirrus: include hints how to install OS packages..
 .  wip: cirrus: code coverage
 .  cirrus: upload html docs as artifacts
 .  wip: cirrus/windows: save compiled build as an artifact




Re: [PATCH] allow src/tools/msvc/clean.bat script to be called from the root of the source tree

2022-01-18 Thread Andrew Dunstan


On 1/18/22 15:08, Juan José Santamaría Flecha wrote:
>
>
>
> > By the way, are pgbison.bat and pgflex.bat directly called anywhere?
>
> Not to my knowledge. One of the things that's annoying about them is
> that the processor names are hardcoded, so if you install winflexbison
> as I usually do you have to rename the executables (or rename the
> chocolatey shims) or the scripts won't work.
>
> We could use those batches to get to the hardcoded name, but that
> would probably be as annoying as renaming. If those batches serve no
> purpose right now, it should be fine to remove them from the tree. I
> use the executables from a MinGW installation, and they keep their
> actual name.
>

OK, but I don't think we should require installation of MinGW for an
MSVC build.


cheers


andrew

--

Andrew Dunstan

EDB: https://www.enterprisedb.com





Re: refactoring basebackup.c

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 9:43 AM Jeevan Ladhe
 wrote:
> The patch surely needs some grooming, but I am expecting some initial
> review, specially in the area where we are trying to close the zstd stream
> in bbsink_zstd_end_archive(). We need to tell the zstd library to end the
> compression by calling ZSTD_compressStream2() thereby sending a
> ZSTD_e_end flag. But, this also needs some input string, which per
> example[1] line # 686, I have taken as an empty ZSTD_inBuffer.

As far as I can see, this is correct. I found
https://zstd.docsforge.com/dev/api-documentation/#streaming-compression-howto
which seems to endorse what you've done here.

One (minor) thing that I notice is that, the way you've written the
loop in bbsink_zstd_end_archive(), I think it will typically call
bbsink_archive_contents() twice. It will flush whatever is already
present in the next sink's buffer as a result of the previous calls to
bbsink_zstd_archive_contents(), and then it will call
ZSTD_compressStream2() which will partially refill the buffer you just
emptied, and then there will be nothing left in the internal buffer,
so it will call bbsink_archive_contents() again. But ... the initial
flush may not have been necessary. It could be that there was enough
space already in the output buffer for the ZSTD_compressStream2() call
to succeed without a prior flush. So maybe:

do
{
yet_to_flush = ZSTD_compressStream2(..., ZSTD_e_end);
check ZSTD_isError here;
if (mysink->zstd_outBuf.pos > 0)
bbsink_archive_contents();
} while (yet_to_flush > 0);

I believe this might be very slightly more efficient.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-17 18:34:16 -0300, Alvaro Herrera wrote:
> Maybe it would work to have a single LSN in shared memory, as an atomic
> variable, which uses monotonic advance[1] to be updated.

That could be a reasonable approach.


> Whether this is updated or not would depend on a new GUC, maybe
> track_latest_commit_lsn.  Causing performance pain during transaction commit
> is not great, but at least this way it shouldn't be *too* large a hit.

What kind of consistency are we expecting from this new bit of information?
Does it have to be perfectly aligned with visibility? If so, it'd need to
happen in ProcArrayEndTransaction(), with ProcArrayLock held - which I'd
consider a complete no-go, that's way too contended.

If it's "just" another piece of work happening "sometime around" transaction
commit, it'd be a bit less concerning.


I wonder if a very different approach could make sense here. Presumably this
wouldn't need to be queried at a very high frequency, right? If so, what about
storing the latest commit LSN for each backend in PGPROC? That could be
maintained without a lock/atomics, and should be just about free.
pg_last_committed_xact() then would have to iterate over all PGPROCs to
compute the LSN, but that's not too bad for an operation like that. We'd also
need to maintain a value for all disconnected backends, but that's also not a
hot path.

Greetings,

Andres Freund




Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Tom Lane
Robert Haas  writes:
> Andres pointed out to me that longfin is sad:

> 2022-01-18 14:52:35.484 EST [82470:4] LOG:  server process (PID 82487)
> was terminated by signal 4: Illegal instruction: 4

> Tom, any chance you can get a stack trace?

Hmm, I'd assumed that was just a cosmic ray or something.
I'll check if it reproduces, though.

regards, tom lane




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 4:32 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-01-17 18:34:16 -0300, Alvaro Herrera wrote:
> > Maybe it would work to have a single LSN in shared memory, as an atomic
> > variable, which uses monotonic advance[1] to be updated.
>
> That could be a reasonable approach.
>
>
> > Whether this is updated or not would depend on a new GUC, maybe
> > track_latest_commit_lsn.  Causing performance pain during transaction commit
> > is not great, but at least this way it shouldn't be *too* large a hit.
>
> What kind of consistency are we expecting from this new bit of information?
> Does it have to be perfectly aligned with visibility? If so, it'd need to
> happen in ProcArrayEndTransaction(), with ProcArrayLock held - which I'd
> consider a complete no-go, that's way too contended.

My use case wouldn't require perfect alignment with visibility (I'm
not sure about the use case Alvaro mentioned in pglogical).

> If it's "just" another piece of work happening "sometime around" transaction
> commit, it'd be a bit less concerning.

That raises the interesting question of where the existing commit_ts
infrastructure and last commit caching falls into that range.

> I wonder if a very different approach could make sense here. Presumably this
> wouldn't need to be queried at a very high frequency, right? If so, what about
> storing the latest commit LSN for each backend in PGPROC? That could be
> maintained without a lock/atomics, and should be just about free.
> pg_last_committed_xact() then would have to iterate over all PGPROCs to
> compute the LSN, but that's not too bad for an operation like that. We'd also
> need to maintain a value for all disconnected backends, but that's also not a
> hot path.

I expect most monitoring setups default to around something like
checking anywhere from every single digit seconds to minutes.

If I read between the lines I imagine you'd see even e.g. every 2s as
not that big of a deal here, right?

Thanks,
James Coleman




Re: slowest tap tests - split or accelerate?

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 12:49:16 -0500, Robert Haas wrote:
> Here's a patch that splits up that file.

Ah, nice! The split seems sensible to me.


> Sadly, we've gained about 2.5 seconds of runtime as the price of
> splitting the test. Arguably the options part could be split up a lot
> more finely than this, but that would drive up the runtime even more,
> basically because we'd need more initdbs. So I don't know whether it's
> better to leave things as they are, split them this much, or split
> them more. I think this amount of splitting might be justified simply
> in the interests of clarity, but I'm reluctant to go further unless we
> get some nifty initdb-caching system.

Hm. From the buildfarm / CF perspective it might still be a win, because the
different pieces can run concurrently. But it's not great :(.

Maybe we really should do at least the most simplistic caching for initdbs, by
doing one initdb as part of the creation of temp_install. Then Cluster::init
would need logic to only use that if $params{extra} is empty.

Greetings,

Andres Freund




Re: Allow root ownership of client certificate key

2022-01-18 Thread David Steele

On 1/18/22 15:41, Tom Lane wrote:

David Steele  writes:

I took a quick look at this and agree with the proposed behavior
change, but also with your self-criticisms:


We may want to do the same on the server side to make the code blocks
look more similar.

Also, on the server side the S_ISREG() check gets its own error and that
might be a good idea on the client side as well. As it is, the error
message on the client is going to be pretty confusing in this case.


Particularly, I think the S_ISREG check should happen before any
ownership/permissions checks; it just seems saner that way.


I was worried about doing too much refactoring in this commit since I 
have hopes and dreams of it being back-patched. But I'll go ahead and do 
that and if any part of this can be back-patched we'll consider that 
separately.



The only other nitpick I have is that I'd make the cross-references be
to the two file names, ie like "Note that similar checks are performed
in fe-secure-openssl.c ..."  References to the specific functions seem
likely to bit-rot in the face of future code rearrangements.
I suppose filename references could become obsolete too, but it
seems less likely.


It's true that functions are more likely to be renamed, but when I 
rename a function I then search for all the places where it is used so I 
can update them. If the function name appears in a comment that gets 
updated as well.


If you would still prefer filenames I have no strong argument against 
that, just wanted to explain my logic.


Regards,
-David




Re: Allow root ownership of client certificate key

2022-01-18 Thread Tom Lane
David Steele  writes:
> On 1/18/22 15:41, Tom Lane wrote:
>> The only other nitpick I have is that I'd make the cross-references be
>> to the two file names, ie like "Note that similar checks are performed
>> in fe-secure-openssl.c ..."  References to the specific functions seem
>> likely to bit-rot in the face of future code rearrangements.
>> I suppose filename references could become obsolete too, but it
>> seems less likely.

> It's true that functions are more likely to be renamed, but when I 
> rename a function I then search for all the places where it is used so I 
> can update them. If the function name appears in a comment that gets 
> updated as well.

Harsh experience says that a lot of Postgres contributors have zero
interest in updating comments two lines away from what they're editing,
let alone in some distant branch of the source tree.  But I'm not dead
set on it either way.

regards, tom lane




Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 4:36 PM Tom Lane  wrote:
> Robert Haas  writes:
> > Andres pointed out to me that longfin is sad:
>
> > 2022-01-18 14:52:35.484 EST [82470:4] LOG:  server process (PID 82487)
> > was terminated by signal 4: Illegal instruction: 4
>
> > Tom, any chance you can get a stack trace?
>
> Hmm, I'd assumed that was just a cosmic ray or something.
> I'll check if it reproduces, though.

Thomas pointed out to me that thorntail also failed, and that it
included a backtrace. Unfortunately it's somewhat confusing. The
innermost frame is:

#0  0x016319a4 in bbsink_archive_contents (len=, sink=) at
/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/replication/basebackup.c:1672
1672 return true;

Line 1672 of basebackup.c is indeed "return true" but we're inside of
sendFile(), not bbsink_archive_contents(). However,
bbsink_archive_contents() is an inline function so maybe the failure
is misattributed. I wonder whether the "sink" pointer in that function
is somehow not valid ... but I don't know how that would happen, or
why it would happen only on this machine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Tom Lane
I wrote:
>> Tom, any chance you can get a stack trace?

> Hmm, I'd assumed that was just a cosmic ray or something.

My mistake: it's failing because of -fsanitize=alignment.

Here's the stack trace:

  * frame #0: 0x00010885dfd0 postgres`sendFile(sink=0x7fdedf071cb0, 
readfilename="./global/4178", tarfilename="global/4178", 
statbuf=0x7ffee77dfaf8, missing_ok=true, dboid=0, 
manifest=0x7ffee77e2780, spcoid=0x) at basebackup.c:1552:10
frame #1: 0x00010885cb7f postgres`sendDir(sink=0x7fdedf071cb0, 
path="./global", basepathlen=1, sizeonly=false, tablespaces=0x7fdedf072718, 
sendtblspclinks=true, manifest=0x7ffee77e2780, spcoid=0x) 
at basebackup.c:1354:12
frame #2: 0x00010885ca6b postgres`sendDir(sink=0x7fdedf071cb0, 
path=".", basepathlen=1, sizeonly=false, tablespaces=0x7fdedf072718, 
sendtblspclinks=true, manifest=0x7ffee77e2780, spcoid=0x) 
at basebackup.c:1346:13
frame #3: 0x0001088595be 
postgres`perform_base_backup(opt=0x7ffee77e2e68, sink=0x7fdedf071cb0) 
at basebackup.c:352:5
frame #4: 0x000108856b0b 
postgres`SendBaseBackup(cmd=0x7fdedf05b510) at basebackup.c:932:3
frame #5: 0x0001088711c8 
postgres`exec_replication_command(cmd_string="BASE_BACKUP ( LABEL 
'pg_basebackup base backup',  PROGRESS,  CHECKPOINT 'fast',  MANIFEST 'yes',  
TARGET 'client')") at walsender.c:1734:4 [opt]
frame #6: 0x0001088dd61e postgres`PostgresMain(dbname=, 
username=) at postgres.c:4494:12 [opt]

It failed at

-> 1552 if (!PageIsNew(page) && 
PageGetLSN(page) < sink->bbs_state->startptr)

and the problem is evidently that the page pointer isn't nicely aligned:

(lldb) p page
(char *) $4 = 0x7fdeded7e041 ""

(I checked the "sink" data structure too for luck, but it seems fine.)

I see that thorntail has now also fallen over, presumably for
the same reason.

regards, tom lane




Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Tom Lane
Robert Haas  writes:
> Unfortunately, I can't reproduce this locally, even with COPT=-Wall
> -Werror -fno-omit-frame-pointer -fsanitize-trap=alignment
> -Wno-deprecated-declarations -DWRITE_READ_PARSE_PLAN_TREES
> -DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.

Now that I re-read what you did, I believe you need both of

-fsanitize=alignment -fsanitize-trap=alignment

to enable those traps to happen.  That seems to be the case with
Apple's clang, anyway.

regards, tom lane




Re: a misbehavior of partition row movement (?)

2022-01-18 Thread Alvaro Herrera
On 2022-Jan-18, Alvaro Herrera wrote:

> On 2022-Jan-18, Amit Langote wrote:
> 
> > Would you like me to update the patch with the above renumbering or
> > are you working on a new version yourself?
> 
> I have a few very minor changes apart from that.  Let me see if I can
> get this pushed soon; if I'm unable to (there are parts I don't fully
> grok yet), I'll post what I have.

Here's v13, a series with your patch as 0001 and a few changes from me;
the bulk is just pgindent, and there are a few stylistic changes and an
unrelated typo fix and I added a couple of comments to your new code.

I don't like the API change to ExecInsert().  You're adding two output
arguments:
- the tuple being inserted.  But surely you have this already, because
it's in the 'slot' argument you passed in. ExecInsert is even careful to
set the ->tts_tableOid argument there.  So ExecInsert's caller doesn't
need to receive the inserted tuple as an argument, it can just read
'slot'.
- the relation to which the tuple was inserted.  Can't this be obtained
by looking at slot->tts_tableOid?  We should be able to use
ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
that this is wasteful, but I think this is not a common case anyway and
it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
of its other calls).

I think the argument definition of ExecCrossPartitionUpdateForeignKey is
a bit messy.  I propose to move mtstate,estate as two first arguments;
IIRC the whole executor does it that way.

AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
mtstate->operation -- why doesn't it look at 'event' instead?) and later
it determines new_event.ate_flags.  Why can't it use
maybe_crosspart_update to simplify part of that?  Or maybe the other way
around, not sure.  It looks like something could be made simpler there.

Overall, the idea embodied in the patch looks sensible to me.

Thank you,

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Nunca confiaré en un traidor.  Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)
>From 3f3dbdcf1228dd78285c4efcb4fc64f732408270 Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Mon, 11 Oct 2021 14:57:19 +0900
Subject: [PATCH v13 1/2] Enforce foreign key correctly during cross-partition
 updates

When an update on a partitioned table referenced in foreign keys
constraint causes a row to move from one partition to another, which
is implemented by deleting the old row from the source leaf partition
followed by inserting the new row into the destination leaf partition,
firing the foreign key triggers on that delete event can result in
surprising outcomes for those keys.  For example, a given foreign
key's delete trigger which implements the ON DELETE CASCADE clause of
that key will delete any referencing rows when triggered for that
internal DELETE, although it should not, because the referenced row
is simply being moved from one partition of the referenced root
partitioned table into another, not being deleted from it.

This commit teaches trigger.c to skip queuing such delete trigger
events on the leaf partitions in favor of an UPDATE event fired on
the root target relation.  Doing so makes sense because both the
old and the new tuple "logically" belong to the root relation.

The after trigger event queuing interface now allows passing the
source and the destination partitions of a particular cross-partition
update when registering the update event for the root partitioned
table.  Along with the 2 ctids of the old and the new tuple, an after
trigger event now also stores the OIDs of those partitions. The tuples
fetched from the source and the destination partitions are converted
into the root table format before they are passed to the trigger
function.

The implementation currently has a limitation that only the foreign
keys pointing into the query's target relation are considered, not
those of its sub-partitioned partitions.  That seems like a
reasonable limitation, because it sounds rare to have distinct
foreign keys pointing into sub-partitioned partitions, but not into
the root table.
---
 doc/src/sgml/ref/update.sgml  |   7 +
 src/backend/commands/trigger.c| 322 +++---
 src/backend/executor/execMain.c   |  19 +-
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeModifyTable.c| 187 -
 src/backend/utils/adt/ri_triggers.c   |   6 +
 src/include/commands/trigger.h|   4 +
 src/include/executor/executor.h   |   3 +-
 src/include/nodes/execnodes.h |   3 +
 src/test/regress/expected/foreign_key.out | 204 +-
 src/test/regress/sql/foreign_key.sql  | 135 -
 11 files changed, 840 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..3ba13010e7 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/

Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Andres Freund
On 2022-01-18 16:40:25 -0500, James Coleman wrote:
> If I read between the lines I imagine you'd see even e.g. every 2s as
> not that big of a deal here, right?

Right. Even every 0.2s wouldn't be a problem.




Synchronizing slots from primary to standby

2022-01-18 Thread Hsu, John
Hello,

The doc mentions that standby_slot_names is to only be used for waiting on 
physical slots. However, it seems like we calculate the flush_pos for logical 
slots as well in wait_for_standby_confirmation?

Re-posting some of my previous comment since it seems like it got lost...

In wait_for_standby_confirmation, should we move standby_slot_names -> namelist 
into the while (true) block so if we have wrong values set we fix it with a 
SIGHUP? Similarly, in slotsync.c, we never update standby_slot_names once the 
worker is launched. 

If a logical slot was dropped on the writer, should the worker drop logical 
slots that it was previously synchronizing but are no longer present? Or should 
we leave that to the user to manage? I'm trying to think why users would want 
to sync logical slots to a reader but not have that be dropped if we're able to 
detect they're no longer present on the writer. Maybe if there was a use-case 
to set standby_slot_names one-at-a-time, you wouldn't want other logical slots 
to be dropped, but dropping sounds like reasonable behavior for '*'?

Is there a reason we're deciding to use one-worker syncing per database instead 
of one general worker that syncs across all the databases? I imagine I'm 
missing something obvious here.

As for how standby_slot_names should be configured, I'd prefer the flexibility 
similar to what we have for synchronous_standby_names since that seems the most 
analogous. It'd provide flexibility for failovers, which I imagine is the most 
common use-case. 

John H

On 1/3/22, 5:49 AM, "Peter Eisentraut"  
wrote:



Here is an updated patch to fix some build failures.  No feature changes.

On 14.12.21 23:12, Peter Eisentraut wrote:
> On 31.10.21 11:08, Peter Eisentraut wrote:
>> I want to reactivate $subject.  I took Petr Jelinek's patch from [0],
>> rebased it, added a bit of testing.  It basically works, but as
>> mentioned in [0], there are various issues to work out.
>>
>> The idea is that the standby runs a background worker to periodically
>> fetch replication slot information from the primary.  On failover, a
>> logical subscriber would then ideally find up-to-date replication
>> slots on the new publisher and can just continue normally.
>
>> So, again, this isn't anywhere near ready, but there is already a lot
>> here to gather feedback about how it works, how it should work, how to
>> configure it, and how it fits into an overall replication and HA
>> architecture.
>
> Here is an updated patch.  The main changes are that I added two
> configuration parameters.  The first, synchronize_slot_names, is set on
> the physical standby to specify which slots to sync from the primary. By
> default, it is empty.  (This also fixes the recovery test failures that
> I had to disable in the previous patch version.)  The second,
> standby_slot_names, is set on the primary.  It holds back logical
> replication until the listed physical standbys have caught up.  That
> way, when failover is necessary, the promoted standby is not behind the
> logical replication consumers.
>
> In principle, this works now, I think.  I haven't made much progress in
> creating more test cases for this; that's something that needs more
> attention.
>
> It's worth pondering what the configuration language for
> standby_slot_names should be.  Right now, it's just a list of slots that
> all need to be caught up.  More complicated setups are conceivable.
> Maybe you have standbys S1 and S2 that are potential failover targets
> for logical replication consumers L1 and L2, and also standbys S3 and S4
> that are potential failover targets for logical replication consumers L3
> and L4.  Viewed like that, this setting could be a replication slot
> setting.  The setting might also have some relationship with
> synchronous_standby_names.  Like, if you have synchronous_standby_names
> set, then that's a pretty good indication that you also want some or all
> of those standbys in standby_slot_names.  (But note that one is slots
> and one is application names.)  So there are a variety of possibilities.



Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 4:32 PM Andres Freund  wrote:
> I wonder if a very different approach could make sense here. Presumably this
> wouldn't need to be queried at a very high frequency, right? If so, what about
> storing the latest commit LSN for each backend in PGPROC? That could be
> maintained without a lock/atomics, and should be just about free.
> pg_last_committed_xact() then would have to iterate over all PGPROCs to
> compute the LSN, but that's not too bad for an operation like that. We'd also
> need to maintain a value for all disconnected backends, but that's also not a 
> hot
> path.

One other question on this: if we went with this would you expect a
new function to parallel pg_last_committed_xact()? Or allow the xid
and lsn in the return of pg_last_committed_xact() potentially not to
match (of course xid might also not be present if
track_commit_timestamps isn't on)? Or would you expect the current xid
and timestamp to use the new infrastructure also?

Thanks,
James Coleman
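The per-backend scheme Andres describes can be sketched roughly like this (Python standing in for the C/PGPROC shared-memory code; all names here are illustrative, not PostgreSQL's):

```python
import threading

class CommitLsnTracker:
    """Illustrative sketch: each backend writes only its own slot, so
    the commit path needs no lock or atomics; a reader takes the max
    across all slots plus the value retained for already-disconnected
    backends."""

    def __init__(self, max_backends):
        self.slots = [0] * max_backends  # latest commit LSN per backend
        self.disconnected = 0            # max LSN among exited backends
        self._lock = threading.Lock()    # backend exit is not a hot path

    def record_commit(self, backend_id, lsn):
        # Single writer per slot: cheap, no shared lock on commit.
        self.slots[backend_id] = lsn

    def on_backend_exit(self, backend_id):
        with self._lock:
            self.disconnected = max(self.disconnected,
                                    self.slots[backend_id])
            self.slots[backend_id] = 0

    def last_committed_lsn(self):
        # The pg_last_committed_xact()-style reader iterates all slots.
        return max(max(self.slots), self.disconnected)

t = CommitLsnTracker(4)
t.record_commit(0, 100)
t.record_commit(1, 250)
t.on_backend_exit(1)
print(t.last_committed_lsn())  # 250
```

The design choice this illustrates is trading a slightly more expensive (but rare) read for a free write path on every commit.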




Re: is ErrorResponse possible on Sync?

2022-01-18 Thread Rafi Shamim
I used your example and tried it with prepared statements. I captured the
traffic with
Wireshark. My client sent Bind/Execute/Sync messages, and PostgreSQL 14 sent
back BindComplete/CommandComplete/ErrorResponse messages, followed by
ReadyForQuery.

So yes, it looks like ErrorResponse is a valid response for Sync.

On Tue, Jan 18, 2022 at 6:11 PM Alvaro Herrera 
wrote:

> On 2022-Jan-12, Andrei Matei wrote:
>
> > If Sync itself cannot fail, then what is this
> > sentence really saying: "This parameterless message (ed. Sync) causes the
> > backend to close the current transaction if it's not inside a
> BEGIN/COMMIT
> > transaction block (“close” meaning to commit if no error, or roll back if
> > error)." ?
> > This seems to say that, outside of BEGIN/END, the transaction is
> committed
> > at Sync time (i.e. if the Sync is never sent, nothing is committed).
> > Presumably, committing a transaction can fail even if no
> > previous command/statement failed, right?
>
> A deferred trigger can cause a failure at COMMIT time for which no
> previous error was reported.
>
> alvherre=# create table t (a int unique deferrable initially deferred);
> CREATE TABLE
> alvherre=# insert into t values (1);
> INSERT 0 1
> alvherre=# begin;
> BEGIN
> alvherre=*# insert into t values (1);
> INSERT 0 1
> alvherre=*# commit;
> ERROR:  duplicate key value violates unique constraint "t_a_key"
> DETALLE:  Key (a)=(1) already exists.
>
> I'm not sure if you can cause this to explode with just a Sync message,
> though.
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/
>
>
>
>
>


Re: Extend compatibility of PostgreSQL::Test::Cluster

2022-01-18 Thread Andrew Dunstan

On 12/31/21 11:22, Andrew Dunstan wrote:
> On 12/31/21 11:20, Dagfinn Ilmari Mannsåker wrote:
>> Andrew Dunstan  writes:
>>
>>> +   my $subclass = __PACKAGE__ . "::V_$maj";
>>> +   bless $node, $subclass;
>>> +   unless ($node->isa(__PACKAGE__))
>>> +   {
>>> +   # It's not a subclass, so re-bless back into the main 
>>> package
>>> +   bless($node, __PACKAGE__);
>>> +   carp "PostgreSQL::Test::Cluster isn't fully compatible 
>>> with version $ver";
>>> +   }
>> The ->isa() method works on package names as well as blessed objects, so
>> the back-and-forth blessing can be avoided.
>>
>>  my $subclass = __PACKAGE__ . "::V_$maj";
>>  if ($subclass->isa(__PACKAGE__))
>>  {
>>  bless($node, $subclass);
>>  }
>>  else
>>  {
>>  carp "PostgreSQL::Test::Cluster isn't fully compatible with 
>> version $ver";
>>  }
>>
> OK, thanks, will fix in next version.
>
>

Here's a version that does that and removes some recent bitrot.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db13..a5e3d2f159 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -111,6 +111,10 @@ use Scalar::Util qw(blessed);
 our ($use_tcp, $test_localhost, $test_pghost, $last_host_assigned,
 	$last_port_assigned, @all_nodes, $died);
 
+# the minimum version we believe to be compatible with this package without
+# subclassing.
+our $min_compat = 12;
+
 INIT
 {
 
@@ -1018,7 +1022,7 @@ sub enable_streaming
 
 	print "### Enabling streaming replication for node \"$name\"\n";
 	$self->append_conf(
-		'postgresql.conf', qq(
+		$self->_recovery_file, qq(
 primary_conninfo='$root_connstr'
 ));
 	$self->set_standby_mode();
@@ -1047,7 +1051,7 @@ sub enable_restoring
 	  : qq{cp "$path/%f" "%p"};
 
 	$self->append_conf(
-		'postgresql.conf', qq(
+		$self->_recovery_file, qq(
 restore_command = '$copy_command'
 ));
 	if ($standby)
@@ -1061,6 +1065,8 @@ restore_command = '$copy_command'
 	return;
 }
 
+sub _recovery_file { return "postgresql.conf"; }
+
 =pod
 
 =item $node->set_recovery_mode()
@@ -1246,15 +1252,29 @@ sub new
 
 	$node->dump_info;
 
-	# Add node to list of nodes
-	push(@all_nodes, $node);
-
 	$node->_set_pg_version;
 
-	my $v = $node->{_pg_version};
+	my $ver = $node->{_pg_version};
 
-	carp("PostgreSQL::Test::Cluster isn't fully compatible with version " . $v)
-	  if $v < 12;
+	# Use a subclass as defined below (or elsewhere) if this version
+	# isn't fully compatible. Warn if the version is too old and thus we don't
+	# have a subclass of this class.
+	if (ref $ver && $ver < $min_compat)
+{
+		my $maj  = $ver->major(separator => '_');
+		my $subclass = $class . "::V_$maj";
+		if ($subclass->isa($class))
+		{
+			bless $node, $subclass;
+		}
+		else
+		{
+			carp "PostgreSQL::Test::Cluster isn't fully compatible with version $ver";
+		}
+}
+
+	# Add node to list of nodes
+	push(@all_nodes, $node);
 
 	return $node;
 }
@@ -2546,8 +2566,12 @@ sub wait_for_catchup
 	  . "_lsn to pass "
 	  . $target_lsn . " on "
 	  . $self->name . "\n";
+	# old versions of walreceiver just set the application name to
+	# `walreceiver'
 	my $query =
-	  qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	  qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
+ FROM pg_catalog.pg_stat_replication
+ WHERE application_name in ('$standby_name', 'walreceiver');];
 	$self->poll_query_until('postgres', $query)
 	  or croak "timed out waiting for catchup";
 	print "done\n";
@@ -2807,4 +2831,41 @@ sub pg_recvlogical_upto
 
 =cut
 
+##
+
+package PostgreSQL::Test::Cluster::V_11; ## no critic (ProhibitMultiplePackages)
+
+use parent -norequire, qw(PostgreSQL::Test::Cluster);
+
+# https://www.postgresql.org/docs/11/release-11.html
+
+# max_wal_senders + superuser_reserved_connections must be < max_connections
+# uses recovery.conf
+
+sub _recovery_file { return "recovery.conf"; }
+
+sub set_standby_mode
+{
+my $self = shift;
+$self->append_conf("recovery.conf", "standby_mode = on\n");
+}
+
+sub init
+{
+my ($self, %params) = @_;
+$self->SUPER::init(%params);
+$self->adjust_conf('postgresql.conf', 'max_wal_senders',
+  $params{allows_streaming} ? 5 : 0);
+}
+
+##
+
+package PostgreSQL::Test::Cluster::V_10; ## no critic (ProhibitMultiplePackages)
+
+use parent -norequire, qw(PostgreSQL::Test::Cluster::V_11);
+
+# https://www.postgresql.org/docs/10/release-10.html
+
+
+
 1;


tab-complete COMPLETE_WITH_ATTR can become confused by table-lists.

2022-01-18 Thread Peter Smith
Hi.

I stumbled onto a small quirk/bug in the tab-complete code.

There are some places that suggest tab completions using the current
table columns. These are coded like:
COMPLETE_WITH_ATTR(prev2_wd, "");

The assumption is that the prev2_wd represents the table to select from.

Normally, this works fine. However, there are also cases where a
table-list can be specified (not just a single table) and in this
scenario, the 'prev2_wd' can sometimes become confused about which
table name to use.

e.g.

If there are spaces in the table-list like "t1, t2" then the word is
recognized as "t2" and it works as expected.

But, if there are no spaces in the table-list like "t1,t2" then the
word is recognized as "t1,t2", and since there is no such table name
the COMPLETE_WITH_ATTR does nothing.

~~

Examples (press Tab after the "(")

// setup

test=# create table t1(a int, b int, c int);
test=# create table t2(d int, e int, f int);

// test single table --> OK

test=# analyze t1 (
a b c
test=# analyze t2 (
d e f

// test table-list with spaces --> OK

test=# analyze t1, t2 (
d e f
test=# analyze t2, t1 (
a b c

// test table-list without spaces --> does not work

test=# analyze t2,t1 (

~~

I found that this is easily fixed just by adding a comma to the
WORD_BREAKS. Then words all get tokenized properly and so 'prev2_wd'
is what you'd like it to be.

 /* word break characters */
-#define WORD_BREAKS "\t\n@$><=;|&{() "
+#define WORD_BREAKS "\t\n,@$><=;|&{() "
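The effect of the extra break character can be illustrated with a tiny tokenizer sketch (Python standing in for readline's word splitting; this is a simplification, not the actual tab-complete.c logic):

```python
import re

WORD_BREAKS = "\t\n@$><=;|&{() "         # current psql set
WORD_BREAKS_FIXED = "\t\n,@$><=;|&{() "  # with ',' added

def words(line, breaks):
    # Split on any break character and drop empty tokens, roughly
    # what readline's word splitting produces for tab-complete.
    return [w for w in re.split("[" + re.escape(breaks) + "]", line) if w]

# With the current set, "t2,t1" stays one token, so COMPLETE_WITH_ATTR
# looks up attributes of a table literally named "t2,t1" and finds none.
print(words("analyze t2,t1 (", WORD_BREAKS))        # ['analyze', 't2,t1']
print(words("analyze t2,t1 (", WORD_BREAKS_FIXED))  # ['analyze', 't2', 't1']
```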

OTOH, this seemed a pretty fundamental change to the 12-year-old (!!)
code so I don't know if it may be too risky and/or could adversely
affect something else?

The tests are all still passing, but there aren't so many tab-complete
tests anyway so that might not mean much.

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-18 Thread Tom Lane
I wrote:
> Are we sure it's an issue within Python, rather than something we
> could dodge by invoking sysconfig differently?  It's hard to believe
> that sysconfig could be totally unfit for the purpose of finding out
> the include path and would remain so for multiple years.

I dug up a Debian 9 image and found that I could reproduce the problem
against its python2 (2.7.13) installation, but not its python3 (3.5.3):

$ python2 -m sysconfig | grep include
include = "/usr/local/include/python2.7"
platinclude = "/usr/local/include/python2.7"
...
$ python3 -m sysconfig | grep include
include = "/usr/include/python3.5m"
platinclude = "/usr/include/python3.5m"
...

Looking at the buildfarm animals that failed this way, 10 out of 11
are using python 2.x.  The lone exception is Andrew's prion.  I wonder
if there is something unusual about its python3 installation.

Anyway, based on these results, we might have better luck switching to
sysconfig after we start forcing python3.  I'm tempted to resurrect the
idea of changing configure's probe order to "python3 python python2"
in the meantime, just so we can see how much of the buildfarm is ready
for that.

regards, tom lane
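For reference, the programmatic equivalent of the `python -m sysconfig | grep include` probe above is `sysconfig.get_path()` (a minimal sketch; whether the returned paths are actually correct on a given installation is exactly the question at issue in this thread):

```python
import sysconfig

# The named paths correspond to the "include" and "platinclude" keys
# shown in the `python -m sysconfig` output above.
inc = sysconfig.get_path('include')
plat = sysconfig.get_path('platinclude')
print(inc)
print(plat)
```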




Re: In-placre persistance change of a relation

2022-01-18 Thread Kyotaro Horiguchi
At Tue, 18 Jan 2022 10:37:53 -0500, Tom Lane  wrote in 
> Julien Rouhaud  writes:
> > The cfbot is failing on all OS with this version of the patch.  Apparently
> > v16-0002 introduces some usage of "testtablespace" client-side variable 
> > that's
> > never defined, e.g.
> 
> That test infrastructure got rearranged very recently, see d6d317dbf.

Thanks to both. It seems that even though I know about the change, I
forgot to make my repo up to date before checking.

The attached v17 changes only the following point (as well as the
corresponding "expected" file).

-+CREATE TABLESPACE regress_tablespace LOCATION :'testtablespace';
++CREATE TABLESPACE regress_tablespace LOCATION '';

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From c227842521de00d5da9dffb2f2dd86e8c1c171a8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v17 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it performs a heap rewrite, which causes a large
amount of file I/O.  This patch makes the command run without the heap
rewrite.  In addition, SET LOGGED while wal_level > minimal emits WAL
using XLOG_FPI instead of a massive number of HEAP_INSERTs, which
should be smaller.

Also, this allows for the cleanup of files left behind by a crash of
the transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  52 ++
 src/backend/access/transam/README |   8 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlog.c |  17 +
 src/backend/catalog/storage.c | 545 +-
 src/backend/commands/tablecmds.c  | 266 +++--
 src/backend/replication/basebackup.c  |   3 +-
 src/backend/storage/buffer/bufmgr.c   |  88 +++
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 344 +++
 src/backend/storage/smgr/md.c |  94 ++-
 src/backend/storage/smgr/smgr.c   |  32 +
 src/backend/storage/sync/sync.c   |  20 +-
 src/bin/pg_rewind/parsexlog.c |  24 +
 src/common/relpath.c  |  47 +-
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  42 +-
 src/include/common/relpath.h  |   9 +-
 src/include/storage/bufmgr.h  |   2 +
 src/include/storage/fd.h  |   1 +
 src/include/storage/md.h  |   8 +-
 src/include/storage/reinit.h  |  10 +-
 src/include/storage/smgr.h|  17 +
 src/test/recovery/t/027_persistence_change.pl | 263 +
 24 files changed, 1724 insertions(+), 182 deletions(-)
 create mode 100644 src/test/recovery/t/027_persistence_change.pl

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 7547813254..2c674e5de0 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+		   xlrec->rnode.spcNode,
+		   xlrec->rnode.relNode,
+		   InvalidBackendId,
+		   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+action = "CREATE";
+break;
+			case XLOG_SMGR_MARK_UNLINK:
+action = "DELETE";
+break;
+			default:
+action = "";
+break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d", xlrec->persistence);
+		pfree(path);
+	}
 }
 
 const char *
@@ -55,6 +98,15 @@ smgr_identify(uint8 info)
 		case XLOG_SMGR_TRUNCATE:
 			id = "TRUNCATE";
 			break;
+		case XLOG_SMGR_UNLINK:
+			id = "UNLINK";
+			break;
+		case XLOG_SMGR_MARK:
+			id = "MARK";
+			break;
+		case XLOG_SMGR_BUFPERSISTENCE:
+			id = "BUFPERSISTENCE";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 1edc8180c1..b344bbe511 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -724,6 +724,14 @@ we must panic and abort recovery.  The DBA will have to manually clean up
then restart recovery.

Re: slowest tap tests - split or accelerate?

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 13:40:40 -0800, Andres Freund wrote:
> Maybe we really should do at least the most simplistic caching for initdbs, by
> doing one initdb as part of the creation of temp_install. Then Cluster::init
> would need logic to only use that if $params{extra} is empty.

I hacked this together. And the wins are bigger than I thought. On my
workstation, with plenty cpu and storage bandwidth, according to
  /usr/bin/time check-world NO_TEMP_INSTALL=1
things go from

  321.56user 74.00system 2:26.22elapsed 270%CPU (0avgtext+0avgdata 93768maxresident)k
  24inputs+32781336outputs (2254major+8717121minor)pagefaults 0swaps

to

  86.62user 57.10system 1:57.83elapsed 121%CPU (0avgtext+0avgdata 93752maxresident)k
  8inputs+32683408outputs (1360major+6672618minor)pagefaults 0swaps

The difference in elapsed and system time is pretty good, but the user time
difference is quite staggering.


This doesn't yet actually address the case of the basebackup tests, because
that specifies a "non-default" option, preventing the use of the template
initdb. But the effects are already big enough that I thought it's worth
sharing.

On CI for windows this reduces the time for the subscription tests from 3:24
to 2:39. There's some run-to-run variation, but it's a pretty clear signal...

Greetings,

Andres Freund
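The core idea — reuse a pre-built data directory when no non-default initdb options were requested — can be sketched like this (hypothetical Python; the actual patch below does this in Perl and C, and INITDB_TEMPLATE is the environment variable it uses):

```python
import os
import shutil
import subprocess

def init_datadir(pgdata, extra=None):
    """Create a data directory, reusing a cached template when possible.

    Hypothetical sketch: INITDB_TEMPLATE points at a directory produced
    by a single initdb run during the temp-install step.  Any
    non-default options (``extra``) force a real initdb, since the
    template was built with defaults only.
    """
    template = os.environ.get("INITDB_TEMPLATE")
    if extra or template is None:
        subprocess.run(
            ["initdb", "-D", pgdata, "-A", "trust", "-N", *(extra or [])],
            check=True)
    else:
        # Copy the cached directory instead of re-running initdb ...
        shutil.copytree(template, pgdata)
        # ... and restore the 0700 mode initdb would have used.
        os.chmod(pgdata, 0o700)
```

This is where the user-time savings come from: the expensive bootstrap work is done once per build rather than once per test cluster.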
>From a0818b6c8d03f152baa5df231b27aa7b8a7fde45 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 18 Jan 2022 15:25:27 -0800
Subject: [PATCH v2] hack: use "template" initdb in tap tests.

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 31 +-
 src/test/regress/pg_regress.c| 45 ++--
 src/Makefile.global.in   | 52 +---
 src/tools/msvc/Install.pm|  4 ++
 src/tools/msvc/vcregress.pl  |  1 +
 5 files changed, 94 insertions(+), 39 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db139..7e235c90d8c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -430,8 +430,35 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
-		@{ $params{extra} });
+	if (defined $params{extra} or !defined $ENV{INITDB_TEMPLATE})
+	{
+		diag("*not* using initdb template");
+		PostgreSQL::Test::Utils::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
+			@{ $params{extra} });
+	}
+	else
+	{
+		my $old_umask;
+
+		diag("using initdb template");
+
+		if (!$PostgreSQL::Test::Utils::windows_os)
+		{
+			$old_umask = umask;
+			umask 0077;
+		}
+
+		PostgreSQL::Test::RecursiveCopy::copypath(
+			$ENV{INITDB_TEMPLATE},
+			$pgdata,
+		  );
+
+		if (!$PostgreSQL::Test::Utils::windows_os)
+		{
+			umask $old_umask;
+		}
+	}
+
 	PostgreSQL::Test::Utils::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata,
 		@{ $params{auth_extra} });
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e6f71c7582e..07f1beae6c6 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2232,6 +2232,7 @@ regression_main(int argc, char *argv[],
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
+		const char* initdb_template_dir;
 
 		/*
 		 * Prepare the temp instance
@@ -2258,20 +2259,38 @@ regression_main(int argc, char *argv[],
 		if (!directory_exists(buf))
 			make_directory(buf);
 
-		/* initdb */
-		header(_("initializing database system"));
-		snprintf(buf, sizeof(buf),
- "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
- bindir ? bindir : "",
- bindir ? "/" : "",
- temp_instance,
- debug ? " --debug" : "",
- nolocale ? " --no-locale" : "",
- outputdir);
-		if (system(buf))
+		/* create data directory, either from a template, or by running initdb */
+		initdb_template_dir = getenv("INITDB_TEMPLATE");
+		if (initdb_template_dir == NULL)
 		{
-			fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
-			exit(2);
+			header(_("initializing database system"));
+			snprintf(buf, sizeof(buf),
+	 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
+	 bindir ? bindir : "",
+	 bindir ? "/" : "",
+	 temp_instance,
+	 debug ? " --debug" : "",
+	 nolocale ? " --no-locale" : "",
+	 outputdir);
+			if (system(buf))
+			{
+fprintf(stderr, _("\n%s: initdb failed\nExamine %s/log/initdb.log for the reason.\nCommand was: %s\n"), progname, outputdir, buf);
+exit(2);
+			}
+		}
+		else
+		{
+			/* FIXME: this only works on windows when there's a GNU cp in PATH */
+			header(_("initializing database system from template"));
+			snprintf(buf, sizeof(buf),
+	 "cp -a \"%s\" \"%s/data\"",
+	 initdb_template_dir,
+	 temp_instance);
+			if (system(bu

Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 18:31:42 -0500, James Coleman wrote:
> One other question on this: if we went with this would you expect a
> new function to parallel pg_last_committed_xact()?

I don't think I have an opinion on the user interface aspect.


> Or allow the xid and lsn in the return of pg_last_committed_xact()
> potentially not to match (of course xid might also not be present if
> track_commit_timestamps isn't on)? Or would you expect the current xid and
> timestamp to use the new infrastructure also?

When you say "current xid", what do you mean?

I think it might make sense to use the new approach for all of these.


Greetings,

Andres Freund




Re: [Ext:] Re: Stream Replication not working

2022-01-18 Thread Allie Crawford
I was able to figure out the problem that was preventing my PostgreSQL 
streaming replication from replicating changes even though all the health-check 
queries were showing that the replication had no problems.
I am a newbie learning PostgreSQL on my own, so it took me a while to figure 
this out; I am sharing it with the community in case somebody else runs into 
the same problem in the future.

  1.  In reading the PostgreSQL documentation I found a view called 
pg_stat_activity that shows the database's session activity, including the 
status and the wait events associated with each session.
  2.  In checking the session activity in the pg_stat_activity view, I was able 
to identify the following:

-[ RECORD 2 ]----+------------------------------
datid            |
datname          |
pid              | 17333
leader_pid       |
usesysid         |
usename          |
application_name |
client_addr      |
client_hostname  |
client_port      |
backend_start    | 2022-01-06 17:29:51.503073-07
xact_start       |
query_start      |
state_change     |
wait_event_type  | IPC
wait_event       | RecoveryPause
state            |
backend_xid      |
backend_xmin     |
query            |
backend_type     | startup


  3.  So I started to research the wait event “RecoveryPause” and found a 
link to the PostgreSQL documentation that explains all the recovery_target 
settings: https://www.postgresql.org/docs/9.5/recovery-target-settings.html

  4.  So I decided to review all the recovery settings my cluster had in the 
postgresql.conf file, and I found that the parameter recovery_target_time was 
configured as recovery_target_time='2021-04-20 21:00:00 MST'. That is when I 
realized this setting was preventing the latest changes from being applied on 
the standby site, because it sets the time up to which recovery will proceed. 
This is also why the health-check queries were not showing any problems: there 
was no problem at all, just a misconfigured parameter that stopped the standby 
from applying any WAL files, since the recovery target was set to April 2021.
  5.  Once I figured this out, I disabled the recovery_target_time parameter on 
the standby site and bounced the PostgreSQL cluster. As soon as the standby 
cluster was up and running again, all the changes I made on January 6th (which 
had been pending) were immediately applied, and now the standby site is 
completely in sync with the master, applying WAL files as they are shipped to 
the standby site.

Thank you to all of you who sent me suggestions; even though they did not 
resolve the problem, they gave me ideas about which direction to take to 
continue troubleshooting it.

Regarding the entries in the pg_locks view (see details below in this email 
thread), they showed normal activity rather than an actual problem, so they did 
not point me to the root cause, but they did let me learn about the view that 
shows active locks in the database. Thank you again for sharing this info with me.

postgresql=# select datname from pg_database where oid = 16384;
  datname
------------
 postgresql
(1 row)

postgresql=# select relname from pg_class where oid = 12141;
 relname
----------
 pg_locks
(1 row)

postgresql=#

Have a great week everyone.
Regards,
Allie

From: Kyotaro Horiguchi 
Date: Tuesday, January 11, 2022 at 6:18 PM
To: Allie Crawford 
Cc: sushant.postg...@gmail.com , 
amit.kapil...@gmail.com , satyanarlapu...@gmail.com 
, pgsql-hackers@lists.postgresql.org 

Subject: Re: [Ext:] Re: Stream Replication not working
Hi.

At Tue, 11 Jan 2022 15:05:55 +, Allie Crawford 
 wrote in
> er it. How do I figure out which database and relation is db:16384
> and relation:12141.?

On any database,

select datname from pg_database where oid = 16384;

Then on the shown database,

select relname from pg_class where oid = 12141;

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From: Allie Crawford 
Date: Tuesday, January 11, 2022 at 8:05 AM
To: Sushant Postgres , Amit Kapila 

Cc: SATYANARAYANA NARLAPURAM , 
pgsql-hackers@lists.postgresql.org 
Subject: Re: [Ext:] Re: Stream Replication not working
Amit,
Thank you for your help in trying to understand the information that the 
pg_locks table is showing. Regarding your question, I am not sure how to answer 
it. How do I figure out which database and relation are db:16384 
and relation:12141?

Thanks,
Allie

From: Amit Kapila 
Date: Tuesday, January 11, 2022 at 6:28 AM
To: Allie Crawford 
Cc: SATYANARAYANA NARLAPURAM , 
pgsql-hackers@lists.postgresql.org 
Subject: Re: [Ext:] Re: Stream Replication not working
It seems both master and standby have an exclusive lock on db:16384
and relation:12141. Which is this database/relation and why is the
app/database holding a lock on it?


--
With Regards,
Amit Kapila.

From: Allie Cr

Re: tab-complete COMPLETE_WITH_ATTR can become confused by table-lists.

2022-01-18 Thread Tom Lane
Peter Smith  writes:
> If there are spaces in the table-list like "t1, t2" then the word is
> recognized as "t2" and it works as expected.
> But, if there are no spaces in the table-list like "t1,t2" then the
> word is recognized as "t1,t2", and since there is no such table name
> the COMPLETE_WITH_ATTR does nothing.

Hmm, another variant is

=# create table foobar(z int);
CREATE TABLE
=# analyze foo  --> completes "foobar"
=# analyze foobar,foo   --> nothing

> I found that this is easily fixed just by adding a comma to the
> WORD_BREAKS. Then words all get tokenized properly and so 'prev2_wd'
> is what you'd like it to be.
>  /* word break characters */
> -#define WORD_BREAKS "\t\n@$><=;|&{() "
> +#define WORD_BREAKS "\t\n,@$><=;|&{() "

Nice catch.  Now that I look at it, that WORD_BREAKS list seems quite
random -- for example, why "{" but not "}", and why neither of "["
or "]"?  If we have "><=", why not "+-*/", to say nothing of other
operator characters?

I can see reasons for not listing ' " or :, because those are handled
elsewhere.  But I'm not sure about the other omissions.

Experimenting a bit, I see that

=# create table "amp&sand" (f int);
CREATE TABLE
=# insert into "amp --> completes "amp&sand"
=# insert into "amp&--> nothing

So populating WORD_BREAKS more fully would tend to break completion
of names using the added characters.  But probably the answer for
that is to have less ad-hoc handling of quoted names.  (See also
my screed at [1].)  Anyway, optimizing for weird quoted names is
probably not what we want to do here.

I feel like we should populate WORD_BREAKS more fully and document
the reasons for omissions, rather than leave future hackers to
guess about it.

> OTOH, this seemed a pretty fundamental change to the 12-year-old (!!)
> code so I don't know if it may be too risky and/or could adversely
> affect something else?

It's a bit scary, and I wouldn't consider back-patching it, but
TBH tab-complete.c is all chewing gum and baling wire anyway.
What I'd *really* like to do is nuke the whole thing and build
something that's mechanically derived from the actual backend
grammar.  But I don't know how to make that happen.  In the
meantime, incrementally moving it towards the real SQL parsing
rules seems like it ought to be an improvement.

> The tests are all still passing, but there aren't so many tab-complete
> tests anyway so that might not mean much.

Yeah, we certainly haven't got coverage for these sorts of details.

regards, tom lane

[1] https://www.postgresql.org/message-id/3547066.1642272686%40sss.pgh.pa.us




Re: Adding CI to our tree

2022-01-18 Thread Andres Freund
Hi,

On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap
> > tests on windows though. Having a vcregress.pl function to find all
> > directories with t/ and run the tests there, shouldn't be a lot of code...
> 
> I started doing that, however it makes CI/windows even slower.

To be expected... Perhaps the caching approach I just posted in [1] would buy
most of it back though...


> I think it'll be necessary to run prove with all the tap tests to
> parallelize them, rather than looping around directories, many of which have
> only a single file, and are run serially.

That's unfortunately not trivially possible. Quite a few tests currently rely
on being called in a specific directory. We should fix this, but it's not a
trivial amount of work.

Greetings,

Andres Freund

[1] https://postgr.es/m/20220119010034.javla5sakeh2a4fa%40alap3.anarazel.de




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 8:05 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-01-18 18:31:42 -0500, James Coleman wrote:
> > One other question on this: if we went with this would you expect a
> > new function to parallel pg_last_committed_xact()?
>
> I don't think I have an opinion the user interface aspect.
>
>
> > Or allow the xid and lsn in the return of pg_last_committed_xact()
> > potentially not to match (of course xid might also not be present if
> > track_commit_timestamps isn't on)? Or would you expect the current xid and
> > timestamp use the new infrastructure also?
>
> When you say "current xid", what do you mean?

I mean the existing commitTsShared->xidLastCommit field which is
returned by pg_last_committed_xact().

> I think it might make sense to use the new approach for all of these.

I think that would mean we could potentially remove commitTsShared,
but before doing so I'd like to know if that'd break existing
consumers.

Alvaro: You'd mentioned a use case in pglogical; if we moved the
xidLastCommit (and possibly even the cached last timestamp) out of
commit_ts.c (meaning it'd also no longer be under the commit ts lock)
would that be a problem for the current use (whether in lock safety or
in performance)?

Thanks,
James Coleman




Re: docs: pg_replication_origin_oid() description does not match behaviour

2022-01-18 Thread Michael Paquier
On Tue, Jan 18, 2022 at 06:20:22PM +, Bossart, Nathan wrote:
> +1

And done.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Andres Freund
On 2022-01-18 17:12:00 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > Unfortunately, I can't reproduce this locally, even with COPT=-Wall
> > -Werror -fno-omit-frame-pointer -fsanitize-trap=alignment
> > -Wno-deprecated-declarations -DWRITE_READ_PARSE_PLAN_TREES
> > -DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.
>
> Now that I re-read what you did, I believe you need both of
>
> -fsanitize=alignment -fsanitize-trap=alignment
>
> to enable those traps to happen.  That seems to be the case with
> Apple's clang, anyway.

FWIW, I can reproduce it on linux, but only if I -fno-sanitize-recover instead
of -fsanitize-trap=alignment. That then also produces a nicer explanation of
the problem:

/home/andres/src/postgresql/src/backend/replication/basebackup.c:1552:10: runtime error: member access within misaligned address 0x02b9ce09 for type 'PageHeaderData' (aka 'struct PageHeaderData'), which requires 4 byte alignment
0x02b9ce09: note: pointer points here
 00 00 00  64 00 00 00 00 c8 ad 0c  01 c5 1b 00 00 48 00 f0  1f f0 1f 04 20 00 00 00  00 62 31 05 00
  ^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/andres/src/postgresql/src/backend/replication/basebackup.c:1552:10 in
2022-01-18 17:36:17.746 PST [1448756] LOG:  server process (PID 1448774) exited with exit code 1
2022-01-18 17:36:17.746 PST [1448756] DETAIL:  Failed process was running: BASE_BACKUP ( LABEL 'pg_basebackup base backup',  PROGRESS,  CHECKPOINT 'fast',  MANIFEST 'yes',  TARGET 'client')

The problem originates in bbsink_copystream_begin_backup()...

Greetings,

Andres Freund
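For anyone following along, the class of bug UBSan flags above can be reduced to a few lines. This is a sketch with invented names (DemoHeader stands in for PageHeaderData; it is not the eventual fix): casting an arbitrary buffer offset to a pointer to a type with stricter alignment is undefined behavior, whereas memcpy into a properly aligned local is always valid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PageHeaderData: a struct that requires
 * 4-byte alignment because of its uint32_t member. */
typedef struct DemoHeader
{
    uint32_t pd_lsn;
    uint16_t pd_checksum;
} DemoHeader;

/*
 * Dereferencing ((DemoHeader *) unaligned_buf)->pd_lsn directly is the
 * pattern the sanitizer trapped on.  Copying the bytes into an aligned
 * local first sidesteps the problem at negligible cost.
 */
static uint32_t
read_lsn_safely(const char *unaligned_buf)
{
    DemoHeader  hdr;

    memcpy(&hdr, unaligned_buf, sizeof(hdr));
    return hdr.pd_lsn;
}
```

The alternative fix, of course, is to lay out the buffer so the header lands on an aligned boundary in the first place, which is what the follow-up in this thread points toward.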




Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back

2022-01-18 Thread Robert Haas
On Tue, Jan 18, 2022 at 5:12 PM Tom Lane  wrote:
> Now that I re-read what you did, I believe you need both of
>
> -fsanitize=alignment -fsanitize-trap=alignment
>
> to enable those traps to happen.  That seems to be the case with
> Apple's clang, anyway.

Ah, I guess I copied and pasted the options wrong, or something.
Anyway, I have an idea how to fix this. I didn't realize that we were
going to read from the bbsink's buffer like this, and it's not
properly aligned for that. I'll jigger things around to fix that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add last commit LSN to pg_last_committed_xact()

2022-01-18 Thread James Coleman
On Tue, Jan 18, 2022 at 4:32 PM Andres Freund  wrote:
>
> I wonder if a very different approach could make sense here. Presumably this
> wouldn't need to be queried at a very high frequency, right? If so, what about
> storing the latest commit LSN for each backend in PGPROC? That could be
> maintained without a lock/atomics, and should be just about free.
> pg_last_committed_xact() then would have to iterate over all PGPROCs to
> complete the LSN, but that's not too bad for an operation like that. We'd also
> need to maintain a value for all disconnected backends, but that's also not a
> hot path.

Is something roughly like the attached what you'd envisioned? I
wouldn't expect the final implementation to be in commit_ts.c, but I
left it there for expediency's sake in demonstrating the idea since
pg_last_committed_xact() currently finds its home there.

I think we need a shared ProcArrayLock to read the array, correct? We
also need to do the global updating under lock, but given it's when a
proc is removed, that shouldn't be a performance issue if I'm
following what you are saying.

Thanks,
James Coleman
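The scheme being discussed can be modeled in a few lines of single-process C (all names below are invented for illustration; the real per-backend slot lives in PGPROC and the accumulator in shared memory, as the attached patch shows): each backend writes only its own slot, exits fold the slot into a shared accumulator, and the rare reader scans everything for the maximum.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BACKENDS 8

typedef uint64_t XLogRecPtrSim;     /* hypothetical stand-in for XLogRecPtr */

static XLogRecPtrSim slot_lsn[MAX_BACKENDS];    /* one slot per "backend" */
static XLogRecPtrSim finished_lsn;              /* folded-in exited backends */

/* Each backend stores only into its own slot, so no lock is needed
 * on the write side (the commit hot path). */
static void
backend_commit(int id, XLogRecPtrSim lsn)
{
    slot_lsn[id] = lsn;
}

/* On backend exit, fold the slot into the shared accumulator; in a real
 * multi-process setting this step would run under a lock, but it is rare. */
static void
backend_exit(int id)
{
    if (slot_lsn[id] > finished_lsn)
        finished_lsn = slot_lsn[id];
    slot_lsn[id] = 0;
}

/* The reader scans all slots plus the accumulator -- O(backends), but
 * acceptable for an operation queried infrequently. */
static XLogRecPtrSim
last_commit_lsn(void)
{
    XLogRecPtrSim max = finished_lsn;

    for (int i = 0; i < MAX_BACKENDS; i++)
        if (slot_lsn[i] > max)
            max = slot_lsn[i];
    return max;
}
```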
From 7e8a4810f5b4cd16ca2a6f76585513711dbf529a Mon Sep 17 00:00:00 2001
From: jcoleman 
Date: Fri, 14 Jan 2022 22:26:10 +
Subject: [PATCH v2] Expose LSN of last commit via pg_last_committed_xact

---
 src/backend/access/transam/commit_ts.c | 34 --
 src/backend/access/transam/twophase.c  |  2 +-
 src/backend/access/transam/xact.c  | 10 +---
 src/backend/storage/ipc/procarray.c|  4 +++
 src/include/access/commit_ts.h |  5 ++--
 src/include/access/transam.h   |  2 ++
 src/include/catalog/pg_proc.dat|  6 ++---
 src/include/storage/proc.h |  5 
 8 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 659109f8d4..50a7be82bc 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -34,9 +34,11 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "storage/shmem.h"
+#include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
+#include "utils/pg_lsn.h"
 
 /*
  * Defines for CommitTs page sizes.  A page is the same BLCKSZ as is used
@@ -93,6 +95,7 @@ static SlruCtlData CommitTsCtlData;
 typedef struct CommitTimestampShared
 {
 	TransactionId xidLastCommit;
+	XLogRecPtr lsnLastCommit;
 	CommitTimestampEntry dataLastCommit;
 	bool		commitTsActive;
 } CommitTimestampShared;
@@ -135,7 +138,7 @@ static void WriteTruncateXlogRec(int pageno, TransactionId oldestXid);
 void
 TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
 			   TransactionId *subxids, TimestampTz timestamp,
-			   RepOriginId nodeid)
+			   XLogRecPtr commitLsn, RepOriginId nodeid)
 {
 	int			i;
 	TransactionId headxid;
@@ -414,10 +417,22 @@ pg_last_committed_xact(PG_FUNCTION_ARGS)
 	TransactionId xid;
 	RepOriginId nodeid;
 	TimestampTz ts;
-	Datum		values[3];
-	bool		nulls[3];
+	XLogRecPtr lsn;
+	Datum		values[4];
+	bool		nulls[4];
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
+	int			index;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	lsn = ShmemVariableCache->finishedProcsLastCommitLSN;
+	for (index = 0; index < ProcGlobal->allProcCount; index++)
+	{
+		XLogRecPtr procLSN = ProcGlobal->allProcs[index].lastCommitLSN;
+		if (procLSN > lsn)
+			lsn = procLSN;
+	}
+	LWLockRelease(ProcArrayLock);
 
 	/* and construct a tuple with our data */
 	xid = GetLatestCommitTsData(&ts, &nodeid);
@@ -426,12 +441,14 @@ pg_last_committed_xact(PG_FUNCTION_ARGS)
 	 * Construct a tuple descriptor for the result row.  This must match this
 	 * function's pg_proc entry!
 	 */
-	tupdesc = CreateTemplateTupleDesc(3);
+	tupdesc = CreateTemplateTupleDesc(4);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid",
 	   XIDOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "timestamp",
 	   TIMESTAMPTZOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "roident",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "lsn",
+	   PG_LSNOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "roident",
 	   OIDOID, -1, 0);
 	tupdesc = BlessTupleDesc(tupdesc);
 
@@ -447,8 +464,11 @@ pg_last_committed_xact(PG_FUNCTION_ARGS)
 		values[1] = TimestampTzGetDatum(ts);
 		nulls[1] = false;
 
-		values[2] = ObjectIdGetDatum((Oid) nodeid);
-		nulls[2] = false;
+		values[2] = LSNGetDatum(lsn);
+		nulls[2] = lsn == InvalidXLogRecPtr;
+
+		values[3] = ObjectIdGetDatum((Oid) nodeid);
+		nulls[3] = false;
 	}
 
 	htup = heap_form_tuple(tupdesc, values, nulls);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 271a3146db..cd33849aea 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2300,7 +2300,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
 
 	TransactionTreeSetCommitTsData(xid

RE: row filtering for logical replication

2022-01-18 Thread houzj.f...@fujitsu.com
On Tues, Jan 18, 2022 9:27 AM Peter Smith  wrote:
> Here are some review comments for v66-0001 (review of updates since
> v65-0001)

Thanks for the comments!

> ~~~
> 
> 1. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication
> 
> @@ -276,17 +276,45 @@ GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,  }
> 
>  /*
> + * Returns the relid of the topmost ancestor that is published via this
> + * publication if any, otherwise return InvalidOid.
> + */
> 
> Suggestion:
> "otherwise return InvalidOid." --> "otherwise returns InvalidOid."
> 

Changed.

> 
> 2. src/backend/commands/publicationcmds.c -
> contain_invalid_rfcolumn_walker
> 
> @@ -235,6 +254,337 @@ CheckObjSchemaNotAlreadyInPublication(List *rels, List *schemaidlist,
>  }
> 
>  /*
> + * Returns true, if any of the columns used in the row filter WHERE clause
> + * are not part of REPLICA IDENTITY, false, otherwise.
> 
> Suggestion:
> ", false, otherwise" --> ", otherwise returns false."
> 

Changed.

> 
> 3. src/backend/replication/logical/tablesync.c - fetch_remote_table_info
> 
> + * We do need to copy the row even if it matches one of the publications,
> + * so, we later combine all the quals with OR.
> 
> Suggestion:
> 
> BEFORE
> * We do need to copy the row even if it matches one of the publications,
> * so, we later combine all the quals with OR.
> AFTER
> * We need to copy the row even if it matches just one of the publications,
> * so, we later combine all the quals with OR.
> 

Changed.

> 
> 4. src/backend/replication/pgoutput/pgoutput.c -
> pgoutput_row_filter_exec_expr
> 
> + ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> +
> + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> + DatumGetBool(ret) ? "true" : "false",
> + isnull ? "false" : "true");
> +
> + if (isnull)
> + return false;
> +
> + return DatumGetBool(ret);
> 
> That change to the logging looks incorrect - the "(isnull: %s)" value is
> backwards now.
> 
> I guess maybe the intent was to change it something like below:
> 
> elog(DEBUG3, "row filter evaluates to %s (isnull: %s)", isnull ? "false" :
> DatumGetBool(ret) ? "true" : "false", isnull ? "true" : "false");

I misread the previous comments.
I think the original log is correct and I have reverted this change.

Best regards,
Hou zj

