Possible missing segments in archiving on standby

2020-06-30 Thread Kyotaro Horiguchi
Hello.

While looking at a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.

A. Walreceiver is gracefully terminated just after a segment is
   finished.

B. Walreceiver is gracefully terminated while receiving the filler (padding)
  data sent for a segment switch.

The two above are reproducible (without distinguishing between them)
using a helper patch. See below.

There's one more issue here.

C. Standby doesn't archive a segment until walreceiver receives any
  data for the next segment.

I'm not sure whether we should treat C as an issue.

The first attached patch fixes A and B. A side effect of that is that the
standby archives the segment preceding the streaming start
location. Concretely, 00..0100..2 gets archived in the above case
(recovery starts at 0/300).  That behavior doesn't seem to be a
problem since the segment is part of the standby's data anyway.

The second attached patch fixes all of A to C, but seems somewhat
redundant.

Any opinions and/or suggestions are welcome.


The attached files are:

1. v1-0001-Make-sure-standby-archives-all-segments.patch:
  Fix for A and B.

2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
  Fix for A, B and C.

3. repro.sh
  The reproducer shell script used below.

4. repro_helper.patch
  Helper patch for repro.sh for master and patch 1 above.

5. repro_helper2.patch
  Helper patch for repro.sh for patch 2 above.

=
** REPRODUCER

The failure is reproducible with a small code tweak.

1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge

4. psql -c "create table t(); drop table t; select pg_switch_wal(); select 
pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"

5. Look into the standby's archive directory.
   If no segments are missing from the archive, repeat from step 3.
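
As a convenience, the standby's archiver progress can also be cross-checked
from SQL (a sketch using the existing pg_stat_archiver view; the directory
listing in step 5 remains the authoritative check):

  -- run on the standby
  SELECT archived_count, last_archived_wal, failed_count, last_failed_wal
    FROM pg_stat_archiver;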

The third attached shell script is a reproducer for the problem;
it needs the fourth attached patch (the helper) applied.

$ mkdir testdir
$ cd testdir
$ bash /repro.sh

After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
00010003
00010004
00010005
00010006
00010007
00010008
# standby archive
00010003
00010005
00010006
00010008

Segment 4 is skipped because of issue A, and segment 7 because of issue B.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From afa907bca7db8ea6335d47bd02761f567591d553 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 30 Jun 2020 14:21:30 +0900
Subject: [PATCH v1] Make sure standby archives all segments

The standby fails to archive a segment if it is stopped just after a
segment is finished or just after a segment switch. Make sure
that walreceiver archives all segments by rechecking at start.
---
 src/backend/replication/walreceiver.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..680154365d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -938,6 +938,23 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 else
 	XLogArchiveNotify(xlogfname);
 			}
+			else if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+			{
+/*
+ * If we are starting streaming at the beginning of a segment,
+ * there may be cases where the previous segment has not been
+ * archived yet.  Make sure it is archived.
+ */
+char		xlogfname[MAXFNAMELEN];
+XLogSegNo	prevseg;
+
+XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+XLogFileName(xlogfname, ThisTimeLineID, prevseg,
+			 wal_segment_size);
+
+/* Mark as ".ready" if not done yet */
+XLogArchiveCheckDone(xlogfname);
+			}
 			recvFile = -1;
 
 			/* Create/use new log file */
-- 
2.18.4

From 7af716134dceb3bafce421dfeaffebf1e1e3e17d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 29 Jun 2020 16:12:01 +0900
Subject: [PATCH v1] Make sure standby archives all segments immediately

The standby may be a bit late in archiving, since walreceiver doesn't
archive a segment until it receives data for the next segment. Fix
that by archiving just after a segment is finished.

Also, the standby fails to archive a segment if it is stopped just
after a segment is finished or just after a segment
switch. Make sure that walreceiver archives all segments by rechecking
at start.
---
 src/backend/replication/walreceiver.c | 82 ++-
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..831718c859 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -902,49 +902,34 @@ XLogWal

Re: min_safe_lsn column in pg_replication_slots view

2020-06-30 Thread Fujii Masao




On 2020/06/26 13:45, Amit Kapila wrote:

On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera  wrote:


On 2020-Jun-26, Michael Paquier wrote:


On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:

I don't understand the proposal.  Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?


I have sent last week a patch about only the removal of min_safe_lsn:
https://www.postgresql.org/message-id/20200619121552.gh453...@paquier.xyz
So this applies to this part.


Well, I oppose that because it leaves us with no way to monitor slot
limits.  In his opening email, Masao-san proposed to simply change the
value by adding 1.  How you go from adding 1 to a column to removing
the column completely with no recourse, is beyond me.

Let me summarize the situation and possible ways forward as I see them.
If I'm mistaken, please correct me.

Problems:
i)  pg_replication_slot.min_safe_lsn has a weird definition in that all
 replication slots show the same value



It is also not clear how the user can make use of that value?


ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns
 the name of the previous segment.

Proposed solutions:

a) Do nothing -- keep the min_safe_lsn column as is.  Warn users that
pg_walfile_name should not be used with this column.
b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used
and return a useful value.
c) Remove min_safe_lsn; add functions that expose the same value
d) Remove min_safe_lsn; add a new view that exposes the same value and
possibly others

e) Replace min_safe_lsn with a "distance" column, which reports
restart_lsn - oldest valid LSN
(Note that you no longer have an LSN in this scenario, so you can't
call pg_walfile_name.)


I like (e).



Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"?  We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values.  The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display.  However, I am okay if we can
reach a consensus on one of the above options.


Yes, that's an idea. But it might not be easy to calculate that distance
manually by subtracting max_slot_wal_keep_size from the current LSN.
Because we've not supported -(pg_lsn, numeric) operator yet. I'm
proposing that operator, but it's for v14.
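
For reference, problem (ii) above is easy to see directly (a minimal sketch;
min_safe_lsn sits exactly on a segment boundary, so pg_walfile_name() reports
the preceding, possibly already-removed, segment):

  SELECT slot_name,
         min_safe_lsn,
         pg_walfile_name(min_safe_lsn)
    FROM pg_replication_slots;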

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC: postgres_fdw insert batching

2020-06-30 Thread Etsuro Fujita
On Tue, Jun 30, 2020 at 2:54 PM Amit Langote  wrote:
> On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
>  wrote:
> > On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita  wrote:
> >> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
> >>  wrote:
> >> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> >> >  wrote:
> >>
> >> > > 3) What about the other DML operations (DELETE/UPDATE)?
> >> > >
> >> > > The other DML operations could probably benefit from the batching too.
> >> > > INSERT was good enough for a PoC, but having batching only for INSERT
> >> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> >> > > of quals, but likely doable.
> >> >
> >> > Bulk INSERTs are more common in a sharded environment because of data
> >> > load in say OLAP systems. Bulk update/delete are rare, although not
> >> > that rare. So if an approach just supports bulk insert and not bulk
> >> > UPDATE/DELETE that will address a large number of usecases IMO. But if
> >> > we can make everything work together that would be good as well.
> >>
> >> In most cases, I think the entire UPDATE/DELETE operations would be
> >> pushed down to the remote side by DirectModify.  So, I'm not sure we
> >> really need the bulk UPDATE/DELETE.
> > That may not be true for a partitioned table whose partitions are foreign 
> > tables. Esp. given the work that Amit Langote is doing [1]. It really 
> > depends on the ability of postgres_fdw to detect that the DML modifying 
> > each of the partitions can be pushed down. That may not come easily.
>
> While it's true that how to accommodate the DirectModify API in the
> new inherited update/delete planning approach is an open question on
> that thread, I would eventually like to find an answer to that.  That
> is, that work shouldn't result in losing the foreign partition's
> ability to use DirectModify API to optimize updates/deletes.

That would be great!

Best regards,
Etsuro Fujita




Re: Rotten parts of src/backend/replication/README

2020-06-30 Thread Peter Eisentraut

On 2020-05-04 07:50, Michael Paquier wrote:

Yeah.  The years have visibly proved that the README had updates when
it came to the general descriptions of the libpqwalreceiver interface,
but that we had better consolidate the header to give some
documentation to whoever plays with this interface.  Attached is a
patch to address all that, where I simplified the README and added
some description to all the callbacks.  Thoughts are welcome.  I'll
add that to the next CF.  Now I don't see any actual problems in
getting that on HEAD before v13 forks.  But let's gather more opinions
first.


This patch looks good to me.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Commitfest 2020-07

2020-06-30 Thread Daniel Gustafsson
The first commitfest of the v14 cycle, 2020-07 is just around the corner now
and the trend of growing the list of patches has continued, so there is a lot
to go through.

If you have a patch registered in the commitfest, make sure it still applies
and that the tests pass.  Looking at the Patch Tester there are quite a few
patches which no longer apply and need an updated version:

http://cfbot.cputube.org/

cheers ./daniel




Re: [PATCH] Remove Extra palloc Of raw_buf For Binary Format In COPY FROM

2020-06-30 Thread Bharath Rupireddy
Thanks Vignesh and Rushabh for reviewing this.

I've added this patch to commitfest - https://commitfest.postgresql.org/28/.

I request the community to take this patch further if there are no other issues.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

On Sat, Jun 27, 2020 at 6:30 PM vignesh C  wrote:
>
> On Sat, Jun 27, 2020 at 9:23 AM Bharath Rupireddy
>  wrote:
> >
> > Thanks Rushabh and Vignesh for the comments.
> >
> > >
> > > One comment:
> > > We could change below code:
> > > + */
> > > + if (!cstate->binary)
> > > + cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
> > > + else
> > > + cstate->raw_buf = NULL;
> > > to:
> > > cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE 
> > > + 1);
> > >
> >
> > Attached the patch with the above changes.
>
> Changes look fine to me.
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com




Re: Include access method in listTables output

2020-06-30 Thread Georgios


‐‐‐ Original Message ‐‐‐
On Saturday, June 20, 2020 3:15 PM, vignesh C  wrote:

> On Tue, Jun 16, 2020 at 6:13 PM Georgios gkokola...@protonmail.com wrote:
>
> > > Few comments:
> > >
> > > -   if (pset.sversion >= 12)
> > >
> > > -   appendPQExpBufferStr(&buf,
> > >
> > > -   "\n LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam");
> > > Should we include pset.hide_tableam check along with the version 
> > > check?
> > >
> >
> > I opted against it, since it seems more intuitive to have a single
> > switch and placed on the display part. A similar pattern can be found
> > in describeOneTableDetails(). I do not hold a strong opinion and will
> > gladly amend if insisted upon.
>
> I felt we could add that check as we might be including
> pg_catalog.pg_am in cases even though we really don't need it.

As promised, I gladly amend upon your request. Also included is a fix for
a minor oversight in the tests; they should now be stable. Finally, in this
version I extended the logic a bit to only include the access method column
if the relations being displayed can have one, for example sequences.

>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com



0001-Include-AM-in-listTables-output-v3.patch
Description: Binary data


Re: estimation problems for DISTINCT ON with FDW

2020-06-30 Thread Etsuro Fujita
On Mon, Jun 29, 2020 at 7:02 PM Bharath Rupireddy
 wrote:
> > It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign 
> > side, which is probably the best way to run the query.  I guess it makes 
> > sense that FDW machinery in general doesn't want to try to push a 
> > PostgreSQL specific construct.
>
> I think you are right, the DISTINCT operation is not being pushed to
> remote (I may be wrong here. just for info - I looked at remote SQL
> from explain(verbose) on the query to find this out) and so is for
> estimates.

I think you are right.

> But when fdw is
> used for non-sharded configurations such as just to get existing data
> from another remote postgres server, oracle, hadoop or  some other
> remote database engines where DISTINCT operation is supported, it's
> good to push that to remote for both explains/estimates as well as in
> the actual queries itself, to reduce data transferred from remote
> database server to local postgres database server.

I think so too.  And I think we could do so using the upper-planner
pathification (ie, GetForeignUpperPaths() with UPPERREL_DISTINCT in
create_distinct_paths()).  It's on my long-term TODO list to implement
that in postgres_fdw.
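
A quick way to check whether the DISTINCT ends up in the remote query is to
look at the Remote SQL line of EXPLAIN VERBOSE (a sketch; ft1 stands for any
postgres_fdw foreign table):

  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT DISTINCT ON (c1) c1, c2 FROM ft1 ORDER BY c1, c2;
  -- currently the Remote SQL line shows a plain SELECT without DISTINCT,
  -- so the de-duplication is done locally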

Best regards,
Etsuro Fujita




Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators

2020-06-30 Thread Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

The patch looks good to me.

The new status of this patch is: Ready for Committer


Re: Use of "long" in incremental sort code

2020-06-30 Thread Peter Eisentraut

On 2020-06-30 06:24, David Rowley wrote:

On Tue, 30 Jun 2020 at 16:20, Tom Lane  wrote:

There is a fairly widespread issue that memory-size-related GUCs and
suchlike variables are limited to represent sizes that fit in a "long".
Although Win64 is the *only* platform where that's an issue, maybe
it's worth doing something about.  But we shouldn't just fix the sort
code, if we do do something.

(IOW, I don't agree with doing a fix that doesn't also fix work_mem.)


I raised it mostly because this new-to-PG13-code is making the problem worse.


Yeah, we recently got rid of a bunch of inappropriate use of long, so it 
seems reasonable to make this new code follow that.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: track_planning causing performance regression

2020-06-30 Thread Ants Aasma
On Tue, 30 Jun 2020 at 08:43, Fujii Masao 
wrote:

> > The problem looks to be that spinlocks are terrible with overloaded
> CPU and a contended spinlock. A process holding the spinlock might easily
> get scheduled out leading to excessive spinning by everybody. I think a
> simple thing to try would be to replace the spinlock with LWLock.
>
> Yes. Attached is the POC patch that replaces per-counter spinlock with
> LWLock.
>

Great. I think this is the one that should get considered for testing.


> > I did a prototype patch that replaces spinlocks with futexes, but was
> not able to find a workload where it mattered.
>
> I'm not familiar with futex, but could you tell me why you used futex
> instead
> of LWLock that we already have? Is futex portable?
>

Futex is a Linux kernel call that allows building a lock whose
uncontended cases work fully in user space almost exactly like a spinlock,
while falling back to syscalls that wait for wakeup in case of contention.
It's not portable, but probably something similar could be implemented for
other operating systems. I did not pursue this further because it became
apparent that every performance critical spinlock had already been removed.

To be clear, I am not advocating for this patch to get included. I just had
the patch immediately available and it could have confirmed that using a
better lock fixes things.

-- 
Ants Aasma
Senior Database Engineer    www.cybertec-postgresql.com


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-06-30 Thread Amit Kapila
On Tue, Jun 30, 2020 at 10:13 AM Dilip Kumar  wrote:
>
> On Tue, Jun 30, 2020 at 9:20 AM Amit Kapila  wrote:
> >
> > Can't we name the last parameter as 'commit_lsn' as that is how
> > documentation in the patch spells it and it sounds more appropriate?
>
> You are right commit_lsn seems more appropriate here.
>
> > Also, is there a reason for assigning report_location and
> > write_location differently than what we do in commit_cb_wrapper?
> > Basically, assign those as txn->final_lsn and txn->end_lsn
> > respectively.
>
> Yes, I think it should be handled in same way as commit_cb_wrapper.
> Because before calling ReorderBufferStreamCommit in
> ReorderBufferCommit, we are properly updating the final_lsn as well as
> the end_lsn.
>

Okay, I have made these changes in the attached patch and there are
few more changes in
0003-Extend-the-output-plugin-API-with-stream-methods.
1. In pg_decode_stream_message, for transactional messages, we were
displaying message contents which is different from other streaming
APIs.  I have changed it so that streaming API doesn't display message
contents for transactional messages.

2.
+ /* in streaming mode, stream_change_cb is required */
+ if (ctx->callbacks.stream_change_cb == NULL)
+ ereport(ERROR,
+ (errmsg("Output plugin supports streaming, but has not registered "
+ "stream_change_cb callback.")));

The error messages seem a bit odd: (a) they don't include an error code,
and (b) they are not in PG style. I have changed all the error messages to
fix these two issues and reworded the messages as well.

3. Rearranged the functions stream_* so that the optional functions
are at the end and also arranged other functions in a way that looks
more logical to me.

4. Updated comments, commit message, and edited docs in the patch.

I have made a few changes in
0004-Gracefully-handle-concurrent-aborts-of-transacti as well.
1. The variable bsysscan was not being reset in case of error.  I have
introduced a new function to reset both bsysscan and CheckXidAlive
during transaction abort.  Also, snapmgr.c doesn't seem the right place
for these variables, so I moved them to xact.c.  I think this will
make the initialization of CheckXidAlive in the catch block in
ReorderBufferProcessTXN redundant.

2. Updated comments and commit message.

Let me know what you think about the above changes.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v31-0001-Immediately-WAL-log-subtransaction-and-top-level.patch
Description: Binary data


v31-0004-Gracefully-handle-concurrent-aborts-of-transacti.patch
Description: Binary data


v31-0002-WAL-Log-invalidations-at-command-end-with-wal_le.patch
Description: Binary data


v31-0003-Extend-the-logical-decoding-output-plugin-API-wi.patch
Description: Binary data


Re: TAP tests and symlinks on Windows

2020-06-30 Thread Michael Paquier
On Mon, Jun 29, 2020 at 04:56:16PM +0900, Michael Paquier wrote:
> On Fri, Jun 26, 2020 at 02:00:37PM +0200, Peter Eisentraut wrote:
>> We should be more accurate about things like this:
>> 
>> +# The following tests test symlinks. Windows may not have symlinks, so
>> +# skip there.
>> 
>> The issue isn't whether Windows has symlinks, since all versions of Windows
>> supported by PostgreSQL do (AFAIK).  The issue is only whether the Perl
>> installation that runs the tests has symlink support.  And that is only
>> necessary if the test itself wants to create or inspect symlinks.  For
>> example, there are existing tests involving tablespaces that work just fine
>> on Windows.
> 
> Check.  Indeed that sounds confusing.

Attached is an updated patch, where I have tried to use a better
wording in all the code paths involved.

>> Relatedly, your patch ends up skipping the tests on MSYS2, even though Perl
>> supports symlinks there out of the box.
> 
> Do you think that it would be enough to use what Andrew has mentioned
> in [1]?  I don't have a MSYS2 installation, so I am unfortunately not
> able to confirm that, but I would just move the check to TestLib.pm
> and save it in an extra variable.

Added an extra $is_msys2 to track that in TestLib.pm.  One thing I am
not sure of though: Win32::Symlink fails to work properly with -l, but
is that the case with MSYS2?  If that's able to work, it would be
possible to not skip the following test but I have taken the most
careful approach for now:
+   # This symlink check is not supported on Windows.  Win32::Symlink works
+   # around this situation by using junction points (actually PostgreSQL
+   # approach on the problem), and -l is not able to detect that situation.
+  SKIP:
+   {
+   skip "symlink check not implemented on Windows", 1
+ if ($windows_os)

Thanks,
--
Michael
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 208df557b8..7f2df1d09b 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -211,11 +211,11 @@ $node->command_fails(
 	'pg_basebackup tar with long name fails');
 unlink "$pgdata/$superlongname";
 
-# The following tests test symlinks. Windows doesn't have symlinks, so
-# skip on Windows.
+# The following tests are for symlinks.
 SKIP:
 {
-	skip "symlinks not supported on Windows", 18 if ($windows_os);
+	skip "symlinks not supported with this perl installation", 18
+	  if (!$symlink_support);
 
 	# Move pg_replslot out of $pgdata and create a symlink to it.
 	$node->stop;
@@ -238,11 +238,12 @@ SKIP:
 	# for the tablespace directories, which hopefully won't run afoul of
 	# the 99 character length limit.
 	my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+	my $realTsDir   = TestLib::perl2host("$shorter_tempdir/tblspc1");
 	symlink "$tempdir", $shorter_tempdir;
 
 	mkdir "$tempdir/tblspc1";
 	$node->safe_psql('postgres',
-		"CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';");
+		"CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';");
 	$node->safe_psql('postgres',
 		"CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
 	$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft' ],
@@ -292,18 +293,33 @@ SKIP:
 		],
 		'plain format with tablespaces succeeds with tablespace mapping');
 	ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated');
-	opendir(my $dh, "$pgdata/pg_tblspc") or die;
-	ok( (   grep {
--l "$tempdir/backup1/pg_tblspc/$_"
-  and readlink "$tempdir/backup1/pg_tblspc/$_" eq
-  "$tempdir/tbackup/tblspc1"
-			} readdir($dh)),
-		"tablespace symlink was updated");
-	closedir $dh;
+
+	# This symlink check is not supported on Windows.  Win32::Symlink works
+	# around this situation by using junction points (actually PostgreSQL
+	# approach on the problem), and -l is not able to detect that situation.
+  SKIP:
+	{
+		skip "symlink check not implemented on Windows", 1
+		  if ($windows_os);
+		opendir(my $dh, "$pgdata/pg_tblspc") or die;
+		ok( (   grep {
+	-l "$tempdir/backup1/pg_tblspc/$_"
+	  and readlink "$tempdir/backup1/pg_tblspc/$_" eq
+	  "$tempdir/tbackup/tblspc1"
+} readdir($dh)),
+			"tablespace symlink was updated");
+		closedir $dh;
+	}
 
 	# Group access should be enabled on all backup files
-	ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
-		"check backup dir permissions");
+  SKIP:
+	{
+		skip "unix-style permissions not supported on Windows", 1
+		  if ($windows_os);
+
+		ok(check_mode_recursive("$tempdir/backup1", 0750, 0640),
+			"check backup dir permissions");
+	}
 
 	# Unlogged relation forks other than init should not be copied
 	my ($tblspc1UnloggedBackupPath) =
diff --git a/src/bin/pg_dump/t/010_dump_connstr.pl b/src/bin/pg_dump/t/010_dump_connstr.pl
index abdb07c558..1fd8a78abc 100644
--- a/src/bin/pg_dump/t/010_dump_connstr.pl
+++ b/src/bin/pg_dump/t/010_dump_connstr.pl
@@ -5,7 +5,7 @@ use PostgresNode;
 u

Re: Missing some ifndef FRONTEND at the top of logging.c and file_utils.c

2020-06-30 Thread Michael Paquier
On Fri, Jun 26, 2020 at 09:59:30AM +0900, Michael Paquier wrote:
> Thanks.  This list is provided by OBJS_FRONTEND in
> src/common/Makefile, and pgcommonfrontendfiles in Mkvcbuild.pm.  Let's
> see if others have comments, as it just looks like something that was
> forgotten in bf5bb2e and fc9a62a when this code was moved to
> src/common/.  If there are no objections, I'll revisit that some time
> next week and fix it on HEAD.

And committed as of 324435e.
--
Michael


signature.asc
Description: PGP signature


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-06-30 Thread Daniel Gustafsson
> On 29 Jun 2020, at 08:57, Michael Paquier  wrote:
> 
> On Fri, Jun 26, 2020 at 02:26:50PM +0200, Daniel Gustafsson wrote:
>> On 26 Jun 2020, at 10:11, Michael Paquier  wrote:
>>> +   /* TODO is nreferenced a reasonable allocation of slots? */
>>> +   slot = palloc(sizeof(TupleTableSlot *) * nreferenced);
>>> It seems to me that we could just apply the same rule as for
>>> pg_attribute and pg_shdepend, no?
>> 
>> I think so, I see no reason not to.
> 
> I was actually looking at a patch to potentially support REINDEX for
> partitioned table, and the CONCURRENTLY case may need this patch,
> still if a lot of dependencies are switched at once it may be a
> problem, so it is better to cap it.

Agreed.

> One thing though is that we may finish by allocating more slots than what's
> necessary if some dependencies are pinned, but using multi-inserts would be a
> gain anyway, and the patch does not insert in more slots than needed.

I was playing around with a few approaches around this when I initially wrote
this, but I was unable to find any way to calculate the correct number of slots
which wasn't significantly more expensive than the extra allocations.

> Attached is a rebased set, with more splitting work done after a new
> round of review.  0001 is more refactoring to use more
> ObjectAddress[Sub]Set() where we can, leading to some more cleanup:
> 5 files changed, 43 insertions(+), 120 deletions(-)

This patch seems quite straightforward, and in the case of the loops in for
example CreateConstraintEntry() makes the code a lot more readable.  +1 for
applying 0001.

> In this round, I got confused with the way ObjectAddress items are
> assigned, assuming that we have to use the same dependency type for a
> bunch of dependencies to attach.  Using add_exact_object_address() is
> fine for this purpose, but this also makes me wonder if we should try
> to think harder about this interface so as we would be able to insert
> a bunch of dependency tuples with multiple types of dependencies
> handled.  So this has made me remove reset_object_addresses() from the
> patch series, as it was used when dependency types were mixed up.  We
> could also discuss that separately, but that's not strongly mandatory
> here. 

Ok, once the final state of this patchset is known I can take a stab at
recording multiple dependencies with different behaviors as a separate
patchset.

> There are however cases where it is straight-forward to gather and insert
> multiple records, like in InsertExtensionTuple() (as does already
> tsearchcmds.c), which is what 0002 does.


+1 on 0002 as well.

> An opposite example is StorePartitionKey(), where there is a mix of normal and
> internal dependencies, so I have removed it for now.


I don't think it makes the code any worse off to handle the NORMAL dependencies
separately here, but MVV.

> ExecDropSingleTupleTableSlot() was also called for an incorrect number
> of slots when it came to pg_shdepend.

Hmm, don't you mean in heap.c:InsertPgAttributeTuples()?

> I was thinking if it could be possible to do more consolidation between the
> three places where we calculate the number of slots to use, but that would 
> also
> mean to have more tuple slot dependency moving around, which is not great.


If we do, we need to keep the cap consistent across all callers, else we'll end
up with an API without an abstraction to make it worth more than saving a few
lines of quite simple to read code.  Currently this is the case, but that might
not always hold, so I'm not sure if it's worth it.

0001 + 0002 + 0003 pass tests on my machine and nothing sticks out.

cheers ./daniel




Re: A patch for get origin from commit_ts.

2020-06-30 Thread Simon Riggs
On Tue, 30 Jun 2020 at 02:17, Madan Kumar  wrote:


> We already have pg_xact_commit_timestamp() that returns the timestamp of
> the commit.


Yes, pg_xact_commit_origin() is a good name for an additional function. +1
for this.


> It may be better to have one single function returning both
> timestamp and origin for a given transaction ID.
>

No need to change existing APIs.
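
For illustration, usage would then look like this (a sketch only;
pg_xact_commit_origin() is the proposed function and does not exist yet, and
track_commit_timestamp must be enabled; table and column names are made up):

  SELECT pg_xact_commit_timestamp(xmin) AS committed_at,
         pg_xact_commit_origin(xmin)    AS origin
    FROM some_table
   WHERE id = 42;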

-- 
Simon Riggs    http://www.2ndQuadrant.com/

Mission Critical Databases


Re: track_planning causing performance regression

2020-06-30 Thread Fujii Masao




On 2020/06/30 20:30, Ants Aasma wrote:

On Tue, 30 Jun 2020 at 08:43, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

 > The problem looks to be that spinlocks are terrible with overloaded CPU 
and a contended spinlock. A process holding the spinlock might easily get 
scheduled out leading to excessive spinning by everybody. I think a simple thing 
to try would be to replace the spinlock with LWLock.

Yes. Attached is the POC patch that replaces per-counter spinlock with 
LWLock.


Great. I think this is the one that should get considered for testing.

 > I did a prototype patch that replaces spinlocks with futexes, but was 
not able to find a workload where it mattered.

I'm not familiar with futex, but could you tell me why you used futex 
instead
of LWLock that we already have? Is futex portable?


Futex is a Linux kernel call that allows to build a lock that has uncontended 
cases work fully in user space almost exactly like a spinlock, while falling 
back to syscalls that wait for wakeup in case of contention. It's not portable, 
but probably something similar could be implemented for other operating 
systems. I did not pursue this further because it became apparent that every 
performance critical spinlock had already been removed.

To be clear, I am not advocating for this patch to get included. I just had the 
patch immediately available and it could have confirmed that using a better 
lock fixes things.


Understood. Thanks for the explanation!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-30 Thread Fujii Masao




On 2020/06/30 7:29, Peter Geoghegan wrote:

On Mon, Jun 29, 2020 at 3:23 PM Peter Geoghegan  wrote:

+1 -- this regression seems unacceptable to me.


I added an open item to track this.


Thanks!
I'm thinking of changing the default value of track_planning to off for v13.

Ants and Andres suggested replacing the spinlock used in pgss_store() with
an LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?
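
If the default is flipped to off, instances where the plan-time statistics are
worth the overhead could simply re-enable it (a sketch; it assumes
pg_stat_statements is already in shared_preload_libraries):

  ALTER SYSTEM SET pg_stat_statements.track_planning = on;
  SELECT pg_reload_conf();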

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Implement <null treatment> for window functions

2020-06-30 Thread Vik Fearing
This feature adds RESPECT NULLS and IGNORE NULLS syntax to several
window functions, according to the SQL Standard.

Unlike the last time this was attempted[1], my version does not hardcode
the spec's list of functions that this applies to.  Instead, it accepts
it for all true window functions (that is, it does not apply to
aggregates acting as window functions).

This patch also does not attempt to solve the FROM LAST problem.  That
remains unimplemented.
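
For example, with the patch the standard null-treatment syntax is accepted
directly (a small sketch; the table and column names are made up):

  SELECT ts,
         lag(val) IGNORE NULLS  OVER (ORDER BY ts) AS prev_nonnull_val,
         lag(val) RESPECT NULLS OVER (ORDER BY ts) AS prev_val
    FROM readings;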

For the CREATE FUNCTION syntax, I used TREAT NULLS so as to avoid
creating new keywords.

The second patch adds some new window functions in order to test that
the null treatment works correctly for cases that aren't covered by the
standard functions but that custom functions might want to use.  It is
*not* intended to be committed; I am only submitting the first patch for
inclusion in core.

This is based off of 324435eb14.

[1]
https://www.postgresql.org/message-id/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com
-- 
Vik Fearing
From 9ea72a4fe144f8ffdccb4b53eb4ea97e24ce61f7 Mon Sep 17 00:00:00 2001
From: Vik Fearing 
Date: Sun, 31 May 2020 07:29:35 +0200
Subject: [PATCH 1/2] implement <null treatment> for window functions

---
 doc/src/sgml/func.sgml|  19 +-
 doc/src/sgml/ref/create_function.sgml |  12 +
 doc/src/sgml/syntax.sgml  |  14 +
 src/backend/catalog/pg_aggregate.c|   3 +-
 src/backend/catalog/pg_proc.c |   8 +
 src/backend/commands/functioncmds.c   |  29 +-
 src/backend/commands/typecmds.c   |   1 +
 src/backend/executor/nodeWindowAgg.c  | 463 +++---
 src/backend/nodes/copyfuncs.c |   2 +
 src/backend/nodes/equalfuncs.c|   2 +
 src/backend/nodes/makefuncs.c |   1 +
 src/backend/nodes/outfuncs.c  |   2 +
 src/backend/nodes/readfuncs.c |   1 +
 src/backend/optimizer/util/clauses.c  |   1 +
 src/backend/parser/gram.y |  24 +-
 src/backend/parser/parse_func.c   |  27 +-
 src/backend/utils/adt/ruleutils.c |  12 +-
 src/include/catalog/pg_proc.dat   |  27 +-
 src/include/catalog/pg_proc.h |   4 +
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   8 +
 src/include/parser/kwlist.h   |   2 +
 src/include/parser/parse_func.h   |   3 +-
 .../regress/expected/create_function_3.out|   8 +
 .../regress/expected/create_procedure.out |   4 +
 src/test/regress/expected/misc_sanity.out |  10 +
 src/test/regress/expected/window.out  | 211 
 src/test/regress/sql/create_function_3.sql|   7 +
 src/test/regress/sql/create_procedure.sql |   1 +
 src/test/regress/sql/misc_sanity.sql  |   9 +
 src/test/regress/sql/window.sql   | 101 
 31 files changed, 812 insertions(+), 205 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f065856535..388b942d57 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19722,6 +19722,14 @@ SELECT count(*) FROM sometable;
about frame specifications.
   
 
+  
+   The functions lead, lag,
+   first_value, last_value, and
+   nth_value accept a null treatment option which is
+   RESPECT NULLS or IGNORE NULLS.
+   If this option is not specified, the default is RESPECT NULLS.
+  
+
   
When an aggregate function is used as a window function, it aggregates
over the rows within the current row's window frame.
@@ -19735,14 +19743,9 @@ SELECT count(*) FROM sometable;
 
   

-The SQL standard defines a RESPECT NULLS or
-IGNORE NULLS option for lead, lag,
-first_value, last_value, and
-nth_value.  This is not implemented in
-PostgreSQL: the behavior is always the
-same as the standard's default, namely RESPECT NULLS.
-Likewise, the standard's FROM FIRST or FROM LAST
-option for nth_value is not implemented: only the
+The SQL standard defines a FROM FIRST or FROM LAST
+option for nth_value.  This is not implemented in
+PostgreSQL: only the
 default FROM FIRST behavior is supported.  (You can achieve
 the result of FROM LAST by reversing the ORDER BY
 ordering.)
diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml
index f81cedc823..aae967eeb8 100644
--- a/doc/src/sgml/ref/create_function.sgml
+++ b/doc/src/sgml/ref/create_function.sgml
@@ -28,6 +28,7 @@ CREATE [ OR REPLACE ] FUNCTION
   { LANGUAGE lang_name
 | TRANSFORM { FOR TYPE type_name } [, ... ]
 | WINDOW
+| TREAT NULLS
 | IMMUTABLE | STABLE | VOLATILE | [ NOT ] LEAKPROOF
 | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
 | [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
@@ -293,6 +294,17 @@ CREATE [ OR REPLACE ] FUNCTION
  
 
 
+
+ TREAT NULLS
+
+ 

Re: warnings for invalid function casts

2020-06-30 Thread Tom Lane
Peter Eisentraut  writes:
> There are three subplots:

> 1. Changing the return type of load_external_function() and 
> lookup_external_function() from PGFunction to a generic pointer type, 
> which is what the discussion in [0] started out about.

I feel like what you propose to do here is just shifting the problem
around: we're still casting from a function pointer that describes one
concrete call ABI to a function pointer that describes some other concrete
call ABI.  That is, "void (*ptr) (void)" is *not* disclaiming knowledge
of the function's signature, in the way that "void *ptr" disclaims
knowledge of what a data pointer points to.  So if current gcc fails to
warn about that, that's just a random and indeed obviously wrong decision
that they might change someday.

Re-reading the original discussion, it seems like what we have to do
if we want to suppress these warnings is to fully buy into POSIX's
assertion that casting between data and function pointers is OK:

Note that conversion from a void * pointer to a function pointer as in:
fptr = (int (*)(int)) dlsym(handle, "my_function");
is not defined by the ISO C standard. This standard requires this
conversion to work correctly on conforming implementations.

I suggest therefore that a logically cleaner solution is to keep the
result type of load_external_function et al as "void *", and have
callers cast that to the required specific function-pointer type,
thus avoiding ever casting between two function-pointer types.
(We could keep most of your patch as-is, but typedef GenericFunctionPtr
as "void *" not a function pointer, with some suitable commentary.)

> 2. There is a bit of cheating in dynahash.c.

It's slightly annoying that this fix introduces an extra layer of
function-call indirection.  Maybe that's not worth worrying about,
but I'm tempted to suggest that we could fix it on the same principle
with

hashp->keycopy = (HashCopyFunc) (void *) strlcpy;

> 3. Finally, there is some nonsense necessary in plpython, which is 
> annoying but otherwise uninteresting.

Again, it seems pretty random to me that this suppresses any warnings,
but it'd be less so if the intermediate cast were to "void *".

regards, tom lane




Re: min_safe_lsn column in pg_replication_slots view

2020-06-30 Thread Fujii Masao




On 2020/06/30 17:07, Fujii Masao wrote:



On 2020/06/26 13:45, Amit Kapila wrote:

On Fri, Jun 26, 2020 at 4:54 AM Alvaro Herrera  wrote:


On 2020-Jun-26, Michael Paquier wrote:


On Thu, Jun 25, 2020 at 11:24:27AM -0400, Alvaro Herrera wrote:

I don't understand the proposal.  Michael posted a patch that adds
pg_wal_oldest_lsn(), and you say we should apply the patch except the
part that adds that function -- so what part would be applying?


I have sent last week a patch about only the removal of min_safe_lsn:
https://www.postgresql.org/message-id/20200619121552.gh453...@paquier.xyz
So this applies to this part.


Well, I oppose that because it leaves us with no way to monitor slot
limits.  In his opening email, Masao-san proposed to simply change the
value by adding 1.  How you go from adding 1 to a column to removing
the column completely with no recourse, is beyond me.

Let me summarize the situation and possible ways forward as I see them.
If I'm mistaken, please correct me.

Problems:
i)  pg_replication_slot.min_safe_lsn has a weird definition in that all
 replication slots show the same value



It is also not clear how the user can make use of that value?


ii) min_safe_lsn cannot be used with pg_walfile_name, because it returns
 the name of the previous segment.

Proposed solutions:

a) Do nothing -- keep the min_safe_lsn column as is.  Warn users that
    pg_walfile_name should not be used with this column.
b) Redefine min_safe_lsn to be lsn+1, so that pg_walfile_name can be used
    and return a useful value.
c) Remove min_safe_lsn; add functions that expose the same value
d) Remove min_safe_lsn; add a new view that exposes the same value and
    possibly others

e) Replace min_safe_lsn with a "distance" column, which reports
    restart_lsn - oldest valid LSN
    (Note that you no longer have an LSN in this scenario, so you can't
    call pg_walfile_name.)


I like (e).



Can we consider an option to "Remove min_safe_lsn; document how a user
can monitor the distance"?  We have a function to get current WAL
insert location and other things required are available either via
view or as guc variable values.  The reason I am thinking of this
option is that it might be better to get some more feedback on what is
the most appropriate value to display.  However, I am okay if we can
reach a consensus on one of the above options.


Yes, that's an idea. But it might not be easy to calculate that distance
manually by subtracting max_slot_wal_keep_size from the current LSN.
Because we've not supported -(pg_lsn, numeric) operator yet. I'm
proposing that operator, but it's for v14.


Sorry this is not true. That distance can be calculated without those operators.
For example,

SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 
1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size') distance FROM 
pg_replication_slots;

If the calculated distance is a small or negative value, we may lose
some required WAL files. So in this case it's worth considering
increasing max_slot_wal_keep_size.

I still think it's better and more helpful to display something like
that distance in pg_replication_slots rather than making each user
calculate it...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




max_slot_wal_keep_size and wal_keep_segments

2020-06-30 Thread Fujii Masao

Hi,

When I talked about max_slot_wal_keep_size as a new feature in v13
at a conference, I received a question from the audience like "Why are the
units of the setting values in max_slot_wal_keep_size and wal_keep_segments
different?". That difference looks confusing for users, and
IMO it's better to use the same unit for both. Thought?

There seem to be several options to do this.

(1) Rename max_slot_wal_keep_size to max_slot_wal_keep_segments,
and make users specify the number of WAL segments in it instead of
WAL size.

(2) Rename wal_keep_segments to wal_keep_size, and make users specify
 the WAL size in it instead of the number of WAL segments.

(3) Don't rename the parameters, and allow users to specify not only
the number of WAL segments but also the WAL size in wal_keep_segments.

Since we have been moving away from measuring in segments, e.g.,
max_wal_size, I don't think (1) is a good idea.

For backward compatibility, (3) is better. But it needs more
(maybe a bit complicated) code in guc.c. Also the parameter names
would still not be consistent (i.e., _segments and _size).

So for now I like (2).

Thought?
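
To make option (2) concrete (a sketch; wal_keep_size is the proposed name and
does not exist yet), with the default 16MB segment size the following two
settings would keep the same amount of WAL:

  -- today, measured in segments
  ALTER SYSTEM SET wal_keep_segments = 64;
  -- proposed, measured in size, 64 segments * 16MB
  ALTER SYSTEM SET wal_keep_size = '1GB';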

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: some more pg_dump refactoring

2020-06-30 Thread Fabien COELHO



Hello,


You changed the query strings to use "\n" instead of " ". I would not have
changed that, because it departs from the style around, and I do not think
it improves readability at the C code level.


This was the style that was introduced by 
daa9fe8a5264a3f192efa5ddee8fb011ad9da365.


Yep. This does not imply that it is better, or worse. Currently it is not
consistent within the file, and there are only a few instances, so maybe it
could be discussed anyway.


After giving it some thought, I'd say that at least I'd like the query to 
be easy to read when dumped. This is not incompatible with some embedded 
eol, on the contrary, but ISTM that it could keep some indentation as 
well, which would be kind-of a middle ground. For readability, I'd also 
consider turning keywords to upper case. Maybe it could look like:


  "SELECT\n"
  "  somefield,\n"
  "  someotherfiled,\n"
  ...
  "FROM some_table\n"
  "JOIN ... ON ...\n" ...

All this is highly debatable, so ignore if you feel like it.


Would it make sense to accumulate in the other direction, older to newer,
so that new attributes are added at the end of the select?


I think that could make sense, but the current style was introduced by 
daa9fe8a5264a3f192efa5ddee8fb011ad9da365.  Should we revise that?


It seems to me more logical to do it while you're at it, but you are the 
one writing the patches:-)



Should array_to_string be pg_catalog.array_to_string? All other calls seem
to have an explicit schema.


It's not handled fully consistently in pg_dump.  But my understanding is that 
this is no longer necessary because pg_dump explicitly sets the search path 
before running.


Dunno. It does not look consistent with a mix, because the wary reviewer
thinks that there may be a potential bug:-) ISTM that explicit is better
than implicit in this context: not relying on the search path would allow
testing the query independently, and anyway what you see is what you get.


Otherwise: v2 patch applies cleanly, compiles, global make check ok, 
pg_dump tap ok.


"progargnames" is added in both branches of an if, which looks awkward. 
I'd suggest adding it once unconditionally.


I cannot test easily on older versions.

--
Fabien.




Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators

2020-06-30 Thread Fujii Masao




On 2020/06/30 19:54, Asif Rehman wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

The patch looks good to me.

The new status of this patch is: Ready for Committer


Thanks for the review! Pushed.
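
For completeness, a minimal usage sketch of the new operators (hedged on the
exact right-hand argument type that was committed -- the thread title says
int8 -- an integer literal should resolve through implicit casting either way):

  SELECT '0/16B3748'::pg_lsn + 10 AS plus_10_bytes,
         '0/16B3748'::pg_lsn - 10 AS minus_10_bytes;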

Regards,  


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-06-30 Thread Bruce Momjian
On Tue, Jun 30, 2020 at 12:23:28PM +0900, Masahiko Sawada wrote:
> > I propose to have a new session level GUC called
> > "enable_connectioncache"(name can be changed if it doesn't correctly
> > mean the purpose) with the default value being true which means that
> > all the remote connections are cached. If set to false, the
> > connections are not cached, and remote sessions are closed by the local
> > backend/session at
> > the end of each remote transaction.
> 
> I've not looked at your patch deeply but if this problem is talking
> only about postgres_fdw I think we should improve postgres_fdw, not
> adding a GUC to the core. It’s not that all FDW plugins use connection
> cache and postgres_fdw’s connection cache is implemented within
> postgres_fdw, I think we should focus on improving postgres_fdw. I
> also think it’s not a good design that the core manages connections to
> remote servers connected via FDW. I wonder if we can add a
> postgres_fdw option for this purpose, say keep_connection [on|off].
> That way, we can set it per server so that remote connections to the
> particular server don’t remain idle.

I thought we would add a core capability, idle_session_timeout, which
would disconnect idle sessions, and the postgres_fdw would use that.  We
have already had requests for idle_session_timeout, but avoided it
because it seemed better to tell people to monitor pg_stat_activity and
terminate sessions that way, but now that postgres_fdw needs it too,
there might be enough of a requirement to add it.
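
For reference, the pg_stat_activity approach mentioned above looks roughly
like this (a sketch; the five-minute threshold is arbitrary):

  SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
   WHERE state = 'idle'
     AND state_change < now() - interval '5 minutes'
     AND pid <> pg_backend_pid();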

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: pg_read_file() with virtual files returns empty string

2020-06-30 Thread Joe Conway
On 6/28/20 6:00 PM, Tom Lane wrote:
> Joe Conway  writes:
>> All good stuff -- I believe the attached checks all the boxes.
> 
> Looks okay to me, except I think you want
> 
> ! if (bytes_to_read > 0)
> 
> to be
> 
> ! if (bytes_to_read >= 0)

Yep -- thanks.

I did some performance testing of the worst case/largest possible file and found
that skipping the stat and bulk read does cause a significant regression.
Current HEAD takes about 400ms on my desktop, and with that version of the patch
more like 1100ms.

In the attached patch I was able to get most of the performance degradation back
-- ~600ms. Hopefully you don't think what I did was "too cute by half" :-). Do
you think this is good enough or should we go back to using the stat file size
when it is > 0?

As noted in the comment, the downside of that method is that the largest
supported file size is 1 byte smaller when "reading the entire file" versus
"reading a specified size" due to StringInfo reserving the last byte for a
trailing null.
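
For context, the virtual-file case being discussed is the whole-file form of
the call, e.g. (a sketch; whether a given absolute path is readable depends on
the caller's privileges and the server's file-access rules):

  SELECT length(pg_read_file('/proc/self/status'));
  -- per the thread title, this previously returned an empty string, because
  -- stat() reports a zero size for such files and the whole-file path
  -- trusted that size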

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/adminpack/expected/adminpack.out b/contrib/adminpack/expected/adminpack.out
index 5738b0f..edf3ebf 100644
*** a/contrib/adminpack/expected/adminpack.out
--- b/contrib/adminpack/expected/adminpack.out
*** SELECT pg_file_rename('test_file1', 'tes
*** 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not stat file "test_file1": No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
--- 79,85 
  (1 row)
  
  SELECT pg_read_file('test_file1');  -- not there
! ERROR:  could not open file "test_file1" for reading: No such file or directory
  SELECT pg_read_file('test_file2');
   pg_read_file 
  --
*** SELECT pg_file_rename('test_file2', 'tes
*** 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not stat file "test_file2": No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
--- 108,114 
  (1 row)
  
  SELECT pg_read_file('test_file2');  -- not there
! ERROR:  could not open file "test_file2" for reading: No such file or directory
  SELECT pg_read_file('test_file3');
   pg_read_file 
  --
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa618..33db576 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
***
*** 36,41 
--- 36,42 
  #include "utils/syscache.h"
  #include "utils/timestamp.h"
  
+ #define READBUF_SIZE	8192
  
  /*
   * Convert a "text" filename argument to C string, and check it's allowable.
*** read_binary_file(const char *filename, i
*** 106,138 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes;
  	FILE	   *file;
  
! 	if (bytes_to_read < 0)
! 	{
! 		if (seek_offset < 0)
! 			bytes_to_read = -seek_offset;
! 		else
! 		{
! 			struct stat fst;
! 
! 			if (stat(filename, &fst) < 0)
! 			{
! if (missing_ok && errno == ENOENT)
! 	return NULL;
! else
! 	ereport(ERROR,
! 			(errcode_for_file_access(),
! 			 errmsg("could not stat file \"%s\": %m", filename)));
! 			}
! 
! 			bytes_to_read = fst.st_size - seek_offset;
! 		}
! 	}
! 
! 	/* not sure why anyone thought that int64 length was a good idea */
! 	if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
--- 107,117 
   bool missing_ok)
  {
  	bytea	   *buf;
! 	size_t		nbytes = 0;
  	FILE	   *file;
  
! 	/* clamp request size to what we can actually deliver */
! 	if (bytes_to_read > (int64) (MaxAllocSize - VARHDRSZ))
  		ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("requested length too large")));
*** read_binary_file(const char *filename, i
*** 154,162 
  (errcode_for_file_access(),
   errmsg("could not seek in file \"%s\": %m", filename)));
  
! 	buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
  
! 	nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
  
  	if (ferror(file))
  		ereport(ERROR,
--- 133,173 
  (errcode_for_file_access(),
   errmsg("could not seek in file \"%s\": %m", filename)));
  
! 	if (bytes_to_read >= 0)
! 	{
! 		/* If passed explicit read size just do it */
! 		buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
  
! 		nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
! 	}
! 	else
! 	{
! 		/* Negative read size, read rest of file */
! 		StringInfoData	sbuf;
! 		size_t			rbytes = 0;
! 		char			rbuf[READBUF_SIZE];
! 
! 		initStringInfo(&sbuf);
! 		buf = (bytea *) sbuf.data;
! 		sbuf.len += VARHDRSZ;
! 
! 		while (!(feof(file) || ferror(file)))
! 		{
! 			rbytes = fread(rbuf, 1, (size_t) READBUF_SIZE,

Re: Implement <null treatment> for window functions

2020-06-30 Thread Dagfinn Ilmari Mannsåker
Vik Fearing  writes:

> The second patch adds some new window functions in order to test that
> the null treatment works correctly for cases that aren't covered by the
> standard functions but that custom functions might want to use.  It is
> *not* intended to be committed; I am only submitting the first patch for
> inclusion in core.

Would it make sense to add them as a test extension under
src/test/modules/?

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen




Intermittent BRIN failures on hyrax and lousyjack

2020-06-30 Thread Tom Lane
Buildfarm member hyrax has shown this failure twice recently:

--- 
/home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/brin.out   
2020-01-23 11:10:05.730014075 -0500
+++ 
/home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/brin.out
2020-06-30 03:50:23.651196117 -0400
@@ -490,6 +490,7 @@
 INSERT INTO brintest_2 VALUES (numrange(0, 2^1000::numeric));
 INSERT INTO brintest_2 VALUES ('(-1, 0)');
 SELECT brin_desummarize_range('brinidx', 0);
+WARNING:  leftover placeholder tuple detected in BRIN index "brinidx", deleting
  brin_desummarize_range 
 
  
This happened on HEAD:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-06-30%2001%3A41%3A50
and v13:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-06-17%2005%3A50%3A46
and lousyjack has also shown it once on HEAD:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-02%2013%3A03%3A04

The warning is from this bit (commit c655899ba):

/*
 * Because of ShareUpdateExclusive lock, this function shouldn't run
 * concurrently with summarization.  Placeholder tuples can only exist as
 * leftovers from crashed summarization, so if we detect any, we complain
 * but proceed.
 */
if (BrinTupleIsPlaceholder(tup))
ereport(WARNING,
(errmsg("leftover placeholder tuple detected in BRIN index 
\"%s\", deleting",
RelationGetRelationName(idxrel))));

Now, there was no preceding crash in these tests, so the comment's claim
is evidently a lie.  But what's going wrong?  The postmaster log provides
a strong hint:

2020-06-30 03:50:14.603 EDT [3699:48] pg_regress/brin LOG:  statement: SELECT 
brin_desummarize_range('brinidx', 0);
2020-06-30 03:50:15.894 EDT [3699:49] pg_regress/brin LOG:  process 3699 still 
waiting for ShareUpdateExclusiveLock on relation 24795 of database 16384 after 
1000.094 ms
2020-06-30 03:50:15.894 EDT [3699:50] pg_regress/brin DETAIL:  Process holding 
the lock: 7237. Wait queue: 3699.
2020-06-30 03:50:15.894 EDT [3699:51] pg_regress/brin STATEMENT:  SELECT 
brin_desummarize_range('brinidx', 0);
2020-06-30 03:50:15.895 EDT [7237:1] ERROR:  canceling autovacuum task
2020-06-30 03:50:15.895 EDT [7237:2] CONTEXT:  while cleaning up index 
"brinidx" of relation "public.brintest"
automatic vacuum of table "regression.public.brintest"
2020-06-30 03:50:15.899 EDT [3699:52] pg_regress/brin LOG:  process 3699 
acquired ShareUpdateExclusiveLock on relation 24795 of database 16384 after 
1001.626 ms
2020-06-30 03:50:15.899 EDT [3699:53] pg_regress/brin STATEMENT:  SELECT 
brin_desummarize_range('brinidx', 0);
2020-06-30 03:50:16.018 EDT [3699:54] pg_regress/brin WARNING:  leftover 
placeholder tuple detected in BRIN index "brinidx", deleting
2020-06-30 03:50:16.031 EDT [3699:55] pg_regress/brin LOG:  statement: SELECT 
brin_summarize_range('brinidx', 0);

I think the "crash" actually was the forced autovac cancellation we see
here.  Thus, the fact that both these animals use CLOBBER_CACHE_ALWAYS is
not a direct cause of the failure, though it might contribute to getting
the timing right for this to happen.

So (1) the comment needs to be adjusted to mention that vacuum
cancellation is enough to create the situation, and (2) probably the
message needs to be downgraded quite a lot, DEBUG2 or even less.
Or maybe we could just delete the quoted stanza altogether.
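For concreteness, the downgrade in (2) might look something like this (just a
sketch of the idea, not a tested change):

    /*
     * Placeholder tuples can be left behind by crashed or cancelled
     * summarization (e.g. a forced autovacuum cancellation), so finding
     * one here is not worth alarming the user about.
     */
    if (BrinTupleIsPlaceholder(tup))
        elog(DEBUG2, "leftover placeholder tuple detected in BRIN index \"%s\", deleting",
             RelationGetRelationName(idxrel));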

regards, tom lane




Re: POC: postgres_fdw insert batching

2020-06-30 Thread Tomas Vondra

On Mon, Jun 29, 2020 at 04:22:15PM +0530, Ashutosh Bapat wrote:

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
 wrote:



FDW batching: 4584 ms

So, rather nice improvement, I'd say ...


Very nice.



Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.


1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built-into
COPY).


Amit Langote has pointed out a related patch being discussed on hackers at [1].

That patch introduces a new API. But if we can do it without
introducing a new API that will be good. FDWs which can support
batching can just modify their code and don't have to implement and
manage a new API. We already have a handful of those APIs.



I don't think extending the API is a big issue - the FDW code will need
changing anyway, so this seems minor.
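
Just to make that concrete, a batched callback might look roughly like the
following (the name and signature are invented here purely for illustration;
the patch does not define this yet):

    /* hypothetical batched variant of ExecForeignInsert */
    typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
                                                                 ResultRelInfo *rinfo,
                                                                 TupleTableSlot **slots,
                                                                 TupleTableSlot **planSlots,
                                                                 int *numSlots);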

I'll take a look at the COPY patch - I agree it seems like a good idea,
although it can be less convenient in various cases (e.g. I've seen a lot
of INSERT ... SELECT queries in sharded systems, etc.). 



2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.


I didn't get this. We are executing an INSERT on the foreign server,
so we get the number of rows INSERTed from that server. We should just
add those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.



True, but it's not the FDW code doing the counting - it's the caller,
depending on whether ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return the number of inserted tuples
as reported by the remote server.



The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.



It will be a rare case when a bulk load also has a RETURNING clause.
So, we can live with this restriction. We should try to choose a
design which allows that restriction to be lifted in the future. But I
doubt that restriction will be a serious one.



3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.


Bulk INSERTs are more common in a sharded environment because of data
load in say OLAP systems. Bulk update/delete are rare, although not
that rare. So if an approach just supports bulk insert and not bulk
UPDATE/DELETE that will address a large number of usecases IMO. But if
we can make everything work together that would be good as well.

In your patch, I see that an INSERT statement with batch is
constructed as INSERT INTO ... VALUES (...), (...) as many values as
the batch size. That won't work as is for UPDATE/DELETE since we can't
pass multiple pairs of ctids and columns to be updated for each ctid
in one statement. Maybe we could build as many UPDATE/DELETE
statements as the size of a batch, but that would be ugly. What we
need is a feature like a batch prepared statement in libpq similar to
what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a
batch of parameters, each batch corresponding to one foreign DML
statement.



I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.




4) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.


I think this won't support RETURNING as well. But if we could somehow
use the COPY protocol to send the data to the foreign server and yet treat
it as INSERT, that might work. I think we have to find out which performs
better, COPY or batch INSERT.



I don't see why we couldn't support both; the use cases are somewhat
different, I think.



For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for

Re: estimation problems for DISTINCT ON with FDW

2020-06-30 Thread Tom Lane
Jeff Janes  writes:
> It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign
> side, which is probably the best way to run the query.  I guess it makes
> sense that FDW machinery in general doesn't want to try to push a
> PostgreSQL specific construct.

Well, that's an unimplemented feature anyway.  But people hared off after
that without addressing your actual bug report:

> But much worse than that, it horribly misestimates the number of unique rows
> it will get back, having never asked the remote side for an estimate of
> that.

I poked into that and found that the problem is in estimate_num_groups,
which effectively just disregards any relation that has rel->tuples = 0.
That is the case for a postgres_fdw foreign table if use_remote_estimate
is true, because postgres_fdw never bothers to set any other value.
(On the other hand, if use_remote_estimate is false, it does fill in a
pretty-bogus value, mainly so it can use set_baserel_size_estimates.
See postgresGetForeignRelSize.)

It seems like we could make estimate_num_groups a bit more robust here;
it could just skip its attempts to clamp based on total size or
restriction selectivity, but still include the reldistinct value for the
rel into the total numdistinct.  I wonder though if this is the only
problem caused by failing to fill in any value for rel->tuples ...
should we make postgres_fdw install some value for that?

(Note that the question of whether we should ask the remote server for
an estimate of ndistinct is kind of orthogonal to any of these points.
Even if we had obtained one that way, estimate_num_groups would not pay
any attention to it without a fix for the point at hand.)

regards, tom lane




SQL-standard function body

2020-06-30 Thread Peter Eisentraut
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE 
statements for language SQL with a function body that conforms to the 
SQL standard and is portable to other implementations.


Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

 CREATE FUNCTION add(a integer, b integer) RETURNS integer
 LANGUAGE SQL
 RETURN a + b;

or as a block

 CREATE PROCEDURE insert_data(a integer, b integer)
 LANGUAGE SQL
 BEGIN ATOMIC
   INSERT INTO tbl VALUES (a);
   INSERT INTO tbl VALUES (b);
 END;

The function body is parsed at function definition time and stored as
expression nodes in probin.  So at run time, no further parsing is
required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Also, per SQL standard, LANGUAGE SQL is the default, so it does not
need to be specified anymore.

Note: Some parts of the patch look better under git diff -w (ignoring 
whitespace changes) because if/else blocks were introduced around 
existing code.


TODOs and discussion points:

- pg_dump is not yet supported.  As a consequence, the pg_upgrade
tests don't pass yet.  I'm thinking about changing pg_dump to use 
pg_get_functiondef here instead of coding everything by hand.  Some 
initial experimenting showed that this would be possible with minimal 
tweaking and it would surely be beneficial in the long run.


- The compiled function body is stored in the probin field of pg_proc.
This matches the historical source/binary split seen in adsrc/adbin and
consrc/conbin, but that arrangement has since been abandoned.  Also, this
field should ideally be of type pg_node_tree, so reusing probin for that
is probably not good.  Seems like a new field might be best.


- More test coverage is needed.  Surprisingly, there wasn't actually any 
test AFAICT that just creates an SQL function and runs it.  Most of 
that code is tested incidentally, but there is very little or no 
targeted testing of this functionality.


- Some of the changes in pg_proc.c, functioncmds.c, and functions.c in 
particular were jammed in and could use some reorganization after the 
basic ideas are solidified.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 9611034103216bf57a76546cc212786fa8fe5b73 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 30 Jun 2020 19:42:08 +0200
Subject: [PATCH v1] SQL-standard function body

This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

CREATE FUNCTION add(a integer, b integer) RETURNS integer
LANGUAGE SQL
RETURN a + b;

or as a block

CREATE PROCEDURE insert_data(a integer, b integer)
LANGUAGE SQL
BEGIN ATOMIC
  INSERT INTO tbl VALUES (a);
  INSERT INTO tbl VALUES (b);
END;

The function body is parsed at function definition time and stored as
expression nodes in probin.  So at run time, no further parsing is
required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Also, per SQL standard, LANGUAGE SQL is the default, so it does not
need to be specified anymore.

TODO: pg_dump is not yet supported.  As a consequence, the pg_upgrade
tests don't pass yet.
---
 doc/src/sgml/ref/create_function.sgml | 126 +++--
 doc/src/sgml/ref/create_procedure.sgml|  62 ++-
 src/backend/catalog/pg_proc.c | 148 ---
 src/backend/commands/aggregatecmds.c  |   2 +
 src/backend/commands/functioncmds.c   |  96 +++---
 src/backend/executor/functions.c  |  79 
 src/backend/nodes/copyfuncs.c |  15 ++
 src/backend/nodes/equalfuncs.c|  13

Re: SQL-standard function body

2020-06-30 Thread Robert Haas
On Tue, Jun 30, 2020 at 1:49 PM Peter Eisentraut
 wrote:
> This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
> statements for language SQL with a function body that conforms to the
> SQL standard and is portable to other implementations.

With what other implementations is it compatible?

> The function body is parsed at function definition time and stored as
> expression nodes in probin.  So at run time, no further parsing is
> required.
>
> However, this form does not support polymorphic arguments, because
> there is no more parse analysis done at call time.
>
> Dependencies between the function and the objects it uses are fully
> tracked.
>
> A new RETURN statement is introduced.  This can only be used inside
> function bodies.  Internally, it is treated much like a SELECT
> statement.
>
> psql needs some new intelligence to keep track of function body
> boundaries so that it doesn't send off statements when it sees
> semicolons that are inside a function body.
>
> Also, per SQL standard, LANGUAGE SQL is the default, so it does not
> need to be specified anymore.

Hmm, this all seems like a pretty big semantic change. IIUC, right
now, a SQL function can only contain one statement, but it seems like
with this patch you can have a block in there with a bunch of
statements, sorta like plpgsql. But probably you don't have all of the
functionality of plpgsql available. Also, the fact that you're doing
parsing earlier means that e.g. creating a table and inserting into it
won't work. Maybe that's fine. But it almost seems like you are
inventing a whole new PL

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: SQL-standard function body

2020-06-30 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> Hmm, this all seems like a pretty big semantic change. IIUC, right
> now, a SQL function can only contain one statement, but it seems like
> with this patch you can have a block in there with a bunch of
> statements, sorta like plpgsql.

From our docs:

CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
UPDATE bank
SET balance = balance - debit
WHERE accountno = tf1.accountno;
SELECT 1;
$$ LANGUAGE SQL;

https://www.postgresql.org/docs/current/xfunc-sql.html

Haven't looked at the patch, tho if it adds support for something the
SQL standard defines, that generally seems like a positive to me.

Thanks,

Stephen




Re: min_safe_lsn column in pg_replication_slots view

2020-06-30 Thread Alvaro Herrera
On 2020-Jun-30, Fujii Masao wrote:

> Sorry this is not true. That distance can be calculated without those 
> operators.
> For example,
> 
> SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric * 1024 * 
> 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size') distance FROM 
> pg_replication_slots;
> 
> If the calculated distance is small or negative value, which means that
> we may lose some required WAL files. So in this case it's worth considering
> to increase max_slot_wal_keep_size.

... OK, but you're forgetting wal_keep_segments.

> I still think it's better and more helpful to display something like
> that distance in pg_replication_slots rather than making each user
> calculate it...

Agreed.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SQL-standard function body

2020-06-30 Thread Pavel Stehule
út 30. 6. 2020 v 19:58 odesílatel Robert Haas 
napsal:

> On Tue, Jun 30, 2020 at 1:49 PM Peter Eisentraut
>  wrote:
> > This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
> > statements for language SQL with a function body that conforms to the
> > SQL standard and is portable to other implementations.
>
> With what other implementations is it compatible?
>
> > The function body is parsed at function definition time and stored as
> > expression nodes in probin.  So at run time, no further parsing is
> > required.
> >
> > However, this form does not support polymorphic arguments, because
> > there is no more parse analysis done at call time.
> >
> > Dependencies between the function and the objects it uses are fully
> > tracked.
> >
> > A new RETURN statement is introduced.  This can only be used inside
> > function bodies.  Internally, it is treated much like a SELECT
> > statement.
> >
> > psql needs some new intelligence to keep track of function body
> > boundaries so that it doesn't send off statements when it sees
> > semicolons that are inside a function body.
> >
> > Also, per SQL standard, LANGUAGE SQL is the default, so it does not
> > need to be specified anymore.
>
> Hmm, this all seems like a pretty big semantic change. IIUC, right
> now, a SQL function can only contain one statement, but it seems like
> with this patch you can have a block in there with a bunch of
> statements, sorta like plpgsql. But probably you don't have all of the
> functionality of plpgsql available. Also, the fact that you're doing
> parsing earlier means that e.g. creating a table and inserting into it
> won't work. Maybe that's fine. But it almost seems like you are
> inventing a whole new PL
>

It is SQL/PSM and it would be nice to have it.

I am a little bit afraid about performance - SQL functions don't use the
plan cache or simple expressions. Without inlining it can be too slow.

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


Re: SQL-standard function body

2020-06-30 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 30, 2020 at 1:49 PM Peter Eisentraut
>  wrote:
>> This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
>> statements for language SQL with a function body that conforms to the
>> SQL standard and is portable to other implementations.

> With what other implementations is it compatible?

Yeah ... I'm sort of wondering exactly what this really accomplishes.
I think "portability" is a red herring unfortunately.

Tracking the dependencies of the function body sounds nice at first
glance, so it might be a feature.  But given our experiences with having
to use check_function_bodies = off to not have impossible dependency loops
in dump/restore, I rather wonder whether it'll be a net loss in practice.
IIUC, this implementation is flat out incapable of doing the equivalent of
check_function_bodies = off, and that sounds like trouble.

> Hmm, this all seems like a pretty big semantic change. IIUC, right
> now, a SQL function can only contain one statement,

Not true, you can have more.  However, it's nonetheless an enormous
semantic change, if only because the CREATE FUNCTION-time search_path
is now relevant instead of the execution-time path.  That *will*
break use-cases I've heard of, where the same function is applied
to different tables by adjusting the path.  It'd certainly be useful
from some perspectives (eg better security), but it's ... different.

Replicating the creation-time search path will be a big headache for
pg_dump, I bet.

> But it almost seems like you are
> inventing a whole new PL

Yes.  Having this replace the existing SQL PL would be a disaster,
because there are use-cases this simply can't meet (even assuming
that we can fix the polymorphism problem, which seems a bit unlikely).
We'd need to treat it as a new PL type.

Perhaps this is useful enough to justify all the work involved,
but I'm not sure.

regards, tom lane




Re: A patch for get origin from commit_ts.

2020-06-30 Thread Alvaro Herrera
On 2020-Jun-30, Michael Paquier wrote:

> Another question that has popped up when doing this review is what
> would be the use-case of adding this information at SQL level knowing
> that logical replication exists since 10?

Logical replication in core is a far cry from a fully featured
replication solution.  Kindly do not claim that we can now remove
features just because in-core logical replication does not use them;
this argument is ignoring the fact that we're still a long way from
developing actually powerful logical replication capabilities.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: new heapcheck contrib module

2020-06-30 Thread Alvaro Herrera
I think there are two very large patches here.  One adds checking of
heapam tables to amcheck, and the other adds a binary that eases calling
amcheck from the command line.  I think these should be two separate
patches.

I don't know what to think of a module contrib/pg_amcheck.  I kinda lean
towards fitting it in src/bin/scripts rather than as a contrib module.
However, it seems a bit weird that it depends on a contrib module.
Maybe amcheck should not be a contrib module at all but rather a new
extension in src/extensions/ that is compiled and installed (in the
filesystem, not in databases) by default.

I strongly agree with hardening backend code so that all the crashes
that Mark has found can be repaired.  (We discussed this topic
before[1]: we'd repair all crashes when run with production code, not
all assertion crashes.)

[1] https://postgr.es/m/20200513221051.GA26592@alvherre.pgsql

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SQL-standard function body

2020-06-30 Thread Tom Lane
I wrote:
> Replicating the creation-time search path will be a big headache for
> pg_dump, I bet.

On further thought, we probably don't have to.  Re-parsing the function
body the same way is exactly the same problem as re-parsing a view or
matview body the same way.  I don't want to claim that that's a 100%
solved problem, but I've heard few complaints in that area lately.

The point remains that exposing the function body's dependencies will
constrain restore order far more than we are accustomed to see.  It
might be possible to build examples that flat out can't be restored,
even granting that we teach pg_dump how to break dependency loops
by first creating the function with empty body and later redefining
it with the real body.  (Admittedly, if that's possible then you
likely could make it happen with views too.  But somehow it seems
more likely that people would create spaghetti dependencies for
functions than views.)

regards, tom lane




Re: track_planning causing performance regression

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 14:43:39 +0900, Fujii Masao wrote:
> > I did a prototype patch that replaces spinlocks with futexes, but was not 
> > able to find a workload where it mattered.
> 
> I'm not familiar with futex, but could you tell me why you used futex instead
> of LWLock that we already have? Is futex portable?

We can't rely on futexes, they're linux only. I also don't see much of a
reason to use spinlocks (rather than lwlocks) here in the first place.


> diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
> b/contrib/pg_stat_statements/pg_stat_statements.c
> index cef8bb5a49..aa506f6c11 100644
> --- a/contrib/pg_stat_statements/pg_stat_statements.c
> +++ b/contrib/pg_stat_statements/pg_stat_statements.c
> @@ -39,7 +39,7 @@
>   * in an entry except the counters requires the same.  To look up an entry,
>   * one must hold the lock shared.  To read or update the counters within
>   * an entry, one must hold the lock shared or exclusive (so the entry doesn't
> - * disappear!) and also take the entry's mutex spinlock.
> + * disappear!) and also take the entry's partition lock.
>   * The shared state variable pgss->extent (the next free spot in the external
>   * query-text file) should be accessed only while holding either the
>   * pgss->mutex spinlock, or exclusive lock on pgss->lock.  We use the mutex 
> to
> @@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = 
> PG_VERSION_NUM / 100;
>  
>  #define JUMBLE_SIZE  1024/* query serialization 
> buffer size */
>  
> +#define  PGSS_NUM_LOCK_PARTITIONS()  (pgss_max)
> +#define  PGSS_HASH_PARTITION_LOCK(key)   \
> + (&(pgss->base + \
> +(get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
> +
>  /*
>   * Extension version number, for supporting older extension versions' objects
>   */
> @@ -207,7 +212,7 @@ typedef struct pgssEntry
>   Sizequery_offset;   /* query text offset in external file */
>   int query_len;  /* # of valid bytes in 
> query string, or -1 */
>   int encoding;   /* query text encoding 
> */
> - slock_t mutex;  /* protects the counters only */
> + LWLock  *lock;  /* protects the counters only */
>  } pgssEntry;

Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is negligible
compared to the size of pgssEntry.
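
As a rough sketch of what that would look like (assuming a tranche id
obtained via LWLockNewTrancheId() at startup; names are illustrative):

    typedef struct pgssEntry
    {
        pgssHashKey key;            /* hash key of entry - MUST BE FIRST */
        Counters    counters;       /* the statistics for this query */
        Size        query_offset;   /* query text offset in external file */
        int         query_len;      /* # of valid bytes in query string, or -1 */
        int         encoding;       /* query text encoding */
        LWLock      lock;           /* protects the counters only */
    } pgssEntry;

    /* when the entry is created, instead of SpinLockInit(&entry->mutex): */
    LWLockInitialize(&entry->lock, pgss_tranche_id);

    /* in pgss_store(), instead of SpinLockAcquire()/SpinLockRelease(): */
    LWLockAcquire(&entry->lock, LW_EXCLUSIVE);
    /* ... update the counters ... */
    LWLockRelease(&entry->lock);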

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 14:30:03 +0300, Ants Aasma wrote:
> Futex is a Linux kernel call that allows to build a lock that has
> uncontended cases work fully in user space almost exactly like a spinlock,
> while falling back to syscalls that wait for wakeup in case of contention.
> It's not portable, but probably something similar could be implemented for
> other operating systems. I did not pursue this further because it became
> apparent that every performance critical spinlock had already been removed.

Our lwlock implementation does have that property already, though. While
the kernel wait is implemented using semaphores, those are implemented
using futexes internally (posix ones, not sysv ones, so only after
whatever version we switched the default to posix semas on linux).

I'd rather move towards removing spinlocks from postgres than making
their implementation more complicated...

Greetings,

Andres Freund




Re: warnings for invalid function casts

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 08:47:56 +0200, Peter Eisentraut wrote:
> Some time ago, there were some discussions about gcc warnings produced by
> -Wcast-function-type [0].  To clarify, while that thread seemed to imply
> that the warnings appear by default in some compiler version, this is not
> the case AFAICT, and the warnings are entirely optional.

Well, it's part of -Wextra. Which I think a fair number of people just
always enable...


> There are three subplots:
> 
> 1. Changing the return type of load_external_function() and
> lookup_external_function() from PGFunction to a generic pointer type, which
> is what the discussion in [0] started out about.

To a generic *function pointer type*, right?


> 2. There is a bit of cheating in dynahash.c.  The keycopy field is declared
> as a function pointer that returns a pointer to the destination, to match
> the signature of memcpy(), but then we assign strlcpy() to it, which returns
> size_t.  Even though we never use the return value, I'm not sure whether
> this could break if size_t and pointers are of different sizes, which in
> turn is very unlikely.

I agree that it's a low risk,


> Is there anything we want to pursue further here?

You mean whether we want to do further changes in the vein of yours, or
whether we want to apply your patch?

Greetings,

Andres Freund




Re: warnings for invalid function casts

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 10:15:05 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > There are three subplots:
> 
> > 1. Changing the return type of load_external_function() and 
> > lookup_external_function() from PGFunction to a generic pointer type, 
> > which is what the discussion in [0] started out about.
> 
> I feel like what you propose to do here is just shifting the problem
> around: we're still casting from a function pointer that describes one
> concrete call ABI to a function pointer that describes some other concrete
> call ABI.  That is, "void (*ptr) (void)" is *not* disclaiming knowledge
> of the function's signature, in the way that "void *ptr" disclaims
> knowledge of what a data pointer points to.  So if current gcc fails to
> warn about that, that's just a random and indeed obviously wrong decision
> that they might change someday.

ISTM that it's unlikely that they'd warn about casting from one
signature to another? That'd basically mean that you're not allowed to
cast function pointers at all anymore? There's a legitimate reason to
distinguish between pointers to functions and pointers to data - but
what'd be the point in forbidding all casts between different function
pointer types?


> > 2. There is a bit of cheating in dynahash.c.
> 
> It's slightly annoying that this fix introduces an extra layer of
> function-call indirection.  Maybe that's not worth worrying about,
> but I'm tempted to suggest that we could fix it on the same principle
> with

Hm. At first I was going to say that every compiler worth its salt
should be able to optimize the indirection, but that's probably not
generally true, due to returning dest "manually". If the wrapper instead
just added an explicit cast to the return type it'd presumably be ok.
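
E.g. a minimal wrapper along these lines (the name is invented; the variant
above would instead just cast strlcpy's size_t result to the declared
return type):

    static void *
    dynahash_keycopy(void *dest, const void *src, Size keysize)
    {
        strlcpy(dest, src, keysize);
        return dest;            /* match the declared keycopy signature */
    }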

Greetings,

Andres Freund




Re: SQL-standard function body

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 19:49:04 +0200, Peter Eisentraut wrote:
> The function body is parsed at function definition time and stored as
> expression nodes in probin.  So at run time, no further parsing is
> required.

As raw parse tree or as a parse-analysed tree? I assume the latter?

Isn't a consequence of that that we'd get a lot more errors if any DDL
is done to tables involved in the query? In contrast to other languages
we'd not be able to handle column type changes etc, right?

Greetings,

Andres Freund




Re: pg_bsd_indent compiles bytecode

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-29 21:27:48 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> The way that pg_bsd_indent defines its variables isn't standard C, as
> >> far as I can tell, which explains the errors I was getting. All the
> >> individual files include indent_globs.h, which declares/defines a bunch
> >> of variables. Since it doesn't use extern, they'll all end up as full
> >> definitions in each .o when -fno-common is used (the default now), hence
> >> the multiple definition errors. The only reason it works with -fcommon
> >> is that they'll end up processed as weak symbols and 'deduplicated' at
> >> link time.
> 
> > Ugh.  I agree that's pretty bogus, even if there's anything in the
> > C standard that allows it.  I'll put it on my to-do list.
> 
> I pushed the attached patch to the pg_bsd_indent repo.  Perhaps Piotr
> would like to absorb it into upstream.

Thanks!


> I don't intend to mark pg_bsd_indent with a new release number for
> this --- for people who successfully compiled, it's the same as before.

Makes sense to me.

Greetings,

Andres Freund




Re: warnings for invalid function casts

2020-06-30 Thread Tom Lane
Andres Freund  writes:
> On 2020-06-30 10:15:05 -0400, Tom Lane wrote:
>> I feel like what you propose to do here is just shifting the problem
>> around: we're still casting from a function pointer that describes one
>> concrete call ABI to a function pointer that describes some other concrete
>> call ABI.  That is, "void (*ptr) (void)" is *not* disclaiming knowledge
>> of the function's signature, in the way that "void *ptr" disclaims
>> knowledge of what a data pointer points to.  So if current gcc fails to
>> warn about that, that's just a random and indeed obviously wrong decision
>> that they might change someday.

> ISTM that it's unlikely that they'd warn about casting from one
> signature to another?

Uh, what?  Isn't that *exactly* what this warning class does?
If it doesn't do that, what good is it?  I mean, I can definitely
see the point of warning when you cast a function pointer to some
other not-ABI-compatible function pointer type, because that might
be a mistake, just like assigning "int *" to "double *" might be.

gcc 8's manual says

'-Wcast-function-type'
 Warn when a function pointer is cast to an incompatible function
 pointer.  In a cast involving function types with a variable
 argument list only the types of initial arguments that are provided
 are considered.  Any parameter of pointer-type matches any other
 pointer-type.  Any benign differences in integral types are
 ignored, like 'int' vs.  'long' on ILP32 targets.  Likewise type
 qualifiers are ignored.  The function type 'void (*) (void)' is
 special and matches everything, which can be used to suppress this
 warning.  In a cast involving pointer to member types this warning
 warns whenever the type cast is changing the pointer to member
 type.  This warning is enabled by '-Wextra'.

so it seems like they've already mostly crippled the type-safety of the
warning with the provision about "all pointer types are interchangeable"
:-(.  But they certainly are warning about *some* cases of casting one
signature to another.

In any case, I think the issue here is what is the escape hatch for saying
that "I know this cast is okay, don't warn about it, thanks".  Treating
"void (*) (void)" as special for that purpose is nothing more nor less
than a kluge, so another compiler might do it differently.  Given the
POSIX restriction, I think we could reasonably use "void *" instead.
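
For reference, the gcc-blessed kluge looks about like this (pg_funcptr_t is
just a hypothetical name for the generic type, and some_function stands in
for whatever pointer is being converted):

    /* gcc treats void (*) (void) as a wildcard for -Wcast-function-type */
    typedef void (*pg_funcptr_t) (void);

    /* double cast to say "yes, I really mean to change the signature" */
    PGFunction  fn = (PGFunction) (pg_funcptr_t) some_function;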

regards, tom lane




Re: SQL-standard function body

2020-06-30 Thread Tom Lane
Andres Freund  writes:
> On 2020-06-30 19:49:04 +0200, Peter Eisentraut wrote:
>> The function body is parsed at function definition time and stored as
>> expression nodes in probin.  So at run time, no further parsing is
>> required.

> Isn't a consequence of that that we'd get a lot more errors if any DDL
> is done to tables involved in the query? In contrast to other languages
> we'd not be able to handle column type changes etc, right?

I suppose it'd act like column references in a view, ie the dependency
mechanisms would forbid you from changing/dropping any column mentioned
in one of these functions.

regards, tom lane




Re: new heapcheck contrib module

2020-06-30 Thread Mark Dilger



> On Jun 30, 2020, at 11:44 AM, Alvaro Herrera  wrote:
> 
> I think there are two very large patches here.  One adds checking of
> heapam tables to amcheck, and the other adds a binary that eases calling
> amcheck from the command line.  I think these should be two separate
> patches.

contrib/amcheck has pretty limited regression test coverage.  I wrote
pg_amcheck in large part because the infrastructure I was writing for testing
contrib/amcheck was starting to look like a stand-alone tool, so I made it one.
I can split contrib/pg_amcheck into a separate patch, but I would expect
reviewers to use it to review contrib/amcheck.  Say the word, and I'll resubmit
as two separate patches.

> I don't know what to think of a module contrib/pg_amcheck.  I kinda lean
> towards fitting it in src/bin/scripts rather than as a contrib module.
> However, it seems a bit weird that it depends on a contrib module.

Agreed.

> Maybe amcheck should not be a contrib module at all but rather a new
> extension in src/extensions/ that is compiled and installed (in the
> filesystem, not in databases) by default.

Fine with me, but I'll have to see what others think about that.

> I strongly agree with hardening backend code so that all the crashes
> that Mark has found can be repaired.  (We discussed this topic
> before[1]: we'd repair all crashes when run with production code, not
> all assertion crashes.)

I'm guessing that hardening the backend would be a separate patch?  Or did you 
want that as part of this one?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread David Steele

On 12/11/18 5:24 PM, David Steele wrote:


Attached is the patch.

I was a bit surprised by how much code went away.  There was a lot of
code dedicated to making sure that backup_label got renamed on shutdown,
that there was not an exclusive backup running, etc.

There were no tests for exclusive backup so the test changes were minimal.

I did have to replace one "hot backup" in
010_logical_decoding_timelines.pl.  I'm not sure why the backup was
being done this way, or why the standby needs a copy of pg_replslot
(which is not copied by pg_basebackup).  It might be worth looking at
but it seemed out of scope for this patch.


Rebased patch is attached.

I also fixed the issue that Robert pointed out with regard to changing 
the function signatures. It's a bit weird that there are optional 
parameters when the last parameter is really not optional (i.e. the 
default will lead to an error). Any suggestions are appreciated.


Reading back through the thread, the major objection to the original 
patch was that it was submitted too soon. Hopefully the elapsed time 
addresses that concern.


Another common objection is that the new API is hard to use from bash 
scripts. This is arguably still true, but Laurenz has demonstrated that 
bash is capable of this feat:


https://github.com/cybertec-postgresql/safe-backup

Lastly, there is some concern about getting the backup label too late 
when doing snapshots or using traditional backup software. It would 
certainly be possible to return the backup_label and tablespace_map from 
pg_start_backup() and let the user make the choice about what to do with 
them. Any of these methods will require some scripting so I don't see 
why the files couldn't be written as, for example, backup_label.snap and 
tablespace_map.snap and then renamed by the restore script. The patch 
does not currently make this change, but it could be added pretty easily 
if that overcomes this objection.


Regards,
--
-David
da...@pgmasters.net
From 035153589f47047d38cbb2ffef89d7f1f67e85cc Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Tue, 30 Jun 2020 17:23:51 -0400
Subject: [PATCH] Remove Deprecated Exclusive Backup Mode.

Remove the exclusive backup mode which has been deprecated since 
PostgreSQL 9.6.

---
 doc/src/sgml/backup.sgml  | 150 +-
 doc/src/sgml/func.sgml|  82 +--
 src/backend/access/transam/xlog.c | 476 ++
 src/backend/access/transam/xlogfuncs.c| 245 ++---
 src/backend/catalog/system_views.sql  |   3 +-
 src/backend/postmaster/postmaster.c   |  46 +-
 src/backend/replication/basebackup.c  |   6 +-
 src/backend/utils/init/postinit.c |   2 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |   4 +
 src/bin/pg_rewind/filemap.c   |   6 +-
 src/include/access/xlog.h |   3 +-
 src/include/catalog/pg_proc.dat   |  12 +-
 src/include/libpq/libpq-be.h  |   3 +-
 src/include/miscadmin.h   |   4 -
 src/test/perl/PostgresNode.pm |  56 +--
 .../t/010_logical_decoding_timelines.pl   |   4 +-
 16 files changed, 128 insertions(+), 974 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index bdc9026c62..c544d79a1b 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -819,16 +819,9 @@ test ! -f /mnt/server/archivedir/000100A90065 
&& cp pg_wal/0
 sequence, and that the success of a step is verified before
 proceeding to the next step.

-   
-Low level base backups can be made in a non-exclusive or an exclusive
-way. The non-exclusive method is recommended and the exclusive one is
-deprecated and will eventually be removed.
-   
 
-   
-Making a Non-Exclusive Low-Level Backup
 
- A non-exclusive low level backup is one that allows other
+ A low level backup allows other
  concurrent backups to be running (both those started using
  the same backup API and those started using
  ).
@@ -868,8 +861,9 @@ SELECT pg_start_backup('label', false, false);
 
 
 
- The third parameter being false tells
- pg_start_backup to initiate a non-exclusive base 
backup.
+ The third parameter must always be set to false. The
+ default of true requests an exclusive backup, which is
+ no longer supported.
 


@@ -946,142 +940,6 @@ SELECT * FROM pg_stop_backup(false, true);

   
 
-   
-   
-Making an Exclusive Low-Level Backup
-
-
- 
-  The exclusive backup method is deprecated and should be avoided.
-  Prior to PostgreSQL 9.6, this was the only
-  low-level method available, but it is now recommended that all users
-  upgrade their scripts to use non-exclusive backups.
- 
-
-
-
- The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. 

Re: new heapcheck contrib module

2020-06-30 Thread Alvaro Herrera
On 2020-Jun-30, Mark Dilger wrote:

> I'm guessing that hardening the backend would be a separate patch?  Or
> did you want that as part of this one?

Lately, to me the foremost criterion to determine what is a separate
patch and what isn't is the way the commit message is structured.  If it
looks too much like a bullet list of unrelated things, that suggests
that the commit should be split into one commit per bullet point; of
course, there are counterexamples.  But when I have a commit message
that says "I do A, and I also do B because I need it for A", then it
makes more sense to do B first standalone and then A on top.  OTOH if
two things are done because they're heavily intermixed (e.g. commit
850196b610d2, bullet points galore), that suggests that one commit is a
decent approach.

Just my opinion, of course.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> Lastly, there is some concern about getting the backup label too late when
> doing snapshots or using traditional backup software. It would certainly be
> possible to return the backup_label and tablespace_map from
> pg_start_backup() and let the user make the choice about what to do with
> them. Any of these methods will require some scripting so I don't see why
> the files couldn't be written as, for example, backup_label.snap and
> tablespace_map.snap and then renamed by the restore script. The patch does
> not currently make this change, but it could be added pretty easily if that
> overcomes this objection.

Of course, if someone just restored that snapshot without actually
moving the files into place they'd get a corrupted database without any
indication of anything being wrong...

And if we checked for those files on startup and complained about them
being there (called .snap) then we're back into the "crash during backup
causes PG to fail to start" situation.

All of which, as I recall, is why we have the API we do today.

As such, doing that doesn't strike me as an improvement over using the
new API and making it abundantly clear that it's not so simple as people
might like to think it is...

Thanks,

Stephen




Re: estimation problems for DISTINCT ON with FDW

2020-06-30 Thread Tom Lane
I wrote:
> I poked into that and found that the problem is in estimate_num_groups,
> which effectively just disregards any relation that has rel->tuples = 0.
> That is the case for a postgres_fdw foreign table if use_remote_estimate
> is true, because postgres_fdw never bothers to set any other value.
> (On the other hand, if use_remote_estimate is false, it does fill in a
> pretty-bogus value, mainly so it can use set_baserel_size_estimates.
> See postgresGetForeignRelSize.)

> It seems like we could make estimate_num_groups a bit more robust here;
> it could just skip its attempts to clamp based on total size or
> restriction selectivity, but still include the reldistinct value for the
> rel into the total numdistinct.  I wonder though if this is the only
> problem caused by failing to fill in any value for rel->tuples ...
> should we make postgres_fdw install some value for that?

Attached are a couple of quick-hack patches along each of those lines.
Either one resolves the crazy number-of-groups estimate for Jeff's
example; neither changes any existing regression test results.

On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
Sure, it makes that function "more robust", but it does so at the cost
of believing what might be a default or otherwise pretty insane
reldistinct estimate.  We put in the clamping behavior for a reason,
and I'm not sure we should disable it just because reltuples = 0.

0002 seems like a better answer on the whole, but it has a pretty
significant issue as well: it's changing the API for FDW
GetForeignRelSize functions, because now we're expecting them to set
both rows and tuples to something sane, contrary to the existing docs.

What I'm sort of inclined to do is neither of these exactly, but
instead put the

baserel->tuples = Max(baserel->tuples, baserel->rows);

clamping behavior into the core code, immediately after the call to
GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
it has a mind to, while not requiring that; and it prevents the
situation where the rows and tuples estimates are inconsistent.
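
That is, roughly this, presumably in set_foreign_size() (sketch only):

    /* Let FDW adjust the size estimates, if it can */
    rel->fdwroutine->GetForeignRelSize(root, rel, rte->relid);

    /* ... in case the FDW messed up ... */
    rel->rows = clamp_row_est(rel->rows);

    /* don't let tuples lag behind rows if the FDW didn't set it */
    rel->tuples = Max(rel->tuples, rel->rows);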

regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index be08eb4814..8eea59d5a3 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3647,14 +3647,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	(1 - pow((rel->tuples - rel->rows) / rel->tuples,
 			 rel->tuples / reldistinct));
 			}
-			reldistinct = clamp_row_est(reldistinct);
-
-			/*
-			 * Update estimate of total distinct groups.
-			 */
-			numdistinct *= reldistinct;
 		}
 
+		/*
+		 * Update estimate of total distinct groups.
+		 */
+		reldistinct = clamp_row_est(reldistinct);
+		numdistinct *= reldistinct;
+
 		varinfos = newvarinfos;
 	} while (varinfos != NIL);
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..fc061adedb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -688,6 +688,15 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		/* Report estimated baserel size to planner. */
 		baserel->rows = fpinfo->rows;
 		baserel->reltarget->width = fpinfo->width;
+
+		/*
+		 * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+		 * If the foreign table has never been ANALYZEd, or if its stats are
+		 * out of date, baserel->tuples might now be less than baserel->rows,
+		 * which will confuse assorted logic.  Hack it to appear minimally
+		 * sensible.  (Do we need to hack baserel->pages too?)
+		 */
+		baserel->tuples = Max(baserel->tuples, baserel->rows);
 	}
 	else
 	{


Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread David Steele

On 6/30/20 6:13 PM, Stephen Frost wrote:

Greetings,

* David Steele (da...@pgmasters.net) wrote:

Lastly, there is some concern about getting the backup label too late when
doing snapshots or using traditional backup software. It would certainly be
possible to return the backup_label and tablespace_map from
pg_start_backup() and let the user make the choice about what to do with
them. Any of these methods will require some scripting so I don't see why
the files couldn't be written as, for example, backup_label.snap and
tablespace_map.snap and then renamed by the restore script. The patch does
not currently make this change, but it could be added pretty easily if that
overcomes this objection.


Of course, if someone just restored that snapshot without actually
moving the files into place they'd get a corrupted database without any
indication of anything being wrong...

And if we checked for those files on startup and complained about them
being there (called .snap) then we're back into the "crash during backup
causes PG to fail to start" situation.

All of which is, as I recall, is why we have the API we do today.


I'm certainly not proposing that we ignore .snap files (or whatever). 
There are many ways a restore can be done incorrectly and not be 
detected. The restore script would be responsible for knowing the rules 
and renaming the files or erroring out.



As such, doing that doesn't strike me as an improvement over using the
new API and making it abundently clear that it's not so simple as people
might like to think it is...


Sure, backup is complicated, but I think there's an argument for 
providing the backup_label and tablespace_map files at the beginning of 
the backup since they are available at that point. And maybe put some 
scary language about storing them in PGDATA.


Regards,
--
-David
da...@pgmasters.net




Re: track_planning causing performance regression

2020-06-30 Thread Peter Geoghegan
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao  wrote:
> Ants and Andres suggested to replace the spinlock used in pgss_store() with
> LWLock. I agreed with them and posted the POC patch doing that. But I think
> the patch is an item for v14. The patch may address the reported performance
> issue, but may cause other performance issues in other workloads. We would
> need to measure how the patch affects the performance in various workloads.
> It seems too late to do that at this stage of v13. Thought?

I agree that it's too late for v13.

Thanks
--
Peter Geoghegan




Re: pgsql: Enable Unix-domain sockets support on Windows

2020-06-30 Thread Andrew Dunstan


On 6/30/20 12:13 AM, Amit Kapila wrote:
> On Mon, Jun 29, 2020 at 8:48 PM Andrew Dunstan
>  wrote:
>>
>>
>>
>> It needs to be a path from the Windows POV, not an Msys virtualized
>> path. So c:/tmp or just /tmp should work, but /c/tmp or similar probably
>> will not. The directory needs to exist. I just checked that this is
>> working, both in postgresql.conf and on the psql command line.
>>
> Okay, thanks for the verification.  I was trying to see its behavior
> on Win7 or a similar environment where this feature is not supported
> to see if we get the appropriate error message.  If by any chance, you
> have access to such an environment, then it might be worth trying on
> such an environment once.
>


I haven't had a working Windows 7 environment for quite some years.


cheers


andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 6/30/20 6:13 PM, Stephen Frost wrote:
> >* David Steele (da...@pgmasters.net) wrote:
> >>Lastly, there is some concern about getting the backup label too late when
> >>doing snapshots or using traditional backup software. It would certainly be
> >>possible to return the backup_label and tablespace_map from
> >>pg_start_backup() and let the user make the choice about what to do with
> >>them. Any of these methods will require some scripting so I don't see why
> >>the files couldn't be written as, for example, backup_label.snap and
> >>tablespace_map.snap and then renamed by the restore script. The patch does
> >>not currently make this change, but it could be added pretty easily if that
> >>overcomes this objection.
> >
> >Of course, if someone just restored that snapshot without actually
> >moving the files into place they'd get a corrupted database without any
> >indication of anything being wrong...
> >
> >And if we checked for those files on startup and complained about them
> >being there (called .snap) then we're back into the "crash during backup
> >causes PG to fail to start" situation.
> >
> >All of which, as I recall, is why we have the API we do today.
> 
> I'm certainly not proposing that we ignore .snap files (or whatever). There
> are many ways a restore can be done incorrectly and not be detected. The
> restore script would be responsible for knowing the rules and renaming the
> files or erroring out.

If we don't ignore them then what are we going to do when they exist..?

> >As such, doing that doesn't strike me as an improvement over using the
> >new API and making it abundantly clear that it's not so simple as people
> >might like to think it is...
> 
> Sure, backup is complicated, but I think there's an argument for providing
> the backup_label and tablespace_map files at the beginning of the backup
> since they are available at that point. And maybe put some scary language
> about storing them in PGDATA.

but still require that the connection be kept open until the
pg_stop_backup() is called?  I suppose we could do that but it seems
unlikely to change much for anyone.

De-deprecating the exclusive backup mode and returning that info at the
start and telling users "put this anywhere EXCEPT this specific place,
and make sure you DO put these files in that exact place when you
restore" would perhaps be closer to what folks who really, really, want
to do snapshot-based backups would want... if that ends up being a net
positive for users or not is questionable imv.  One could imagine a set
of scripts that would save that info to a directory above PGDATA but on
the appropriate volume to be included in the snapshot, and then on
restore would put them into the right place to allow PG to restore from
the backup properly, removing the complication around having to save
that data elsewhere or snapshot another filesystem after the backup is
run or such.

Thanks,

Stephen




Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread Bruce Momjian
On Tue, Jun 30, 2020 at 07:10:46PM -0400, Stephen Frost wrote:
> De-deprecating the exclusive backup mode and returning that info at the
> start and telling users "put this anywhere EXCEPT this specific place,
> and make sure you DO put these files in that exact place when you
> restore" would perhaps be closer to what folks who really, really, want
> to do snapshot-based backups would want... if that ends up being a net

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Jun 30, 2020 at 07:10:46PM -0400, Stephen Frost wrote:
> > De-deprecating the exclusive backup mode and returning that info at the
> > start and telling users "put this anywhere EXCEPT this specific place,
> > and make sure you DO put these files in that exact place when you
> > restore" would perhaps be closer to what folks who really, really, want
> > to do snapshot-based backups would want... if that ends up being a net
> 
> Agreed.

Alright, in that case I think we need a victim^Wvolunteer to step up to
rewrite the documentation for backup for these two different methods,
with very clear instructions on how to do this properly (which really
should also include removing the particularly bad 'cp' command usage).

Maybe I'm wrong, but as I recall David already tried to do that once and
I doubt he's going to be interested in another round of that.

Otherwise, it seems best to stop trying to paint this as something
that's simple to do and rip the bandaid off and remove this long
deprecated method, so we can have some sane documentation for the
software writers who are writing backup solutions for PG.

Thanks,

Stephen




Re: Extracting only the columns needed for a query

2020-06-30 Thread Soumyadeep Chakraborty
Hello,

Melanie and me will be posting a separate thread with the scanCols patch
listed here, a patch to capture the set of cols in RETURNING and a group
of patches to pass down the list of cols to various table AM functions
together as a patch set. This will take some time. Thus, we are
deregistering this patch for the July commitfest and will come back.

Regards,
Soumyadeep (VMware)




Deleting older versions in unique indexes to avoid page splits

2020-06-30 Thread Peter Geoghegan
Attached is a POC patch that teaches nbtree to delete old duplicate
versions from unique indexes. The optimization targets non-HOT
duplicate version bloat. Although the patch is rather rough, it
nevertheless manages to more or less eliminate a whole class of index
bloat: Unique index bloat from non-HOT updates in workloads where no
transaction lasts for more than a few seconds. For example, it
eliminates index bloat with a custom pgbench workload that uses an
INCLUDE unique index on pgbench_accounts.aid (with abalance as the
non-key attribute), instead of the usual accounts primary key.
Similarly, with a standard pgbench_accounts primary key alongside an
extra non-unique index on abalance, the primary key will never have
any page splits with the patch applied. It's almost as if the updates
were actually HOT updates, at least if you focus on the unique index
(assuming that there are no long-running transactions).
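
For concreteness, the custom pgbench setup described above could look roughly
like this (my sketch, with a made-up index name; not taken from the patch or
any test script attached to it):

-- Replace the stock primary key with a covering unique index, so every
-- update of abalance modifies an indexed column and therefore cannot be a
-- HOT update, producing duplicate index entries for the same aid.
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
CREATE UNIQUE INDEX pgbench_accounts_aid_abalance
    ON pgbench_accounts (aid) INCLUDE (abalance);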

The patch repurposes the deduplication infrastructure to delete
duplicates within unique indexes, provided they're actually safe to
VACUUM. This is somewhat different to the _bt_check_unique() LP_DEAD
bit setting stuff, in that we have to access heap pages that we
probably would not have to access otherwise -- it's something that we
go out of our way to make happen at the point that the page is about
to split, not something that happens in passing at no extra cost. The
general idea is to exploit the fact that duplicates in unique indexes
are usually deadwood.

We only need to "stay one step ahead" of the bloat to avoid all page
splits in many important cases. So we usually only have to access a
couple of heap pages to avoid a page split in each case. In
traditional serial/identity-column primary key indexes, any page split
that isn't a split of the current rightmost page must be caused by
version churn. It should be possible to avoid these
"unnecessary" page splits altogether (again, barring long-running
transactions).

I would like to get early feedback on high level direction. While the
patch seems quite promising, I am uncertain about my general approach,
and how it might fit into some much broader effort to control bloat in
general.

There are some clear downsides to my approach. The patch has grotty
heuristics that try to limit the extra work performed to avoid page
splits -- the cost of accessing additional heap pages while a buffer
lock is held on the leaf page needs to be kept under control. No
doubt this regresses some workloads without giving them a clear
benefit. Also, the optimization only ever gets used with unique
indexes, since they're the only case where a duplicate naturally
suggests version churn, which can be targeted fairly directly, and
without wasting too many cycles when it doesn't work out.

It's not at all clear how we could do something like this with
non-unique indexes. One related-though-distinct idea that might be
worth considering occurs to me: teach nbtree to try to set LP_DEAD
bits in non-unique indexes, in about the same way as it will in
_bt_check_unique() for unique indexes. Perhaps the executor could hint
to btinsert()/aminsert() that it's inserting a duplicate caused by a
non-HOT update, so it's worth trying to LP_DEAD nearby duplicates --
especially if they're on the same heap page as the incoming item.

There is a wholly separate question about index bloat that is of long
term strategic importance to the Postgres project: what should we do
about long running transactions? I tend to think that we can address
problems in that area by indicating that it is safe to delete
"intermediate" versions -- tuples that are not too old to be seen by
the oldest transaction, that are nevertheless not needed (they're too
new to be interesting to the old transaction's snapshot, but also too
old to be interesting to any other snapshot). Perhaps this
optimization could be pursued in a phased fashion, starting with index
AMs, where it seems less scary.

I recently read a paper that had some ideas about what we could do
here [1]. IMV it is past time that we thrashed out a "remove
useless intermediate versions" design that is compatible with the
current heapam design.

[1] https://dl.acm.org/doi/pdf/10.1145/3318464.3389714
-- 
Peter Geoghegan


0001-Non-opportunistically-delete-B-Tree-items.patch
Description: Binary data


Re: max_slot_wal_keep_size and wal_keep_segments

2020-06-30 Thread Kyotaro Horiguchi
At Tue, 30 Jun 2020 23:51:40 +0900, Fujii Masao  
wrote in 
> Hi,
> 
> When I talked about max_slot_wal_keep_size as new feature in v13
> at the conference, I received the question like "Why are the units of
> setting values in max_slot_wal_keep_size and wal_keep_segments
> different?"
> from audience. That difference looks confusing for users and
> IMO it's better to use the same unit for them. Thought?

We are moving the units for amounts of WAL from segments to megabytes.
This variable is affected by that move.  I'm not sure wal_keep_segments
is going to die soon, but we may change it to wal_keep_size(_mb) sooner
or later if it is going to stay alive.

> There seem to be several options to do this.
> 
> (1) Rename max_slot_wal_keep_size to max_slot_wal_keep_segments,
> and make users specify the number of WAL segments in it instead of
> WAL size.

I don't think this is the way to go.

> (2) Rename wal_keep_segments to wal_keep_size, and make users specify
>  the WAL size in it instead of the number of WAL segments.

Yes. I agree with this (as I wrote above before reading this).

> (3) Don't rename the parameters, and allow users to specify not only
> the number of WAL segments but also the WAL size in wal_keep_segments.

Possible in the short term, but not in the long term.

> Since we have been moving away from measuring in segments, e.g.,
> max_wal_size, I don't think (1) is good idea.
> 
> For backward compatibility, (3) is better. But which needs more
> (maybe a bit complicated) code in guc.c. Also the parameter names
> are not consistent yet (i.e., _segments and _size).
> 
> So for now I like (2).
> 
> Thought?

I agree with you.  If someone finds that wal_keep_segments is no longer
usable, the alternative can easily be found by searching the config file
for "wal_keep". Or we could have a default config line like this:

wal_keep_size = 0 # in megabytes: 0 disables (formerly wal_keep_segments)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Open Item: Should non-text EXPLAIN always show properties?

2020-06-30 Thread David Rowley
On Thu, 25 Jun 2020 at 21:15, David Rowley  wrote:
> I've attached a small patch that changes the Hash Aggregate behaviour
> to always show these properties for non-text formats.

I've pushed this change for HashAgg only and marked the open item as
completed for hash agg.  I'll leave it up to Justin, Tomas and James
to decide what to do with the incremental sort EXPLAIN open item.

David




Re: Remove Deprecated Exclusive Backup Mode

2020-06-30 Thread Bruce Momjian
On Tue, Jun 30, 2020 at 07:32:47PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Tue, Jun 30, 2020 at 07:10:46PM -0400, Stephen Frost wrote:
> > > De-deprecating the exclusive backup mode and returning that info at the
> > > start and telling users "put this anywhere EXCEPT this specific place,
> > > and make sure you DO put these files in that exact place when you
> > > restore" would perhaps be closer to what folks who really, really, want
> > > to do snapshot-based backups would want... if that ends up being a net
> > 
> > Agreed.
> 
> Alright, in that case I think we need a victim^Wvolunteer to step up to
> rewrite the documentation for backup for these two different methods,
> with very clear instructions on how to do this properly (which really
> should also include removing the particularly bad 'cp' command usage).
> 
> Maybe I'm wrong, but as I recall David already tried to do that once and
> I doubt he's going to be interested in another round of that.
> 
> Otherwise, it seems best to stop trying to paint this as something
> that's simple to do and rip the bandaid off and remove this long
> deprecated method, so we can have some sane documentation for the
> software writers who are writing backup solutions for PG.

I can work on it but I don't think I know all the details.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: min_safe_lsn column in pg_replication_slots view

2020-06-30 Thread Kyotaro Horiguchi
At Tue, 30 Jun 2020 23:23:30 +0900, Fujii Masao  
wrote in 
> >> Can we consider an option to "Remove min_safe_lsn; document how a user
> >> can monitor the distance"?  We have a function to get current WAL
> >> insert location and other things required are available either via
> >> view or as guc variable values.  The reason I am thinking of this
> >> option is that it might be better to get some more feedback on what is
> >> the most appropriate value to display.  However, I am okay if we can
> >> reach a consensus on one of the above options.
> > Yes, that's an idea. But it might not be easy to calculate that
> > distance
> > manually by subtracting max_slot_wal_keep_size from the current LSN.
> > Because we've not supported -(pg_lsn, numeric) operator yet. I'm
> > proposing that operator, but it's for v14.
> 
> Sorry this is not true. That distance can be calculated without those
> operators.
> For example,
> 
> SELECT restart_lsn - pg_current_wal_lsn() + (SELECT setting::numeric *
> 1024 * 1024 FROM pg_settings WHERE name = 'max_slot_wal_keep_size')
> distance FROM pg_replication_slots;

It's an approximation with an accuracy of one segment.  The exact
calculation is not quite that simple because of the units involved.  The
formula for the exact calculation (ignoring wal_keep_segments) is:

distance = (seg_floor(restart_lsn) + seg_floor(max_slot_wal_keep_size) + 1)
           * wal_segment_size - current_lsn

where seg_floor(x) is x expressed as a whole number of segments, i.e.
floor(x / wal_segment_size).
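
For illustration, the exact formula above can be spelled out in SQL today (my
own sketch, not part of the attached patch; it assumes wal_segment_size is
reported in bytes and max_slot_wal_keep_size in megabytes in pg_settings, that
max_slot_wal_keep_size is set to a positive value, and it still ignores
wal_keep_segments):

SELECT slot_name,
       (floor((restart_lsn - '0/0'::pg_lsn) / seg_size)   -- seg_floor(restart_lsn)
        + floor(keep_mb * 1024 * 1024 / seg_size)         -- seg_floor(max_slot_wal_keep_size)
        + 1) * seg_size
       - (pg_current_wal_lsn() - '0/0'::pg_lsn) AS distance
  FROM pg_replication_slots,
       (SELECT setting::numeric AS seg_size
          FROM pg_settings WHERE name = 'wal_segment_size') s,
       (SELECT setting::numeric AS keep_mb
          FROM pg_settings WHERE name = 'max_slot_wal_keep_size') k;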

regards.

> If the calculated distance is small or negative value, which means
> that
> we may lose some required WAL files. So in this case it's worth
> considering
> to increase max_slot_wal_keep_size.
> 
> I still think it's better and more helpful to display something like
> that distance in pg_replication_slots rather than making each user
> calculate it...

Agreed.  The attached replaces min_safe_lsn with "distance".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 7e078c9ff8e0594ce8f4e95c7db84ea0a31e9775 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 30 Jun 2020 21:09:19 +0900
Subject: [PATCH v1] Replace min_safe_lsn with "distance" in
 pg_replication_slots

pg_replication_slots.min_safe_lsn, which shows the oldest LSN kept in
pg_wal, is of doubtful usability for monitoring. Replace it with
"distance", which shows how many bytes the server can advance before the
slot loses required segments.
---
 src/backend/access/transam/xlog.c | 39 +++
 src/backend/catalog/system_views.sql  |  2 +-
 src/backend/replication/slotfuncs.c   | 19 +--
 src/include/access/xlog.h |  1 +
 src/include/catalog/pg_proc.dat   |  4 +--
 src/test/recovery/t/019_replslot_limit.pl | 20 ++--
 src/test/regress/expected/rules.out   |  4 +--
 7 files changed, 62 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fd93bcfaeb..1f27639912 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9570,6 +9570,45 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	return WALAVAIL_REMOVED;
 }
 
+/*
+ * Calculate how many bytes we can advance from currptr until the targetLSN is
+ * removed.
+ *
+ * Returns 0 if the distance is invalid.
+ */
+uint64
+DistanceToWALHorizon(XLogRecPtr targetLSN, XLogRecPtr currptr)
+{
+	XLogSegNo	targetSeg;
+	XLogSegNo	keepSegs;
+	XLogSegNo	failSeg;
+	XLogRecPtr	horizon;
+
+	XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+	keepSegs = 0;
+
+	/* no limit if max_slot_wal_keep_size is invalid */
+	if (max_slot_wal_keep_size_mb < 0)
+		return 0;
+
+	/* How many segments slots can keep? */
+	keepSegs = ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
+
+	/* override by wal_keep_segments if needed */
+	if (wal_keep_segments > keepSegs)
+		keepSegs = wal_keep_segments;
+
+	/* calculate the LSN where targetLSN is lost when currpos reaches */
+	failSeg = targetSeg + keepSegs + 1;
+	XLogSegNoOffsetToRecPtr(failSeg, 0, wal_segment_size, horizon);
+
+	/* If currptr already beyond the horizon, return zero. */
+	if (currptr > horizon)
+		return 0;
+
+	/* return the distance from currptr to the horizon */
+	return horizon - currptr;
+}
 
 /*
  * Retreat *logSegNo to the last segment that we need to retain because of
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..b9847a9f92 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -879,7 +879,7 @@ CREATE VIEW pg_replication_slots AS
 L.restart_lsn,
 L.confirmed_flush_lsn,
 L.wal_status,
-L.min_safe_lsn
+L.distance
 FROM pg_get_replication_slots() AS L
 LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 88033a79b2.

Re: [PATCH] Better cleanup in TLS tests for -13beta2

2020-06-30 Thread michael
On Tue, Jun 30, 2020 at 01:13:39PM +0900, Michael Paquier wrote:
> I looked at the patch, and can confirm that client_wrongperms_tmp.key
> remains around after running 001_ssltests.pl, and client_tmp.key after
> running 002_scram.pl.  The way the patch does its cleanup looks fine
> to me, so I'll apply and backpatch where necessary, if there are no
> objections of course.

I found one problem when testing with parallel jobs once this patch is
applied (say PROVE_FLAGS="-j 4"): tests 001 and 002 both used the same
file name, client_tmp.key, so it was easy for the tests to fail if, for
example, 002 removed the temporary client key copy that 001 still needed,
or vice versa.  001 takes longer than 002, so the removal would likely be
done by the latter, not the former.  It was even logically possible to
fail in the case where 001 removes the file while 002 still needs it,
though that is very unlikely because 002 needs the file for only a very
short time and a single test case.  I have fixed this issue by making 002
use a different file name, as 001 already does for the wrong-permissions
case, and applied the patch down to 13.
--
Michael




HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-06-30 Thread David Rowley
Hi,

While working on 40efbf870 I noticed that when performing a Hash Join
we always start out by setting nbatch to 1.  That seems reasonable, as
it's hard to imagine being able to complete any non-zero amount of work
in fewer than 1 batch.

In the HashAgg case, since 40efbf870, we'll display:

"HashAgg Batches": 0,

if you do something like: explain(analyze, format json) select
distinct oid from pg_class;

I'd rather this said that the number of batches was 1.

Does anyone have any objections to that being changed?

David




Re: max_slot_wal_keep_size and wal_keep_segments

2020-06-30 Thread Alvaro Herrera
On 2020-Jun-30, Fujii Masao wrote:

> Hi,
> 
> When I talked about max_slot_wal_keep_size as new feature in v13
> at the conference, I received the question like "Why are the units of
> setting values in max_slot_wal_keep_size and wal_keep_segments different?"
> from audience. That difference looks confusing for users and
> IMO it's better to use the same unit for them. Thought?

Do we still need wal_keep_segments for anything?  Maybe we should
consider removing that functionality instead ... and even if we don't
remove it in 13, then what is the point of renaming it only to remove it
shortly after?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Intermittent BRIN failures on hyrax and lousyjack

2020-06-30 Thread Alvaro Herrera
On 2020-Jun-30, Tom Lane wrote:

>  SELECT brin_desummarize_range('brinidx', 0);
> +WARNING:  leftover placeholder tuple detected in BRIN index "brinidx", 
> deleting

> I think the "crash" actually was the forced autovac cancellation we see
> here.  Thus, the fact that both these animals use CLOBBER_CACHE_ALWAYS is
> not a direct cause of the failure, though it might contribute to getting
> the timing right for this to happen.

Oh, interesting.

> So (1) the comment needs to be adjusted to mention that vacuum
> cancellation is enough to create the situation, and (2) probably the
> message needs to be downgraded quite a lot, DEBUG2 or even less.
> Or maybe we could just delete the quoted stanza altogether.

Yeah, maybe deleting the whole thing is a decent answer.  When I wrote
that, I was thinking that it was a very unlikely event, but evidently
that's not true.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




v12 and TimeLine switches and backups/restores

2020-06-30 Thread Stephen Frost
Greetings,

Among the changes made to PG's recovery in v12 was setting
recovery_target_timeline to 'latest' by default.  That's handy when
you're flipping back and forth between replicas and want every node to
follow the latest timeline, but it has made some basic things, like
restoring from a backup, problematic.

Specifically, if you take a backup off a primary and, while that backup
is going on, some replica is promoted and drops a .history file into the
WAL repo, that backup is no longer able to be restored with the new
recovery_target_timeline default.  What happens is that the restore
process will happily follow the timeline change, even though it happened
before we reached consistency, and then it'll never find the needed
end-of-backup WAL point that would allow us to reach consistency.

Naturally, a primary isn't ever going to do a TL switch, and we already
throw an error during an online backup from a replica if that replica
did a TL switch during the backup, to indicate that the backup isn't
valid.

Attached is an initial draft of a patch to at least give a somewhat
clearer error message when we detect that the user has asked us to
follow a timeline switch to a new timeline before we've reached
consistency (though I had to hack in a check to see if pg_rewind is
being used, since apparently it actually depends on PG following a
timeline switch before reaching consistency...).

Thoughts?

Thanks,

Stephen
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index fd93bcf..5dd777f
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** static void SetCurrentChunkStartTime(Tim
*** 896,902 
  static void CheckRequiredParameterValues(void);
  static void XLogReportParameters(void);
  static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
! TimeLineID prevTLI);
  static void LocalSetXLogInsertAllowed(void);
  static void CreateEndOfRecoveryRecord(void);
  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
--- 896,902 
  static void CheckRequiredParameterValues(void);
  static void XLogReportParameters(void);
  static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
! TimeLineID prevTLI, bool pgrewind_replay);
  static void LocalSetXLogInsertAllowed(void);
  static void CreateEndOfRecoveryRecord(void);
  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
*** static void xlog_outdesc(StringInfo buf,
*** 946,952 
  static void pg_start_backup_callback(int code, Datum arg);
  static void pg_stop_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 			  bool *backupEndRequired, bool *backupFromStandby);
  static bool read_tablespace_map(List **tablespaces);
  
  static void rm_redo_error_callback(void *arg);
--- 946,952 
  static void pg_start_backup_callback(int code, Datum arg);
  static void pg_stop_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 			  bool *backupEndRequired, bool *backupFromStandby, bool *pgrewind_replay);
  static bool read_tablespace_map(List **tablespaces);
  
  static void rm_redo_error_callback(void *arg);
*** StartupXLOG(void)
*** 6319,6324 
--- 6319,6325 
  	TransactionId oldestActiveXID;
  	bool		backupEndRequired = false;
  	bool		backupFromStandby = false;
+ 	bool		pgrewind_replay = false;
  	DBState		dbstate_at_startup;
  	XLogReaderState *xlogreader;
  	XLogPageReadPrivate private;
*** StartupXLOG(void)
*** 6506,6512 
  	master_image_masked = (char *) palloc(BLCKSZ);
  
  	if (read_backup_label(&checkPointLoc, &backupEndRequired,
! 		  &backupFromStandby))
  	{
  		List	   *tablespaces = NIL;
  
--- 6507,6513 
  	master_image_masked = (char *) palloc(BLCKSZ);
  
  	if (read_backup_label(&checkPointLoc, &backupEndRequired,
! 		  &backupFromStandby, &pgrewind_replay))
  	{
  		List	   *tablespaces = NIL;
  
*** StartupXLOG(void)
*** 7297,7303 
  	if (newTLI != ThisTimeLineID)
  	{
  		/* Check that it's OK to switch to this TLI */
! 		checkTimeLineSwitch(EndRecPtr, newTLI, prevTLI);
  
  		/* Following WAL records should be run with new TLI */
  		ThisTimeLineID = newTLI;
--- 7298,7304 
  	if (newTLI != ThisTimeLineID)
  	{
  		/* Check that it's OK to switch to this TLI */
! 		checkTimeLineSwitch(EndRecPtr, newTLI, prevTLI, pgrewind_replay);
  
  		/* Following WAL records should be run with new TLI */
  		ThisTimeLineID = newTLI;
*** UpdateFullPageWrites(void)
*** 9841,9847 
   * replay. (Currently, timeline can only change at a shutdown checkpoint).
   */
  static void
! checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI, TimeLineID prevTLI)
  {
  	/* Check that the record agrees on what the current (old) timeline is */
  	if (prevTLI != ThisTimeLineID)
--- 

Re: Transactions involving multiple postgres foreign servers, take 2

2020-06-30 Thread Tatsuo Ishii
>> The point is this data inconsistency is
>> lead by an inconsistent read but not by an inconsistent commit
>> results. I think there are kinds of possibilities causing data
>> inconsistency but atomic commit and atomic visibility eliminate
>> different possibilities. We can eliminate all possibilities of data
>> inconsistency only after we support 2PC and globally MVCC.
> 
> IMO any permanent data inconsistency is a serious problem for users no
> matter what the technical reasons are.

I have incorporated the "Pangea" algorithm into Pgpool-II to implement
atomic visibility. In the test below I have two PostgreSQL servers
(stock v12), server0 (port 11002) and server1 (port 11003).
default_transaction_isolation was set to 'repeatable read' on both
servers, as required by Pangea. Pgpool-II replicates write queries and
sends them to both server0 and server1. There are two tables, "t1"
(having only one integer column "i") and "log" (also having only one
integer column "i"). I ran the following script (inconsistency1.sql)
via pgbench:

BEGIN;
UPDATE t1 SET i = i + 1;
END;

like: pgbench -n -c 1 -T 30 -f inconsistency1.sql

In the meantime, I concurrently ran another session from pgbench:

BEGIN;
INSERT INTO log SELECT * FROM t1;
END;

pgbench -n -c 1 -T 30 -f inconsistency2.sql

After those two pgbench runs finished, I ran the following COPY commands
to see whether the contents of table "log" are identical on server0 and
server1:
psql -p 11002 -c "\copy log to '11002.txt'"
psql -p 11003 -c "\copy log to '11003.txt'"
cmp 11002.txt 11003.txt

The new Pgpool-II incorporating Pangea showed that 11002.txt and
11003.txt are identical, as expected. This indicates that atomic
visibility is maintained.

On the other hand, a Pgpool-II that does not implement Pangea showed
differences between those files.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: POC: postgres_fdw insert batching

2020-06-30 Thread Ashutosh Bapat
On Tue, 30 Jun 2020 at 22:23, Tomas Vondra 
wrote:

> >I didn't get this. We are executing an INSERT on the foreign server,
> >so we get the number of rows INSERTed from that server. We should just
> >add those up across batches. If there's a failure, it would abort the
> >transaction, local as well as remote.
> >
>
> True, but it's not the FDW code doing the counting - it's the caller,
> depending on whether the ExecForeignInsert returns a valid slot or NULL.
> So it's not quite possible to just return a number of inserted tuples,
> as returned by the remote server.
>

Hmm yes, now I remember that bit. So for every row buffered, we return a
valid slot without knowing whether that row was inserted on the remote
server or not. I think we have that problem even now where a single INSERT
might result in multiple INSERTs on the remote server (rare but not
completely impossible).


>
> >In your patch, I see that an INSERT statement with batch is
> >constructed as INSERT INTO ... VALUES (...), (...) as many values as
> >the batch size. That won't work as is for UPDATE/DELETE since we can't
> >pass multiple pairs of ctids and columns to be updated for each ctid
> >in one statement. Maybe we could build as many UPDATE/DELETE
> >statements as the size of a batch, but that would be ugly. What we
> >need is a feature like a batch prepared statement in libpq similar to
> >what JDBC supports
> >((https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
> >This will allow a single prepared statement to be executed with a
> >batch of parameters, each batch corresponding to one foreign DML
> >statement.
> >
>
> I'm pretty sure we could make it work with some array/unnest tricks to
> build a relation, and use that as a source of data.
>

That sounds great. The solution will be limited to postgres_fdw, though.
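
For what it's worth, a batched remote UPDATE built with that array/unnest
trick might look roughly like this (just a sketch of the idea with made-up
table and parameter names, not something the patch currently generates):

-- One prepared statement per batch; $1 carries the ctids of all target rows,
-- $2 the corresponding new values, and unnest() turns them back into rows.
UPDATE remote_tab t
   SET val = b.val
  FROM unnest($1::tid[], $2::text[]) AS b(row_ctid, val)
 WHERE t.ctid = b.row_ctid;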


> I don't see why not support both, the use cases are somewhat different I
> think.
>

+1, if we can do both.

-- 
Best Wishes,
Ashutosh


Re: Creating a function for exposing memory usage of backend process

2020-06-30 Thread torikoshia
On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao 
 wrote:



Could you add this patch to Commitfest 2020-07?


Thanks for letting me know; I've added it.
BTW, I registered you as an author because this patch uses a lot of
code from pg_cheat_funcs.

  https://commitfest.postgresql.org/28/2622/


This patch provides only the function, but wouldn't it be convenient to
also provide a view, like pg_shmem_allocations?



Sounds good. But isn't it better to document each column?
Otherwise, users cannot understand what the "ident" column indicates.


Agreed.
Attached is a patch that creates a view for local memory contexts
and adds its explanation to the documentation.
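
As a usage sketch (assuming the view and column names from the attached
patch), the ten largest memory contexts of the current backend could then
be listed with:

SELECT name, parent, level, total_bytes, used_bytes
  FROM pg_stat_local_memory_contexts
 ORDER BY total_bytes DESC
 LIMIT 10;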


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
>From 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 29 Jul 2020 13:02:29 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

Backend processes sometimes use a lot of memory for various reasons,
such as caches, prepared statements and cursors.

Previously, the only way to examine the memory usage of a backend
process was to attach a debugger and call MemoryContextStats(), which
was not so convenient in some cases.

This patch implements a new SQL-callable function,
pg_stat_get_memory_contexts, which exposes the memory usage of the
local backend.
It also adds a new view, pg_stat_local_memory_contexts, for exposing
the local backend's memory contexts.
---
 doc/src/sgml/monitoring.sgml | 137 +++
 src/backend/catalog/system_views.sql |  13 +++
 src/backend/postmaster/pgstat.c  |  80 
 src/backend/utils/adt/pgstatfuncs.c  |  45 +
 src/include/catalog/catversion.h |   2 +-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/include/pgstat.h |   6 +-
 src/test/regress/expected/rules.out  |  10 ++
 8 files changed, 300 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dfa9d0d641..9b82af54d0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -350,6 +350,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_local_memory_contextspg_stat_local_memory_contexts
+  One row per memory context, showing information about
+   the current memory context on the local backend.
+   See 
+   pg_stat_local_memory_contexts for details.
+  
+ 
+
  
   pg_stat_progress_analyzepg_stat_progress_analyze
   One row for each backend (including autovacuum worker processes) running
@@ -3053,6 +3062,120 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_local_memory_contexts
+
+  
+   pg_stat_local_memory_contexts
+  
+
+  
+   The pg_stat_local_memory_contexts view will
+   contain one row per memory context, showing information about the
+   current memory context on the local backend.
+  
+
+  
+   pg_stat_local_memory_contexts View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the context. This field is truncated if the identification field is longer than MEMORY_CONTEXT_IDENT_SIZE (1024 characters in a standard build)
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this context
+  
+ 
+
+ 
+  
+   level integer
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes bigint
+  
+  
+   Total bytes requested from malloc
+  
+ 
+
+ 
+  
+   total_nblocks bigint
+  
+  
+   Total number of malloc blocks
+  
+ 
+
+ 
+  
+   free_bytes bigint
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks bigint
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes bigint
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_stat_archiver
 
@@ -4617,6 +4740,20 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i

   
 
+  
+   
+
+ pg_stat_get_memory_contexts
+
+pg_stat_get_memory_contexts ()
+setof record
+   
+   
+Returns records of information about all memory contexts of the
+local backend.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..134b20f13f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -793,6 +793,19 @@ CREATE VIEW pg_stat_replication AS
 JOIN pg_stat

Re: ModifyTable overheads in generic plans

2020-06-30 Thread Amit Langote
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote  wrote:
> I would like to discuss a refactoring patch that builds on top of the
> patches at [1] to address $subject.

I forgot to update a place in postgres_fdw causing one of its tests to crash.

Fixed in the attached updated version.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


v2-0003-Delay-initializing-UPDATE-DELETE-ResultRelInfos.patch
Description: Binary data


v2-0002-Do-not-set-rootResultRelIndex-unnecessarily.patch
Description: Binary data


v2-0001-Revise-how-some-FDW-executor-APIs-obtain-ResultRe.patch
Description: Binary data


Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-06-30 Thread Jeff Davis
On Tue, Jun 30, 2020, 7:04 PM David Rowley  wrote:

> Does anyone have any objections to that being changed?
>

That's OK with me. By the way, I'm on vacation and will catch up on these
HashAgg threads next week.

Regards,
 Jeff Davis