Re: BufFileRead() error signalling
On Wed, May 27, 2020 at 11:59:59AM -0400, Alvaro Herrera wrote: > In the discussion that led to 811b6e36a9e2 I did suggest to use "read > only M of N" instead, but there wasn't enough discussion on that fine > point so we settled on what you now call prevalent without a lot of > support specifically on that. I guess it was enough of an improvement > over what was there. But like Robert, I too prefer the wording that > includes "only" and "bytes" over the wording that doesn't. > > I'll let it be known that from a translator's point of view, it's a > ten-seconds job to update a fuzzy string from not including "only" and > "bytes" to one that does. So let's not make that an argument for not > changing. Using "only" would be fine by me, though I tend to prefer the existing one. Now I think that we should avoid "bytes" to not have to worry about pluralization of error messages. This has been a concern in the past (see e5d11b9 and the likes). -- Michael signature.asc Description: PGP signature
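To make the wording question concrete, here is a minimal sketch of the two candidate phrasings at a BufFileRead() call site. The function name, message text, and variables below are assumptions for illustration, not text from any posted patch:

#include "postgres.h"
#include "storage/buffile.h"

static void
read_chunk(BufFile *file, void *buf, size_t len)
{
    size_t      nread = BufFileRead(file, buf, len);

    if (nread != len)
    {
        /* wording without "only" or "bytes", sidestepping plural forms */
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from temporary file: read %zu of %zu",
                        nread, len)));
    }

    /*
     * The alternative wording favored by some reviewers:
     *
     *   "could not read from temporary file: read only %zu of %zu bytes"
     *
     * "bytes" is what raises the pluralization concern; errmsg_plural() or a
     * unit-free message avoids it.
     */
}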
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, May 26, 2020 at 12:00 PM Amit Kapila wrote: > > On Fri, May 22, 2020 at 6:22 PM Dilip Kumar wrote: > > > > On Mon, May 18, 2020 at 4:10 PM Amit Kapila wrote: > > > > > > On Sun, May 17, 2020 at 12:41 PM Dilip Kumar > > > wrote: > > > > > > > > On Fri, May 15, 2020 at 4:04 PM Amit Kapila > > > > wrote: > > > > > > > > > > > > > > > Review comments: > > > > > -- > > > > > 1. > > > > > @@ -1762,10 +1952,16 @@ ReorderBufferCommit(ReorderBuffer *rb, > > > > > TransactionId xid, > > > > > } > > > > > > > > > > case REORDER_BUFFER_CHANGE_MESSAGE: > > > > > - rb->message(rb, txn, change->lsn, true, > > > > > - change->data.msg.prefix, > > > > > - change->data.msg.message_size, > > > > > - change->data.msg.message); > > > > > + if (streaming) > > > > > + rb->stream_message(rb, txn, change->lsn, true, > > > > > +change->data.msg.prefix, > > > > > +change->data.msg.message_size, > > > > > +change->data.msg.message); > > > > > + else > > > > > + rb->message(rb, txn, change->lsn, true, > > > > > +change->data.msg.prefix, > > > > > +change->data.msg.message_size, > > > > > +change->data.msg.message); > > > > > > > > > > Don't we need to set any_data_sent flag while streaming messages as we > > > > > do for other types of changes? > > > > > > > > I think any_data_sent, was added to avoid sending abort to the > > > > subscriber if we haven't sent any data, but this is not complete as > > > > the output plugin can also take the decision not to send. So I think > > > > this should not be done as part of this patch and can be done > > > > separately. I think there is already a thread for handling the > > > > same[1] > > > > > > > > > > Hmm, but prior to this patch, we never use to send (empty) aborts but > > > now that will be possible. It is probably okay to deal that with > > > another patch mentioned by you but I felt at least any_data_sent will > > > work for some cases. OTOH, it appears to be half-baked solution, so > > > we should probably refrain from adding it. BTW, how do the pgoutput > > > plugin deal with it? I see that apply_handle_stream_abort will > > > unconditionally try to unlink the file and it will probably fail. > > > Have you tested this scenario after your latest changes? > > > > Yeah, I see, I think this is a problem, but this exists without my > > latest change as well, if pgoutput ignore some changes because it is > > not published then we will see a similar error. Shall we handle the > > ENOENT error case from unlink? > Isn't this problem only for subxact file as we anyway create changes > file as part of start stream message which should have come after > abort? If so, can't we detect whether subxact file exists probably by > using nsubxacts or something like that? Can you please once try to > reproduce this scenario to ensure that we are not missing anything? I have tested this, as of now, by default we create both changes and subxact files irrespective of whether we get any subtransactions or not. Maybe this could be optimized that only if we have any subxact then only create that file otherwise not? What's your opinion on the same. > > > > > 8. > > > > > @@ -2295,6 +2677,13 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer > > > > > *rb, TransactionId xid, > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true); > > > > > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > + > > > > > + /* > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes too > > > > > + * if one of its children has. 
> > > > > + */ > > > > > + if (txn->toptxn != NULL) > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > } > > > > > > > > > > Why are we marking top transaction here? > > > > > > > > We need to mark top transaction to decide whether to build tuplecid > > > > hash or not. In non-streaming mode, we are only sending during the > > > > commit time, and during commit time we know whether the top > > > > transaction has any catalog changes or not based on the invalidation > > > > message so we are marking the top transaction there in DecodeCommit. > > > > Since here we are not waiting till commit so we need to mark the top > > > > transaction as soon as we mark any of its child transactions. > > > > > > > > > > But how does it help? We use this flag (via > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is > > > anyway done in DecodeCommit and that too after setting this flag for > > > the top transaction if required. So, how will it help in setting it > > > while processing for subxid. Also, even if we have to do it won't it > > > add the xid needlessly in builder->committed.xip array? > > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether > > to build the tuplecid hash or not based on whether it has catalog > > changes or not. > > > > Okay, but you haven't answered the second part of the question: "won't > it add
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 09:07:04AM +0300, Vladimir Sitnikov wrote: > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > bad by 19 May 14:00 UTC, > so the regression was introduced somewhere in-between. > > Does that ring any bells? It does, thanks! This would map with 1d374302 or 850196b6, which reworked this area of the code, so it seems like we are not quite done with this work yet. Do you still see the problem as of 55ca50d (today's latest HEAD)? Also, just wondering... If I use more or less the same commands as your travis job, I should be able to reproduce the problem with a fresh JDBC repository, right? Or do you have a sub-portion of your regression tests that can be run that easily? -- Michael
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, May 26, 2020 at 3:04 PM Amit Kapila wrote: > > On Mon, May 25, 2020 at 8:07 PM Dilip Kumar wrote: > > > > On Fri, May 22, 2020 at 11:54 AM Amit Kapila > > wrote: > > > > > > 4. > > > + * XXX Do we need to allocate it in TopMemoryContext? > > > + */ > > > +static void > > > +subxact_info_add(TransactionId xid) > > > { > > > .. > > > > > > For this and other places in a patch like in function > > > stream_open_file(), instead of using TopMemoryContext, can we consider > > > using a new memory context LogicalStreamingContext or something like > > > that. We can create LogicalStreamingContext under TopMemoryContext. I > > > don't see any need of using TopMemoryContext here. > > > > But, when we will delete/reset the LogicalStreamingContext? > > > > Why can't we reset it at each stream stop message? > > because > > we are planning to keep this memory until the worker is alive so that > > supposed to be the top memory context. > > > > Which part of allocation do we want to keep till the worker is alive? static TransactionId *xids = NULL; we need to keep till worker life space. > Why we need memory-related to subxacts till the worker is alive? As > we have now, after reading subxact info (subxact_info_read), we need > to ensure that it is freed after its usage due to which we need to > remember and perform pfree at various places. > > I think we should once see the possibility that such that we could > switch to this new context in start stream message and reset it in > stop stream message. That might help in avoiding > MemoryContextSwitchTo TopMemoryContext at various places. Ok, I understand, I think subxacts can be allocated in new LogicalStreamingContext which we can reset at the stream stop. How about xids? shall we create another context that will stay until the worker lifespan? > > If we create any other context > > with the same life span as TopMemoryContext then what is the point? >> > > It is helpful for debugging. It is recommended that we don't use the > top memory context unless it is really required. Read about it in > src/backend/utils/mmgr/README. I see. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
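To make the proposed arrangement concrete, here is a minimal sketch; the context name, helper names, and reset points are assumed for illustration rather than taken from the patch. Per-stream data goes into a context created once under TopMemoryContext and is reset in one call at stream stop, while worker-lifetime state such as the xids array would live elsewhere (see the ApplyContext suggestion later in this thread).

#include "postgres.h"
#include "utils/memutils.h"

static MemoryContext LogicalStreamingContext = NULL;

/* called when a "stream start" message arrives */
static void
stream_start_setup(void)
{
    if (LogicalStreamingContext == NULL)
        LogicalStreamingContext = AllocSetContextCreate(TopMemoryContext,
                                                        "LogicalStreamingContext",
                                                        ALLOCSET_DEFAULT_SIZES);
}

/* per-stream data (for example the subxact array) is allocated here ... */
static void *
stream_alloc(Size size)
{
    return MemoryContextAlloc(LogicalStreamingContext, size);
}

/* ... and released wholesale at "stream stop", with no pfree() bookkeeping */
static void
stream_stop_cleanup(void)
{
    MemoryContextReset(LogicalStreamingContext);
}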
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 16:22:33 +0900, Michael Paquier wrote in > On Thu, May 28, 2020 at 09:07:04AM +0300, Vladimir Sitnikov wrote: > > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > > bad by 19 May 14:00 UTC, > > so the regression was introduced somewhere in-between. > > > > Does that ring any bells? > > It does, thanks! This would map with 1d374302 or 850196b6 that > reworked this area of the code, so it seems like we are not quite done > with this work yet. Do you still see the problem as of 55ca50d > (today's latest HEAD)? I think that's not the case. I think I cause this crash with the HEAD. I'll post a fix soon. > Also, just wondering.. If I use more or less the same commands as > your travis job I should be able to reproduce the problem with a fresh > JDBC repository, right? Or do you a sub-portion of your regression > tests to run that easily? Pgjdbc seems initiating physical replication on a logical replication session. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: WAL reader APIs and WAL segment open/close callbacks
On Tue, May 26, 2020 at 08:49:44AM +0900, Michael Paquier wrote: > NB: I found some incorrect comments as per the attached: > s/open_segment/segment_open/ > s/close_segment/segment_close/ And fixed this one with f93bb0c. -- Michael
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 04:32:22PM +0900, Kyotaro Horiguchi wrote: > I think that's not the case. I think I cause this crash with the > HEAD. I'll post a fix soon. > > Pgjdbc seems initiating physical replication on a logical replication > session. Let me see... Indeed: $ psql "replication=database dbname=postgres" =# START_REPLICATION SLOT test_slot PHYSICAL 0/0; [one later] (gdb) bt #0 XLogSendPhysical () at walsender.c:2762 #1 0x55d5f7803318 in WalSndLoop (send_data=0x55d5f78039c7 ) at walsender.c:2300 #2 0x55d5f7800d70 in StartReplication (cmd=0x55d5f919bc60) at walsender.c:750 #3 0x55d5f78025ad in exec_replication_command (cmd_string=0x55d5f9119a80 "START_REPLICATION SLOT test_slot PHYSICAL 0/0;") at walsender.c:1643 #4 0x55d5f786a7ea in PostgresMain (argc=1, argv=0x55d5f91472c8, dbname=0x55d5f9147210 "ioltas", username=0x55d5f91471f0 "ioltas") at postgres.c:4311 #5 0x55d5f77b4e9c in BackendRun (port=0x55d5f913dcd0) at postmaster.c:4523 #6 0x55d5f77b4606 in BackendStartup (port=0x55d5f913dcd0) at postmaster.c:4215 #7 0x55d5f77b08ad in ServerLoop () at postmaster.c:1727 #8 0x55d5f77b00fc in PostmasterMain (argc=3, argv=0x55d5f9113260) at postmaster.c:1400 #9 0x55d5f76b3736 in main (argc=3, argv=0x55d5f9113260) at main.c:210 No need for the JDBC test suite then. -- Michael signature.asc Description: PGP signature
Re: Resolving the python 2 -> python 3 mess
On Wed, Feb 19, 2020 at 08:42:36PM +0100, Peter Eisentraut wrote: > I think there should just > be an option "plpython is: {2|3|don't build it at all}". Then packagers can > match this to what their plan for /usr/bin/python* is -- which appears to be > different everywhere. Today, we do not give packagers this sort of discretion over SQL-level behavior. We formerly had --disable-integer-datetimes, but I view that as the lesser of two evils, the greater of which would have been to force dump/reload of terabyte-scale clusters. We should continue to follow today's principle, which entails not inviting packagers to align the nature of "LANGUAGE plpythonu" with the nature of /usr/bin/python. Users wouldn't benefit from alignment. Moreover, it has long been conventional for the host to be a VM dedicated to PostgreSQL, in which case users of the DBMS aren't users of its host. I don't have an opinion on which version "LANGUAGE plpythonu" should denote in PostgreSQL v13, but the version should be constant across v13 configurations. (If some builds make it unavailable or make it an error-only stub, that is no problem.) I reviewed all code: On Thu, Feb 27, 2020 at 04:11:05PM -0500, Tom Lane wrote: > --- a/configure.in > +++ b/configure.in > @@ -766,6 +766,9 @@ PGAC_ARG_BOOL(with, python, no, [build Python modules > (PL/Python)]) > AC_MSG_RESULT([$with_python]) > AC_SUBST(with_python) > > +PGAC_ARG_BOOL(with, python2-stub, no, [build Python 2 compatibility stub]) > +AC_SUBST(with_python2_stub) > + > # > # GSSAPI > # > @@ -1042,6 +1045,12 @@ fi > if test "$with_python" = yes; then >PGAC_PATH_PYTHON >PGAC_CHECK_PYTHON_EMBED_SETUP > + # Disable building Python 2 stub if primary version isn't Python 3 > + if test "$python_majorversion" -lt 3; then > +with_python2_stub=no > + fi Standard PostgreSQL practice would be to AC_MSG_ERROR in response to the infeasible option, not to ignore the option. > --- /dev/null > +++ b/src/pl/plpython/sql/plpython_stub.sql > +call convert_python3_all(); Building with PYTHON=python3 --with-python2-stub, this fails on my RHEL 7.8 installation. I had installed python34-tools, which provides /usr/bin/2to3-3. I had not installed python-tools (v2.7.5), which provides /usr/bin/2to3. Making a 2to3 -> 2to3-3 symlink let the test pass. Requiring such a step to pass tests may or may not be fine; what do you think? (I am attaching regression.diffs; to my knowledge, it's not interesting.) > --- a/src/tools/msvc/config_default.pl > +++ b/src/tools/msvc/config_default.pl > @@ -16,6 +16,7 @@ our $config = { > tcl => undef,# --with-tcl= > perl => undef,# --with-perl= > python=> undef,# --with-python= > + python2_stub => 1, # --with-python2-stub (ignored unless Python is > v3) This default should not depend on whether one uses the MSVC build system or uses the GNU make build system. > --- /dev/null > +++ b/src/pl/plpython/convert_python3--1.0.sql > +create procedure convert_python3_all(tool text default '2to3', > + options text default '') > +language plpython3u as $$ > +import re, subprocess, tempfile > + > +# pattern to extract just the function header from pg_get_functiondef result > +aspat = re.compile("^(.*?\nAS )", re.DOTALL) This fails on: create function convert1(" AS " int) returns int AS $$return 123l$$ language plpython2u immutable; That's not up to project standard, but I'm proceeding to ignore this since the subject is an untrusted language and ~nobody uses such argument names. 
diff -U3 /home/nm/src/pg/vp/3stub/src/pl/plpython/expected/python3/plpython_stub.out /home/nm/src/pg/vp/3stub/src/pl/plpython/results/python3/plpython_stub.out --- /home/nm/src/pg/vp/3stub/src/pl/plpython/expected/python3/plpython_stub.out 2020-05-28 01:01:35.445842589 -0700 +++ /home/nm/src/pg/vp/3stub/src/pl/plpython/results/python3/plpython_stub.out 2020-05-28 01:01:37.765861474 -0700 @@ -50,22 +50,33 @@ call convert_python3_all(); NOTICE: converting function convert1() -NOTICE: converting function convert2() +ERROR: subprocess.CalledProcessError: Command '2to3 --no-diffs -w /tmp/tmpvmbr920w/temp.py' returned non-zero exit status 127. +CONTEXT: Traceback (most recent call last): + PL/Python function "convert_python3_all", line 42, in +subprocess.check_call(tool + " " + options + " --no-diffs -w " + tmpdirname + "/temp.py", shell=True) + PL/Python function "convert_python3_all", line 310, in check_call +PL/Python procedure "convert_python3_all" \sf convert1() CREATE OR REPLACE FUNCTION public.convert1() RETURNS integer - LANGUAGE plpython3u + LANGUAGE plpython2u IMMUTABLE -AS $function$return 123$function$ +AS $function$return 123l$function$ \sf convert2() CREATE OR REPLACE FUNCTION public.convert2() RETURNS integer - LANGUAGE plpython3u + LANGUAGE plpythonu IMMUTABLE -AS $function$return 123$function$ +AS $function$return 123l$function$ -- clean up
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 09:07:04 +0300, Vladimir Sitnikov wrote in > Pgjdbc test suite identified a SIGSEGV in the recent HEAD builds of > PostgreSQL, Ubuntu 14.04.5 LTS > > Here's a call stack: > https://travis-ci.org/github/pgjdbc/pgjdbc/jobs/691794110#L7484 > The crash is consistent, and it reproduces 100% of the cases so far. > > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > bad by 19 May 14:00 UTC, > so the regression was introduced somewhere in-between. > > Does that ring any bells? Thanks for the report. It is surely a bug since the server crashes, on the other hand Pgjdbc seems doing bad, too. It seems to me that that crash means Pgjdbc is initiating a logical replication connection to start physical replication. > In case you wonder: > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 XLogSendPhysical () at walsender.c:2762 > 2762 if (!WALRead(xlogreader, > (gdb) #0 XLogSendPhysical () at walsender.c:2762 > SendRqstPtr = 133473640 > startptr = 133473240 > endptr = 133473640 > nbytes = 400 > segno = 1 > errinfo = {wre_errno = 988942240, wre_off = 2, wre_req = -1, > wre_read = -1, wre_seg = {ws_file = 4714224, > ws_segno = 140729887364688, ws_tli = 0}} > __func__ = "XLogSendPhysical" I see the probably the same symptom by the following steps with the current HEAD. psql 'host=/tmp replication=database' =# START_REPLICATION 0/1; Physical replication is not assumed to be started on a logical replication connection. The attached would fix that. The patch adds two tests. One for this case and another for the reverse. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 1f94c98f7459ca8a4942246325815a3e0a91caa4 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 28 May 2020 15:55:30 +0900 Subject: [PATCH v1] Fix crash when starting physical replication on logical connection It is an illegal operation to try starting physical replication on a logical replication session. We should properly warn the client instead of crashing. 
--- src/backend/replication/walsender.c | 5 + src/test/recovery/t/001_stream_rep.pl | 14 +++--- src/test/recovery/t/006_logical_decoding.pl | 10 +- 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 86847cbb54..7b79c75311 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -589,6 +589,11 @@ StartReplication(StartReplicationCmd *cmd) StringInfoData buf; XLogRecPtr FlushPtr; + if (am_db_walsender) + ereport(ERROR, +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot initiate physical replication on a logical replication connection"))); + if (ThisTimeLineID == 0) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 0c316c1808..0b69b7d8d1 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 35; +use Test::More tests => 36; # Initialize master node my $node_master = get_new_node('master'); @@ -18,6 +18,14 @@ my $backup_name = 'my_backup'; # Take backup $node_master->backup($backup_name); +# Check if logical-rep session properly refuses to start physical-rep +my ($ret, $stdout, $stderr) = + $node_master->psql('template1', + qq[START_REPLICATION PHYSICAL 0/1], + replication=>'database'); +ok($stderr =~ /ERROR: cannot initiate physical replication on a logical replication connection/, + "check if physical replication is rejected on logical-rep session"); + # Create streaming standby linking to master my $node_standby_1 = get_new_node('standby_1'); $node_standby_1->init_from_backup($node_master, $backup_name, @@ -94,7 +102,7 @@ sub test_target_session_attrs # The client used for the connection does not matter, only the backend # point does. - my ($ret, $stdout, $stderr) = + ($ret, $stdout, $stderr) = $node1->psql('postgres', 'SHOW port;', extra_params => [ '-d', $connstr ]); is( $status == $ret && $stdout eq $target_node->port, @@ -136,7 +144,7 @@ my $connstr_rep= "$connstr_common replication=1"; my $connstr_db = "$connstr_common replication=database dbname=postgres"; # Test SHOW ALL -my ($ret, $stdout, $stderr) = $node_master->psql( +($ret, $stdout, $stderr) = $node_master->psql( 'postgres', 'SHOW ALL;', on_error_die => 1, extra_params => [ '-d', $connstr_rep ]); diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl index ee05535b1c..eefde8c3f1 100644 --- a/src/test/recovery/t/006_logical_decoding.pl +++ b/src/test/recovery/t/006_logical_decoding.pl @@ -7,7 +7
Re: Fix compilation failure against LLVM 11
On Wed, May 27, 2020 at 07:49:45AM -0700, Jesse Zhang wrote: > For bystanders: Andres and I argued for "fixing this sooner and > backpatch" and Michael suggested "wait longer and whack all moles". We > have waited, and there seems to be only one mole (finding all dead > unbroken "include"s was left as an exercise for the reader). Have we > come to an agreement on this? If Andres could take care of this issue as he sees fit, that's OK for me. I could look at that and play with LLVM builds. Now, I am not really familiar with it, so it would take me some time, but we are all here to learn :) Please note that I would still wait for their next GA release to plug any extra holes at the same time. @Jesse: or is this change actually part of the upcoming 10.0.1? -- Michael
Re: max_slot_wal_keep_size comment in postgresql.conf
At Thu, 28 May 2020 15:44:26 +0900, Michael Paquier wrote in > On Wed, May 27, 2020 at 04:35:51PM +0900, Michael Paquier wrote: > > On Wed, May 27, 2020 at 04:21:59PM +0900, Kyotaro Horiguchi wrote: > > > I don't oppose to full-spelling. How about the attached? > > > > No problem from me. > > And applied this one as of 55ca50d. Thanks! -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Kyotaro>It seems to me that that crash means Pgjdbc is initiating a logical Kyotaro>replication connection to start physical replication. Well, it used to work previously, so it might be a breaking change from the client/application point of view. Vladimir
Re: Inlining of couple of functions in pl_exec.c improves performance
Hi st 27. 5. 2020 v 13:31 odesílatel Amit Khandekar napsal: > On Tue, 26 May 2020 at 09:06, Amit Khandekar > wrote: > > > > On Sat, 23 May 2020 at 23:24, Pavel Stehule > wrote: > > > > > >FOR counter IN 1..180 LOOP > > > id = 0; id = 0; id1 = 0; > > > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > > > id3 = 0; id = 0; id = 0; id1 = 0; > > > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > > > id3 = 0; > > >END LOOP; > > > > > > This is not too much typical PLpgSQL code. All expressions are not > parametrized - so this test is little bit obscure. > > > > > > Last strange performance plpgsql benchmark did calculation of pi > value. It does something real > > > > Yeah, basically I wanted to have many statements, and that too with > > many assignments where casts are not required. Let me check if I can > > come up with a real-enough testcase. Thanks. > > create table tab (id int[]); > insert into tab select array((select ((random() * 10)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 60)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 100)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 10)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 60)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 100)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 10)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 60)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 100)::bigint) id > from generate_series(1, 3) order by 1)); > > > -- Return how much two consecutive array elements are apart from each > other, on average; i.e. how much the numbers are spaced out. > -- Input is an ordered array of integers. > CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$ > DECLARE > diff int = 0; > num int; > prevnum int = 1; > BEGIN > FOREACH num IN ARRAY $1 > LOOP > diff = diff + num - prevnum; > prevnum = num; > END LOOP; > RETURN diff/array_length($1, 1); > END; > $$ LANGUAGE plpgsql; > > explain analyze select avg_space(id) from tab; > Like earlier figures, these are execution times in milliseconds, taken > from explain-analyze. > ARM VM: >HEAD : 49.8 >patch 0001+0002 : 47.8 => 4.2% >patch 0001+0002+0003 : 42.9 => 16.1% > x86 VM: >HEAD : 32.8 >patch 0001+0002 : 32.7 => 0% >patch 0001+0002+0003 : 28.0 => 17.1% > I tested these patches on my notebook - Lenovo T520 (x64) - on pi calculation CREATE OR REPLACE FUNCTION pi_est_1(n int) RETURNS numeric AS $$ DECLARE accum double precision DEFAULT 1.0; c1 double precision DEFAULT 2.0; c2 double precision DEFAULT 1.0; BEGIN FOR i IN 1..n LOOP accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0))); c1 := c1 + 2.0; c2 := c2 + 2.0; END LOOP; RETURN accum * 2.0; END; $$ LANGUAGE plpgsql; and I see about 3-5% of speedup extra simply test shows do $$ declare i int default 0; begin while i < 1 loop i := i + 1; end loop; raise notice 'i=%', i;end $$; 2% speedup I don't see 17% anywhere, but 3-5% is not bad. patch 0001 has sense and can help with code structure patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct. 
patch 0003 has sense too. Tested on Fedora 32 with gcc 10.1.1 and the -O2 option. Regards Pavel
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 12:46 PM Dilip Kumar wrote: > > On Tue, May 26, 2020 at 12:00 PM Amit Kapila wrote: > > > > Isn't this problem only for subxact file as we anyway create changes > > file as part of start stream message which should have come after > > abort? If so, can't we detect whether subxact file exists probably by > > using nsubxacts or something like that? Can you please once try to > > reproduce this scenario to ensure that we are not missing anything? > > I have tested this, as of now, by default we create both changes and > subxact files irrespective of whether we get any subtransactions or > not. Maybe this could be optimized that only if we have any subxact > then only create that file otherwise not? What's your opinion on the > same. > Yeah, that makes sense. > > > > > > 8. > > > > > > @@ -2295,6 +2677,13 @@ > > > > > > ReorderBufferXidSetCatalogChanges(ReorderBuffer > > > > > > *rb, TransactionId xid, > > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true); > > > > > > > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > + > > > > > > + /* > > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes too > > > > > > + * if one of its children has. > > > > > > + */ > > > > > > + if (txn->toptxn != NULL) > > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > } > > > > > > > > > > > > Why are we marking top transaction here? > > > > > > > > > > We need to mark top transaction to decide whether to build tuplecid > > > > > hash or not. In non-streaming mode, we are only sending during the > > > > > commit time, and during commit time we know whether the top > > > > > transaction has any catalog changes or not based on the invalidation > > > > > message so we are marking the top transaction there in DecodeCommit. > > > > > Since here we are not waiting till commit so we need to mark the top > > > > > transaction as soon as we mark any of its child transactions. > > > > > > > > > > > > > But how does it help? We use this flag (via > > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is > > > > anyway done in DecodeCommit and that too after setting this flag for > > > > the top transaction if required. So, how will it help in setting it > > > > while processing for subxid. Also, even if we have to do it won't it > > > > add the xid needlessly in builder->committed.xip array? > > > > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether > > > to build the tuplecid hash or not based on whether it has catalog > > > changes or not. > > > > > > > Okay, but you haven't answered the second part of the question: "won't > > it add the xid of top transaction needlessly in builder->committed.xip > > array, see function SnapBuildCommitTxn?" IIUC, this can happen > > without patch as well because DecodeCommit also sets the flags just > > based on invalidation messages irrespective of whether the messages > > are generated by top transaction or not, is that right? > > Yes, with or without the patch it always adds the topxid. I think > purpose for doing this with/without patch is not for the snapshot > instead we are marking the top itself that some of its subtxn has the > catalog changes so that while building the tuplecid has we can know > whether to build the hash or not. 
But, having said that I feel in > ReorderBufferBuildTupleCidHash why do we need these two checks > if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids)) > return; > > I mean it should be enough to just have the check, because if we have > added something to the tuplecids then catalog changes must be there > because that time we are setting the catalog changes to true. > > if (dlist_is_empty(&txn->tuplecids)) > return; > > I think in the base code there are multiple things going on > 1. If we get new CID we always set the catalog change in that > transaction but add the tuplecids in the top transaction. So > basically, top transaction is so far not marked with catalog changes > but it has tuplecids. > 2. Now, in DecodeCommit the top xid will be marked that it has catalog > changes based on the invalidation messages. > I don't think it is advisable to remove that check from base code unless we have a strong reason for doing so. I think here you can write better comments about why you are marking the flag for top transaction and remove TOCHECK from the comment. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
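If the subxact file becomes optional as agreed above, the cleanup path has to tolerate its absence, which is essentially the unlink()/ENOENT question raised earlier in this thread. Here is a sketch of that tolerant cleanup; the function name is hypothetical and the path construction is left out:

#include "postgres.h"

#include <errno.h>
#include <unistd.h>

static void
cleanup_subxact_file(const char *path)
{
    /*
     * Tolerate a file that was never created (for example, an abort that
     * arrives before any subxact info was spilled); only complain about
     * real failures.
     */
    if (unlink(path) < 0 && errno != ENOENT)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not remove file \"%s\": %m", path)));
}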
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hello, Vladimir. At Thu, 28 May 2020 11:57:23 +0300, Vladimir Sitnikov wrote in > Kyotaro>It seems to me that that crash means Pgjdbc is initiating a logical > Kyotaro>replication connection to start physical replication. > > Well, it used to work previously, so it might be a breaking change from the > client/application point of view. Mmm. It is not the proper way to use physical replication and it's totally accidental that that worked (or even it might be a bug). The documentation is saying as the follows, as more-or-less the same for all versions since 9.4. https://www.postgresql.org/docs/13/protocol-replication.html > To initiate streaming replication, the frontend sends the replication > parameter in the startup message. A Boolean value of true (or on, yes, > 1) tells the backend to go into physical replication walsender mode, > wherein a small set of replication commands, shown below, can be > issued instead of SQL statements. > > Passing database as the value for the replication parameter instructs > the backend to go into logical replication walsender mode, connecting > to the database specified in the dbname parameter. In logical > replication walsender mode, the replication commands shown below as > well as normal SQL commands can be issued. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
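The distinction in the quoted documentation is easiest to see from the client side. Below is a minimal libpq sketch; the connection strings are placeholders and the command shown is only illustrative, not pgjdbc code:

#include <stdio.h>

#include "libpq-fe.h"

int
main(void)
{
    /* physical walsender mode: only the replication command set is accepted */
    PGconn     *phys = PQconnectdb("replication=true");

    /* logical walsender mode: replication commands plus normal SQL */
    PGconn     *logi = PQconnectdb("replication=database dbname=postgres");

    if (PQstatus(phys) != CONNECTION_OK || PQstatus(logi) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    /*
     * START_REPLICATION ... PHYSICAL belongs on the first connection;
     * issuing it on the second one is the case that crashed the walsender.
     */
    PQclear(PQexec(phys, "IDENTIFY_SYSTEM"));

    PQfinish(phys);
    PQfinish(logi);
    return 0;
}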
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Thu, May 28, 2020 at 1:36 PM amul sul wrote: > On Wed, May 27, 2020 at 12:53 PM Amit Langote wrote: >> Actually, if you declare the cursor without FOR SHARE/UPDATE, the case >> would fail even with traditional inheritance: >> >> drop table if exists p cascade; >> create table p (a int); >> create table c (check (a = 2)) inherits (p); >> insert into p values (1); >> insert into c values (2); >> begin; >> declare c cursor for select * from p where a = 1; >> fetch c; >> update p set a = a where current of c; >> ERROR: cursor "c" is not a simply updatable scan of table "c" >> ROLLBACK >> > > I am not sure I understood the point, you'll see the same error with > declarative > partitioning as well. My point is that if a table is not present in the cursor's plan, there is no way for CURRENT OF to access it. Giving an error in that case seems justified. OTOH, when the CURRENT OF implementation has RowMarks to look at, it avoids the error for traditional inheritance children due their inactive RowMarks being present in the cursor's PlannedStmt. I think that's only by accident though. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 12:57 PM Dilip Kumar wrote: > > On Tue, May 26, 2020 at 3:04 PM Amit Kapila wrote: > > > > Why we need memory-related to subxacts till the worker is alive? As > > we have now, after reading subxact info (subxact_info_read), we need > > to ensure that it is freed after its usage due to which we need to > > remember and perform pfree at various places. > > > > I think we should once see the possibility that such that we could > > switch to this new context in start stream message and reset it in > > stop stream message. That might help in avoiding > > MemoryContextSwitchTo TopMemoryContext at various places. > > Ok, I understand, I think subxacts can be allocated in new > LogicalStreamingContext which we can reset at the stream stop. How > about xids? > How about storing xids in ApplyContext? We do store similar lifespan things in that context, for ex. see store_flush_position. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
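A minimal sketch of that suggestion, mirroring the store_flush_position() pattern; the helper name, growth strategy, and variables are illustrative assumptions, not patch code:

#include "postgres.h"
#include "utils/memutils.h"

extern MemoryContext ApplyContext;  /* the apply worker's long-lived context */

static TransactionId *xids = NULL;
static int  nxids = 0;
static int  maxxids = 0;

static void
remember_streamed_xid(TransactionId xid)
{
    MemoryContext oldctx = MemoryContextSwitchTo(ApplyContext);

    if (nxids >= maxxids)
    {
        maxxids = (maxxids == 0) ? 64 : maxxids * 2;

        /*
         * repalloc() keeps the array in the context where it was first
         * allocated, so only the initial palloc() really needs ApplyContext;
         * switching around the whole block is simply harmless.
         */
        xids = (xids == NULL)
            ? palloc(maxxids * sizeof(TransactionId))
            : repalloc(xids, maxxids * sizeof(TransactionId));
    }
    xids[nxids++] = xid;

    MemoryContextSwitchTo(oldctx);
}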
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 3:15 PM Amit Kapila wrote: > > On Thu, May 28, 2020 at 12:57 PM Dilip Kumar wrote: > > > > On Tue, May 26, 2020 at 3:04 PM Amit Kapila wrote: > > > > > > Why we need memory-related to subxacts till the worker is alive? As > > > we have now, after reading subxact info (subxact_info_read), we need > > > to ensure that it is freed after its usage due to which we need to > > > remember and perform pfree at various places. > > > > > > I think we should once see the possibility that such that we could > > > switch to this new context in start stream message and reset it in > > > stop stream message. That might help in avoiding > > > MemoryContextSwitchTo TopMemoryContext at various places. > > > > Ok, I understand, I think subxacts can be allocated in new > > LogicalStreamingContext which we can reset at the stream stop. How > > about xids? > > > > How about storing xids in ApplyContext? We do store similar lifespan > things in that context, for ex. see store_flush_position. That sounds good to me, I will make this change in the next patch set, along with other changes. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 2:41 PM Amit Kapila wrote: > > On Thu, May 28, 2020 at 12:46 PM Dilip Kumar wrote: > > > > On Tue, May 26, 2020 at 12:00 PM Amit Kapila > > wrote: > > > > > > Isn't this problem only for subxact file as we anyway create changes > > > file as part of start stream message which should have come after > > > abort? If so, can't we detect whether subxact file exists probably by > > > using nsubxacts or something like that? Can you please once try to > > > reproduce this scenario to ensure that we are not missing anything? > > > > I have tested this, as of now, by default we create both changes and > > subxact files irrespective of whether we get any subtransactions or > > not. Maybe this could be optimized that only if we have any subxact > > then only create that file otherwise not? What's your opinion on the > > same. > > > > Yeah, that makes sense. > > > > > > > > 8. > > > > > > > @@ -2295,6 +2677,13 @@ > > > > > > > ReorderBufferXidSetCatalogChanges(ReorderBuffer > > > > > > > *rb, TransactionId xid, > > > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true); > > > > > > > > > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > > + > > > > > > > + /* > > > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes > > > > > > > too > > > > > > > + * if one of its children has. > > > > > > > + */ > > > > > > > + if (txn->toptxn != NULL) > > > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > > } > > > > > > > > > > > > > > Why are we marking top transaction here? > > > > > > > > > > > > We need to mark top transaction to decide whether to build tuplecid > > > > > > hash or not. In non-streaming mode, we are only sending during the > > > > > > commit time, and during commit time we know whether the top > > > > > > transaction has any catalog changes or not based on the invalidation > > > > > > message so we are marking the top transaction there in DecodeCommit. > > > > > > Since here we are not waiting till commit so we need to mark the top > > > > > > transaction as soon as we mark any of its child transactions. > > > > > > > > > > > > > > > > But how does it help? We use this flag (via > > > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is > > > > > anyway done in DecodeCommit and that too after setting this flag for > > > > > the top transaction if required. So, how will it help in setting it > > > > > while processing for subxid. Also, even if we have to do it won't it > > > > > add the xid needlessly in builder->committed.xip array? > > > > > > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether > > > > to build the tuplecid hash or not based on whether it has catalog > > > > changes or not. > > > > > > > > > > Okay, but you haven't answered the second part of the question: "won't > > > it add the xid of top transaction needlessly in builder->committed.xip > > > array, see function SnapBuildCommitTxn?" IIUC, this can happen > > > without patch as well because DecodeCommit also sets the flags just > > > based on invalidation messages irrespective of whether the messages > > > are generated by top transaction or not, is that right? > > > > Yes, with or without the patch it always adds the topxid. I think > > purpose for doing this with/without patch is not for the snapshot > > instead we are marking the top itself that some of its subtxn has the > > catalog changes so that while building the tuplecid has we can know > > whether to build the hash or not. 
But, having said that I feel in > > ReorderBufferBuildTupleCidHash why do we need these two checks > > if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids)) > > return; > > > > I mean it should be enough to just have the check, because if we have > > added something to the tuplecids then catalog changes must be there > > because that time we are setting the catalog changes to true. > > > > if (dlist_is_empty(&txn->tuplecids)) > > return; > > > > I think in the base code there are multiple things going on > > 1. If we get new CID we always set the catalog change in that > > transaction but add the tuplecids in the top transaction. So > > basically, top transaction is so far not marked with catalog changes > > but it has tuplecids. > > 2. Now, in DecodeCommit the top xid will be marked that it has catalog > > changes based on the invalidation messages. > > > > I don't think it is advisable to remove that check from base code > unless we have a strong reason for doing so. I think here you can > write better comments about why you are marking the flag for top > transaction and remove TOCHECK from the comment. Ok, I will do that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Thu, May 28, 2020 at 1:36 PM amul sul wrote: > On Wed, May 27, 2020 at 12:53 PM Amit Langote wrote: >> I guess the workaround is to declare the cursor such that no >> partitions/children are pruned/excluded. > > Disabling pruning as well -- at-least for the statement or function. Now *that* is actually a workaround to tell a customer. :) -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: WIP: WAL prefetch (another approach)
On Sun, May 3, 2020 at 3:12 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > I've finally performed couple of tests involving more IO. The > not-that-big dataset of 1.5 GB for the replica with the memory allowing > fitting ~ 1/6 of it, default prefetching parameters and an update > workload with uniform distribution. Rather a small setup, but causes > stable reading into the page cache on the replica and allows to see a > visible influence of the patch (more measurement samples tend to happen > at lower latencies): Thanks for these tests Dmitry. You didn't mention the details of the workload, but one thing I'd recommend for a uniform/random workload that's generating a lot of misses on the primary server using N backends is to make sure that maintenance_io_concurrency is set to a number like N*2 or higher, and to look at the queue depth on both systems with iostat -x 1. Then you can experiment with ALTER SYSTEM SET maintenance_io_concurrency = X; SELECT pg_reload_conf(); to try to understand the way it works; there is a point where you've set it high enough and the replica is able to handle the same rate of concurrent I/Os as the primary. The default of 10 is actually pretty low unless you've only got ~4 backends generating random updates on the primary. That's with full_page_writes=off; if you leave it on, it takes a while to get into a scenario where it has much effect. Here's a rebase, after the recent XLogReader refactoring. From a7fd3f728d64c3c94387e9e424dba507b166bcab Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 28 Mar 2020 11:42:59 +1300 Subject: [PATCH v9 1/3] Add pg_atomic_unlocked_add_fetch_XXX(). Add a variant of pg_atomic_add_fetch_XXX with no barrier semantics, for cases where you only want to avoid the possibility that a concurrent pg_atomic_read_XXX() sees a torn/partial value. Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com --- src/include/port/atomics.h | 24 ++ src/include/port/atomics/generic.h | 33 ++ 2 files changed, 57 insertions(+) diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index 4956ec55cb..2abb852893 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -389,6 +389,21 @@ pg_atomic_add_fetch_u32(volatile pg_atomic_uint32 *ptr, int32 add_) return pg_atomic_add_fetch_u32_impl(ptr, add_); } +/* + * pg_atomic_unlocked_add_fetch_u32 - atomically add to variable + * + * Like pg_atomic_unlocked_write_u32, guarantees only that partial values + * cannot be observed. + * + * No barrier semantics. 
+ */ +static inline uint32 +pg_atomic_unlocked_add_fetch_u32(volatile pg_atomic_uint32 *ptr, int32 add_) +{ + AssertPointerAlignment(ptr, 4); + return pg_atomic_unlocked_add_fetch_u32_impl(ptr, add_); +} + /* * pg_atomic_sub_fetch_u32 - atomically subtract from variable * @@ -519,6 +534,15 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_sub_fetch_u64_impl(ptr, sub_); } +static inline uint64 +pg_atomic_unlocked_add_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 add_) +{ +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION + AssertPointerAlignment(ptr, 8); +#endif + return pg_atomic_unlocked_add_fetch_u64_impl(ptr, add_); +} + #undef INSIDE_ATOMICS_H #endif /* ATOMICS_H */ diff --git a/src/include/port/atomics/generic.h b/src/include/port/atomics/generic.h index d3ba89a58f..1683653ca6 100644 --- a/src/include/port/atomics/generic.h +++ b/src/include/port/atomics/generic.h @@ -234,6 +234,16 @@ pg_atomic_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_) } #endif +#if !defined(PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U32) +#define PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U32 +static inline uint32 +pg_atomic_unlocked_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_) +{ + ptr->value += add_; + return ptr->value; +} +#endif + #if !defined(PG_HAVE_ATOMIC_SUB_FETCH_U32) && defined(PG_HAVE_ATOMIC_FETCH_SUB_U32) #define PG_HAVE_ATOMIC_SUB_FETCH_U32 static inline uint32 @@ -399,3 +409,26 @@ pg_atomic_sub_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_fetch_sub_u64_impl(ptr, sub_) - sub_; } #endif + +#if defined(PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY) && \ + !defined(PG_HAVE_ATOMIC_U64_SIMULATION) + +#ifndef PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U64 +#define PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U64 +static inline uint64 +pg_atomic_unlocked_add_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val) +{ + ptr->value += val; + return ptr->value; +} +#endif + +#else + +static inline uint64 +pg_atomic_unlocked_add_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val) +{ + return pg_atomic_add_fetch_u64_impl(ptr, val); +} + +#endif -- 2.20.1 From 6ed95fffba6751ddc9607659183c072cb11fa4a8 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 7 Apr 2020 22:56:27 +1200 Subject: [PATCH v9 2/3] Allow XLogReadRecord() to be non-blocking. Exte
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Thu, May 28, 2020 at 3:06 PM Amit Langote wrote: > On Thu, May 28, 2020 at 1:36 PM amul sul wrote: > > On Wed, May 27, 2020 at 12:53 PM Amit Langote > wrote: > >> Actually, if you declare the cursor without FOR SHARE/UPDATE, the case > >> would fail even with traditional inheritance: > >> > >> drop table if exists p cascade; > >> create table p (a int); > >> create table c (check (a = 2)) inherits (p); > >> insert into p values (1); > >> insert into c values (2); > >> begin; > >> declare c cursor for select * from p where a = 1; > >> fetch c; > >> update p set a = a where current of c; > >> ERROR: cursor "c" is not a simply updatable scan of table "c" > >> ROLLBACK > >> > > > > I am not sure I understood the point, you'll see the same error with > declarative > > partitioning as well. > > My point is that if a table is not present in the cursor's plan, there > is no way for CURRENT OF to access it. Giving an error in that case > seems justified. > > OTOH, when the CURRENT OF implementation has RowMarks to look at, it > avoids the error for traditional inheritance children due their > inactive RowMarks being present in the cursor's PlannedStmt. I think > that's only by accident though. > Yeah, make sense, thank you. Regards, Amul
Read access for pg_monitor to pg_replication_origin_status view
Hi all, While working on some monitoring tasks I realised that the pg_monitor role doesn't have access to the pg_replication_origin_status view. Are there any strong thoughts on not giving pg_monitor access to this view, or is it just something that nobody has asked for yet? I can't find any reason for pg_monitor not to have access to it. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, May 27, 2020 at 8:22 PM Dilip Kumar wrote: > > On Tue, May 26, 2020 at 7:45 AM Dilip Kumar wrote: > > > > > > Okay, sending again. > > While reviewing/testing I have found a couple of problems in 0005 and > 0006 which I have fixed in the attached version. > I haven't reviewed the new fixes yet but I have some comments on 0008-Add-support-for-streaming-to-built-in-replicatio.patch. 1. I think the temporary files (and or handles) used for storing the information of changes and subxacts are getting leaked in the patch. At some places, it is taken care to close the file but cases like apply_handle_stream_commit where if any error occurred in apply_dispatch(), the file might not get closed. The other place is in apply_handle_stream_abort() where if there is an error in ftruncate the file won't be closed. Now, the bigger problem is with changes related file which is opened in apply_handle_stream_start and closed in apply_handle_stream_stop and if there is any error in-between, we won't close it. OTOH, I think the worker will exit on an error so it might not matter but then why we are at few other places we are closing it before the error? I think on error these temporary files should be removed instead of relying on them to get removed next time when we receive changes for the same transaction which I feel is what we do in other cases where we use temporary files like for sorts or hashjoins. Also, what if the changes file size overflows "OS file size limit"? If we agree that the above are problems then do you think we should explore using BufFile interface (see storage/file/buffile.c) to avoid all such problems? 2. apply_handle_stream_abort() { .. + /* discard the subxacts added later */ + nsubxacts = subidx; + + /* write the updated subxact list */ + subxact_info_write(MyLogicalRepWorker->subid, xid); .. } Here, if subxacts becomes zero, then also subxact_info_write will create a new file and write checksum. I think subxact_info_write should have a check for nsubxacts > 0 before writing to the file. 3. apply_handle_stream_commit(StringInfo s) { .. + /* + * send feedback to upstream + * + * XXX Probably should send a valid LSN. But which one? + */ + send_feedback(InvalidXLogRecPtr, false, false); .. } Why do we need to send the feedback at this stage after applying each message? If we see a non-streamed case, we never send_feedback after each message. So, following that, I don't see the need to send it here but if you see any specific reason then do let me know? And if we have to send feedback, then we need to decide the appropriate values as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
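For reference, here is a rough sketch of what point 1 could look like if the raw temporary files were replaced with the BufFile interface; the function names and surrounding structure are hypothetical, not taken from the patch. BufFile temporary files are registered with the resource owner, so they are closed and removed automatically on error, and they are segmented internally, which also sidesteps the OS file size limit mentioned above:

#include "postgres.h"
#include "storage/buffile.h"

static BufFile *changes_file = NULL;

static void
changes_open(void)
{
    /*
     * interXact = false: the file is tracked by the resource owner, so it is
     * closed and deleted automatically at transaction end or on error.
     */
    changes_file = BufFileCreateTemp(false);
}

static void
changes_append(const void *data, size_t len)
{
    /* BufFile splits into 1GB segment files internally, so no size limit */
    if (BufFileWrite(changes_file, (void *) data, len) != len)
        elog(ERROR, "could not write to temporary change file");
}

static void
changes_close(void)
{
    BufFileClose(changes_file);
    changes_file = NULL;
}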
Re: password_encryption default
On 2020-05-27 15:25, Jonathan S. Katz wrote: > $ initdb -D data --auth-local=scram-sha-256 --auth-host=md5 > Got an error message: > "initdb: error: must specify a password for the superuser to enable md5 > authentication" > For the last two, that behavior is to be expected (after all, you've set > the two login vectors to require passwords), but the error message seems > odd now. Perhaps we tweak it to be: > > "initdb: error: must specify a password for the superuser when requiring > passwords for both local and host authentication." That message is a bit long. Maybe just "must specify a password for the superuser to enable password authentication", without reference to the specific method. I think the idea is clear there. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: password_encryption default
On 2020-05-27 15:59, Stephen Frost wrote: > Agreed- let's remove the legacy options. As I've mentioned elsewhere, > distros may manage the issue for us, and if we want to get into it, we > could consider adding support to pg_upgrade to complain if it comes > across a legacy setting that isn't valid. I'm not sure that's worthwhile > though. More along these lines: We could also remove the ENCRYPTED and UNENCRYPTED keywords from CREATE and ALTER ROLE. AFAICT, these have never been emitted by pg_dump or psql, so there are no concerns from that end. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, 28 May 2020 at 05:11, Kyotaro Horiguchi wrote: > Hello, Vladimir. > > At Thu, 28 May 2020 11:57:23 +0300, Vladimir Sitnikov < > sitnikov.vladi...@gmail.com> wrote in > > Kyotaro>It seems to me that that crash means Pgjdbc is initiating a > logical > > Kyotaro>replication connection to start physical replication. > > > > Well, it used to work previously, so it might be a breaking change from > the > > client/application point of view. > > Mmm. It is not the proper way to use physical replication and it's > totally accidental that that worked (or even it might be a bug). The > documentation is saying as the follows, as more-or-less the same for > all versions since 9.4. > > https://www.postgresql.org/docs/13/protocol-replication.html > > > To initiate streaming replication, the frontend sends the replication > > parameter in the startup message. A Boolean value of true (or on, yes, > > 1) tells the backend to go into physical replication walsender mode, > > wherein a small set of replication commands, shown below, can be > > issued instead of SQL statements. > > > > Passing database as the value for the replication parameter instructs > > the backend to go into logical replication walsender mode, connecting > > to the database specified in the dbname parameter. In logical > > replication walsender mode, the replication commands shown below as > > well as normal SQL commands can be issued. > > regards. > > While the documentation does indeed say that there is quite a bit of additional confusion added by: and START_REPLICATION [ SLOT *slot_name* ] [ PHYSICAL ] *XXX/XXX* [ TIMELINE *tli* ] If we already have a physical replication slot according to the startup message why do we need to specify it in the START REPLICATION message ? Dave
Re: New 'pg' consolidated metacommand patch
On Wed, May 27, 2020 at 4:00 PM Peter Eisentraut wrote: > First, consider that git has over 170 subcommands. PostgreSQL currently > has 36, and we're probably not going to add dozens more any time soon. > So the issue is not of the same scope. It also seems to me that the way > git is organized this becomes a self-perpetuating system: They are > adding subcommands all the time without much care where you might in > other situations think harder about combining them and keeping the > surface area smaller. For example, we wouldn't really need separate > commands clusterdb, reindexdb, vacuumdb if we had better support in psql > for "run this command in each database [in parallel]". I see your point here, but I think it's clear that some people are already concerned about the proliferation of binaries on namespace grounds. For example, it was proposed that pg_verifybackup should be part of pg_basebackup, and then later and by someone else that it should be part of pg_checksums. It's quite different from either of those things and I'm pretty confident it shouldn't be merged with either one, but there was certainly pressure in that direction. So apparently for some people 36 is already too many. It's not clear to me why it should be a problem to have a lot of commands, as long as they all start with "pg_", but if it is then we should do something about it rather than waiting until we have more of them. > git (and svn etc. before it) also has a much more consistent operating > model that is sensible to reflect in the command structure. They all > more or less operate on a git repository, in apparently 170 different > ways. The 36 PostgreSQL commands don't all work in the same way. Now > if someone were to propose a way to combine server tools, perhaps like > pginstancetool {init|controldata|resetwal|checksum}, and perhaps also in > a way that actually saves code duplication and inconsistency, that would > be something to consider. Or maybe a client-side tool that does > pgclienttool {create user|drop user|create database|...} -- but that > pretty much already exists by the name of psql. But just renaming > everything that's shipped with PostgreSQL to one common bucket without > regard to how it actually works and what role it plays would be > unnecessarily confusing. This doesn't strike me as a very practical proposal because "pginstancetool checksums" is really stinking long compared to "pg_checksums", where as "pg checksums" is no different, or one keystroke better if you assume that the underscore requires pressing shift. > Also consider some practical concerns with the command structure you > describe: Tab completion of commands wouldn't work anymore, unless you > supply custom tab completion setups. The direct association between a > command and its man page would be broken. Shell scripting becomes more > challenging: Instead of writing common things like "if which > pg_waldump; then" you'd need some custom code, to be determined. These > are all solvable, but just a sum of slight annoyances, for no real benefit. There are some potential benefits, I think, such as: 1. It seems to be the emerging standard for command line interfaces. There's not only the 'git' example but also things like 'aws', which is perhaps more similar to the case proposed here in that there are a bunch of subcommands that do quite different sorts of things. 
I think a lot of developers are now quite familiar with the idea of a main command with a bunch of subcommands, and they expect to be able to type 'git help' or 'aws help' or 'pg help' to get a list of commands, and then 'pg help createdb' for help with that. If you don't know what pg commands exist today, how do you discover them? You're right that not everyone is going this way but it seems to be pretty common (kubectl, yum, brew, npm, heroku, ...). 2. It lowers the barrier to adding more commands. For example, as Chris Browne says, we could have a 'pg completion' command to emit completion information for various shells. True, that would be more necessary with this proposal. But I bet there's stuff that could be done even today -- I think most modern shells have pretty powerful completion facilities. Someone could propose it, but what are the chances that a patch adding a pg_completion binary would be accepted? I feel like the argument that this is too narrow to justify the namespace pollution is almost inevitable. 3. It might help us achieve some better consistency between commands. Right now we have a lot of warts in the way things are named, like pgbench vs. pg_ctl vs. createdb, and also pg_basebackup vs. pg_test_fsync (why not pg_base_backup or pg_testfsync?). Standardizing on something like this would probably help us be more consistent there. Over time, we might be able to also clean other things up. Maybe we could get all of our client-side utilities to share a common config file, and have a 'pg config' utility to configure it. Maybe we could have com
Re: password_encryption default
On 5/28/20 8:10 AM, Peter Eisentraut wrote: > On 2020-05-27 15:25, Jonathan S. Katz wrote: >> $ initdb -D data --auth-local=scram-sha-256 --auth-host=md5 >> >> Got an error message: >> >> "initdb: error: must specify a password for the superuser to enable md5 >> authentication" >> >> For the last two, that behavior is to be expected (after all, you've set >> the two login vectors to require passwords), but the error message seems >> odd now. Perhaps we tweak it to be: >> >> >> "initdb: error: must specify a password for the superuser when requiring >> passwords for both local and host authentication." > > That message is a bit long. Maybe just > > must specify a password for the superuser to enable password authentication > > without reference to the specific method. I think the idea is clear there. +1 Jonathan signature.asc Description: OpenPGP digital signature
Re: Explain Analyze (Rollback off) Suggestion
On Wed, May 27, 2020 at 9:33 PM David G. Johnston wrote: > I'm not seeing enough similarity with the reasons for, and specific > behaviors, of those previous GUCs to dismiss this proposal on that basis > alone. These are "crap we messed things up" switches that alter a query > behind the scenes in ways that a user cannot do through SQL - they simply > provide for changing a default that we already allow the user to override > per-query. Its akin to "DateStyle" and its pure cosmetic influencing > ease-of-use option rather than some changing the fundamental structural > meaning of '\n' Well, I think it's usually worse to have two possible behaviors rather than one. Like, a lot of people have probably made the mistake of running EXPLAIN ANALYZE without realizing that it's actually running the query, and then been surprised or dismayed afterwards. But each person only has to learn that once. If we had a GUC controlling this behavior, then you'd have to always be aware of the setting on any particular system on which you might be thinking of running the command. Likewise, if you write an application or tool of some sort that uses EXPLAIN ANALYZE, it has to be aware of the GUC value, or it won't work as expected on some systems. This is the general problem with behavior-changing GUCs. I kind of have mixed feelings about this. On the one hand, it sucks for operators of individual systems not to be able to customize things so as to produce the behavior they want. On the other hand, each one you add makes it harder to write code that will work the same way on every PostgreSQL system. I don't think the problem would be as bad in this particular case as in some others that have been proposed, mostly because EXPLAIN ANALYZE isn't widely-used by applications, so maybe it's worth considering. But on the whole, I'm inclined to agree with Tom that it's better not to create too many ways for fundamental behavior of the system to vary from one installation to another. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: password_encryption default
On Thu, May 28, 2020 at 8:53 AM Peter Eisentraut wrote: > More along these lines: We could also remove the ENCRYPTED and > UNENCRYPTED keywords from CREATE and ALTER ROLE. AFAICT, these have > never been emitted by pg_dump or psql, so there are no concerns from > that end. Thoughts? I have a question about this. My understanding of this area isn't great. As I understand it, you can specify a password unencrypted and let the system compute the validator from it, or you can compute the validator yourself and then send that as the 'encrypted' password. But, apparently, CREATE ROLE and ALTER ROLE don't really know which thing you did. They just examine the string that you passed and decide whether it looks like a validator. If so, they assume it is; if not, they assume it's just a password. But that seems really odd. What if you choose a password that just happens to look like a validator? Perhaps that's not real likely, but why do we not permit -- or even require -- the user to specify intent? It seems out of character for us to, essentially, guess the meaning of something ambiguous rather than requiring the user to be clear about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
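To illustrate the guessing Robert describes (role names and the hex digits below are made up):

    -- Does not look like a precomputed secret, so the server hashes it
    -- according to password_encryption:
    CREATE ROLE alice LOGIN PASSWORD 'correct horse battery staple';

    -- 'md5' followed by 32 hex digits looks like a precomputed MD5 secret,
    -- so it is stored verbatim; a real password that happens to have this
    -- shape would be misread as a hash, which is exactly the ambiguity above:
    CREATE ROLE bob LOGIN PASSWORD 'md50123456789abcdef0123456789abcdef';

    -- Likewise, a string that parses as a SCRAM verifier,
    -- SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>, is stored as-is.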
Re: password_encryption default
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Thu, May 28, 2020 at 8:53 AM Peter Eisentraut > wrote: > > More along these lines: We could also remove the ENCRYPTED and > > UNENCRYPTED keywords from CREATE and ALTER ROLE. AFAICT, these have > > never been emitted by pg_dump or psql, so there are no concerns from > > that end. Thoughts? > > I have a question about this. My understanding of this area isn't > great. As I understand it, you can specify a password unencrypted and > let the system compute the validator from it, or you can compute the > validator yourself and then send that as the 'encrypted' password. > But, apparently, CREATE ROLE and ALTER ROLE don't really know which > thing you did. They just examine the string that you passed and decide > whether it looks like a validator. If so, they assume it is; if not, > they assume it's just a password. > > But that seems really odd. What if you choose a password that just > happens to look like a validator? Perhaps that's not real likely, but > why do we not permit -- or even require -- the user to specify intent? > It seems out of character for us to, essentially, guess the meaning of > something ambiguous rather than requiring the user to be clear about > it. Indeed, and it's also been a source of bugs... Watching pgcon atm but I do recall some history around exactly this. I'd certainly be in favor of having these things be more explicit- including doing things like actually splitting out the actual password validator from the algorithm instead of having them smashed together as one string as if we don't know what columns are (also recall complaining about that when scram was first being developed too, though that might just be in my head). Thanks, Stephen signature.asc Description: PGP signature
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Wed, May 27, 2020 at 6:51 PM Amit Langote wrote: > > So in Rajkumar's example, the cursor is declared as: > > CURSOR IS SELECT * FROM tbl WHERE c1< 5 FOR UPDATE; > > and the WHERE CURRENT OF query is this: > > UPDATE tbl SET c2='aa' WHERE CURRENT OF cur; Thanks for the clarification. So it looks like we expand an UPDATE on a partitioned table into an UPDATE on each partition (inheritance_planner for DML) and then execute each of those. If CURRENT OF were to save the table oid or something, we could run the UPDATE only on that partition. I am possibly shooting in the dark, but this puzzles me. And it looks like we can cause wrong rows to be updated in non-partition inheritance where the ctids match? -- Best Wishes, Ashutosh Bapat
Re: Explain Analyze (Rollback off) Suggestion
On Thu, May 28, 2020 at 6:42 AM Robert Haas wrote: > On Wed, May 27, 2020 at 9:33 PM David G. Johnston > wrote: > > I'm not seeing enough similarity with the reasons for, and specific > behaviors, of those previous GUCs to dismiss this proposal on that basis > alone. These are "crap we messed things up" switches that alter a query > behind the scenes in ways that a user cannot do through SQL - they simply > provide for changing a default that we already allow the user to override > per-query. Its akin to "DateStyle" and its pure cosmetic influencing > ease-of-use option rather than some changing the fundamental structural > meaning of '\n' > > Well, I think it's usually worse to have two possible behaviors rather > than one. Like, a lot of people have probably made the mistake of > running EXPLAIN ANALYZE without realizing that it's actually running > the query, and then been surprised or dismayed afterwards. This really belongs on the other thread (though I basically said the same thing there two days ago): The ANALYZE option should not be part of the GUC setup. None of the other EXPLAIN default changing options have the same issues with being on by default - which is basically what we are talking about here: being able to have an option be on without specifying that option in the command itself. TIMING already does this without difficulty and the others are no different. David J.
Re: Resolving the python 2 -> python 3 mess
Noah Misch writes: > On Wed, Feb 19, 2020 at 08:42:36PM +0100, Peter Eisentraut wrote: >> I think there should just >> be an option "plpython is: {2|3|don't build it at all}". Then packagers can >> match this to what their plan for /usr/bin/python* is -- which appears to be >> different everywhere. > Today, we do not give packagers this sort of discretion over SQL-level > behavior. We formerly had --disable-integer-datetimes, but I view that as the > lesser of two evils, the greater of which would have been to force dump/reload > of terabyte-scale clusters. We should continue to follow today's principle, > which entails not inviting packagers to align the nature of "LANGUAGE > plpythonu" with the nature of /usr/bin/python. FWIW, I've abandoned this patch. We've concluded (by default, at least) that nothing is getting done in v13, and by the time v14 is out it will be too late to have any useful effect. I expect that the situation on-the-ground by 2021 will be that packagers build with PYTHON=python3 and package whatever they get from that. That means (1) plpythonu won't exist anymore and (2) users will be left to their own devices to convert existing plpython code. Now, (1) corresponds to not providing any /usr/bin/python executable, only python3 -- and that is a really common choice for distros to make, AFAIK, so I don't feel too awful about it. I find (2) less than ideal, but there's evidently not enough interest in doing anything about it. There's certainly going to be no point in shipping a solution for (2) if we fail to do so before v14; people will already have done the work by hand. We should, however, consider updating the plpython docs to reflect current reality. Notably, the existing wording in section 45.1 suggests that we'll eventually redefine "plpythonu" as Python 3, and it seems to me that that's not going to happen. regards, tom lane
Re: Explain Analyze (Rollback off) Suggestion
"David G. Johnston" writes: > The ANALYZE option should not be part of the GUC setup. Yeah. While I'm generally not in favor of putting GUCs into the mix here, the only one that seriously scares me is a GUC that would affect whether the EXPLAIN'd query executes or not. A GUC that causes buffer counts to be reported/not-reported is not going to result in data destruction when someone forgets that it's on. (BTW, adding an option for auto-rollback wouldn't change my opinion about that. Not all side-effects of a query can be rolled back. Thus, if there is an auto-rollback option, it mustn't be GUC-adjustable either.) regards, tom lane
Re: Explain Analyze (Rollback off) Suggestion
On Thu, May 28, 2020 at 7:52 AM Tom Lane wrote: > (BTW, adding an option for auto-rollback wouldn't change my opinion > about that. Not all side-effects of a query can be rolled back. Thus, > if there is an auto-rollback option, it mustn't be GUC-adjustable > either.) > Yeah, I've worked myself around to that as well, this thread's proposal would be to just make setting up rollback more obvious and easier for a user of explain analyze - whose value at this point is wholly independent of the GUC discussion. David J.
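For reference, the manual way to get that behaviour today is simply an explicit transaction (table and values invented for illustration):

    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
        UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    ROLLBACK;  -- undoes the UPDATE's row changes, but, per Tom's point above,
               -- not side effects such as sequence advances or dblink calls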
Incorrect comment in be-secure-openssl.c
The comment in be-secure-openssl.c didn't get the memo that the hardcoded DH parameters moved in 573bd08b99e277026e87bb55ae69c489fab321b8. The attached updates the wording, keeping it generic enough that we won't need to update it should the parameters move again. cheers ./daniel openssl_dh_comment.patch Description: Binary data
Re: password_encryption default
On Thu, May 28, 2020 at 10:01 AM Stephen Frost wrote: > as if we don't know what columns are Amen to that! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: repeat() function, CHECK_FOR_INTERRUPTS(), and unlikely()
On 5/27/20 3:29 AM, Michael Paquier wrote: >> I think that each of those tests should have a separate unlikely() marker, >> since the whole point here is that we don't expect either of those tests >> to yield true in the huge majority of CHECK_FOR_INTERRUPTS executions. > > +1. I am not sure that the addition of unlikely() should be > backpatched though, that's not something usually done. I backpatched and pushed the changes to the repeat() function. Any other opinions regarding backpatch of the unlikely() addition to CHECK_FOR_INTERRUPTS()? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development signature.asc Description: OpenPGP digital signature
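For readers who have not seen the other thread, the change being discussed is roughly of this shape. This is a simplified sketch of the non-Windows macro, not the exact miscadmin.h text; the Windows variant has a second test (for the unblocked signal queue), hence the suggestion of a separate unlikely() per test:

    #define CHECK_FOR_INTERRUPTS() \
        do { \
            if (unlikely(InterruptPending)) \
                ProcessInterrupts(); \
        } while (0)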
Re: Fix compilation failure against LLVM 11
On Thu, May 28, 2020 at 1:07 AM Michael Paquier wrote: > Please note that I would still wait for their next GA release to plug > in any extra holes at the same time. @Jesse: or is this change > actually part of the upcoming 10.0.1? No, a refactoring like this was not in the back branches (nor is it expected). Thanks, Jesse
Re: Trouble with hashagg spill I/O pattern and costing
On Wed, May 27, 2020 at 11:07:04AM +0200, Tomas Vondra wrote: On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote: On Tue, May 26, 2020 at 5:40 PM Jeff Davis wrote: On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: As for the tlist fix, I think that's mostly ready too - the one thing we should do is probably only doing it for AGG_HASHED. For AGG_SORTED it's not really necessary. Melanie previously posted a patch to avoid spilling unneeded columns, but it introduced more code: https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com and it seems that Heikki also looked at it. Perhaps we should get an acknowledgement from one of them that your one-line change is the right approach? I spent some time looking at it today, and, it turns out I was wrong. I thought that there was a case I had found where CP_SMALL_TLIST did not eliminate as many columns as could be eliminated for the purposes of spilling, but, that turned out not to be the case. I changed CP_LABEL_TLIST to CP_SMALL_TLIST in create_groupingsets_plan(), create_agg_plan(), etc and tried a bunch of different queries and this 2-3 line change worked for all the cases I tried. Is that where you made the change? I've only made the change in create_agg_plan, because that's what was in the query plan I was investigating. You may be right that the same fix is needed in additional places, though. Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and create_groupingsets_plan. I've looked at the other places that I think seem like they might benefit from it (create_upper_unique_plan, create_group_plan) but I think we don't need to modify those - there'll either be a Sort of HashAgg, which will take care about the projection. Or do you think some other places need CP_SMALL_TLIST? And then are you proposing to set it based on the aggstrategy to either CP_LABEL_TLIST or CP_SMALL_TLIST here? Yes, something like that. The patch I shared on on 5/21 just changed that, but I'm wondering if that could add overhead for sorted aggregation, which already does the projection thanks to the sort. I ended up tweaking the tlist only for AGG_MIXED and AGG_HASHED. We clearly don't need it for AGG_PLAIN or AGG_SORTED. This way we don't break regression tests by adding unnecessary "Subquery Scan" nodes. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 9941dfe65e..ddb277cdf0 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -2113,12 +2113,22 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path) Plan *subplan; List *tlist; List *quals; + int flags; /* * Agg can project, so no need to be terribly picky about child tlist, but -* we do need grouping columns to be available +* we do need grouping columns to be available. We are a bit more careful +* with hash aggregate, where we explicitly request small tlist to minimize +* I/O needed for spilling (possibly not known already). 
*/ - subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST); + flags = CP_LABEL_TLIST; + + /* ensure small tlist for hash aggregate */ + if (best_path->aggstrategy == AGG_HASHED || + best_path->aggstrategy == AGG_MIXED) + flags |= CP_SMALL_TLIST; + + subplan = create_plan_recurse(root, best_path->subpath, flags); tlist = build_path_tlist(root, &best_path->path); @@ -2209,7 +2219,8 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path) * Agg can project, so no need to be terribly picky about child tlist, but * we do need grouping columns to be available */ - subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST); + subplan = create_plan_recurse(root, best_path->subpath, + CP_LABEL_TLIST | CP_SMALL_TLIST); /* * Compute the mapping from tleSortGroupRef to column index in the child's
Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
> On Mar 2, 2020, at 5:29 PM, Peter Geoghegan wrote: > > On Sun, Mar 1, 2020 at 12:15 PM Floris Van Nee > wrote: >> Minor conflict with that patch indeed. Attached is rebased version. I'm >> running some tests now - will post the results here when finished. > > Thanks. > > We're going to have to go back to my original approach to inlining. At > least, it seemed to be necessary to do that to get any benefit from > the patch on my comparatively modest workstation (using a similar > pgbench SELECT benchmark to the one that you ran). Tom also had a > concern about the portability of inlining without using "static > inline" -- that is another reason to avoid the approach to inlining > taken by v3 + v4. > > It's possible (though not very likely) that performance has been > impacted by the deduplication patch (commit 0d861bbb), since it > updated the definition of BTreeTupleGetNAtts() itself. > > Attached is v5, which inlines in a targeted fashion, pretty much in > the same way as the earliest version. This is the same as v4 in every > other way. Perhaps you can test this. Hi Peter, just a quick code review: The v5 patch files apply and pass the regression tests. I get no errors. The performance impact is within the noise. The TPS with the patch are higher sometimes and lower other times, looking across the 27 subtests of the "select-only" benchmark. Which subtest is slower or faster changes per run, so that doesn't seem to be relevant. I ran the "select-only" six times (three with the patch, three without). The change you made to the loop is clearly visible in the nbtsearch.s output, and the change to inline _bt_compare is even more visible, so there doesn't seem to be any defeating of your change due to the compiler ignoring the "inline" or such. I compiled using gcc -O2 % gcc --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1 Apple clang version 11.0.0 (clang-1100.0.33.17) Target: x86_64-apple-darwin19.4.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin 2.4 GHz 8-Core Intel Core i9 32 GB 2667 MHz DDR4 Reading this thread, I think the lack of a performance impact on laptop hardware was expected, but perhaps confirmation that it does not make things worse is useful? Since this patch doesn't seem to do any harm, I would mark it as "ready for committer", except that there doesn't yet seem to be enough evidence that it is a net win. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Operator class parameters and sgml docs
On Thu, May 21, 2020 at 3:17 AM Alexander Korotkov wrote: > > On Thu, May 21, 2020 at 12:37 AM Peter Geoghegan wrote: > > Commit 911e7020770 added a variety of new support routines to index > > AMs. For example, it added a support function 5 to btree (see > > BTOPTIONS_PROC), but didn't document this alongside the other support > > functions in btree.sgml. > > > > It looks like the new support functions are fundamentally different to > > the existing ones in that they exist only as a way of supplying > > parameters to other support functions. The idea was to preserve > > compatibility with the old support function signatures. Even still, I > > think that the new support functions should get some mention alongside > > the older support functions. > > > > I also wonder whether or not xindex.sgml needs to be updated to > > account for opclass parameters. > > Thank you for pointing. I'm going to take a look on this in next few days. I'm sorry for the delay. I was very busy with various stuff. I'm going to post docs patch next week. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP: WAL prefetch (another approach)
Thomas Munro escribió: > @@ -1094,8 +1103,16 @@ WALRead(XLogReaderState *state, > XLByteToSeg(recptr, nextSegNo, > state->segcxt.ws_segsize); > state->routine.segment_open(state, nextSegNo, &tli); > > - /* This shouldn't happen -- indicates a bug in > segment_open */ > - Assert(state->seg.ws_file >= 0); > + /* callback reported that there was no such file */ > + if (state->seg.ws_file < 0) > + { > + errinfo->wre_errno = errno; > + errinfo->wre_req = 0; > + errinfo->wre_read = 0; > + errinfo->wre_off = startoff; > + errinfo->wre_seg = state->seg; > + return false; > + } Ah, this is what Michael was saying ... we need to fix WALRead so that it doesn't depend on segment_open alway returning a good FD. This needs a fix everywhere, not just here, and improve the error report interface. Maybe it does make sense to get it fixed in pg13 and avoid a break later. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Conflict of implicit collations doesn't propagate out of subqueries
Hi! I think I’ve found a bug related to the strength of collations. Attached is a WIP patch, that address some other issues too. In this this example, the conflict of implicit collations propagates correctly: WITH data (c, posix) AS ( values ('a' COLLATE "C", 'b' COLLATE "POSIX") ) SELECT * FROM data ORDER BY ( c || posix ) || posix ERROR: collation mismatch between implicit collations "C" and "POSIX" LINE 6: ORDER BY ( c || posix ) || posix; ^ HINT: You can choose the collation by applying the COLLATE clause to one or both expressions. However, if the conflict happens in a subquery, it doesn’t anymore: WITH data (c, posix) AS ( values ('a' COLLATE "C", 'b' COLLATE "POSIX") ) SELECT * FROM (SELECT *, c || posix AS none FROM data) data ORDER BY none || posix; c | posix | none ---+---+-- a | b | ab (1 row) The problem is in parse_collate.c:566: A Var (and some other nodes) without valid collation gets the strength COLLATE_NONE: if (OidIsValid(collation)) strength = COLLATE_IMPLICIT; else strength = COLLATE_NONE; However, for a Var it should be COLLATE_CONFLICT, which corresponds to the standards collation derivation “none”. I guess there could be other similar cases which I don’t know about. The patch fixes that, plus necessary consequences (error handling for new scenarios) as well as some “cosmetic” improvements I found along the way. Unnecessary to mention that this patch might break existing code. With the patch, the second example form above fails similarly to the first example: ERROR: collation mismatch between implicit collation "POSIX" and unknown collation LINE 6: ORDER BY none || posix; ^ HINT: You can choose the collation by applying the COLLATE clause to one or both expressions. Other changes in the patch: ** Error Handling The current parse time error handling of implicit collisions has always both colliding collations. The patch allows a COLLATE_CONFLICT without knowing which collations caused the conflict (it’s not stored in the Var node). Thus we may have know two, one or none of the collations that contributed to the collision. The new function ereport_implicit_collation_mismatch takes care of that. ** Renaming COLLATE_NONE I found the name COLLATE_NONE a little bit unfortuante as it can easily be mistaken for the collation derivation “none” the SQL standard uses. I renamed it to COLLATE_NA for “not applicable”. The standard doesn’t have a word for that as non character string types just don’t have collations. ** Removal of location2 The code keeps track of the location (for parser_errposition) of both collations that collided, even though it reports only one of them. The patch removes location2. Due to this, the some errors report the other location as before (previously the second, now the first). See collate.out in the patch. The patch has TODO comments for code that would be needed to take the other one. Is there any policy to report the second location or to strictly keep the errors where they used to be? ** General cleanup The patch also removes a little code that I think is not needed (anymore). ** Tests Besides the errposition, only one existing test breaks. It previously triggered a runtime error, now it triggers a parse time error: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail -ERROR: could not determine which collation to use for string comparison -HINT: Use the COLLATE clause to set the collation explicitly. 
+ERROR: collation mismatch between implicit collations +LINE 1: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM co... + ^ +HINT: Use the COLLATE clause to set the collation explicitly The patch also adds a new test to trigger the problem in the first place. The patch is against REL_12_STABLE but applies to master too. -markus WIP_collation_strength_conflict_for_Var_v1.patch Description: Binary data
Re: Expand the use of check_canonical_path() for more GUCs
> On May 20, 2020, at 12:03 AM, Michael Paquier wrote: > > On Tue, May 19, 2020 at 09:32:15AM -0400, Tom Lane wrote: >> Hm, I'm pretty certain that data_directory does not need this because >> canonicalization is done elsewhere; the most that you could accomplish >> there is to cause problems. Dunno about the rest. > > Hmm. I missed that this is getting done in SelectConfigFiles() first > by the postmaster so that's not necessary, which also does the work > for hba_file and ident_file. config_file does not need that either as > AbsoluteConfigLocation() does the same work via ParseConfigFile(). So > perhaps we could add a comment or such about that? Attached is an > idea. > > The rest is made of PromoteTriggerFile, pg_krb_server_keyfile, > ssl_cert_file, ssl_key_file, ssl_ca_file, ssl_crl_file and > ssl_dh_params_file where loaded values are taken as-is, so applying > canonicalization would be helpful there, no? Before this patch, there are three GUCs that get check_canonical_path treatment: log_directory external_pid_file stats_temp_directory After the patch, these also get the treatment (though Peter seems to be objecting to the change): promote_trigger_file krb_server_keyfile ssl_cert_file ssl_key_file ssl_ca_file ssl_crl_file ssl_dh_params_file and these still don't, with comments about how they are already canonicalized when the config file is loaded: data_directory config_file hba_file ident_file A little poking around shows that in SelectConfigFiles(), these four directories were set by SetConfigOption(). I don't see a problem with the code, but the way this stuff is spread around makes it easy for somebody adding a new GUC file path to do it wrong. I don't have much opinion about Peter's preference that paths be left alone, but I'd prefer some comments in guc.c explaining it all. The only cleanup that occurs to me is to reorder ConfigureNamesString[] to have all the path options back-to-back, with the four that are set by SelectConfigFiles() at the top with comments about why they are special, and the rest after that with comments about why they need or do not need canonicalization. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Conflict of implicit collations doesn't propagate out of subqueries
Markus Winand writes: > However, if the conflict happens in a subquery, it doesn’t anymore: > WITH data (c, posix) AS ( > values ('a' COLLATE "C", 'b' COLLATE "POSIX") > ) > SELECT * > FROM (SELECT *, c || posix AS none FROM data) data > ORDER BY none || posix; > c | posix | none > ---+---+-- > a | b | ab > (1 row) I'm not exactly convinced this is a bug. Can you cite chapter and verse in the spec to justify throwing an error? AIUI, collation conflicts can only occur within a single expression, and this is not that. Moreover, even if data.none arguably has no collation, treating it from outside the sub-query as having collation strength "none" seems to me to be similar to our policy of promoting unknown-type subquery outputs to type "text" rather than leaving them to cause trouble later. It's not pedantically correct, but nobody liked the old behavior. regards, tom lane
Re: speed up unicode normalization quick check
> On May 21, 2020, at 12:12 AM, John Naylor wrote: > > Hi, > > Attached is a patch to use perfect hashing to speed up Unicode > normalization quick check. > > 0001 changes the set of multipliers attempted when generating the hash > function. The set in HEAD works for the current set of NFC codepoints, > but not for the other types. Also, the updated multipliers now all > compile to shift-and-add on most platform/compiler combinations > available on godbolt.org (earlier experiments found in [1]). The > existing keyword lists are fine with the new set, and don't seem to be > very picky in general. As a test, it also successfully finds a > function for the OS "words" file, the "D" sets of codepoints, and for > sets of the first n built-in OIDs, where n > 5. Prior to this patch, src/tools/gen_keywordlist.pl is the only script that uses PerfectHash. Your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go. Could you instead make them arguments such that gen_keywordlist.pl, generate-unicode_combining_table.pl, and future callers can pass in the numbers they want? Or is there some advantage to having it this way? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
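For context, the multipliers under discussion feed the two hash accumulators in the generated lookup functions. Very roughly, the hashing step has this shape (illustrative only, not the actual PerfectHash.pm output); constants such as 31 or 257 are attractive because a * 31 compiles to (a << 5) - a and a * 257 to (a << 8) + a on most compilers:

    static unsigned int
    hash_accum(const unsigned char *key, size_t keylen,
               unsigned int mult, unsigned int seed)
    {
        unsigned int a = seed;

        while (keylen--)
            a = a * mult + *key++;
        return a;
    }

    /*
     * A CHM-style perfect hash combines two such accumulations (with two
     * different multipliers) through a precomputed table so that every key
     * in the fixed set lands in its own slot.
     */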
Re: Fix compilation failure against LLVM 11
On 2020-05-28 17:07:46 +0900, Michael Paquier wrote: > Please note that I would still wait for their next GA release to plug > in any extra holes at the same time. @Jesse: or is this change > actually part of the upcoming 10.0.1? Why? I plan to just commit this change now.
Re: Conflict of implicit collations doesn't propagate out of subqueries
> On 28.05.2020, at 23:43, Tom Lane wrote: > > Markus Winand writes: >> However, if the conflict happens in a subquery, it doesn’t anymore: > >>WITH data (c, posix) AS ( >>values ('a' COLLATE "C", 'b' COLLATE "POSIX") >>) >>SELECT * >> FROM (SELECT *, c || posix AS none FROM data) data >> ORDER BY none || posix; > >> c | posix | none >>---+---+-- >> a | b | ab >>(1 row) > > I'm not exactly convinced this is a bug. Can you cite chapter and verse > in the spec to justify throwing an error? I think it is 6.6 Syntax Rule 17: • 17) If the declared type of a BIC is character string, then the collation derivation of the declared type of BIC is Case: • a) If the declared type has a declared type collation DTC, then implicit. • b) Otherwise, none. That gives derivation “none” to the column. When this is concatenated, 9.5 ("Result of data type combinations”) SR 3 a ii 3 applies: • ii) The collation derivation and declared type collation of the result are determined as follows. Case: • 1) If some data type in DTS has an explicit collation derivation [… doesn’t apply] • 2) If every data type in DTS has an implicit collation derivation, then [… doesn’t apply beause of “every"] • 3) Otherwise, the collation derivation is none. [applies] Also, the standard doesn’t have a forth derivation (strength). It also says that not having a declared type collation implies the derivation “none”. See 4.2.2: Every declared type that is a character string type has a collation derivation, this being either none, implicit, or explicit. The collation derivation of a declared type with a declared type collation that is explicitly or implicitly specified by a is implicit. If the collation derivation of a declared type that has a declared type collation is not implicit, then it is explicit. The collation derivation of an expression of character string type that has no declared type collation is none. -markus
OpenSSL 3.0.0 compatibility
Since OpenSSL is now releasing 3.0.0-alpha versions, I took a look at using it with postgres to see what awaits us. As it is now shipping in releases (with GA planned for Q4), users will probably soon start to test against it, so I wanted to be prepared. Regarding the deprecations, we can either set preprocessor directives or use compiler flags to silence the warnings and do nothing (for now), or we could update to the new API. We probably want to do different things for master vs back-branches, but as an illustration of what the latter could look like I've implemented this in 0001. SSL_CTX_load_verify_locations and X509_STORE_load_locations are deprecated and replaced by individual calls to load files and directories. These are quite straightforward, and are implemented much like how we handle the TLS protocol API. DH_check has been discouraged for quite some time, and is now marked deprecated without a 1:1 replacement. The OpenSSL docs recommend using the EVP API, which is also done in 0001. For now I just stuck it in with version-gated ifdefs; something cleaner than that can clearly be done. 0001 is clearly not proposed for review/commit yet, it's included in case someone else is interested as well. OpenSSL also deprecates DES keys in 3.0.0, which causes our password callback tests to fail with the cryptic error "fetch failed", as the test suite keys are encrypted with DES. 0002 fixes this by changing to AES256 (randomly chosen among the ciphers supported in 1.0.1+ and likely to be around), and could be applied already today as there is nothing 3.0.0 specific about it. On top of DES keys there are also a lot of deprecations of low-level functions which break pgcrypto in quite a few ways. I haven't tackled these yet, but it looks like we have to do the EVP rewrite there. cheers ./daniel 0002-OpenSSL-3.0.0-compatibility-in-tests.patch Description: Binary data 0001-Compatibility-with-OpenSSL-3.0.0.patch Description: Binary data
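As a rough illustration of the verify-location change described above (sketch only, not the code in 0001; SSL_CTX_load_verify_file() and friends are the OpenSSL 3.0 replacements for the single combined call):

    #include <openssl/ssl.h>

    /* Load the CA file configured in ssl_ca_file into the SSL context. */
    static int
    load_root_cert(SSL_CTX *context, const char *ssl_ca_file)
    {
    #if OPENSSL_VERSION_NUMBER >= 0x30000000L
        /* OpenSSL 3.0: individual, non-deprecated call for the file case */
        return SSL_CTX_load_verify_file(context, ssl_ca_file);
    #else
        /* Older OpenSSL: one call covering both a CA file and a directory */
        return SSL_CTX_load_verify_locations(context, ssl_ca_file, NULL);
    #endif
    }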
Re: Fix compilation failure against LLVM 11
Hi, On 2020-05-27 07:49:45 -0700, Jesse Zhang wrote: > On the mensiversary of the last response, what can I do to help move > this along (aside from heeding the advice "don't use LLVM HEAD")? Sorry, I had looked at it at some point with the intent to commit it, and hit some stupid small snags (*). And then I forgot about it. Pushed. Thanks for the patch! Andres Freund (*) First I didn't see the problem, because I accidentally had an old version of the header around. Then I couldn't immediately build old versions of pg + LLVM due to my existing installation needing to be rebuilt. Then there were compiler errors, due to a too-new compiler. Etc.
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
On Wed, May 27, 2020 at 09:58:04PM +0800, Andy Fan wrote: On Wed, May 27, 2020 at 8:01 PM Ashutosh Bapat < ashutosh.ba...@2ndquadrant.com> wrote: On Wed, 27 May 2020 at 04:43, Andy Fan wrote: You can use the attached sql to reproduce this issue, but I'm not sure you can get the above result at the first time that is because when optimizer think the 2 index scan have the same cost, it will choose the first one it found, the order depends on RelationGetIndexList. If so, you may try drop and create j1_i_im5 index. The sense behind this patch is we still use the cost based optimizer, just when we we find out the 2 index scans have the same cost, we prefer to use the index which have more qual filter on Index Cond. This is implemented by adjust the qual cost on index filter slightly higher. Thanks for the example and the explanation. The execution time difference in your example is pretty high to account for executing the filter on so many rows. My guess is this has to do with the heap access. For applying the filter the entire row needs to be fetched from the heap. So we should investigate this case from that angle. Another guess I have is the statistics is not correct and hence the cost is wrong. I believe this is a statistics issue and then the cost is wrong. I think you're both right. Most of the time probably comes from the heap accesses, but the dabatabase has no chance to account for that as there was no analyze after inseting the data causing that. So it's very difficult to account for this when computing the cost. More characters of this issue are: 1). If a data is out of range in the old statistics, optimizer will given an 1 row assumption. 2). based on the 1 row assumption, for query "col1=out_of_range_val AND col2 = any_value" Index (col1, col2) and (col1, col3) will have exactly same cost for current cost model. 3). If the statistics was wrong, (col1, col3) maybe a very bad plan as shown above, but index (col1, col2) should always better/no worse than (col1, col3) in any case. 4). To expand the rule, for query "col1 = out_of_range_val AND col2 = any_value AND col3 = any_val", index are (col1, col2, col_m) and (col1, col_m, col_n), the former index will aways has better/no worse than the later one. 5). an statistics issue like this is not uncommon, for example an log based application, creation_date is very easy to out of range in statistics. Right. There are many ways to cause issues like this. so we need to optimize the cost model for such case, the method is the patch I mentioned above. Making the planner more robust w.r.t. to estimation errors is nice, but I wouldn't go as far saying we should optimize for such cases. The stats can be arbitrarily off, so should we expect the error to be 10%, 100% or 100%? We'd probably end up with plans that handle worst cases well, but the average performance would end up being way worse :-( Anyway, I kinda doubt making the conditions 1.001 more expensive is a way to make the planning more robust. I'm pretty sure we could construct examples in the opposite direction, in which case this change make it more likely we use the wrong index. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
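A self-contained sketch of the pattern being described (schema and numbers invented): with no ANALYZE after loading the out-of-range rows, the planner assumes one row for col1 = 200, and both two-column indexes then get the same cost even though only one of them can put both quals into the Index Cond:

    CREATE TABLE t (col1 int, col2 int, col3 int);
    INSERT INTO t SELECT i % 100, i % 10, i % 10
      FROM generate_series(1, 1000000) i;
    CREATE INDEX t_col1_col2 ON t (col1, col2);
    CREATE INDEX t_col1_col3 ON t (col1, col3);
    ANALYZE t;

    -- rows outside the recorded col1 range arrive after the last ANALYZE
    INSERT INTO t SELECT 200, i % 10, i % 10 FROM generate_series(1, 100000) i;

    EXPLAIN ANALYZE SELECT * FROM t WHERE col1 = 200 AND col2 = 3;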
Re: Trouble with hashagg spill I/O pattern and costing
On Tue, 2020-05-26 at 17:40 -0700, Jeff Davis wrote: > On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: > > Yeah. I agree prefetching is definitely out of v13 scope. It might > > be > > interesting to try how useful would it be, if you're willing to > > spend > > some time on a prototype. > > I think a POC would be pretty quick; I'll see if I can hack something > together. Attached (intended for v14). I changed the list from a simple array to a circular buffer so that we can keep enough preallocated block numbers in it to do prefetching. On SSD I didn't see any improvement, but perhaps it will do better on magnetic storage. Regards, Jeff Davis diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index d7f99d9944c..89edbc4c579 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3941,6 +3941,9 @@ pgstat_get_wait_io(WaitEventIO w) case WAIT_EVENT_BUFFILE_WRITE: event_name = "BufFileWrite"; break; + case WAIT_EVENT_BUFFILE_PREFETCH: + event_name = "BufFilePrefetch"; + break; case WAIT_EVENT_CONTROL_FILE_READ: event_name = "ControlFileRead"; break; diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 35e8f12e62d..4e2d4f059a7 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -618,6 +618,20 @@ BufFileWrite(BufFile *file, void *ptr, size_t size) return nwritten; } +/* + * BufFilePrefetch + */ +void +BufFilePrefetchBlock(BufFile *buffile, long blknum) +{ + int filenum = (blknum / BUFFILE_SEG_SIZE); + off_t offset = (blknum % BUFFILE_SEG_SIZE) * BLCKSZ; + File file = buffile->files[filenum]; + + (void) FilePrefetch(file, offset, BLCKSZ, + WAIT_EVENT_BUFFILE_PREFETCH); +} + /* * BufFileFlush * diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c index 666a7c0e81c..014ec67b8ff 100644 --- a/src/backend/utils/sort/logtape.c +++ b/src/backend/utils/sort/logtape.c @@ -97,6 +97,7 @@ typedef struct TapeBlockTrailer * block */ long next; /* next block on this tape, or # of valid * bytes on last block (if < 0) */ + long prefetch; /* block to prefetch when reading this one */ } TapeBlockTrailer; #define TapeBlockPayloadSize (BLCKSZ - sizeof(TapeBlockTrailer)) @@ -123,6 +124,11 @@ typedef struct TapeBlockTrailer #define TAPE_WRITE_PREALLOC_MIN 8 #define TAPE_WRITE_PREALLOC_MAX 128 +#define TAPE_PREFETCH_DISTANCE_MAX 64 + +StaticAssertDecl(TAPE_PREFETCH_DISTANCE_MAX * 2 <= TAPE_WRITE_PREALLOC_MAX, + "Tape prefetch distance too large."); + /* * This data structure represents a single "logical tape" within the set * of logical tapes stored in the same file. @@ -165,13 +171,13 @@ typedef struct LogicalTape int nbytes; /* total # of valid bytes in buffer */ /* - * Preallocated block numbers are held in an array sorted in descending - * order; blocks are consumed from the end of the array (lowest block - * numbers first). + * Preallocated block numbers are held in a sorted circular array. 
*/ long *prealloc; - int nprealloc; /* number of elements in list */ - int prealloc_size; /* number of elements list can hold */ + int prealloc_next; /* next element in circular array */ + int prealloc_nelem; /* number of elements in circular array */ + int prealloc_size; /* total size of circular array */ + int prefetch_dist; /* distance to prefetch */ } LogicalTape; /* @@ -219,7 +225,8 @@ struct LogicalTapeSet static void ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer); static void ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer); static long ltsGetFreeBlock(LogicalTapeSet *lts); -static long ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt); +static long ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt, +long *prefetch_block); static void ltsReleaseBlock(LogicalTapeSet *lts, long blocknum); static void ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared, SharedFileSet *fileset); @@ -329,6 +336,10 @@ ltsReadFillBuffer(LogicalTapeSet *lts, LogicalTape *lt) else lt->nextBlockNumber = TapeBlockGetTrailer(thisbuf)->next; + if (TapeBlockGetTrailer(thisbuf)->prefetch >= 0) + BufFilePrefetchBlock(lts->pfile, + TapeBlockGetTrailer(thisbuf)->prefetch); + /* Advance to next block, if we have buffer space left */ } while (lt->buffer_size - lt->nbytes > BLCKSZ); @@ -424,38 +435,79 @@ ltsGetFreeBlock(LogicalTapeSet *lts) * Refill the preallocation list if necessary. */ static long -ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt) +ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt, long *prefetch) { - /* sorted in descending order, so return the last element */ - if (lt->nprealloc > 0) - return lt->prealloc[--lt->nprealloc]; + long block; + int prefetch_idx; + + if (lt->prealloc_nelem > lt->prefetch_dist) + { + prefetch_id
Re: segmentation fault using currtid and partitioned tables
Hi, On 2020-05-22 19:32:57 -0400, Alvaro Herrera wrote: > On 2020-Apr-09, Michael Paquier wrote: > > > Playing more with this stuff, it happens that we have zero code > > coverage for currtid() and currtid2(), and the main user of those > > functions I can find around is the ODBC driver: > > https://coverage.postgresql.org/src/backend/utils/adt/tid.c.gcov.html > > Yeah, they're there solely for ODBC as far as I know. And there only for very old servers (< 8.2), according to Hiroshi Inoue. Found that out post 12 freeze. I was planning to drop them for 13, but I unfortunately didn't get around to do so :( I guess we could decide to make a freeze exception to remove them now, although I'm not sure the reasons for doing so are strong enough. > > There are multiple cases to consider, particularly for views: > > - Case of a view with ctid as attribute taken from table. > > - Case of a view with ctid as attribute with incorrect attribute > > type. > > It is worth noting that all those code paths can trigger various > > elog() errors, which is not something that a user should be able to do > > using a SQL-callable function. There are also two code paths for > > cases where a view has no or more-than-one SELECT rules, which cannot > > normally be reached. > > > All in that, I propose something like the attached to patch the > > surroundings with tests to cover everything I could think of, which I > > guess had better be backpatched? > > I don't know, but this stuff is so unused that your patch seems > excessive ... and I think we'd rather not backpatch something so large. > I propose we do something less invasive in the backbranches, like just > throw elog() errors (nothing fancy) where necessary to avoid the > crashes. Even for pg12 it seems that that should be sufficient. > > For pg13 and beyond, I liked Tom's idea of installing dummy functions I concur that it seems unnecessary to make these translatable, even with the reduced scope from https://www.postgresql.org/message-id/20200526025959.GE6155%40paquier.xyz Greetings, Andres Freund
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Tomas Vondra writes: > On Wed, May 27, 2020 at 09:58:04PM +0800, Andy Fan wrote: >> so we need to optimize the cost model for such case, the method is the >> patch I mentioned above. > Making the planner more robust w.r.t. to estimation errors is nice, but > I wouldn't go as far saying we should optimize for such cases. Yeah, it's a serious mistake to try to "optimize" for cases where we have no data or wrong data. By definition, we don't know what we're doing, so who's to say whether we've made it better or worse? And the possible side effects on cases where we do have good data are not to be ignored. > Anyway, I kinda doubt making the conditions 1.001 more expensive is a > way to make the planning more robust. I'm pretty sure we could construct > examples in the opposite direction, in which case this change make it > more likely we use the wrong index. The other serious error we could be making here is to change things on the basis of just a few examples. You really need a pretty wide range of test cases to be sure that you're not making things worse, any time you're twiddling basic parameters like these. regards, tom lane
Re: segmentation fault using currtid and partitioned tables
Hi, On 2020-04-05 12:51:56 -0400, Tom Lane wrote: > (2) The proximate cause of the crash is that rd_tableam is zero, > so that the interface functions in tableam.h just crash hard. > This seems like a pretty bad thing; does anyone want to promise > that there are no other oversights of the same ilk elsewhere, > and never will be? > > I think it might be a good idea to make relations-without-storage > set up rd_tableam as a vector of dummy functions that will throw > some suitable complaint about "relation lacks storage". NULL is > a horrible default for this. I don't have particularly strong views here. I can see a benefit to such a pseudo AM. I can even imagine that there might be some cases where we would actually introduce some tableam functions for, e.g., partitioned tables or views, to centralize their handling more, instead of having such considerations more distributed. Clearly not worth actively trying to do that for all existing code dealing with such relkinds, but there might be cases where it's worthwhile. OTOH, it's kinda annoying having to maintain a not insignificant number of functions that need to be updated whenever the tableam interface evolves. I guess we could partially hack our way through that by having one such function, and just assigning it to all the mandatory callbacks by way of a void cast. But that'd be mighty ugly. Greetings, Andres Freund
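To make the maintenance trade-off concrete, a sketch of one such error-throwing callback (the function name is invented; each real TableAmRoutine member has its own signature, which is exactly why a correctly-typed wrapper per callback, or the ugly void cast, would be needed):

    static void
    no_storage_callback(Relation rel)
    {
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("relation \"%s\" has no storage",
                        RelationGetRelationName(rel))));
    }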
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
> >so we need to optimize the cost model for such case, the method is the > >patch I mentioned above. > > Making the planner more robust w.r.t. to estimation errors is nice, but > I wouldn't go as far saying we should optimize for such cases. The stats > can be arbitrarily off, so should we expect the error to be 10%, 100% or > 100%? I don't think my patch relies on anything like that. My patch doesn't fix the statistics issue; it just adds extra cost to the quals in the Index Filter part. Assume the query pattern is WHERE col1 = X AND col2 = Y. The impacts are: 1). The cost of (col1, other_column) becomes higher than (col1, col2). 2). The relationship between a seqscan and an index scan on index (col1, other_column) is changed (this is something I don't want). However, in my case the cost difference between an index scan and a seq scan is usually very large, so the change above should have nearly no impact on that choice. 3). The cost of an index scan on index (col1) alone becomes higher. Overall I think nothing gets worse. > We'd probably end up with plans that handle worst cases well, > but the average performance would end up being way worse :-( > > That's possible, that's why I hope to get some feedback on that. Actually I can't think of such a case; do you have anything like that in mind? I'm feeling that (qpqual_cost.per_tuple * 1.001) is not good enough, since a user may have something like WHERE expensive_func(col1) = X. We may change it to cpu_tuple_cost + qpqual_cost.per_tuple + (0.0001) * list_length(qpquals). -- Best Regards Andy Fan
Re: segmentation fault using currtid and partitioned tables
Andres Freund writes: > On 2020-04-05 12:51:56 -0400, Tom Lane wrote: >> I think it might be a good idea to make relations-without-storage >> set up rd_tableam as a vector of dummy functions that will throw >> some suitable complaint about "relation lacks storage". NULL is >> a horrible default for this. > OTOH, it's kinda annoying having to maintain a not insignificant number > of functions that needs to be updated whenever the tableam interface > evolves. That argument sounds pretty weak. If you're making breaking changes in the tableam API, updating the signatures (not even any code) of some dummy functions seems like by far the easiest part. regards, tom lane
Re: Trouble with hashagg spill I/O pattern and costing
On Thu, 2020-05-28 at 20:57 +0200, Tomas Vondra wrote: > Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and > create_groupingsets_plan. Looks good, except one question: Why would aggstrategy ever be MIXED when in create_agg_path? Wouldn't that only happen in create_groupingsets_path? Regards, Jeff Davis
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Thanks all of you for your feedback. On Fri, May 29, 2020 at 9:04 AM Tom Lane wrote: > Tomas Vondra writes: > > On Wed, May 27, 2020 at 09:58:04PM +0800, Andy Fan wrote: > >> so we need to optimize the cost model for such case, the method is the > >> patch I mentioned above. > > > Making the planner more robust w.r.t. to estimation errors is nice, but > > I wouldn't go as far saying we should optimize for such cases. > > Yeah, it's a serious mistake to try to "optimize" for cases where we have > no data or wrong data. By definition, we don't know what we're doing, > so who's to say whether we've made it better or worse? Actually I think it is a more robust way: the patch can't fix all the impact of bad statistics (that is impossible, I think), but it will make some simple things better and make others no worse. By definition, I think I do know what we are doing here, as I explained in my reply to Tomas above. But it is possible my thinking is wrong. > The other serious error we could be making here is to change things on > the basis of just a few examples. You really need a pretty wide range > of test cases to be sure that you're not making things worse, any time > you're twiddling basic parameters like these. > > I will try more things in this direction, thanks for the suggestion. -- Best Regards Andy Fan
Re: Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms
Hi, On 2020-05-20 10:32:18 +0300, Konstantin Knizhnik wrote: > On 20.05.2020 08:10, Andres Freund wrote: > > On May 19, 2020 8:05:00 PM PDT, Noah Misch wrote: > > > On Tue, May 19, 2020 at 04:07:29PM +0300, Konstantin Knizhnik wrote: > > > > Definition of pg_atomic_compare_exchange_u64 requires alignment of > > > expected > > > > pointer on 8-byte boundary. > > > > > > > > pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr, > > > > uint64 *expected, uint64 newval) > > > > { > > > > #ifndef PG_HAVE_ATOMIC_U64_SIMULATION > > > > AssertPointerAlignment(ptr, 8); > > > > AssertPointerAlignment(expected, 8); > > > > #endif > > > > > > > > > > > > I wonder if there are platforms where such restriction is actually > > > needed. > > > > > > In general, sparc Linux does SIGBUS on unaligned access. Other > > > platforms > > > function but suffer performance penalties. > > Indeed. Cross cacheline atomics are e.g. really expensive on x86. > > Essentially requiring a full blown bus lock iirc. > > > Please notice that here we talk about alignment not of atomic pointer > itself, but of pointer to the expected value. That wasn't particularly clear in your first email... In hindsight I can see that you meant that, but I'm not surprised to not have understood that the on the first read either. > At Intel CMPXCHG instruction read and write expected value throw AX > register. > So alignment of pointer to expected value in pg_atomic_compare_exchange_u64 > is not needed in this case. x86 also supports doing a CMPXCHG crossing a cacheline boundary, it's just terrifyingly expensive... I can imagine this being a problem on a 32bit platforms, but on 64bit platforms, it seems only an insane platform ABI would only have 4 byte alignment on 64bit integers. That'd cause so much unnecessarily split cachlines... That's separate from the ISA actually supporting doing such reads efficiently, of course. But that still leaves the alignment check on expected to be too strong on 32 bit platforms where 64bit alignment is only 4 bytes. I'm doubtful that's it's a good idea to use a comparison value potentially split across cachelines for an atomic operation that's potentially contended. But also, I don't particularly care about 32 bit performance. I think we should probably just drop the second assert down to ALIGNOF_INT64. Would require a new configure stanza, but that's easy enough to do. It's just adding AC_CHECK_ALIGNOF(PG_INT64_TYPE) Doing that change made me think about replace the conditional long long int alignof logic in configure.in, and just unconditionally do the a check for PG_INT64_TYPE, seems nicer. That made me look at Solution.pm due to ALIGNOF_LONG_LONG, and it's interesting: # Every symbol in pg_config.h.in must be accounted for here. Set # to undef if the symbol should not be defined. my %define = ( ... ALIGNOF_LONG_LONG_INT => 8, ... PG_INT64_TYPE => 'long long int', so currently our msvc build actually claims that the alignment requirements are what the code tests. And that's not just since pg_config.h is autogenerated, it was that way before too: /* The alignment requirement of a `long long int'. */ #define ALIGNOF_LONG_LONG_INT 8 /* Define to the name of a signed 64-bit integer type. */ #define PG_INT64_TYPE long long int and has been for a while. > And my question was whether there are some platforms where implementation of > compare-exchange 64-bit primitive > requires stronger alignment of "expected" pointer than one enforced by > original alignment rules for this platform. 
IIRC there are a few older platforms that have single-copy atomicity for 8-byte values, but do *not* have it for ones not aligned to an 8-byte boundary, despite not having such an ABI alignment. It's not impossible to come up with a case where that could matter (if expected pointed into some shared memory that could be read by others), but it's hard to take them seriously. Greetings, Andres Freund
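As a rough sketch only (not from any posted patch) of what "dropping the second assert down to ALIGNOF_INT64" could look like: ALIGNOF_PG_INT64 below is an assumed name for the new pg_config.h symbol that an AC_CHECK_ALIGNOF(PG_INT64_TYPE) configure stanza would provide; the wrapper otherwise keeps the existing shape of the atomics.h code.

    static inline bool
    pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
                                   uint64 *expected, uint64 newval)
    {
    #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
        /* the atomic variable itself still has to be fully 8-byte aligned */
        AssertPointerAlignment(ptr, 8);
        /* "expected" only needs the platform's int64 ABI alignment (4 on some 32-bit ABIs) */
        AssertPointerAlignment(expected, ALIGNOF_PG_INT64);
    #endif
        return pg_atomic_compare_exchange_u64_impl(ptr, expected, newval);
    }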
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 09:08:19 -0400, Dave Cramer wrote in > On Thu, 28 May 2020 at 05:11, Kyotaro Horiguchi > wrote: > > Mmm. It is not the proper way to use physical replication and it's > > totally accidental that that worked (or even it might be a bug). The > > documentation is saying as the follows, as more-or-less the same for > > all versions since 9.4. > > > > https://www.postgresql.org/docs/13/protocol-replication.html ... > > > While the documentation does indeed say that, there is quite a bit of > additional confusion added by: > > and > START_REPLICATION [ SLOT *slot_name* ] [ PHYSICAL ] *XXX/XXX* [ TIMELINE > *tli* ] > > If we already have a physical replication slot according to the startup > message, why do we need to specify it in the START_REPLICATION message? I don't know, but physical replication has worked that way since before replication slots were introduced, so we haven't needed to do so. Physical replication slots are not assumed to be more than a memorandum of the oldest required WAL segment (and oldest xmin). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
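For concreteness (slot name, LSN and timeline are made up), a physical walsender session as driven by a tool like pg_receivewal boils down to roughly this command sequence, sent over a connection whose startup packet only says replication=true; the slot is named solely in the replication commands themselves:

    IDENTIFY_SYSTEM
    CREATE_REPLICATION_SLOT myslot PHYSICAL RESERVE_WAL
    START_REPLICATION SLOT myslot PHYSICAL 0/3000000 TIMELINE 1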
feature idea: use index when checking for NULLs before SET NOT NULL
There's the age-old problem of SET NOT NULL being impossible on large actively used tables, because it needs to lock the table and do a table scan to check if there are any existing NULL values. I currently have a table that's not particularly huge but a scan takes 70 seconds, which causes unacceptable downtime for my entire application. Postgres is not able to use an index when doing this check: https://dba.stackexchange.com/questions/267947 Would it be possible to have Postgres use an index for this check? Given the right index, the check could be instant and the table would only need to be locked for milliseconds. (I'm sure I'm not the first person to think of this, but I couldn't find any other discussion on this list or elsewhere.) Thanks for reading! John
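To make the idea concrete, here is a sketch (table, column and index names invented) of the check such an index could answer; today the final statement ignores any index and scans the whole table while holding an ACCESS EXCLUSIVE lock:

    -- a partial index containing exactly the offending rows
    create index concurrently foos_bar1_null_idx on foos (bar1) where bar1 is null;

    -- with that index, "is there any NULL left?" is answered almost instantly
    select exists (select 1 from foos where bar1 is null);

    -- ... but SET NOT NULL does not consult it and still does a full scan
    alter table foos alter column bar1 set not null;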
Re: speed up unicode normalization quick check
On Fri, May 29, 2020 at 5:59 AM Mark Dilger wrote: > > > On May 21, 2020, at 12:12 AM, John Naylor > > wrote: > > very picky in general. As a test, it also successfully finds a > > function for the OS "words" file, the "D" sets of codepoints, and for > > sets of the first n built-in OIDs, where n > 5. > > Prior to this patch, src/tools/gen_keywordlist.pl is the only script that > uses PerfectHash. Your patch adds a second. I'm not convinced that > modifying the PerfectHash code directly each time a new caller needs > different multipliers is the right way to go. Calling it "each time" with a sample size of two is a bit of a stretch. The first implementation made a reasonable attempt to suit future uses and I simply made it a bit more robust. In the text quoted above you can see I tested some scenarios beyond the current use cases, with key set sizes as low as 6 and as high as 250k. > Could you instead make them arguments such that gen_keywordlist.pl, > generate-unicode_combining_table.pl, and future callers can pass in the > numbers they want? Or is there some advantage to having it this way? That is an implementation detail that callers have no business knowing about. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Incorrect comment in be-secure-openssl.c
On Thu, May 28, 2020 at 05:15:17PM +0200, Daniel Gustafsson wrote: > The comment in be-secure-openssl.c didn't get the memo that the hardcoded DH > parameters moved in 573bd08b99e277026e87bb55ae69c489fab321b8. The attached > updates the wording, keeping it generic enough that we won't need to update it > should the parameters move again. Indeed, looks good to me. I'll go fix, just let's wait and see first if others have any comments. -- Michael signature.asc Description: PGP signature
Re: OpenSSL 3.0.0 compatibility
On Fri, 2020-05-29 at 00:16 +0200, Daniel Gustafsson wrote: > Since OpenSSL is now releasing 3.0.0-alpha versions, I took a look at using it > with postgres to see what awaits us. As it is now shipping in releases (with > GA planned for Q4), users will probably soon start to test against it so I > wanted to be prepared. > > Regarding the deprecations, we can either set preprocessor directives or use > compiler flags to silence the warnings and do nothing (for now), or we could > update to the new API. We probably want to do different things for master vs > back-branches, but as an illustration of what the latter could look like I've > implemented this in 0001. An important question will be: if we convert to functions that are not deprecated, what is the earliest OpenSSL version we can support? Yours, Laurenz Albe
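As one hedged illustration of the "preprocessor directive" route mentioned above (not necessarily what 0001 does): OpenSSL's own OPENSSL_API_COMPAT macro, defined before any of its headers are included, suppresses deprecation warnings for interfaces deprecated after the declared API level, leaving the call sites untouched for now:

    /* claim compatibility with the 1.1.0 API so 3.0.0 stops warning;
     * must be defined before the first OpenSSL header is pulled in */
    #define OPENSSL_API_COMPAT 0x10100000L
    #include <openssl/ssl.h>
    #include <openssl/err.h>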
Re: segmentation fault using currtid and partitioned tables
On Thu, May 28, 2020 at 05:55:59PM -0700, Andres Freund wrote: > And there only for very old servers (< 8.2), according to Hiroshi > Inoue. Found that out post 12 freeze. I was planning to drop them for > 13, but I unfortunately didn't get around to do so :( [... digging ...] Ah, I think I see your point from the code. That's related to the use of RETURNING for ctids. > I guess we could decide to make a freeze exception to remove them now, > although I'm not sure the reasons for doing so are strong enough. Not sure that's a good thing to do after beta1 for 13, but there is an argument for that in 14. FWIW, my company is a huge user of the ODBC driver (perhaps the biggest one?), and we have nothing even close to 8.2. > I concur that it seems unnecessary to make these translatable, even with > the reduced scope from > https://www.postgresql.org/message-id/20200526025959.GE6155%40paquier.xyz Okay, I have switched the patch to do that. Any comments or objections? -- Michael diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c index 4ce8375eab..ac232a238f 100644 --- a/src/backend/utils/adt/tid.c +++ b/src/backend/utils/adt/tid.c @@ -31,6 +31,7 @@ #include "parser/parsetree.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/lsyscache.h" #include "utils/rel.h" #include "utils/snapmgr.h" #include "utils/varlena.h" @@ -349,6 +350,12 @@ currtid_for_view(Relation viewrel, ItemPointer tid) return (Datum) 0; } +/* + * currtid_byreloid + * Return the latest visible tid of a relation's tuple, associated + * to the tid given in input. This is a wrapper for currtid(), and + * uses in input the OID of the relation to look at. + */ Datum currtid_byreloid(PG_FUNCTION_ARGS) { @@ -378,6 +385,11 @@ currtid_byreloid(PG_FUNCTION_ARGS) if (rel->rd_rel->relkind == RELKIND_VIEW) return currtid_for_view(rel, tid); + if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) + elog(ERROR, "cannot look at latest visible tid for relation \"%s.%s\"", + get_namespace_name(RelationGetNamespace(rel)), + RelationGetRelationName(rel)); + ItemPointerCopy(tid, result); snapshot = RegisterSnapshot(GetLatestSnapshot()); @@ -391,6 +403,12 @@ currtid_byreloid(PG_FUNCTION_ARGS) PG_RETURN_ITEMPOINTER(result); } +/* + * currtid_byrelname + * Return the latest visible tid of a relation's tuple, associated + * to the tid given in input. This is a wrapper for currtid2(), and + * uses in input the relation name to look at. 
+ */ Datum currtid_byrelname(PG_FUNCTION_ARGS) { @@ -415,6 +433,11 @@ currtid_byrelname(PG_FUNCTION_ARGS) if (rel->rd_rel->relkind == RELKIND_VIEW) return currtid_for_view(rel, tid); + if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) + elog(ERROR, "cannot look at latest visible tid for relation \"%s.%s\"", + get_namespace_name(RelationGetNamespace(rel)), + RelationGetRelationName(rel)); + result = (ItemPointer) palloc(sizeof(ItemPointerData)); ItemPointerCopy(tid, result); diff --git a/src/test/regress/expected/tid.out b/src/test/regress/expected/tid.out new file mode 100644 index 00..e7e0d74780 --- /dev/null +++ b/src/test/regress/expected/tid.out @@ -0,0 +1,106 @@ +-- tests for functions related to TID handling +CREATE TABLE tid_tab (a int); +-- min() and max() for TIDs +INSERT INTO tid_tab VALUES (1), (2); +SELECT min(ctid) FROM tid_tab; + min +--- + (0,1) +(1 row) + +SELECT max(ctid) FROM tid_tab; + max +--- + (0,2) +(1 row) + +TRUNCATE tid_tab; +-- Tests for currtid() and currtid2() with various relation kinds +-- Materialized view +CREATE MATERIALIZED VIEW tid_matview AS SELECT a FROM tid_tab; +SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- fails +ERROR: tid (0, 1) is not valid for relation "tid_matview" +SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- fails +ERROR: tid (0, 1) is not valid for relation "tid_matview" +INSERT INTO tid_tab VALUES (1); +REFRESH MATERIALIZED VIEW tid_matview; +SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- ok + currtid +- + (0,1) +(1 row) + +SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- ok + currtid2 +-- + (0,1) +(1 row) + +DROP MATERIALIZED VIEW tid_matview; +TRUNCATE tid_tab; +-- Sequence +CREATE SEQUENCE tid_seq; +SELECT currtid('tid_seq'::regclass::oid, '(0,1)'::tid); -- ok + currtid +- + (0,1) +(1 row) + +SELECT currtid2('tid_seq'::text, '(0,1)'::tid); -- ok + currtid2 +-- + (0,1) +(1 row) + +DROP SEQUENCE tid_seq; +-- Index, fails with incorrect relation type +CREATE INDEX tid_ind ON tid_tab(a); +SELECT currtid('tid_ind'::regclass::oid, '(0,1)'::tid); -- fails +ERROR: "tid_ind" is an index +SELECT currtid2('tid_ind'::text, '(0,1)'::tid); -- fails +ERROR: "tid_ind" is an index +DROP INDEX tid_ind; +-- Partitioned table, no storage +CREATE TABLE tid_part (a int) PARTITION BY RANGE (a); +SELECT currtid('tid_part'::regclass::oid, '(0,1)'::tid); -- fails +ERROR: cannot look at latest visible
Re: feature idea: use index when checking for NULLs before SET NOT NULL
Hello, Correct index lookup is a difficult task. I tried to implement this previously... But the answer on SO is a bit incomplete for recent PostgreSQL releases. A seqscan is not the only possible way to set NOT NULL in pg12+. My patch was committed ( https://commitfest.postgresql.org/22/1389/ ) and now it's possible to do it this way:

alter table foos add constraint foos_not_null check (bar1 is not null) not valid; -- short-time exclusive lock
alter table foos validate constraint foos_not_null; -- still seqscans the entire table, but without an exclusive lock

And then another short lock:

alter table foos alter column bar1 set not null; -- in pg12+ this sees the validated constraint and skips the table scan
alter table foos drop constraint foos_not_null;

regards, Sergei