Incorrect OpenSSL type reference in code comment
The comment which refers to the OpenSSL PEM password callback type has a small typo: the type is called pem_password_cb and not pem_passwd_cb (which is an easy typo to make since, confusingly enough, the functions in OpenSSL are called SSL_*_passwd_cb). PFA patch to fix this. cheers ./daniel pem_password_cb.patch Description: Binary data
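For reference, pem_password_cb is declared in <openssl/pem.h> with the shape shown below; the callback itself is a minimal, hypothetical sketch (the fixed pass phrase is only a placeholder, not part of the patch):

    /*
     * The type the comment should refer to, from <openssl/pem.h>:
     *     typedef int pem_password_cb(char *buf, int size, int rwflag, void *userdata);
     * The callback below is a hypothetical illustration: it copies a fixed
     * pass phrase into OpenSSL's buffer and returns its length.
     */
    #include <string.h>

    static int
    example_pem_password_cb(char *buf, int size, int rwflag, void *userdata)
    {
        const char *secret = "example-pass-phrase";   /* placeholder only */
        int         len = (int) strlen(secret);

        if (len >= size)
            return 0;           /* buffer too small: report failure */
        memcpy(buf, secret, len);
        return len;             /* OpenSSL expects the pass phrase length */
    }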
Re: PG compilation error with Visual Studio 2015/2017/2019
On Wed, May 13, 2020 at 7:34 PM Tom Lane wrote: > > Amit Kapila writes: > > Now that branches are tagged, I would like to commit and backpatch > > this patch tomorrow unless there are any more comments/objections. > > The "quiet period" is over as soon as the tags appear in git. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: PG compilation error with Visual Studio 2015/2017/2019
On Thu, May 14, 2020 at 05:49, Amit Kapila wrote: > On Wed, May 13, 2020 at 7:34 PM Tom Lane wrote: > > > > Amit Kapila writes: > > > Now that branches are tagged, I would like to commit and backpatch > > > this patch tomorrow unless there are any more comments/objections. > > > > The "quiet period" is over as soon as the tags appear in git. > > > > Pushed. > Thank you for the commit. regards, Ranier Vilela
Re: PG compilation error with Visual Studio 2015/2017/2019
On Thu, May 14, 2020 at 11:07 AM Ranier Vilela wrote: > Em qui., 14 de mai. de 2020 às 05:49, Amit Kapila > escreveu: > >> On Wed, May 13, 2020 at 7:34 PM Tom Lane wrote: >> > >> > Amit Kapila writes: >> > > Now that branches are tagged, I would like to commit and backpatch >> > > this patch tomorrow unless there are any more comments/objections. >> > >> > The "quiet period" is over as soon as the tags appear in git. >> > >> >> Pushed. >> > Thank you for the commit. > Great. Thanks to everyone involved.
Re: COPY, lock release and MVCC
On Thu, May 14, 2020 at 2:06 AM Laurenz Albe wrote: > > > I wonder why you have removed the (rel != NULL) check? > > That was just inexcusable sloppiness, nothing more. > LGTM. I have slightly modified the commit message, see if that looks fine to you. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com 0001-Make-COPY-TO-keep-locks-until-the-transaction-end.v3.patch Description: Binary data
Re: pgsql: Show opclass and opfamily related information in psql
Hi! On Tue, May 12, 2020 at 12:09 AM Alvaro Herrera wrote: > On 2020-Mar-08, Alexander Korotkov wrote: > > > Show opclass and opfamily related information in psql > > > > This commit provides psql commands for listing operator classes, operator > > families and its contents in psql. New commands will be useful for > > exploring > > capabilities of both builtin opclasses/opfamilies as well as > > opclasses/opfamilies defined in extensions. > > I had chance to use these new commands this morning. Great, thank you! > Note how operator for strategy 1 are all together, then strategy 2, and > so on. But I think we'd prefer the operators to be grouped together for > the same types (just like \dAp already works); so I would change the clause > from: > ORDER BY 1, 2, o.amopstrategy, 3; > to: > ORDER BY 1, 2, pg_catalog.format_type(o.amoplefttype, NULL), > pg_catalog.format_type(o.amoprighttype, NULL), o.amopstrategy; +1 > Also, while I'm going about this, ISTM it'd make sense to > list same-class operators first, followed by cross-class operators. > That requires to add "o.amoplefttype = o.amoprighttype DESC," after > "ORDER BY 1, 2,". For brin's integer_minmax_ops, the resulting list > would have first (bigint,bigint) then (integer,integer) then > (smallint,smallint), then all the rest: +1 Nikita, what do you think? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgsql: Show opclass and opfamily related information in psql
On 14.05.2020 12:52, Alexander Korotkov wrote: Nikita, what do you think? I agree that this patch is an improvement. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgsql: Show opclass and opfamily related information in psql
On Thu, May 14, 2020 at 1:30 PM Nikita Glukhov wrote: > I agree that this patch is an improvement. OK, I'm going to push this patch if no objections. (Sergey doesn't seem to continue involvement in PostgreSQL development, so it doesn't look like we should wait for him) -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Does TupleQueueReaderNext() really need to copy its result?
On Tue, Aug 27, 2019 at 6:35 AM Andres Freund wrote: > On 2019-08-26 14:09:45 -0400, Robert Haas wrote: > > There's a comment in htup.h which says: > > > > * * Separately allocated tuple: t_data points to a palloc'd chunk that > > * is not adjacent to the HeapTupleData. (This case is deprecated > > since > > * it's difficult to tell apart from case #1. It should be used only > > in > > * limited contexts where the code knows that case #1 will never > > apply.) > > > > I got scared and ran away. > > Perhaps this'd could be sidestepped by funneling through MinimalTuples > instead of HeapTuples. Afaict that should always be sufficient, because > all system column accesses ought to happen below (including being > projected into a separate column, if needed above). With the added > benefit of needing less space, of course. I tried that out (attached). That makes various simple tests like this to go 10%+ faster on my development machine: create table s as select generate_series(1, 5000)::int i, 'hello world' a, 'this is a message' b, 42 c; select pg_prewarm('s'); set force_parallel_mode = on; explain analyze select * from s; PS It looks like the following load of mq_ring_size might be running a little hot due to false sharing with the atomic counters: if (mqh->mqh_consume_pending > mq->mq_ring_size / 4) From 0ba21ee67c9e8f2be404fa7e16d30a815310c52a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 14 May 2020 19:08:50 +1200 Subject: [PATCH] Use MinimalTuple for tuple queues. This saves 8 bytes per tuple in the queue, and avoids the need to copy and free tuples on the receiving side. Gather can emit the returned MinimalTuple directly, but GatherMerge now needs to make an explicit copy because it buffers multiple tuples at a time. Discussion: https://postgr.es/m/CA%2BhUKG%2B8T_ggoUTAE-U%3DA%2BOcPc4%3DB0nPPHcSfffuQhvXXjML6w%40mail.gmail.com --- src/backend/executor/nodeGather.c | 16 +-- src/backend/executor/nodeGatherMerge.c | 40 ++ src/backend/executor/tqueue.c | 30 +-- src/include/executor/tqueue.h | 4 +-- 4 files changed, 46 insertions(+), 44 deletions(-) diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c index 6b8ed867d5..a01b46af14 100644 --- a/src/backend/executor/nodeGather.c +++ b/src/backend/executor/nodeGather.c @@ -46,7 +46,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate); static TupleTableSlot *gather_getnext(GatherState *gatherstate); -static HeapTuple gather_readnext(GatherState *gatherstate); +static MinimalTuple gather_readnext(GatherState *gatherstate); static void ExecShutdownGatherWorkers(GatherState *node); @@ -120,7 +120,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags) * Initialize funnel slot to same tuple descriptor as outer plan. 
*/ gatherstate->funnel_slot = ExecInitExtraTupleSlot(estate, tupDesc, - &TTSOpsHeapTuple); + &TTSOpsMinimalTuple); /* * Gather doesn't support checking a qual (it's always more efficient to @@ -266,7 +266,7 @@ gather_getnext(GatherState *gatherstate) PlanState *outerPlan = outerPlanState(gatherstate); TupleTableSlot *outerTupleSlot; TupleTableSlot *fslot = gatherstate->funnel_slot; - HeapTuple tup; + MinimalTuple tup; while (gatherstate->nreaders > 0 || gatherstate->need_to_scan_locally) { @@ -278,9 +278,9 @@ gather_getnext(GatherState *gatherstate) if (HeapTupleIsValid(tup)) { -ExecStoreHeapTuple(tup, /* tuple to store */ - fslot, /* slot to store the tuple */ - true); /* pfree tuple when done with it */ +ExecStoreMinimalTuple(tup, /* tuple to store */ + fslot, /* slot to store the tuple */ + false); /* don't pfree tuple */ return fslot; } } @@ -308,7 +308,7 @@ gather_getnext(GatherState *gatherstate) /* * Attempt to read a tuple from one of our parallel workers. */ -static HeapTuple +static MinimalTuple gather_readnext(GatherState *gatherstate) { int nvisited = 0; @@ -316,7 +316,7 @@ gather_readnext(GatherState *gatherstate) for (;;) { TupleQueueReader *reader; - HeapTuple tup; + MinimalTuple tup; bool readerdone; /* Check for async events, particularly messages from workers. */ diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c index 317ddb4ae2..47129344f3 100644 --- a/src/backend/executor/nodeGatherMerge.c +++ b/src/backend/executor/nodeGatherMerge.c @@ -45,7 +45,7 @@ */ typedef struct GMReaderTupleBuffer { - HeapTuple *tuple; /* array of length MAX_TUPLE_STORE */ + MinimalTuple *tuple; /* array of length MAX_TUPLE_STORE */ int nTuples; /* number of tuples currently stored */ int readCounter; /* index of next tuple to extract */ bool done; /* true if reader is known
Re: Incorrect OpenSSL type reference in code comment
On 14/05/2020 11:07, Daniel Gustafsson wrote: The comment which refers to the OpenSSL PEM password callback type has a small typo: the type is called pem_password_cb and not pem_passwd_cb (which is an easy typo to make since, confusingly enough, the functions in OpenSSL are called SSL_*_passwd_cb). PFA patch to fix this. Applied, thanks! - Heikki
Re: ldap tls test fails in some environments
Re: Thomas Munro > > 17:28:59 Data directory: > > /<>/build/src/test/ldap/tmp_check/t_001_auth_node_data/pgdata > > I know nothing about Debian package building so I could be missing > something about how this works, but I wonder if our script variable > handling is hygienic enough for paths like that. That's from the sbuild log filter, the special chars are not the problem: 17:06:41 I: NOTICE: Log filtering will replace 'build/postgresql-13-fdfISX/postgresql-13-13~~devel~20200513.1505' with '<>' 17:06:41 I: NOTICE: Log filtering will replace 'build/postgresql-13-fdfISX' with '<>' > > It consistently fails on the build server, but works on my notebook. > > Maybe that simply means slapd is crashing, but there's no slapd > > output. Would it be possible to start slapd with "-d 255", even if > > that means it doesn't background itself? > > That'd require more scripting to put it in the background... Maybe adding "&" is enough, provided it still writes the pid file for shutting down... > This leads me to suspect that something in your build server's > environment that comes later in that list is overridding the > TLS_REQCERT setting. If that's the explanation, perhaps we should do > this, which seems to work OK on my system, since it comes last in the > list: It's not messing with environment variables in that area. I'll see if I can catch a shell in the environment where it fails. Christoph
Re: effective_io_concurrency's steampunk spindle maths
On Wed, May 13, 2020 at 6:58 AM Peter Geoghegan wrote: > Shouldn't you close out the "Should we rename > effective_io_concurrency?" Postgres 13 open item now? Yeah, that doesn't really seem worth the churn. I'll move it to the resolved list in a day or two if no one shows up to argue for a rename.
Re: making update/delete of inheritance trees scale better
On Wed, May 13, 2020 at 9:21 AM Amit Langote wrote: > > Maybe I am misunderstanding you, but the more the rows to update, the > more overhead we will be paying with the new approach. Yes, that's right. How much is that compared to the current planning overhead. How many rows it takes for that overhead to be comparable to the current planning overhead. But let's not sweat on that point much right now. > > So, we will need to do 2 things: > > 1. Implicitly apply an ORDER BY tableoid clause > 2. Add result relation RTIs to ModifyTable.resultRelations in the > order of their RTE's relid. > > Maybe we can do that as a separate patch. Also, I am not sure if it > will get in the way of someone wanting to have ORDER BY LIMIT for > updates. It won't. But may be David's idea is better. > > > > > * Tuple re-routing during UPDATE. For now it's disabled so your design > > > > should work. But we shouldn't design this feature in such a way that > > > > it comes in the way to enable tuple re-routing in future :). > > > > > > Sorry, what is tuple re-routing and why does this new approach get in its > > > way? > > > > An UPDATE causing a tuple to move to a different partition. It would > > get in its way since the tuple will be located based on tableoid, > > which will be the oid of the old partition. But I think this approach > > has higher chance of being able to solve that problem eventually > > rather than the current approach. > > Again, I don't think I understand. We do currently (as of v11) > re-route tuples when UPDATE causes them to move to a different > partition, which, gladly, continues to work with my patch. Ah! Ok. I missed that part then. > > So how it works is like this: for a given "new" tuple, ExecUpdate() > checks if the tuple would violate the partition constraint of the > result relation that was passed along with the tuple. If it does, the > new tuple will be moved, by calling ExecDelete() to delete it from the > current relation, followed by ExecInsert() to find the new home for > the tuple. The only thing that changes with the new approach is how > ExecModifyTable() chooses a result relation to pass to ExecUpdate() > for a given "new" tuple it has fetched from the plan, which is quite > independent from the tuple re-routing mechanism proper. > Thanks for the explanation. -- Best Wishes, Ashutosh Bapat
Potentially misleading name of libpq pass phrase hook
Reviewing TLS changes for v13, I came across one change which I think might be better off with a library-qualified name. The libpq frontend sslpassword hook added in 4dc63552109f65 is OpenSSL specific, but it has a very generic name: PQsetSSLKeyPassHook(PQsslKeyPassHook_type hook); This IMO has potential for confusion if we ever commit another TLS backend, since the above hook won't work for any other library (except maybe OpenSSL derivatives like LibreSSL et al.). The backends will always have differently named hooks, as the signatures will be different, but having one with a generic name and another with a library-qualified name doesn't seem too friendly to anyone implementing with libpq. As a point of reference: in the backend we added a TLS init hook in commit 896fcdb230e72 which is also specific to OpenSSL, but the name is library qualified, making the purpose and use case perfectly clear: openssl_tls_init_hook. Since we haven't shipped this, there is still time to rename, which IMO is the right way forward. PQsslKeyPassHook__type would be one option, but perhaps there are better alternatives? Thoughts? cheers ./daniel
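For context, here is a rough sketch of how an application would install this hook under its current, generic name; the callback signature (buf, size, conn) and the pass phrase handling are assumptions made only for illustration:

    /*
     * Hypothetical libpq application code, only to illustrate the naming
     * question above.  The callback signature is assumed for this sketch; a
     * library-qualified setter (say, PQsetSSLKeyPassHook_OpenSSL) would make
     * the OpenSSL dependency explicit at the call site.
     */
    #include <string.h>
    #include <libpq-fe.h>

    static int
    my_ssl_key_pass(char *buf, int size, PGconn *conn)
    {
        const char *phrase = "example-pass-phrase";    /* placeholder only */

        if ((int) strlen(phrase) >= size)
            return 0;
        strcpy(buf, phrase);
        return (int) strlen(phrase);
    }

    void
    install_ssl_key_pass_hook(void)
    {
        PQsetSSLKeyPassHook(my_ssl_key_pass);   /* the generic name in question */
    }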
Re: COPY, lock release and MVCC
On Thu, 2020-05-14 at 15:11 +0530, Amit Kapila wrote: > LGTM. I have slightly modified the commit message, see if that looks > fine to you. Fine with me, thanks. > This breaks the cases where a REPEATABLE READ transaction could see an > empty table if it repeats a COPY statement and somebody truncated the > table in the meantime. I would use "case" rather than "cases" here. Yours, Laurenz Albe
Re: new heapcheck contrib module
> On May 13, 2020, at 5:36 PM, Peter Geoghegan wrote: > > On Wed, May 13, 2020 at 5:18 PM Mark Dilger > wrote: >> I am not removing any assertions. I do not propose to remove any >> assertions. When I talk about "hardening against assertions", that is not in >> any way a proposal to remove assertions from the code. > > I'm sorry if I seemed to suggest that you wanted to remove assertions Not a problem at all. As always, I appreciate your involvement in this code and design review. >> I think this is a misrepresentation of the tests that I've been running. > > I didn't actually mean it that way, but I can see how my words could > reasonably be interpreted that way. I apologize. Again, no worries. >> There are two kinds of tests that I have done: >> >> First, there is the regression tests, t/004_verify_heapam.pl, which is >> obviously contrived. That was included in the regression test suite because >> it needed to be something other developers could read, verify, "yeah, I can >> see why that would be corruption, and would give an error message of the >> sort the test expects", and then could be run to verify that indeed that >> expected error message was generated. > > I still don't think that this is necessary. It could work for one type > of corruption, that happens to not have any of the problems, but just > testing that one type of corruption seems rather arbitrary to me. As discussed with Robert off list, this probably doesn't matter. The patch can be committed with or without this particular TAP test. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: PG 13 release notes, first draft
On Thu, May 14, 2020 at 07:23:05AM +0200, Fabien COELHO wrote: > > Hello Bruce, > > > > > > * 34a0a81bfb > > > > > > > > We already have: > > > > > > > > Reformat tables containing function information for better > > > > clarity (Tom Lane) > > > > > > > > so it seems it is covered as part of this. > > > > > > AFAICR this one is not by the same author, and although the point was > > > about > > > better clarity, it was not about formating but rather about restructuring > > > text vs binary string function documentations. Then Tom reformatted the > > > result. > > > > Well, we were not even clear we should document changes in the functions > > section, so going into details of all the changes seems unwise. > > The restructuring was a significant change, and ISTM that another function > of the release note is also to implicitely thank contributors (their name is > appended, which does not bring any useful information about the feature from > a release note perspective) hence my suggestion to include this one, > the author of which is not Tom Lane. We list people's names next to items. We don't list items to list people's names, as far as I know of the policy. If you want to change that, you will need to start a new thread and get agreement. > > > > > * e829337d42 > > > > > > > > Uh, this is a doc link formatting addition. I think this falls into the > > > > error message logic, where it is nice when people want it, but they > > > > don't need to know about it ahead of time. > > > > > > [...] > > > > I don't see it. > > While reading again the sequence, ISTM that I did not understand your first > answer, so my answer was kind-of off topic, sorry. This is indeed "link > formatting addition", which helps making the libpq doc more usable. > Probably you do not need to know about it in advance, but I do not think > that it is a good reason not to include it: with the same argument, a > performance improvement would not need to be advertise, you'll see it when > you need it. The same holds for all non-functional improvements, and there > are many which are listed. Peformance items are listed only if they will produce a visible change in performance, or enable new workloads that were too slow in the past. > > > Possibly, but as the "THIS WAS NOT DOCUMENTED BEFORE?" question seemed to > > > still be in the release notes, I gathered that the information had not > > > reached its destination, hence the possible repetition. But maybe the > > > issue > > > is that this answer is not satisfactory. Sorry for the inconvenience. > > > > I removed it already based on feedback from someone else. > > Good. I looked at the online version which is off the latest commits by a > few hours. > > I'd consider moving "Upgrade to use DocBook 4.5 (Peter Eisentraut)" to the > doc section, maybe. Agreed, done. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Strange decreasing value of pg_last_wal_receive_lsn()
(please, the list policy is bottom-posting to keep history clean, thanks). On Thu, 14 May 2020 07:18:33 +0500 godjan • wrote: > -> Why do you kill -9 your standby? > Hi, it’s Jepsen test for our HA solution. It checks that we don’t lose data > in such situation. OK. This test is highly useful to stress data high availability and durability, of course. However, how useful is this test in a context of auto failover for **service** high availability? If all your nodes are killed in the same disaster, how/why an automatic cluster manager should take care of starting all nodes again and pick the right node to promote? > So, now we update logic as Michael said. All ha alive standbys now waiting > for replaying all WAL that they have and after we use pg_last_replay_lsn() to > choose which standby will be promoted in failover. > > It fixed out trouble, but there is one another. Now we should wait when all > ha alive hosts finish replaying WAL to failover. It might take a while(for > example WAL contains wal_record about splitting b-tree). Indeed, this is the concern I wrote about yesterday in a second mail on this thread. > We are looking for options that will allow us to find a standby that contains > all data and replay all WAL only for this standby before failover. Note that when you promote a node, it first replays available WALs before acting as a primary. So you can safely signal the promotion to the node and wait for it to finish the replay and promote. > Maybe you have ideas on how to keep the last actual value of > pg_last_wal_receive_lsn()? Nope, no clean and elegant idea. One your instances are killed, maybe you can force flush the system cache (secure in-memory-only data) and read the latest received WAL using pg_waldump? But, what if some more data are available from archives, but not received from streaming rep because of a high lag? > As I understand WAL receiver doesn’t write to disk walrcv->flushedUpto. I'm not sure to understand what you mean here. pg_last_wal_receive_lsn() reports the actual value of walrcv->flushedUpto. walrcv->flushedUpto reports the latest LSN force-flushed to disk. > > On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais > > wrote: > > > > > > (too bad the history has been removed to keep context) > > > > On Fri, 8 May 2020 15:02:26 +0500 > > godjan • wrote: > > > >> I got it, thank you. > >> Can you recommend what to use to determine which quorum standby should be > >> promoted in such case? We planned to use pg_last_wal_receive_lsn() to > >> determine which has fresh data but if it returns the beginning of the > >> segment on both replicas we can’t determine which standby confirmed that > >> write transaction to disk. > > > > Wait, pg_last_wal_receive_lsn() only decrease because you killed your > > standby. > > > > pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The > > later is set to the beginning of the segment requested only during the first > > walreceiver startup or a timeline fork: > > > > /* > > * If this is the first startup of walreceiver (on this timeline), > > * initialize flushedUpto and latestChunkStart to the starting > > point. */ > > if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli) > > { > > walrcv->flushedUpto = recptr; > > walrcv->receivedTLI = tli; > > walrcv->latestChunkStart = recptr; > > } > > walrcv->receiveStart = recptr; > > walrcv->receiveStartTLI = tli; > > > > After a primary loss, as far as the standby are up and running, it is fine > > to use pg_last_wal_receive_lsn(). 
> > > > Why do you kill -9 your standby? Whay am I missing? Could you explain the > > usecase you are working on to justify this? > > > > Regards, > > > -- Jehan-Guillaume de Rorthais Dalibo
[PATCH] Fix outside scope t_ctid (ItemPointerData)
Hi, ItemPointerData, contrary to what the name suggests, is not a pointer to a structure but a structure in fact. When assigning the name of the structure variable to a pointer, it may even appear to work, but it is not the right thing to do, and it becomes a nightmare to discover that it is the cause of some other error. So: 1. In some cases, there may be a misunderstanding in the use of ItemPointerData. 2. When using the variable name in an assignment, the variable's address is used. 3. While this works for a structure, it shouldn't be the right thing to do. 4. If we have a local variable, its scope is limited, and when it loses its scope, the memory is certainly garbage. 5. While this may be working for heapam.c, I believe it is being abused and should be made compliant with the Postgres API, using the functions that were created for this. The patch is primarily intended to correct the use of ItemPointerData, but it also changes the style by reducing the scope of some variables. If reducing the scope is not acceptable and someone has objections, I can change the patch to focus only on the use of ItemPointerData. But as style changes are rare, if possible, it would be good to seize the opportunity. regards, Ranier Vilela 001_fix_outside_scope_t_ctid.patch Description: Binary data
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
> On May 14, 2020, at 10:40 AM, Ranier Vilela wrote: > > Hi, > ItemPointerData, on the contrary, from what the name says, > it is not a pointer to a structure, but a structure in fact. The project frequently uses the pattern typedef struct FooData { ... } FooData; typedef FooData *Foo; where, in this example, "Foo" = "ItemPointer". So the "Data" part of "ItemPointerData" clues the reader into the fact that ItemPointerData is a struct, not a pointer. Granted, the "Pointer" part of that name may confuse some readers, but the struct itself does contain what is essentially a 48-bit pointer, so that name is not nuts. > When assigning the name of the structure variable to a pointer, it may even > work, > but, it is not the right thing to do and it becomes a nightmare, > to discover that any other error they have is at cause. Can you give a code example of the type of assigment you mean? > So: > 1. In some cases, there may be a misunderstanding in the use of > ItemPointerData. > 2. When using the variable name in an assignment, the variable's address is > used. > 3. While this works for a structure, it shouldn't be the right thing to do. > 4. If we have a local variable, its scope is limited and when it loses its > scope, memory is certainly garbage. > 5. While this may be working for heapam.c, I believe it is being abused and > should be compliant with > the Postgres API and use the functions that were created for this. > > The patch is primarily intended to correct the use of ItemPointerData. > But it is also changing the style, reducing the scope of some variables. > If that was not acceptable, reduce the scope and someone has objections, > I can change the patch, to focus only on the use of ItemPointerData. > But as style changes are rare, if possible, it would be good to seize the > opportunity. I would like to see a version of the patch that only addresses your concerns about ItemPointerData, not because other aspects of the patch are unacceptable (I'm not ready to even contemplate that yet), but because I'm not sure what your complaint is about. Can you restrict the patch to just address that one issue? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
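To make the pattern concrete, here is a small stand-alone example; FooData and Foo are hypothetical names, not taken from the PostgreSQL tree, and the example also shows what a plain struct assignment does:

    /*
     * Stand-alone illustration of the FooData/Foo naming pattern described
     * above.  The assignment "copy = local" copies the members of the
     * struct; no pointer to the local variable is retained.
     */
    #include <stdio.h>

    typedef struct FooData
    {
        int     blk;
        int     off;
    } FooData;

    typedef FooData *Foo;       /* "Foo" is the pointer type, "FooData" the struct */

    int
    main(void)
    {
        FooData local = {42, 7};    /* a struct value, not a pointer */
        FooData copy;
        Foo     ptr = &local;       /* this, by contrast, really is a pointer */

        copy = local;               /* member-wise copy */
        local.blk = 0;              /* does not affect the copy... */
        printf("%d %d %d\n", copy.blk, copy.off, ptr->blk);   /* prints "42 7 0" */
        return 0;
    }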
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Ranier Vilela writes: > The patch is primarily intended to correct the use of ItemPointerData. What do you think is being "corrected" here? It looks to me like just some random code rearrangements that aren't even clearly bug-free, let alone being stylistic improvements. regards, tom lane
Re: Transactions involving multiple postgres foreign servers, take 2
On Tue, May 12, 2020 at 11:45 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > On Thu, 30 Apr 2020 at 20:43, Masahiko Sawada > wrote: > > > > On Tue, 28 Apr 2020 at 19:37, Muhammad Usama wrote: > > > > > > > > > > > > On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada < > masahiko.saw...@2ndquadrant.com> wrote: > > >> > > >> On Fri, 27 Mar 2020 at 22:06, Muhammad Usama > wrote: > > >> > > > >> > Hi Sawada San, > > >> > > > >> > I have been further reviewing and testing the transaction involving > multiple server patches. > > >> > Overall the patches are working as expected bar a few important > exceptions. > > >> > So as discussed over the call I have fixed the issues I found > during the testing > > >> > and also rebased the patches with the current head of the master > branch. > > >> > So can you please have a look at the attached updated patches. > > >> > > >> Thank you for reviewing and updating the patch! > > >> > > >> > > > >> > Below is the list of changes I have made on top of V18 patches. > > >> > > > >> > 1- In register_fdwxact(), As we are just storing the callback > function pointers from > > >> > FdwRoutine in fdw_part structure, So I think we can avoid calling > > >> > GetFdwRoutineByServerId() in TopMemoryContext. > > >> > So I have moved the MemoryContextSwitch to TopMemoryContext after > the > > >> > GetFdwRoutineByServerId() call. > > >> > > >> Agreed. > > >> > > >> > > > >> > > > >> > 2- If PrepareForeignTransaction functionality is not present in > some FDW then > > >> > during the registration process we should only set the > XACT_FLAGS_FDWNOPREPARE > > >> > transaction flag if the modified flag is also set for that server. > As for the server that has > > >> > not done any data modification within the transaction we do not do > two-phase commit anyway. > > >> > > >> Agreed. > > >> > > >> > > > >> > 3- I have moved the foreign_twophase_commit in sample file after > > >> > max_foreign_transaction_resolvers because the default value of > max_foreign_transaction_resolvers > > >> > is 0 and enabling the foreign_twophase_commit produces an error > with default > > >> > configuration parameter positioning in postgresql.conf > > >> > Also, foreign_twophase_commit configuration was missing the comments > > >> > about allowed values in the sample config file. > > >> > > >> Sounds good. Agreed. > > >> > > >> > > > >> > 4- Setting ForeignTwophaseCommitIsRequired in > is_foreign_twophase_commit_required() > > >> > function does not seem to be the correct place. The reason being, > even when > > >> > is_foreign_twophase_commit_required() returns true after setting > ForeignTwophaseCommitIsRequired > > >> > to true, we could still end up not using the two-phase commit in > the case when some server does > > >> > not support two-phase commit and foreign_twophase_commit is set to > FOREIGN_TWOPHASE_COMMIT_PREFER > > >> > mode. So I have moved the ForeignTwophaseCommitIsRequired > assignment to PreCommit_FdwXacts() > > >> > function after doing the prepare transaction. > > >> > > >> Agreed. > > >> > > >> > > > >> > 6- In prefer mode, we commit the transaction in single-phase if the > server does not support > > >> > the two-phase commit. But instead of doing the single-phase commit > right away, > > >> > IMHO the better way is to wait until all the two-phase transactions > are successfully prepared > > >> > on servers that support the two-phase. 
Since an error during a > "PREPARE" stage would > > >> > rollback the transaction and in that case, we would end up with > committed transactions on > > >> > the server that lacks the support of the two-phase commit. > > >> > > >> When an error occurred before the local commit, a 2pc-unsupported > > >> server could be rolled back or committed depending on the error > > >> timing. On the other hand all 2pc-supported servers are always rolled > > >> back when an error occurred before the local commit. Therefore even if > > >> we change the order of COMMIT and PREPARE it is still possible that we > > >> will end up committing the part of 2pc-unsupported servers while > > >> rolling back others including 2pc-supported servers. > > >> > > >> I guess the motivation of your change is that since errors are likely > > >> to happen during executing PREPARE on foreign servers, we can minimize > > >> the possibility of rolling back 2pc-unsupported servers by deferring > > >> the commit of 2pc-unsupported server as much as possible. Is that > > >> right? > > > > > > > > > Yes, that is correct. The idea of doing the COMMIT on > NON-2pc-supported servers > > > after all the PREPAREs are successful is to minimize the chances of > partial commits. > > > And as you mentioned there will still be chances of getting a partial > commit even with > > > this approach but the probability of that would be less than what it > is with the > > > current sequence. > > > > > > > > >> > > >> > > >> > So I have modified the flow a little bit and instead of doing a > one-phase co
Re: Our naming of wait events is a disaster.
On Tue, May 12, 2020 at 10:54 PM Tom Lane wrote: > I don't actually understand why the LWLock tranche mechanism is designed > the way it is. It seems to be intended to support different backends > having different sets of LWLocks, but I fail to see why that's a good idea, > or even useful at all. In any case, dynamically-created LWLocks are > clearly out of scope for the documentation. The problem that I'm trying > to deal with right now is that even LWLocks that are hard-wired into the > backend code are difficult to enumerate. That wasn't a problem before > we decided we needed to expose them all to user view; but now it is. If you are using parallel query, your backend might have a DSM segment that contains LWLocks. Anyone who is not attached to that DSM segment won't know about them, though possibly they have a different DSM segment containing a tranche with the same name. Also, if you are using extensions that use LWLocks, a particular extension may be loaded into backend A but not backend B. Suppose backend A has an extension loaded but backend B does not. Then suppose that A waits for an LWLock and B meanwhile examines pg_stat_activity. It's hard to come up with totally satisfying solutions to problems like these, but I think the important thing to remember is that, at least in the extension case, this is really an obscure corner case. Normally the same extensions will be loaded everywhere and anything known to one backend will be known also the others. However, it's not guaranteed. I tend to prefer that modules register their own tranches rather than having a central table someplace, because I like the idea that the things that a particular module knows about are contained within its own source files and not spread all over the code base. I think that we've done rather badly with this in a number of places, lwlock.c among them, and I don't like it much. It tends to lead to layering violations, and it also tends to create interfaces that extensions can't really use. I admit that it's not ideal as far as this particular problem is concerned, but I don't think that makes it a bad idea in general. In some sense, the lack of naming consistency here is a manifestation of an underlying chaos in the code: we've created many different ways of waiting for things with many different characteristics and little thought to consistency, and this mechanism has exposed that underlying problem. The wait state interface is surely not to blame for the fact that LWLock names and heavyweight lock types are capitalized inconsistently. In fact, there seem to be few things that PostgreSQL hackers love more than inconsistent capitalization, and this is just one of a whole lot of instances of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
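As a concrete illustration of the decentralized registration described above, a rough sketch of the usual extension pattern follows; the extension and tranche name "my_ext" are hypothetical, and in this era the request is made from _PG_init under shared_preload_libraries with the lock fetched later from a shmem startup hook:

    /*
     * Hedged sketch of an extension registering its own named LWLock tranche;
     * "my_ext" is a hypothetical name.  A backend that never loads this
     * library will not know the tranche name, which is the pg_stat_activity
     * wrinkle mentioned above.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "storage/lwlock.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);

    static LWLock *my_ext_lock = NULL;

    void
    _PG_init(void)
    {
        /* Ask for one LWLock in a tranche named after the extension. */
        RequestNamedLWLockTranche("my_ext", 1);
    }

    /* Typically installed as a shmem_startup_hook. */
    static void
    my_ext_shmem_startup(void)
    {
        my_ext_lock = &(GetNamedLWLockTranche("my_ext")->lock);
    }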
Re: new heapcheck contrib module
On Wed, May 13, 2020 at 5:33 PM Peter Geoghegan wrote: > Do you recall seeing corruption resulting in segfaults in production? I have seen that, I believe. I think it's more common to fail with errors about not being able to palloc>1GB, not being able to look up an xid or mxid, etc. but I am pretty sure I've seen multiple cases involving seg faults, too. Unfortunately for my credibility, I can't remember the details right now. > I personally don't recall seeing that. If it happened, the segfaults > themselves probably wouldn't be the main concern. I don't really agree. Hypothetically speaking, suppose you corrupt your only copy of a critical table in such a way that every time you select from it, the system seg faults. A user in this situation might ask questions like: 1. How did my table get corrupted? 2. Why do I only have one copy of it? 3. How do I retrieve the non-corrupted portion of my data from that table and get back up and running? In the grand scheme of things, #1 and #2 are the most important questions, but when something like this actually happens, #3 tends to be the most urgent question, and it's a lot harder to get the uncorrupted data out if the system keeps crashing. Also, a seg fault tends to lead customers to think that the database has a bug, rather than that the database is corrupted. Slightly off-topic here, but I think our error reporting in this area is pretty lame. I've learned over the years that when a customer reports that they get a complaint about a too-large memory allocation every time they access a table, they've probably got a corrupted varlena header. However, that's extremely non-obvious to a typical user. We should try to report errors indicative of corruption in a way that gives the user some clue that corruption has happened. Peter made a stab at improving things there by adding errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of users will never see the error code, only the message, and a lot of corruption produces still produces errors that weren't changed by that commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
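To make the last point concrete, here is a hedged sketch of the reporting style being advocated: tag the error with ERRCODE_DATA_CORRUPTED and name the relation and block, rather than surfacing only a generic allocation failure. The helper function, message wording, and call sites are invented for illustration:

    /*
     * Hypothetical helper showing the corruption-reporting style discussed
     * above; not actual PostgreSQL code.
     */
    #include "postgres.h"
    #include "storage/block.h"
    #include "storage/off.h"
    #include "utils/rel.h"

    static void
    report_corrupt_tuple(Relation rel, BlockNumber blkno, OffsetNumber offnum)
    {
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("corrupted tuple header in relation \"%s\", block %u, offset %u",
                        RelationGetRelationName(rel), blkno, offnum),
                 errhint("This usually indicates on-disk corruption; consider restoring from a backup.")));
    }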
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Em qui., 14 de mai. de 2020 às 15:03, Mark Dilger < mark.dil...@enterprisedb.com> escreveu: > > > > On May 14, 2020, at 10:40 AM, Ranier Vilela wrote: > > > > Hi, > > ItemPointerData, on the contrary, from what the name says, > > it is not a pointer to a structure, but a structure in fact. > > The project frequently uses the pattern > > typedef struct FooData { > ... > } FooData; > > typedef FooData *Foo; > > where, in this example, "Foo" = "ItemPointer". > > So the "Data" part of "ItemPointerData" clues the reader into the fact > that ItemPointerData is a struct, not a pointer. Granted, the "Pointer" > part of that name may confuse some readers, but the struct itself does > contain what is essentially a 48-bit pointer, so that name is not nuts. > > > > When assigning the name of the structure variable to a pointer, it may > even work, > > but, it is not the right thing to do and it becomes a nightmare, > > to discover that any other error they have is at cause. > > Can you give a code example of the type of assigment you mean? > htup->t_ctid = target_tid; htup->t_ctid = newtid; Both target_tid and newtid are local variable, whe loss scope, memory is garbage. > > > So: > > 1. In some cases, there may be a misunderstanding in the use of > ItemPointerData. > > 2. When using the variable name in an assignment, the variable's address > is used. > > 3. While this works for a structure, it shouldn't be the right thing to > do. > > 4. If we have a local variable, its scope is limited and when it loses > its scope, memory is certainly garbage. > > 5. While this may be working for heapam.c, I believe it is being abused > and should be compliant with > > the Postgres API and use the functions that were created for this. > > > > The patch is primarily intended to correct the use of ItemPointerData. > > But it is also changing the style, reducing the scope of some variables. > > If that was not acceptable, reduce the scope and someone has objections, > > I can change the patch, to focus only on the use of ItemPointerData. > > But as style changes are rare, if possible, it would be good to seize > the opportunity. > > I would like to see a version of the patch that only addresses your > concerns about ItemPointerData, not because other aspects of the patch are > unacceptable (I'm not ready to even contemplate that yet), but because I'm > not sure what your complaint is about. Can you restrict the patch to just > address that one issue? > Certainly. In the same file you can find the appropriate use of the API. ItemPointerSet(&heapTuple->t_self, blkno, offnum); regards, Ranier Vilela 001_fix_outside_scope_t_cid_v2.patch Description: Binary data
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
On Thu, May 14, 2020 at 15:07, Tom Lane wrote: > Ranier Vilela writes: > > The patch is primarily intended to correct the use of ItemPointerData. > > What do you think is being "corrected" here? It looks to me like > just some random code rearrangements that aren't even clearly > bug-free, let alone being stylistic improvements. > It is certainly working, but trusting that the memory of a local variable will not change when it loses its scope is a risk that can certainly cause bugs elsewhere, and it is very difficult to discover their primary cause. regards, Ranier Vilela
Re: new heapcheck contrib module
On Wed, May 13, 2020 at 7:32 PM Alvaro Herrera wrote: > I agree that this (a test tool that exercises our code against > arbitrarily corrupted data pages) is not going to work as a test that > all buildfarm members run -- it seems something for specialized > buildfarm members to run, or even something that's run outside of the > buildfarm, like sqlsmith. Obviously such a tool would not be able to > run against an assertion-enabled build, and we shouldn't even try. I have a question about what you mean here by "arbitrarily." If you mean that we shouldn't have the buildfarm run the proposed heap corruption checker against heap pages full of randomly-generated garbage, I tend to agree. Such a test wouldn't be very stable and might fail in lots of low-probability ways that could require unreasonable effort to find and fix. If you mean that we shouldn't have the buildfarm run the proposed heap corruption checker against any corrupted heap pages at all, I tend to disagree. If we did that, then we'd basically be releasing a heap corruption checker with very limited test coverage. Like, we shouldn't only have negative test cases, where the absence of corruption produces no results. We should also have positive test cases, where the thing finds some problem... At least, that's what I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SLRU statistics
On Thu, May 14, 2020 at 2:27 AM Fujii Masao wrote: > Therefore what we can do right now seems to make checkpointer report the SLRU > stats while it's running. Other issues need more time to investigate... > Thought? I'm confused by why SLRU statistics are reported by messages sent to the stats collector rather than by just directly updating shared memory. For database or table statistics there can be any number of objects and we can't know in advance how many there will be, so we can't set aside shared memory for the stats in advance. For SLRUs, there's no such problem. Just having the individual backends periodically merge their accumulated backend-local counters into the shared counters seems like it would be way simpler and more performant. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
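A minimal sketch of the shared-memory alternative being described is below; the struct and function names are hypothetical (not from any actual patch), and only the pattern follows the suggestion: each backend accumulates counts locally and periodically folds them into fixed-size shared counters with atomic adds.

    /*
     * Hypothetical sketch of backend-local SLRU counters merged into shared
     * memory.  Because the number of SLRUs is fixed, the shared struct can
     * be sized at startup, unlike per-table statistics.
     */
    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct SlruSharedCounters
    {
        pg_atomic_uint64 blocks_read;
        pg_atomic_uint64 blocks_written;
    } SlruSharedCounters;

    static SlruSharedCounters *shared_counters;    /* set up in shared memory at startup */
    static uint64 local_blocks_read = 0;
    static uint64 local_blocks_written = 0;

    static void
    slru_flush_local_counters(void)
    {
        /* Atomic adds let concurrent backends update without taking a lock. */
        pg_atomic_fetch_add_u64(&shared_counters->blocks_read, local_blocks_read);
        pg_atomic_fetch_add_u64(&shared_counters->blocks_written, local_blocks_written);
        local_blocks_read = 0;
        local_blocks_written = 0;
    }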
Re: Our naming of wait events is a disaster.
Robert Haas writes: > I tend to prefer that modules register their own tranches rather than > having a central table someplace, because I like the idea that the > things that a particular module knows about are contained within its > own source files and not spread all over the code base. I think that > we've done rather badly with this in a number of places, lwlock.c > among them, and I don't like it much. Well, we could solve this problem very easily by ripping out everything having to do with wait-state monitoring ... and personally I'd be a lot in favor of that, because I haven't seen anything about either the design or documentation of the feature that I thought was very well done. However, if you'd like to have wait-state monitoring, and you'd like the documentation for it to be more useful than "go read the code", then I don't see any way around the conclusion that there are going to be centralized lists of the possible wait states. That being the case, refusing to use a centralized list in the implementation seems rather pointless; and having some aspects of the implementation use centralized lists (see the enums in lwlock.h and elsewhere) while other aspects don't is just schizophrenic. > In some sense, the lack of naming consistency here is > a manifestation of an underlying chaos in the code: we've created many > different ways of waiting for things with many different > characteristics and little thought to consistency, and this mechanism > has exposed that underlying problem. Yeah, agreed. Nonetheless, now we have a problem. regards, tom lane
Re: new heapcheck contrib module
On Thu, May 14, 2020 at 11:33 AM Robert Haas wrote: > I have seen that, I believe. I think it's more common to fail with > errors about not being able to palloc>1GB, not being able to look up > an xid or mxid, etc. but I am pretty sure I've seen multiple cases > involving seg faults, too. Unfortunately for my credibility, I can't > remember the details right now. I believe you, both in general, and also because what you're saying here is plausible, even if it doesn't fit my own experience. Corruption is by its very nature exceptional. At least, if that isn't true then something must be seriously wrong, so the idea that it will be different in some way each time seems like a good working assumption. Your exceptional cases are not necessarily the same as mine, especially where hardware problems are concerned. On the other hand, it's also possible for corruption that originates from very different sources to exhibit the same basic inconsistencies and symptoms. I've noticed that SLRU corruption is often a leading indicator of general storage problems. The inconsistencies between certain SLRU state and the heap happens to be far easier to notice in practice, particularly when VACUUM runs. But it's not fundamentally different to inconsistencies from pages within one single main fork of some heap relation. > > I personally don't recall seeing that. If it happened, the segfaults > > themselves probably wouldn't be the main concern. > > I don't really agree. Hypothetically speaking, suppose you corrupt > your only copy of a critical table in such a way that every time you > select from it, the system seg faults. A user in this situation might > ask questions like: I agree that that could be a problem. But that's not what I've seen happen in production systems myself. Maybe there is some low hanging fruit here. Perhaps we can make the real PageGetItemId() a little closer to PageGetItemIdCareful() without noticeable overhead, as I suggested already. Are there any real generalizations that we can make about why backends segfault with corrupt data? Maybe there is. That seems important. > Slightly off-topic here, but I think our error reporting in this area > is pretty lame. I've learned over the years that when a customer > reports that they get a complaint about a too-large memory allocation > every time they access a table, they've probably got a corrupted > varlena header. I certainlt learned the same lesson in the same way. > However, that's extremely non-obvious to a typical > user. We should try to report errors indicative of corruption in a way > that gives the user some clue that corruption has happened. Peter made > a stab at improving things there by adding > errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of > users will never see the error code, only the message, and a lot of > corruption produces still produces errors that weren't changed by that > commit. The theory is that "can't happen" errors having an errcode that should be considered similar to or equivalent to ERRCODE_DATA_CORRUPTED. I doubt that it works out that way in practice, though. -- Peter Geoghegan
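For readers unfamiliar with the amcheck routine mentioned, here is a rough sketch of the kind of extra validation a "careful" line-pointer fetch performs before the pointer is trusted; the function name and the exact checks are illustrative, not the actual amcheck code:

    /*
     * Hedged sketch, loosely in the spirit of amcheck's PageGetItemIdCareful():
     * validate a line pointer before using it, so that corruption surfaces as
     * an error rather than a wild pointer dereference.  Not the real code.
     */
    #include "postgres.h"
    #include "storage/bufpage.h"

    static ItemId
    page_get_item_id_careful(Page page, OffsetNumber offnum)
    {
        ItemId      itemid;

        if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(page))
            elog(ERROR, "offset number %u out of range", offnum);

        itemid = PageGetItemId(page, offnum);

        /* A line pointer with storage must point past the header and stay on the page. */
        if (ItemIdHasStorage(itemid) &&
            (ItemIdGetOffset(itemid) < SizeOfPageHeaderData ||
             ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) > BLCKSZ))
            elog(ERROR, "corrupt line pointer at offset %u", offnum);

        return itemid;
    }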
Re: new heapcheck contrib module
On 2020-May-14, Robert Haas wrote: > I have a question about what you mean here by "arbitrarily." > > If you mean that we shouldn't have the buildfarm run the proposed heap > corruption checker against heap pages full of randomly-generated > garbage, I tend to agree. Such a test wouldn't be very stable and > might fail in lots of low-probability ways that could require > unreasonable effort to find and fix. This is what I meant. I was thinking of blocks generated randomly. > If you mean that we shouldn't have the buildfarm run the proposed heap > corruption checker against any corrupted heap pages at all, I tend to > disagree. Yeah, IMV those would not be arbitrarily corrupted -- instead they're crafted to be corrupted in some specific way. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Our naming of wait events is a disaster.
On Thu, May 14, 2020 at 2:54 PM Tom Lane wrote: > Well, we could solve this problem very easily by ripping out everything > having to do with wait-state monitoring ... and personally I'd be a lot > in favor of that, because I haven't seen anything about either the > design or documentation of the feature that I thought was very well > done. Well, I'm going to disagree with that, but opinions can vary. If I'd tried to create naming consistency here when I created this stuff, I would've had to rename things in existing systems rather than just expose what was already there, and that wasn't the goal of the patch, and I don't see a very good reason why it should have been. Providing information is a separate project from cleaning up naming. And, while I don't love the fact that people have added new things without trying very hard to be consistent with existing things all that much, I still don't think inconsistent naming rises to the level of a disaster. > However, if you'd like to have wait-state monitoring, and you'd > like the documentation for it to be more useful than "go read the code", > then I don't see any way around the conclusion that there are going to > be centralized lists of the possible wait states. > > That being the case, refusing to use a centralized list in the > implementation seems rather pointless; and having some aspects of the > implementation use centralized lists (see the enums in lwlock.h and > elsewhere) while other aspects don't is just schizophrenic. There's something to that argument, especially it enable us to auto-generate the documentation tables. That being said, my view of this system is that it's good to document the wait events that we have, but also that there are almost certainly going to be cases where we can't say a whole lot more than "go read the code," or at least not without an awful lot of work. I think there's a reasonable chance that someone who sees a lot of ClientRead or DataFileWrite wait events will have some idea what kind of problem is indicated, even without consulting the documentation and even moreso if we have some good documentation which they can consult. But I don't know what anybody's going to do if they see a lot of OldSerXidLock or AddinShmemInitLock contention. Presumably those are cases that the developers thought were unlikely, or they'd have chosen a different locking regimen. If they were wrong, I think it's a good thing for users to have a relatively easy way to find that out, but I'm not sure what anybody's going to do be able to do about it without patching the code, or at least looking at it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SLRU statistics
Robert Haas writes: > I'm confused by why SLRU statistics are reported by messages sent to > the stats collector rather than by just directly updating shared > memory. It would be better to consider that as an aspect of the WIP stats collector redesign, rather than inventing a bespoke mechanism for SLRU stats that's outside the stats collector (and, no doubt, would have its own set of bugs). We don't need to invent even more pathways for this sort of data. regards, tom lane
Re: new heapcheck contrib module
Alvaro Herrera writes: > On 2020-May-14, Robert Haas wrote: >> If you mean that we shouldn't have the buildfarm run the proposed heap >> corruption checker against heap pages full of randomly-generated >> garbage, I tend to agree. Such a test wouldn't be very stable and >> might fail in lots of low-probability ways that could require >> unreasonable effort to find and fix. > This is what I meant. I was thinking of blocks generated randomly. Yeah, -1 for using random data --- when it fails, how you gonna reproduce the problem? >> If you mean that we shouldn't have the buildfarm run the proposed heap >> corruption checker against any corrupted heap pages at all, I tend to >> disagree. > Yeah, IMV those would not be arbitrarily corrupted -- instead they're > crafted to be corrupted in some specific way. I think there's definitely value in corrupting data in some predictable (reproducible) way and verifying that the check code catches it and responds as expected. Sure, this will not be 100% coverage, but it'll be a lot better than 0% coverage. regards, tom lane
Re: Our naming of wait events is a disaster.
Robert Haas writes: > That being said, my view of this system is that it's good to document > the wait events that we have, but also that there are almost certainly > going to be cases where we can't say a whole lot more than "go read > the code," or at least not without an awful lot of work. Can't disagree with that. > I think > there's a reasonable chance that someone who sees a lot of ClientRead > or DataFileWrite wait events will have some idea what kind of problem > is indicated, even without consulting the documentation and even > moreso if we have some good documentation which they can consult. But > I don't know what anybody's going to do if they see a lot of > OldSerXidLock or AddinShmemInitLock contention. I submit that at least part of the problem is precisely one of crappy naming. I didn't know what OldSerXidLock did either, until yesterday when I dug into the code to find out. If it's named something like "SerialSLRULock", then at least somebody who has heard of SLRUs will have an idea of what is indicated. And we are exposing the notion of SLRUs pretty prominently in the monitoring docs as of v13, so that's not an unreasonable presumption. regards, tom lane
Re: new heapcheck contrib module
On 2020-05-11 19:21, Mark Dilger wrote: 1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database. Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index. The options for selecting/excluding tables and schemas is patterned on pg_dump, on the assumption that interface is already familiar to users. Why is this useful over just using the extension's functions via psql? I suppose you could make an argument for a command-line wrapper around almost every admin-focused contrib module (pageinspect, pg_prewarm, pgstattuple, ...), but that doesn't seem very sensible. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Our naming of wait events is a disaster.
On Thu, May 14, 2020 at 3:58 PM Tom Lane wrote: > I submit that at least part of the problem is precisely one of crappy > naming. I didn't know what OldSerXidLock did either, until yesterday > when I dug into the code to find out. If it's named something like > "SerialSLRULock", then at least somebody who has heard of SLRUs will > have an idea of what is indicated. And we are exposing the notion > of SLRUs pretty prominently in the monitoring docs as of v13, so that's > not an unreasonable presumption. To the extent that exposing some of this information causes us to think more carefully about the naming, I think that's all to the good. I don't expect such measures to solve all of our problems in this area, but the idea that we can choose names with no consideration to whether anybody can understand what they mean is wrong even when the audience is only developers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel copy
On Thu, May 14, 2020 at 2:18 AM Amit Kapila wrote: > To support that, we need to consider a few things. > a. Currently, we increment the command counter each time we take a key > share lock on a tuple during trigger execution. I am really not sure > if this is required during Copy command execution or we can just > increment it once for the copy. If we need to increment the command > counter just once for copy command then for the parallel copy we can > ensure that we do it just once at the end of the parallel copy but if > not then we might need some special handling. My sense is that it would be a lot more sensible to do it at the *beginning* of the parallel operation. Once we do it once, we shouldn't ever do it again; that's how it works now. Deferring it until later seems much more likely to break things. > b. Another point is that after inserting rows we record CTIDs of the > tuples in the event queue and then once all tuples are processed we > call FK trigger for each CTID. Now, with parallelism, the FK checks > will be processed once the worker processed one chunk. I don't see > any problem with it but still, this will be a bit different from what > we do in serial case. Do you see any problem with this? I think there could be some problems here. For instance, suppose that there are two entries for different workers for the same CTID. If the leader were trying to do all the work, they'd be handled consecutively. If they were from completely unrelated processes, locking would serialize them. But group locking won't, so there you have an issue, I think. Also, it's not ideal from a work-distribution perspective: one worker could finish early and be unable to help the others. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Our naming of wait events is a disaster.
I wrote: > Digging through the existing callers of SimpleLruInit, we have > namecontrol locksubdir > "async" AsyncCtlLock"pg_notify" > "clog" CLogControlLock "pg_xact" > "commit_timestamp" CommitTsControlLock "pg_commit_ts" > "multixact_member" MultiXactMemberControlLock "pg_multixact/members" > "multixact_offset" MultiXactOffsetControlLock "pg_multixact/offsets" > "oldserxid" OldSerXidLock "pg_serial" > "subtrans" SubtransControlLock "pg_subtrans" > After studying that list for awhile, it seems to me that we could > do worse than to name the SLRUs to match their on-disk subdirectories, > which are names that are already user-visible. So I propose these > base names for the SLRUs: > Notify > Xact > CommitTs > MultiXactMember (or MultiXactMembers) > MultiXactOffset (or MultiXactOffsets) > Serial > Subtrans As POC for this, here's a draft patch to rename the "async" SLRU and associated locks. If people are good with this then I'll go through and do similarly for the other SLRUs. A case could be made for doing s/async/notify/ more widely in async.c; for instance it's odd that the struct protected by NotifyQueueLock didn't get renamed to NotifyQueueControl. But that seems a bit out of scope for the immediate problem, and anyway I'm not sure how far to take it. I don't really want to rename async.c's externally-visible functions, for instance. For the moment I just renamed symbols used in the SimpleLruInit() call. regards, tom lane diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 87502a4..27e7591 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1754,12 +1754,12 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting to manage space allocation in shared memory. - AsyncCtlLock - Waiting to read or update shared notification state. + NotifySLRULock + Waiting to read or update state of the Notify SLRU cache. - AsyncQueueLock - Waiting to read or update notification messages. + NotifyQueueLock + Waiting to read or update NOTIFY messages. AutoFileLock @@ -1941,8 +1941,8 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting to allocate or assign a transaction id. - async - Waiting for I/O on an async (notify) buffer. + NotifyBuffer + Waiting for I/O on a Notify SLRU buffer. buffer_content @@ -4484,7 +4484,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Resets statistics to zero for a single SLRU cache, or for all SLRUs in the cluster. If the argument is NULL, all counters shown in the pg_stat_slru view for all SLRU caches are -reset. The argument can be one of async, +reset. The argument can be one of Notify, clog, commit_timestamp, multixact_offset, multixact_member, oldserxid, or diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 4613bd1..a3ba88d 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -107,7 +107,7 @@ * frontend during startup.) The above design guarantees that notifies from * other backends will never be missed by ignoring self-notifies. * - * The amount of shared memory used for notify management (NUM_ASYNC_BUFFERS) + * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS) * can be varied without affecting anything but performance. The maximum * amount of notification data that can be queued at one time is determined * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below. 
@@ -225,7 +225,7 @@ typedef struct QueuePosition * * Resist the temptation to make this really large. While that would save * work in some places, it would add cost in others. In particular, this - * should likely be less than NUM_ASYNC_BUFFERS, to ensure that backends + * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends * catch up before the pages they'll need to read fall out of SLRU cache. */ #define QUEUE_CLEANUP_DELAY 4 @@ -244,7 +244,7 @@ typedef struct QueueBackendStatus /* * Shared memory state for LISTEN/NOTIFY (excluding its SLRU stuff) * - * The AsyncQueueControl structure is protected by the AsyncQueueLock. + * The AsyncQueueControl structure is protected by the NotifyQueueLock. * * When holding the lock in SHARED mode, backends may only inspect their own * entries as well as the head and tail pointers. Consequently we can allow a @@ -254,9 +254,9 @@ typedef struct QueueBackendStatus * When holding the lock in EXCLUSIVE mode, backends can inspect the e
Re: Our naming of wait events is a disaster.
On 2020-May-14, Tom Lane wrote: > A case could be made for doing s/async/notify/ more widely in async.c; > for instance it's odd that the struct protected by NotifyQueueLock > didn't get renamed to NotifyQueueControl. But that seems a bit out > of scope for the immediate problem, and anyway I'm not sure how far to > take it. I don't really want to rename async.c's externally-visible > functions, for instance. For the moment I just renamed symbols used > in the SimpleLruInit() call. That approach seems fine -- we'd only rename those things if and when we modified them for other reasons; and the file itself, probably not at all. Much like our renaming of XLOG to WAL, we changed the user-visible term all at once, but the code kept the original names until changed. Maybe in N years, when the SCM tooling is much better (so that it doesn't get confused by us having renamed the file in the newer branches and back-patching to an older branch), we can rename xlog.c to wal.c and async.c to notify.c. Or maybe not. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 13 release notes, first draft
On 2020-05-12 02:41, Justin Pryzby wrote:
> I'm not opposed to including it, but I think it's still true that the user doesn't need to know in advance that the error message will be additionally helpful in the event of corruption. If we were to include more "error" items, we might also include these:
> 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error reporting
This is actually a legitimate user-visible feature and should be listed somewhere. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 13 release notes, first draft
On Thu, May 14, 2020 at 2:02 PM Peter Eisentraut wrote: > On 2020-05-12 02:41, Justin Pryzby wrote: > > I'm not opposed to including it, but I think it's still true that the user > > doesn't need to know in advance that the error message will be additionally > > helpful in the event of corruption. If we were to include more "error" > > items, > > we might also include these: > > > > 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error > > reporting > > This is actually a legitimate user-visible feature and should be listed > somewhere. +1 -- it's very handy. Plus it has user-facing knobs. -- Peter Geoghegan
Re: PG 13 release notes, first draft
On Thu, May 14, 2020 at 11:01:51PM +0200, Peter Eisentraut wrote: > On 2020-05-12 02:41, Justin Pryzby wrote: > > I'm not opposed to including it, but I think it's still true that the user > > doesn't need to know in advance that the error message will be additionally > > helpful in the event of corruption. If we were to include more "error" > > items, > > we might also include these: > > > > 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error > > reporting > > This is actually a legitimate user-visible feature and should be listed > somewhere. On Thu, May 14, 2020 at 02:02:52PM -0700, Peter Geoghegan wrote: > +1 -- it's very handy. Plus it has user-facing knobs. That's already included: | Allow function call backtraces of errors to be logged (Peter Eisentraut, Álvaro Herrera) | Server variable backtrace_functions specifies which C functions should generate backtraces on error. I 1) failed to double check my list; and 2) intended for that to be interpreted as items which could be moved to a separate "error reporting" section of the release notes. -- Justin
Re: new heapcheck contrib module
> On May 14, 2020, at 1:02 PM, Peter Eisentraut > wrote: > > On 2020-05-11 19:21, Mark Dilger wrote: >> 1) A new module, pg_amcheck, which includes a command line client for >> checking a database or subset of a database. Internally it functions by >> querying the database for a list of tables which are appropriate given the >> command line switches, and then calls amcheck's functions to validate each >> table and/or index. The options for selecting/excluding tables and schemas >> is patterned on pg_dump, on the assumption that interface is already >> familiar to users. > > Why is this useful over just using the extension's functions via psql? The tool doesn't hold a single snapshot or transaction for the lifetime of checking the entire database. A future improvement to the tool might add parallelism. Users could do all of this in scripts, but having a single tool with the most commonly useful options avoids duplication of effort. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
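For comparison, driving amcheck's existing functions directly from psql (the alternative raised above) looks roughly like the sketch below for B-tree indexes; the query is illustrative only, adapted from the kind of example in the amcheck documentation, and is not part of the proposed pg_amcheck patch.

CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verify every ordinary btree index in the current database from one session;
-- pg_amcheck would automate this kind of loop, add table-level checks, and
-- could eventually parallelize the work across connections.
SELECT c.relname AS index_name,
       bt_index_check(index => c.oid)
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  JOIN pg_am a ON a.oid = c.relam
 WHERE a.amname = 'btree'
   AND c.relpersistence <> 't';  -- skip other sessions' temp indexes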
Re: SEQUENCE values (duplicated) in some corner cases when crash happens
On Wed, May 6, 2020 at 1:52 PM Jeremy Schneider wrote: > The behavior we're observing is that a nextval() call in a committed > transaction is not crash-safe. This was discovered because some > applications were using nextval() to get a guaranteed unique sequence > number [or so they thought], then the application did some processing > with the value and later stored it in a relation of the same database. > > The nextval() number was not used until the transaction was committed - I don't know what this line means. You said it was stored in a relation, surely that needs to have happened through some command which preceded the commit chronologically, though formally they may have happened atomically. > but then the fact of a value being generated, returned and committed was > lost on crash. The nextval() call used in isolation did not seem to > provide durability. > Are you clarifying the original complaint, or is this a new, different complaint? Vini's test cases don't include any insertions. Do you have test cases that can reproduce your complaint? Cheers, Jeff
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
> On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > htup->t_ctid = target_tid; > htup->t_ctid = newtid; > Both target_tid and newtid are local variable, whe loss scope, memory is > garbage. Ok, thanks for the concrete example of what is bothering you. In htup_details, I see that struct HeapTupleHeaderData has a field named t_ctid of type struct ItemPointerData. I also see in heapam that target_tid is of type ItemPointerData. The htup->t_ctid = target_tid copies the contents of target_tid. By the time target_tid goes out of scope, the contents are already copied. I would share your concern if t_ctid were of type ItemPointer (aka ItemPointerData *) and the code said htup->t_ctid = &target_tid but it doesn't say that, so I don't see the issue. Also in heapam, I see that newtid is likewise of type ItemPointerData, so the same logic applies. By the time newtid goes out of scope, its contents have already been copied into t_ctid, so there is no problem. But maybe you know all that and are just complaining that the name "ItemPointerData" sounds like a pointer rather than a struct? I'm still unclear whether you believe this is a bug, or whether you just don't like the naming that is used. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SEQUENCE values (duplicated) in some corner cases when crash happens
On 2020-May-14, Jeremy Schneider wrote: > "Later stored it in the table" - I'd have to double check with the other > team, but IIUC it was application pseudo-code like this: > > * execute SQL "select nextval()" and store result in > my_local_variable_unique_id > * commit Yes, simply inserting the sequence value in a (logged!) dummy table before this commit, as you suggest, should fix this problem. The insert ensures that the transaction commit is flushed to WAL. The table need not have indexes, making the insert faster. Just make sure to truncate the table every now and then. +1 to documenting this. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
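Concretely, the workaround amounts to something like the following sketch, where seq_durability is a hypothetical plain (logged) table and my_seq a hypothetical sequence; with synchronous_commit enabled, the INSERT forces the commit record, and therefore the consumed sequence value, to be flushed to WAL.

-- One-time setup: an ordinary logged table used only as a durability sink.
CREATE TABLE seq_durability (val bigint);

BEGIN;
SELECT nextval('my_seq');                -- suppose this returns 42
INSERT INTO seq_durability VALUES (42);  -- transaction now writes a logged table
COMMIT;                                  -- the commit, and thus the nextval, survives a crash

-- Every now and then, as suggested above:
TRUNCATE seq_durability;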
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
> On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > Certainly. > In the same file you can find the appropriate use of the API. > ItemPointerSet(&heapTuple->t_self, blkno, offnum); It took a couple reads through your patch to figure out what you were trying to accomplish, and I think you are uncomfortable with assigning one ItemPointerData variable from another. ItemPointerData is just a struct with three int16 variables. To make a standalone program that has the same structure without depending on any postgres headers, I'm using "short int" instead of "int16" and structs "TwoData" and "ThreeData" that are analogous to BlockIdData and OffsetNumber. #include <stdio.h> typedef struct TwoData { short int a; short int b; } TwoData; typedef struct ThreeData { TwoData left; short int right; } ThreeData; int main(int argc, char **argv) { ThreeData x = { { 5, 10 }, 15 }; ThreeData y = x; x.left.a = 0; x.left.b = 1; x.right = 2; printf("y = { { %d, %d }, %d }\n", y.left.a, y.left.b, y.right); return 0; } If you compile and run this, you'll notice it outputs: y = { { 5, 10 }, 15 } and not the { { 0, 1}, 2 } that you would expect if y were merely pointing at x. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Em qui., 14 de mai. de 2020 às 19:23, Mark Dilger < mark.dil...@enterprisedb.com> escreveu: > > > > On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > > > htup->t_ctid = target_tid; > > htup->t_ctid = newtid; > > Both target_tid and newtid are local variable, whe loss scope, memory is > garbage. > > Ok, thanks for the concrete example of what is bothering you. > > In htup_details, I see that struct HeapTupleHeaderData has a field named > t_ctid of type struct ItemPointerData. I also see in heapam that > target_tid is of type ItemPointerData. The > > htup->t_ctid = target_tid > > copies the contents of target_tid. By the time target_tid goes out of > scope, the contents are already copied. I would share your concern if > t_ctid were of type ItemPointer (aka ItemPointerData *) and the code said > > htup->t_ctid = &target_tid > > but it doesn't say that, so I don't see the issue. > Even if the patch simplifies and uses the API to make the assignments. Really, the central problem does not exist, my fault. Perhaps because it has never made such use, structure assignment. And I failed to do research on the subject before. Sorry. > > Also in heapam, I see that newtid is likewise of type ItemPointerData, so > the same logic applies. By the time newtid goes out of scope, its contents > have already been copied into t_ctid, so there is no problem. > > But maybe you know all that and are just complaining that the name > "ItemPointerData" sounds like a pointer rather than a struct? I'm still > unclear whether you believe this is a bug, or whether you just don't like > the naming that is used. > My concerns were about whether attribution really would copy the structure's content and not just its address. The name makes it difficult, but that was not the point. The tool warned about uninitialized variable, which I mistakenly deduced for loss of scope. Thank you very much for the clarifications and your time. We never stopped learning, and using structure assignment was a new learning experience. regards Ranier Vilela
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Em qui., 14 de mai. de 2020 às 19:49, Mark Dilger < mark.dil...@enterprisedb.com> escreveu: > > > > On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > > > Certainly. > > In the same file you can find the appropriate use of the API. > > ItemPointerSet(&heapTuple->t_self, blkno, offnum); > > It took a couple reads through your patch to figure out what you were > trying to accomplish, and I think you are uncomfortable with assigning one > ItemPointerData variable from another.ItemPointerData is just a struct > with three int16 variables. To make a standalone program that has the same > structure without depending on any postgres headers, I'm using "short int" > instead of "int16" and structs "TwoData" and "ThreeData" that are analogous > to BlockIdData and OffsetNumber. > > #include > > typedef struct TwoData { > short int a; > short int b; > } TwoData; > > typedef struct ThreeData { > TwoData left; > short int right; > } ThreeData; > > int main(int argc, char **argv) > { > ThreeData x = { { 5, 10 }, 15 }; > ThreeData y = x; > x.left.a = 0; > x.left.b = 1; > x.right = 2; > > printf("y = { { %d, %d }, %d }\n", > y.left.a, y.left.b, y.right); > > return 0; > } > > If you compile and run this, you'll notice it outputs: > > y = { { 5, 10 }, 15 } > > and not the { { 0, 1}, 2 } that you would expect if y were merely pointing > at x. > Thanks for the example. But what I wanted to test was struct1 = struct2; Both being of the same type of structure. What I wrongly deduced was that the address of struct2 was saved and not its content. Again, thanks for your time and clarification. regards, Ranier Vilela
Re: Event trigger code comment duplication
On Wed, May 13, 2020 at 04:48:59PM +0900, Michael Paquier wrote: > Still not sure that's worth bothering. So, let's wait a couple of > days first to see if anybody has any comments, though I'd like to just > go with the simplest solution at hand and remove only the duplicated > comment about the standalone business with event triggers. I have gone through this one again this morning, and applied the simple version as merging the checks on the current event trigger state would make us lose some context about why each event gets skipped. -- Michael signature.asc Description: PGP signature
Re: Event trigger code comment duplication
On Thu, May 14, 2020 at 4:25 PM Michael Paquier wrote: > On Wed, May 13, 2020 at 04:48:59PM +0900, Michael Paquier wrote: > > Still not sure that's worth bothering. So, let's wait a couple of > > days first to see if anybody has any comments, though I'd like to just > > go with the simplest solution at hand and remove only the duplicated > > comment about the standalone business with event triggers. > > I have gone through this one again this morning, and applied the > simple version as merging the checks on the current event trigger > state would make us lose some context about why each event gets > skipped. > > Ok. Thanks! David J.
Re: POC: GROUP BY optimization
Hi, I wonder if anyone has plans to try again with this optimization in v14 cycle? The patches no longer apply thanks to the incremental sort patch, but I suppose fixing that should not be extremely hard. The 2020-07 CF is still a couple weeks away, but it'd be good to know if there are any plans to revive this. I'm willing to spend some time on reviewing / testing this, etc. I've only quickly skimmed the old thread, but IIRC there were two main challenges in getting the optimization right: 1) deciding which orderings are interesting / worth additional work I think we need to consider these orderings, in addition to the one specified in GROUP BY: 1) as specified in ORDER BY (if different from 1) 2) one with cheapest sort on unsorted input (depending on number of distinct values, cost of comparisons, etc.) 3) one with cheapest sort on partially sorted input (essentially what we do with the incremental sort paths, but matching the pathkeys in a more elaborate way) 2) costing the alternative orderings I think we've already discussed various ways to leverage as much available info as possible (extended stats, MCVs, ...) but I think the patch only does some of it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
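As a concrete illustration of the kind of query this optimization targets (the table and columns below are made up): grouping is order-insensitive, so the planner could reorder the GROUP BY keys to let one sort serve both the aggregation and the ORDER BY, or to match an existing (partial) ordering of the input.

-- Written with GROUP BY product, region, but sorting on (region, product)
-- would satisfy the grouping just as well and also satisfy the ORDER BY,
-- avoiding a second sort; an input already sorted on region could instead
-- feed an incremental sort.
SELECT region, product, sum(amount)
  FROM sales
 GROUP BY product, region
 ORDER BY region, product;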
Re: Add A Glossary
Thanks everybody. I have compiled together all the suggestions and the result is in the attached patch. Some of it is of my own devising. * I changed "instance", and made "cluster" be mostly a synonym of that. * I removed "global SQL object" and made "SQL object" explain it. * Added definitions for ACID, sequence, bloat, fork, FSM, VM, data page, transaction ID, epoch. * Changed "a SQL" to "an sql" everywhere. * Sorted alphabetically. * Removed caps in term names. I think I should get this pushed, and if there are further suggestions, they're welcome. Dim Fontaine and others suggested a number of terms that could be included; see https://twitter.com/alvherre/status/1246192786287865856 -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml index f638665dc9..b05c065546 100644 --- a/doc/src/sgml/acronyms.sgml +++ b/doc/src/sgml/acronyms.sgml @@ -766,7 +766,7 @@ XID - Transaction Identifier + Transaction identifier diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index 8c6cb6e942..d4255215aa 100644 --- a/doc/src/sgml/glossary.sgml +++ b/doc/src/sgml/glossary.sgml @@ -7,8 +7,22 @@ + + ACID + + + Atomicity, + consistency, + isolation, and + durability. + A set of properties of database transactions intended to guarantee validity + in concurrent operation and even in event of errors, power failures, etc. + + + + - Aggregate Function + Aggregate function A function that @@ -35,11 +49,15 @@ to make decisions about how to execute queries. + + (Don't confuse this term with the ANALYZE option + to the command.) + - Analytic Function + Analytic function @@ -106,8 +124,8 @@ Process of an instance - which act on behalf of client sessions - and handle their requests. + which acts on behalf of a client session + and handles its requests. (Don't confuse this term with the similar terms @@ -118,7 +136,7 @@ - Background Worker (process) + Background worker (process) Process within an instance, @@ -138,10 +156,11 @@ - Background Writer (process) + Background writer (process) - A process that continuously writes dirty pages from + A process that continuously writes dirty + data pages from shared memory to the file system. It wakes up periodically, but works only for a short period in order to distribute its expensive I/O @@ -155,6 +174,16 @@ + + Bloat + + + Space in data pages which does not contain relevant data, + such as unused (free) space or outdated row versions. + + + + Cast @@ -190,7 +219,7 @@ - Check Constraint + Check constraint A type of constraint @@ -208,15 +237,6 @@ - - Checkpointer (process) - - - A specialized process responsible for executing checkpoints. - - - - Checkpoint @@ -244,6 +264,15 @@ + + Checkpointer (process) + + + A specialized process responsible for executing checkpoints. + + + + Class (archaic) @@ -262,27 +291,6 @@ - - Cluster - - - A group of databases plus their - global SQL objects. The - cluster is managed by exactly one - instance. A newly created - Cluster will have three databases created automatically. They are - template0, template1, and - postgres. It is expected that an application will - create one or more additional database aside from these three. - - - (Don't confuse the PostgreSQL-specific term - Cluster with the SQL - command CLUSTER). - - - - Column @@ -363,7 +371,10 @@ A restriction on the values of data allowed within a - Table. + table, + or in attributes of a + + domain. 
For more information, see @@ -373,18 +384,18 @@ - Data Area + Data area - Data Directory + Data directory The base directory on the filesystem of a server that contains all data files and subdirectories associated with a - cluster with the + instance with the exception of tablespaces. The environment variable PGDATA is commonly used to refer to the @@ -416,15 +427,31 @@ - Database Server + Database server + + Data page + + + The basic structure used to store relation data. + All pages are of the same size. + Data pages are typically stored on disk, each in a specific file, + and can be read to shared buffers + where they can be modified, be
Re: MultiXact\SLRU buffers configuration
At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" wrote in > > GetMultiXactIdMembers believes that 4 is successfully done if 2 > > returned valid offset, but actually that is not obvious. > > > > If we add a single giant lock just to isolate ,say, > > GetMultiXactIdMember and RecordNewMultiXact, it reduces concurrency > > unnecessarily. Perhaps we need finer-grained locking-key for standby > > that works similary to buffer lock on primary, that doesn't cause > > confilicts between irrelevant mxids. > > > We can just replay members before offsets. If offset is already there - > members are there too. > But I'd be happy if we could mitigate those 1000us too - with a hint about > last maixd state in a shared MX state, for example. Generally in such cases, condition variables would work. In the attached PoC, the reader side gets no penalty in the "likely" code path. The writer side always calls ConditionVariableBroadcast but the waiter list is empty in almost all cases. But I couldn't cause the situation where the sleep 1000u is reached. > Actually, if we read empty mxid array instead of something that is replayed > just yet - it's not a problem of inconsistency, because transaction in this > mxid could not commit before we started. ISTM. > So instead of fix, we, probably, can just add a comment. If this reasoning is > correct. The step 4 of the reader side reads the members of the target mxid. It is already written if the offset of the *next* mxid is filled-in. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index e2aa5c9ce4..9db8f6cddd 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -82,6 +82,7 @@ #include "lib/ilist.h" #include "miscadmin.h" #include "pg_trace.h" +#include "pgstat.h" #include "postmaster/autovacuum.h" #include "storage/lmgr.h" #include "storage/pmsignal.h" @@ -233,6 +234,7 @@ typedef struct MultiXactStateData /* support for members anti-wraparound measures */ MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */ + ConditionVariable nextoff_cv; /* * Per-backend data starts here. We have two arrays stored in the area * immediately following the MultiXactStateData struct. Each is indexed by @@ -873,6 +875,14 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset, /* Exchange our lock */ LWLockRelease(MultiXactOffsetControlLock); + /* + * Let everybody know the offset of this mxid is recorded now. The waiters + * are waiting for the offset of the mxid next of the target to know the + * number of members of the target mxid, so we don't need to wait for + * members of this mxid are recorded. + */ + ConditionVariableBroadcast(&MultiXactState->nextoff_cv); + LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE); prev_pageno = -1; @@ -1367,9 +1377,19 @@ retry: if (nextMXOffset == 0) { /* Corner case 2: next multixact is still being filled in */ + + /* + * The recorder of the next mxid is just before writing the offset. + * Wait for the offset to be written. 
+ */ + ConditionVariablePrepareToSleep(&MultiXactState->nextoff_cv); + LWLockRelease(MultiXactOffsetControlLock); CHECK_FOR_INTERRUPTS(); - pg_usleep(1000L); + + ConditionVariableSleep(&MultiXactState->nextoff_cv, + WAIT_EVENT_WAIT_NEXT_MXMEMBERS); + ConditionVariableCancelSleep(); goto retry; } @@ -1847,6 +1867,7 @@ MultiXactShmemInit(void) /* Make sure we zero out the per-backend state */ MemSet(MultiXactState, 0, SHARED_MULTIXACT_STATE_SIZE); + ConditionVariableInit(&MultiXactState->nextoff_cv); } else Assert(found); diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 0ecd29a1d9..1ac6b37188 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3879,6 +3879,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_SYNC_REP: event_name = "SyncRep"; break; + case WAIT_EVENT_WAIT_NEXT_MXMEMBERS: + event_name = "Mact/WaitNextXactMembers"; + break; /* no default case, so that compiler will warn */ } diff --git a/src/include/pgstat.h b/src/include/pgstat.h index ae9a39573c..e79bba0bef 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -886,7 +886,8 @@ typedef enum WAIT_EVENT_REPLICATION_ORIGIN_DROP, WAIT_EVENT_REPLICATION_SLOT_DROP, WAIT_EVENT_SAFE_SNAPSHOT, - WAIT_EVENT_SYNC_REP + WAIT_EVENT_SYNC_REP, + WAIT_EVENT_WAIT_NEXT_MXMEMBERS } WaitEventIPC; /* --
Re: Add "-Wimplicit-fallthrough" to default flags
On Wed, May 13, 2020 at 10:02 PM Tom Lane wrote: > Andy Fan writes: > >> FWIW, I got a warning for jsonpath_gram.c. > > Ugh. Confirmed here on Fedora 30 (bison 3.0.5). > > > I just found this just several minutes ago. Upgrading your bison to the > > latest version (3.6) is ok. I'd like us to have a better way to share this > > knowledge, though. I spent ~30 minutes troubleshooting this issue. > > I fear that is going to mean that we revert this patch. > We are *NOT* moving the minimum bison requirement for this, > especially not to a bleeding-edge bison version. Yes, I didn't mean to revert the patch, but I was thinking of moving the minimum bison version. But since going down to warning level 3 also resolved the issue, that looks like a better way to do it. (On the other hand, if you have an old bison, > you likely also have an old gcc that doesn't know this warning > switch, so maybe it'd be all right in practice?) > > I just use an old bison and a newer gcc :( and I used "echo "COPT=-Wall -Werror" > src/Makefile.custom" which is the same as our cfbot system. Thank you all for the quick fix! Best Regards Andy Fan
Re: Add explanations which are influenced by track_io_timing
Thanks for reviewing! On Wed, May 13, 2020 at 11:27 PM Fujii Masao wrote: > What about the attached patch based on yours? It looks better. Regards, -- Atsushi Torikoshi
Re: Add A Glossary
On Thu, May 14, 2020 at 08:00:17PM -0400, Alvaro Herrera wrote: > + ACID > + > + > + Atomicity, > + consistency, > + isolation, and > + durability. > + A set of properties of database transactions intended to guarantee > validity > + in concurrent operation and even in event of errors, power failures, > etc. I would capitalize Consistency, Isolation, Durability, and say "These four properties" or "This set of four properties" (althought that makes this sounds more like a fun game of DBA jeopardy). > + Background writer (process) > > > - A process that continuously writes dirty pages from > + A process that continuously writes dirty I don't like "continuously" > + data pages from > > + > + Bloat > + > + > + Space in data pages which does not contain relevant data, > + such as unused (free) space or outdated row versions. "current row versions" instead of relevant ? > + > + Data page > + > + > + The basic structure used to store relation data. > + All pages are of the same size. > + Data pages are typically stored on disk, each in a specific file, > + and can be read to shared > buffers > + where they can be modified, becoming > + dirty. They get clean by being written down say "They become clean when written to disk" > + to disk. New pages, which initially exist in memory only, are also > + dirty until written. > + > + Fork > + > + > + Each of the separate segmented file sets that a relation stores its > + data in. There exist a main fork and two > secondary "in which a relation's data is stored" > + forks: the free space map > + visibility map. missing "and" ? > + > + Free space map (fork) > + > + > + A storage structure that keeps metadata about each data page in a > table's > + main storage space. s/in/of/ just say "main fork"? > The free space map entry for each space stores the for each page ? > + amount of free space that's available for future tuples, and is > structured > + so it is efficient to search for available space for a new tuple of a > given > + size. ..to be efficiently searched to find free space.. > The heap is realized within > - segment files. > + segmented files > + in the relation's main > fork. Hm, the files aren't segmented. Say "one or more file segments per relation" > + There also exist local objects that do not belong to schemas; some > examples are > + extensions, > + data type casts, and > + foreign data > wrappers. Don't extensions have schemas ? > + > + Transaction ID > + > + > + The numerical, unique, sequentially-assigned identifier that each > + transaction receives when it first causes a database modification. > + Frequently abbreviated xid. abbreviated *as* xid > + approximately four billion write transactions IDs can be generated; > + to permit the system to run for longer than that would allow, remove "would allow" > > The process of removing outdated linkend="glossary-tuple">tuple > versions from tables, and other closely related actually tables or materialized views.. > + > + Visibility map (fork) > + > + > + A storage structure that keeps metadata about each data page > + in a table's main storage space. The visibility map entry for s/in/of/ main fork? -- Justin
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, 15 May 2020 at 03:08, Muhammad Usama wrote: > > > Hi Sawada, > > I have just done some review and testing of the patches and have > a couple of comments. Thank you for reviewing! > > 1- IMHO the PREPARE TRANSACTION should always use 2PC even > when the transaction has operated on a single foreign server regardless > of foreign_twophase_commit setting, and throw an error otherwise when > 2PC is not available on any of the data-modified servers. > > For example, consider the case > > BEGIN; > INSERT INTO ft_2pc_1 VALUES(1); > PREPARE TRANSACTION 'global_x1'; > > Here since we are preparing the local transaction so we should also prepare > the transaction on the foreign server even if the transaction has modified > only > one foreign table. > > What do you think? Good catch and I agree with you. The transaction should fail if it opened a transaction on a 2pc-no-support server regardless of foreign_twophase_commit. And I think we should prepare a transaction on a foreign server even if it didn't modify any data on that. > > Also without this change, the above test case produces an assertion failure > with your patches. > > 2- when deciding if the two-phase commit is required or not in > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use > 2PC when we have at least one server capable of doing that. > > i.e > > For FOREIGN_TWOPHASE_COMMIT_PREFER case in > checkForeignTwophaseCommitRequired() function I think > the condition should be > > need_twophase_commit = (nserverstwophase >= 1); > instead of > need_twophase_commit = (nserverstwophase >= 2); > Hmm I might be missing your point but it seems to me that you want to use two-phase commit even in the case where a transaction modified data on only one server. Can't we commit distributed transaction atomically even using one-phase commit in that case? Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_regress cleans up tablespace twice.
On Mon, May 11, 2020 at 05:13:54PM +0900, Kyotaro Horiguchi wrote: > Tablespace directory cleanup is not done for all testing > targets. Actually it is not done for the tools under bin/ except > pg_upgrade. Let's first take one problem at a time, as I can see that your patch 0002 is modifying a portion of what you added in 0001, and so let's try to remove this WIN32 stuff from pg_regress.c. +sub CleanupTablespaceDirectory +{ + my $tablespace = 'testtablespace'; + + rmtree($tablespace) if (-e $tablespace); + mkdir($tablespace); +} This check should use "-d" and not "-e" as it would be true for a file as well. Also, in pg_regress.c, we remove the existing tablespace test directory in --outputdir, which is "." by default but it can be a custom one. Shouldn't you do the same logic in this new routine? So we should have an optional argument for the output directory that defaults to `pwd` if not defined, no? This means passing down the argument only for upgradecheck() in vcregress.pl. sub isolationcheck { chdir "../isolation"; + CleanupTablespaceDirectory(); copy("../../../$Config/isolationtester/isolationtester.exe", "../../../$Config/pg_isolation_regress"); my @args = ( [...] print "\n"; print "Checking $module\n"; + CleanupTablespaceDirectory(); my @args = ( "$topdir/$Config/pg_regress/pg_regress", "--bindir=${topdir}/${Config}/psql", I would put that just before the system() calls for consistency with the rest. -- Michael signature.asc Description: PGP signature
Re: pg_regress cleans up tablespace twice.
On Fri, May 15, 2020 at 11:58:55AM +0900, Michael Paquier wrote: > Let's first take one problem at a time, as I can see that your patch > 0002 is modifying a portion of what you added in 0001, and so let's > try to remove this WIN32 stuff from pg_regress.c. (Please note that this is not v13 material.) -- Michael signature.asc Description: PGP signature
Fix a typo in slot.c
Hi, I've attached the patch for $subject. The old comment seems to be borrowed from WalSndShmemInit(). Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services fix_typo.patch Description: Binary data
Re: Parallel copy
On Fri, May 15, 2020 at 1:51 AM Robert Haas wrote: > > On Thu, May 14, 2020 at 2:18 AM Amit Kapila wrote: > > To support that, we need to consider a few things. > > a. Currently, we increment the command counter each time we take a key > > share lock on a tuple during trigger execution. I am really not sure > > if this is required during Copy command execution or we can just > > increment it once for the copy. If we need to increment the command > > counter just once for copy command then for the parallel copy we can > > ensure that we do it just once at the end of the parallel copy but if > > not then we might need some special handling. > > My sense is that it would be a lot more sensible to do it at the > *beginning* of the parallel operation. Once we do it once, we > shouldn't ever do it again; that's how it works now. Deferring it > until later seems much more likely to break things. > AFAIU, we always increment the command counter after executing the command. Why do we want to do it differently here? > > b. Another point is that after inserting rows we record CTIDs of the > > tuples in the event queue and then once all tuples are processed we > > call FK trigger for each CTID. Now, with parallelism, the FK checks > > will be processed once the worker processed one chunk. I don't see > > any problem with it but still, this will be a bit different from what > > we do in serial case. Do you see any problem with this? > > I think there could be some problems here. For instance, suppose that > there are two entries for different workers for the same CTID. > First, let me clarify the CTID I have used in my email are for the table in which insertion is happening which means FK table. So, in such a case, we can't have the same CTIDs queued for different workers. Basically, we use CTID to fetch the row from FK table later and form a query to lock (in KEY SHARE mode) the corresponding tuple in PK table. Now, it is possible that two different workers try to lock the same row of PK table. I am not clear what problem group locking can have in this case because these are non-conflicting locks. Can you please elaborate a bit more? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
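For reference, the row lock mentioned above is taken by a statement that the RI trigger machinery builds internally for each queued FK event; it has roughly the shape below (table and column names are illustrative, and $1 is the parameter filled in from the queued tuple). These KEY SHARE locks on PK rows are the non-conflicting locks being discussed.

-- Approximate form of the internal query issued by ri_triggers.c for an
-- INSERT/UPDATE on the referencing (FK) table: it fetches nothing, it only
-- locks the referenced PK row so its key cannot change concurrently.
SELECT 1 FROM ONLY public.pk_table x
  WHERE x.id = $1
  FOR KEY SHARE OF x;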
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > On Fri, 15 May 2020 at 03:08, Muhammad Usama wrote: > > > > > > Hi Sawada, > > > > I have just done some review and testing of the patches and have > > a couple of comments. > > Thank you for reviewing! > > > > > 1- IMHO the PREPARE TRANSACTION should always use 2PC even > > when the transaction has operated on a single foreign server regardless > > of foreign_twophase_commit setting, and throw an error otherwise when > > 2PC is not available on any of the data-modified servers. > > > > For example, consider the case > > > > BEGIN; > > INSERT INTO ft_2pc_1 VALUES(1); > > PREPARE TRANSACTION 'global_x1'; > > > > Here since we are preparing the local transaction so we should also > prepare > > the transaction on the foreign server even if the transaction has > modified only > > one foreign table. > > > > What do you think? > > Good catch and I agree with you. The transaction should fail if it > opened a transaction on a 2pc-no-support server regardless of > foreign_twophase_commit. And I think we should prepare a transaction > on a foreign server even if it didn't modify any data on that. > > > > > Also without this change, the above test case produces an assertion > failure > > with your patches. > > > > 2- when deciding if the two-phase commit is required or not in > > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use > > 2PC when we have at least one server capable of doing that. > > > > i.e > > > > For FOREIGN_TWOPHASE_COMMIT_PREFER case in > > checkForeignTwophaseCommitRequired() function I think > > the condition should be > > > > need_twophase_commit = (nserverstwophase >= 1); > > instead of > > need_twophase_commit = (nserverstwophase >= 2); > > > > Hmm I might be missing your point but it seems to me that you want to > use two-phase commit even in the case where a transaction modified > data on only one server. Can't we commit distributed transaction > atomically even using one-phase commit in that case? > > I think you are confusing between nserverstwophase and nserverswritten. need_twophase_commit = (nserverstwophase >= 1) would mean use two-phase commit if at least one server exists in the list that is capable of doing 2PC For the case when the transaction modified data on only one server we already exits the function indicating no two-phase required if (nserverswritten <= 1) return false; > Regards, > > -- > Masahiko Sawadahttp://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > Regards, ... Muhammad Usama Highgo Software (Canada/China/Pakistan) URL : http://www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC
Re: Fix a typo in slot.c
On Fri, May 15, 2020 at 9:16 AM Masahiko Sawada wrote: > > Hi, > > I've attached the patch for $subject. The old comment seems to be > borrowed from WalSndShmemInit(). > /* - * Allocate and initialize walsender-related shared memory. + * Allocate and initialize replication slots' shared memory. */ How about changing it to "Allocate and initialize shared memory for replication slots"? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Fix a typo in slot.c
On Fri, 15 May 2020 at 13:26, Amit Kapila wrote: > > On Fri, May 15, 2020 at 9:16 AM Masahiko Sawada > wrote: > > > > Hi, > > > > I've attached the patch for $subject. The old comment seems to be > > borrowed from WalSndShmemInit(). > > > > /* > - * Allocate and initialize walsender-related shared memory. > + * Allocate and initialize replication slots' shared memory. > */ > > How about changing it to "Allocate and initialize shared memory for > replication slots"? > Agreed. Attached the updated patch. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services fix_typo_v2.patch Description: Binary data
Re: COPY, lock release and MVCC
On Thu, May 14, 2020 at 7:10 PM Laurenz Albe wrote: > > On Thu, 2020-05-14 at 15:11 +0530, Amit Kapila wrote: > > LGTM. I have slightly modified the commit message, see if that looks > > fine to you. > > Fine with me, thanks. > > > This breaks the cases where a REPEATABLE READ transaction could see an > > empty table if it repeats a COPY statement and somebody truncated the > > table in the meantime. > > I would use "case" rather than "cases" here. > Okay, changed, and pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, 15 May 2020 at 13:26, Muhammad Usama wrote: > > > > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada > wrote: >> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama wrote: >> > >> > >> > Hi Sawada, >> > >> > I have just done some review and testing of the patches and have >> > a couple of comments. >> >> Thank you for reviewing! >> >> > >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even >> > when the transaction has operated on a single foreign server regardless >> > of foreign_twophase_commit setting, and throw an error otherwise when >> > 2PC is not available on any of the data-modified servers. >> > >> > For example, consider the case >> > >> > BEGIN; >> > INSERT INTO ft_2pc_1 VALUES(1); >> > PREPARE TRANSACTION 'global_x1'; >> > >> > Here since we are preparing the local transaction so we should also prepare >> > the transaction on the foreign server even if the transaction has modified >> > only >> > one foreign table. >> > >> > What do you think? >> >> Good catch and I agree with you. The transaction should fail if it >> opened a transaction on a 2pc-no-support server regardless of >> foreign_twophase_commit. And I think we should prepare a transaction >> on a foreign server even if it didn't modify any data on that. >> >> > >> > Also without this change, the above test case produces an assertion failure >> > with your patches. >> > >> > 2- when deciding if the two-phase commit is required or not in >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use >> > 2PC when we have at least one server capable of doing that. >> > >> > i.e >> > >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in >> > checkForeignTwophaseCommitRequired() function I think >> > the condition should be >> > >> > need_twophase_commit = (nserverstwophase >= 1); >> > instead of >> > need_twophase_commit = (nserverstwophase >= 2); >> > >> >> Hmm I might be missing your point but it seems to me that you want to >> use two-phase commit even in the case where a transaction modified >> data on only one server. Can't we commit distributed transaction >> atomically even using one-phase commit in that case? >> > > I think you are confusing between nserverstwophase and nserverswritten. > > need_twophase_commit = (nserverstwophase >= 1) would mean > use two-phase commit if at least one server exists in the list that is > capable of doing 2PC > > For the case when the transaction modified data on only one server we > already exits the function indicating no two-phase required > > if (nserverswritten <= 1) > return false; > Thank you for your explanation. If the transaction modified two servers that don't' support 2pc and one server that supports 2pc I think we don't want to use 2pc even in 'prefer' case. Because even if we use 2pc in that case, it's still possible to have the atomic commit problem. For example, if we failed to commit a transaction after committing other transactions on the server that doesn't support 2pc we cannot rollback the already-committed transaction. On the other hand, in 'prefer' case, if the transaction also modified the local data, we need to use 2pc even if it modified data on only one foreign server that supports 2pc. But the current code doesn't work fine in that case for now. Probably we also need the following change: @@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void) /* Did we modify the local non-temporary data? 
*/ if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0) + { nserverswritten++; + nserverstwophase++; + } Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: COPY, lock release and MVCC
On Fri, 2020-05-15 at 10:11 +0530, Amit Kapila wrote: > Okay, changed, and pushed. Thank you! Yours, Laurenz Albe
Re: Problem with logical replication
On Tue, May 12, 2020 at 09:45:45PM -0300, Euler Taveira wrote: >> In any case, it seems to me that the comment of >> build_replindex_scan_key needs to be updated. >> >> * This is not generic routine, it expects the idxrel to be replication >> * identity of a rel and meet all limitations associated with that. > > It is implicit that a primary key can be a replica identity so I think this > comment is fine. Agreed. I don't think either that we need to update this comment. I was playing with this patch and what you have here looks fine by me. Two nits: the extra parenthesis in the assert are not necessary, and the indentation had some diffs. Tom has just reindented the whole tree, so let's keep things clean. -- Michael diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index 1418746eb8..8f474faed0 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -57,7 +57,8 @@ build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel, int2vector *indkey = &idxrel->rd_index->indkey; bool hasnulls = false; - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel)); + Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) || + RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel)); indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple, Anum_pg_index_indclass, &isnull); diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl index 6da7f71ca3..8c3dee8ffc 100644 --- a/src/test/subscription/t/001_rep_changes.pl +++ b/src/test/subscription/t/001_rep_changes.pl @@ -34,6 +34,8 @@ $node_publisher->safe_psql('postgres', $node_publisher->safe_psql('postgres', "CREATE TABLE tab_include (a int, b text, CONSTRAINT covering PRIMARY KEY(a) INCLUDE(b))" ); +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_full_pk (a int primary key, b text)"); # Let this table with REPLICA IDENTITY NOTHING, allowing only INSERT changes. 
$node_publisher->safe_psql('postgres', "CREATE TABLE tab_nothing (a int)"); $node_publisher->safe_psql('postgres', @@ -46,6 +48,10 @@ $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int)"); $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)"); $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_rep (a int primary key)"); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_full_pk (a int primary key, b text)"); +$node_subscriber->safe_psql('postgres', + "ALTER TABLE tab_full_pk REPLICA IDENTITY FULL"); $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_nothing (a int)"); # different column count and order than on publisher @@ -64,7 +70,7 @@ $node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub"); $node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub_ins_only WITH (publish = insert)"); $node_publisher->safe_psql('postgres', - "ALTER PUBLICATION tap_pub ADD TABLE tab_rep, tab_full, tab_full2, tab_mixed, tab_include, tab_nothing" + "ALTER PUBLICATION tap_pub ADD TABLE tab_rep, tab_full, tab_full2, tab_mixed, tab_include, tab_nothing, tab_full_pk" ); $node_publisher->safe_psql('postgres', "ALTER PUBLICATION tap_pub_ins_only ADD TABLE tab_ins"); @@ -102,6 +108,9 @@ $node_publisher->safe_psql('postgres', "UPDATE tab_rep SET a = -a"); $node_publisher->safe_psql('postgres', "INSERT INTO tab_mixed VALUES (2, 'bar', 2.2)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_full_pk VALUES (1, 'foo')"); + $node_publisher->safe_psql('postgres', "INSERT INTO tab_nothing VALUES (generate_series(1,20))"); @@ -159,6 +168,8 @@ $node_publisher->safe_psql('postgres', "UPDATE tab_full2 SET x = 'bb' WHERE x = 'b'"); $node_publisher->safe_psql('postgres', "UPDATE tab_mixed SET b = 'baz' WHERE a = 1"); +$node_publisher->safe_psql('postgres', + "UPDATE tab_full_pk SET b = 'bar' WHERE a = 1"); $node_publisher->wait_for_catchup('tap_sub'); signature.asc Description: PGP signature
pg_bsd_indent and -Wimplicit-fallthrough
Hi, I have just noticed that pg_bsd_indent complains since -Wimplicit-fallthrough=3 has been added to the default set of switches if available. Something like the attached is fine to take care of those warnings, but what's our current patching policy for this tool? Thanks, -- Michael diff --git a/indent.c b/indent.c index 9faf57a..5da3401 100644 --- a/indent.c +++ b/indent.c @@ -917,6 +917,7 @@ check_type: case structure: if (ps.p_l_follow > 0) goto copy_id; + /* FALLTHROUGH */ case decl: /* we have a declaration type (int, etc.) */ parse(decl); /* let parser worry about indentation */ if (ps.last_token == rparen && ps.tos <= 1) { diff --git a/parse.c b/parse.c index a51eb8b..bf6b169 100644 --- a/parse.c +++ b/parse.c @@ -100,6 +100,7 @@ parse(int tk) /* tk: the code for the construct scanned */ */ ps.i_l_follow = ps.il[ps.tos--]; /* the rest is the same as for dolit and forstmt */ + /* FALLTHROUGH */ case dolit: /* 'do' */ case forstmt: /* for (...) */ ps.p_stack[++ps.tos] = tk; signature.asc Description: PGP signature
Re: documenting the backup manifest file format
On 2020/04/15 5:33, Andrew Dunstan wrote: On 4/14/20 4:09 PM, Alvaro Herrera wrote: On 2020-Apr-14, Andrew Dunstan wrote: OK, but I think if we're putting a timestamp string in ISO-8601 format in the manifest it should be in UTC / Zulu time, precisely to avoid these issues. If that's too much trouble then yes an epoch time will probably do. The timestamp is always specified and always UTC (except the code calls it GMT). + /* +* Convert last modification time to a string and append it to the +* manifest. Since it's not clear what time zone to use and since time +* zone definitions can change, possibly causing confusion, use GMT +* always. +*/ + appendStringInfoString(&buf, "\"Last-Modified\": \""); + enlargeStringInfo(&buf, 128); + buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z", + pg_gmtime(&mtime)); + appendStringInfoString(&buf, "\""); I was merely saying that it's trivial to make this iso-8601 compliant as buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ", ie. omit the "GMT" string and replace it with a literal Z, and remove the space and replace it with a T. I have one question related to this; Why don't we use log_timezone, like backup_label? log_timezone is used for "START TIME" field in backup_label. Sorry if this was already discussed. /* Use the log timezone here, not the session timezone */ stamp_time = (pg_time_t) time(NULL); pg_strftime(strfbuf, sizeof(strfbuf), "%Y-%m-%d %H:%M:%S %Z", pg_localtime(&stamp_time, log_timezone)); OTOH, *if* we want to use the same timezone for backup-related files because backup can be used in different environements and timezone setting may be different there or for other reasons, backup_label also should use GMT or something for the sake of consistency? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
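For readers comparing the two formats under discussion, the difference in output is only cosmetic; the sketch below shows both renderings using to_char (the manifest code itself formats at the C level with pg_strftime, as quoted above).

-- "%Y-%m-%d %H:%M:%S %Z" over pg_gmtime()  produces e.g.  2020-05-15 03:33:19 GMT
-- "%Y-%m-%dT%H:%M:%SZ"   over pg_gmtime()  produces e.g.  2020-05-15T03:33:19Z
SELECT to_char(now() AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS "GMT"') AS current_manifest_style,
       to_char(now() AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')  AS iso_8601_style;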
Re: pg_bsd_indent and -Wimplicit-fallthrough
On Fri, May 15, 2020 at 8:03 AM Michael Paquier wrote: > > Hi, > > I have just noticed that pg_bsd_indent complains since > -Wimplicit-fallthrough=3 has been added to the default set of switches > if available. Oh Indeed. > Something like the attached is fine to take care of those warnings, > but what's our current patching policy for this tool? The patch looks good to me. It looks like we already have custom patches, so +1 to applying it.
Re: PG 13 release notes, first draft
On 2020/05/05 12:16, Bruce Momjian wrote: I have committed the first draft of the PG 13 release notes. You can see them here: https://momjian.us/pgsql_docs/release-13.html It still needs markup, word wrap, and indenting. The community doc build should happen in a few hours. Many thanks for working on this! When I did "make html", I got the following message. Link element has no content and no Endterm. Nothing to show in the link to sepgsql "Allow to control access to the" in release-13.sgml seems to have caused this. Also I found it's converted into "Allow ??? to control access to the", i.e., ??? was used. - Allow to control access to the + Allow sepgsql to control access to the Shouldn't we change that as the above? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION