Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread Fujii Masao




On 2020/04/02 10:41, movead...@highgo.ca wrote:


>  >So I'd like to propose the attached patch. The patch changes the message
> logged when a promotion is requested, based on whether the recovery is
> in paused state or not.
>
> It is a compromise,

Ok, so barring any objection, I will commit the patch.


> we should notice it in document I think.


There is the following explanation about the relationship between recovery
pause and promotion in the document. You may want to add more
descriptions to the docs?

--
If a promotion is triggered while recovery is paused, the paused
state ends and a promotion continues.
--

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-08 Thread davinder singh
On Tue, Apr 7, 2020 at 8:30 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> * The logic on "defined(_MSC_VER) && (_MSC_VER >= 1900)" is defined as
> "_WIN32_WINNT >= 0x0600" on other parts of the code. I would
> recommend using the later.
>
I think "_WIN32_WINNT >= 0x0600" represents Windows versions only and
doesn't include any information about Visual Studio versions, so I am
sticking to "defined(_MSC_VER) && (_MSC_VER >= 1900)".
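
For illustration, here is a minimal standalone sketch (mine, not part of the
patch) of how the two guards differ: _MSC_VER is defined only by the Microsoft
compiler, whereas _WIN32_WINNT describes the targeted Windows version and is
defined for MinGW builds as well.

/* Hypothetical example, not from the patch. */
#include <stdio.h>

int
main(void)
{
#if defined(_MSC_VER) && (_MSC_VER >= 1900)
	puts("built with Visual Studio 2015 or later");
#elif defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0600)
	puts("targeting Windows Vista or later, but not necessarily MSVC >= 2015");
#else
	puts("neither condition holds");
#endif
	return 0;
}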
I have resolved other comments. I have attached a new version of the patch.
-- 
Regards,
Davinder.


0001-PG-compilation-error-with-VS-2015-2017-2019.patch
Description: Binary data


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Amit Kapila
On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
 wrote:
>
> On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> >
> >
> > Thanks for the investigation.  I don't see we can do anything special
> > about this.  In an ideal world, this should be done once and not for
> > each worker but I guess it doesn't matter too much.  I am not sure if
> > it is worth adding a comment for this, what do you think?
> >
>
> I agree with you. If the differences were considerably large probably
> we would do something but I think we don't need to anything at this
> time.
>

Fair enough, can you once check this in back-branches as this needs to
be backpatched?  I will do that once by myself as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: WIP/PoC for parallel backup

2020-04-08 Thread Asif Rehman
Rebased and updated to current master (d025cf88ba). v12 is attached.

Also, changed the grammar for LIST_WAL_FILES and SEND_FILE to:

- LIST_WAL_FILES 'startptr' 'endptr'
- SEND_FILE 'FILE'  [NOVERIFY_CHECKSUMS]


On Wed, Apr 8, 2020 at 10:48 AM Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Asif,
>
> Thanks for new patches.
>
> Patches need to be rebased on head. Getting a failure while applying the
> 0003 patch.
> edb@localhost postgresql]$ git apply
> v11/0003-Parallel-Backup-Backend-Replication-commands.patch
> error: patch failed: src/backend/storage/ipc/ipci.c:147
> error: src/backend/storage/ipc/ipci.c: patch does not apply
>
> I have applied v11 patches on commit -
> 23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.
>
> pg_basebackup has a new option "--no-estimate-size",  pg_basebackup
> crashes when using this option.
>
> [edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size
> --jobs=2
> Segmentation fault (core dumped)
>
> --stacktrace
> [edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
> Loaded symbols for /lib64/libselinux.so.1
> Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size
> --jobs=2'.
> Program terminated with signal 11, Segmentation fault.
> #0  strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
> 298  while (ISSPACE (*s))
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
> #1  0x003921233b30 in atoi (nptr=) at atoi.c:28
> #2  0x0040841e in main (argc=5, argv=0x7ffeaa6fb968) at
> pg_basebackup.c:2526
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Tue, Apr 7, 2020 at 11:07 PM Robert Haas  wrote:
>
>> On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman 
>> wrote:
>> > I will, however parallel backup is already quite a large patch. So I
>> think we should first
>> > agree on the current work before adding a backup manifest and
>> progress-reporting support.
>>
>> It's going to be needed for commit, but it may make sense for us to do
>> more review of what you've got here before we worry about it.
>>
>> I'm gonna try to find some time for that as soon as I can.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>

-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Julien Rouhaud
On Wed, Apr 8, 2020 at 8:23 AM Masahiko Sawada
 wrote:
>
> On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> >
> > On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada
> >  wrote:
> > >
> > > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> > > > >  wrote:
> > > > > >
> > > > > > Buffer usage statistics seem correct. The small differences would be
> > > > > > catalog lookups Peter mentioned.
> > > > > >
> > > > >
> > > > > Agreed, but can you check which part of code does that lookup?  I want
> > > > > to see if we can avoid that from buffer usage stats or at least write
> > > > > a comment about it, otherwise, we might have to face this question
> > > > > again and again.
> > > >
> > > > Okay, I'll check it.
> > > >
> > >
> > > I've checked the buffer usage differences when parallel btree index 
> > > creation.
> > >
> > > TL;DR;
> > >
> > > During tuple sorting individual parallel workers read blocks of
> > > pg_amproc and pg_amproc_fam_proc_index to get the sort support
> > > function. The call flow is like:
> > >
> > > ParallelWorkerMain()
> > >   _bt_parallel_scan_and_sort()
> > > tuplesort_begin_index_btree()
> > >   PrepareSortSupportFromIndexRel()
> > > FinishSortSupportFunction()
> > >   get_opfamily_proc()
> > >
> >
> > Thanks for the investigation.  I don't see we can do anything special
> > about this.  In an ideal world, this should be done once and not for
> > each worker but I guess it doesn't matter too much.  I am not sure if
> > it is worth adding a comment for this, what do you think?
> >
>
> I agree with you. If the differences were considerably large probably
> we would do something but I think we don't need to anything at this
> time.

+1




Re: segmentation fault using currtid and partitioned tables

2020-04-08 Thread Michael Paquier
On Sun, Apr 05, 2020 at 12:51:56PM -0400, Tom Lane wrote:
> I think it might be a good idea to make relations-without-storage
> set up rd_tableam as a vector of dummy functions that will throw
> some suitable complaint about "relation lacks storage".  NULL is
> a horrible default for this.

Yeah, that's not good, but I am not really comfortable with the
concept of implying that (pg_class.relam == InvalidOid) maps to a
dummy AM callback set instead of NULL for rd_tableam.  That feels less
natural.  As mentioned upthread, the error that we get in ~11 is
confusing as well when using a relation that has no storage:
ERROR:  58P01: could not open file "base/16384/16385": No such file or directory

I have been looking at the tree and the use of the table AM APIs, and
those TID lookups are really a particular case compared to the other
callers of the table AM callbacks.  Anyway, I have not spotted similar
problems, so I find it very tempting to just add some
RELKIND_HAS_STORAGE() checks to tid.c where it matters and call it a day.
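
As a rough sketch of that option (illustrative only, not a patch from this
thread; it assumes the usual backend headers and the helper name is invented
here), the check could look something like:

/* Sketch only: reject TID lookups on relations without storage up front,
 * instead of crashing on a NULL rd_tableam or failing later with a
 * confusing "could not open file" error. */
static void
sanity_check_relation_storage(Relation rel)
{
	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("cannot look up tuple identifiers in relation \"%s\"",
						RelationGetRelationName(rel)),
				 errdetail("Relation \"%s\" has no storage.",
						   RelationGetRelationName(rel))));
}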

Andres, what do you think?
--
Michael


signature.asc
Description: PGP signature


Re: doc review for parallel vacuum

2020-04-08 Thread Masahiko Sawada
On Tue, 7 Apr 2020 at 13:55, Justin Pryzby  wrote:
>
> On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> > On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila  
> > wrote:
> > > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > Original, long thread
> > > > https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > > >
> > >
> > > I have comments/questions on the patches:
> > > doc changes
> > > -
> > > 1.
> > >   
> > > -  Perform vacuum index and cleanup index phases of 
> > > VACUUM
> > > +  Perform vacuum index and index cleanup phases of 
> > > VACUUM
> > >
> > > Why is the proposed text improvement over what is already there?
> >
> > I have kept the existing text as it is for this case.
>
> Probably it should say "index vacuum and index cleanup", which is also what 
> the
> comment in vacuumlazy.c says.
>
> > > 2.
> > > If the
> > > -  PARALLEL option is omitted, then
> > > -  VACUUM decides the number of workers based on 
> > > number
> > > -  of indexes that support parallel vacuum operation on the relation 
> > > which
> > > -  is further limited by  > > linkend="guc-max-parallel-workers-maintenance"/>.
> > > -  The index can participate in a parallel vacuum if and only if the 
> > > size
> > > +  PARALLEL option is omitted, then the number of 
> > > workers
> > > +  is determined based on the number of indexes that support parallel 
> > > vacuum
> > > +  operation on the relation, and is further limited by  > > +  linkend="guc-max-parallel-workers-maintenance"/>.
> > > +  An index can participate in parallel vacuum if and only if the size
> > >of the index is more than  > > linkend="guc-min-parallel-index-scan-size"/>.
> > >
> > > Here, it is not clear to me why the proposed text is better than what
> > > we already have?
> > Changed as per your suggestion.
>
> To answer your question:
>
> "VACUUM decides" doesn't sound consistent with docs.
>
> "based on {+the+} number"
> => here, you're veritably missing an article...
>
> "relation which" technically means that the *relation* is "is further limited
> by"...
>
> > > Patch changes
> > > -
> > > 1.
> > > - * and index cleanup with parallel workers unless the parallel vacuum is
> > > + * and index cleanup with parallel workers unless parallel vacuum is
> > >
> > > As per my understanding 'parallel vacuum' is a noun phrase, so we
> > > should have a determiner before it.
> >
> > Changed as per your suggestion.
>
> Thanks (I think you meant an "article").
>
> > > - * We allow each worker to update it as and when it has incurred any 
> > > cost and
> > > + * We allow each worker to update it AS AND WHEN it has incurred any 
> > > cost and
> > >
> > > I don't see why it is necessary to make this part bold?  We are using
> > > it at one other place in the code in the way it is used here.
> > >
> >
> > Kept the existing text as it is.
>
> I meant this as a question.  I'm not sure what "as and when means" ?  "If and
> when" ?
>
> > I have combined both of your patches.  Let me know if you have any
> > more suggestions, otherwise, I am planning to push this tomorrow.
>
> In the meantime, I found some more issues, so I rebased on top of your patch 
> so
> you can review it.
>

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BUG #16346: pg_upgrade fails on a trigger with a comment

2020-04-08 Thread Hamid Akhtar
I have tested the patch in a little more detail.
(1) Verified that it fixes the bug.
(2) Ran regression tests; all are passing.

To recap, the attached patch moves restoration of comments to
RESTORE_PASS_POST_ACL. This ensures that comments are restored in a pass
in which essentially all required objects have already been created,
including event triggers and materialized views (and any other db
objects).

This patch is good from my side.
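
For context, pg_restore picks a restore pass per TOC entry (see
_tocEntryRestorePass() in pg_backup_archiver.c).  Below is a simplified sketch
of the idea, with COMMENT entries routed to the last pass as described above;
the function name and exact rules here are illustrative, not the actual code.

/* Simplified sketch, assuming the RestorePass enum and RESTORE_PASS_*
 * values from pg_backup_archiver.h: entries in RESTORE_PASS_POST_ACL are
 * replayed last, after event triggers and matviews exist. */
static RestorePass
toc_entry_restore_pass_sketch(const char *desc)
{
	if (strcmp(desc, "ACL") == 0 ||
		strcmp(desc, "DEFAULT ACL") == 0)
		return RESTORE_PASS_ACL;

	if (strcmp(desc, "EVENT TRIGGER") == 0 ||
		strcmp(desc, "MATERIALIZED VIEW DATA") == 0 ||
		strcmp(desc, "COMMENT") == 0)	/* the change discussed here */
		return RESTORE_PASS_POST_ACL;

	return RESTORE_PASS_MAIN;
}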

On Wed, Apr 8, 2020 at 1:10 AM Hamid Akhtar  wrote:

> As you have mentioned, I have verified that indeed commit 4c40b27b broke
> this.
>
> This particular commit moves restoration of materialized views and
> event triggers to the last phase. Perhaps comments should also be moved to
> this phase, as there may be comments on either of these types of objects.
>
> Attached is a patch that resolves this issue. I've verified that it
> resolves the upgrade (and restore) issue introduced by 4c40b27b. I'll test
> this patch in a little more detail tomorrow.
>
> On Mon, Apr 6, 2020 at 8:26 PM PG Bug reporting form <
> nore...@postgresql.org> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:  16346
>> Logged by:  Alexander Lakhin
>> Email address:  exclus...@gmail.com
>> PostgreSQL version: 12.2
>> Operating system:   Ubuntu 18.04
>> Description:
>>
>> When using pg_upgrade on a database with the following contents:
>> CREATE FUNCTION public.test_event_trigger() RETURNS event_trigger
>> LANGUAGE plpgsql
>> AS $$
>> BEGIN
>> RAISE NOTICE 'test_event_trigger: % %', tg_event, tg_tag;
>> END
>> $$;
>>
>> CREATE EVENT TRIGGER regress_event_trigger3 ON ddl_command_start
>>EXECUTE PROCEDURE public.test_event_trigger();
>>
>> COMMENT ON EVENT TRIGGER regress_event_trigger3 IS 'test comment';
>>
>> I get:
>> Restoring global objects in the new cluster ok
>> Restoring database schemas in the new cluster
>>   postgres
>> *failure*
>>
>> Consult the last few lines of "pg_upgrade_dump_14174.log" for
>> the probable cause of the failure.
>> Failure, exiting
>>
>> pg_upgrade_dump_14174.log contains:
>> command: "/src/postgres/tmp_install/usr/local/pgsql/bin/pg_restore" --host
>> /src/postgres --port 50432 --username postgres --clean --create
>> --exit-on-error --verbose --dbname template1
>> "pg_upgrade_dump_14174.custom"
>> >> "pg_upgrade_dump_14174.log" 2>&1
>> pg_restore: connecting to database for restore
>> pg_restore: dropping DATABASE PROPERTIES postgres
>> pg_restore: dropping DATABASE postgres
>> pg_restore: creating DATABASE "postgres"
>> pg_restore: connecting to new database "postgres"
>> pg_restore: connecting to database "postgres" as user "postgres"
>> pg_restore: creating COMMENT "DATABASE "postgres""
>> pg_restore: creating DATABASE PROPERTIES "postgres"
>> pg_restore: connecting to new database "postgres"
>> pg_restore: connecting to database "postgres" as user "postgres"
>> pg_restore: creating pg_largeobject "pg_largeobject"
>> pg_restore: creating FUNCTION "public.test_event_trigger()"
>> pg_restore: creating COMMENT "EVENT TRIGGER "regress_event_trigger3""
>> pg_restore: while PROCESSING TOC:
>> pg_restore: from TOC entry 3705; 0 0 COMMENT EVENT TRIGGER
>> "regress_event_trigger3" postgres
>> pg_restore: error: could not execute query: ERROR:  event trigger
>> "regress_event_trigger3" does not exist
>> Command was: COMMENT ON EVENT TRIGGER "regress_event_trigger3" IS 'test
>> comment';
>>
>> It looks like the commit 4c40b27b broke this.
>>
>>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
> SKYPE: engineeredvirus
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: 2pc leaks fds

2020-04-08 Thread Antonin Houska
Antonin Houska  wrote:

> Andres Freund  wrote:
> > But I'm not sure it's quite the right idea. I'm not sure I fully
> > understand the design of 0dc8ead46, but it looks to me like it's
> > intended to allow users of the interface to have different ways of
> > opening files. If we just close() the fd that'd be a bit more limited.
> 
> It should have allowed users to have different ways to *locate the segment*
> file. The WALSegmentOpen callback could actually return file path instead of
> the file descriptor and let WALRead() perform the opening/closing, but then
> the WALRead function would need to be aware whether it is executing in backend
> or in frontend (so it can use the correct function to open/close the file).

Well, #ifdef FRONTEND can be used to distinguish the caller of
WALRead(). However, now that I have tried to adjust the API, I see that
pg_waldump.c:WALDumpOpenSegment uses specific logic to open the file. So if
the callback only returned the file name, there would be no suitable place for
the things that WALDumpOpenSegment does.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Kuntal Ghosh
Hi,

On Wed, Apr 8, 2020 at 7:07 AM Etsuro Fujita  wrote:
>
> Pushed after modifying some comments further, based on the suggestions
> of Ashutosh.
I'm getting the following warning during compilation.

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ [-Wunused-variable]
  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 ^
To fix this, we can declare inner_binfo as
PG_USED_FOR_ASSERTS_ONLY, as it is not used for any other purpose.
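
For reference, a minimal illustration (mine, not taken from the patch) of that
annotation: PG_USED_FOR_ASSERTS_ONLY, from c.h, marks a variable that is
referenced only inside Assert(), so non-assert builds do not warn about it
being unused.

/* Illustration only: silence -Wunused-variable for an assert-only variable. */
static void
check_merge_strategies(PartitionBoundInfo outer_binfo,
					   PartitionBoundInfo inner_binfo_arg)
{
	PartitionBoundInfo inner_binfo PG_USED_FOR_ASSERTS_ONLY = inner_binfo_arg;

	Assert(outer_binfo->strategy == inner_binfo->strategy);
}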

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread movead...@highgo.ca

>> we should notice it in document I think. 
>There is the following explanation about the relationship between recovery
>pause and promotion in the document. You may want to add more
>descriptions to the docs?
>--
>If a promotion is triggered while recovery is paused, the paused
>state ends and a promotion continues.
>--

For example, we could add these words:
The first time pg_wal_replay_pause() is called during a recovery in which
a promotion has been triggered, pg_wal_replay_pause() reports success but
does not actually pause the recovery.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: [HACKERS] Restricting maximum keep segments by repslots

2020-04-08 Thread Kyotaro Horiguchi
At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I saw another issue, the following sequence on the primary freezes
> when invalidation happens.
> 
> =# create table tt(); drop table tt; select pg_switch_wal();create table 
> tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; 
> select pg_switch_wal(); checkpoint;
> 
> The last checkpoint command is waiting for CV on
> CheckpointerShmem->start_cv in RequestCheckpoint(), while Checkpointer
> is waiting for the next latch at the end of
> CheckpointerMain. new_started doesn't move but it is the same value
> with old_started.
> 
> That freeze didn't happen when I removed
> ConditionVariableSleep(&s->active_cv) in
> InvalidateObsoleteReplicationSlots.
> 
> I continue investigating it.

I understand how it happens.

The latch set by the checkpoint request from the CHECKPOINT command has
been absorbed by ConditionVariableSleep() in
InvalidateObsoleteReplicationSlots().  The attached patch allows the
checkpointer to use MyLatch for purposes other than checkpoint requests
while a checkpoint is running.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 6b877f11f557fc76f206e7a71ff7890952bf63d4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 8 Apr 2020 16:35:25 +0900
Subject: [PATCH] Allow MyLatch of checkpointer for other use.

MyLatch of checkpointer process was used only to request for a
checkpoint.  Checkpoint can miss a request if the latch is used for
other purposes during a checkpoint.  Allow MyLatch be used for other
purposes such as condition variables by recording pending checkpoint
requests.
---
 src/backend/postmaster/checkpointer.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e354a78725..86c355f035 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -160,6 +160,12 @@ static double ckpt_cached_elapsed;
 static pg_time_t last_checkpoint_time;
 static pg_time_t last_xlog_switch_time;
 
+/*
+ * Record checkpoint requests.  Since MyLatch is used other than
+ * CheckpointerMain, we need to record pending checkpoint request here.
+ */
+static bool CheckpointRequestPending = false;
+
 /* Prototypes for private functions */
 
 static void HandleCheckpointerInterrupts(void);
@@ -335,6 +341,7 @@ CheckpointerMain(void)
 
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
+		CheckpointRequestPending = false;
 
 		/*
 		 * Process any requests or signals received recently.
@@ -494,6 +501,10 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Don't sleep if pending request exists */
+		if (CheckpointRequestPending)
+			continue;
+
 		/*
 		 * Sleep until we are signaled or it's time for another checkpoint or
 		 * xlog file switch.
@@ -817,6 +828,7 @@ ReqCheckpointHandler(SIGNAL_ARGS)
 	 */
 	SetLatch(MyLatch);
 
+	CheckpointRequestPending = true;
 	errno = save_errno;
 }
 
-- 
2.18.2



Re: 2pc leaks fds

2020-04-08 Thread Antonin Houska
Andres Freund  wrote:

> On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > Andres Freund  wrote:
> > It should have allowed users to have different ways to *locate the segment*
> > file. The WALSegmentOpen callback could actually return file path instead of
> > the file descriptor and let WALRead() perform the opening/closing, but then
> > the WALRead function would need to be aware whether it is executing in 
> > backend
> > or in frontend (so it can use the correct function to open/close the file).
> > 
> > I was aware of the problem that the correct function should be used to open
> > the file and that's why this comment was added (although "mandatory" would 
> > be
> > more suitable than "preferred"):
> > 
> >  * BasicOpenFile() is the preferred way to open the segment file in backend
> >  * code, whereas open(2) should be used in frontend.
> >  */
> > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext 
> > *segcxt,
> >TimeLineID *tli_p);
> 
> I don't think that BasicOpenFile() really solves anything here? If
> anything it *exascerbates* the problem, because it will trigger all of
> the "virtual file descriptors" for already opened Files to close() the
> underlying OS FDs.  So not even a fully cached table can be seqscanned,
> because that tries to check the file size...

Specifically for 2PC, isn't it better to close some OS-level FD of an
unrelated table scan and then succeed than to ERROR immediately? Anyway,
0dc8ead46 hasn't changed this.

I at least admit that the comment should not recommend a particular function,
and that WALRead() should call the appropriate function to close the file,
rather than always calling close().

Can we just pass the existing FD to the callback as an additional argument,
and let the callback close it?
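
As a purely hypothetical sketch of that idea (not an agreed API; the extra
prev_fd argument and its semantics are only illustrative):

/*
 * Hypothetical variant, not the actual xlogreader.h declaration: the
 * previously opened descriptor is handed to the callback, which closes it
 * with whatever function matches how it was opened and then opens the next
 * segment.  -1 would be passed on the first call.
 */
typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
							   TimeLineID *tli_p, int prev_fd);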

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [HACKERS] Restricting maximum keep segments by repslots

2020-04-08 Thread Kyotaro Horiguchi
At Wed, 08 Apr 2020 16:46:05 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> The latch triggered by checkpoint request by CHECKPOINT command has
> been absorbed by ConditionVariableSleep() in
> InvalidateObsoleteReplicationSlots.  The attached allows checkpointer
> use MyLatch for other than checkpoint request while a checkpoint is
> running.

A checkpoint request that arrives while we are waiting on the CV causes a
spurious wakeup, but that does no harm.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Proposal] Global temporary tables

2020-04-08 Thread 曾文旌


> On Apr 7, 2020, at 6:22 PM, Prabhat Sahu wrote:
> 
> Thanks for review.
> This parameter should support all types of writing of the bool type like 
> parameter autovacuum_enabled.
> So I fixed in global_temporary_table_v24-pg13.patch.
> 
> Thank you Wenjing for the new patch with the fix and the "VACUUM FULL GTT" 
> support.
> I have verified the above issue now its resolved.
> 
> Please check the below findings on VACUUM FULL.
> 
> postgres=# create global temporary table  gtt(c1 int) on commit preserve rows;
> CREATE TABLE
> postgres=# vacuum FULL ;
> WARNING:  global temp table oldest FrozenXid is far in the past
> HINT:  please truncate them or kill those sessions that use them.
> VACUUM

This is expected.
It means that the GTT's FrozenXid is the oldest in the entire db, and the DBA
should vacuum the GTT if they want to advance the db's datfrozenxid.
They can also use the function pg_list_gtt_relfrozenxids() to check which
sessions have "too old" data, and truncate them or kill those sessions.



> 
> -- 
> With Regards,
> Prabhat Kumar Sahu
> EnterpriseDB: http://www.enterprisedb.com 



smime.p7s
Description: S/MIME cryptographic signature


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Masahiko Sawada
On Wed, 8 Apr 2020 at 16:04, Amit Kapila  wrote:
>
> On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> > >
> > >
> > > Thanks for the investigation.  I don't see we can do anything special
> > > about this.  In an ideal world, this should be done once and not for
> > > each worker but I guess it doesn't matter too much.  I am not sure if
> > > it is worth adding a comment for this, what do you think?
> > >
> >
> > I agree with you. If the differences were considerably large probably
> > we would do something but I think we don't need to anything at this
> > time.
> >
>
> Fair enough, can you once check this in back-branches as this needs to
> be backpatched?  I will do that once by myself as well.

I've done the same test with HEAD of both REL_12_STABLE and
REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
parallel index creation was introduced. I've attached the patches
for PG12 and PG11 I used for this test for reference.

Here are the results:

* PG12

With no worker:
-[ RECORD 1 ]---+-
shared_blks_hit | 119
shared_blks_read| 44283
total_read_blks | 44402
shared_blks_dirtied | 44262
shared_blks_written | 24925

With 4 workers:
-[ RECORD 1 ]---+
shared_blks_hit | 128
shared_blks_read| 8844
total_read_blks | 8972
shared_blks_dirtied | 8822
shared_blks_written | 5393

With 4 workers after patching:
-[ RECORD 1 ]---+
shared_blks_hit | 140
shared_blks_read| 44284
total_read_blks | 44424
shared_blks_dirtied | 44262
shared_blks_written | 26574

* PG11

With no worker:
-[ RECORD 1 ]---+
shared_blks_hit | 124
shared_blks_read| 44284
total_read_blks | 44408
shared_blks_dirtied | 44263
shared_blks_written | 24908

With 4 workers:
-[ RECORD 1 ]---+-
shared_blks_hit | 132
shared_blks_read| 8910
total_read_blks | 9042
shared_blks_dirtied | 
shared_blks_written | 5370

With 4 workers after patching:
-[ RECORD 1 ]---+-
shared_blks_hit | 144
shared_blks_read| 44285
total_read_blks | 44429
shared_blks_dirtied | 44263
shared_blks_written | 26861


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index edc4a82b02..da5b39eb02 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
 #include "access/xloginsert.h"
 #include "catalog/index.h"
 #include "commands/progress.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/smgr.h"
@@ -81,6 +82,7 @@
 #define PARALLEL_KEY_TUPLESORT			UINT64CONST(0xA002)
 #define PARALLEL_KEY_TUPLESORT_SPOOL2	UINT64CONST(0xA003)
 #define PARALLEL_KEY_QUERY_TEXT			UINT64CONST(0xA004)
+#define PARALLEL_KEY_BUFFER_USAGE		UINT64CONST(0xA005)
 
 /*
  * DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
 	Sharedsort *sharedsort;
 	Sharedsort *sharedsort2;
 	Snapshot	snapshot;
+	BufferUsage *bufferusage;
 } BTLeader;
 
 /*
@@ -1336,6 +1339,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	Sharedsort *sharedsort2;
 	BTSpool*btspool = buildstate->spool;
 	BTLeader   *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
 	char	   *sharedquery;
 	int			querylen;
@@ -1388,6 +1392,17 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 		shm_toc_estimate_keys(&pcxt->estimator, 3);
 	}
 
+	/*
+	 * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE.
+	 *
+	 * If there are no extensions loaded that care, we could skip this.  We
+	 * have no way of knowing whether anyone's looking at pgBufferUsage,
+	 * so do it unconditionally.
+	 */
+	shm_toc_estimate_chunk(&pcxt->estimator,
+		   mul_size(sizeof(BufferUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 	/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
 	querylen = strlen(debug_query_string);
 	shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1459,6 +1474,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	memcpy(sharedquery, debug_query_string, querylen + 1);
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
 
+	/* Allocate space for each worker's BufferUsage; no need to initialize */
+	bufferusage = shm_toc_allocate(pcxt->toc,
+   mul_size(sizeof(BufferUsage), pcxt->nworkers));
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
 	/* Launch workers, saving status for leader/caller */
 	LaunchParallelWorkers(pcxt);
 	btleader->pcxt = pcxt;
@@ -1469,6 +1489,7 @@ _bt_begin_paral

Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-08 Thread Fujii Masao



On 2020/04/03 16:26, Julien Rouhaud wrote:

On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote:

Fujii Masao-4 wrote

On 2020/04/01 18:19, Fujii Masao wrote:

Finally I pushed the patch!
Many thanks for all involved in this patch!

As a remaining TODO item, I'm thinking that the document would need to
be improved. For example, previously the query was not stored in pgss
when it failed. But, in v13, if pgss_planning is enabled, such a query is
stored because the planning succeeds. Without the explanation about
that behavior in the document, I'm afraid that users will get confused.
Thought?


Thank you all for this work and especially to Julian for its major
contribution !



Thanks a lot to everyone!  This was quite a long journey.



Regarding the TODO point: Yes I agree that it can be improved.
My proposal:

"Note that planning and execution statistics are updated only at their
respective end phase, and only for successfull operations.
For exemple executions counters of a long running SELECT query,
will be updated at the execution end, without showing any progress
report in the interval.
Other exemple, if the statement is successfully planned but fails in
the execution phase, only its planning statistics are stored.
This may give uncorrelated plans vs calls informations."


Thanks for the proposal!


There are numerous reasons for lack of correlation between number of planning
and number of execution, so I'm afraid that this will give users the false
impression that only failed execution can lead to that.

Here's some enhancement on your proposal:

"Note that planning and execution statistics are updated only at their
respective end phase, and only for successful operations.
For example the execution counters of a long running query
will only be updated at the execution end, without showing any progress
report before that.


Since this is probably not an example that explains the relationship between
planning and execution stats, it's better to explain this separately or just
drop it?

What about the attached patch based on your proposals?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/pgstatstatements.sgml 
b/doc/src/sgml/pgstatstatements.sgml
index 3d26108649..5a962feb39 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -129,7 +129,7 @@
   calls
   bigint
   
-  Number of times executed
+  Number of times the statement was executed
  
 
  
@@ -398,6 +398,16 @@
reducing pg_stat_statements.max to prevent
recurrences.
   
+
+  
+   plans and calls aren't
+   always expected to match because planning and execution statistics are
+   updated at their respective end phase, and only for successful operations.
+   For example, if a statement is successfully planned but fails during
+   the execution phase, only its planning statistics will be updated.
+   If planning is skipped because a cached plan is used, only its execution
+   statistics will be updated.
+  
  
 
  


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Amit Kapila
On Wed, Apr 8, 2020 at 1:49 PM Masahiko Sawada
 wrote:
>
> On Wed, 8 Apr 2020 at 16:04, Amit Kapila  wrote:
> >
> > On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> > > >
> > > >
> > > > Thanks for the investigation.  I don't see we can do anything special
> > > > about this.  In an ideal world, this should be done once and not for
> > > > each worker but I guess it doesn't matter too much.  I am not sure if
> > > > it is worth adding a comment for this, what do you think?
> > > >
> > >
> > > I agree with you. If the differences were considerably large probably
> > > we would do something but I think we don't need to anything at this
> > > time.
> > >
> >
> > Fair enough, can you once check this in back-branches as this needs to
> > be backpatched?  I will do that once by myself as well.
>
> I've done the same test with HEAD of both REL_12_STABLE and
> REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
> parallel index creation was introduced. I've attached the patches
> for PG12 and PG11 I used for this test for reference.
>

Thanks, I will once again verify and push this tomorrow if there are
no other comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[PATCH] RUM Postgres 13 patch

2020-04-08 Thread Pavel Borisov
Dear hackers!
Due to changes in PG13, the RUM extension has errors when compiling.
I propose a short patch to correct this.

Best regards,

Pavel Borisov


rum_pg13_patch.diff
Description: Binary data


Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread Fujii Masao




On 2020/04/08 16:41, movead...@highgo.ca wrote:


> >> we should notice it in document I think.
>
> > There is the following explanation about the relationship between recovery
> > pause and promotion in the document. You may want to add more
> > descriptions to the docs?
> >
> > --
> > If a promotion is triggered while recovery is paused, the paused
> > state ends and a promotion continues.
> > --
>
> For example, we could add these words:
> The first time pg_wal_replay_pause() is called during a recovery in which
> a promotion has been triggered, pg_wal_replay_pause() reports success but
> does not actually pause the recovery.


I think this is not true. In your case, pg_wal_replay_pause() succeeded
because the startup process had not detected the promotion request yet
at that moment.

To cover your case, what about adding the following description?

-
There can be a delay between a promotion request by users and the actual
triggering of a promotion in the server. Note that pg_wal_replay_pause()
succeeds during that delay, i.e., until a promotion is actually triggered.
-

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: adding partitioned tables to publications

2020-04-08 Thread Peter Eisentraut

On 2020-04-08 07:45, Amit Langote wrote:

On Wed, Apr 8, 2020 at 1:22 AM Amit Langote  wrote:

On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut
 wrote:

The descriptions of the new fields in RelationSyncEntry don't seem to
match the code accurately, or at least it's confusing.
replicate_as_relid is always filled in with an ancestor, even if
pubviaroot is not set.


Given this confusion, I have changed how replicate_as_relid works so
that it's now always set -- if different from the relation's own OID,
the code for "publishing via root" kicks in in various places.


I think the pubviaroot field is actually not necessary.  We only need
replicate_as_relid.


Looking through the code, I agree.  I guess I only kept it around to
go with pubupdate, etc.


Think I broke truncate replication with this.  Fixed in the attached
updated patch.


All committed.

Thank you and everyone very much for working on this.  I'm very happy 
that these two features from PG10 have finally met. :)


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] RUM Postgres 13 patch

2020-04-08 Thread Alexander Korotkov
Hi, Pavel!

On Wed, Apr 8, 2020 at 12:14 PM Pavel Borisov  wrote:
> Due to changes in PG13 RUM extension had errors on compiling.
> I propose a short patch to correct this.

RUM is an extension managed by Postgres Pro and not discussed in
pgsql-hackers mailing lists.  Please, make a pull request on github.
https://github.com/postgrespro/rum

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-08 Thread Julien Rouhaud
On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote:
> 
> 
> On 2020/04/03 16:26, Julien Rouhaud wrote:
> > On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote:
> > > Fujii Masao-4 wrote
> > > > On 2020/04/01 18:19, Fujii Masao wrote:
> > > > 
> > > > Finally I pushed the patch!
> > > > Many thanks for all involved in this patch!
> > > > 
> > > > As a remaining TODO item, I'm thinking that the document would need to
> > > > be improved. For example, previously the query was not stored in pgss
> > > > when it failed. But, in v13, if pgss_planning is enabled, such a query 
> > > > is
> > > > stored because the planning succeeds. Without the explanation about
> > > > that behavior in the document, I'm afraid that users will get confused.
> > > > Thought?
> > > 
> > > Thank you all for this work and especially to Julian for its major
> > > contribution !
> > 
> > 
> > Thanks a lot to everyone!  This was quite a long journey.
> > 
> > 
> > > Regarding the TODO point: Yes I agree that it can be improved.
> > > My proposal:
> > > 
> > > "Note that planning and execution statistics are updated only at their
> > > respective end phase, and only for successfull operations.
> > > For exemple executions counters of a long running SELECT query,
> > > will be updated at the execution end, without showing any progress
> > > report in the interval.
> > > Other exemple, if the statement is successfully planned but fails in
> > > the execution phase, only its planning statistics are stored.
> > > This may give uncorrelated plans vs calls informations."
> 
> Thanks for the proposal!
> 
> > There are numerous reasons for lack of correlation between number of 
> > planning
> > and number of execution, so I'm afraid that this will give users the false
> > impression that only failed execution can lead to that.
> > 
> > Here's some enhancement on your proposal:
> > 
> > "Note that planning and execution statistics are updated only at their
> > respective end phase, and only for successful operations.
> > For example the execution counters of a long running query
> > will only be updated at the execution end, without showing any progress
> > report before that.
> 
> Probably since this is not the example for explaining the relationship of
> planning and execution stats, it's better to explain this separately or just
> drop it?
> 
> What about the attached patch based on your proposals?
> 

Thanks Fujii-san, it looks perfect to me!




Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread movead...@highgo.ca

>To cover your case, what about adding the following description?
 
>-
>There can be a delay between a promotion request by users and the actual
>triggering of a promotion in the server. Note that pg_wal_replay_pause()
>succeeds during that delay, i.e., until a promotion is actually triggered.
>-
Yes that's it.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


[Patch] Use internal pthreads reimplementation only when building with MSVC

2020-04-08 Thread Sandro Mani

Hi

The following patch, which we added to build mingw-postgresql on Fedora,
makes the internal minimal pthreads reimplementation be used only when
building with MSVC, as on MinGW it causes symbol collisions with the
symbols provided by winpthreads.


Thanks
Sandro


diff -rupN postgresql-11.5/src/interfaces/ecpg/ecpglib/misc.c 
postgresql-11.5-new/src/interfaces/ecpg/ecpglib/misc.c
--- postgresql-11.5/src/interfaces/ecpg/ecpglib/misc.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/ecpg/ecpglib/misc.c 2020-04-08 
11:20:39.850738296 +0200

@@ -449,7 +449,7 @@ ECPGis_noind_null(enum ECPGttype type, c
 return false;
 }

-#ifdef WIN32
+#ifdef _MSC_VER
 #ifdef ENABLE_THREAD_SAFETY

 void
diff -rupN 
postgresql-11.5/src/interfaces/ecpg/include/ecpg-pthread-win32.h 
postgresql-11.5-new/src/interfaces/ecpg/include/ecpg-pthread-win32.h
--- postgresql-11.5/src/interfaces/ecpg/include/ecpg-pthread-win32.h 
2019-08-05 23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/ecpg/include/ecpg-pthread-win32.h 
2020-04-08 11:20:39.851738296 +0200

@@ -7,7 +7,7 @@

 #ifdef ENABLE_THREAD_SAFETY

-#ifndef WIN32
+#ifndef _MSC_VER

 #include 
 #else
diff -rupN postgresql-11.5/src/interfaces/libpq/fe-connect.c 
postgresql-11.5-new/src/interfaces/libpq/fe-connect.c
--- postgresql-11.5/src/interfaces/libpq/fe-connect.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/fe-connect.c 2020-04-08 
11:20:39.853738297 +0200

@@ -50,7 +50,7 @@
 #endif

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/fe-secure.c 
postgresql-11.5-new/src/interfaces/libpq/fe-secure.c
--- postgresql-11.5/src/interfaces/libpq/fe-secure.c    2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/fe-secure.c 2020-04-08 
11:20:39.854738297 +0200

@@ -48,7 +48,7 @@
 #include 

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/fe-secure-openssl.c 
postgresql-11.5-new/src/interfaces/libpq/fe-secure-openssl.c
--- postgresql-11.5/src/interfaces/libpq/fe-secure-openssl.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/fe-secure-openssl.c 
2020-04-08 11:20:39.855738298 +0200

@@ -47,7 +47,7 @@
 #include 

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/libpq-int.h 
postgresql-11.5-new/src/interfaces/libpq/libpq-int.h
--- postgresql-11.5/src/interfaces/libpq/libpq-int.h    2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/libpq-int.h 2020-04-08 
11:20:39.855738298 +0200

@@ -29,7 +29,7 @@
 #endif

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/pthread-win32.c 
postgresql-11.5-new/src/interfaces/libpq/pthread-win32.c
--- postgresql-11.5/src/interfaces/libpq/pthread-win32.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/pthread-win32.c 2020-04-08 
11:21:51.674766968 +0200

@@ -10,10 +10,13 @@
 *-
 */

+#ifdef _MSC_VER
+
 #include "postgres_fe.h"

 #include "pthread-win32.h"

+
 DWORD
 pthread_self(void)
 {
@@ -58,3 +61,5 @@ pthread_mutex_unlock(pthread_mutex_t *mp
 LeaveCriticalSection(*mp);
 return 0;
 }
+
+#endif // _MSC_VER





[Patch] Add missing libraries to Libs.private of libpq.pc

2020-04-08 Thread Sandro Mani

Hello

The following patch, which we added to build mingw-postgresql on Fedora, 
adds some missing libraries to Libs.private of libpq.pc, discovered when 
attempting to statically link with libpq:


-lz: is required by -lcrypto
-liconv: is required by -lintl (though possibly depends on whether 
gettext was compiled with iconv support)


Thanks
Sandro


diff -rupN postgresql-11.5/src/interfaces/libpq/Makefile 
postgresql-11.5-new/src/interfaces/libpq/Makefile
--- postgresql-11.5/src/interfaces/libpq/Makefile    2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/Makefile 2020-04-07 
13:49:00.801203610 +0200

@@ -80,10 +80,10 @@ endif
 ifneq ($(PORTNAME), win32)
 SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto 
-lkrb5 -lgssapi_krb5 -lgss -lgssapi -lssl -lsocket -lnsl -lresolv 
-lintl, $(LIBS)) $(LDAP_LIBS_FE) $(PTHREAD_LIBS)

 else
-SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto 
-lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl $(PTHREAD_LIBS), 
$(LIBS)) $(LDAP_LIBS_FE)
+SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lz -lk5crypto 
-lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl $(PTHREAD_LIBS), 
$(LIBS)) $(LDAP_LIBS_FE)

 endif
 ifeq ($(PORTNAME), win32)
-SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 $(filter -leay32 -lssleay32 
-lcomerr32 -lkrb5_32, $(LIBS))
+SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 -liconv $(filter -leay32 
-lssleay32 -lcomerr32 -lkrb5_32, $(LIBS))

 endif

 SHLIB_EXPORTS = exports.txt





Re: 2pc leaks fds

2020-04-08 Thread Ahsan Hadi
I have tested with and without the commit from Andres using the pgbench
script (below) provided in the initial email.

pgbench -n -s 500 -c 4 -j 4 -T 10 -P1 -f pgbench-write-2pc.sql

I am not getting the leak anymore, it seems to be holding up pretty well.


On Wed, Apr 8, 2020 at 12:59 PM Antonin Houska  wrote:

> Andres Freund  wrote:
>
> > On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > > Andres Freund  wrote:
> > > It should have allowed users to have different ways to *locate the
> segment*
> > > file. The WALSegmentOpen callback could actually return file path
> instead of
> > > the file descriptor and let WALRead() perform the opening/closing, but
> then
> > > the WALRead function would need to be aware whether it is executing in
> backend
> > > or in frontend (so it can use the correct function to open/close the
> file).
> > >
> > > I was aware of the problem that the correct function should be used to
> open
> > > the file and that's why this comment was added (although "mandatory"
> would be
> > > more suitable than "preferred"):
> > >
> > >  * BasicOpenFile() is the preferred way to open the segment file in
> backend
> > >  * code, whereas open(2) should be used in frontend.
> > >  */
> > > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext
> *segcxt,
> > >TimeLineID
> *tli_p);
> >
> > I don't think that BasicOpenFile() really solves anything here? If
> > anything it *exascerbates* the problem, because it will trigger all of
> > the "virtual file descriptors" for already opened Files to close() the
> > underlying OS FDs.  So not even a fully cached table can be seqscanned,
> > because that tries to check the file size...
>
> Specifically for 2PC, isn't it better to close some OS-level FD of an
> unrelated table scan and then succeed than to ERROR immediately? Anyway,
> 0dc8ead46 hasn't changed this.
>
> I at least admit that the comment should not recommend a particular function,
> and that WALRead() should call the appropriate function to close the file,
> rather than always calling close().
>
> Can we just pass the existing FD to the callback as an additional argument,
> and let the callback close it?
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.h...@highgo.ca


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Etsuro Fujita
Hi Kuntal,

On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh  wrote:
> I'm getting the following warning during compilation.
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
> [-Wunused-variable]
>   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>  ^
> For fixing the same, we can declare inner_binfo as
> PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.

I'd propose to remove an assertion causing this (and  the
outer_binfo/inner_binfo variables) from partition_bounds_merge(),
rather than doing so, because the assertion is redundant, as we have
the same assertion in merge_list_bounds() and merge_range_bounds().
Please find attached a patch.

Best regards,
Etsuro Fujita


fix-compiler-warning.patch
Description: Binary data


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Ashutosh Bapat
Thanks Kuntal for the report. Let me know if this patch works for you.

On Wed, 8 Apr 2020 at 13:00, Kuntal Ghosh 
wrote:

> Hi,
>
> On Wed, Apr 8, 2020 at 7:07 AM Etsuro Fujita 
> wrote:
> >
> > Pushed after modifying some comments further, based on the suggestions
> > of Ashutosh.
> I'm getting the following warning during compilation.
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
>   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>  ^
> For fixing the same, we can declare inner_binfo as
> PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>
> --
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
>


-- 
Best Wishes,
Ashutosh
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 7607501fe7..4681441dcc 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -1021,7 +1021,6 @@ partition_bounds_merge(int partnatts,
 	   List **outer_parts, List **inner_parts)
 {
 	PartitionBoundInfo outer_binfo = outer_rel->boundinfo;
-	PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 
 	/*
 	 * Currently, this function is called only from try_partitionwise_join(),
@@ -1032,7 +1031,7 @@ partition_bounds_merge(int partnatts,
 		   jointype == JOIN_ANTI);
 
 	/* The partitioning strategies should be the same. */
-	Assert(outer_binfo->strategy == inner_binfo->strategy);
+	Assert(outer_binfo->strategy == inner_rel->boundinfo->strategy);
 
 	*outer_parts = *inner_parts = NIL;
 	switch (outer_binfo->strategy)


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Ashutosh Bapat
On Wed, 8 Apr 2020 at 15:42, Etsuro Fujita  wrote:

> Hi Kuntal,
>
> On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh 
> wrote:
> > I'm getting the following warning during compilation.
> >
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
> >   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
> >  ^
> > For fixing the same, we can declare inner_binfo as
> > PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>
> I'd propose to remove an assertion causing this (and  the
> outer_binfo/inner_binfo variables) from partition_bounds_merge(),
> rather than doing so, because the assertion is redundant, as we have
> the same assertion in merge_list_bounds() and merge_range_bounds().
> Please find attached a patch.
>

Oh, I didn't see this mail before sending my other mail.

I think it's better to have the assertion in all three places, and also
in merge_hash_bounds() whenever that comes along. The assertion in
merge_*_bounds() will be good to have in case those functions are called from
places other than partition_bounds_merge(). The assertion in
partition_bounds_merge() will make sure that when the individual
merge_*_bounds() functions are called based on one of the bounds, both of
the bounds have the same strategy.
-- 
Best Wishes,
Ashutosh


Re: [Proposal] Global temporary tables

2020-04-08 Thread tushar

On 4/7/20 2:27 PM, 曾文旌 wrote:
Vacuum full GTT, cluster GTT is already 
supported in global_temporary_table_v24-pg13.patch.

Please refer to the below scenario, where pg_upgrade is failing:
1) Server is up and running (./pg_ctl -D data status)
2) Stop the server (./pg_ctl -D data stop)
3) Connect to the server using single-user mode (./postgres --single -D data
postgres) and create a global temp table

[tushar@localhost bin]$ ./postgres --single -D data1233 postgres

PostgreSQL stand-alone backend 13devel
backend> create global temp table t(n int);

--Press Ctl+D to exit

4) Perform initdb (./initdb -D data123)
5) Run pg_upgrade
[tushar@localhost bin]$ ./pg_upgrade -d data -D data123 -b . -B .
--
--
--
Restoring database schemas in the new cluster
  postgres
*failure*
Consult the last few lines of "pg_upgrade_dump_13592.log" for
the probable cause of the failure.
Failure, exiting

Log file content:

[tushar@localhost bin]$ tail -20   pg_upgrade_dump_13592.log
pg_restore: error: could not execute query: ERROR:  pg_type array OID 
value not set when in binary upgrade mode

Command was:
-- For binary upgrade, must preserve pg_type oid
SELECT 
pg_catalog.binary_upgrade_set_next_pg_type_oid('13594'::pg_catalog.oid);



-- For binary upgrade, must preserve pg_class oids
SELECT 
pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('13593'::pg_catalog.oid);


CREATE GLOBAL TEMPORARY TABLE "public"."t" (
    "n" integer
)
WITH ("on_commit_delete_rows"='false');

-- For binary upgrade, set heap's relfrozenxid and relminmxid
UPDATE pg_catalog.pg_class
SET relfrozenxid = '0', relminmxid = '0'
WHERE oid = '"public"."t"'::pg_catalog.regclass;

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Global temporary tables

2020-04-08 Thread Prabhat Sahu
On Wed, Apr 8, 2020 at 1:48 PM 曾文旌  wrote:

>
>
> On Apr 7, 2020, at 6:22 PM, Prabhat Sahu wrote:
>
> Thanks for review.
>> This parameter should support all types of writing of the bool type like
>> parameter autovacuum_enabled.
>> So I fixed in global_temporary_table_v24-pg13.patch.
>>
>
> Thank you Wenjing for the new patch with the fix and the "VACUUM FULL GTT"
> support.
> I have verified the above issue now its resolved.
>
> Please check the below findings on VACUUM FULL.
>
> postgres=# create global temporary table  gtt(c1 int) on commit preserve
> rows;
> CREATE TABLE
> postgres=# vacuum FULL ;
> WARNING:  global temp table oldest FrozenXid is far in the past
> HINT:  please truncate them or kill those sessions that use them.
> VACUUM
>
>
> This is expected,
> This represents that the GTT FrozenXid is the oldest in the entire db, and
> dba should vacuum the GTT if he want to push the db datfrozenxid.
> Also he can use function pg_list_gtt_relfrozenxids() to check which
> session has "too old” data and truncate them or kill the sessions.
>

Again, regarding the HINT given ("HINT:  please truncate them or kill those
sessions that use them."):
there is only a single session, and if we try "TRUNCATE" and "VACUUM FULL"
the behavior is still the same, as below.

postgres=# truncate gtt ;
TRUNCATE TABLE
postgres=# vacuum full;
WARNING: global temp table oldest FrozenXid is far in the past
HINT: please truncate them or kill those sessions that use them.
VACUUM

I have one more finding related to "CLUSTER table USING index". Please
check the issue below.
postgres=# create global temporary table gtt(c1 int) on commit preserve
rows;
CREATE TABLE
postgres=# create index idx1 ON gtt (c1);
CREATE INDEX

-- exit and re-connect the psql prompt
postgres=# \q
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.

postgres=# cluster gtt using idx1;
WARNING:  relcache reference leak: relation "gtt" not closed
CLUSTER

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-04-08 Thread Amit Kapila
On Tue, Apr 7, 2020 at 7:08 PM Ants Aasma  wrote:
>
> On Tue, 7 Apr 2020 at 08:24, vignesh C  wrote:
> > Leader will create a circular queue
> > and share it across the workers. The circular queue will be present in
> > DSM. Leader will be using a fixed size queue to share the contents
> > between the leader and the workers. Currently we will have 100
> > elements present in the queue. This will be created before the workers
> > are started and shared with the workers. The data structures that are
> > required by the parallel workers will be initialized by the leader,
> > the size required in dsm will be calculated and the necessary keys
> > will be loaded in the DSM. The specified number of workers will then
> > be launched. Leader will read the table data from the file and copy
> > the contents to the queue element by element. Each element in the
> > queue will have 64K size DSA. This DSA will be used to store tuple
> > contents from the file. The leader will try to copy as much content as
> > possible within one 64K DSA queue element. We intend to store at least
> > one tuple in each queue element. There are some cases where the 64K
> > space may not be enough to store a single tuple. Mostly in cases where
> > the table has toast data present and the single tuple can be more than
> > 64K size. In these scenarios we will extend the DSA space accordingly.
> > We cannot change the size of the dsm once the workers are launched.
> > Whereas in case of DSA we can free the dsa pointer and reallocate the
> > dsa pointer based on the memory size required. This is the very reason
> > for choosing DSA over DSM for storing the data that must be inserted
> > into the relation.
>
> I think the element based approach and requirement that all tuples fit
> into the queue makes things unnecessarily complex. The approach I
> detailed earlier allows for tuples to be bigger than the buffer. In
> that case a worker will claim the long tuple from the ring queue of
> tuple start positions, and starts copying it into its local line_buf.
> This can wrap around the buffer multiple times until the next start
> position shows up. At that point this worker can proceed with
> inserting the tuple and the next worker will claim the next tuple.
>

IIUC, with the fixed-size buffer, parallelism might take a bit of a hit
because until the worker copies the data from the shared buffer to its
local buffer, the reader process won't be able to continue.  I think
somewhat more leader-worker coordination will be required with the
fixed buffer size. However, as you pointed out, we can't allow it to
increase to the maximum size possible for all tuples as that might
require a lot of memory.  One idea could be that we allow it for the
first such tuple, and then if any other element/chunk in the queue
requires more memory than the default 64KB, we always fall back to
using the memory we have allocated for the first chunk.  This will
allow us to not use more memory except for one tuple and won't hurt
parallelism much, as in many cases not all tuples will be so large.

I think in the proposed approach a queue element is nothing but a way to
divide the work among workers based on size rather than on the number of
tuples.  If we instead try to divide the work among workers based on
start offsets, it can be more tricky, because it could lead either to a
lot of contention if we choose, say, one offset per worker (basically
copy the data for one tuple, process it and then pick the next tuple),
or to an unequal division of work because some tuples can be smaller and
others bigger.  I guess division based on size would be a better idea.
OTOH, I see the advantage of your approach as well and I will think more
on it.
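
For illustration, the size-based division can be sketched as packing
variable-length tuples into fixed-size chunks that are handed out whole.
This is a standalone toy with made-up names (not the actual patch code),
and a tiny chunk size standing in for the 64KB DSA element:

/*
 * Pack tuples as (length, bytes) pairs into fixed-size chunks; a worker
 * claims a whole chunk, so the unit of distribution is an amount of data
 * rather than a number of tuples.
 */
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 32			/* stand-in for the 64KB DSA element */

typedef struct Chunk
{
	int		used;				/* bytes consumed so far */
	char	data[CHUNK_SIZE];
} Chunk;

/* Append one tuple; returns 0 when the chunk is full and must be handed out. */
static int
chunk_add_tuple(Chunk *c, const char *tup, int len)
{
	if (c->used + (int) sizeof(int) + len > CHUNK_SIZE)
		return 0;
	memcpy(c->data + c->used, &len, sizeof(int));
	memcpy(c->data + c->used + sizeof(int), tup, len);
	c->used += (int) sizeof(int) + len;
	return 1;
}

int
main(void)
{
	const char *tuples[] = {"1,foo", "2,bar", "3,a much longer line", "4,baz"};
	Chunk		chunk = {0};
	int			handed_out = 0;

	for (int i = 0; i < 4; i++)
	{
		if (!chunk_add_tuple(&chunk, tuples[i], (int) strlen(tuples[i])))
		{
			handed_out++;		/* give the chunk to a worker, start a new one */
			chunk.used = 0;
			/* an oversized tuple would need the chunk grown instead */
			chunk_add_tuple(&chunk, tuples[i], (int) strlen(tuples[i]));
		}
	}
	printf("chunks handed out: %d (plus one partial)\n", handed_out);
	return 0;
}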

>
> > We had a couple of options for the way in which queue elements can be 
> > stored.
> > Option 1:  Each element (DSA chunk) will contain tuples such that each
> > tuple will be preceded by the length of the tuple.  So the tuples will
> > be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
> > tuple-2),  Or Option 2: Each element (DSA chunk) will contain only
> > tuples (tuple-1), (tuple-2), .  And we will have a second
> > ring-buffer which contains a start-offset or length of each tuple. The
> > old design used to generate one tuple of data and process tuple by
> > tuple. In the new design, the server will generate multiple tuples of
> > data per queue element. The worker will then process data tuple by
> > tuple. As we are processing the data tuple by tuple, I felt both of
> > the options are almost the same. However Design1 was chosen over
> > Design 2 as we can save up on some space that was required by another
> > variable in each element of the queue.
>
> With option 1 it's not possible to read input data into shared memory
> and there needs to be an extra memcpy in the time critical sequential
> flow of the leader. With option 2 data could be read directly into the
> shared memory buffer. With future async io support, reading an

Re: adding partitioned tables to publications

2020-04-08 Thread Amit Langote
On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
 wrote:
> All committed.
>
> Thank you and everyone very much for working on this.  I'm very happy
> that these two features from PG10 have finally met. :)

Thanks a lot for reviewing and committing.

prion seems to have failed:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13

Also, still unsure why the coverage report for the pgoutput.c changes is not good:
https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html

Will check.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: WIP: WAL prefetch (another approach)

2020-04-08 Thread Thomas Munro
On Wed, Apr 8, 2020 at 12:52 PM Thomas Munro  wrote:
> * he gave some feedback on the read_local_xlog_page() modifications: I
> probably need to reconsider the change to logical.c that passes NULL
> instead of cxt to the read_page callback; and the switch statement in
> read_local_xlog_page() probably should have a case for the preexisting
> mode

So... logical.c wants to give its LogicalDecodingContext to any
XLogPageReadCB you give it, via "private_data"; that is, it really
only accepts XLogPageReadCB implementations that understand that (or
ignore it).  What I want to do is give every XLogPageReadCB the chance
to have its own state that it is in control of (to receive settings
specific to the implementation, or whatever), that you supply along
with it.  We can't do both kinds of things with private_data, so I
have added a second member read_page_data to XLogReaderState.  If you
pass in read_local_xlog_page as read_page, then you can optionally
install a pointer to XLogReadLocalOptions as reader->read_page_data,
to activate the new behaviours I added for prefetching purposes.
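
For illustration, the separation between the caller's state and the
callback's own state can be sketched with heavily simplified, hypothetical
types (this is not the real XLogReaderState definition):

#include <stdio.h>

typedef struct Reader Reader;
typedef int (*read_page_cb) (Reader *reader);

struct Reader
{
	read_page_cb read_page;
	void	   *private_data;	/* owned by the caller, e.g. a decoding context */
	void	   *read_page_data; /* owned by the read_page implementation */
};

typedef struct LocalReadOptions
{
	int			tli;			/* "read only from this timeline" */
} LocalReadOptions;

static int
read_local_page(Reader *reader)
{
	LocalReadOptions *opts = reader->read_page_data;

	printf("reading from TLI %d\n", opts ? opts->tli : -1);
	return 0;
}

int
main(void)
{
	LocalReadOptions opts = {1};
	Reader		r = {read_local_page, NULL /* caller's state */ , &opts};

	return r.read_page(&r);
}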

While working on that, I realised the readahead XLogReader was
breaking a rule expressed in XLogReadDetermineTimeLine().  Timelines
are really confusing and there were probably several subtle or not so
subtle bugs there.  So I added an option to skip all of that logic,
and just say "I command you to read only from TLI X".  It reads the
same TLI as recovery is reading, until it hits the end of readable
data and that causes prefetching to shut down.  Then the main recovery
loop resets the prefetching module when it sees a TLI switch, so then
it starts up again.  This seems to work reliably, but I've obviously
had limited time to test.  Does this scheme sound sane?

I think this is basically committable (though of course I wish I had
more time to test and review).  Ugh.  Feature freeze in half an hour.
From 8654ea7f2ed61de7ab3f0b305e37d190932ad97c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 17 Mar 2020 15:28:08 +1300
Subject: [PATCH v7 1/4] Rationalize GetWalRcv{Write,Flush}RecPtr().

GetWalRcvWriteRecPtr() previously reported the latest *flushed*
location.  Adopt the conventional terminology used elsewhere in the tree
by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
variables that used the term "received".

Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
*written* value.  This will allow later patches to use the value for
non-data-integrity purposes, without having to wait for the flush
pointer to advance.

Reviewed-by: Alvaro Herrera 
Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
---
 src/backend/access/transam/xlog.c  | 20 +-
 src/backend/access/transam/xlogfuncs.c |  2 +-
 src/backend/replication/README |  2 +-
 src/backend/replication/walreceiver.c  | 15 +-
 src/backend/replication/walreceiverfuncs.c | 24 --
 src/backend/replication/walsender.c|  2 +-
 src/include/replication/walreceiver.h  | 18 
 7 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8a4c1743e5..c60842ea03 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -209,8 +209,8 @@ HotStandbyState standbyState = STANDBY_DISABLED;
 
 static XLogRecPtr LastRec;
 
-/* Local copy of WalRcv->receivedUpto */
-static XLogRecPtr receivedUpto = 0;
+/* Local copy of WalRcv->flushedUpto */
+static XLogRecPtr flushedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
 /*
@@ -9376,7 +9376,7 @@ CreateRestartPoint(int flags)
 	 * Retreat _logSegNo using the current end of xlog replayed or received,
 	 * whichever is later.
 	 */
-	receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
+	receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 	replayPtr = GetXLogReplayRecPtr(&replayTLI);
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
 	KeepLogSeg(endptr, &_logSegNo);
@@ -11869,7 +11869,7 @@ retry:
 	/* See if we need to retrieve more data */
 	if (readFile < 0 ||
 		(readSource == XLOG_FROM_STREAM &&
-		 receivedUpto < targetPagePtr + reqLen))
+		 flushedUpto < targetPagePtr + reqLen))
 	{
 		if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
 		 private->randAccess,
@@ -11900,10 +11900,10 @@ retry:
 	 */
 	if (readSource == XLOG_FROM_STREAM)
 	{
-		if (((targetPagePtr) / XLOG_BLCKSZ) != (receivedUpto / XLOG_BLCKSZ))
+		if (((targetPagePtr) / XLOG_BLCKSZ) != (flushedUpto / XLOG_BLCKSZ))
 			readLen = XLOG_BLCKSZ;
 		else
-			readLen = XLogSegmentOffset(receivedUpto, wal_segment_size) -
+			readLen = XLogSegmentOffset(flushedUpto, wal_segment_size) -
 targetPageOff;
 	}
 	else
@@ -12318,7 +12318,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		RequestXLogStreaming(tli, ptr, PrimaryCo

Re: WIP: WAL prefetch (another approach)

2020-04-08 Thread Thomas Munro
On Wed, Apr 8, 2020 at 11:27 PM Thomas Munro  wrote:
> On Wed, Apr 8, 2020 at 12:52 PM Thomas Munro  wrote:
> > * he gave some feedback on the read_local_xlog_page() modifications: I
> > probably need to reconsider the change to logical.c that passes NULL
> > instead of cxt to the read_page callback; and the switch statement in
> > read_local_xlog_page() probably should have a case for the preexisting
> > mode
>
> So... logical.c wants to give its LogicalDecodingContext to any
> XLogPageReadCB you give it, via "private_data"; that is, it really
> only accepts XLogPageReadCB implementations that understand that (or
> ignore it).  What I want to do is give every XLogPageReadCB the chance
> to have its own state that it is control of (to receive settings
> specific to the implementation, or whatever), that you supply along
> with it.  We can't do both kinds of things with private_data, so I
> have added a second member read_page_data to XLogReaderState.  If you
> pass in read_local_xlog_page as read_page, then you can optionally
> install a pointer to XLogReadLocalOptions as reader->read_page_data,
> to activate the new behaviours I added for prefetching purposes.
>
> While working on that, I realised the readahead XLogReader was
> breaking a rule expressed in XLogReadDetermineTimeLine().  Timelines
> are really confusing and there were probably several subtle or not to
> subtle bugs there.  So I added an option to skip all of that logic,
> and just say "I command you to read only from TLI X".  It reads the
> same TLI as recovery is reading, until it hits the end of readable
> data and that causes prefetching to shut down.  Then the main recovery
> loop resets the prefetching module when it sees a TLI switch, so then
> it starts up again.  This seems to work reliably, but I've obviously
> had limited time to test.  Does this scheme sound sane?
>
> I think this is basically committable (though of course I wish I had
> more time to test and review).  Ugh.  Feature freeze in half an hour.

Ok, so the following parts of this work have been committed:

b09ff536:  Simplify the effective_io_concurrency setting.
fc34b0d9:  Introduce a maintenance_io_concurrency setting.
3985b600:  Support PrefetchBuffer() in recovery.
d140f2f3:  Rationalize GetWalRcv{Write,Flush}RecPtr().

However, I didn't want to push the main patch into the tree at
(literally) the last minute after doing such much work on it in the
last few days, without more review from recovery code experts and some
independent testing.  Judging by the comments made in this thread and
elsewhere, I think the feature is in demand so I hope there is a way
we could get it into 13 in the next couple of days, but I totally
accept the release management team's prerogative on that.




Re: adding partitioned tables to publications

2020-04-08 Thread Peter Eisentraut

On 2020-04-08 13:16, Amit Langote wrote:

On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
 wrote:

All committed.

Thank you and everyone very much for working on this.  I'm very happy
that these two features from PG10 have finally met. :)


Thanks a lot for reviewing and committing.

prion seems to have failed:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13


This comes from -DRELCACHE_FORCE_RELEASE.


Also, still unsure why the coverage report for pgoutput.c changes not good:
https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html


I think this is because the END { } section in PostgresNode.pm shuts 
down all running instances in immediate mode, which doesn't save 
coverage properly.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread tushar

Hi,

I just came across this scenario where vacuum o/p with the (full 1, 
parallel 0) option is not working


--working

postgres=# vacuum (parallel 1, full 0 ) foo;
VACUUM
postgres=#

--Not working

postgres=# vacuum (full 1, parallel 0 ) foo;
ERROR:  cannot specify both FULL and PARALLEL options

I think it should work.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: Thoughts on "killed tuples" index hint bits support on standby

2020-04-08 Thread Michail Nikolaev
Hello, Peter.

Thanks for your feedback.

> Attached is a very rough POC patch of my own, which makes item
> deletion occur "non-opportunistically" in unique indexes. The idea is
> that we exploit the uniqueness property of unique indexes to identify
> "version churn" from non-HOT updates. If any single value on a leaf
> page has several duplicates, then there is a good chance that we can
> safely delete some of them. It's worth going to the heap to check
> whether that's safe selectively, at the point where we'd usually have
> to split the page. We only have to free one or two items to avoid
> splitting the page. If we can avoid splitting the page immediately, we
> may well avoid splitting it indefinitely, or forever.

Yes, it is a brilliant idea to use uniqueness to avoid bloating the index. I am
not able to understand all the code now, but I’ll check the patch in more
detail later.

> This seems fairly relevant to what you're doing. It makes almost all
> index cleanup occur using the new delete infrastructure for some of
> the most interesting workloads where deletion takes place in client
> backends. In practice, a standby will almost be in the same position
> as the primary in a workload that this approach really helps with,
> since setting the LP_DEAD bit itself doesn't really need to happen (we
> can go straight to deleting the items in the new deletion path).

> This is probably
> not limited to the special unique index case that my patch focuses on
> -- we can probably push this general approach forward in a number of
> different ways. I just started with unique indexes because that seemed
> most promising. I have only worked on the project for a few days. I
> don't really know how it will evolve.

Yes, it is relevant, but I think it is «located in a different plane» and they
complement each other. Most of the indexes are not unique these days
and most of the standbys (and even primaries) have long snapshots (up to
minutes, hours) – so, multiple versions of index records are still required for
them. Even if we could avoid multiple versions somehow - it could lead to a very
high rate of query cancelations on standby.

> To address the questions you've asked: I don't really like the idea of
> introducing new rules around tuple visibility and WAL logging to set
> more LP_DEAD bits like this at all. It seems very complicated.

I don’t think it is too complicated. I have polished the idea a little and now
it even looks elegant to me :) I’ll try to explain the concept briefly (there
are no new visibility rules or changes to set more LP_DEAD bits than now –
everything is based on well-tested mechanics):

1) There is some kind of horizon of xmin values primary pushes to a standby
currently. All standby snapshots are required to satisfy this horizon to
access heap and indexes. This is done by *ResolveRecoveryConflictWithSnapshot*
and corresponding WAL records (for example -XLOG_BTREE_DELETE).

2) We could introduce a new WAL record (named XLOG_INDEX_HINT in the patch for
now) to define a horizon of xmin required for standby’s snapshot to use LP_DEAD
bits in the indexes.

3) Master sends XLOG_INDEX_HINT in case it sets LP_DEAD bit on the index page
(but before possible FPW caused by hints) by calling *LogIndexHintIfNeeded*. It
is required to send such a record only if the new xmin value is greater than
the one sent before. I made tests to estimate the amount of new WAL – it is really
small (especially compared to FPW writes done because of LP_DEAD bit set).

4) New XLOG_INDEX_HINT contains only a database id and value of
*latestIndexHintXid* (new horizon position). For simplicity, the primary could
just set the horizon to *RecentGlobalXmin*. But for now in the patch the
horizon value is extracted from the heap in *HeapTupleIsSurelyDead* to reduce
the number of XLOG_INDEX_HINT records even more.


5) There is a new field in PGPROC structure - *indexIgnoreKilledTuples*. If it
is set to true – standby queries are going to use LP_DEAD bits in index scans.
In such a case the snapshot is required to satisfy the new LP_DEAD horizon pushed by
XLOG_INDEX_HINT records. It is done by the same mechanism as used for heap -
*ResolveRecoveryConflictWithSnapshot*.

6) The major thing here – it is safe to set *indexIgnoreKilledTuples* to both
‘true’ and ‘false’ from the perspective of correctness. It is just some kind of
performance compromise – use LP_DEAD bits but be aware of XLOG_INDEX_HINT
horizon or vice versa.

7) How do we make the right decision about this compromise? It is pretty
simple – if hot_standby_feedback is on and the primary has confirmed our
feedback is received – then set *indexIgnoreKilledTuples* to ‘true’, since
while feedback is working as expected the query will never be canceled by the
XLOG_INDEX_HINT horizon!

8) To support cascading standby setups (with a possible break of feedback
chain) – additional byte added to the ‘keep-alive’ message of the feedback
protocol.

9) So, at the moment we are safe to use LP_D

Re: WIP: WAL prefetch (another approach)

2020-04-08 Thread David Steele

On 4/8/20 8:12 AM, Thomas Munro wrote:


Ok, so the following parts of this work have been committed:

b09ff536:  Simplify the effective_io_concurrency setting.
fc34b0d9:  Introduce a maintenance_io_concurrency setting.
3985b600:  Support PrefetchBuffer() in recovery.
d140f2f3:  Rationalize GetWalRcv{Write,Flush}RecPtr().

However, I didn't want to push the main patch into the tree at
(literally) the last minute after doing such much work on it in the
last few days, without more review from recovery code experts and some
independent testing.  


I definitely think that was the right call.


Judging by the comments made in this thread and
elsewhere, I think the feature is in demand so I hope there is a way
we could get it into 13 in the next couple of days, but I totally
accept the release management team's prerogative on that.


That's up to the RMT, of course, but we did already have an extra week. 
Might be best to just get this in at the beginning of the PG14 cycle. 
FWIW, I do think the feature is really valuable.


Looks like you'll need to rebase, so I'll move this to the next CF in 
WoA state.


Regards,
--
-David
da...@pgmasters.net




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 8:22 AM tushar  wrote:
> I just came across this scenario  where - vaccum o/p with (full 1,
> parallel 0) option not working
>
> --working
>
> postgres=# vacuum (parallel 1, full 0 ) foo;
> VACUUM
> postgres=#
>
> --Not working
>
> postgres=# vacuum (full 1, parallel 0 ) foo;
> ERROR:  cannot specify both FULL and PARALLEL options
>
> I think it should work.

Uh, why? There's a clear error message which matches what you tried to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-08 Thread legrand legrand
Fujii Masao-4 wrote
> On 2020/04/03 16:26
> [...]
>> 
>> "Note that planning and execution statistics are updated only at their
>> respective end phase, and only for successful operations.
>> For example the execution counters of a long running query
>> will only be updated at the execution end, without showing any progress
>> report before that.
> 
> Probably since this is not the example for explaining the relationship of
> planning and execution stats, it's better to explain this separately or
> just
> drop it?
> 
> What about the attached patch based on your proposals?

+1
Your patch is perfect ;^>

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [patch]socket_timeout in interfaces/libpq

2020-04-08 Thread David Steele

On 3/24/20 10:58 AM, David Steele wrote:

On 11/29/19 12:22 AM, nagaura.ryo...@fujitsu.com wrote:

>

I couldn't understand what you meant.
Do you say that we shouldn't change pqWait() behavior?
Or should I modify my patch to use pqDropConnection()?


This patch no longer applies: http://cfbot.cputube.org/patch_27_2175.log

CF entry has been updated to Waiting on Author.

More importantly it looks like there is still no consensus on this 
patch, which is an uncommitted part of a previous patch [1].


Unless somebody chimes in I'll mark this Returned with Feedback at the 
end of the CF.


Marked Returned with Feedback.

Regards,
--
-David
da...@pgmasters.net




Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2020-04-08 Thread David Steele

On 3/28/20 5:27 AM, Fabien COELHO wrote:


Hello Tom,

Thanks for your feedback,


I'd be rather unclear about what the actual feedback is, though. I'd
interpret it as "pg does not care much about code coverage". Most 
clients
are in the red on coverage.postgresql.org. I'd like pgbench at least 
to be

in the green, but it does not look that it will ever be the case.



The reason why the first iteration failed was that it was insufficiently
insensitive to timing.


This patch has been marked Returned with Feedback.

If the TAP tests could be made to work without the special exceptions 
added to pgbench.c I think this patch would have a better chance.


Regards,
--
-David
da...@pgmasters.net




Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Alexander Korotkov
On Wed, Apr 8, 2020 at 3:43 PM Andres Freund  wrote:
> Realistically it still 2-3 hours of proof-reading.
>
> This makes me sad :(

Can we ask RMT to extend feature freeze for this particular patchset?
I think it's reasonable assuming extreme importance of this patchset.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Robert Haas
On Tue, Apr 7, 2020 at 4:27 PM Andres Freund  wrote:
> The main reason is that we want to be able to cheaply check the current
> state of the variables (mostly when checking a backend's own state). We
> can't access the "dense" ones without holding a lock, but we e.g. don't
> want to make ProcArrayEndTransactionInternal() take a lock just to check
> if vacuumFlags is set.
>
> It turns out to also be good for performance to have the copy for
> another reason: The "dense" arrays share cachelines with other
> backends. That's worth it because it allows to make GetSnapshotData(),
> by far the most frequent operation, touch fewer cache lines. But it also
> means that it's more likely that a backend's "dense" array entry isn't
> in a local cpu cache (it'll be pulled out of there when modified in
> another backend). In many cases we don't need the shared entry at commit
> etc time though, we just need to check if it is set - and most of the
> time it won't be.  The local entry allows to do that cheaply.
>
> Basically it makes sense to access the PGPROC variable when checking a
> single backend's data, especially when we have to look at the PGPROC for
> other reasons already.  It makes sense to look at the "dense" arrays if
> we need to look at many / most entries, because we then benefit from the
> reduced indirection and better cross-process cacheability.

That's a good explanation. I think it should be in the comments or a
README somewhere.
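
For illustration, the two copies described above can be sketched like this
(made-up names, not the real PGPROC/ProcGlobal layout): each backend's flag
lives both in its own per-backend struct, for cheap lock-free reads of its
own state, and in a dense shared array, for cache-friendly scans over all
backends.

#include <stdio.h>

#define MAX_BACKENDS 4

typedef struct Backend
{
	int			index;			/* position in the dense array */
	int			vacuum_flags;	/* local copy: read without a lock */
} Backend;

typedef struct SharedState
{
	int			vacuum_flags[MAX_BACKENDS]; /* dense copy: scanned by everyone */
} SharedState;

static void
set_flags(SharedState *shared, Backend *me, int flags)
{
	/* in the real thing the dense copy would only change while holding a lock */
	me->vacuum_flags = flags;
	shared->vacuum_flags[me->index] = flags;
}

int
main(void)
{
	SharedState shared = {{0}};
	Backend		me = {.index = 2};

	set_flags(&shared, &me, 1);

	/* own-state check: no need to touch the shared array */
	printf("my flags: %d\n", me.vacuum_flags);

	/* snapshot-style scan: touches only the dense array */
	for (int i = 0; i < MAX_BACKENDS; i++)
		printf("backend %d flags: %d\n", i, shared.vacuum_flags[i]);
	return 0;
}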

> How about:
> /*
>  * If the current xactCompletionCount is still the same as it was at 
> the
>  * time the snapshot was built, we can be sure that rebuilding the
>  * contents of the snapshot the hard way would result in the same 
> snapshot
>  * contents:
>  *
>  * As explained in transam/README, the set of xids considered running 
> by
>  * GetSnapshotData() cannot change while ProcArrayLock is held. 
> Snapshot
>  * contents only depend on transactions with xids and 
> xactCompletionCount
>  * is incremented whenever a transaction with an xid finishes (while
>  * holding ProcArrayLock) exclusively). Thus the xactCompletionCount 
> check
>  * ensures we would detect if the snapshot would have changed.
>  *
>  * As the snapshot contents are the same as before, it is safe
>  * to re-enter the snapshot's xmin into the PGPROC array. None of the 
> rows
>  * visible under the snapshot could already have been removed (that'd
>  * require the set of running transactions to change) and it fulfills 
> the
>  * requirement that concurrent GetSnapshotData() calls yield the same
>  * xmin.
>  */

That's nice and clear.
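
For illustration, the reuse rule quoted above boils down to a
generation-counter check; a minimal standalone sketch, with made-up names
rather than the actual snapshot structures:

#include <stdint.h>
#include <stdio.h>

typedef struct Snapshot
{
	uint64_t	built_at;		/* counter value when this was built */
	int			xmin;			/* stand-in for the real contents */
} Snapshot;

/* bumped whenever a transaction with an xid commits or aborts */
static uint64_t shared_completion_count = 0;

static void
get_snapshot(Snapshot *snap)
{
	if (snap->built_at == shared_completion_count)
	{
		/* nothing with an xid finished since last time: contents unchanged */
		printf("reused cached snapshot\n");
		return;
	}
	/* ... the expensive rebuild would go here ... */
	snap->built_at = shared_completion_count;
	printf("rebuilt snapshot\n");
}

int
main(void)
{
	Snapshot	s = {.built_at = UINT64_MAX};	/* force the first build */

	get_snapshot(&s);			/* rebuilt */
	get_snapshot(&s);			/* reused  */
	shared_completion_count++;	/* some transaction with an xid completed */
	get_snapshot(&s);			/* rebuilt */
	return 0;
}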

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allow auto_explain to log plans before queries are executed

2020-04-08 Thread David Steele

On 3/5/20 8:46 AM, Julien Rouhaud wrote:

On Thu, Feb 27, 2020 at 7:31 AM Julien Rouhaud  wrote:


On Thu, Feb 27, 2020 at 7:12 AM Pavel Stehule  wrote:


čt 27. 2. 2020 v 7:01 odesílatel Yugo NAGATA  napsal:

  I think "query debugger" feature you proposed is out of scope of
auto_explain module. I also think the feature to analyze running
query online is great, but we will need another discussion on a new
module or eature for it.



sure. My note was about using auto_explain like query_debugger. It has not too 
sense, and from this perspective, the original proposal to log plan before 
execution has more sense.

you can log every plan with higher cost than some constant.


Yes I thought about that too.  If you're not in an OLAP environment
(or with a specific user running few expensive queries), setup an
auto_explain.log_before_execution_min_cost.


There was some discussion but no clear consensus on what should really
be done.  I'm marking the patch as waiting on author which seems more
accurate.  Feel free to switch it back if that's a wrong move.


There does not seem to be any progress towards a consensus so I'm marking 
this Returned with Feedback.


Regards,
--
-David
da...@pgmasters.net




Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Jonathan S. Katz
On 4/8/20 8:59 AM, Alexander Korotkov wrote:
> On Wed, Apr 8, 2020 at 3:43 PM Andres Freund  wrote:
>> Realistically it still 2-3 hours of proof-reading.
>>
>> This makes me sad :(
> 
> Can we ask RMT to extend feature freeze for this particular patchset?
> I think it's reasonable assuming extreme importance of this patchset.

One of the features of RMT responsibilities[1] is to be "hands off" as
much as possible, so perhaps a reverse ask: how would people feel about
this patch going into PG13, knowing that the commit would come after the
feature freeze date?

My 2¢, with RMT hat off:

As mentioned earlier[2], we know that connection scalability is a major
pain point with PostgreSQL and any effort that can help alleviate that
is a huge win, even in incremental gains. Andres et al experimentation
show that this is more than incremental gains, and will certainly make a
huge difference in people's PostgreSQL experience. It is one of those
features where you can "plug in and win" -- you get a performance
benefit just by upgrading. That is not insignificant.

However, I also want to ensure that we are fair: in the past there have
also been other patches that have been "oh-so-close" to commit before
feature freeze but have not made it in (an example escapes me at the
moment). Therefore, we really need to have consensus among ourselves
that the right decision is to allow this to go in after feature freeze.

Did this come in (very) late into the development cycle? Yes, and I
think normally that's enough to give cause for pause. But I could also
argue that Andres is fixing a "bug" with PostgreSQL (probably several
bugs ;) with PostgreSQL -- and perhaps the fixes can't be backpatched
per se, but they do improve the overall stability and usability of
PostgreSQL and it'd be a shame if we have to wait on them.

Lastly, with the ongoing world events, perhaps time that could have been
dedicated to this and other patches likely affected their completion. I
know most things in my life take way longer than they used to (e.g.
taking out the trash/recycles has gone from a 15s to 240s routine). The
same could be said about other patches as well, but this one has a far
greater impact (a double-edged sword, of course) given it's a feature
that everyone uses in PostgreSQL ;)

So with my RMT hat off, I say +1 to allowing this post feature freeze,
though within a reasonable window.

My 2¢, with RMT hat on:

I believe in[2] I outlined a way a path for including the patch even at
this stage in the game. If it is indeed committed, I think we
immediately put it on the "Recheck a mid-Beta" list. I know it's not as
trivial to change as something like "Determine if jit="on" by default"
(not picking on Andres, I just remember that example from RMT 11), but
it at least provides a notable reminder that we need to ensure we test
this thoroughly, and point people to really hammer it during beta.

So with my RMT hat on, I say +0 but with a ;)

Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/Release_Management_Team#History
[2]
https://www.postgresql.org/message-id/6be8c321-68ea-a865-d8d0-50a3af616463%40postgresql.org



signature.asc
Description: OpenPGP digital signature


Re: NOT IN subquery optimization

2020-04-08 Thread David Steele

On 3/26/20 4:58 PM, Li, Zheng wrote:

 >BTW, so far as I can see, the only reason you're bothering with the whole
 thing is to compare the size of the subquery output with work_mem, because
 that's all that subplan_is_hashable does.  I wonder whether that
 consideration is even still necessary in the wake of 1f39bce02.  If it is,
 I wonder whether there isn't a cheaper way to figure it out.  (Note
 similar comment in make_subplan.)

 The comment in make_subplan says there is no cheaper way to figure out:
 /* At present, however, we can only check hashability after
  * we've made the subplan :-(.  (Determining whether it'll fit in work_mem
  * is the really hard part.)
  */

 I don't see why commit 1f39bce02 is related to this problem. Can you 
expand on this?
 
 >But can't you detect that case directly?  It seems like you'd need to
 figure out the NULL situation anyway to know whether the transformation
 to antijoin is valid in the first place.
 
 Yes, we do need to figure out the NULL situation, and there is always a valid
 transformation to antijoin; it's just that in the NULL case we need to stuff an
 additional clause into the anti join condition, and in these cases the
 transformation actually outperforms Subplan (non-hashed) but underperforms the
 hashed Subplan. The unmodified anti hash join has similar performance compared
 to the hashed Subplan.


There seem to be enough questions about this implementation that I think 
it makes sense to mark this patch Returned with Feedback.


Feel free to resubmit it to a future CF when there is more of a 
consensus on the implementation.


Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarrassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 9:27 AM Jonathan S. Katz  wrote:
> One of the features of RMT responsibilities[1] is to be "hands off" as
> much as possible, so perhaps a reverse ask: how would people feel about
> this patch going into PG13, knowing that the commit would come after the
> feature freeze date?

Letting something be committed after feature freeze, or at any other
time, is just a risk vs. reward trade-off. Every patch carries some
chance of breaking stuff or making things worse. And every patch has a
chance of making something better that people care about.

On general principle, I would categorize this as a moderate-risk
patch. It doesn't change SQL syntax like, e.g. MERGE, nor does it
touch the on-disk format, like, e.g. INSERT .. ON CONFLICT UPDATE. The
changes are relatively localized, unlike, e.g. parallel query. Those
are all things that reduce risk. On the other hand, it's a brand new
patch which has not been thoroughly reviewed by anyone. Moreover,
shakedown time will be minimal because we're so late in the release
cycle. If it has subtle synchronization problems or if it regresses
performance badly in some cases, we might not find out about any of
that until after release. While in theory we could revert it any time,
since no SQL syntax or on-disk format is affected, in practice it will
be difficult to do that if it's making life better for some people and
worse for others.

I don't know what the right thing to do is. I agree with everyone who
says this is a very important problem, and I have the highest respect
for Andres's technical ability. On the other hand, I have been around
here long enough to know that deciding whether to allow late commits
on the basis of how much we like the feature is a bad plan, because it
takes into account only the upside of a commit, and ignores the
possible downside risk. Typically, the commit is late because the
feature was rushed to completion at the last minute, which can have an
effect on quality. I can say, having read through the patches
yesterday, that they don't suck, but I can't say that they're fully
correct. That's not to say that we shouldn't decide to take them, but
it is a concern to be taken seriously. We have made mistakes before in
what we shipped that had serious implications for many users and for
the project; we should all be wary of making more such mistakes. I am
not trying to say that solving problems and making stuff better is NOT
important, just that every coin has two sides.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP/PoC for parallel backup

2020-04-08 Thread Kashif Zeeshan
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman  wrote:

> Hi,
>
> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>
> I have added the shared state as previously described. The new grammar
> changes
> are as follows:
>
> START_BACKUP [LABEL ''] [FAST] [MAX_RATE %d]
> - This will generate a unique backupid using pg_strong_random(16) and
> hex-encoded
>   it. which is then returned as the result set.
> - It will also create a shared state and add it to the hashtable. The
> hash table size is set
>   to BACKUP_HASH_SIZE=10, but since hashtable can expand dynamically,
> I think it's
>   sufficient initial size. max_wal_senders is not used, because it can
> be set to quite a
>   large values.
>
> JOIN_BACKUP 'backup_id'
> - finds 'backup_id' in hashtable and attaches it to server process.
>
>
> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
> - renamed SEND_FILES to SEND_FILE
> - removed START_WAL_LOCATION from this because 'startptr' is now
> accessible through
>   shared state.
>
> There is no change in other commands:
> STOP_BACKUP [NOWAIT]
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>
> The current patches (v11) have been rebased to the latest master. The
> backup manifest is enabled
> by default, so I have disabled it for parallel backup mode and have
> generated a warning so that
> user is aware of it and not expect it in the backup.
>
Hi Asif,

I have verified the bug fixes; one bug is fixed and now works as expected.

While verifying the other bug fixes I faced the following issues, please
have a look.


1) The bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added only a description, as the steps
for each bug I tried to verify were given in previous emails. A backtrace is
also added for each case, and it points to the same bug in both cases.

a) The backup failed with errors "error: could not connect to server: could
not look up local user ID 1000: Too many open files" when the
max_wal_senders was set to 2000.


[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
 /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/228 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$


[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup
/tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from
/home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D
/home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x0040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x00403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x7f2226f76a49 in __run_exit_handlers (status=1,
listp=0x7f22272f86c8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true)
at exit.c:77
#4  0x7f2226f76a95 in __GI_exit (status=) at exit.c:99
#5  0x00408c54 in create_parallel_workers (backupinfo=0x952ca0) at
pg_basebackup.c:2811
#6  0x0040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x00408b4d in main (argc=6, argv=0x7ffe3dabc718) at
pg_basebackup.c:2765
(gdb)




b) When executing two backups at the same time, getting FATAL error due to
max_wal_senders and instead of exit  Backup g

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> >On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >>hyrax is not too happy with this test:
> >>
> >>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
> >>
> >>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >>EXPLAIN output, but it evidently is.
> >>
> >
> >Thanks, I'll investigate. It's not clear to me either what might be
> >causing this, but I guess something must have gone wrong in
> >estimation/planning.
> >
>
> OK, I know what's going on - it's a rather embarassing issue in the
> regression test. There's no analyze on the test tables, so it uses
> default estimates for number of groups etc. But with clobber cache the
> test runs long enough for autoanalyze to kick in and collect stats, so
> we generate better estimates which changes the plan.
>
> I'll get this fixed - explicit analyze and tweaking the data a bit
> should do the trick.

Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.

James




Re: Resume vacuum and autovacuum from interruption and cancellation

2020-04-08 Thread David Steele

On 2/28/20 8:56 AM, Masahiko Sawada wrote:


According to those results, it's thought that the more we resume
vacuum from the tail of the table, the efficiency is good. Since the
table is being updated uniformly even during autovacuum it was more
efficient to restart autovacuum from last position rather than from
the beginning of the table. I think that results shows somewhat the
benefit of this patch but I'm concerned that it might be difficult for
users when to use this option. In practice the efficiency completely
depends on the dispersion of updated pages, and that test made pages
dirty uniformly, which is not a common situation. So probably if we
want this feature, I think we should automatically enable resuming
when we can basically be sure that resuming is better. For example, we
remember both the last vacuumed block and how many vacuum-able pages
seems to exist from there, and we decide to resume vacuum if we can
expect to process more many pages.


I have to say I'm a bit confused by the point of this patch. I get that 
starting in progress is faster but that's only true because the entire 
table is not being vacuumed?


If as you say:

> If we start to vacuum from not first block, we can update neither
> relfrozenxid nor relfrozenxmxid. And we might not be able to update
> even relation statistics.

Then we'll still need to vacuum the entire table before we can be sure 
the oldest xid has been removed/frozen. If we could do those updates on 
a resume then that would change my thoughts on the feature a lot.


What am I missing?

I'm marking this Returned with Feedback due to concerns expressed up-thread 
(and mine) and because the patch has been Waiting on Author for nearly 
the entire CF.


Regards,
--
-David
da...@pgmasters.net




Re: adding partitioned tables to publications

2020-04-08 Thread Amit Langote
On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
 wrote:
> On 2020-04-08 13:16, Amit Langote wrote:
> > On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
> >  wrote:
> >> All committed.
> >>
> >> Thank you and everyone very much for working on this.  I'm very happy
> >> that these two features from PG10 have finally met. :)
> >
> > Thanks a lot for reviewing and committing.
> >
> > prion seems to have failed:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13
>
> This comes from -DRELCACHE_FORCE_RELEASE.

I'm seeing some funny stuff on such a build locally too, although
haven't been able to make sense of it yet.

> > Also, still unsure why the coverage report for pgoutput.c changes not good:
> > https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html
>
> I think this is because the END { } section in PostgresNode.pm shuts
> down all running instances in immediate mode, which doesn't save
> coverage properly.

Thanks for that tip.  Appending the following at the end of the test
file has fixed the coverage reporting for me.

I noticed the following coverage issues:

1. The previous commit f1ac27bfd missed a command that I had included
to cover the following blocks of apply_handle_tuple_routing():

1165    : else
1166    : {
1167  0 :     remoteslot = ExecCopySlot(remoteslot, remoteslot_part);
1168  0 :     slot_getallattrs(remoteslot);
1169    : }
...

1200  2 : if (map != NULL)
1201    : {
1202  0 :     remoteslot_part = execute_attr_map_slot(map->attrMap,
1203    :                                             remoteslot,
1204    :                                             remoteslot_part);
1205    : }

2. Now that I am able to see proper coverage for the
publish_via_partition_root related changes, I can see that a block in
pgoutput_change() is missing coverage.

The attached fixes these coverage issues.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com


logicalrep-partition-test-coverage-fixes.patch
Description: Binary data


Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-08 Thread Juan José Santamaría Flecha
On Wed, Apr 8, 2020 at 9:03 AM davinder singh 
wrote:

> On Tue, Apr 7, 2020 at 8:30 PM Juan José Santamaría Flecha <
> juanjo.santama...@gmail.com> wrote:
>
>>
>> * The logic on "defined(_MSC_VER) && (_MSC_VER >= 1900)" is defined as
>> "_WIN32_WINNT >= 0x0600" on other parts of the code. I would
>> recommend using the later.
>>
> I think  "_WIN32_WINNT >= 0x0600" represents windows versions only and
> doesn't include any information about Visual Studio versions. So I am
> sticking to " defined(_MSC_VER) && (_MSC_VER >= 1900)".
>

Let me explain further: in pg_config_os.h you can check that the value of
_WIN32_WINNT is solely based on checking _MSC_VER. This patch should also
be meaningful for WIN32 builds using MinGW, or we might see this issue
reappear in those systems if we update the MIN_WINNT value to more current
OS versions. So, I still think _WIN32_WINNT is a better option.
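
For readers following along, the difference between the two guard styles in
plain C preprocessor terms is roughly this (illustrative only, not code from
the tree):

/*
 * _MSC_VER identifies the compiler (Visual Studio 2015 and later report
 * 1900 or higher), while _WIN32_WINNT identifies the minimum Windows
 * version a build targets (0x0600 is Vista/Server 2008) and is also set
 * for MinGW builds, not just MSVC ones.
 */
#if defined(_MSC_VER) && (_MSC_VER >= 1900)
#define HAVE_VS2015_OR_LATER 1		/* true only when building with MSVC >= 2015 */
#endif

#if defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0600)
#define TARGETS_VISTA_OR_LATER 1	/* true for any compiler targeting Vista+ */
#endif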

I have resolved other comments. I have attached a new version of the patch.
>

I still see the same last lines in both #ifdef blocks, and pgindent might
change a couple of lines to:
+ MultiByteToWideChar(CP_ACP, 0, winlocname, -1, wc_locale_name,
+ LOCALE_NAME_MAX_LENGTH);
+
+ if ((GetLocaleInfoEx(wc_locale_name, LOCALE_SNAME,
+ (LPWSTR)&buffer, LOCALE_NAME_MAX_LENGTH)) > 0)
+ {

But that is pretty trivial stuff.

Please open an item in the commitfest for this patch.

Regards,

Juan José Santamaría Flecha


Re: adding partitioned tables to publications

2020-04-08 Thread Amit Langote
On Wed, Apr 8, 2020 at 11:07 PM Amit Langote  wrote:
> On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
>  wrote:
> > I think this is because the END { } section in PostgresNode.pm shuts
> > down all running instances in immediate mode, which doesn't save
> > coverage properly.
>
> Thanks for that tip.  Appending the following at the end of the test
> file has fixed the coverage reporting for me.

The patch posted in the previous email has it, but I meant this by
"the following":

+
+$node_publisher->stop('fast');
+$node_subscriber1->stop('fast');
+$node_subscriber2->stop('fast');

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Online checksums patch - once again

2020-04-08 Thread David Steele

On 4/1/20 11:30 AM, David Steele wrote:

On 1/18/20 6:18 PM, Daniel Gustafsson wrote:


Attached is a v16 rebased on top of current master which addresses the 
above

commented points, and which I am basing the concurrency work on.


This patch no longer applies cleanly: 
http://cfbot.cputube.org/patch_27_2260.log


The CF entry has been updated to Waiting on Author.

Regards,


There has been review on this patch but no updates in some time. As far 
as I can see there's debate on how to mark relations as fully 
checksummed and/or how to resume.


I'm marking this patch Returned with Feedback. Please feel free to 
resubmit when it is again ready for review.


Regards,
--
-David
da...@pgmasters.net




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Mahendra Singh Thalor
On Wed, 8 Apr 2020 at 17:59, Robert Haas  wrote:
>
> On Wed, Apr 8, 2020 at 8:22 AM tushar  wrote:
> > I just came across this scenario  where - vaccum o/p with (full 1,
> > parallel 0) option not working
> >
> > --working
> >
> > postgres=# vacuum (parallel 1, full 0 ) foo;
> > VACUUM
> > postgres=#
> >
> > --Not working
> >
> > postgres=# vacuum (full 1, parallel 0 ) foo;
> > ERROR:  cannot specify both FULL and PARALLEL options
> >
> > I think it should work.
>
> Uh, why? There's a clear error message which matches what you tried to do.
>

I think Tushar's point is that we should either allow both
vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or throw an
error in both cases.
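
To make the distinction concrete, here is a minimal sketch of the behaviour
being asked for (hypothetical function and argument names, not the actual
vacuum.c code), where an explicit "parallel 0" simply means zero workers and
so would no longer conflict with FULL:

#include <stdbool.h>
#include <stdio.h>

static void
check_options(bool full, bool parallel_given, int nworkers)
{
	if (full && parallel_given && nworkers > 0)
		printf("ERROR:  cannot specify both FULL and PARALLEL options\n");
	else
		printf("VACUUM\n");
}

int
main(void)
{
	check_options(false, true, 1);	/* vacuum (parallel 1, full 0): accepted today */
	check_options(true, true, 0);	/* vacuum (full 1, parallel 0): accepted in this sketch */
	return 0;
}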

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Verify true root on replicas with amcheck

2020-04-08 Thread David Steele

On 1/16/20 7:40 PM, Peter Geoghegan wrote:

On Thu, Jan 9, 2020 at 12:55 AM godjan •  wrote:


I heard that amcheck has an invariant about locking no more than 1 page at a 
moment for avoiding deadlocks. Is there possible a deadlock situation?


This is a conservative principle that I came up with when I wrote the
original version of amcheck. It's not strictly necessary, but it
seemed like a good idea. It should be safe to "couple" buffer locks in
a way that matches the B-Tree code -- as long as it is thought through
very carefully. I am probably going to relax the rule for one specific
case soon -- see:

https://postgr.es/m/f7527087-6e95-4077-b964-d2cafef62...@yandex-team.ru

Your patch looks like it gets it right (it won't deadlock with other
sessions that access the metapage), but I hesitate to commit it
without a strong justification. Acquiring multiple buffer locks
concurrently is worth avoiding wherever possible.


I have marked this patch Returned with Feedback since it has been 
sitting for a while with no response from the author.


Regards,
--
-David
da...@pgmasters.net




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-04-08 Thread Julien Rouhaud
On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
 wrote:
>
> Hi Julien,
>
> On 2020/04/02 22:25, Julien Rouhaud wrote:
> > New conflict, rebased v9 attached.
>
> I tested the patch on the head (c7654f6a3) and
> the result was fine. See below:
>
> $ make installcheck-world
> =
>   All 1 tests passed.
> =

Thanks Yamada-san!  Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Kuntal Ghosh
Hello Ashutosh, Fujita,

On Wed, Apr 8, 2020 at 3:49 PM Ashutosh Bapat
 wrote:
> On Wed, 8 Apr 2020 at 15:42, Etsuro Fujita  wrote:
>> On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh  
>> wrote:
>> > I'm getting the following warning during compilation.
>> >
>> > partbounds.c: In function ‘partition_bounds_merge’:
>> > partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
>> > [-Wunused-variable]
>> >   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>> >  ^
>> > For fixing the same, we can declare inner_binfo as
>> > PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>>
>> I'd propose to remove an assertion causing this (and  the
>> outer_binfo/inner_binfo variables) from partition_bounds_merge(),
>> rather than doing so, because the assertion is redundant, as we have
>> the same assertion in merge_list_bounds() and merge_range_bounds().
>> Please find attached a patch.
>
>
> I think it's better to have the assertion in all the three places and also in 
> merge_hash_bounds() whenever that comes along. The assertion in 
> merge_*_bounds() will be good to in case those functions are called from 
> places other than partition_bounds_merge(). The assertion in 
> partition_bounds_merge() will make sure that when the individual 
> merge_*_bounds() functions are called based on one of the bounds both of the 
> bounds have same strategy.

Both of your patches fix the problem. I don't have much exposure in
this area to comment on whether we should keep/remove the assertion
from the code. But, here is my opinion:

The code structure looks like following:
Assert(condition A);
if (Condition B)
merge_*_bounds();

Inside merge_*_bounds(), you have both the above assert and the if
condition as another assert:
Assert(condition A and Condition B);

And, merge_*_bounds() are called from only one place. So, something is
redundant here and I'm inclined towards removal of the assert
condition. Another thing I noticed:

/* The partitioning strategies should be the same. */
Assert(outer_binfo->strategy == inner_binfo->strategy);

The comment just reads the assertion aloud which looks unnecessary.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarrassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.



I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/test/regress/expected/incremental_sort.out 
b/src/test/regress/expected/incremental_sort.out
index 3072d95643..fb4ab95922 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -141,7 +141,8 @@ begin
 end;
 $$;
 -- A single large group tested around each mode transition point.
-insert into t(a, b) select 1, i from generate_series(1, 100) n(i);
+insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 31;
QUERY PLAN
 -
@@ -456,7 +457,8 @@ select * from (select * from t order by a) s order by a, b 
limit 66;
 
 delete from t;
 -- An initial large group followed by a small group.
-insert into t(a, b) select (case when i < 50 then 1 else 2 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select i/50 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 55;
QUERY PLAN
 -
@@ -521,7 +523,7 @@ select * from (select * from t order by a) s order by a, b 
limit 55;
  1 | 47
  1 | 48
  1 | 49
- 2 | 50
+ 1 | 50
  2 | 51
  2 | 52
  2 | 53
@@ -538,10 +540,10 @@ select explain_analyze_without_memory('select * from 
(select * from t order by a
  Sort Key: t.a, t.b
  Presorted Key: t.a
  Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: 
avg=NNkB peak=NNkB
- ->  Sort (actual rows=100 loops=1)
+ ->  Sort (actual rows=101 loops=1)
Sort Key: t.a
Sort Method: quicksort  Memory: NNkB
-   ->  Seq Scan on t (actual rows=100 loops=1)
+   ->  Seq Scan on t (actual rows=1000 loops=1)
 (9 rows)
 
 select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * 
from (select * from t order by a) s order by a, b limit 55'));
@@ -584,7 +586,8 @@ select 
explain_analyze_inc_sort_nodes_verify_invariants('select * from (select *
 
 delete from t;
 -- An initial small group followed by a large group.
-insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 1000) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 70;
QUERY PLAN
 -
@@ -705,17 +708,17 @@ select * from t left join (select * from (select * from t 
order by a) v order by
 rollback;
 -- Test EXPLAIN ANALYZE with both fullsort and presorted groups.
 select explain_analyze_without_memory('select * from (select * from t order by 
a) s order by a, b limit 70');
-   
explain_analyze_without_memory  
  
--

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
> >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
> >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
> >> wrote:
> >>>
> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >hyrax is not too happy with this test:
> >
> >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
> >
> >It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >EXPLAIN output, but it evidently is.
> >
> 
> Thanks, I'll investigate. It's not clear to me either what might be
> causing this, but I guess something must have gone wrong in
> estimation/planning.
> 
> >>>
> >>>OK, I know what's going on - it's a rather embarrassing issue in the
> >>>regression test. There's no analyze on the test tables, so it uses
> >>>default estimates for number of groups etc. But with clobber cache the
> >>>test runs long enough for autoanalyze to kick in and collect stats, so
> >>>we generate better estimates which changes the plan.
> >>>
> >>>I'll get this fixed - explicit analyze and tweaking the data a bit
> >>>should do the trick.
> >>
> >>Looking at the tests that failed, I think we should consider just adding:
> >>set enable_sort = off;
> >>because several of those tests have very specific amounts of data to
> >>ensure we test the transition points around the different modes in the
> >>incremental sort node.
> >>
> >
> >Maybe, but I'd much rather tweak the data so that we test both the
> >costing and execution part.
> >
>
> I do think this does the trick by increasing the number of rows a bit
> (from 100 to 1000) to make the Sort more expensive than Incremental
> Sort, while still testing the transition points.
>
> James, can you verify it that's still true?

Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".

I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...

Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?

James




Re: Conflict handling for COPY FROM

2020-04-08 Thread David Steele
On Fri, Mar 27, 2020 at 3:27 PM Tom Lane  wrote:


I took a quick look at this patch, since it was marked "ready for
committer", but I don't see how it can possibly be considered
committable.


Based on Tom's review I've marked this patch Returned with Feedback.

If there's an acceptable implementation, it seems that it would be very 
different from what we have here.


Regards,
--
-David
da...@pgmasters.net




DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Justin Pryzby
This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH ROW"
was first allowed on partition tables (86f575948).

I thought this would work like partitioned indexes (8b08f7d48), where detaching
a partition makes its index non-inherited, and attaching a partition marks a
pre-existing, matching partition as inherited rather than creating a new one.

DROP TABLE t, t1;
CREATE TABLE t(i int)PARTITION BY RANGE(i);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM(1)TO(2);
CREATE OR REPLACE FUNCTION trigf() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN 
END $$;
CREATE TRIGGER trig AFTER INSERT ON t FOR EACH ROW EXECUTE FUNCTION trigf();
SELECT tgrelid::regclass, * FROM pg_trigger WHERE tgrelid='t1'::regclass;
ALTER TABLE t DETACH PARTITION t1;
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
ERROR:  trigger "trig" for relation "t1" already exists

DROP TRIGGER trig ON t1;
ERROR:  cannot drop trigger trig on table t1 because trigger trig on table t 
requires it
HINT:  You can drop trigger trig on table t instead.

I remember these, but they don't seem to be relevant to this issue, which seems
to be independent.

1fa846f1c9 Fix cloning of row triggers to sub-partitions
b9b408c487 Record parents of triggers

The commit for partitioned indexes talks about using an pre-existing index on
the child as a "convenience gadget", puts indexes into pg_inherit, and
introduces "ALTER INDEX..ATTACH PARTITION" and "CREATE INDEX..ON ONLY".

It's probably rare for a duplicate index to be useful (unless rebuilding to be
more optimal, which is probably not reasonably interspersed with altering
inheritance).  But I don't know if that's equally true for triggers.  So I'm
not sure what the intended behavior is, and I've stopped after implementing
a partial fix.
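
For comparison, the partitioned-index behavior I'm referring to looks
roughly like this (illustrative names, not part of the patch):

CREATE TABLE pt(i int) PARTITION BY RANGE(i);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
CREATE INDEX pt_i_idx ON pt(i);      -- cascades to pt1
ALTER TABLE pt DETACH PARTITION pt1; -- pt1 keeps its index, now independent
ALTER TABLE pt ATTACH PARTITION pt1 FOR VALUES FROM (1) TO (2);
                                     -- pre-existing matching index is reused
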
>From 2c31cac22178d904ee108b77f316886d1e2f6288 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 Apr 2020 22:43:26 -0500
Subject: [PATCH] WIP: fix detaching tables with inherited triggers

---
 src/backend/commands/tablecmds.c | 33 
 1 file changed, 33 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 037d457c3d..10a60e158f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16797,6 +16797,39 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 	}
 	table_close(classRel, RowExclusiveLock);
 
+	/* detach triggers too */
+	{
+		/* XXX: relcache.c */
+		ScanKeyData skey;
+		SysScanDesc	scan;
+		HeapTuple	trigtup;
+		Relation	tgrel = table_open(TriggerRelationId, RowExclusiveLock);
+
+		ScanKeyInit(&skey, Anum_pg_trigger_tgrelid, BTEqualStrategyNumber,
+F_OIDEQ, ObjectIdGetDatum(RelationGetRelid(partRel)));
+
+		scan = systable_beginscan(tgrel, TriggerRelidNameIndexId,
+true, NULL, 1, &skey);
+
+		while (HeapTupleIsValid(trigtup = systable_getnext(scan)))
+		{
+			Form_pg_trigger pg_trigger;
+			trigtup = heap_copytuple(trigtup);  /* need a modifiable copy */
+			pg_trigger = (Form_pg_trigger) GETSTRUCT(trigtup);
+			/* Set the trigger's parent to Invalid */
+			if (!OidIsValid(pg_trigger->tgparentid))
+continue;
+			if (!pg_trigger->tgisinternal)
+continue;
+			pg_trigger->tgparentid = InvalidOid;
+			pg_trigger->tgisinternal = false;
+			CatalogTupleUpdate(tgrel, &trigtup->t_self, trigtup);
+			heap_freetuple(trigtup);
+		}
+		systable_endscan(scan);
+		table_close(tgrel, RowExclusiveLock);
+	}
+
 	/*
 	 * Detach any foreign keys that are inherited.  This includes creating
 	 * additional action triggers.
-- 
2.17.0



Re: [PATCH] Incremental sort

2020-04-08 Thread David Steele

On 4/8/20 11:13 AM, James Coleman wrote:


James, can you verify it that's still true?


I marked this entry as committed in the 2020-03 CF but it's not clear to 
me if that's entirely true. I'll leave it up to you (all) to move it to 
the 2020-07 CF if there is remaining work (other than making the build 
farm happy).


Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tom Lane
James Coleman  writes:
> Should we change `analyze` to `analyze t` to avoid unnecessarily
> re-analyzing all other tables in the regression db?

Yes, a global analyze here is a remarkably horrid idea.

regards, tom lane




Re: [PATCH] Incremental sort

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 11:29 AM David Steele  wrote:
>
> On 4/8/20 11:13 AM, James Coleman wrote:
> >>
> >> James, can you verify it that's still true?
>
> I marked this entry as committed in the 2020-03 CF but it's not clear to
> me if that's entirely true. I'll leave it up to you (all) to move it to
> the 2020-07 CF if there is remaining work (other than making the build
> farm happy).

Thanks.

I think it's true enough. The vast majority is committed, and the
small amount that isn't we'll leave as future improvements and
separate threads.

James




Re: Function to track shmem reinit time

2020-04-08 Thread David Steele

On 2/24/20 10:57 PM, Robert Haas wrote:

On Sat, Feb 22, 2020 at 10:31 PM Tom Lane  wrote:

I'm still going to object to it, on the grounds that

(1) it's exposing an implementation detail that clients should not be
concerned with, and that we might change in future.  The name isn't
even well chosen --- if the argument is that it's useful to monitor
server uptime, why isn't the name "pg_server_uptime"?

(2) if your server is crashing often enough that postmaster start
time isn't an adequate substitute, you have way worse problems than
whether you can measure it.  I'd rather see us put effort into
fixing whatever the underlying problem is.


I don't accept the first argument, because I think the chances that
we'll change our basic approach to this problem are indistinguishable
from zero.  A few years back, you criticized some idea of mine as
tantamount to rm -rf src/backend/optimizer, which you were not ready
to endorse. That proposal was surely vastly less ambitious than some
proposal which would remove the idea of shared memory reinitialization
after an unsafe backend exit.

I don't accept the second argument either, because it's a false
dichotomy. Adding this function won't reduce the amount of energy that
gets spent fixing crashes. There are always more crashes.

I realize that several other people voted against this proposal so I
don't think it's OK to commit a patch in this area unless some more
positive votes turn up, but I'm still +1 on the idea. Granted, we
don't want an infinite amount of clutter in the system catalogs, but I
have a hard time believing that this proposed function would get any
less use than the hyperbolic trig functions.


Marking this Returned with Feedback again since we are still at an impasse.

Regards,
--
-David
da...@pgmasters.net




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
 wrote:
> I think Tushar's point is that either we should allow both
> vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in
> both cases we should throw an error.

Oh, yeah, good point. Somebody must not've been careful enough with
the options-checking code.
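
To make the asymmetry concrete (t is just a placeholder table; the error
text is the one the current check raises):

VACUUM (FULL 1, PARALLEL 0) t;  -- rejected: "cannot specify both FULL and PARALLEL options"
VACUUM (FULL 0, PARALLEL 1) t;  -- accepted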

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
>On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
>>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
>> wrote:
>>>
>>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>hyrax is not too happy with this test:
>
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
>
>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>EXPLAIN output, but it evidently is.
>

Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.

>>>
> >>>OK, I know what's going on - it's a rather embarrassing issue in the
>>>regression test. There's no analyze on the test tables, so it uses
>>>default estimates for number of groups etc. But with clobber cache the
>>>test runs long enough for autoanalyze to kick in and collect stats, so
>>>we generate better estimates which changes the plan.
>>>
>>>I'll get this fixed - explicit analyze and tweaking the data a bit
>>>should do the trick.
>>
>>Looking at the tests that failed, I think we should consider just adding:
>>set enable_sort = off;
>>because several of those tests have very specific amounts of data to
>>ensure we test the transition points around the different modes in the
>>incremental sort node.
>>
>
>Maybe, but I'd much rather tweak the data so that we test both the
>costing and execution part.
>

I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?


Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".



OK, good. I'll push the fix.


I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...



Yeah, but with 1000 rows we'd need a more complex CASE statement (I
don't think simply having two groups - small+large would work).
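
Spelling that out: the i/100 expression is standing in for what would
otherwise be a ten-branch CASE. A sketch (not the committed test):

insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i);
-- versus something like
insert into t(a, b)
  select case when i < 100 then 1 when i < 200 then 2 /* ... */ else 10 end,
         i + 1
  from generate_series(0, 999) n(i);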


Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?



Ah, definitely. That was a mistake. Thanks for noticing.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 11:42:12AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 11:29 AM David Steele  wrote:


On 4/8/20 11:13 AM, James Coleman wrote:
>>
>> James, can you verify it that's still true?

I marked this entry as committed in the 2020-03 CF but it's not clear to
me if that's entirely true. I'll leave it up to you (all) to move it to
the 2020-07 CF if there is remaining work (other than making the build
farm happy).


Thanks.

I think it's true enough. The vast majority is committed, and the
small amount that isn't we'll leave as future improvements and
separate threads.



Right. The remaining bit is a generic issue not entirely specific to
incremental sort, so we better not hide it in this CF entry.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Alvaro Herrera
On 2020-Apr-08, Justin Pryzby wrote:

> This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH 
> ROW"
> was first allowed on partition tables (86f575948).
> 
> I thought this would work like partitioned indexes (8b08f7d48), where 
> detaching
> a partition makes its index non-inherited, and attaching a partition marks a
> pre-existing, matching partition as inherited rather than creating a new one.

Hmm.  Let's agree to what behavior we want, and then we implement that.
It seems to me there are two choices:

1. on detach, keep the trigger but make it independent of the trigger on
parent.  (This requires that the trigger is made dependent on the
trigger on parent, if the table is attached as partition again;
otherwise you'd end up with multiple copies of the trigger if you
detach/attach multiple times).

2. on detach, remove the trigger from the partition.

I think (2) is easier to implement, but (1) is the more convenient
behavior.

(The current behavior is obviously a bug.)
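
In terms of the example upthread, the two options would behave roughly like
this (a sketch of the intended behavior, not of an implementation):

ALTER TABLE t DETACH PARTITION t1;
-- option 1: trigger "trig" stays on t1 as an independent trigger;
--           a later ATTACH links it back to the parent's trigger
-- option 2: trigger "trig" is removed from t1;
--           a later ATTACH clones it again from the parent
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1) TO (2);
-- either way, this should no longer fail with "trigger ... already exists"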

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




warning in partioning code

2020-04-08 Thread Erik Rijkers

Hi,

Compiling master on debian stretch, gcc 9.3.0 complains:

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
[-Wunused-variable]

 1024 |  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
  | ^~~

Maybe it can be fixed.

thanks,

Erik Rijkers




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>>hyrax is not too happy with this test:
>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
>>
>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>>EXPLAIN output, but it evidently is.
>>
>
>Thanks, I'll investigate. It's not clear to me either what might be
>causing this, but I guess something must have gone wrong in
>estimation/planning.
>

OK, I know what's going on - it's a rather embarrassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Let people set host(no)ssl settings from initdb

2020-04-08 Thread David Fetter
On Mon, Apr 06, 2020 at 10:12:16PM +, Cary Huang wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
> 
> Hi 
> I applied the patch 
> "v3-0001-Enable-setting-pg_hba.conf-permissions-from-initd.patch" and did 
> some verification with it. The intended feature works overall and I think it 
> is quite useful to support default auth methods for ssl and gss host types. I 
> have also found some minor things in the patch and would like to share as 
> below:
> 
> > +"# CAUTION: Configuring the system for \"trust\" authentication\n" \
> > +"# allows any user who can reach the databse on the route specified\n" \
> > +"# to connect as any PostgreSQL user, including the database\n" \
> > +"# superuser.  If you do not trust all the users who could\n" \
> > +"# reach the database on the route specified, use a more restrictive\n" \
> > +"# authentication method.\n"
> 
> Found a typo: should be 'database' instead of 'databse'

Fixed.

> >  * a sort of poor man's grep -v
> >  */
> > -#ifndef HAVE_UNIX_SOCKETS
> >  static char **
> >  filter_lines_with_token(char **lines, const char *token)
> >  {
> > @@ -461,7 +466,6 @@ filter_lines_with_token(char **lines, const char *token)
> >  
> > return result;
> > }
> > -#endif
> 
> I see that you have removed "#ifndef HAVE_UNIX_SOCKETS" around the
> filter_lines_with_token() function definition so it would be always
> available, which is used to remove the @tokens@ in case user does
> not specify a default auth method for the new hostssl, hostgss
> options. I think you should also remove the "#ifndef
> HAVE_UNIX_SOCKETS" around its declaration as well so both function
> definition and declaration would make sense.

Fixed.

Thanks very much for the review!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 3b49afec7f0b708285eadc10a097b4dc23d23927 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 11 Dec 2019 18:55:03 -0800
Subject: [PATCH v4] Enable setting pg_hba.conf permissions from initdb
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.25.2"

This is a multi-part message in MIME format.
--2.25.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


The new optional arguments are --auth-hostssl, --auth-hostnossl,
--auth-hostgssenc, and --auth-hostnogssenc.

diff --git doc/src/sgml/ref/initdb.sgml doc/src/sgml/ref/initdb.sgml
index a04a180165..ee65cf925d 100644
--- doc/src/sgml/ref/initdb.sgml
+++ doc/src/sgml/ref/initdb.sgml
@@ -154,6 +154,50 @@ PostgreSQL documentation
   
  
 
+ 
+  --auth-hostssl=authmethod
+  
+   
+This option specifies the authentication method for users via
+TLS remote connections used in pg_hba.conf
+(hostssl lines).
+   
+  
+ 
+
+ 
+  --auth-hostnossl=authmethod
+  
+   
+This option specifies the authentication method for users via
+non-TLS remote connections used in pg_hba.conf
+(hostnossl lines).
+   
+  
+ 
+
+ 
+  --auth-hostgssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+encrypted GSSAPI remote connections used in pg_hba.conf
+(hostgssenc lines).
+   
+  
+ 
+
+ 
+  --auth-hostnogssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+remote connections not using encrypted GSSAPI in pg_hba.conf
+(hostnogssenc lines).
+   
+  
+ 
+
  
   --auth-local=authmethod
   
diff --git src/backend/libpq/pg_hba.conf.sample src/backend/libpq/pg_hba.conf.sample
index c853e36232..7d5189dd8f 100644
--- src/backend/libpq/pg_hba.conf.sample
+++ src/backend/libpq/pg_hba.conf.sample
@@ -87,3 +87,15 @@ hostall all ::1/128 @authmethodhost@
 @remove-line-for-nolocal@local   replication all @authmethodlocal@
 hostreplication all 127.0.0.1/32@authmethodhost@
 hostreplication all ::1/128 @authmethodhost@
+
+# hostnossl all   all 0.0.0.0/0   @authmethodhostnossl@
+# hostnossl all   all ::/0@authmethodhostnossl@
+
+# hostssl all all 0.0.0.0/0   @authmethodhostssl@
+# hostssl all all ::/0@authmethodhostssl@
+
+# hostnogssenc all   all 0.0.0.0/0   @authmethodhostnogssenc@
+# hostnogssenc all   all ::/0 

Re: warning in partioning code

2020-04-08 Thread Etsuro Fujita
Hi Erik,

On Thu, Apr 9, 2020 at 1:07 AM Erik Rijkers  wrote:
> Compiling master on debian stretch, gcc 9.3.0 complains:
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
>   1024 |  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>| ^~~
>
> Maybe it can be fixed.

Yeah, that is a known issue [1].  I'll work on that tomorrow.  (I'm
too tired today.)  Thanks for the report!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAGz5QCJ1gBhg6upU0%2BpkYmHZsj%2BOaMgXCAf2GBVEm_k6%2BUr0zQ%40mail.gmail.com




Re: Commitfest 2020-03 Now in Progress

2020-04-08 Thread David Steele

On 4/1/20 10:09 AM, David Steele wrote:

On 3/17/20 8:10 AM, David Steele wrote:

On 3/1/20 4:10 PM, David Steele wrote:

The last Commitfest for v13 is now in progress!

Current stats for the Commitfest are:

Needs review: 192
Waiting on Author: 19
Ready for Committer: 4
Total: 215


Halfway through, here's where we stand.  Note that a CF entry was 
added after the stats above were recorded so the total has increased.


Needs review: 151 (-41)
Waiting on Author: 20 (+1)
Ready for Committer: 9 (+5)
Committed: 25
Moved to next CF: 1
Withdrawn: 4
Returned with Feedback: 6
Total: 216


After regulation time here's where we stand:

Needs review: 111 (-40)
Waiting on Author: 26 (+6)
Ready for Committer: 11 (+2)
Committed: 51 (+11)
Moved to next CF: 2 (+1)
Returned with Feedback: 10 (+4)
Rejected: 1
Withdrawn: 5 (+1)
Total: 217

We have one more patch so it appears that one in a completed state 
(committed, etc.) at the beginning of the CF has been moved to an 
uncompleted state. Or perhaps my math is just bad.


The RMT has determined that the CF will be extended for one week so I'll 
hold off on moving and marking patches until April 8.


The 2020-03 Commitfest is officially closed.

Final stats are (for entire CF, not just from March 1 this time):

Committed: 90.
Moved to next CF: 115.
Withdrawn: 8.
Rejected: 1.
Returned with Feedback: 23.
Total: 237.

Good job everyone!
--
-David
da...@pgmasters.net




More efficient RI checks - take 2

2020-04-08 Thread Antonin Houska
After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

Comments are welcome.
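
To give an idea of the difference (this is only a sketch, not the exact
queries the trigger functions build; row locking is omitted in the batched
form):

-- per-row check, roughly what runs once per queued event today
SELECT 1 FROM ONLY p x WHERE x.i = $1 FOR KEY SHARE OF x;

-- batched check, one query for a whole chunk of queued events
SELECT k.i
  FROM unnest(ARRAY[1, 2, 3]) AS k(i)
 WHERE NOT EXISTS (SELECT 1 FROM p WHERE p.i = k.i);
-- any row returned corresponds to a violated reference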

Setup
=

CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);


Insert many rows into the FK table
==

master:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
   QUERY PLAN

 Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual 
time=32.741..32.741 rows=0 loops=1)
   ->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384 
width=4) (actual time=2.403..4.802 rows=16384 loops=1)
 Planning Time: 0.050 ms
 Trigger for constraint f_i_fkey: time=448.986 calls=16384
 Execution Time: 485.444 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
   QUERY PLAN

 Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual 
time=34.053..34.053 rows=0 loops=1)
   ->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384 
width=4) (actual time=2.223..4.448 rows=16384 loops=1)
 Planning Time: 0.047 ms
 Trigger for constraint f_i_fkey: time=105.164 calls=8
 Execution Time: 141.201 ms


Insert a single row into the FK table
=

master:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
--
 Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060 rows=0 
loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 
rows=1 loops=1)
 Planning Time: 0.026 ms
 Trigger for constraint f_i_fkey: time=0.435 calls=1
 Execution Time: 0.517 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
--
 Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066 rows=0 
loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 
rows=1 loops=1)
 Planning Time: 0.025 ms
 Trigger for constraint f_i_fkey: time=0.578 calls=1
 Execution Time: 0.670 ms


Check if FK row exists during deletion from the PK
==

master:

DELETE FROM p WHERE i=16384;
ERROR:  update or delete on table "p" violates foreign key constraint 
"f_i_fkey" on table "f"
DETAIL:  Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms

patched:

DELETE FROM p WHERE i=16384;
ERROR:  update or delete on table "p" violates foreign key constraint 
"f_i_fkey" on table "f"
DETAIL:  Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms


Cascaded DELETE --- many PK rows


DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
---
 Delete on p  (cost=0.00..236.84 rows=16384 width=6) (actual 
time=38.334..38.334 rows=0 loops=1)
   ->  Seq Scan on p  (cost=0.00..236.84 rows=16384 width=6) (actual 
time=0.019..3.925 rows=16384 loops=1)
 Planning Time: 0.049 ms
 Trigger for constraint f_i_fkey: time=31348.756 calls=16384
 Execution Time: 31390.784 ms

patched:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
---
 Delete on p  (cost=0.00..236.84 rows=16384 width=6) (actual 
time=33.360..33.360 rows=0 loops=1)
   

Re: Commitfest 2020-03 Now in Progress

2020-04-08 Thread Tom Lane
David Steele  writes:
> The 2020-03 Commitfest is officially closed.

> Final stats are (for entire CF, not just from March 1 this time):

> Committed: 90.
> Moved to next CF: 115.
> Withdrawn: 8.
> Rejected: 1.
> Returned with Feedback: 23.
> Total: 237.

> Good job everyone!

Thanks for running it!  I know it's a tedious responsibility.

regards, tom lane




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Justin Pryzby
On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
>  wrote:
> > I think Tushar's point is that either we should allow both
> > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in
> > both cases we should throw an error.
> 
> Oh, yeah, good point. Somebody must not've been careful enough with
> the options-checking code.

Actually I think someone was too careful.

>From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 8 Apr 2020 11:38:36 -0500
Subject: [PATCH v1] parallel vacuum: options check to use same test as in
 vacuumlazy.c

---
 src/backend/commands/vacuum.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 351d5215a9..660c854d49 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	bool		freeze = false;
 	bool		full = false;
 	bool		disable_page_skipping = false;
-	bool		parallel_option = false;
 	ListCell   *lc;
 
 	/* Set default value */
@@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 			params.truncate = get_vacopt_ternary_value(opt);
 		else if (strcmp(opt->defname, "parallel") == 0)
 		{
-			parallel_option = true;
 			if (opt->arg == NULL)
 			{
 ereport(ERROR,
@@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
 	Assert(!(params.options & VACOPT_SKIPTOAST));
 
-	if ((params.options & VACOPT_FULL) && parallel_option)
+	if ((params.options & VACOPT_FULL) && params.nworkers > 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot specify both FULL and PARALLEL options")));
-- 
2.17.0



Re: Conflict handling for COPY FROM

2020-04-08 Thread David G. Johnston
On Fri, Mar 27, 2020 at 3:27 PM Tom Lane  wrote:

> Surafel Temesgen  writes:
> > [ conflict-handling-copy-from-v16.patch ]
>
> I took a quick look at this patch, since it was marked "ready for
> committer", but I don't see how it can possibly be considered committable.
>

[...]


>
> Looking at this patch in terms of whether the functionality is
> available in that approach, it seems like you might want two parts
> of it:
>
> 1. A COPY option to be flexible about the number of columns in the
> input, say by filling omitted columns with NULLs.
>
> 2. INSERT ... ON CONFLICT can be used to transfer data to permanent
> tables with rejection of duplicate keys, but it doesn't have much
> flexibility about just what to do with duplicates.  Maybe we could
> add more ON CONFLICT actions?  Sending conflicted rows to some other
> table, or updating the source table to show which rows were copied
> and which not, might be useful things to think about.
>

tl/dr: patch 1: change the option to something like "processing_mode = row
{default = file}" and relay existing errors across all rows in the table in
the error detail message.

tl/dr patch 2: add an "IGNORE_CONFLICTS = {-1, 0 default, 1+}" option that
converts the errors that appear for speculative insertions into warnings
(in the error detail message) and either aborts the copy (cleanly, no
inserted rows) if the count exceeds the user-specified value, or commits if
it is able (i.e., no errors in the error message detail - which would come
from other problems besides conflicts).  I don't really get why a number is
needed here though - it's not like ON CONFLICT DO NOTHING has that ability
and all this seems to be wanting to do is enable a subset of ON CONFLICT DO
NOTHING for COPY - in which case no warning or error would appear in the
first place.
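
For comparison, the existing route to that subset is something along these
lines (staging table made up for illustration):

CREATE TEMP TABLE x_stage (LIKE x);
COPY x_stage FROM stdin;
INSERT INTO x SELECT * FROM x_stage ON CONFLICT DO NOTHING;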

Sorry for the rambling below, a decent chunk of the material is stream of
consciousness but it all generally relates to the behaviors that this patch
is touching.

My desired take-away from this would be to have COPY's error response
include one line for each input line that fails to be inserted with the
line number and the underlying error message passed along verbatim.
Importing, fixing one error, trying again, fixing another error, repeat is
a first level goal for me.  Including at most the first 30 characters of
the problematic line would facilitate the common case where the first field
is an identifier, which is more useful than a pure line number.  But
including the entire line doesn't seem worth the risk.  Though it should be
considered whether to include the error detail of the underlying error
message - probably as a subsequent line.

After that, consider having a facility for telling COPY that you are OK if
it ignores the problem rows and inserts the ones it is able - and
converting the final exit code to be a warning instead of an error
(whatever that means, if anything, in this context).

The speculative insertion stuff is outside my experience but are we trying
to just make it work, give the COPY user control of whether its used or
not, have an independent facility just for copy, or something else?
Teaching copy to use speculative insertion infrastructure that already
exists is a patch in its own right and seems to be fixing a current POLA
violation - this other stuff about reporting and ignoring errors is besides
the point.  The basic inter-operability fix seems like it could be made to
work under today's copy error handling and report framework just fine.

If there is value in someone augmenting the error handling and response
framework to work in a tabular format instead of a free-form error message
text body that too could be presented and discussed as its own commit.
Granted it may be the case that such a verbose error message from COPY as
described above is undesirable but would be palatable in some other
presentation format.  I'm inclined to believe, however, that even if COPY
is made an exception to the general error message rules (and the detail
information would be part of the errdetail, not the main message, it could
just include a summary count) that the added functionality would make the
exception acceptable.

To be honest I haven't dived into the assumptions baked into the regression
tests to know whether the tabular stuff that was discussed and that is
shown in the regression expected outputs is normal or not.  I assume the
part in question is:

+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001 22 32" --- missing data for column "d"
+WARNING:  skipping "70002 23 33 43 53 54" --- extra data after last
expected column
+WARNING:  skipping "70003 24 34 44" --- missing data for column "e"
+
+   a   | b  | c  | d  |  e
+---++++--
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)

Right now COPY would just fail.  What would be nice is for it to say:

ERROR: Line 1?  missing data for column "d"
ERROR: Line 2?  --- extra data after last expe

Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Justin Pryzby
On Wed, Apr 08, 2020 at 12:02:39PM -0400, Alvaro Herrera wrote:
> On 2020-Apr-08, Justin Pryzby wrote:
> 
> > This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH 
> > ROW"
> > was first allowed on partition tables (86f575948).
> > 
> > I thought this would work like partitioned indexes (8b08f7d48), where 
> > detaching
> > a partition makes its index non-inherited, and attaching a partition marks a
> > pre-existing, matching partition as inherited rather than creating a new 
> > one.
> 
> Hmm.  Let's agree to what behavior we want, and then we implement that.
> It seems to me there are two choices:
> 
> 1. on detach, keep the trigger but make it independent of the trigger on
> parent.  (This requires that the trigger is made dependent on the
> trigger on parent, if the table is attached as partition again;
> otherwise you'd end up with multiple copies of the trigger if you
> detach/attach multiple times).
> 
> 2. on detach, remove the trigger from the partition.
> 
> I think (2) is easier to implement, but (1) is the more convenient
> behavior.

At telsasoft, we don't care (we uninherit tables before ALTERing parents to
avoid disruptive locking and to avoid worst-case disk use).

(1) is consistent with the behavior for indexes, which is a slight advantage
for users' ability to understand and keep track of the behavior.  But adding
triggers is pretty different so I'm not sure it's a totally compelling
parallel.

-- 
Justin




Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm.  Let's agree to what behavior we want, and then we implement that.
> It seems to me there are two choices:

> 1. on detach, keep the trigger but make it independent of the trigger on
> parent.  (This requires that the trigger is made dependent on the
> trigger on parent, if the table is attached as partition again;
> otherwise you'd end up with multiple copies of the trigger if you
> detach/attach multiple times).

> 2. on detach, remove the trigger from the partition.

> I think (2) is easier to implement, but (1) is the more convenient
> behavior.

I think that #1 would soon lead to needing all the same infrastructure
as we have for inherited columns and constraints, ie triggers would need
equivalents of attislocal and attinhcount.  I don't really want to go
there, so I'd vote for #2.
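
For anyone unfamiliar with that bookkeeping, it is what you can already see
for inherited columns (just an illustration):

CREATE TABLE parent (a int);
CREATE TABLE child () INHERITS (parent);
SELECT attname, attislocal, attinhcount
  FROM pg_attribute
 WHERE attrelid = 'child'::regclass AND attnum > 0;
-- returns  a | f | 1 : the column exists only because of the parent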

regards, tom lane




doc review for v13

2020-04-08 Thread Justin Pryzby
I reviewed docs for v13, like:
git log --cherry-pick origin/master...origin/REL_12_STABLE -p doc

I did something similar for v12 [0].  I've included portions of that here which
still seem lacking 12 months later (but I'm not intending to continue defending
each individual patch hunk).

I previously mailed separately about a few individual patches, some of which
have separate, ongoing discussion and aren't included here (incr sort, parallel
vacuum).

Justin

[0] 
https://www.postgresql.org/message-id/flat/20190709161256.GH22387%40telsasoft.com#56889b868e5886e36b90e9f5a1165186
>From 482b590355cd7df327602dd36e91721b827f9c37 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:31:04 -0500
Subject: [PATCH v1 01/19] docs: pg_statistic_ext.stxstattarget

commit c31132d87c6315bbbe4b4aa383705aaae2348c0e
Author: Tomas Vondra 
Date:   Wed Mar 18 16:48:12 2020 +0100
---
 doc/src/sgml/catalogs.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 386c6d7bd1..ce33df9e58 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6472,7 +6472,7 @@ SCRAM-SHA-256$:&l
.
A zero value indicates that no statistics should be collected.
A negative value says to use the system default statistics target.
-   Positive values stxstattarget
+   Positive values of stxstattarget
determine the target number of most common values
to collect.
   
-- 
2.17.0

>From 2a3a4d7028b02070447fafd37e66e72da59966bf Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:33:44 -0500
Subject: [PATCH v1 02/19] docs: reg* functions

commit 8408e3a557ad26a7e88f867a425b2b9a86c4fa04
Author: Peter Eisentraut 
Date:   Wed Mar 18 14:51:37 2020 +0100
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a38387b8c6..fd0f5d64b3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18796,7 +18796,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
to_regnamespace, to_regoper,
to_regoperator, to_regrole,
to_regproc, to_regprocedure, and
-   to_regtype, functions translate relation, collation, schema,
+   to_regtype translate relation, collation, schema,
operator, role, function, and type names (given as text) to
objects of the corresponding reg* type (see  about the types).  These functions differ from a
-- 
2.17.0

>From 6864ced0a9eaeab4c010d1f090b26b337f125742 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:43:42 -0500
Subject: [PATCH v1 03/19] Minus one

See also
ac862376037727e744f25030bd8b6090c707247b
---
 doc/src/sgml/config.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a0da4aabac..ea2749535d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6110,7 +6110,7 @@ local0.*/var/log/postgresql
  unoptimized queries in your applications.
  If this value is specified without units, it is taken as milliseconds.
  Setting this to zero prints all statement durations.
- Minus-one (the default) disables logging statement durations.
+ -1 (the default) disables logging statement durations.
  Only superusers can change this setting.
 
 
@@ -6162,7 +6162,7 @@ local0.*/var/log/postgresql
  traffic is too high to log all queries.
  If this value is specified without units, it is taken as milliseconds.
  Setting this to zero samples all statement durations.
- Minus-one (the default) disables sampling statement durations.
+ -1 (the default) disables sampling statement durations.
  Only superusers can change this setting.
 
 
-- 
2.17.0

>From bfb8439eb5618db3a36ca2794dbcc35489d98c27 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:51:32 -0500
Subject: [PATCH v1 04/19] doc: psql opclass/opfamily

commit b0b5e20cd8d1a58a8782d5dc806a5232db116e2f
Author: Alexander Korotkov 

ALSO, should we rename the "Purpose" column?  I see we have pg_amop.amoppurpose
so maybe it's fine ?
---
 doc/src/sgml/ref/psql-ref.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 0595d1c04b..cdd24fad98 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1244,7 +1244,7 @@ testdb=>
 (see ).
 If access-method-patttern
 is specified, only operator classes associated with access methods whose
-names match pattern are listed.
+names match the pattern are listed.
 If input-type-pattern
 is specified, only operator classes associated with input types whose
 names match the pattern are listed.
@@ -1267,7 +1267,7 @@ testdb=>

Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Apr 6, 2020 at 2:23 PM Stephen Frost  wrote:
> > So, instead of talking about 'bzip2 > %f.bz2', and then writing into our
> > documentation that that's how this feature can be used, what about
> > proposing something that would actually work reliably with this
> > interface?  Which properly fsync's everything, has good retry logic for
> > when failures happen, is able to actually detect when a failure
> > happened, how to restore from a backup taken this way, and it'd probably
> > be good to show how pg_verifybackup could be used to make sure the
> > backup is actually correct and valid too.
> 
> I don't really understand the problem here.  Suppose I do:
> 
> mkdir ~/my-brand-new-empty-directory
> cd ~/my-brand-new-empty-directory
> pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
> initdb -S --dont-expect-that-this-is-a-data-directory . # because
> right now it would complain about pg_wal and pg_tblspc being missing
> 
> I think if all that works, my backup should be good and durably on
> disk. If it's not, then either pg_basebackup or bzip2 or initdb didn't
> report errors that they should have reported. If you're worried about
> that, say because you suspect those programs are buggy or because you
> think the kernel may not be reporting errors properly, you can use tar
> -jxvf + pg_validatebackup to check.
> 
> What *exactly* do you think can go wrong here?

What if %f.bz2 already exists?  How about if %f has a space in it?  What
about if I'd like to verify that the backup looks reasonably valid
without having to find space to store it entirely decompressed?

Also, this argument feels disingenuous to me.  That isn't the only thing
you're promoting this feature be used for, as you say below.  If the
only thing this feature is *actually* intended for is to add bzip2
support, then we should just add bzip2 support directly and call it a
day, but what you're really talking about here is a generic interface
that you'll want to push users to for things like "how do I back up to
s3" or "how do I back up to GCS" and so we should be thinking about
those cases and not just a relatively simple use case.

This is the same kind of slippery slope that our archive command is
built on- sure, if everything "works" then it's "fine", even with our
documented example, but we know that not everything works in the real
world, and just throwing an 'initdb -S' in there isn't a general
solution because users want to do things like send WAL to s3 or GCS or
such.

I don't think there's any doubt that there'll be no shortage of shell
scripts and various other things that'll be used with this that, yes,
will be provided by our users and therefore we can blame them for doing
it wrong, but then they'll complain on our lists and we'll spend time
educating them as to how to write proper software to be used, or
pointing them to a solution that someone writes specifically for this.
I don't view that as, ultimately, a good solution.

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Apr 6, 2020 at 1:32 PM Magnus Hagander  wrote:
> > Now, if we were just talking about compression, it would actually be
> > interesting to implement some sort of "postgres compression API" if
> > you will, that is implemented by a shared library. This library could
> > then be used from pg_basebackup or from anything else that needs
> > compression. And anybody who wants could then do a "
> > for PostgreSQL" module, removing the need for us to carry such code
> > upstream.
> 
> I think it could be more general than a compression library. It could
> be a store-my-stuff-and-give-it-back-to-me library, which might do
> compression or encryption or cloud storage or any combination of the
> three, and probably other stuff too. Imagine that you first call an
> init function with a namespace that is basically a string provided by
> the user. Then you open a file either for read or for write (but not
> both). Then you read or write a series of chunks (depending on the
> file mode). Then you close the file. Then you can do the same with
> more files. Finally at the end you close the namespace. You don't
> really need to care where or how the functions you are calling store
> the data. You just need them to return proper error indicators if by
> chance they fail.

Yes, having a storage layer makes a lot of sense here, with features
that are understood by the core system and which each driver
understands, and then having a filter system which is also pluggable and
can support things like compression and hashing for this would also be
great.

I can point you to examples of all of the above, already implemented, in
C, all OSS.  Sure seems like a pretty good and reasonable approach to
take when other folks are doing it.

> As compared with my previous proposal, this would work much better for
> pg_basebackup -Fp, because you wouldn't launch a new bzip2 process for
> every file. You'd just bzopen(), which

Re: More efficient RI checks - take 2

2020-04-08 Thread Pavel Stehule
On Wed, Apr 8, 2020 at 18:36 Antonin Houska  wrote:

> After having reviewed [1] more than a year ago (the problem I found was
> that
> the transient table is not available for deferred constraints), I've tried
> to
> implement the same in an alternative way. The RI triggers still work as row
> level triggers, but if multiple events of the same kind appear in the
> queue,
> they are all passed to the trigger function at once. Thus the check query
> does
> not have to be executed that frequently.
>
> Some performance comparisons are below. (Besides the execution time, please
> note the difference in the number of trigger function executions.) In
> general,
> the checks are significantly faster if there are many rows to process, and
> a
> bit slower when we only need to check a single row. However I'm not sure
> about
> the accuracy if only a single row is measured (if a single row check is
> performed several times, the execution time appears to fluctuate).
>

It is a hard task to choose a good strategy for immediate constraints, but for
deferred constraints you know how many rows should be checked, and then you
can choose a better strategy.

Is it possible to use an estimate for choosing the method of RI checks?



> Comments are welcome.
>
> Setup
> =
>
> CREATE TABLE p(i int primary key);
> INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
> CREATE TABLE f(i int REFERENCES p);
>
>
> Insert many rows into the FK table
> ==
>
> master:
>
> EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
>QUERY PLAN
>
> 
>  Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual
> time=32.741..32.741 rows=0 loops=1)
>->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384
> width=4) (actual time=2.403..4.802 rows=16384 loops=1)
>  Planning Time: 0.050 ms
>  Trigger for constraint f_i_fkey: time=448.986 calls=16384
>  Execution Time: 485.444 ms
> (5 rows)
>
> patched:
>
> EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
>QUERY PLAN
>
> 
>  Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual
> time=34.053..34.053 rows=0 loops=1)
>->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384
> width=4) (actual time=2.223..4.448 rows=16384 loops=1)
>  Planning Time: 0.047 ms
>  Trigger for constraint f_i_fkey: time=105.164 calls=8
>  Execution Time: 141.201 ms
>
>
> Insert a single row into the FK table
> =
>
> master:
>
> EXPLAIN ANALYZE INSERT INTO f VALUES (1);
> QUERY PLAN
>
> --
>  Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060
> rows=0 loops=1)
>->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
> rows=1 loops=1)
>  Planning Time: 0.026 ms
>  Trigger for constraint f_i_fkey: time=0.435 calls=1
>  Execution Time: 0.517 ms
> (5 rows)
>
> patched:
>
> EXPLAIN ANALYZE INSERT INTO f VALUES (1);
> QUERY PLAN
>
> --
>  Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066
> rows=0 loops=1)
>->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
> rows=1 loops=1)
>  Planning Time: 0.025 ms
>  Trigger for constraint f_i_fkey: time=0.578 calls=1
>  Execution Time: 0.670 ms
>
>
> Check if FK row exists during deletion from the PK
> ==
>
> master:
>
> DELETE FROM p WHERE i=16384;
> ERROR:  update or delete on table "p" violates foreign key constraint
> "f_i_fkey" on table "f"
> DETAIL:  Key (i)=(16384) is still referenced from table "f".
> Time: 3.381 ms
>
> patched:
>
> DELETE FROM p WHERE i=16384;
> ERROR:  update or delete on table "p" violates foreign key constraint
> "f_i_fkey" on table "f"
> DETAIL:  Key (i)=(16384) is still referenced from table "f".
> Time: 5.561 ms
>
>
> Cascaded DELETE --- many PK rows
> 
>
> DROP TABLE f;
> CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
> INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
>
> master:
>
> EXPLAIN ANALYZE DELETE FROM p;
> QUERY PLAN
>
> ---
>  Delete on p  (cost=0.00..236.84 rows=16384 width=6) (actual
> time=38.334..38.334 rows=0 loops=1)
> 

Re: A problem about partitionwise join

2020-04-08 Thread Tom Lane
Richard Guo  writes:
> On Sun, Apr 5, 2020 at 4:38 AM Tom Lane  wrote:
>> There is already something in equivclass.c that would almost do what
>> we want here: exprs_known_equal() would tell us whether the partkeys
>> can be found in the same eclass, without having to generate data
>> structures along the way.  The current implementation is not watertight
>> because it doesn't check opclass semantics, but that consideration
>> can be bolted on readily enough.  So that leads me to something like
>> the attached.

> I have some concern about we only check non-nullable partexprs. Is it
> possible that two nullable partexprs come from the same EC? I tried to
> give an example but failed.

Currently the EC infrastructure doesn't really cope with outer join
equijoins.  They are not treated as producing true equivalences,
so I think that the case you're worried about can't occur (which is why
I didn't code for it).  I have hopes of being able to incorporate outer
joins into the EC logic in a less squishy way in the future, by making
the representation of Vars distinguish explicitly between
value-before-outer-join and value-after-outer-join, after which we could
make bulletproof assertions about what is equal to what, even with outer
joins in the mix.  If that works out it might produce a cleaner answer
in this area too.

TBH, now that I have had some exposure to the partitionwise join
matching logic I don't much like any of it.  I feel like it's doing
about the same job as ECs, but in an unprincipled and not very
efficient manner.  Right now is no time to redesign it, of course,
but maybe at some point we could do that.  (I did experiment with
removing all the rest of have_partkey_equi_join() and having it
*only* ask exprs_known_equal() about equivalences, which is more or
less what I'm envisioning here.  That caused some of the existing
regression tests to fail, so there's something that the idea isn't
covering.  I didn't dig any further at the time, and in particular
failed to check whether the problems were specifically about outer
joins, which'd be unsurprising given the above.)

Anyway, this work has missed the window for v13, so we've got plenty
of time to think about it.

regards, tom lane




Re: where should I stick that backup?

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost  wrote:
> What if %f.bz2 already exists?

That cannot occur in the scenario I described.

> How about if %f has a space in it?

For a tar-format backup I don't think that can happen, because the
file names will be base.tar and ${tablespace_oid}.tar. For a plain
format backup it's a potential issue.

> What
> about if I'd like to verify that the backup looks reasonably valid
> without having to find space to store it entirely decompressed?

Then we need to make pg_validatebackup better.

> Also, this argument feels disingenuous to me.
> [ lots more stuff ]

This all just sounds like fearmongering to me. "archive_command
doesn't work very well, so maybe your thing won't either." Maybe it
won't, but the fact that archive_command doesn't isn't a reason.

> Yes, having a storage layer makes a lot of sense here, with features
> that are understood by the core system and which each driver
> understands, and then having a filter system which is also pluggable and
> can support things like compression and hashing for this would also be
> great.

It's good to know that you prefer a C interface to one based on shell
scripting. I hope that we will also get some other opinions on that
question, as my own feelings are somewhat divided (but with some bias
toward trying to making the shell scripting thing work, because I
believe it gives a lot more practical flexibility).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao  wrote:
> When there is a backup_manifest in the database cluster, it's included in
> the backup even when --no-manifest is specified. ISTM that this is problematic
> because the backup_manifest is obviously not valid for the backup.
> So, isn't it better to always exclude the *existing* backup_manifest in the
> cluster from the backup, like backup_label/tablespace_map? Patch attached.
>
> Also I found the typo in the document. Patch attached.

Both patches look good. The second one is definitely a mistake on my
part, and the first one seems like a totally reasonable change.
Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Andrew Dunstan

On 4/7/20 9:42 AM, Andrew Dunstan wrote:
> On Tue, Apr 7, 2020 at 12:37 AM Tom Lane  wrote:
>> Robert Haas  writes:
>>> Taking stock of the situation this morning, most of the buildfarm is
>>> now green. There are three failures, on eelpout (6 hours ago),
>>> fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
>> fairywren has now done this twice in the pg_validatebackupCheck step:
>>
>> exec failed: Bad address at 
>> /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm 
>> line 340.
>>  at 
>> /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm 
>> line 340.
>>
>> I'm a tad suspicious that it needs another perl2host()
>> somewhere, but the log isn't very clear as to where.
>>
>> More generally, I wonder if we ought to be trying to
>> centralize those perl2host() calls instead of sticking
>> them into individual test cases.
>>
>>
>
> Not sure about that. I'll see if I can run it by hand and get some
> more info. What's quite odd is that jacana (a very similar setup) is
> passing this happily.
>


OK, tricky, but here's what I did to get this working on fairywren.


First, on Msys2 there is a problem with name mangling. We've had to fix
this before by telling it to ignore certain argument prefixes.


Second, once that was fixed rmdir was failing on the tablespace. On
Windows this is a junction, so unlink is the correct thing to do, I
believe, just as it is on Unix where it's a symlink.


cheers


andrew



-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/bin/pg_validatebackup/t/003_corruption.pl b/src/bin/pg_validatebackup/t/003_corruption.pl
index 7a09d02e6c..fe717dfc73 100644
--- a/src/bin/pg_validatebackup/t/003_corruption.pl
+++ b/src/bin/pg_validatebackup/t/003_corruption.pl
@@ -16,6 +16,8 @@ $master->start;
 # Include a user-defined tablespace in the hopes of detecting problems in that
 # area.
 my $source_ts_path = TestLib::perl2host(TestLib::tempdir_short());
+my $source_ts_prefix = $source_ts_path;
+$source_ts_prefix =~ s!([^A-Z]:/[^/]*)/.*!$1!;
 $master->safe_psql('postgres', command_ok(['pg_basebackup', '-D', $backup_path, '--no-sync',
 			'-T', "${source_ts_path}=${backup_ts_path}"],
 			"base backup ok");
@@ -177,14 +180,7 @@ sub mutilate_missing_tablespace
 	my ($tsoid) = grep { $_ ne '.' && $_ ne '..' }
 		 slurp_dir("$backup_path/pg_tblspc");
 	my $pathname = "$backup_path/pg_tblspc/$tsoid";
-	if ($windows_os)
-	{
-		rmdir($pathname) || die "$pathname: $!";
-	}
-	else
-	{
-		unlink($pathname) || die "$pathname: $!";
-	}
+	unlink($pathname) || die "$pathname: $!";
 	return;
 }
 


Re: More efficient RI checks - take 2

2020-04-08 Thread Corey Huinker
On Wed, Apr 8, 2020 at 1:06 PM Pavel Stehule 
wrote:

>
>
> st 8. 4. 2020 v 18:36 odesílatel Antonin Houska  napsal:
>
>> After having reviewed [1] more than a year ago (the problem I found was
>> that
>> the transient table is not available for deferred constraints), I've
>> tried to
>> implement the same in an alternative way. The RI triggers still work as
>> row
>> level triggers, but if multiple events of the same kind appear in the
>> queue,
>> they are all passed to the trigger function at once. Thus the check query
>> does
>> not have to be executed that frequently.
>>
>
I'm excited that you picked this up!


>
>> Some performance comparisons are below. (Besides the execution time,
>> please
>> note the difference in the number of trigger function executions.) In
>> general,
>> the checks are significantly faster if there are many rows to process,
>> and a
>> bit slower when we only need to check a single row. However I'm not sure
>> about
>> the accuracy if only a single row is measured (if a single row check is
>> performed several times, the execution time appears to fluctuate).
>>
>
These numbers are very promising, and much more in line with my initial
expectations. Obviously the impact on single-row DML is of major concern,
though.

> It is a hard task to choose a good strategy for immediate constraints, but
> for deferred constraints you know how many rows should be checked, and then
> you can choose a better strategy.
>

> Is it possible to use an estimate to choose the method of RI checks?
>

In doing my initial attempt, the feedback I was getting was that the people
who truly understood the RI checks fell into the following groups:
1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers

While #3 is clearly beyond the scope for an endeavor like this, #1 seems
like it would nearly eliminate the 1-row penalty (we'd still have the
TupleStore init penalty, but it would just be a handy queue structure, and
maybe that cost would be offset by removing the SPI overhead), and once
that is done, we could see about step #2.
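
For reference, the per-row check the triggers issue today is roughly the
first query below, and the batching could amount to folding many queued keys
into one statement shaped like the second (illustrative SQL only, using the
p table from the example upthread; this is not the patch's actual query
text):

-- today: roughly one query per queued FK row
PREPARE ri_one(int) AS
  SELECT 1 FROM ONLY p x WHERE x.i = $1 FOR KEY SHARE OF x;

-- batched: one query for a whole set of queued keys
PREPARE ri_many(int[]) AS
  SELECT k.i
    FROM unnest($1) AS k(i)
   WHERE NOT EXISTS (SELECT 1 FROM ONLY p x WHERE x.i = k.i);

EXECUTE ri_one(1);
EXECUTE ri_many(ARRAY[1, 2, 3]);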


Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Tom Lane
Andrew Dunstan  writes:
> OK, tricky, but here's what I did to get this working on fairywren.
> First, on Msys2 there is a problem with name mangling. We've had to fix
> this before by telling it to ignore certain argument prefixes.
> Second, once that was fixed rmdir was failing on the tablespace. On
> Windows this is a junction, so unlink is the correct thing to do, I
> believe, just as it is on Unix where it's a symlink.

Hmm, no opinion about the name mangling business, but the other part
seems like it might break jacana and/or bowerbird, which are currently
happy with this test?  (AFAICS we only have four Windows animals
running the TAP tests, and the fourth (drongo) hasn't reported in
for a while.)

I guess we could commit it and find out.  I'm all for the simpler
coding if it works.

regards, tom lane




Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Alvaro Herrera
On 2020-Apr-08, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Hmm.  Let's agree to what behavior we want, and then we implement that.
> > It seems to me there are two choices:
> 
> > 1. on detach, keep the trigger but make it independent of the trigger on
> > parent.  (This requires that the trigger is made dependent on the
> > trigger on parent, if the table is attached as partition again;
> > otherwise you'd end up with multiple copies of the trigger if you
> > detach/attach multiple times).
> 
> > 2. on detach, remove the trigger from the partition.
> 
> > I think (2) is easier to implement, but (1) is the more convenient
> > behavior.
> 
> I think that #1 would soon lead to needing all the same infrastructure
> as we have for inherited columns and constraints, ie triggers would need
> equivalents of attislocal and attinhcount.  I don't really want to go
> there, so I'd vote for #2.

Hmm.  Those things are used for the legacy inheritance case supporting
multiple inheritance, where we need to figure out which parent the table
is being detached (disinherited) from.  But for partitioning we know
which parent it is, since there can only be one.  So I don't think that
argument applies.
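
To make the two options concrete, here is a minimal sketch (invented names,
not from any posted patch):

CREATE TABLE parted (a int) PARTITION BY LIST (a);
CREATE TABLE part1 PARTITION OF parted FOR VALUES IN (1);
CREATE FUNCTION noop_tg() RETURNS trigger LANGUAGE plpgsql
  AS $$ BEGIN RETURN NEW; END $$;

-- a row-level trigger on the partitioned table is cloned onto part1
CREATE TRIGGER t BEFORE INSERT ON parted
  FOR EACH ROW EXECUTE FUNCTION noop_tg();

ALTER TABLE parted DETACH PARTITION part1;
-- option 1: the clone of "t" stays on part1 as an independent trigger
-- option 2: the clone of "t" is dropped from part1 at detach time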

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost  wrote:
> > What if %f.bz2 already exists?
> 
> That cannot occur in the scenario I described.

Of course it can.

> > How about if %f has a space in it?
> 
> For a tar-format backup I don't think that can happen, because the
> file names will be base.tar and ${tablespace_oid}.tar. For a plain
> format backup it's a potential issue.

I agree that it might not be an issue for tar-format.

> > What
> > about if I'd like to verify that the backup looks reasonably valid
> > without having to find space to store it entirely decompressed?
> 
> Then we need to make pg_validatebackup better.

Sure, but shouldn't the design be contemplating how these various tools
will work together?

> > Also, this argument feels disingenuous to me.
> > [ lots more stuff ]
> 
> This all just sounds like fearmongering to me. "archive_command
> doesn't work very well, so maybe your thing won't either." Maybe it
> won't, but the fact that archive_command doesn't isn't a reason.

I was trying to explain that we have literally gone down exactly this
path before and it's not been a good result, hence we should be really
careful before going down it again.  I don't consider that to be
fearmongering, nor that we should be dismissing that concern out of
hand.

> > Yes, having a storage layer makes a lot of sense here, with features
> > that are understood by the core system and which each driver
> > understands, and then having a filter system which is also pluggable and
> > can support things like compression and hashing for this would also be
> > great.
> 
> It's good to know that you prefer a C interface to one based on shell
> scripting. I hope that we will also get some other opinions on that
> question, as my own feelings are somewhat divided (but with some bias
> toward trying to making the shell scripting thing work, because I
> believe it gives a lot more practical flexibility).

Yes, I do prefer a C interface.  One might even say "been there, done
that."  Hopefully sharing such experience is still useful to do on these
lists.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Apr-08, Tom Lane wrote:
>> I think that #1 would soon lead to needing all the same infrastructure
>> as we have for inherited columns and constraints, ie triggers would need
>> equivalents of attislocal and attinhcount.  I don't really want to go
>> there, so I'd vote for #2.

> Hmm.  Those things are used for the legacy inheritance case supporting
> multiple inheritance, where we need to figure out which parent the table
> is being detached (disinherited) from.  But for partitioning we know
> which parent it is, since there can only be one.  So I don't think that
> argument applies.

My point is that so long as you only allow the case of exactly one parent,
you can just delete the child trigger, because it must belong to that
parent.  As soon as there's any flexibility, you are going to end up
reinventing all the stuff we had to invent to manage
maybe-or-maybe-not-inherited columns.  So I think the "detach" idea
is the first step on that road, and I counsel not taking that step.

(This implies that when creating a child trigger, we should error out,
*not* allow the case, if there's already a trigger by that name.  Not
sure if that's what happens today, but again I'd say that's what we
should do to avoid complicated cases.)
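
Concretely, the collision case looks something like this (a hypothetical
sketch with invented names; as said above, I'm not sure what happens today):

CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE standalone (a int);
CREATE FUNCTION tgfn() RETURNS trigger LANGUAGE plpgsql
  AS $$ BEGIN RETURN NEW; END $$;

CREATE TRIGGER audit BEFORE INSERT ON parent
  FOR EACH ROW EXECUTE FUNCTION tgfn();
CREATE TRIGGER audit BEFORE INSERT ON standalone
  FOR EACH ROW EXECUTE FUNCTION tgfn();

-- attaching asks the parent to clone "audit" onto a table that already has
-- its own trigger of that name; the suggestion is to error out here
ALTER TABLE parent ATTACH PARTITION standalone FOR VALUES IN (1);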

regards, tom lane




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Mahendra Singh Thalor
On Wed, 8 Apr 2020 at 22:11, Justin Pryzby  wrote:
>
> On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> > On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
> >  wrote:
> > > I think Tushar's point is that either we should allow both
> > > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in both
> > > cases we should throw an error.
> >
> > Oh, yeah, good point. Somebody must not've been careful enough with
> > the options-checking code.
>
> Actually I think someone was too careful.
>
> From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Wed, 8 Apr 2020 11:38:36 -0500
> Subject: [PATCH v1] parallel vacuum: options check to use same test as in
>  vacuumlazy.c
>
> ---
>  src/backend/commands/vacuum.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index 351d5215a9..660c854d49 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
> boolfreeze = false;
> boolfull = false;
> booldisable_page_skipping = false;
> -   boolparallel_option = false;
> ListCell   *lc;
>
> /* Set default value */
> @@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
> params.truncate = get_vacopt_ternary_value(opt);
> else if (strcmp(opt->defname, "parallel") == 0)
> {
> -   parallel_option = true;
> if (opt->arg == NULL)
> {
> ereport(ERROR,
> @@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
>!(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
> Assert(!(params.options & VACOPT_SKIPTOAST));
>
> -   if ((params.options & VACOPT_FULL) && parallel_option)
> +   if ((params.options & VACOPT_FULL) && params.nworkers > 0)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>  errmsg("cannot specify both FULL and 
> PARALLEL options")));
> --
> 2.17.0
>

Thanks Justin for the patch.

The patch looks fine to me and it fixes the issue. After applying this
patch, vacuum will work as follows:

1) vacuum (parallel 1, full 0);
-- vacuuming will be done with 1 parallel worker.
2) vacuum (parallel 0, full 1);
-- full vacuuming will be done.
3) vacuum (parallel 1, full 1);
-- will give error :ERROR:  cannot specify both FULL and PARALLEL options

The 3rd example tells us that we can't give both the FULL and PARALLEL
options, but in the 1st and 2nd we are giving both options and we get no
error. I think that, irrespective of the values of the FULL and PARALLEL
options, we should raise an error in all three of the above cases.

Thoughts?

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 12:06:04AM +0530, Mahendra Singh Thalor wrote:
> On Wed, 8 Apr 2020 at 22:11, Justin Pryzby  wrote:
> >
> > On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> > > On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
> > >  wrote:
> > > > I think Tushar's point is that either we should allow both
> > > > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in both
> > > > cases we should throw an error.
> > >
> > > Oh, yeah, good point. Somebody must not've been careful enough with
> > > the options-checking code.
> >
> > Actually I think someone was too careful.
> >
> > From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby 
> > Date: Wed, 8 Apr 2020 11:38:36 -0500
> > Subject: [PATCH v1] parallel vacuum: options check to use same test as in
> >  vacuumlazy.c
> >
> > ---
> >  src/backend/commands/vacuum.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> > index 351d5215a9..660c854d49 100644
> > --- a/src/backend/commands/vacuum.c
> > +++ b/src/backend/commands/vacuum.c
> > @@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> > boolfreeze = false;
> > boolfull = false;
> > booldisable_page_skipping = false;
> > -   boolparallel_option = false;
> > ListCell   *lc;
> >
> > /* Set default value */
> > @@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> > params.truncate = get_vacopt_ternary_value(opt);
> > else if (strcmp(opt->defname, "parallel") == 0)
> > {
> > -   parallel_option = true;
> > if (opt->arg == NULL)
> > {
> > ereport(ERROR,
> > @@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> >!(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
> > Assert(!(params.options & VACOPT_SKIPTOAST));
> >
> > -   if ((params.options & VACOPT_FULL) && parallel_option)
> > +   if ((params.options & VACOPT_FULL) && params.nworkers > 0)
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >  errmsg("cannot specify both FULL and 
> > PARALLEL options")));
> > --
> > 2.17.0
> >
> 
> Thanks Justin for the patch.
> 
> The patch looks fine to me and it fixes the issue. After applying this
> patch, vacuum will work as follows:
> 
> 1) vacuum (parallel 1, full 0);
> -- vacuuming will be done with 1 parallel worker.
> 2) vacuum (parallel 0, full 1);
> -- full vacuuming will be done.
> 3) vacuum (parallel 1, full 1);
> -- will give error :ERROR:  cannot specify both FULL and PARALLEL options
> 
> The 3rd example tells us that we can't give both the FULL and PARALLEL
> options, but in the 1st and 2nd we are giving both options and we get no
> error. I think that, irrespective of the values of the FULL and PARALLEL
> options, we should raise an error in all three of the above cases.

I think the behavior is correct, but the error message could be improved, like:
|ERROR:  cannot specify FULL with PARALLEL jobs
or similar.

-- 
Justin



