Re: WIP: BRIN multi-range indexes

2019-03-02 Thread Alexander Korotkov
Hi!

I'm starting to look at this patchset.  In general, I think it's
very cool!  We definitely need this.

On Tue, Apr 3, 2018 at 10:51 PM Tomas Vondra
 wrote:
> 1) index parameters
>
> The main improvement of this version is an introduction of a couple of
> BRIN index parameters, next to pages_per_range and autosummarize.
>
> a) n_distinct_per_range - used to size Bloom index
> b) false_positive_rate - used to size Bloom index
> c) values_per_range - number of values in the minmax-multi summary
>
> Until now those parameters were pretty much hard-coded, this allows easy
> customization depending on the data set. There are some basic rules to
> clamp the values (e.g. not to allow ndistinct to be less than 128 or
> more than MaxHeapTuplesPerPage * page_per_range), but that's about it.
> I'm sure we could devise more elaborate heuristics (e.g. when building
> index on an existing table, we could inspect table statistics first),
> but the patch does not do that.
>
> One disadvantage is that those parameters are per-index.

For me, the main disadvantage of this solution is that we put
opclass-specific parameters into the access method, which is generally
bad design.  A user can specify such a parameter even without using the
corresponding opclass, which may cause confusion (and even if we forbid
that, it needs to be hardcoded).  Also, extension opclasses can't do
the same thing.  Thus, it appears that extension opclasses are no
longer first-class citizens.  Have you taken a look at the opclass
parameters patch [1]?  I think it's the proper solution to this problem.
I think we should postpone this parameterization until we push the
opclass parameters patch.

1. 
https://www.postgresql.org/message-id/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
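
For reference, the proposed parameters would presumably be used roughly
like this (a sketch only; the option names come from the patch description
above, while the opclass names and the exact WITH syntax are assumptions):

  -- hypothetical usage of the proposed BRIN reloptions
  CREATE INDEX t_bloom_idx ON t USING brin (a int4_bloom_ops)
      WITH (pages_per_range = 32, n_distinct_per_range = 1024, false_positive_rate = 0.01);
  CREATE INDEX t_multi_idx ON t USING brin (b int4_minmax_multi_ops)
      WITH (pages_per_range = 32, values_per_range = 64);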

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: WIP: BRIN multi-range indexes

2019-03-02 Thread Alexander Korotkov
On Sun, Mar 4, 2018 at 3:15 AM Tomas Vondra
 wrote:
> I've been thinking about this after looking at 0a459cec96, and I don't
> think this patch has the same issues. One reason is that just like the
> original minmax opclass, it does not really mess with the data it
> stores. It only does min/max on the values, and stores that, so if there
> was NaN or Infinity, it will index NaN or Infinity.

FWIW, I think the closest similar functionality is the subtype_diff
function of range types.  But I don't think we should require a range
type here just in order to fetch the subtype_diff function out of it.  So,
an opclass distance function looks OK to me, assuming it's not an
AM-defined function, but a function used for inter-opclass
compatibility.
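
For comparison, this is how a subtype_diff function is supplied for a
range type (the example from the CREATE TYPE documentation):

  CREATE TYPE floatrange AS RANGE (
      subtype = float8,
      subtype_diff = float8mi
  );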

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



RE: Timeout parameters

2019-03-02 Thread Fabien COELHO



Hello Ryohei-san,

There are three independent patches in this thread.

About the socket timeout patch v7:

Patch applies & compiles cleanly. "make check" is ok, although there are 
no specific tests, which is probably ok. Doc build is ok.


I'd simplify the first sentence of the doc to:

""" Number of seconds to wait for socket read/write operations before 
closing the connection, as a decimal integer. This can be used both to 
force global client-side query timeout and to detect network problems. A 
value of zero (the default) turns this off, which means wait indefinitely. 
The minimum allowed timeout is 2 seconds, so a value of 1 is interpreted 
as 2."""


The code looks mostly ok.

The comment on finish_time computation does not look very useful to me; it 
just states in English the simple formula below. I'd suggest removing it.


I've tested setting "socket_timeout=2" and doing "SELECT pg_sleep(10);". 
It works somehow; however, after a few attempts I have:


 psql> SELECT COUNT(*) FROM pg_stat_activity WHERE application_name='psql';
 3

I.e. I have just created lingering postmasters, which only stop when the 
queries are actually finished. I cannot say that I'm thrilled by that 
behavior. ISTM that libpq should at least attempt to cancel the query by 
sending a cancel request before closing the connection, as I have already 
suggested in a previous review. The "client side query timeout" use case 
does not seem well addressed by the current implementation.


If the user reconnects, eg "\c db", the setting is lost. The re-connection 
handling should probably take care of this parameter, and maybe others.


The implementation does not check that the parameter is indeed an integer, 
e.g. "socket_timeout=2bad" is silently ignored. I'd suggest checking it as is 
already done for other parameters, e.g. "port".


--
Fabien.



Re: [PATCH 0/3] Introduce spgist quadtree @<(point,circle) operator

2019-03-02 Thread Alexander Korotkov
Hi!

On Fri, Feb 1, 2019 at 7:08 PM Matwey V. Kornilov
 wrote:
> This patch series is to add support for the spgist quadtree @<(point,circle)
> operator. The first two patches refactor existing code before
> implementing the new feature. The third commit is the actual implementation,
> provided with a set of simple unit tests.

Cool!

> Matwey V. Kornilov (3):
>   Introduce helper variable in spgquadtreeproc.c
>   Introduce spg_quad_inner_consistent_box_helper() in spgquadtreeproc.c
>   Add initial support for spgist quadtree @<(point,circle) operator

First, I have to note that it's not necessary to post every patch
in a separate message.  It would be both easier and more comfortable for
readers if you just attached your patches as multiple attachments to the
same email message.

Regarding the patchset itself:
 * spg_quad_inner_consistent_circle_helper() definitely needs comments.
 * In PostgreSQL we require that an index scan produce exactly the same
results as a sequential scan.  Can we ensure this is so for the
@<(point,circle) operator even in corner cases of rounding error?  (See
the sketch below.)
 * In our coding style, the function name goes on a separate line from
its return type.
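
One simple way to compare the index scan and sequential scan results,
assuming the patch's @<(point,circle) operator (table and index names
here are made up):

  CREATE TABLE pts AS SELECT point(random(), random()) AS p FROM generate_series(1, 10000);
  CREATE INDEX pts_quad_idx ON pts USING spgist (p);
  SET enable_seqscan = off;
  SELECT count(*) FROM pts WHERE p @< circle '((0.5,0.5),0.1)';  -- index scan
  SET enable_seqscan = on; SET enable_indexscan = off; SET enable_bitmapscan = off;
  SELECT count(*) FROM pts WHERE p @< circle '((0.5,0.5),0.1)';  -- seq scan; counts must match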

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pg_partition_tree crashes for a non-defined relation

2019-03-02 Thread Michael Paquier
On Fri, Mar 01, 2019 at 11:38:20AM -0500, Tom Lane wrote:
> Right, while you'd get zero rows out for a non-partitioned table.
> WFM.

Exactly.  I have committed a patch doing exactly that, and I have
added test cases with a partitioned table and a partitioned index
which have no partitions. 
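
For example, the behaviour discussed above looks like this (hypothetical
relation names):

  SELECT * FROM pg_partition_tree('some_plain_table');
  -- zero rows for a non-partitioned relation
  SELECT * FROM pg_partition_tree('partitioned_table_without_partitions');
  -- a single row for the (empty) partitioned table itself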
--
Michael


signature.asc
Description: PGP signature


Re: Looks heap_create_with_catalog ignored the if_not_exists options

2019-03-02 Thread Michael Paquier
On Sat, Mar 02, 2019 at 12:15:19AM +0800, Andy Fan wrote:
> It looks like we need some locking there, but since PG uses a process model,
> I don't even know how to synchronize code among processes in PG (any hint on
> this would be much appreciated as well).

No, you shouldn't need any kind of extra locking here.
--
Michael


signature.asc
Description: PGP signature


Re: Online verification of checksums

2019-03-02 Thread Michael Banck
Hi,

On Friday, 2019-03-01 at 18:03 -0500, Robert Haas wrote:
> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
>  wrote:
> > I have added a retry for this as well now, without a pg_sleep() as well.
> > This catches around 80% of the half-reads, but a few slip through. At
> > that point we bail out with exit(1), and the user can try again, which I
> > think is fine?
> 
> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> robust at all.

The chance that pg_verify_checksums hits a torn page (at least in my
tests, see below) is already pretty low, a couple of times per 1000
runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
on. Otherwise, we now just issue a warning and skip the file (or so was
the idea, see below), do you think that is not acceptable?

I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
pg_verify_checksums in tight loops) with the current patch version, and
I am seeing short reads very, very rarely (maybe every 1000th run) with
a warning like:

|1174
|pg_verify_checksums: warning: could not read block 374 in file 
"data/base/18032/18045": read 4096 of 8192
|pg_verify_checksums: warning: could not read block 375 in file 
"data/base/18032/18045": read 4096 of 8192
|Files skipped: 2

The 1174 is the sequence number, the first 1173 runs of
pg_verify_checksums only skipped blocks.

However, the fact that it shows two warnings for the same file means there is
something wrong here. It was continuing to the next block, while I think
it should just skip to the next file on read failures. So I have changed
that now, new patch attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following provisions:
https://www.credativ.de/datenschutz

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..4ad6edcde6 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -37,9 +37,8 @@ PostgreSQL documentation
   Description
   
pg_verify_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_verify_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   PostgreSQL cluster.  The exit status is zero if
+   there are no checksum errors, otherwise nonzero.  
   
  
 
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 511262ab5f..15ac9d00dc 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies page level checksums in a cluster
  *
  *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -24,12 +24,16 @@
 
 
 static int64 files = 0;
+static int64 skippedfiles = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skippedblocks = 0;
 static ControlFileData *ControlFile;
+static XLogRecPtr checkpointLSN;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool online = false;
 
 static const char *progname;
 
@@ -86,10 +90,17 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	bool		block_retry = false;
 
 	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
+		if (online && errno == ENOENT)
+		{
+			/* File was removed in the meantime */
+			return;
+		}
+
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
 progname, fn, strerror(errno));
 		exit(1);
@@ -104,26 +115,124 @@ scan_file(const char *fn, BlockNumber segmentno)
 
 		if (r == 0)
 			break;
+		if (r < 0)
+		{
+			skippedfiles++;
+			fprintf(stderr, _("%s: could not read block %u in file \"%s\": %s\n"),
+	progname, blockno, fn, strerror(errno));
+			return;
+		}
 		if (r != BLCKSZ)
 		{
-			fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"),
-	progname, blockno, fn, r, BLCKSZ);
-			exit(1);
+			if (online)
+			{
+if (block_retry)
+{
+	/* We already tried once to reread the block, skip it */
+	skippedfiles++;
+	fprintf(stderr, _("%s: warning: could not read block %u in file \"%s\": read %d of %d\n"),
+			progname, blockno, fn, r, BLCKSZ);
+	return;
+}
+
+/*
+ * Retry the block. It's possible that we read the block while it
+ * was extended or shrunk, so it ends up looking torn to us.
+ */
+
+/*
+ * Seek back by the amount of bytes we read to the beginning of
+ * the fail

Re: allow online change primary_conninfo

2019-03-02 Thread Sergei Kornilov
Hello

This might not be the right way, but I can't think of a better way to avoid 
switching to a different method than splitting the lastSourceFailed processing 
from starting a new source. Something like the refactoring in the first attached 
patch: I move the RequestXLogStreaming logic out of the lastSourceFailed 
processing and add a new pendingRestartSource indicator.
The second patch is mostly the same as the previous version but uses the new 
pendingRestartSource mechanism instead of lastSourceFailed.

Thanks in advance!

regards, Sergei

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecd12fc53a..daddaeb236 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -798,10 +798,11 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
  * different from readSource in that this is always set, even when we don't
  * currently have a WAL file open. If lastSourceFailed is set, our last
  * attempt to read from currentSource failed, and we should try another source
- * next.
+ * next. If pendingRestartSource is set, we want to restart the current source.
  */
 static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
+static bool pendingRestartSource = false;
 
 typedef struct XLogPageReadPrivate
 {
@@ -11828,48 +11829,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	if (!StandbyMode)
 		return false;
 
-	/*
-	 * If primary_conninfo is set, launch walreceiver to try
-	 * to stream the missing WAL.
-	 *
-	 * If fetching_ckpt is true, RecPtr points to the initial
-	 * checkpoint location. In that case, we use RedoStartLSN
-	 * as the streaming start position instead of RecPtr, so
-	 * that when we later jump backwards to start redo at
-	 * RedoStartLSN, we will have the logs streamed already.
-	 */
-	if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
-	{
-		XLogRecPtr	ptr;
-		TimeLineID	tli;
-
-		if (fetching_ckpt)
-		{
-			ptr = RedoStartLSN;
-			tli = ControlFile->checkPointCopy.ThisTimeLineID;
-		}
-		else
-		{
-			ptr = RecPtr;
-
-			/*
-			 * Use the record begin position to determine the
-			 * TLI, rather than the position we're reading.
-			 */
-			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
-
-			if (curFileTLI > 0 && tli < curFileTLI)
-elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
-	 (uint32) (tliRecPtr >> 32),
-	 (uint32) tliRecPtr,
-	 tli, curFileTLI);
-		}
-		curFileTLI = tli;
-		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
-		receivedUpto = 0;
-	}
-
 	/*
 	 * Move to XLOG_FROM_STREAM state in either case. We'll
 	 * get immediate failure if we didn't launch walreceiver,
@@ -11969,10 +11928,83 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  lastSourceFailed ? "failure" : "success");
 
 		/*
-		 * We've now handled possible failure. Try to read from the chosen
-		 * source.
+		 * Prepare to read from the chosen source if we were asked to restart
+		 * the source or the last source failed
+		 */
+		if (pendingRestartSource || lastSourceFailed)
+		{
+			/*
+			 * make sure that walreceiver is not active. this is needed for
+			 * all supported sources
+			 */
+			if (WalRcvRunning())
+ShutdownWalRcv();
+
+			switch (currentSource)
+			{
+case XLOG_FROM_ARCHIVE:
+case XLOG_FROM_PG_WAL:
+
+	/*
+	 * We do not need additional actions here
+	 */
+	break;
+
+case XLOG_FROM_STREAM:
+
+	/*
+	 * If primary_conninfo is set, launch walreceiver to try
+	 * to stream the missing WAL.
+	 *
+	 * If fetching_ckpt is true, RecPtr points to the initial
+	 * checkpoint location. In that case, we use RedoStartLSN
+	 * as the streaming start position instead of RecPtr, so
+	 * that when we later jump backwards to start redo at
+	 * RedoStartLSN, we will have the logs streamed already.
+	 */
+	if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
+	{
+		XLogRecPtr	ptr;
+		TimeLineID	tli;
+
+		if (fetching_ckpt)
+		{
+			ptr = RedoStartLSN;
+			tli = ControlFile->checkPointCopy.ThisTimeLineID;
+		}
+		else
+		{
+			ptr = RecPtr;
+
+			/*
+			 * Use the record begin position to determine the
+			 * TLI, rather than the position we're reading.
+			 */
+			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
+
+			if (curFileTLI > 0 && tli < curFileTLI)
+elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
+	 (uint32) (tliRecPtr >> 32),
+	 (uint32) tliRecPtr,
+	 tli, curFileTLI);
+		}
+		curFileTLI = tli;
+		RequestXLogStreaming(tli, p

RE: Timeout parameters

2019-03-02 Thread Fabien COELHO



Hello again Ryohei-san,

About the tcp_user_timeout libpq parameter v8.

Patch applies & compiles cleanly. Global check is ok, although there are 
no specific tests.


Documentation English can be improved. Could a native speaker help, 
please?


ISTM that the documentation both states that it works and does not work on 
Windows. One assertion must be false.


Syntax errors, e.g. "tcp_user_timeout=2bad", are detected; good.

I could not really test the feature, i.e. I could not trigger a timeout. 
Do you have a suggestion on how to test it?


--
Fabien.



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-02 Thread David Rowley
On Fri, 1 Mar 2019 at 08:56, Tomas Vondra  wrote:
> Attached is an updated version of this patch series.

I made a quick pass over the 0001 patch. I edited a few small things
along the way; patch attached.

I'll try to do a more in-depth review soon.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mcv_stats_small_fixups_for_0001.patch
Description: Binary data


Re: NOT IN subquery optimization

2019-03-02 Thread David Rowley
On Sat, 2 Mar 2019 at 13:45, Tom Lane  wrote:
>
> David Rowley  writes:
> > I think you're fighting a losing battle here with adding OR quals to
> > the join condition.
>
> Yeah --- that has a nontrivial risk of making things significantly worse,
> which makes it a hard sell.  I think the most reasonable bet here is
> simply to not perform the transformation if we can't prove the inner side
> NOT NULL.  That's going to catch most of the useful cases anyway IMO.

Did you mean outer side NOT NULL?   The OR col IS NULL was trying to
solve the outer side nullability problem when the inner side is empty.
  Of course, the inner side needs to not produce NULLs either, but
that's due to the fact that if a NULL exists in the inner side then
the anti-join should not produce any records.
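
For reference, this is the SQL-level behaviour the transformation has to
preserve:

  SELECT x FROM (VALUES (1), (2)) AS t(x)
  WHERE x NOT IN (SELECT y FROM (VALUES (3), (NULL::int)) AS s(y));
  -- returns no rows: once the inner side contains a NULL, "x <> NULL" is
  -- NULL for every x, so the NOT IN qual can never evaluate to true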

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Question about pg_upgrade from 9.2 to X.X

2019-03-02 Thread Perumal Raj
Hi Sergei and Team

Could you share your observations further?

Perumal Raju


On Thu, Feb 28, 2019, 11:21 AM Perumal Raj  wrote:

> here is the data,
>
> postgres=# \c template1
> You are now connected to database "template1" as user "postgres".
> template1=# \dx
>  List of installed extensions
>   Name   | Version |   Schema   | Description
> -+-++--
>  plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
> (1 row)
>
> template1=# \c postgres
> You are now connected to database "postgres" as user "postgres".
> postgres=# \dx
>  List of installed extensions
>   Name   | Version |   Schema   | Description
> -+-++--
>  plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
> (1 row)
>
> postgres=# \c nagdb
> You are now connected to database "nagdb" as user "postgres".
> nagdb=# \dx
>  List of installed extensions
>   Name   | Version |   Schema   | Description
> -+-++--
>  plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
> (1 row)
>
> nagdb=# \c archive_old
>
>  List of installed extensions
> Name| Version |   Schema   |
> Description
> +-++---
>  pg_stat_statements | 1.1 | public | track execution statistics of 
> all SQL statements executed
>  plpgsql| 1.0 | pg_catalog | PL/pgSQL procedural language
> (2 rows)
>
> archive_old=# \c production
> # \dx
>  List of installed extensions
> Name| Version |   Schema   |
> Description
> +-++---
>  hstore | 1.1 | public | data type for storing sets of 
> (key, value) pairs
>  pg_stat_statements | 1.1 | public | track execution statistics of 
> all SQL statements executed
>  plpgsql| 1.0 | pg_catalog | PL/pgSQL procedural language
>  uuid-ossp  | 1.0 | public | generate universally unique 
> identifiers (UUIDs)
> (4 rows)
>
>
> Thanks,
>
>
>
> On Thu, Feb 28, 2019 at 11:04 AM Sergei Kornilov  wrote:
>
>> Hi
>>
>> > Yes, i want to get rid of old extension, Could you please share the
>> query to find extension which is using pg_reorg.
>>
>> pg_reorg is the name of both the tool and the extension.
>> Check every database in the cluster with, for example, the psql command "\dx",
>> or read the pg_dumpall -s output for CREATE EXTENSION statements to find all
>> installed extensions.
>>
>> regards, Sergei
>>
>


Re: Online verification of checksums

2019-03-02 Thread Stephen Frost
Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:
> On Friday, 2019-03-01 at 18:03 -0500, Robert Haas wrote:
> > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> >  wrote:
> > > I have added a retry for this as well now, without a pg_sleep() as well.
> > > This catches around 80% of the half-reads, but a few slip through. At
> > > that point we bail out with exit(1), and the user can try again, which I
> > > think is fine?
> > 
> > Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> > robust at all.
> 
> The chance that pg_verify_checksums hits a torn page (at least in my
> tests, see below) is already pretty low, a couple of times per 1000
> runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
> on. Otherwise, we now just issue a warning and skip the file (or so was
> the idea, see below), do you think that is not acceptable?
> 
> I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
> pg_verify_checksums in tight loops) with the current patch version, and
> I am seeing short reads very, very rarely (maybe every 1000th run) with
> a warning like:
> 
> |1174
> |pg_verify_checksums: warning: could not read block 374 in file 
> "data/base/18032/18045": read 4096 of 8192
> |pg_verify_checksums: warning: could not read block 375 in file 
> "data/base/18032/18045": read 4096 of 8192
> |Files skipped: 2
> 
> The 1174 is the sequence number, the first 1173 runs of
> pg_verify_checksums only skipped blocks.
> 
> However, the fact that it shows two warnings for the same file means there is
> something wrong here. It was continuing to the next block, while I think
> it should just skip to the next file on read failures. So I have changed
> that now, new patch attached.

I'm confused: if previously it was continuing to the next block instead
of doing the re-read on the same block, why don't we just change it to
do the re-read on the same block properly and see if that fixes the
retry, instead of just giving up and skipping?  I'm not necessarily
against skipping to the next file, to be clear, but I think I'd be
happier if we kept reading the file until we actually get EOF.

(I've not looked at the actual patch, just read what you wrote..)

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: NOT IN subquery optimization

2019-03-02 Thread Tom Lane
David Rowley  writes:
> On Sat, 2 Mar 2019 at 13:45, Tom Lane  wrote:
>> Yeah --- that has a nontrivial risk of making things significantly worse,
>> which makes it a hard sell.  I think the most reasonable bet here is
>> simply to not perform the transformation if we can't prove the inner side
>> NOT NULL.  That's going to catch most of the useful cases anyway IMO.

> Did you mean outer side NOT NULL?

Sorry, sloppy thinking.

>   Of course, the inner side needs to not produce NULLs either, but
> that's due to the fact that if a NULL exists in the inner side then
> the anti-join should not produce any records.

Right.  So the path of least resistance is to transform to antijoin
only if we can prove both of those things (and maybe we need to check
that the join operator is strict, too?  -ENOCAFFEINE).  The question
before us is what is the cost-benefit ratio of trying to cope with
additional cases.  I'm skeptical that it's attractive: the cost
certainly seems high, and I don't know that there are very many
real-world cases where we'd get a win.

Hmm ... thinking about the strictness angle some more: what we really
need to optimize NOT IN, IIUC, is an assumption that the join operator
will never return NULL.  While not having NULL inputs is certainly a
*necessary* condition for that (assuming a strict operator) it's not a
*sufficient* condition.  Any Postgres function/operator is capable
of returning NULL whenever it feels like it.  So checking strictness
does not lead to a mathematically correct optimization.
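
For example, nothing stops a strict operator from returning NULL for
perfectly good inputs (a contrived sketch):

  CREATE FUNCTION weird_int_eq(int, int) RETURNS bool
      LANGUAGE sql STRICT
      AS 'SELECT CASE WHEN $1 = $2 THEN true ELSE NULL END';
  CREATE OPERATOR === (LEFTARG = int, RIGHTARG = int, PROCEDURE = weird_int_eq);
  SELECT 1 === 2;   -- NULL, despite two non-NULL inputs and a strict function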

My initial thought about plugging that admittedly-academic point is
to insist that the join operator be both strict and a member of a
btree opclass (hash might be OK too; less sure about other index types).
The system already contains assumptions that btree comparators never
return NULL.  I doubt that this costs us any real-world applicability,
because if the join operator can neither merge nor hash, we're screwed
anyway for finding a join plan that's better than nested-loop.

regards, tom lane



Re: GSoC 2019

2019-03-02 Thread Stephen Frost
Greetings,

* Sumukha Pk (sumukhap...@gmail.com) wrote:
> I am Sumukha PK, a student of NITK. I am interested in the WAL-G backup tool. 
> I haven’t been able to catch hold of anyone through the IRC channels, so I 
> need someone to point me to appropriate resources so that I can be introduced 
> to it. I am proficient in Golang and would be interested to work on this 
> project.

Please work with Andrey on coming up with a project plan to be submitted
formally through the GSoC website.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: GSoC 2019

2019-03-02 Thread Andrey Borodin
Hello!

> On 2 March 2019, at 21:40, Stephen Frost wrote:
> 
> Greetings,
> 
> * Sumukha Pk (sumukhap...@gmail.com) wrote:
>> I am Sumukha PK, a student of NITK. I am interested in the WAL-G backup tool. 
>> I haven’t been able to catch hold of anyone through the IRC channels, so I 
>> need someone to point me to appropriate resources so that I can be 
>> introduced to it. I am proficient in Golang and would be interested to work 
>> on this project.
> 
> Please work with Andrey on coming up with a project plan to be submitted
> formally through the GSoC website.

Thanks, Stephen!

Sumukha Pk, it's great that you have subscribed to pgsql-hackers and joined the 
PostgreSQL development list; there is a lot of useful information here.
In GSoC, WAL-G is under the PostgreSQL umbrella project, but usually we do not 
use this list for WAL-G related discussions.
We have a WAL-G Slack channel https://postgresteam.slack.com/messages/CA25P48P2 
. To get there you can use this invite app https://postgres-slack.herokuapp.com/
There you can ask whatever you want to know about WAL-G or your GSoC proposal.

Thanks for your interest in our project!

Best regards, Andrey Borodin.


GSoC 2019 - TOAST'ing in slices idea

2019-03-02 Thread Bruno Hass
Hello everyone,

I am currently writing my proposal for GSoC 2019 for the TOAST'ing in slices 
idea. I already have a sketch of the description and approach outline, which I am 
sending in this e-mail. I would be happy to receive some feedback on it. I've 
been reading the PostgreSQL source code and documentation in order to 
understand better how to implement this idea and to write a detailed proposal. 
I would also be glad to receive some recommendations of documentation or 
material on TOAST'ing internals, as well as some hints on where to look further 
in the source code.


I am looking forward to your feedback!


PostgreSQL Proposal.pdf
Description: PostgreSQL Proposal.pdf


Re: WIP: BRIN multi-range indexes

2019-03-02 Thread Tomas Vondra


On 3/2/19 10:05 AM, Alexander Korotkov wrote:
> On Sun, Mar 4, 2018 at 3:15 AM Tomas Vondra
>  wrote:
>> I've been thinking about this after looking at 0a459cec96, and I don't
>> think this patch has the same issues. One reason is that just like the
>> original minmax opclass, it does not really mess with the data it
>> stores. It only does min/max on the values, and stores that, so if there
>> was NaN or Infinity, it will index NaN or Infinity.
> 
> FWIW, I think the closest similar functionality is the subtype_diff
> function of range types.  But I don't think we should require a range
> type here just in order to fetch the subtype_diff function out of it.  So,
> an opclass distance function looks OK to me,

OK, agreed.

> assuming it's not an AM-defined function, but a function used for
> inter-opclass compatibility.
> 

I'm not sure I understand what you mean by this. Can you elaborate? Does
the current implementation (i.e. distance function being implemented as
an opclass support procedure) work for you or not?

Thanks for looking at the patch!


cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: BRIN multi-range indexes

2019-03-02 Thread Tomas Vondra



On 3/2/19 10:00 AM, Alexander Korotkov wrote:
> Hi!
> 
> I'm starting to look at this patchset.  In general, I think it's
> very cool!  We definitely need this.
> 
> On Tue, Apr 3, 2018 at 10:51 PM Tomas Vondra
>  wrote:
>> 1) index parameters
>>
>> The main improvement of this version is an introduction of a couple of
>> BRIN index parameters, next to pages_per_range and autosummarize.
>>
>> a) n_distinct_per_range - used to size Bloom index
>> b) false_positive_rate - used to size Bloom index
>> c) values_per_range - number of values in the minmax-multi summary
>>
>> Until now those parameters were pretty much hard-coded, this allows easy
>> customization depending on the data set. There are some basic rules to
>> clamp the values (e.g. not to allow ndistinct to be less than 128 or
>> more than MaxHeapTuplesPerPage * page_per_range), but that's about it.
>> I'm sure we could devise more elaborate heuristics (e.g. when building
>> index on an existing table, we could inspect table statistics first),
>> but the patch does not do that.
>>
>> One disadvantage is that those parameters are per-index.
> 
> For me, the main disadvantage of this solution is that we put
> opclass-specific parameters into the access method, which is generally
> bad design.  A user can specify such a parameter even without using the
> corresponding opclass, which may cause confusion (and even if we forbid
> that, it needs to be hardcoded).  Also, extension opclasses can't do
> the same thing.  Thus, it appears that extension opclasses are no
> longer first-class citizens.  Have you taken a look at the opclass
> parameters patch [1]?  I think it's the proper solution to this problem.
> I think we should postpone this parameterization until we push the
> opclass parameters patch.
> 
> 1. 
> https://www.postgresql.org/message-id/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
> 

I've looked at that patch only very briefly so far, but I agree it's
likely a better solution than what my patch does at the moment (which I
agree is a misuse of the AM-level options). I'll take a closer look.

I agree it makes sense to re-use that infrastructure for this patch, but
I'm hesitant to rebase it on top of that patch right away, because it
would make this thread dependent on it, which would confuse cputube,
make it bitrot faster, etc.

So I suggest we ignore this aspect of the patch for now, and let's talk
about the other bits first.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-03-02 Thread James Coleman
On Fri, Mar 1, 2019 at 5:28 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > [ saop_is_not_null-v10.patch ]
>
> I went through this again, and this time (after some more rewriting
> of the comments) I satisfied myself that the logic is correct.
> Hence, pushed.

Thanks!

> I also tweaked it to recognize the case where we can prove the
> array, rather than the scalar, to be null.  I'm not sure how useful
> that is in practice, but it seemed weird that we'd exploit that
> only if we can also prove the scalar to be null.

Just for my own understanding: I thought the "if
(arrayconst->constisnull)" took care of the array constant being null?
I don't see a check on the scalar node / lhs. I do see you added a
check for the entire clause being null, but I'm not sure I understand
when that would occur (unless it's only in the recursive case?)

> Take a look at the ScalarArrayOp case in eval_const_expressions.
> Right now it's only smart about the all-inputs-constant case.
> I'm not really convinced it's worth spending cycles on the constant-
> null-array case, but that'd be where to do it if we want to do it
> in a general way.  (I think that what I added to clause_is_strict_for
> is far cheaper, because while it's the same test, it's in a place
> that we won't hit during most queries.)

Thanks for the pointer; I'll take a look if for no other reason than curiosity.

Thanks,
James Coleman



Re: Online verification of checksums

2019-03-02 Thread Tomas Vondra
On 3/2/19 12:03 AM, Robert Haas wrote:
> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
>  wrote:
>> I have added a retry for this as well now, without a pg_sleep() as well.
>> This catches around 80% of the half-reads, but a few slip through. At
>> that point we bail out with exit(1), and the user can try again, which I
>> think is fine?
> 
> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> robust at all.
> 

FWIW I don't think this qualifies as a torn page - i.e. it's not a full
read with a mix of old and new data. This is a partial write, most likely
because we read the blocks one by one, and when we hit the last page
while the table is being extended, we may only see the first 4kB. And if
we retry very fast, we may still see only the first 4kB.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2019-03-02 Thread Tomas Vondra



On 3/2/19 5:08 PM, Stephen Frost wrote:
> Greetings,
> 
> * Michael Banck (michael.ba...@credativ.de) wrote:
>> On Friday, 2019-03-01 at 18:03 -0500, Robert Haas wrote:
>>> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
>>>  wrote:
 I have added a retry for this as well now, without a pg_sleep() as well.
 This catches around 80% of the half-reads, but a few slip through. At
 that point we bail out with exit(1), and the user can try again, which I
 think is fine?
>>>
>>> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
>>> robust at all.
>>
>> The chance that pg_verify_checksums hits a torn page (at least in my
>> tests, see below) is already pretty low, a couple of times per 1000
>> runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
>> on. Otherwise, we now just issue a warning and skip the file (or so was
>> the idea, see below), do you think that is not acceptable?
>>
>> I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
>> pg_verify_checksums in tight loops) with the current patch version, and
>> I am seeing short reads very, very rarely (maybe every 1000th run) with
>> a warning like:
>>
>> |1174
>> |pg_verify_checksums: warning: could not read block 374 in file 
>> "data/base/18032/18045": read 4096 of 8192
>> |pg_verify_checksums: warning: could not read block 375 in file 
>> "data/base/18032/18045": read 4096 of 8192
>> |Files skipped: 2
>>
>> The 1174 is the sequence number, the first 1173 runs of
>> pg_verify_checksums only skipped blocks.
>>
>> However, the fact that it shows two warnings for the same file means there is
>> something wrong here. It was continuing to the next block, while I think
>> it should just skip to the next file on read failures. So I have changed
>> that now, new patch attached.
> 
> I'm confused: if previously it was continuing to the next block instead
> of doing the re-read on the same block, why don't we just change it to
> do the re-read on the same block properly and see if that fixes the
> retry, instead of just giving up and skipping?  I'm not necessarily
> against skipping to the next file, to be clear, but I think I'd be
> happier if we kept reading the file until we actually get EOF.
> 
> (I've not looked at the actual patch, just read what you wrote..)
> 

Notice that those two errors are actually for two consecutive blocks in
the same file. So what probably happened is that postgres started to
extend the page, and the verification tried to read the last page after
the kernel added just the first 4kB filesystem page. Then it probably
succeeded on a retry, and then the same thing happened on the next page.

I don't think EOF addresses this, though - the partial read happens
before we actually reach the end of the file.

And re-reads are not a solution either, because the second read may
still see only the first half, and then what - is it a permanent issue
(in which case it's a data corruption), or an extension in progress?

I wonder if we can simply ignore those errors entirely, if it's the last
page in the segment? We can't really check the file is "complete"
anyway, e.g. if you have multiple segments for a table, and the "middle"
one is a page shorter, we'll happily ignore that during verification.

Also, what if we're reading a file and it gets truncated (e.g. after
vacuum notices the last few pages are empty)? Doesn't that have the same
issue?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2019-03-02 Thread Andres Freund
Hi,


On 2019-03-02 22:49:33 +0100, Tomas Vondra wrote:
> 
> 
> On 3/2/19 5:08 PM, Stephen Frost wrote:
> > Greetings,
> > 
> > * Michael Banck (michael.ba...@credativ.de) wrote:
> >> On Friday, 2019-03-01 at 18:03 -0500, Robert Haas wrote:
> >>> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> >>>  wrote:
>  I have added a retry for this as well now, without a pg_sleep() as well.
>  This catches around 80% of the half-reads, but a few slip through. At
>  that point we bail out with exit(1), and the user can try again, which I
>  think is fine?
> >>>
> >>> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> >>> robust at all.
> >>
> >> The chance that pg_verify_checksums hits a torn page (at least in my
> >> tests, see below) is already pretty low, a couple of times per 1000
> >> runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
> >> on. Otherwise, we now just issue a warning and skip the file (or so was
> >> the idea, see below), do you think that is not acceptable?
> >>
> >> I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
> >> pg_verify_checksums in tight loops) with the current patch version, and
> >> I am seeing short reads very, very rarely (maybe every 1000th run) with
> >> a warning like:
> >>
> >> |1174
> >> |pg_verify_checksums: warning: could not read block 374 in file 
> >> "data/base/18032/18045": read 4096 of 8192
> >> |pg_verify_checksums: warning: could not read block 375 in file 
> >> "data/base/18032/18045": read 4096 of 8192
> >> |Files skipped: 2
> >>
> >> The 1174 is the sequence number, the first 1173 runs of
> >> pg_verify_checksums only skipped blocks.
> >>
> >> However, the fact that it shows two warnings for the same file means there is
> >> something wrong here. It was continuing to the next block, while I think
> >> it should just skip to the next file on read failures. So I have changed
> >> that now, new patch attached.
> > 
> > I'm confused: if previously it was continuing to the next block instead
> > of doing the re-read on the same block, why don't we just change it to
> > do the re-read on the same block properly and see if that fixes the
> > retry, instead of just giving up and skipping?  I'm not necessarily
> > against skipping to the next file, to be clear, but I think I'd be
> > happier if we kept reading the file until we actually get EOF.
> > 
> > (I've not looked at the actual patch, just read what you wrote..)
> > 
> 
> Notice that those two errors are actually for two consecutive blocks in
> the same file. So what probably happened is that postgres started to
> extend the page, and the verification tried to read the last page after
> the kernel added just the first 4kB filesystem page. Then it probably
> succeeded on a retry, and then the same thing happened on the next page.
> 
> I don't think EOF addresses this, though - the partial read happens
> before we actually reach the end of the file.
> 
> And re-reads are not a solution either, because the second read may
> still see only the first half, and then what - is it a permanent issue
> (in which case it's a data corruption), or an extension in progress?
> 
> I wonder if we can simply ignore those errors entirely, if it's the last
> page in the segment? We can't really check the file is "complete"
> anyway, e.g. if you have multiple segments for a table, and the "middle"
> one is a page shorter, we'll happily ignore that during verification.
> 
> Also, what if we're reading a file and it gets truncated (e.g. after
> vacuum notices the last few pages are empty)? Doesn't that have the same
> issue?

I gotta say, my conclusion from this debate is that it's simply a
mistake to do this without involvement of the server that can use
locking to prevent these kind of issues.  It seems pretty absurd to me
to have hacky workarounds around partial writes of a live server, around
truncation, etc, even though the server has ways to deal with that.

- Andres



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-03-02 Thread Tom Lane
James Coleman  writes:
> On Fri, Mar 1, 2019 at 5:28 PM Tom Lane  wrote:
>> I also tweaked it to recognize the case where we can prove the
>> array, rather than the scalar, to be null.  I'm not sure how useful
>> that is in practice, but it seemed weird that we'd exploit that
>> only if we can also prove the scalar to be null.

> Just for my own understanding: I thought the "if
> (arrayconst->constisnull)" took care of the array constant being null?

Yeah, but only after we've already matched the subexpr to the LHS,
otherwise we'd not be bothering to determine the array's size.

On the other hand, if for some reason we know that the array side
is NULL, we can conclude that the ScalarArrayOp returns NULL
independently of what the LHS is.
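
At the SQL level:

  SELECT 1 = ANY (NULL::int[]);             -- NULL, whatever the LHS is
  SELECT NULL::int = ANY ('{1,2}'::int[]);  -- NULL as well, since "=" is strict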

> I don't see a check on the scalar node / lhs. I do see you added a
> check for the entire clause being null, but I'm not sure I understand
> when that would occur (unless it's only in the recursive case?)

Yeah, it's to handle the case where we run into a constant NULL
below a strict node.  Typically, that doesn't happen because
eval_const_expressions would have const-folded the upper node, but
it's not impossible given all the cases that clause_is_strict_for
can recurse through now.  (The coverage bot does show that code being
reached, although perhaps that only occurs for null-below-ScalarArrayOp,
in which case it might be dead if we were to make eval_const_expressions
smarter.)

regards, tom lane



Re: POC: converting Lists into arrays

2019-03-02 Thread Tom Lane
Here's a v3 incorporating Andres' idea of trying to avoid a separate
palloc for the list cell array.  In a 64-bit machine we can have up
to five ListCells in the initial allocation without actually increasing
space consumption at all compared to the old code.  So only when a List
grows larger than that do we need more than one palloc.

I'm still having considerable difficulty convincing myself that this
is enough of a win to justify the bug hazards we'll introduce, though.
On test cases like "pgbench -S" it seems to be pretty much within the
noise level of being the same speed as HEAD.  I did see a nice improvement
in the test case described in
https://www.postgresql.org/message-id/6970.1545327...@sss.pgh.pa.us
but considering that that's still mostly a tight loop in
match_eclasses_to_foreign_key_col, it doesn't seem very interesting
as an overall figure of merit.

I wonder what test cases Andres has been looking at that convince
him that we need a reimplementation of Lists.

regards, tom lane



reimplement-List-as-array-3.patch.gz
Description: reimplement-List-as-array-3.patch.gz


Re: Online verification of checksums

2019-03-02 Thread Michael Paquier
On Sat, Mar 02, 2019 at 02:00:31PM -0800, Andres Freund wrote:
> I gotta say, my conclusion from this debate is that it's simply a
> mistake to do this without involvement of the server that can use
> locking to prevent these kind of issues.  It seems pretty absurd to me
> to have hacky workarounds around partial writes of a live server, around
> truncation, etc, even though the server has ways to deal with that.

I agree with Andres on this one.  We are never going to make this
stuff safe if we don't handle page reads with the proper locks because
of torn pages.  What I think we should do is provide a SQL function
which reads a page in shared mode, and then checks its checksum if its
LSN is older than the previous redo point.  This discards cases with
rather hot pages, but if the page is hot enough then the backend
re-reading the page would just do the same by verifying the page
checksum by itself.
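
Something close to that can already be sketched with the pageinspect
extension (a rough illustration only, not the proposed function; it assumes
the default 8 kB block size and a hypothetical table name):

  CREATE EXTENSION IF NOT EXISTS pageinspect;
  SELECT blkno
  FROM generate_series(0, pg_relation_size('some_table') / 8192 - 1) AS blkno,
       LATERAL get_raw_page('some_table', blkno::int) AS page
  WHERE (page_header(page)).upper <> 0                    -- skip new, all-zero pages
    AND (page_header(page)).lsn < (SELECT redo_lsn FROM pg_control_checkpoint())
    AND (page_header(page)).checksum <> page_checksum(page, blkno::int);
  -- get_raw_page() reads the block through shared buffers, and pages newer
  -- than the last redo point are skipped here, as described above.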
--
Michael


signature.asc
Description: PGP signature


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-03-02 Thread James Coleman
On Sat, Mar 2, 2019 at 5:29 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Fri, Mar 1, 2019 at 5:28 PM Tom Lane  wrote:
> >> I also tweaked it to recognize the case where we can prove the
> >> array, rather than the scalar, to be null.  I'm not sure how useful
> >> that is in practice, but it seemed weird that we'd exploit that
> >> only if we can also prove the scalar to be null.
>
> > Just for my own understanding: I thought the "if
> > (arrayconst->constisnull)" took care of the array constant being null?
>
> Yeah, but only after we've already matched the subexpr to the LHS,
> otherwise we'd not be bothering to determine the array's size.
>
> On the other hand, if for some reason we know that the array side
> is NULL, we can conclude that the ScalarArrayOp returns NULL
> independently of what the LHS is.

Thanks for the detailed explanation.

> > I don't see a check on the scalar node / lhs. I do see you added a
> > check for the entire clause being null, but I'm not sure I understand
> > when that would occur (unless it's only in the recursive case?)
>
> Yeah, it's to handle the case where we run into a constant NULL
> below a strict node.  Typically, that doesn't happen because
> eval_const_expressions would have const-folded the upper node, but
> it's not impossible given all the cases that clause_is_strict_for
> can recurse through now.  (The coverage bot does show that code being
> reached, although perhaps that only occurs for null-below-ScalarArrayOp,
> in which case it might be dead if we were to make eval_const_expressions
> smarter.)

Ah, that makes sense.

James Coleman



Re: Online verification of checksums

2019-03-02 Thread Tomas Vondra
On 3/3/19 12:48 AM, Michael Paquier wrote:
> On Sat, Mar 02, 2019 at 02:00:31PM -0800, Andres Freund wrote:
>> I gotta say, my conclusion from this debate is that it's simply a
>> mistake to do this without involvement of the server that can use
>> locking to prevent these kind of issues.  It seems pretty absurd to me
>> to have hacky workarounds around partial writes of a live server, around
>> truncation, etc, even though the server has ways to deal with that.
> 
> I agree with Andres on this one.  We are never going to make this
> stuff safe if we don't handle page reads with the proper locks because
> of torn pages.  What I think we should do is provide a SQL function
> which reads a page in shared mode, and then checks its checksum if its
> LSN is older than the previous redo point.  This discards cases with
> rather hot pages, but if the page is hot enough then the backend
> re-reading the page would just do the same by verifying the page
> checksum by itself.

Handling torn pages is not difficult, and the patch already does that
(it reads the LSN of the last checkpoint from the control file, and uses
it the same way basebackup does). That's been working since (at least)
September, so I don't see how would the SQL function help with this?

The other issue (raised recently) is partial reads, where we read only a
fraction of the page. Basebackup simply ignores such pages, likely on
the assumption that it's either concurrent extension or truncation (in
which case it's newer than the last checkpoint LSN anyway). So maybe we
should do the same thing here. As I mentioned before, we can't reliably
detect incomplete segments anyway (at least I believe that's the case).

You and Andres may be right that trying to verify checksums online
without close interaction with the server is ultimately futile (or at
least overly complex). But I'm not sure those issues (torn pages and
partial reads) are very good arguments, considering basebackup has to
deal with them too. Not sure.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: NOT IN subquery optimization

2019-03-02 Thread David Rowley
On Sun, 3 Mar 2019 at 05:25, Tom Lane  wrote:
> Hmm ... thinking about the strictness angle some more: what we really
> need to optimize NOT IN, IIUC, is an assumption that the join operator
> will never return NULL.  While not having NULL inputs is certainly a
> *necessary* condition for that (assuming a strict operator) it's not a
> *sufficient* condition.  Any Postgres function/operator is capable
> of returning NULL whenever it feels like it.  So checking strictness
> does not lead to a mathematically correct optimization.

That's something I didn't think of.

> My initial thought about plugging that admittedly-academic point is
> to insist that the join operator be both strict and a member of a
> btree opclass (hash might be OK too; less sure about other index types).
> The system already contains assumptions that btree comparators never
> return NULL.  I doubt that this costs us any real-world applicability,
> because if the join operator can neither merge nor hash, we're screwed
> anyway for finding a join plan that's better than nested-loop.

Why strict? If both inputs are non-NULL, then what additional
guarantees does strict give us?

I implemented a btree opfamily check in my version of the patch. Not
so sure about hash; can you point me in the direction of a mention of
how this is guaranteed for btree?

The attached v1.2 does this and adds a regression test using the LINE
type. This has an operator named '=', but no btree opfamily. A few
other types are in this boat too, per:

select typname from pg_type t where not exists(select 1 from pg_amop
where amoplefttype = t.oid and amopmethod=403) and exists (select 1
from pg_operator where oprleft = t.oid and oprname = '=');

The list of builtin types that have a hash opfamily but no btree
opfamily that support NOT IN are not very exciting, so doing the same
for hash might not be worth the extra code.

select typname from pg_type t where exists(select 1 from pg_amop where
amoplefttype = t.oid and amopmethod=405) and exists (select 1 from
pg_operator where oprleft = t.oid and oprname = '=') and not
exists(select 1 from pg_amop where amoplefttype = t.oid and
amopmethod=403);
 typname
-
 xid
 cid
 aclitem
(3 rows)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


not_in_anti_join_v1.2.patch
Description: Binary data


Re: NOT IN subquery optimization

2019-03-02 Thread Tom Lane
David Rowley  writes:
> On Sun, 3 Mar 2019 at 05:25, Tom Lane  wrote:
>> My initial thought about plugging that admittedly-academic point is
>> to insist that the join operator be both strict and a member of a
>> btree opclass (hash might be OK too; less sure about other index types).

> Why strict? If both inputs are non-NULL, then what additional
> guarantees does strict give us?

Yeah, if we're verifying the inputs are non-null, I think that probably
doesn't matter.

> I implemented a btree opfamily check in my version of the patch. Not
> so sure about hash, can you point me in the direction of a mention of
> how this is guarantees for btree?

https://www.postgresql.org/docs/devel/btree-support-funcs.html
quoth

The comparison function must take two non-null values A and B and
return an int32 value that is < 0, 0, or > 0 when A < B, A = B, or A >
B, respectively. A null result is disallowed: all values of the data
type must be comparable.

(At the code level, this is implicit in the fact that the comparison
function will be called via FunctionCall2Coll or a sibling, and those
all throw an error if the called function returns NULL.)

Now, it doesn't say in so many words that the comparison operators
have to yield results consistent with the comparison support function,
but I think that's pretty obvious ...

For hash, the equivalent constraint is that the hash function has to
work for every possible input value.  I suppose it's possible that
the associated equality operator would sometimes return null, but
it's hard to think of a practical reason for doing so.

I've not dug in the code, but I wouldn't be too surprised if
nodeMergejoin.c or nodeHashjoin.c, or the stuff related to hash
grouping or hash aggregation, also contain assumptions about
the equality operators not returning null.

> The list of builtin types that have a hash opfamily but no btree
> opfamily that support NOT IN are not very exciting, so doing the same
> for hash might not be worth the extra code.

Agreed for builtin types, but there might be some extensions out there
where this doesn't hold.  It's not terribly hard to imagine a data type
that hasn't got a linear sort order but is amenable to hashing.
(The in-core xid type is an example, actually.)

regards, tom lane



Re: WIP: BRIN multi-range indexes

2019-03-02 Thread Alexander Korotkov
On Sun, Mar 3, 2019 at 12:25 AM Tomas Vondra
 wrote:
> I've looked at that patch only very briefly so far, but I agree it's
> likely a better solution than what my patch does at the moment (which I
> agree is a misuse of the AM-level options). I'll take a closer look.
>
> I agree it makes sense to re-use that infrastructure for this patch, but
> I'm hesitant to rebase it on top of that patch right away. Because it
> would mean this thread dependent on it, which would confuse cputube,
> make it bitrot faster etc.
>
> So I suggest we ignore this aspect of the patch for now, and let's talk
> about the other bits first.

Works for me.  We don't need to make all of the work done by this patch
dependent on opclass parameters.  It's OK to ignore this aspect
for now and come back when opclass parameters get committed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: POC: converting Lists into arrays

2019-03-02 Thread Andres Freund
Hi,

On 2019-03-02 18:11:43 -0500, Tom Lane wrote:
> I wonder what test cases Andres has been looking at that convince
> him that we need a reimplementation of Lists.

My main observation was from when the expression evaluation was using
lists all over. List iteration overhead was very substantial there. But
that's not a problem anymore, because all of those are gone now due to
the expression rewrite.  I personally wasn't actually advocating for a
new list implementation, I was/am advocating that we should move some
tasks over to a more optimized representation.

I still regularly see list overhead matter in production workloads. A
lot of it being memory allocator overhead, which is why I'm concerned
with a rewrite that doesn't reduce the number of memory allocations. And
a lot of it is stuff that you won't see in pgbench - e.g. there's a lot
of production queries that join a bunch of tables with a few dozen
columns, where e.g. all the targetlists are much longer than what you'd
see in pgbench -S.

Greetings,

Andres Freund



Re: WIP: BRIN multi-range indexes

2019-03-02 Thread Alexander Korotkov
On Sun, Mar 3, 2019 at 12:12 AM Tomas Vondra
 wrote:
> On 3/2/19 10:05 AM, Alexander Korotkov wrote:
> > assuming it's not AM-defined function, but function used for
> > inter-opclass compatibility.
>
> I'm not sure I understand what you mean by this. Can you elaborate? Does
> the current implementation (i.e. distance function being implemented as
> an opclass support procedure) work for you or not?

I mean that, unlike other index access methods, BRIN allows opclasses to
define custom support procedures.  These support procedures are not
called directly from the AM, but might be called from other opclass
support procedures.  That allows the same high-level support
procedures to be re-used in multiple opclasses.

So, the distance support procedure is not called directly from the AM, and we
don't have to change the interface between the AM and the opclass for that.
This is why I'm OK with it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Online verification of checksums

2019-03-02 Thread Fabien COELHO


Bonjour Michaël,


I gotta say, my conclusion from this debate is that it's simply a
mistake to do this without involvement of the server that can use
locking to prevent these kind of issues.  It seems pretty absurd to me
to have hacky workarounds around partial writes of a live server, around
truncation, etc, even though the server has ways to deal with that.


I agree with Andres on this one.  We are never going to make this stuff 
safe if we don't handle page reads with the proper locks because of torn 
pages. What I think we should do is provide a SQL function which reads a 
page in shared mode, and then checks its checksum if its LSN is older 
than the previous redo point.  This discards cases with rather hot 
pages, but if the page is hot enough then the backend re-reading the 
page would just do the same by verifying the page checksum by itself. -- 
Michael


My 0.02€ about that, as one of the reviewers of the patch:

I agree that having a server function (extension?) to do a full checksum 
verification, possibly bandwidth-controlled, would be a good thing. 
However it would have side effects, such as interfering deeply with the 
server page cache, which may or may not be desirable.


On the other hand I also see value in an independent system-level external 
tool capable of a best effort checksum verification: the current check 
that the cluster is offline to prevent pg_verify_checksums from running is 
kind of artificial, and when online simply counting 
online-database-related checksum issues looks like a reasonable 
compromise.


So basically I think that allowing pg_verify_checksum to run on an online 
cluster is still a good thing, provided that expected errors are correctly 
handled.


--
Fabien.

RE: Timeout parameters

2019-03-02 Thread Fabien COELHO




About the tcp_user_timeout libpq parameter v8.


Basically the same thing applies to the tcp_user_timeout GUC v8, especially: do you 
have any advice about how I can test the feature, i.e. trigger a timeout?


Patch applies & compiles cleanly. Global check is ok, although there are no 
specific tests.


Documentation English can be improved. Could a native speaker help, please?

ISTM that the documentation both states that it works and does not work on 
Windows. One assertion must be false.


Syntax errors, e.g. "tcp_user_timeout=2bad", are detected; good.

I could not really test the feature, i.e. I could not trigger a timeout. Do 
you have a suggestion on how to test it?



--
Fabien.