date:20190627


On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.


I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).



Thanks for running it through regression tests, that alone is a very
useful piece of information for me.

As for the catversion bump - I'd probably vote to do it. Not just because
of this serialization stuff, but to fix the pg_mcv_list_items function.
It's not something I'm very enthusiastic about (kinda embarassed about it,
really), but it seems better than shipping something that we'll need to
rework in PG13.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: GiST VACUUM

2019-06-27 Thread Heikki Linnakangas


On 26/06/2019 06:07, Thomas Munro wrote:

On Wed, Jun 26, 2019 at 2:33 PM Michael Paquier  wrote:

On Tue, Jun 25, 2019 at 02:38:43PM +0500, Andrey Borodin wrote:

I feel a little uncomfortable with number 0x7fff right in code.


PG_INT32_MAX...


MaxTransactionId / 2?


Yeah, that makes sense.

Here's a new version of the patches. Changes:

* I changed the reuse-logging so that we always write a reuse WAL 
record, even if the deleteXid is very old. I tried to avoid that with 
the check for MaxTransactionId / 2 or 0x7fff, but it had some 
problems. In the previous patch version, it wasn't just an optimization. 
Without the check, we would write 32-bit XIDs to the log that had 
already wrapped around, and that caused the standby to unnecessarily 
wait for or kill backends. But checking for MaxTransaction / 2 isn't 
quite enough: there was a small chance that the next XID counter 
advanced just after checking for that, so that we still logged an XID 
that had just wrapped around. A more robust way to deal with this is to 
log a full 64-bit XID, and check for wraparound at redo in the standby. 
And if we do that, trying to optimize this in the master doesn't seem 
that important anymore. So in this patch version, we always log the 
64-bit XID, and check for the MaxTransaction / 2 when replaying the WAL 
record instead.


* I moved the logic to extend a 32-bit XID to 64-bits to a new helper 
function in varsup.c.


* Instead of storing just a naked FullTransactionId in the "page 
contents" of a deleted page, I created a new struct for that. The effect 
is the same, but I think the struct clarifies what's happening, and it's 
a good location to attach a comment explaining it.


* Fixed the mixup between < and >

I haven't done any testing on this yet. Andrey, would you happen to have 
an environment ready to test this?


- Heikki
>From 7fd5901e15ac9e0f1928eeecbb9ae1212bacf3f3 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 4 Apr 2019 18:06:48 +0300
Subject: [PATCH 1/2] Refactor checks for deleted GiST pages.

The explicit check in gistScanPage() isn't currently really necessary, as
a deleted page is always empty, so the loop would fall through without
doing anything, anyway. But it's a marginal optimization, and it gives a
nice place to attach a comment to explain how it works.
---
 src/backend/access/gist/gist.c| 40 ---
 src/backend/access/gist/gistget.c | 14 +++
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 470b121e7da..79030e77153 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -709,14 +709,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			continue;
 		}
 
-		if (stack->blkno != GIST_ROOT_BLKNO &&
-			stack->parent->lsn < GistPageGetNSN(stack->page))
+		if ((stack->blkno != GIST_ROOT_BLKNO &&
+			 stack->parent->lsn < GistPageGetNSN(stack->page)) ||
+			GistPageIsDeleted(stack->page))
 		{
 			/*
-			 * Concurrent split detected. There's no guarantee that the
-			 * downlink for this page is consistent with the tuple we're
-			 * inserting anymore, so go back to parent and rechoose the best
-			 * child.
+			 * Concurrent split or page deletion detected. There's no
+			 * guarantee that the downlink for this page is consistent with
+			 * the tuple we're inserting anymore, so go back to parent and
+			 * rechoose the best child.
 			 */
 			UnlockReleaseBuffer(stack->buffer);
 			xlocked = false;
@@ -735,9 +736,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			GISTInsertStack *item;
 			OffsetNumber downlinkoffnum;
 
-			/* currently, internal pages are never deleted */
-			Assert(!GistPageIsDeleted(stack->page));
-
 			downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
 			iid = PageGetItemId(stack->page, downlinkoffnum);
 			idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
@@ -858,12 +856,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 	 * leaf/inner is enough to recognize split for root
 	 */
 }
-else if (GistFollowRight(stack->page) ||
-		 stack->parent->lsn < GistPageGetNSN(stack->page))
+else if ((GistFollowRight(stack->page) ||
+		  stack->parent->lsn < GistPageGetNSN(stack->page)) &&
+		 GistPageIsDeleted(stack->page))
 {
 	/*
-	 * The page was split while we momentarily unlocked the
-	 * page. Go back to parent.
+	 * The page was split or deleted while we momentarily
+	 * unlocked the page. Go back to parent.
 	 */
 	UnlockReleaseBuffer(stack->buffer);
 	xlocked = false;
@@ -872,18 +871,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 }
 			}
 
-			/*
-			 * The page might have been deleted after we scanned the parent
-			 * and saw the downlink.
-			 */
-			if (GistPageIsDeleted(stack->page))
-			{
-UnlockReleaseBuffer(stack->buffer);
-xlocked =

Use relative rpath if possible

2019-06-27 Thread Peter Eisentraut

On several popular operating systems, we can use relative rpaths, using
the $ORIGIN placeholder, so that the resulting installation is
relocatable.  Then we also don't need to set LD_LIBRARY_PATH during make
check.

This implementation will use a relative rpath if bindir and libdir are
under the same common parent directory.

Supported platforms are: freebsd, linux, netbsd, openbsd, solaris

Information from https://lekensteyn.nl/rpath.html

(Yes, something for macOS would be nice, to work around SIP issues, but
I'll leave that as a separate future item.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e15d15749a7b19fc9fe933f3728668299a0201f4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 25 Jun 2019 11:54:51 +
Subject: [PATCH] Use relative rpath if possible

On several popular operating systems, we can use relative rpaths, using
the $ORIGIN placeholder, so that the resulting installation is
relocatable. Then we also don't need to set LD_LIBRARY_PATH during make
check.

This implementation will use a relative rpath if bindir and libdir are
under the same common parent directory.
---
 src/Makefile.global.in |  2 +-
 src/makefiles/Makefile.freebsd |  5 +
 src/makefiles/Makefile.linux   |  5 +
 src/makefiles/Makefile.netbsd  |  5 +
 src/makefiles/Makefile.openbsd |  5 +
 src/makefiles/Makefile.solaris | 10 ++
 6 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index b9d86ac..0b69064 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -422,7 +422,7 @@ ld_library_path_var = LD_LIBRARY_PATH
 # nothing.
 with_temp_install = \
PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" \
-   $(call add_to_path,$(strip 
$(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+   $(if $(strip $(ld_library_path_var)),$(call add_to_path,$(strip 
$(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir))) \
$(with_temp_install_extra)
 
 ifeq ($(enable_tap_tests),yes)
diff --git a/src/makefiles/Makefile.freebsd b/src/makefiles/Makefile.freebsd
index 98a6f50..6378551 100644
--- a/src/makefiles/Makefile.freebsd
+++ b/src/makefiles/Makefile.freebsd
@@ -2,8 +2,13 @@ AROPT = cr
 
 ifdef ELF_SYSTEM
 export_dynamic = -Wl,-export-dynamic
+ifeq ($(dir $(bindir)),$(dir $(libdir)))
+rpath = -Wl,-R,'$(subst $(dir $(libdir)),$$ORIGIN/../,$(rpathdir))',-z,origin
+ld_library_path_var =
+else
 rpath = -Wl,-R'$(rpathdir)'
 endif
+endif
 
 DLSUFFIX = .so
 
diff --git a/src/makefiles/Makefile.linux b/src/makefiles/Makefile.linux
index ac58fe4..d967eec 100644
--- a/src/makefiles/Makefile.linux
+++ b/src/makefiles/Makefile.linux
@@ -3,7 +3,12 @@ AROPT = crs
 export_dynamic = -Wl,-E
 # Use --enable-new-dtags to generate DT_RUNPATH instead of DT_RPATH.
 # This allows LD_LIBRARY_PATH to still work when needed.
+ifeq ($(dir $(bindir)),$(dir $(libdir)))
+rpath = -Wl,-rpath,'$(subst $(dir 
$(libdir)),$$ORIGIN/../,$(rpathdir))',--enable-new-dtags
+ld_library_path_var =
+else
 rpath = -Wl,-rpath,'$(rpathdir)',--enable-new-dtags
+endif
 
 DLSUFFIX = .so
 
diff --git a/src/makefiles/Makefile.netbsd b/src/makefiles/Makefile.netbsd
index 7bb9721..5cc152c 100644
--- a/src/makefiles/Makefile.netbsd
+++ b/src/makefiles/Makefile.netbsd
@@ -2,7 +2,12 @@ AROPT = cr
 
 ifdef ELF_SYSTEM
 export_dynamic = -Wl,-E
+ifeq ($(dir $(bindir)),$(dir $(libdir)))
+rpath = -Wl,-R,'$(subst $(dir $(libdir)),$$ORIGIN/../,$(rpathdir))'
+ld_library_path_var =
+else
 rpath = -Wl,-R'$(rpathdir)'
+endif
 else
 rpath = -Wl,-R'$(rpathdir)'
 endif
diff --git a/src/makefiles/Makefile.openbsd b/src/makefiles/Makefile.openbsd
index eda3110..c03987d 100644
--- a/src/makefiles/Makefile.openbsd
+++ b/src/makefiles/Makefile.openbsd
@@ -2,8 +2,13 @@ AROPT = cr
 
 ifdef ELF_SYSTEM
 export_dynamic = -Wl,-E
+ifeq ($(dir $(bindir)),$(dir $(libdir)))
+rpath = -Wl,-R,'$(subst $(dir $(libdir)),$$ORIGIN/../,$(rpathdir))',-z,origin
+ld_library_path_var =
+else
 rpath = -Wl,-R'$(rpathdir)'
 endif
+endif
 
 DLSUFFIX = .so
 
diff --git a/src/makefiles/Makefile.solaris b/src/makefiles/Makefile.solaris
index a7f5652..b61757f 100644
--- a/src/makefiles/Makefile.solaris
+++ b/src/makefiles/Makefile.solaris
@@ -4,10 +4,20 @@ AROPT = crs
 
 ifeq ($(with_gnu_ld), yes)
 export_dynamic = -Wl,-E
+ifeq ($(dir $(bindir)),$(dir $(libdir)))
+rpath = -Wl,-rpath,'$(subst $(dir $(libdir)),$$ORIGIN/../,$(rpathdir))'
+ld_library_path_var =
+else
 rpath = -Wl,-rpath,'$(rpathdir)'
+endif
+else
+ifeq ($(dir $(bindir)),$(dir $(libdir)))
+rpath = -Wl,-R,'$(subst $(dir $(libdir)),$$ORIGIN/../,$(rpathdir))'
+ld_library_path_var =
 else
 rpath = -Wl,-R'$(rpathdir)'
 endif
+endif
 
 DLSUFFIX = .so
 ifeq ($(GCC), yes)
-- 
1.8.3.1

Re: Obsolete comment in commands/analyze.c

2019-06-27 Thread Daniel Gustafsson

> On 27 Jun 2019, at 13:05, Etsuro Fujita  wrote:

> since the citation was also moved to
> utils/misc/sampling.c, I think the "see full citation below" part
> should be updated accordingly.  Attached is a patch for that.

Agreed, nice catch!

cheers ./daniel

Re: GiST VACUUM

2019-06-27 Thread Andrey Borodin




> 27 июня 2019 г., в 16:38, Heikki Linnakangas  написал(а):
> 
> I haven't done any testing on this yet. Andrey, would you happen to have an 
> environment ready to test this?

Thanks!

I will do some testing this evening (UTC+5). But I'm not sure I can reliably 
test wraparound of xids...
I will try to break code with usual setup which we used to stress vacuum with 
deletes and inserts.

Best regards, Andrey Borodin.

Re: PG 11 JIT deform failure

2019-06-27 Thread didier

Hi,

I searched the mailing list but found nothing. Any reason why
TupleDescAttr is a macro and not a static inline?

Rather than adding an Assert attached POC replace TupleDescAttr macro
by a static inline function with AssertArg.
It:
- Factorize Assert.

- Trigger an Assert in JIT_deform if natts is wrong.

- Currently In HEAD
src/backend/access/common/tupdesc.c:TupleDescCopyEntry() compiler can
optimize out AssertArg(PointerIsValid(...)), no idea
 if compiling with both cassert and -O2 make sense though).

- Remove two UB in memcpy when natts is zero.

Note:
Comment line 1480 in ../contrib/tablefunc/tablefunc.c is wrong it's
the fourth column.

Regards
Didier


On Thu, Jun 13, 2019 at 8:35 PM Andres Freund  wrote:
>
> Hi,
>
> On June 13, 2019 11:08:15 AM PDT, didier  wrote:
> >Extensions can do it, timescaledb in this case with:
> >INSERT INTO ... RETURNING *;
> >
> >Or replacing the test in llvm_compile_expr with  an Assert in
> >slot_compile_deform ?
>
> In that case we ought to never generate a deform expression step - core code 
> doesn't afair. That's only done I'd there's actually something to deform. I'm 
> fine with adding an assert tough
>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index a48a6cd757..23ef9a6a75 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -86,9 +86,6 @@ getmissingattr(TupleDesc tupleDesc,
 {
 	Form_pg_attribute att;
 
-	Assert(attnum <= tupleDesc->natts);
-	Assert(attnum > 0);
-
 	att = TupleDescAttr(tupleDesc, attnum - 1);
 
 	if (att->atthasmissing)
diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 6bc4e4c036..850771acf5 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -115,7 +115,8 @@ CreateTupleDescCopy(TupleDesc tupdesc)
 	desc = CreateTemplateTupleDesc(tupdesc->natts);
 
 	/* Flat-copy the attribute array */
-	memcpy(TupleDescAttr(desc, 0),
+	if (desc->natts)
+		memcpy(TupleDescAttr(desc, 0),
 		   TupleDescAttr(tupdesc, 0),
 		   desc->natts * sizeof(FormData_pg_attribute));
 
@@ -156,7 +157,8 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
 	desc = CreateTemplateTupleDesc(tupdesc->natts);
 
 	/* Flat-copy the attribute array */
-	memcpy(TupleDescAttr(desc, 0),
+	if (desc->natts)
+		memcpy(TupleDescAttr(desc, 0),
 		   TupleDescAttr(tupdesc, 0),
 		   desc->natts * sizeof(FormData_pg_attribute));
 
@@ -274,16 +276,6 @@ TupleDescCopyEntry(TupleDesc dst, AttrNumber dstAttno,
 	Form_pg_attribute dstAtt = TupleDescAttr(dst, dstAttno - 1);
 	Form_pg_attribute srcAtt = TupleDescAttr(src, srcAttno - 1);
 
-	/*
-	 * sanity checks
-	 */
-	AssertArg(PointerIsValid(src));
-	AssertArg(PointerIsValid(dst));
-	AssertArg(srcAttno >= 1);
-	AssertArg(srcAttno <= src->natts);
-	AssertArg(dstAttno >= 1);
-	AssertArg(dstAttno <= dst->natts);
-
 	memcpy(dstAtt, srcAtt, ATTRIBUTE_FIXED_PART_SIZE);
 
 	/*
@@ -611,13 +603,6 @@ TupleDescInitEntry(TupleDesc desc,
 	Form_pg_type typeForm;
 	Form_pg_attribute att;
 
-	/*
-	 * sanity checks
-	 */
-	AssertArg(PointerIsValid(desc));
-	AssertArg(attributeNumber >= 1);
-	AssertArg(attributeNumber <= desc->natts);
-
 	/*
 	 * initialize the attribute fields
 	 */
@@ -682,11 +667,6 @@ TupleDescInitBuiltinEntry(TupleDesc desc,
 {
 	Form_pg_attribute att;
 
-	/* sanity checks */
-	AssertArg(PointerIsValid(desc));
-	AssertArg(attributeNumber >= 1);
-	AssertArg(attributeNumber <= desc->natts);
-
 	/* initialize the attribute fields */
 	att = TupleDescAttr(desc, attributeNumber - 1);
 	att->attrelid = 0;			/* dummy value */
@@ -770,13 +750,6 @@ TupleDescInitEntryCollation(TupleDesc desc,
 			AttrNumber attributeNumber,
 			Oid collationid)
 {
-	/*
-	 * sanity checks
-	 */
-	AssertArg(PointerIsValid(desc));
-	AssertArg(attributeNumber >= 1);
-	AssertArg(attributeNumber <= desc->natts);
-
 	TupleDescAttr(desc, attributeNumber - 1)->attcollation = collationid;
 }
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b061..2fbd2ef207 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3842,7 +3842,6 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
 	}
 	else
 	{
-		Assert(attrnum <= tupdesc->natts);
 		att = TupleDescAttr(tupdesc, attrnum - 1);
 		return datumIsEqual(value1, value2, att->attbyval, att->attlen);
 	}
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 77a48b039d..f34334b2a2 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -2600,7 +2600,6 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
 		}
 	}
 
-	Assert(count <= tupdesc->natts);
 	for (varattno = 0; varattno < count; varattno++)
 	{
 		Form_pg_attribute attr = TupleDescAttr(tupdesc, varattno);
@@ -2837,8 +2836,6 @@ get_rte_att

Re: SQL/JSON path issues/questions

2019-06-27 Thread Thom Brown

On Wed, 19 Jun 2019 at 20:04, Alexander Korotkov
 wrote:
>
> On Wed, Jun 19, 2019 at 7:07 PM Thom Brown  wrote:
> > On Thu, 13 Jun 2019 at 14:59, Thom Brown  wrote:
> > >
> > > Hi,
> > >
> > > I've been reading through the documentation regarding jsonpath and
> > > jsonb_path_query etc., and I have found it lacking explanation for
> > > some functionality, and I've also had some confusion when using the
> > > feature.
> > >
> > > ? operator
> > > ==
> > > The first mention of '?' is in section 9.15, where it says:
> > >
> > > "Suppose you would like to retrieve all heart rate values higher than
> > > 130. You can achieve this using the following expression:
> > > '$.track.segments[*].HR ? (@ > 130)'"
> > >
> > > So what is the ? operator doing here?  Sure, there's the regular ?
> > > operator, which is given as an example further down the page:
> > >
> > > '{"a":1, "b":2}'::jsonb ? 'b'
> > >
> > > But this doesn't appear to have the same purpose.
> > >
> > >
> > > like_regex
> > > ==
> > > Then there's like_regex, which shows an example that uses the keyword
> > > "flag", but that is the only instance of that keyword being mentioned,
> > > and the flags available to this expression aren't anywhere to be seen.
> > >
> > >
> > > is unknown
> > > ==
> > > "is unknown" suggests a boolean output, but the example shows an
> > > output of "infinity".  While I understand what it does, this appears
> > > inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> > > pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> > > pg_is_in_backup() etc.).
> > >
> > >
> > > $varname
> > > ==
> > > The jsonpath variable, $varname, has an incomplete description: "A
> > > named variable. Its value must be set in the PASSING clause of an
> > > SQL/JSON query function. for details."
> > >
> > >
> > > Binary operation error
> > > ==
> > > I get an error when I run this query:
> > >
> > > postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> > > psql: ERROR:  right operand of jsonpath operator + is not a single 
> > > numeric value
> > >
> > > While I know it's correct to get an error in this scenario as there is
> > > no element beyond 0, the message I get is confusing.  I'd expect this
> > > if it encountered another array in that position, but not for
> > > exceeding the upper bound of the array.
> > >
> > >
> > > Cryptic error
> > > ==
> > > postgres=# SELECT jsonb_path_query('[1, "2",
> > > {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> > > psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath 
> > > input
> > > LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
> > >  ^
> > > Again, I expect an error, but the message produced doesn't help me.
> > > I'll remove the ANY_P if I can find it.
> > >
> > >
> > > Can't use nested arrays with jsonpath
> > > ==
> > >
> > > I encounter an error in this scenario:
> > >
> > > postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == 
> > > [1,2])');
> > > psql: ERROR:  syntax error, unexpected '[' at or near "[" of jsonpath 
> > > input
> > > LINE 1: select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == ...
> > >
> > > So these filter operators only work with scalars?
> > >
> > >
> >
> > Another observation about the documentation is that the examples given
> > in 9.15. JSON Functions, Operators, and Expressions aren't all
> > functional.  Some example JSON is provided, followed by example
> > jsonpath queries which could be used against it.  These will produce
> > results for the reader wishing to test them out until this example:
> >
> > '$.track.segments[*].HR ? (@ > 130)'
> >
> > This is because there is no HR value greater than 130.  May I propose
> > setting this and all similar examples to (@ > 120) instead?
>
> Makes sense to me.
>
> > Also, this example doesn't work:
> >
> > '$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()'
> >
> > This gives me:
> >
> > psql: ERROR:  syntax error, unexpected $end at end of jsonpath input
> > LINE 13: }','$.track ? (@.segments[*]');
> > ^
>
> Perhaps it should be following:
>
> '$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'

I'm not clear on why the original example doesn't work here.

Thom

OpenSSL specific value under USE_SSL instead of USE_OPENSSL

2019-06-27 Thread Daniel Gustafsson

The ssl_ciphers GUC for configuring cipher suites sets the default value based
on USE_SSL but it’s actually an OpenSSL specific value.  As with other such
changes, it works fine now but will become an issue should we support other TLS
backends.  Attached patch fixes this.

cheers ./daniel




openssl_cipherconfig.patch
Description: Binary data

pg_receivewal documentation

2019-06-27 Thread Jesper Pedersen


Hi,

Here is a patch for the pg_receivewal documentation to highlight that 
WAL isn't acknowledged to be applied.


I'll add a CF entry for it.

Best regards,
 Jesper
>From 138e09c74db5ea08fd03b4e77853e2ca01742512 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Thu, 27 Jun 2019 09:54:44 -0400
Subject: [PATCH] Highlight that pg_receivewal doesn't acknowledge that WAL has
 been applied, and as such synchronous-commit needs to be remote_write or
 lower.

Author: Jesper Pedersen 
---
 doc/src/sgml/ref/pg_receivewal.sgml | 8 
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 0506120c00..132a599d1b 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -207,6 +207,14 @@ PostgreSQL documentation
 server as a synchronous standby, to ensure that timely feedback is
 sent to the server.

+
+   
+Note, that pg_receivewal doesn't acknowledge
+that the Write-Ahead Log () has been applied, so
+ needs to have a setting
+of remote_write or lower, if pg_receivewal
+is the only standby.
+   
   
  
 
-- 
2.21.0

Re: [HACKERS] Restricting maximum keep segments by repslots

2019-06-27 Thread Jehan-Guillaume de Rorthais

Hi all,

Being interested by this feature, I did a patch review.

This features adds the GUC "max_slot_wal_keep_size". This is the maximum amount
of WAL that can be kept in "pg_wal" by active slots.

If the amount of WAL is superior to this limit, the slot is deactivated and
its status (new filed in pg_replication_slot) is set as "lost".


Patching


The patch v13-0003 does not apply on HEAD anymore.

The patch v13-0005 applies using "git am --ignore-space-change"

Other patches applies correctly.

Please, find attached the v14 set of patches rebased on master.


Documentation
=

The documentation explains the GUC and related columns in "pg_replication_slot".

It reflects correctly the current behavior of the patch.


Usability
=

The patch implement what it described. It is easy to enable and disable. The
GUC name is describing correctly its purpose.

This feature is useful in some HA scenario where slot are required (eg. no
possible archiving), but where primary availability is more important than
standbys.

In "pg_replication_slots" view, the new "wal_status" field is misleading.
Consider this sentence and the related behavior from documentation
(catalogs.sgml):

  keeping means that some of them are to be removed by the
  next checkpoint.

"keeping" appears when the current checkpoint will delete some WAL further than
"current_lsn - max_slot_wal_keep_size", but still required by at least one slot.
As some WAL required by some slots will be deleted quite soon, probably before
anyone can react, "keeping" status is misleading here. We are already in the
red zone.

I would expect this "wal_status" to be:

- streaming: slot lag between 0 and "max_wal_size"
- keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". the
  slot actually protect some WALs from being deleted
- lost: slot lag superior of max_slot_wal_keep_size. The slot couldn't protect
  some WAL from deletion

Documentation follows with:

  The last two states are seen only when max_slot_wal_keep_size is
  non-negative

This is true with the current behavior. However, if "keeping" is set as soon as
te slot lag is superior than "max_wal_size", this status could be useful even
with "max_slot_wal_keep_size = -1". As soon as a slot is stacking WALs that
should have been removed by previous checkpoint, it "keeps" them.


Feature tests
=

I have played with various traffic shaping setup between nodes, to observe how
columns "active", "wal_status" and "remain" behaves in regard to each others
using:

  while true; do
   sleep 0.3; 
   psql -p 5432 -AXtc "
select now(), active, restart_lsn, wal_status, pg_size_pretty(remain)
from pg_replication_slots
where slot_name='slot_limit_st'" 
  done

The primary is created using:

  initdb -U postgres -D slot_limit_pr --wal-segsize=1

  cat<>slot_limit_pr/postgresql.conf
  port=5432
  max_wal_size = 3MB
  min_wal_size = 2MB
  max_slot_wal_keep_size = 4MB
  logging_collector = on
  synchronous_commit = off
  EOF

WAL activity is generated using a simple pgbench workload. Then, during
this activity, packets on loopback are delayed using:

  tc qdisc add dev lo root handle 1:0 netem delay 140msec

Here is how the wal_status behave. I removed the timestamps, but the
record order is the original one:

 t|1/7B116898|streaming|1872 kB
 t|1/7B1A|lost|0 bytes
 t|1/7B32|keeping|0 bytes
 t|1/7B78|lost|0 bytes
 t|1/7BB0|keeping|0 bytes
 t|1/7BE0|keeping|0 bytes
 t|1/7C10|lost|0 bytes
 t|1/7C40|keeping|0 bytes
 t|1/7C70|lost|0 bytes
 t|1/7CA4|keeping|0 bytes
 t|1/7CDE|lost|0 bytes
 t|1/7D10|keeping|0 bytes
 t|1/7D40|keeping|0 bytes
 t|1/7D7C|keeping|0 bytes
 t|1/7DB4|keeping|0 bytes
 t|1/7DE6|lost|0 bytes
 t|1/7E18|keeping|0 bytes
 t|1/7E50|keeping|0 bytes
 t|1/7E86|lost|0 bytes
 t|1/7EB8|keeping|0 bytes
 [...x15]
 t|1/8080|keeping|0 bytes
 t|1/8090|streaming|940 kB
 t|1/80A0|streaming|1964 kB

When increasing the network delay to 145ms, the slot has been lost for real.
Note that it has been shown as lost but active twice (during approx 0.6s) before
being deactivated.

 t|1/8570|streaming|2048 kB
 t|1/8580|keeping|0 bytes
 t|1/8594|lost|0 bytes
 t|1/85AC|lost|0 bytes
 f|1/85C4|lost|0 bytes

Finally, at least once the following messages appeared in primary logs
**before** the "wal_status" changed from "keeping" to "streaming":

 WARNING:  some replication slots have lost required WAL segments

So the slot lost one WAL, but the standby has been able to catch-up anyway.
 
My humble opinion about these results:

* after many different tests, the status "keeping" appears only when "remain"
  equals 0. In current implementation, "keeping" really adds no value...
* "remain" should be NULL if "max_slot_wal_keep_size=-1 or if the slot isn't
  a

Missing PG 12 stable branch

2019-06-27 Thread Bruce Momjian

Does anyone know why there is no PG 12 stable branch in our git tree?

$ git branch -l
  REL9_4_STABLE
  REL9_5_STABLE
  REL9_6_STABLE
  REL_10_STABLE
  REL_11_STABLE
  master

They exist for earlier releases.  Is this a problem?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Missing PG 12 stable branch

2019-06-27 Thread Bruce Momjian

On Thu, Jun 27, 2019 at 10:28:40AM -0400, Bruce Momjian wrote:
> Does anyone know why there is no PG 12 stable branch in our git tree?
> 
>   $ git branch -l
> REL9_4_STABLE
> REL9_5_STABLE
> REL9_6_STABLE
> REL_10_STABLE
> REL_11_STABLE
> master
> 
> They exist for earlier releases.  Is this a problem?

Sorry, I now realize that we haven't branched git yet for PG 12, so
there is no branch.  I have fixed pglife to handle that case:

https://pglife.momjian.us/

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Missing PG 12 stable branch

Bruce Momjian  writes:
> Does anyone know why there is no PG 12 stable branch in our git tree?

For the record, I'm intending to make the branch as soon as the
July CF starts (i.e., first thing Monday).

regards, tom lane

Re: Missing PG 12 stable branch


On Thu, Jun 27, 2019 at 10:28:40AM -0400, Bruce Momjian wrote:

Does anyone know why there is no PG 12 stable branch in our git tree?

$ git branch -l
  REL9_4_STABLE
  REL9_5_STABLE
  REL9_6_STABLE
  REL_10_STABLE
  REL_11_STABLE
  master

They exist for earlier releases.  Is this a problem?



We haven't stamped master as 13dev yet ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Obsolete comment in commands/analyze.c


On Thu, Jun 27, 2019 at 01:53:52PM +0200, Daniel Gustafsson wrote:

On 27 Jun 2019, at 13:05, Etsuro Fujita  wrote:



since the citation was also moved to
utils/misc/sampling.c, I think the "see full citation below" part
should be updated accordingly.  Attached is a patch for that.


Agreed, nice catch!



Thanks, committed.


cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix doc bug in logical replication.

On Sun, Jun 23, 2019 at 10:26:47PM -0400, Robert Treat wrote:

On Sun, Jun 23, 2019 at 1:25 PM Peter Eisentraut
 wrote:

On 2019-04-12 19:52, Robert Treat wrote:
> It is clear to me that the docs are wrong, but I don't see anything
> inherently incorrect about the code itself. Do you have suggestions
> for how you would like to see the code comments improved?

The question is perhaps whether we want to document that non-matching
data types do work.  It happens to work now, but do we always want to
guarantee that?  There is talk of a binary mode for example.

Whether we *want* to document that it works, documenting that it
doesn't work when it does can't be the right answer. If you want to
couch the language to leave the door open that we may not support this
the same way in the future I wouldn't be opposed to that, but at this
point we will have three releases with the current behavior in
production, so if we decide to change the behavior, it is likely going
to break certain use cases. That may be ok, but I'd expect a
documentation update to accompany a change that would cause such a
breaking change.

I agree with that. We have this behavior for quite a bit of time, and
while technically we could change the behavior in the future (using the
"not supported" statement), IMO that'd be pretty annoying move. I always
despised systems that "fix" bugs by documenting that it does not work, and
this is a bit similar.

FWIW I don't quite see why supporting binary mode would change this?
Surely we can't just enable binary mode blindly, there need to be some
sort of checks (alignment, type sizes, ...) with fallback to text mode.
And perhaps support only for built-in types.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> In general, the point I'm trying to make is that our policy should
>  Tom> be "Ties are broken arbitrarily, and if you don't like the choice
>  Tom> that initdb makes, here's how to fix it".

> Yes, you've repeated that point at some length, and I am not convinced.

[ shrug... ]  You haven't convinced me, either.  By my count we each have
about 0.5 other votes in favor of our positions, so barring more opinions
there's no consensus here for the sort of behavioral change you suggest.

However, not to let the perfect be the enemy of the good, it seems like
nobody has spoken against the ideas of (a) installing negative preferences
for the "localtime" and "posixrules" pseudo-zones, and (b) getting rid of
our now-unnecessary special treatment for "Factory".  How about we do that
much and leave any more-extensive change for another day?

regards, tom lane

Re: GiST VACUUM

2019-06-27 Thread Andrey Borodin




> 27 июня 2019 г., в 16:38, Heikki Linnakangas  написал(а):
> 
> I haven't done any testing on this yet. Andrey, would you happen to have an 
> environment ready to test this?

Patches do not deadlock and delete pages on "rescan test" - setup that we used 
to detect "left jumps" in during development of physical vacuum. check-world is 
happy on my machine.
I really like that now there is GISTDeletedPageContents and we do not just cast 
*(FullTransactionId *) PageGetContents(page).

But I have stupid question again, about this code:

https://github.com/x4m/postgres_g/commit/096d5586537d29ff316ca8ce074bbe1b325879ee#diff-754126824470cb8e68fd5e32af6d3bcaR417

nextFullXid = ReadNextFullTransactionId();
diff = U64FromFullTransactionId(nextFullXid) -
U64FromFullTransactionId(latestRemovedFullXid);
if (diff < MaxTransactionId / 2)
{
TransactionId latestRemovedXid;

// sleep(100500 hours); latestRemovedXid 
becomes xid from future

latestRemovedXid = 
XidFromFullTransactionId(latestRemovedFullXid);

ResolveRecoveryConflictWithSnapshot(latestRemovedXid,

xlrec->node);
}

Do we have a race condition here? Can latestRemovedXid overlap and start to be 
xid from future?
I understand that it is purely hypothetical, but still latestRemovedXid is from 
ancient past already. 

Best regards, Andrey Borodin.

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-27 Thread Robert Haas

On Tue, Jun 25, 2019 at 8:18 PM Tom Lane  wrote:
> As long as we have a trivial and obviously apolitical rule like
> alphabetical order, I think we can skate over such things; but the minute
> we have any sort of human choices involved there, we're going to be
> getting politically driven requests to do-it-like-this-because-I-think-
> the-default-should-be-that.  Again, trawl the tzdb list archives for
> awhile if you think this might not be a problem:
> http://mm.icann.org/pipermail/tz/

I'm kind of unsure what to think about this whole debate
substantively. If Andrew is correct that zone.tab or zone1970.tab is a
list of time zone names to be preferred over alternatives, then it
seems like we ought to prefer them. He remarks that we are preferring
"deprecated backward-compatibility aliases" and to the extent that
this is true, it seems like a bad thing. We can't claim to be
altogether here apolitical, because when those deprecated
backward-compatibility names are altogether removed, we are going to
remove them and they're going to stop working. If we know which ones
are likely to suffer that fate eventually, we ought to stop spitting
them out. It's no more political to de-prefer them when upstream does
than it is to remove them with the upstream does.

However, I don't know whether Andrew is right about those things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Regression tests vs existing users in an installation

2019-06-27 Thread Robert Haas

On Tue, Jun 25, 2019 at 11:33 PM Tom Lane  wrote:
> Comments?

LGTM.

s/must/should/ ?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fix doc bug in logical replication.

2019-06-27 Thread Dave Cramer

On Thu, 27 Jun 2019 at 12:50, Tomas Vondra 
wrote:

> On Sun, Jun 23, 2019 at 10:26:47PM -0400, Robert Treat wrote:
> >On Sun, Jun 23, 2019 at 1:25 PM Peter Eisentraut
> > wrote:
> >>
> >> On 2019-04-12 19:52, Robert Treat wrote:
> >> > It is clear to me that the docs are wrong, but I don't see anything
> >> > inherently incorrect about the code itself. Do you have suggestions
> >> > for how you would like to see the code comments improved?
> >>
> >> The question is perhaps whether we want to document that non-matching
> >> data types do work.  It happens to work now, but do we always want to
> >> guarantee that?  There is talk of a binary mode for example.
> >>
> >
> >Whether we *want* to document that it works, documenting that it
> >doesn't work when it does can't be the right answer. If you want to
> >couch the language to leave the door open that we may not support this
> >the same way in the future I wouldn't be opposed to that, but at this
> >point we will have three releases with the current behavior in
> >production, so if we decide to change the behavior, it is likely going
> >to break certain use cases. That may be ok, but I'd expect a
> >documentation update to accompany a change that would cause such a
> >breaking change.
> >
>
> I agree with that. We have this behavior for quite a bit of time, and
> while technically we could change the behavior in the future (using the
> "not supported" statement), IMO that'd be pretty annoying move. I always
> despised systems that "fix" bugs by documenting that it does not work, and
> this is a bit similar.
>
> FWIW I don't quite see why supporting binary mode would change this?
> Surely we can't just enable binary mode blindly, there need to be some
> sort of checks (alignment, type sizes, ...) with fallback to text mode.
> And perhaps support only for built-in types.
>

The proposed implementation of binary only supports built-in types.
The subscriber turns it on so presumably it can handle the binary data
coming at it.

Dave

>
>

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

Robert Haas  writes:
> I'm kind of unsure what to think about this whole debate
> substantively. If Andrew is correct that zone.tab or zone1970.tab is a
> list of time zone names to be preferred over alternatives, then it
> seems like we ought to prefer them.

It's not really clear to me that the IANA folk intend those files to
be read as a list of preferred zone names.  If they do, what are we
to make of the fact that no variant of "UTC" appears in them?

> He remarks that we are preferring
> "deprecated backward-compatibility aliases" and to the extent that
> this is true, it seems like a bad thing. We can't claim to be
> altogether here apolitical, because when those deprecated
> backward-compatibility names are altogether removed, we are going to
> remove them and they're going to stop working. If we know which ones
> are likely to suffer that fate eventually, we ought to stop spitting
> them out. It's no more political to de-prefer them when upstream does
> than it is to remove them with the upstream does.

I think that predicting what IANA will do in the future is a fool's
errand.  Our contract is to select some one of the aliases that the
tzdb database presents, not to guess about whether it might present
a different set in the future.  (Also note that a lot of the observed
variation here has to do with whether individual platforms choose to
install backward-compatibility zone names.  I think the odds that
IANA proper will remove those links are near zero; TTBOMK they
never have removed one yet.)

More generally, my unhappiness about Andrew's proposal is:

1. It's solving a problem that just about nobody cares about, as
evidenced by the very tiny number of complaints we've had to date.
As long as the "timezone" setting has the correct external behavior
(UTC offset, DST rules, and abbreviations), very few people notice
it at all.  With the addition of the code to resolve /etc/localtime
when it's a symlink, the population of people who might care has
taken a further huge drop.

2. Changing this behavior might create more problems than it solves.
In particular, it seemed to me that a lot of the complaints in the
UCT/UTC kerfuffle were less about "UCT is a silly name for my zone"
than about "this change broke my regression test that expected
timezone to be set to X in this environment".  Rearranging the tiebreak
rules is just going to make different sets of such people unhappy.
(Admittedly, the symlink-lookup addition has already created some
risk of this ilk.  Maybe we should wait for that to be in the field
for more than a week before we judge whether further hacking is
advisable.)

3. The proposal has technical issues, in particular I'm not nearly
as sanguine as Andrew is about whether we can rely on zone[1970].tab
to be available.

So I'm very unexcited about writing a bunch of new code or opening
ourselves to politically-driven complaints in order to change this.
It seems like a net loss almost independently of the details.

regards, tom lane

Hypothetical indexes using BRIN broken since pg10

2019-06-27 Thread Julien Rouhaud

Hello,

I just realized that 7e534adcdc7 broke support for hypothetical
indexes using BRIN am.  Attached patch fix the issue.

There's no interface to provide the hypothetical pagesPerRange value,
so I used the default one, and used simple estimates.

I'll add this patch to the next commitfest.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d7e3f09f1a..64f27369c0 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -102,6 +102,7 @@
 #include 
 
 #include "access/brin.h"
+#include "access/brin_page.h"
 #include "access/gin.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
@@ -6802,12 +6803,26 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 			  &spc_seq_page_cost);
 
 	/*
-	 * Obtain some data from the index itself.  A lock should have already
-	 * been obtained on the index in plancat.c.
+	 * Obtain some data from the index itself, if possible.  Otherwise invent
+	 * some plausible internal statistics based on the relation page count.
 	 */
-	indexRel = index_open(index->indexoid, NoLock);
-	brinGetStats(indexRel, &statsData);
-	index_close(indexRel, NoLock);
+	if (!index->hypothetical)
+	{
+		/*
+		 * A lock should have already been obtained on the index in plancat.c.
+		 */
+		indexRel = index_open(index->indexoid, NoLock);
+		brinGetStats(indexRel, &statsData);
+		index_close(indexRel, NoLock);
+	}
+	else
+	{
+		indexRanges = Max(ceil((double) baserel->pages /
+	BRIN_DEFAULT_PAGES_PER_RANGE), 1.0);
+
+		statsData.pagesPerRange = BRIN_DEFAULT_PAGES_PER_RANGE;
+		statsData.revmapNumPages = (indexRanges / REVMAP_PAGE_MAXITEMS) + 1;
+	}
 
 	/*
 	 * Compute index correlation

Re: POC: Cleaning up orphaned files using undo logs

2019-06-27 Thread Robert Haas

On Tue, Jun 25, 2019 at 4:00 PM Amit Kapila  wrote:
> [ new patches ]

I happened to open up 0001 from this series, which is from Thomas, and
I do not think that the pg_buffercache changes are correct.  The idea
here is that the customer might install version 1.3 or any prior
version on an old release, then upgrade to PostgreSQL 13. When they
do, they will be running with the old SQL definitions and the new
binaries.  At that point, it sure looks to me like the code in
pg_buffercache_pages.c is going to do the Wrong Thing.  There's some
existing code in there to omit the pinning_backends column if the SQL
definitions don't know about it; instead of adding similar code for
the newly-added smgrid column, this patch rips out the existing
backward-compatibility code.  I think that's a double fail.

At some later point, the customer is going to run ALTER EXTENSION
pg_buffercache UPDATE. At that point they are going to be expecting
that their SQL definitions now match the state that would have been
created had they installed on pg_buffercache for the first time on
PG13, starting with pg_buffercache v1.4. Since the view is now going
to have a new column, that would seem to required dropping and
recreating the view, but pg_buffercache--1.3--1.4.sql doesn't do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Hypothetical indexes using BRIN broken since pg10

Hi, thanks for the patch.

On 2019-Jun-27, Julien Rouhaud wrote:

> I just realized that 7e534adcdc7 broke support for hypothetical
> indexes using BRIN am.  Attached patch fix the issue.
> 
> There's no interface to provide the hypothetical pagesPerRange value,
> so I used the default one, and used simple estimates.

I think it would look nicer to have a routine parallel to brinGetStats()
(brinGetStatsHypothetical?), instead of polluting selfuncs.c with these
gory details.

This seems back-patchable ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix doc bug in logical replication.

On Thu, Jun 27, 2019 at 01:46:47PM -0400, Dave Cramer wrote:

On Thu, 27 Jun 2019 at 12:50, Tomas Vondra 
wrote:

On Sun, Jun 23, 2019 at 10:26:47PM -0400, Robert Treat wrote:
>On Sun, Jun 23, 2019 at 1:25 PM Peter Eisentraut
> wrote:
>>
>> On 2019-04-12 19:52, Robert Treat wrote:
>> > It is clear to me that the docs are wrong, but I don't see anything
>> > inherently incorrect about the code itself. Do you have suggestions
>> > for how you would like to see the code comments improved?
>>
>> The question is perhaps whether we want to document that non-matching
>> data types do work.  It happens to work now, but do we always want to
>> guarantee that?  There is talk of a binary mode for example.
>>
>
>Whether we *want* to document that it works, documenting that it
>doesn't work when it does can't be the right answer. If you want to
>couch the language to leave the door open that we may not support this
>the same way in the future I wouldn't be opposed to that, but at this
>point we will have three releases with the current behavior in
>production, so if we decide to change the behavior, it is likely going
>to break certain use cases. That may be ok, but I'd expect a
>documentation update to accompany a change that would cause such a
>breaking change.
>

I agree with that. We have this behavior for quite a bit of time, and
while technically we could change the behavior in the future (using the
"not supported" statement), IMO that'd be pretty annoying move. I always
despised systems that "fix" bugs by documenting that it does not work, and
this is a bit similar.

FWIW I don't quite see why supporting binary mode would change this?
Surely we can't just enable binary mode blindly, there need to be some
sort of checks (alignment, type sizes, ...) with fallback to text mode.
And perhaps support only for built-in types.

The proposed implementation of binary only supports built-in types.
The subscriber turns it on so presumably it can handle the binary data
coming at it.

I don't recall that being discussed in the patch thread, but maybe it
should not be enabled merely based on what the subscriber requests. Maybe
the subscriber should indicate "interest" and the decision should be made
on the upstream, after some additional checks.

That's why pglogical does check_binary_compatibility() - see [1].

This is necessary, because the FE/BE protocol docs [2] say:

   Keep in mind that binary representations for complex data types might
   change across server versions; the text format is usually the more
   portable choice.

So you can't just assume the subscriber knows what it's doing, because
either of the sides might be upgraded.

Note: The pglogical code also does check additional stuff (like
sizeof(Datum) or endianess), but I'm not sure that's actually necessary -
I believe the binary protocol should be independent of that.

regards

[1] 
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_output_plugin.c#L107

[2] https://www.postgresql.org/docs/current/protocol-overview.html

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Hypothetical indexes using BRIN broken since pg10

2019-06-27 Thread Julien Rouhaud

On Thu, Jun 27, 2019 at 8:14 PM Alvaro Herrera  wrote:
>
> Hi, thanks for the patch.

Thanks for looking at it!

> On 2019-Jun-27, Julien Rouhaud wrote:
>
> > I just realized that 7e534adcdc7 broke support for hypothetical
> > indexes using BRIN am.  Attached patch fix the issue.
> >
> > There's no interface to provide the hypothetical pagesPerRange value,
> > so I used the default one, and used simple estimates.
>
> I think it would look nicer to have a routine parallel to brinGetStats()
> (brinGetStatsHypothetical?), instead of polluting selfuncs.c with these
> gory details.

I'm not opposed to it, but I used the same approach as a similar fix
for gincostestimate() (see 7fb008c5ee5).  If we add an hypothetical
version of brinGetStats(), we should also do it for ginGetStats().

> This seems back-patchable ...

I definitely hope so!

Re: pglz performance

2019-06-27 Thread Andrey Borodin



> 13 мая 2019 г., в 12:14, Michael Paquier  написал(а):
> 
> Decompression can matter a lot for mostly-read workloads and
> compression can become a bottleneck for heavy-insert loads, so
> improving compression or decompression should be two separate
> problems, not two problems linked.  Any improvement in one or the
> other, or even both, is nice to have.

Here's patch hacked by Vladimir for compression.

Key differences (as far as I see, maybe Vladimir will post more complete list 
of optimizations):
1. Use functions instead of macro-functions: not surprisingly it's easier to 
optimize them and provide less constraints for compiler to optimize.
2. More compact hash table: use indexes instead of pointers.
3. More robust segment comparison: like memcmp, but return index of first 
different byte

In weighted mix of different data (same as for compression), overall speedup is 
x1.43 on my machine.

Current implementation is integrated into test_pglz suit for benchmarking 
purposes[0].

Best regards, Andrey Borodin.

[0] https://github.com/x4m/test_pglz


0001-Reorganize-pglz-compression-code.patch
Description: Binary data

Re: Fix doc bug in logical replication.

2019-06-27 Thread Dave Cramer

On Thu, 27 Jun 2019 at 14:20, Tomas Vondra 
wrote:

> On Thu, Jun 27, 2019 at 01:46:47PM -0400, Dave Cramer wrote:
> >On Thu, 27 Jun 2019 at 12:50, Tomas Vondra 
> >wrote:
> >
> >> On Sun, Jun 23, 2019 at 10:26:47PM -0400, Robert Treat wrote:
> >> >On Sun, Jun 23, 2019 at 1:25 PM Peter Eisentraut
> >> > wrote:
> >> >>
> >> >> On 2019-04-12 19:52, Robert Treat wrote:
> >> >> > It is clear to me that the docs are wrong, but I don't see anything
> >> >> > inherently incorrect about the code itself. Do you have suggestions
> >> >> > for how you would like to see the code comments improved?
> >> >>
> >> >> The question is perhaps whether we want to document that non-matching
> >> >> data types do work.  It happens to work now, but do we always want to
> >> >> guarantee that?  There is talk of a binary mode for example.
> >> >>
> >> >
> >> >Whether we *want* to document that it works, documenting that it
> >> >doesn't work when it does can't be the right answer. If you want to
> >> >couch the language to leave the door open that we may not support this
> >> >the same way in the future I wouldn't be opposed to that, but at this
> >> >point we will have three releases with the current behavior in
> >> >production, so if we decide to change the behavior, it is likely going
> >> >to break certain use cases. That may be ok, but I'd expect a
> >> >documentation update to accompany a change that would cause such a
> >> >breaking change.
> >> >
> >>
> >> I agree with that. We have this behavior for quite a bit of time, and
> >> while technically we could change the behavior in the future (using the
> >> "not supported" statement), IMO that'd be pretty annoying move. I always
> >> despised systems that "fix" bugs by documenting that it does not work,
> and
> >> this is a bit similar.
> >>
> >> FWIW I don't quite see why supporting binary mode would change this?
> >> Surely we can't just enable binary mode blindly, there need to be some
> >> sort of checks (alignment, type sizes, ...) with fallback to text mode.
> >> And perhaps support only for built-in types.
> >>
> >
> >The proposed implementation of binary only supports built-in types.
> >The subscriber turns it on so presumably it can handle the binary data
> >coming at it.
> >
>
> I don't recall that being discussed in the patch thread, but maybe it
> should not be enabled merely based on what the subscriber requests. Maybe
> the subscriber should indicate "interest" and the decision should be made
> on the upstream, after some additional checks.
>
> That's why pglogical does check_binary_compatibility() - see [1].
>
> This is necessary, because the FE/BE protocol docs [2] say:
>
> Keep in mind that binary representations for complex data types might
> change across server versions; the text format is usually the more
> portable choice.
>
> So you can't just assume the subscriber knows what it's doing, because
> either of the sides might be upgraded.
>
> Note: The pglogical code also does check additional stuff (like
> sizeof(Datum) or endianess), but I'm not sure that's actually necessary -
> I believe the binary protocol should be independent of that.
>
>
> regards
>
> [1]
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_output_plugin.c#L107
>
> [2] https://www.postgresql.org/docs/current/protocol-overview.html
>
>
Thanks for the pointer. I'll add that to the patch.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com


>
>

Re: Hypothetical indexes using BRIN broken since pg10

On 2019-Jun-27, Julien Rouhaud wrote:

> On Thu, Jun 27, 2019 at 8:14 PM Alvaro Herrera  
> wrote:

> > I think it would look nicer to have a routine parallel to brinGetStats()
> > (brinGetStatsHypothetical?), instead of polluting selfuncs.c with these
> > gory details.
> 
> I'm not opposed to it, but I used the same approach as a similar fix
> for gincostestimate() (see 7fb008c5ee5).

How many #define lines did you have to add to selfuncs there?

> If we add an hypothetical
> version of brinGetStats(), we should also do it for ginGetStats().

Dunno, seems pointless.  The GIN case is different.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Hypothetical indexes using BRIN broken since pg10

Alvaro Herrera  writes:
> On 2019-Jun-27, Julien Rouhaud wrote:
>> On Thu, Jun 27, 2019 at 8:14 PM Alvaro Herrera  
>> wrote:
>>> I think it would look nicer to have a routine parallel to brinGetStats()
>>> (brinGetStatsHypothetical?), instead of polluting selfuncs.c with these
>>> gory details.

>> I'm not opposed to it, but I used the same approach as a similar fix
>> for gincostestimate() (see 7fb008c5ee5).

> How many #define lines did you have to add to selfuncs there?

FWIW, the proposed patch doesn't seem to me like it adds much more
BRIN-specific knowledge to brincostestimate than is there already.

I think a more useful response to your modularity concern would be
to move all the [indextype]costestimate functions out of the common
selfuncs.c file and into per-AM files.  I fooled around with that
while trying to refactor selfuncs.c back in February, but I didn't
come up with something that seemed clearly better.  Still, as we
move into a world with external index AMs, I think we're going to
have to make that happen eventually.

regards, tom lane

Missing hook for UPPERREL_PARTIAL_GROUP_AGG rels?

2019-06-27 Thread Erik Nordström

Hi,

I noticed that the create_upper_paths_hook is never called for partially
grouped rels. Is this intentional or a bug? Only the FDW routine's
GetForeignUpperPaths hook is called for partially grouped rels. This seems
odd since the regular create_upper_paths_hook gets called for all other
upper rels.

Regards,

Erik
Timescale

Re: Missing hook for UPPERREL_PARTIAL_GROUP_AGG rels?

=?UTF-8?Q?Erik_Nordstr=C3=B6m?=  writes:
> I noticed that the create_upper_paths_hook is never called for partially
> grouped rels. Is this intentional or a bug? Only the FDW routine's
> GetForeignUpperPaths hook is called for partially grouped rels. This seems
> odd since the regular create_upper_paths_hook gets called for all other
> upper rels.

This seems possibly related to the discussion re set_rel_pathlist_hook
at

https://www.postgresql.org/message-id/flat/CADsUR0AaPx4sVgmnuVJ_bOkocccQZGubv6HajzW826rbSmFpCg%40mail.gmail.com

I don't think we've quite resolved what to do there, but maybe
create_upper_paths_hook needs to be looked at at the same time.

regards, tom lane

Re: Hypothetical indexes using BRIN broken since pg10

On 2019-Jun-27, Tom Lane wrote:

> FWIW, the proposed patch doesn't seem to me like it adds much more
> BRIN-specific knowledge to brincostestimate than is there already.

It's this calculation that threw me off:
statsData.revmapNumPages = (indexRanges / REVMAP_PAGE_MAXITEMS) + 1;
ISTM that selfuncs has no reason to learn about revmap low-level
details.

> I think a more useful response to your modularity concern would be
> to move all the [indextype]costestimate functions out of the common
> selfuncs.c file and into per-AM files.

Yeah, that would be nice, but then I'm not going to push Julien to do
that to fix just this one problem; and on the other hand, that's even
less of a back-patchable fix.

> I fooled around with that while trying to refactor selfuncs.c back in
> February, but I didn't come up with something that seemed clearly
> better.  Still, as we move into a world with external index AMs, I
> think we're going to have to make that happen eventually.

No disagreement.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Hypothetical indexes using BRIN broken since pg10

Alvaro Herrera  writes:
> On 2019-Jun-27, Tom Lane wrote:
>> FWIW, the proposed patch doesn't seem to me like it adds much more
>> BRIN-specific knowledge to brincostestimate than is there already.

> It's this calculation that threw me off:
>   statsData.revmapNumPages = (indexRanges / REVMAP_PAGE_MAXITEMS) + 1;
> ISTM that selfuncs has no reason to learn about revmap low-level
> details.

Um ... it's accounting for revmap pages already (which is why it needs
this field set), so hasn't that ship sailed?

regards, tom lane

Re: Hypothetical indexes using BRIN broken since pg10

On 2019-Jun-27, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Jun-27, Tom Lane wrote:
> >> FWIW, the proposed patch doesn't seem to me like it adds much more
> >> BRIN-specific knowledge to brincostestimate than is there already.
> 
> > It's this calculation that threw me off:
> > statsData.revmapNumPages = (indexRanges / REVMAP_PAGE_MAXITEMS) + 1;
> > ISTM that selfuncs has no reason to learn about revmap low-level
> > details.
> 
> Um ... it's accounting for revmap pages already (which is why it needs
> this field set), so hasn't that ship sailed?

Yes, but does it need to know how many items there are in a revmap page?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Hypothetical indexes using BRIN broken since pg10

Alvaro Herrera  writes:
> On 2019-Jun-27, Tom Lane wrote:
>> Um ... it's accounting for revmap pages already (which is why it needs
>> this field set), so hasn't that ship sailed?

> Yes, but does it need to know how many items there are in a revmap page?

Dunno, I just can't get excited about exposing REVMAP_PAGE_MAXITEMS.
Especially not since we seem to agree on the long-term solution here,
and what you're suggesting to Julien doesn't particularly fit into
that long-term solution.

regards, tom lane

Re: Hypothetical indexes using BRIN broken since pg10

On 2019-Jun-27, Tom Lane wrote:

> Dunno, I just can't get excited about exposing REVMAP_PAGE_MAXITEMS.
> Especially not since we seem to agree on the long-term solution here,
> and what you're suggesting to Julien doesn't particularly fit into
> that long-term solution.

Well, it was brin_page.h, which is supposed to be internal to BRIN
itself.  But since we admit that in its current state selfuncs.c is not
salvageable as a module and we'll redo the whole thing in the short
term, I withdraw my comment.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Missing hook for UPPERREL_PARTIAL_GROUP_AGG rels?

2019-06-27 Thread Erik Nordström

Thanks for the quick response. I do think this might be a separate issue
because in the set_rel_pathlist_hook case, that hook is actually called at
one point. In this case there is simply no place in the PostgreSQL code
where a call is made to create_upper_paths_hook for the
UPPERREL_PARTIAL_GROUP_AGG upper rel kind. See
create_partial_grouping_paths().

Regards,

Erik

On Thu, Jun 27, 2019 at 8:58 PM Tom Lane  wrote:

> =?UTF-8?Q?Erik_Nordstr=C3=B6m?=  writes:
> > I noticed that the create_upper_paths_hook is never called for partially
> > grouped rels. Is this intentional or a bug? Only the FDW routine's
> > GetForeignUpperPaths hook is called for partially grouped rels. This
> seems
> > odd since the regular create_upper_paths_hook gets called for all other
> > upper rels.
>
> This seems possibly related to the discussion re set_rel_pathlist_hook
> at
>
>
> https://www.postgresql.org/message-id/flat/CADsUR0AaPx4sVgmnuVJ_bOkocccQZGubv6HajzW826rbSmFpCg%40mail.gmail.com
>
> I don't think we've quite resolved what to do there, but maybe
> create_upper_paths_hook needs to be looked at at the same time.
>
> regards, tom lane
>

Re: [HACKERS] Regression tests vs existing users in an installation

Robert Haas  writes:
> On Tue, Jun 25, 2019 at 11:33 PM Tom Lane  wrote:
>> Comments?

> LGTM.

Thanks for looking!

> s/must/should/ ?

Sure, if you like.

Further on the rolenames test mess: I started to work on removing
that script's creation of out-of-spec user names, but my heart just
sank to the floor when I noticed that it was also doing stuff like
this:

ALTER USER ALL SET application_name to 'SLAP';
ALTER USER ALL RESET application_name;

The extent to which that's Not OK inside a production installation
is hard to overstate.

At the same time, I can see that we'd want to have some coverage
for that code path, so just deleting those tests isn't attractive.

So I think the only real solution is to cordon off this test script
in some environment where it won't get run by "make installcheck".
I thought briefly about recasting it as a TAP test, but that looked
like a huge amount of make-work.

What I propose that we do instead is invent an empty "module" under
src/test/modules/ and install rolenames as a test for that.  We
already have this in src/test/modules/README:

src/test/modules contains PostgreSQL extensions that are primarily or
entirely intended for testing PostgreSQL and/or to serve as example
code. The extensions here aren't intended to be installed in a
production server and aren't suitable for "real work".

So I think we could just extend that verbiage to insist that neither "make
install" nor "make installcheck" are good ideas against production
servers.  Perhaps like

Furthermore, while you can do "make install" and "make installcheck"
in this directory or its children, it is HIGHLY NOT ADVISED to do so
with a server containing valuable data.  Some of these tests may have
undesirable side-effects on roles or other global objects within the
tested server.

Defining things this way also makes it a non-problem that
src/test/modules/test_pg_dump creates global objects and doesn't drop
them.

(src/test/Makefile is already on board with this definition.)

Now, this doesn't in itself fix the problem that my proposed patch will
emit warnings about the rolenames test script creating "Public" and so on.
We could fix that by maintaining a variant expected-file that includes
those warnings, but probably a less painful answer is just to jack
client_min_messages up to ERROR for that short segment of the test script.

We could make the new subdirectory be something specific like
"src/test/modules/test_rolenames", but I think very likely we'll be
wanting some additional test scripts that we likewise deem unsafe to
run during "installcheck".  So I'd rather choose a more generic module
name, but I'm not sure what ... "unsafe_tests"?

Comments?

regards, tom lane

Re: GiST VACUUM

2019-06-27 Thread Heikki Linnakangas


On 27/06/2019 20:15, Andrey Borodin wrote:

But I have stupid question again, about this code:

https://github.com/x4m/postgres_g/commit/096d5586537d29ff316ca8ce074bbe1b325879ee#diff-754126824470cb8e68fd5e32af6d3bcaR417

nextFullXid = ReadNextFullTransactionId();
diff = U64FromFullTransactionId(nextFullXid) -
U64FromFullTransactionId(latestRemovedFullXid);
if (diff < MaxTransactionId / 2)
{
TransactionId latestRemovedXid;

// sleep(100500 hours); latestRemovedXid 
becomes xid from future

latestRemovedXid = 
XidFromFullTransactionId(latestRemovedFullXid);

ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
   
 xlrec->node);
}

Do we have a race condition here? Can latestRemovedXid overlap and start to be 
xid from future?
I understand that it is purely hypothetical, but still latestRemovedXid is from 
ancient past already.


Good question. No, that can't happen, because this code is in the WAL 
redo function. In a standby, the next XID counter only moves forward 
when a WAL record is replayed that advances it, and all WAL records are 
replayed serially, so that can't happen when we're in the middle of 
replaying this record. A comment on that would be good, though.


When I originally had the check like above in the code that created the 
WAL record, it had exactly that problem, because in the master the next 
XID counter can advance concurrently.


- Heikki

Re: [HACKERS] Regression tests vs existing users in an installation

On 2019-Jun-27, Tom Lane wrote:

> Further on the rolenames test mess: I started to work on removing
> that script's creation of out-of-spec user names, but my heart just
> sank to the floor when I noticed that it was also doing stuff like
> this:
> 
> ALTER USER ALL SET application_name to 'SLAP';
> ALTER USER ALL RESET application_name;
> 
> The extent to which that's Not OK inside a production installation
> is hard to overstate.

Uh-oh.  I don't remember doing that, but evidently I did :-(

> At the same time, I can see that we'd want to have some coverage
> for that code path, so just deleting those tests isn't attractive.

Yeah ...

> What I propose that we do instead is invent an empty "module" under
> src/test/modules/ and install rolenames as a test for that.

Hmm, that's an idea, yes.

> Now, this doesn't in itself fix the problem that my proposed patch will
> emit warnings about the rolenames test script creating "Public" and so on.
> We could fix that by maintaining a variant expected-file that includes
> those warnings, but probably a less painful answer is just to jack
> client_min_messages up to ERROR for that short segment of the test script.

+1

> We could make the new subdirectory be something specific like
> "src/test/modules/test_rolenames", but I think very likely we'll be
> wanting some additional test scripts that we likewise deem unsafe to
> run during "installcheck".  So I'd rather choose a more generic module
> name, but I'm not sure what ... "unsafe_tests"?

+0 for unsafe_tests, -1 for test_rolenames.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: GiST VACUUM

On Thu, Jun 27, 2019 at 11:38 PM Heikki Linnakangas  wrote:
> * I moved the logic to extend a 32-bit XID to 64-bits to a new helper
> function in varsup.c.

I'm a bit uneasy about making this into a generally available function
for reuse.  I think we should instead come up with a very small number
of sources of fxids that known to be free of races because of some
well explained interlocking.

For example, I believe it is safe to convert an xid obtained from a
WAL record during recovery, because (for now) recovery is a single
thread of execution and the next fxid can't advance underneath you.
So I think XLogRecGetFullXid(XLogReaderState *record)[1] as I'm about
to propose in another thread (though I've forgotten who wrote it,
maybe Andres, Amit or me, but if it wasn't me then it's exactly what I
would have written) is a safe blessed source of fxids.  The rationale
for writing this function instead of an earlier code that looked much
like your proposed helper function, during EDB-internal review of
zheap stuff, was this: if we provide a general purpose xid->fxid
facility, it's virtually guaranteed that someone will eventually use
it to shoot footwards, by acquiring an xid from somewhere, and then
some arbitrary time later extending it to a fxid when no interlocking
guarantees that the later conversion has the correct epoch.

I'd like to figure out how to maintain full versions of the
procarray.c-managed xid horizons, without widening the shared memory
representation.  I was planning to think harder about for 13.
Obviously that doesn't help you now.  So I'm wondering if you should
just open-code this for now, and put in a comment about why it's safe
and a note that there'll hopefully be a fxid horizon available in a
later release?

[1] 
https://github.com/EnterpriseDB/zheap/commit/1203c2fa49f5f872f42ea4a02ccba2b191c45f32

-- 
Thomas Munro
https://enterprisedb.com

Re: make \d pg_toast.foo show its indices ; and, \d toast show its main table ; and \d relkind=I show its partitions (and tablespace)

2019-06-27 Thread Justin Pryzby

My previous patch missed a 1-line hunk, so resending.
>From 29e4c0b9700b9dee5f6ff2abc442e08e5221eb93 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 30 Apr 2019 19:05:53 -0500
Subject: [PATCH v3] print table associated with given TOAST table

---
 src/bin/psql/describe.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c65bc82..ff98c4f 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2153,6 +2153,28 @@ describeOneTableDetails(const char *schemaname,
 		}
 	}
 
+	/* print table associated with given TOAST table */
+	if (tableinfo.relkind == RELKIND_TOASTVALUE)
+	{
+		PGresult   *result = NULL;
+		printfPQExpBuffer(&buf,
+		  "SELECT relnamespace::pg_catalog.regnamespace, relname FROM pg_class WHERE reltoastrelid = '%s'",
+		  oid);
+		result = PSQLexec(buf.data);
+		if (!result) {
+			goto error_return;
+		} else if (1 != PQntuples(result)) {
+			PQclear(result);
+			goto error_return;
+		} else {
+			char	   *schemaname = PQgetvalue(result, 0, 0);
+			char	   *relname = PQgetvalue(result, 0, 1);
+			appendPQExpBuffer(&tmpbuf, _("For table: \"%s.%s\""),
+		  schemaname, relname);
+			printTableAddFooter(&cont, tmpbuf.data);
+		}
+	}
+
 	if (tableinfo.relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		/* Get the partition key information  */
-- 
2.7.4

>From 185d8723bec45824a0db245f69e948080b7fbbb2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 May 2019 09:24:51 -0500
Subject: [PATCH v3] make \d pg_toast.foo show its indices

---
 src/bin/psql/describe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index af2f440..c65bc82 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2275,6 +2275,7 @@ describeOneTableDetails(const char *schemaname,
 	else if (tableinfo.relkind == RELKIND_RELATION ||
 			 tableinfo.relkind == RELKIND_MATVIEW ||
 			 tableinfo.relkind == RELKIND_FOREIGN_TABLE ||
+			 tableinfo.relkind == RELKIND_TOASTVALUE ||
 			 tableinfo.relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		/* Footer information about a table */
-- 
2.7.4

>From 7e771e08cee52e506d4df18a1316652ac7e8f516 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 19 Jun 2019 15:41:25 -0500
Subject: [PATCH v4] show childs and tablespaces of partitioned indices

---
 src/bin/psql/describe.c | 53 +++--
 1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index ff98c4f..5ebad24 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3063,6 +3063,7 @@ describeOneTableDetails(const char *schemaname,
 	if (tableinfo.relkind == RELKIND_RELATION ||
 		tableinfo.relkind == RELKIND_MATVIEW ||
 		tableinfo.relkind == RELKIND_FOREIGN_TABLE ||
+		tableinfo.relkind == RELKIND_PARTITIONED_INDEX ||
 		tableinfo.relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		PGresult   *result;
@@ -3114,6 +3115,7 @@ describeOneTableDetails(const char *schemaname,
 		  " FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i"
 		  " WHERE c.oid=i.inhparent AND i.inhrelid = '%s'"
 		  " AND c.relkind != " CppAsString2(RELKIND_PARTITIONED_TABLE)
+		  " AND c.relkind != " CppAsString2(RELKIND_PARTITIONED_INDEX)
 		  " ORDER BY inhseqno;", oid);
 
 		result = PSQLexec(buf.data);
@@ -3178,7 +3180,8 @@ describeOneTableDetails(const char *schemaname,
 		 * Otherwise, we will not print "Partitions" section for a partitioned
 		 * table without any partitions.
 		 */
-		if (tableinfo.relkind == RELKIND_PARTITIONED_TABLE && tuples == 0)
+		if (tuples == 0 && (tableinfo.relkind == RELKIND_PARTITIONED_TABLE ||
+	tableinfo.relkind == RELKIND_PARTITIONED_INDEX))
 		{
 			printfPQExpBuffer(&buf, _("Number of partitions: %d"), tuples);
 			printTableAddFooter(&cont, buf.data);
@@ -3188,7 +3191,7 @@ describeOneTableDetails(const char *schemaname,
 			/* print the number of child tables, if any */
 			if (tuples > 0)
 			{
-if (tableinfo.relkind != RELKIND_PARTITIONED_TABLE)
+if (tableinfo.relkind != RELKIND_PARTITIONED_TABLE && tableinfo.relkind != RELKIND_PARTITIONED_INDEX)
 	printfPQExpBuffer(&buf, _("Number of child tables: %d (Use \\d+ to list them.)"), tuples);
 else
 	printfPQExpBuffer(&buf, _("Number of partitions: %d (Use \\d+ to list them.)"), tuples);
@@ -3198,39 +3201,33 @@ describeOneTableDetails(const char *schemaname,
 		else
 		{
 			/* display the list of child tables */
-			const char *ct = (tableinfo.relkind != RELKIND_PARTITIONED_TABLE) ?
+			const char *ct = (tableinfo.relkind != RELKIND_PARTITIONED_TABLE && tableinfo.relkind != RELKIND_PARTITIONED_INDEX) ?
 			_("Child tables") : _("Partitions");
 			int			ctw = pg_wcswidth(ct, strlen(ct), pset.encoding);
 
 			for (i = 0; i < tuples; i++)
 			{
-if (tableinfo.relkind != RELKIND_PARTITIONED_TABLE)
-{
-	if (i == 0)
-		printfPQExpBuffer(&

Re: An out-of-date comment in nodeIndexonlyscan.c

On Fri, Feb 8, 2019 at 4:58 AM Thomas Munro
 wrote:
> On Thu, May 17, 2018 at 3:49 AM Heikki Linnakangas  wrote:
> > On 14/05/18 02:15, Thomas Munro wrote:
> > > The first sentence of that comment is no longer true as of commit
> > > c01262a8 (2013).  As for whether it's necessary to predicate-lock the
> > > whole eheap page (rather than the heap tuple) anyway because of HOT
> > > update chains, I don't know, so I'm not sure what wording to propose
> > > instead.
> >
> > Hmm. If there are any updated tuples, HOT or not, the visibility map bit
> > will not be set, and we won't reach this code. So I think we could
> > acquire the tuple lock here.
>
> Right.  CC Kevin.  I think we should at least fix the comment.  As for
> changing the lock level, PredicateLockTuple() wants a heap tuple that
> we don't have, so we'd probably need to add a PredicateLockTid()
> function.

Here's a patch I'd like to commit to fix the comment.  We could look
at improving the actual code after
https://commitfest.postgresql.org/23/2169/ is done.

I wonder if it might be possible to avoid page locks on both the heap
*and* index in some limited cases, as I mentioned over here (just an
idea, could be way off base):

https://www.postgresql.org/message-id/CA%2BhUKGJGDVfhHmoUGzi-_J%2BN8FmRjmWTY0MCd1ZV5Fj9qdyb1w%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com


0001-Fix-misleading-comment-in-nodeIndexonlyscan.c.patch
Description: Binary data

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-27 Thread Andrew Gierth

> "Tom" == Tom Lane  writes:

 > Robert Haas  writes:
 >> I'm kind of unsure what to think about this whole debate
 >> substantively. If Andrew is correct that zone.tab or zone1970.tab is
 >> a list of time zone names to be preferred over alternatives, then it
 >> seems like we ought to prefer them.

 Tom> It's not really clear to me that the IANA folk intend those files
 Tom> to be read as a list of preferred zone names.

The files exist to support user selection of zone names. That is, it is
intended that you can use them to allow the user to choose their country
and then timezone within that country, rather than offering them a flat
regional list (which can be large and the choices non-obvious).

The zone*.tab files therefore include only geographic names, and not
either Posix-style abbreviations or special cases like Etc/UTC. Programs
that use zone*.tab to allow user selection handle cases like that
separately (for example, FreeBSD's tzsetup offers "UTC" at the
"regional" menu).

It's quite possible that people have implemented time zone selection
interfaces that use some other presentation of the list, but that
doesn't particularly diminish the value of zone*.tab. In particular, the
current zone1970.tab has:

  - at least one entry for every iso3166 country code that's not an
uninhabited remote island;

  - an entry for every distinct "Zone" in the primary data files, with
the exception of entries that are specifically commented as being
for backward compatibility (e.g. CET, CST6CDT, etc. - see the
comments in the europe and northamerica data files for why these
exist)

The zonefiles that get installed in addition to the ones in zone1970.tab
fall into these categories:

  - they are "Link" entries in the primary data files

  - they are from the "backward" data file, which is omitted in some
system tzdb installations because it exists only for backward
compatibility (but we install it because it's still listed in
tzdata.zi by default)

  - they are from the "etcetera" file, which lists special cases such as
UTC and fixed UTC offsets

 Tom> If they do, what are we to make of the fact that no variant of
 Tom> "UTC" appears in them?

That "UTC" is not a geographic timezone name?

 >> He remarks that we are preferring "deprecated backward-compatibility
 >> aliases" and to the extent that this is true, it seems like a bad
 >> thing. We can't claim to be altogether here apolitical, because when
 >> those deprecated backward-compatibility names are altogether
 >> removed, we are going to remove them and they're going to stop
 >> working. If we know which ones are likely to suffer that fate
 >> eventually, we ought to stop spitting them out. It's no more
 >> political to de-prefer them when upstream does than it is to remove
 >> them with the upstream does.

 Tom> I think that predicting what IANA will do in the future is a
 Tom> fool's errand.

Maybe so, but when something is explicitly in a file called "backward",
and the upstream-provided Makefile has specific options for omitting it
(even though it is included by default), and all the comments about it
are explicit about it being for backward compatibility, I think it's
reasonable to avoid _preferring_ the names in it.

The list of backward-compatibility zones is in any case extremely
arbitrary and nonsensical: for example "GB", "Eire", "Iceland",
"Poland", "Portugal" are aliases for their respective countries, but
there are no comparable aliases for any other European country. The
"Navajo" entry (an alias for America/Denver) has already been mentioned
in this thread; our arbitrary rule prefers it (due to shortness) for all
US zones that use Mountain time with DST. And so on.

 Tom> Our contract is to select some one of the aliases that the tzdb
 Tom> database presents, not to guess about whether it might present a
 Tom> different set in the future. (Also note that a lot of the observed
 Tom> variation here has to do with whether individual platforms choose
 Tom> to install backward-compatibility zone names. I think the odds
 Tom> that IANA proper will remove those links are near zero; TTBOMK
 Tom> they never have removed one yet.)

Well, we should also consider the possibility that we might be using the
system tzdata and that the upstream OS or distro packager may choose to
remove the "backward" data or split it to a separate package.

 Tom> More generally, my unhappiness about Andrew's proposal is:

 [...]
 Tom> 3. The proposal has technical issues, in particular I'm not nearly
 Tom> as sanguine as Andrew is about whether we can rely on
 Tom> zone[1970].tab to be available.

My proposal works even if it's not, though I don't expect that to be an
issue in practice.

-- 
Andrew (irc:RhodiumToad)

Re: An out-of-date comment in nodeIndexonlyscan.c

2019-06-27 Thread Ashwin Agrawal

On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro  wrote:

> Here's a patch I'd like to commit to fix the comment.  We could look
> at improving the actual code after
> https://commitfest.postgresql.org/23/2169/ is done.
>

Change looks good to me.

> I wonder if it might be possible to avoid page locks on both the heap
> *and* index in some limited cases, as I mentioned over here (just an
> idea, could be way off base):
>
>
> https://www.postgresql.org/message-id/CA%2BhUKGJGDVfhHmoUGzi-_J%2BN8FmRjmWTY0MCd1ZV5Fj9qdyb1w%40mail.gmail.com

I am in same boat as you. One can get serializable conflict error today
based on tuple gets HOT updated or not. HOT is logically internal code
optimization and not so much user functionality, so ideally feels shouldn't
affect serializable behavior. But it does currently, again, due to index
locking. Disable HOT update and 4 isolation tests fail due to "could not
serialize access due to read/write dependencies among transactions"
otherwise not. If the tuple happens to fit on same page user will not get
the error, if the tuple gets inserted to different page the error can
happen, due to index page locking. I had discussed this with Heikki and
thinking is we shouldn't need to take the lock on the index page, if the
index key was not changed, even if it was a non-HOT update. Not sure of
complications and implications, but just a thought.

Re: Usage of epoch in txid_current

Hi!

On Mon, Jun 24, 2019 at 6:27 PM Andres Freund  wrote:
> On June 24, 2019 8:19:13 AM PDT, Robert Haas  wrote:
> >On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov
> > wrote:
> >> On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro 
> >wrote:
> >> > Thanks for the reviews!  Pushed.
> >>
> >> Any ideas we should move towards 64-bit xids in more places?  That
> >has
> >> been discussed several times already.  I think last time it was
> >> discussed in person during FOSDEM PGDay 2018 Developer Meeting [1].
> >> There we've discussed that it probably doesn't worth it to change
> >> 32-bit on-disk xids in heap.  It's better to leave existing heap "as
> >> is", but allow other pluggable table access methods to support 64-bit
> >> xids.  Given now we have pluggable table access methods, we may build
> >> a plan on full support of 64-bit xids in core.
> >>
> >> In my vision sketchy plan may look like this.
> >>
> >> 1. Change all non-heap types of xids from TransactionId to
> >> FullTransactionId.
> >
> >I think it's fine to replace TransactionId with FullTransactionId
> >without stressing about it too much in code that's not that heavily
> >trafficked. However, I'm not sure we can do that across the board. For
> >example, converting snapshots to use 64-bit XIDs would mean that in
> >the worst case a snapshot will use twice as many cache lines, and that
> >might have performance implications on some workloads.
>
> We could probably expand the transaction IDs on access or when computing them 
> for most of these, as usually they'll largely  be about currently running 
> transactions. It e.g. seems sensible to keep the procarray at 32 bit xids, 
> expand xmin/xmax to 64 when computing snapshots, and leave the snapshot's 
> transaction ID array at 32xids. That ought to be an negligible overhead.

I see, replace TransactionId with FullTransactionId just everywhere
doesn't look like good idea.

Given now we have pluggable table AMs, new table AMs may not have data
wraparound problem.  For instance, table AM could store xids 64-bit
internally, and convert them to 32-bit on-the-fly.  If xid is too old
for conversion, just replace it with FrozenTransactionId.  So, the
restriction we really have now is that running xacts and active
snapshots should fit 2^31 range. Turning Snapshot xmin/xmas into
64-bit will soften this restriction, then just running xacts should
fit 2^31 range while snapshots could be older.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Commitfest 2019-07, the first of five* for PostgreSQL 13

Hello hackers,

The first Commitfest[1] for the next major release of PostgreSQL
begins in a few days, and runs for the month of July. There are 218
patches registered[2] right now, and I'm sure we'll see some more at
the last minute. PostgreSQL 13 needs you!

I volunteered to be the CF manager for this one, and Jonathan Katz
kindly offered to help me[3]. Assuming there are no objections and we
land this coveted role (I didn't see any other volunteers for CF1?), I
plan to start doing the sort of stuff listed on
https://wiki.postgresql.org/wiki/Running_a_CommitFest shortly, and
will then provide updates on this thread. (Clearly some of that is
out of date WRT the "new" Commitfest app and process, so if there's a
better list somewhere please let me know; if not, perhaps one of our
tasks should be to update that).

Either way, please make sure your patches are in, and start signing up
to review things that you're interested in or can help with or want to
learn about. If you've submitted patches, it'd be ideal if you could
try to review patches of similar size/complexity. Every review helps:
whether proof-reading or copy-editing the documentation and comments
(or noting that they are missing), finding low level C programming
errors, providing high level architectural review, comparing against
the SQL standard or other relevant standards or products, seeing if
appropriate regression tests are included, manual testing or ...
anything in between. Testing might include functionality testing
(does it work as described, do all the supplied tests pass?),
performance/scalability testing, portability testing (eg does it work
on your OS?), checking with tools like valgrind, feature combination
checks (are there hidden problems when combined with partitions,
serializable, replication, triggers, ...?) and generally hunting for
weird edge cases the author didn't think of[4].

A couple of notes for new players: We don't bite, and your
contributions are very welcome. It's OK to review things that others
are already reviewing. If you are interested in a patch and don't
know how to get started reviewing it or how to get it up and running
on your system, just ask and someone will be happy to point to or
provide more instructions. You'll need to subscribe to this mailing
list if you haven't already. If the thread for a CF entry began
before you were subscribed, you might be able to download the whole
thread as a mailbox file and import it into your email client so that
you can reply to the thread; if you can't do that (it can be
tricky/impossible on some email clients), ping me and I'll CC you so
you can reply.

*probably

[1] https://wiki.postgresql.org/wiki/CommitFest
[2] https://commitfest.postgresql.org/23/
[3]
https://wiki.postgresql.org/wiki/PgCon_2019_Developer_Meeting#11:10_-_11:25.09Commitfest_Management
[4] "A QA engineer walks into a bar. Orders a beer. Orders 0 beers.
Orders 999 beers. Orders a lizard. Orders -1 beers. Orders a
ueicbksjdhd. First real customer walks in and asks where the bathroom
is. The bar bursts into flames, killing everyone." -Brenan Keller

--
Thomas Munro
https://enterprisedb.com

Re: Usage of epoch in txid_current

On Mon, Jun 24, 2019 at 8:43 PM Alvaro Herrera  wrote:
>
> On 2019-Jun-22, Alexander Korotkov wrote:
>
> > 2. Also introduce FullMultixactId, and apply to MultixactId the
> > similar change as #1.
> > 3. Change SLRU on-disk format to handle 64-bit xids and multixacts.
> > In particular make SLRU page numbers 64-bit too.  Add SLRU upgrade
> > procedure to pg_upgrade.
>
> I think enlarging multixacts to 64 bits is a terrible idea.  I would
> rather get rid of multixacts completely; zheap is proposing not to use
> multixacts at all, for example.  The amount of bloat caused by
> pg_multixact data is already pretty bad ... because of which requiring
> pg_multixact to be rewritten by pg_upgrade would cause a severe slowdown
> for upgrades.  (It worked for FSM because the volume is tiny, but that's
> not the case for multixact.)
>
> I think the pg_upgrade problem can be worked around by creating a new
> dir pg_multixact64 (an example) which is populated from the upgrade
> point onwards; so you keep the old data unchanged, and new multixacts
> use the new location, but the system knows to read the old one when
> reading old tuples.  But, as I said above, I would rather not have
> multixacts at all.
>
> Another idea: create a new table AM that mimics heapam (I think ß-heap
> "eszett-heap" is a good name), except that it reuses zheap's idea of
> keeping "transaction locks" separately for tuple locks rather than
> multixacts; heapam continues to use 32bits multixact.  Tables can be
> migrated from heapam to ß-heap (alter table ... set access method) to
> incrementally reduce reliance on multixacts going forward.  No need for
> pg_upgrade-time disruption.

We need multixacts to store row-level locks information.  I remember
they weren't crash-safe some time ago; because we basically don't need
lock information about previous server run: all that locks are for
sure released.  Due to some difficulties we finally made them
crash-safe (I didn't follow that in much details).  But what about
discarding mulixacts on pg_upgrade?  Is it feasible goal?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: SQL/JSON path issues/questions

On Thu, Jun 27, 2019 at 4:57 PM Thom Brown  wrote:
> On Wed, 19 Jun 2019 at 20:04, Alexander Korotkov
>  wrote:
> > On Wed, Jun 19, 2019 at 7:07 PM Thom Brown  wrote:
> > > On Thu, 13 Jun 2019 at 14:59, Thom Brown  wrote:
> > > Also, this example doesn't work:
> > >
> > > '$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()'
> > >
> > > This gives me:
> > >
> > > psql: ERROR:  syntax error, unexpected $end at end of jsonpath input
> > > LINE 13: }','$.track ? (@.segments[*]');
> > > ^
> >
> > Perhaps it should be following:
> >
> > '$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'
>
> I'm not clear on why the original example doesn't work here.

It doesn't work because filter expression should be predicate, i.e.
always return bool.  In the original example filter expression selects
some json elements.  My original idea was that it was accidentally
come from some of our extensions where we've allowed that.  But it
appears to be just plain wrong example, which never worked.  Sorry for
that.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: SQL/JSON path issues/questions

On Tue, Jun 25, 2019 at 6:38 PM Liudmila Mantrova
 wrote:
> Thank you for the catch! Please see the modified version of patch 0004
> attached.

I tried to review and revise the part related to filters, but I failed
because I don't understand the notions used in the documentation.

What is the difference between filter expression and filter condition?
 I can guess that filter expression contains question mark,
parentheses and filter condition inside.  But this sentence is in
contradiction with my guess: "A filter expression must be enclosed in
parentheses and preceded by a question mark".  So, filter expression
is inside the parentheses.  Then what is filter condition?  The same?

>Each filter expression can provide one or more filters
>that are applied to the result of the path evaluation.

So additionally to filter condition and filter expression we introduce
the notion of just filter.  What is it?  Could we make it without
introduction of new notion?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Obsolete comment in commands/analyze.c

2019-06-27 Thread Etsuro Fujita

On Fri, Jun 28, 2019 at 1:02 AM Tomas Vondra
 wrote:
> On Thu, Jun 27, 2019 at 01:53:52PM +0200, Daniel Gustafsson wrote:
> >> On 27 Jun 2019, at 13:05, Etsuro Fujita  wrote:
> >
> >> since the citation was also moved to
> >> utils/misc/sampling.c, I think the "see full citation below" part
> >> should be updated accordingly.  Attached is a patch for that.
> >
> >Agreed, nice catch!
>
> Thanks, committed.

Thanks for committing, Tomas!  Thanks for reviewing, Daniel!

Best regards,
Etsuro Fujita

Re: SQL/JSON path issues/questions

On 2019-Jun-28, Alexander Korotkov wrote:

> On Tue, Jun 25, 2019 at 6:38 PM Liudmila Mantrova
>  wrote:
> > Thank you for the catch! Please see the modified version of patch 0004
> > attached.
> 
> I tried to review and revise the part related to filters, but I failed
> because I don't understand the notions used in the documentation.
> 
> What is the difference between filter expression and filter condition?
>  I can guess that filter expression contains question mark,
> parentheses and filter condition inside.  But this sentence is in
> contradiction with my guess: "A filter expression must be enclosed in
> parentheses and preceded by a question mark".  So, filter expression
> is inside the parentheses.  Then what is filter condition?  The same?

The SQL standard defines "JSON filter expressions" (in 9.39 of the 2016
edition).  It does not use either term "filter condition" nor bare
"filter"; it uses "JSON path predicate" which is the part of the JSON
filter expression that is preceded by the question mark and enclosed by
parens.

Maybe we should stick with the standard terminology ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: An out-of-date comment in nodeIndexonlyscan.c