Re: C testing for Postgres

2019-07-02 Thread Ashwin Agrawal
On Mon, Jul 1, 2019 at 11:26 PM Michael Paquier  wrote:

> On Fri, Jun 28, 2019 at 09:42:54AM -0400, Adam Berlin wrote:
> > If we were to use this tool, would the community want to vendor the
> > framework in the Postgres repository, or keep it in a separate repository
> > that produces a versioned shared library?
>
> Well, my take is that having a base infrastructure for a fault
> injection framework is something that would prove to be helpful, and
> that I am not against having something in core.  While working on
> various issues, I have often found myself doing crazy stat() calls on
> an on-disk file to force an elog(ERROR) or elog(FATAL), and in my
> experience fault points are very *hard* to place correctly because
> they should not be single-purpose things.
>
> Now, we don't want to end up with an infinite number of fault points
> in the tree, but being able to force a failure at a point added for a
> patch using a SQL command can make the integration of tests in a patch
> easier for reviewers, for example isolation tests with elog(ERROR)
> (like what has been discussed for b4721f3).
>
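(For context, the kind of ad-hoc probe described above might look roughly
like the sketch below; the function and file names are hypothetical, and it
assumes the usual backend includes such as postgres.h.)

#include <sys/stat.h>

static void
maybe_inject_fault(const char *faultname)
{
	struct stat st;
	char		path[MAXPGPATH];

	/* force an error whenever a sentinel file for this fault exists */
	snprintf(path, sizeof(path), "/tmp/pg_fault_%s", faultname);
	if (stat(path, &st) == 0)
		elog(ERROR, "fault point \"%s\" triggered", faultname);
}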

Just to clarify, what Adam is proposing in this thread is *not* a fault
injection framework.


Re: Replacing the EDH SKIP primes

2019-07-02 Thread Peter Eisentraut
On 2019-06-18 13:05, Daniel Gustafsson wrote:
> This was touched upon, but never really discussed AFAICT, back when then EDH
> parameters were reworked a few years ago.  Instead of replacing with custom
> ones, as suggested in [1] it we might as well replace with standardized ones 
> as
> this is a fallback.  Custom ones wont make it more secure, just add more work
> for the project.  The attached patch replace the SKIP prime with the 2048 bit
> MODP group from RFC 3526, which is the same change that OpenSSL did a few 
> years
> back [2].

It appears that we have consensus to go ahead with this.


I was wondering whether the provided binary blob contained any checksums
or other internal checks.  How would we know whether it contains
transposed characters or replaces a 1 by a I or a l?  If I just randomly
edit the blob, the ssl tests still pass.  (The relevant load_dh_buffer()
call does get called by the tests.)  How can we make sure we actually
got a good copy?


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Peter Eisentraut
On 2019-07-01 22:46, Alvaro Herrera wrote:
> On 2019-Jul-02, Thomas Munro wrote:
>> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud  wrote:
>>> Even if that's just me being delusional, I'd still prefer Alvaro's
>>> approach to have distinct switches for each collation system.
>>
>> Makes sense.  But why use the name "glibc" in the code and user
>> interface?  The name of the collation provider in PostgreSQL is "libc"
>> (for example in the CREATE COLLATION command), and the problem applies
>> no matter who makes your libc.
> 
> Makes sense.  "If your libc is glibc and you go across an upgrade over
> version X, please use --include-rule=libc-collation"

I think it might be better to put the logic of what indexes are
collation affected etc. into the backend REINDEX command.  We are likely
to enhance the collation version and dependency tracking over time,
possibly soon, possibly multiple times, and it would be very cumbersome
to have to keep updating reindexdb with this.  Moreover, since for
performance you likely want to reindex by table, implementing a logic of
"reindex all collation-affected indexes on this table" would be much
easier to do in the backend.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Implement uuid_version()

2019-07-02 Thread Peter Eisentraut
On 2019-06-30 14:50, Fabien COELHO wrote:
> I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if 
> it is available in core?

That would probably require an extension version update dance in
pgcrypto.  I'm not sure if it's worth that.  Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Cache lookup errors with functions manipulation object addresses

2019-07-02 Thread Michael Paquier
On Thu, Feb 21, 2019 at 04:40:13PM +0900, Michael Paquier wrote:
> Rebased version fixing some conflicts with HEAD.

And here is a rebased version of this stuff on HEAD (66c5bd3), which
makes it v16.
--
Michael
From 76ef961e02e0276af5516bdce6e0e543526db5b2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 2 Jul 2019 16:07:06 +0900
Subject: [PATCH v16 1/3] Add flag to format_type_extended to enforce NULL-ness

If a type scanned is undefined, the type format routines have two behaviors
depending on whether FORMAT_TYPE_ALLOW_INVALID is passed by the caller:
- Generate an error
- Return an undefined type name, "???" or "-".

The current interface is unhelpful for callers that want to format a type
name properly while still making sure that the type is defined, as there
could be actual types whose names match the undefined-type strings.  To
counter that, add a new flag called FORMAT_TYPE_FORCE_NULL which makes the
routine return a NULL result instead of "???" or "-", without generating an
error.  This will be used by future patches to improve the SQL interface
for object addresses.
---
 src/backend/utils/adt/format_type.c | 20 
 src/include/utils/builtins.h|  1 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 6ae2a31345..8ee0e10c29 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -96,6 +96,9 @@ format_type(PG_FUNCTION_ARGS)
  * - FORMAT_TYPE_ALLOW_INVALID
  *			if the type OID is invalid or unknown, return ??? or such instead
  *			of failing
+ * - FORMAT_TYPE_FORCE_NULL
+ *			if the type OID is invalid or unknown, return NULL instead of ???
+ *			or such
  * - FORMAT_TYPE_FORCE_QUALIFY
  *			always schema-qualify type names, regardless of search_path
  *
@@ -114,13 +117,20 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 	char	   *buf;
 	bool		with_typemod;
 
-	if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
-		return pstrdup("-");
+	if (type_oid == InvalidOid)
+	{
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			return pstrdup("-");
+	}
 
 	tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_oid));
 	if (!HeapTupleIsValid(tuple))
 	{
-		if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 			return pstrdup("???");
 		else
 			elog(ERROR, "cache lookup failed for type %u", type_oid);
@@ -143,7 +153,9 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 		tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type));
 		if (!HeapTupleIsValid(tuple))
 		{
-			if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+return NULL;
+			else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 return pstrdup("???[]");
 			else
 elog(ERROR, "cache lookup failed for type %u", type_oid);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index a3cd7d26fa..a1e418cee7 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -109,6 +109,7 @@ extern Datum numeric_float8_no_overflow(PG_FUNCTION_ARGS);
 #define FORMAT_TYPE_TYPEMOD_GIVEN	0x01	/* typemod defined by caller */
 #define FORMAT_TYPE_ALLOW_INVALID	0x02	/* allow invalid types */
 #define FORMAT_TYPE_FORCE_QUALIFY	0x04	/* force qualification of type */
+#define FORMAT_TYPE_FORCE_NULL		0x08	/* force NULL if undefined */
 extern char *format_type_extended(Oid type_oid, int32 typemod, bits16 flags);
 
 extern char *format_type_be(Oid type_oid);
-- 
2.20.1
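(A hypothetical caller, not part of the patch, to illustrate the intended
use of the new flag:)

char *
get_type_name_if_exists(Oid typid)
{
	/* returns NULL instead of "???" or "-" for an unknown type OID */
	return format_type_extended(typid, -1, FORMAT_TYPE_FORCE_NULL);
}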

From ad8f7f57f91b243926740d5a224357f54295ee34 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 2 Jul 2019 16:07:29 +0900
Subject: [PATCH v16 2/3] Refactor format procedure and operator APIs to be
 more modular

This introduces a new set of extended routines for procedure and operator
lookups, with a flags bitmask argument that can modify the default
behavior:
- Force schema qualification
- Force NULL as result instead of a numeric OID for an undefined
object.
---
 src/backend/utils/adt/regproc.c | 61 +++--
 src/include/utils/regproc.h | 10 ++
 2 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 17a7f6c9d8..8d6d73039b 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -40,8 +40,6 @@
 #include "utils/regproc.h"
 #include "utils/varlena.h"
 
-static char *format_operator_internal(Oid operator_oid, bool force_qualify);
-static char *format_procedure_internal(Oid procedure_oid, bool force_qualify);
 static void parseNameAndArgTypes(const char *string, bool allowNone,
  List **names, int *nargs, Oid *argtypes);
 
@@ -322,24 +320,27 @@ to_regprocedure(PG_FUNCTION_ARGS)
 char *
 format_procedure(Oid p

Re: POC: converting Lists into arrays

2019-07-02 Thread Oleksandr Shulgin
On Tue, Jul 2, 2019 at 1:27 AM Tom Lane  wrote:

>
> So I think this is a win, and attached is v7.
>

Not related to the diff v6..v7, but shouldn't we additionally do a
memset() with '\0' before calling pfree():

+ListCell   *newelements;
+
+newelements = (ListCell *)
+MemoryContextAlloc(GetMemoryChunkContext(list),
+   new_max_len * sizeof(ListCell));
+memcpy(newelements, list->elements,
+   list->length * sizeof(ListCell));
+pfree(list->elements);
+list->elements = newelements;

Or is this somehow ensured by the debug pfree() implementation, or does it
work differently together with Valgrind?

Otherwise it seems that the calling code could still be hanging onto a list
element in a freed chunk and (rather) happily accessing it, as opposed to an
almost certain crash if it is zeroed before returning from enlarge_list().
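Concretely, something along these lines (a sketch only; note that
assert-enabled builds already wipe freed memory via CLOBBER_FREED_MEMORY,
so this would mainly matter for non-assert builds):

    /* wipe the old array before freeing it, so a stale ListCell pointer
     * into it reads garbage rather than valid-looking data */
    memset(list->elements, 0, list->length * sizeof(ListCell));
    pfree(list->elements);
    list->elements = newelements;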

Cheers,
--
Alex


Re: Replacing the EDH SKIP primes

2019-07-02 Thread Michael Paquier
On Tue, Jul 02, 2019 at 08:14:25AM +0100, Peter Eisentraut wrote:
> It appears that we have consensus to go ahead with this.

Yeah, I was planning to look at that one next.  Or perhaps you would
like to take care of it, Peter?

> 
> I was wondering whether the provided binary blob contained any checksums
> or other internal checks.  How would we know whether it contains
> transposed characters or replaces a 1 by a I or a l?  If I just randomly
> edit the blob, the ssl tests still pass.  (The relevant load_dh_buffer()
> call does get called by the tests.)  How can we make sure we actually
> got a good copy?
> 

PEM_read_bio_DHparams() has some checks on the Diffie-Hellman parameters,
but it is up to the caller to make sure that what it provides really is a
prime, so as to make cracking harder, no?  RFC 3526 gives a small formula
for these primes, which we can use to double-check the patch.
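For illustration, a rough sketch (not a patch) of how the embedded blob
could be cross-checked with OpenSSL itself: DH_check() verifies, among
other things, that p is a safe prime and that g is a suitable generator.
The blob argument stands for whatever constant carries the embedded PEM
parameters in our source.

#include <openssl/dh.h>
#include <openssl/pem.h>

static int
dh_blob_is_sane(const char *blob)
{
	BIO		   *bio = BIO_new_mem_buf(blob, -1);
	DH		   *dh = NULL;
	int			codes = 0;
	int			ok = 0;

	if (bio != NULL)
		dh = PEM_read_bio_DHparams(bio, NULL, NULL, NULL);
	if (dh != NULL && DH_check(dh, &codes))
		ok = (codes == 0);		/* no DH_CHECK_* problem bits set */

	if (dh != NULL)
		DH_free(dh);
	if (bio != NULL)
		BIO_free(bio);
	return ok;
}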
--
Michael




Re: Refactoring base64 encoding and decoding into a safer interface

2019-07-02 Thread Daniel Gustafsson
> On 2 Jul 2019, at 07:41, Michael Paquier  wrote:

>> In the below passage, we leave the input buffer with an incomplete
>> encoded string.  Should we memset the buffer to zero to avoid the
>> risk that code which fails to check the return value believes it has
>> an encoded string?
> 
> Hmm.  Good point.  I have not thought of that, and your suggestion
> makes sense.
> 
> Another question is if we'd want to actually use explicit_bzero()
> here, but that could be a discussion on this other thread, except if
> the patch discussed there is merged first:
> https://www.postgresql.org/message-id/42d26bde-5d5b-c90d-87ae-6cab875f7...@2ndquadrant.com

I’m not sure we need to go to that length, but I don’t have overly strong
opinions.  I think of this more like a case of “we’ve changed the API with new
error cases that we didn’t handle before, so we’re being a little defensive to
help you avoid subtle bugs”.
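As a sketch of the defensive pattern being discussed (assuming the
refactored pg_b64_encode(src, srclen, dst, dstlen) returns a negative value
on failure, and with string.h available):

static int
encode_or_wipe(const char *src, int srclen, char *dst, int dstlen)
{
	int			len = pg_b64_encode(src, srclen, dst, dstlen);

	/* wipe any partial output so a caller ignoring the return value
	 * cannot mistake it for a valid encoded string */
	if (len < 0)
		memset(dst, 0, dstlen);
	return len;
}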

> Attached is an updated patch.

Looks good, passes tests, provides value to the code.  Bumping this to ready
for committer as I have no more comments to add.

cheers ./daniel



Re: [PATCH] Speedup truncates of relation forks

2019-07-02 Thread Masahiko Sawada
On Mon, Jun 17, 2019 at 5:01 PM Jamison, Kirk  wrote:
>
> Hi all,
>
> Attached is the v2 of the patch. I added the optimization that Sawada-san
> suggested for DropRelFileNodeBuffers, although I did not acquire the lock
> when comparing the minBlock and target block.
>
> There's actually a comment written in the source code that we could
> pre-check buffer tag for forkNum and blockNum, but given that FSM and VM
> blocks are small compared to main fork's, the additional benefit of doing so
> would be small.
>
> >* We could check forkNum and blockNum as well as the rnode, but the
> >* incremental win from doing so seems small.
>
> I personally think it's alright not to include the suggested pre-checking.
> If that's the case, we can just follow the patch v1 version.
>
> Thoughts?
>
> Comments and reviews from other parts of the patch are also very much welcome.
>

Thank you for updating the patch. Here are my review comments for the v2 patch.

---
- * visibilitymap_truncate - truncate the visibility map
+ * visibilitymap_mark_truncate - mark the about-to-be-truncated VM
+ *
+ * Formerly, this function truncates VM relation forks. Instead, this just
+ * marks the dirty buffers.
  *
  * The caller must hold AccessExclusiveLock on the relation, to ensure that
  * other backends receive the smgr invalidation event that this function sends
  * before they access the VM again.
  *

I don't think we should describe the previous behavior here.  Rather, we
need to describe what visibilitymap_mark_truncate does and what it returns
to the caller.

I'm not sure that the visibilitymap_mark_truncate function name is
appropriate here, since it actually truncates map bits, not only marks
them.  Perhaps we can still use visibilitymap_truncate, or change it to
visibilitymap_truncate_prepare, or something?  Anyway, this function
truncates only the tail bits in the last remaining map page, and we can
have a rule that the caller must call smgrtruncate() later to actually
truncate the pages.

The comment of second paragraph is now out of date since this function
no longer sends smgr invalidation message.

Is it worth leaving the current visibilitymap_truncate function as a
shortcut, instead of replacing it?  That way we don't need to change the
pg_truncate_visibility_map function.

The same comments are true for MarkFreeSpaceMapTruncateRel.

---
+   ForkNumber  forks[MAX_FORKNUM];
+   BlockNumber blocks[MAX_FORKNUM];
+   BlockNumber new_nfsmblocks = InvalidBlockNumber;/* FSM blocks */
+   BlockNumber newnblocks = InvalidBlockNumber;/* VM blocks */
+   int nforks = 0;

I think we can name these new_nfsmblocks and new_nvmblocks and drop the
comments.

---
-   /* Truncate the FSM first if it exists */
+   /*
+* We used to truncate FSM and VM forks here. Now we only mark the
+* dirty buffers of all forks about-to-be-truncated if they exist.
+*/
+

Again, I think we need a description of the current behavior rather than
the history, except in cases where the history is important.

---
-   /*
-* Make an XLOG entry reporting the file truncation.
-*/
+   /* Make an XLOG entry reporting the file truncation */

Unnecessary change.

---
+   /*
+    * We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks
+    * setting. smgrtruncate sent an smgr cache inval message, which will
+    * cause other backends to invalidate their copy of smgr_fsm_nblocks and
+    * smgr_vm_nblocks, and this one too at the next command boundary. But
+    * this ensures it isn't outright wrong until then.
+    */
+   if (rel->rd_smgr)
+   {
+       rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+       rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+   }

new_nfsmblocks and newnblocks could be InvalidBlockNumber when the
forks are already small enough.  So the above code sets incorrect
values to smgr_{fsm,vm}_nblocks.

Also, I wonder if we can do the above code in smgrtruncate. Otherwise
we always need to set smgr_{fsm,vm}_nblocks after smgrtruncate, which
is inconvenient.
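One way to address the first point (sketch only): only overwrite the
cached values for forks that were actually truncated.

    if (rel->rd_smgr)
    {
        if (BlockNumberIsValid(new_nfsmblocks))
            rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
        if (BlockNumberIsValid(newnblocks))
            rel->rd_smgr->smgr_vm_nblocks = newnblocks;
    }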

---
+   /* Update the local smgr_fsm_nblocks and smgr_vm_nblocks setting */
+   if (rel->rd_smgr)
+   {
+       rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+       rel->rd_smgr->smgr_vm_nblocks = newnblocks;
+   }

The same as above.  And do we need to set smgr_{fsm,vm}_nblocks even
though the fake relcache entry is freed soon after?

---
+   /* Get the lower bound of target block number we're interested in */
+   for (i = 0; i < nforks; i++)
+   {
+   if (!BlockNumberIsValid(minBlock) ||
+   minBlock > firstDelBlock[i])
+   minBlock = firstDelBlock[i];
+   }

Maybe we can declare 'i' in the for statement (i.e. for (int i = 0;
...)) at every outer loop in this

Re: Increasing default value for effective_io_concurrency?

2019-07-02 Thread Tomas Vondra

On Mon, Jul 01, 2019 at 04:32:15PM -0700, Andres Freund wrote:

Hi,

On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote:

I think we should consider changing the effective_io_concurrency default
value, i.e. the guc that determines how many pages we try to prefetch in
a couple of places (the most important being Bitmap Heap Scan).


Maybe we need to improve the way it's used / implemented instead - it seems
just too hard to determine the correct setting as currently implemented.



Sure, if we can improve those bits, that'd be nice. It's definitely hard
to decide what value is appropriate for a given storage system. But I'm
not sure it's something we can do easily, considering how opaque the
hardware is to us ...

I wonder 




In some cases it helps a bit, but a bit higher value (4 or 8) performs
significantly better. Consider for example this "sequential" data set
from the 6xSSD RAID system (x-axis shows e_i_c values, pct means what
fraction of pages matches the query):


I assume that the y axis is the time of the query?



The y-axis is the fraction of table matched by the query. The values in
the contingency table are query durations (average of 3 runs, but the
numbers were very close).


How much data is this compared to memory available for the kernel to do
caching?



Multiple of RAM, in all cases. The queries were hitting random subsets of
the data, and the page cache was dropped after each test, to eliminate
cross-query caching.




    pct       0      1      4     16     64    128
    -----------------------------------------------
      1   25990  18624   3269   2219   2189   2171
      5   88116  60242  14002   8663   8560   8726
     10  120556  99364  29856  17117  16590  17383
     25  101080 184327  79212  47884  46846  46855
     50  130709 309857 163614 103001  94267  94809
     75  126516 435653 248281 156586 139500 140087

compared to the e_i_c=0 case, it looks like this:

    pct      1     4    16    64   128
    -----------------------------------
      1    72%   13%    9%    8%    8%
      5    68%   16%   10%   10%   10%
     10    82%   25%   14%   14%   14%
     25   182%   78%   47%   46%   46%
     50   237%  125%   79%   72%   73%
     75   344%  196%  124%  110%  111%

So for 1% of the table the e_i_c=1 is faster by about ~30%, but with
e_i_c=4 (or more) it's ~10x faster. This is a fairly common pattern, not
just on this storage system.

The e_i_c=1 case can perform pretty poorly, especially when the query
matches a large fraction of the table - in this example it's 2-3x slower
compared to no prefetching, and higher e_i_c values limit the damage
quite a bit.


I'm surprised the slowdown for small e_i_c values is that big - it's not
obvious to me why that is.  Which os / os version / filesystem / io
scheduler / io scheduler settings were used?



This is the system with NVMe storage, and SATA RAID:

Linux bench2 4.19.26 #1 SMP Sat Mar 2 19:50:14 CET 2019 x86_64 Intel(R)
Xeon(R) CPU E5-2620 v4 @ 2.10GHz GenuineIntel GNU/Linux

/dev/nvme0n1p1 on /mnt/data type ext4 (rw,relatime)
/dev/md0 on /mnt/raid type ext4 (rw,relatime,stripe=48)

The other system looks pretty much the same (same kernel, ext4).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Attached partition not considering altered column properties of root partition.

2019-07-02 Thread Prabhat Sahu
Hi,

In the testcase below, when I changed the storage option for the root
partition, the newly attached partition does not include the changed
storage option.  Is this expected behavior?

postgres=# CREATE TABLE tab1 (c1 INT, c2 text) PARTITION BY RANGE(c1);
CREATE TABLE
postgres=# create table tt_p1 as select * from tab1  where 1=2;
SELECT 0
postgres=# alter  table tab1 alter COLUMN c2 set storage main;
ALTER TABLE
postgres=#
postgres=# alter table tab1 attach partition tt_p1 for values from (20) to
(30);
ALTER TABLE
postgres=# \d+ tab1
                                  Partitioned table "public.tab1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
 c2     | text    |           |          |         | main    |              |
Partition key: RANGE (c1)
Partitions: tt_p1 FOR VALUES FROM (20) TO (30)

postgres=# \d+ tt_p1
                                       Table "public.tt_p1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 c1     | integer |           |          |         | plain    |              |
 c2     | text    |           |          |         | extended |              |
Partition of: tab1 FOR VALUES FROM (20) TO (30)
Partition constraint: ((c1 IS NOT NULL) AND (c1 >= 20) AND (c1 < 30))
Access method: heap

-- 

With Regards,

Prabhat Kumar Sahu


Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Julien Rouhaud
On Mon, Jul 1, 2019 at 11:21 PM Peter Geoghegan  wrote:
>
> On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud  wrote:
> > I have a vague recollection that ICU was providing some backward
> > compatibility so that even if you upgrade your lib you can still get
> > the sort order that was active when you built your indexes, though
> > maybe for a limited number of versions.
>
> That isn't built in. Another database system that uses ICU handles
> this by linking to multiple versions of ICU, each with its own UCA
> version and associated collations. I don't think that we want to go
> there, so it makes sense to make an upgrade that crosses ICU or glibc
> versions as painless as possible.
>
> Note that ICU does at least provide a standard way to use multiple
> versions at once; the symbol names have the ICU version baked in.
> You're actually calling the functions using the versioned symbol names
> without realizing it, because there is macro trickery involved.

Ah, thanks for the clarification!




Re: Refactoring base64 encoding and decoding into a safer interface

2019-07-02 Thread Michael Paquier
On Tue, Jul 02, 2019 at 09:56:03AM +0200, Daniel Gustafsson wrote:
> I’m not sure we need to go to that length, but I don’t have overly strong
> opinions.  I think of this more like a case of “we’ve changed the API with new
> error cases that we didn’t handle before, so we’re being a little defensive to
> help you avoid subtle bugs”.

I quite like this suggestion.

> Looks good, passes tests, provides value to the code.  Bumping this to ready
> for committer as I have no more comments to add.

Thanks.  I'll look at that again in a couple of days, let's see if
others have any input to offer.
--
Michael




Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Tomas Vondra

On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote:

Hi,

With glibc 2.28 coming, all users will have to reindex almost every
index after a glibc upgrade to guarantee the lack of corruption.
Unfortunately, reindexdb is not ideal for that, as it processes
everything using a single connection and isn't able to discard indexes
that don't depend on a glibc collation.

PFA a patchset to add parallelism to reindexdb (reusing the
infrastructure in vacuumdb with some additions) and an option to
discard indexes that don't depend on glibc (without any specific
collation filtering or glibc version detection), with updated
regression tests.  Note that this should be applied on top of the
existing reindexdb cleanup & refactoring patch
(https://commitfest.postgresql.org/23/2115/).

This was sponsored by VMware, and has been discussed internally with
Kevin and Michael, in Cc.


I wonder why this is necessary:

pg_log_error("cannot reindex glibc dependent objects and a subset of objects");

What's the reasoning behind that? It seems like a valid use case to me -
imagine you have a big database, but only a couple of tables are used by
the application regularly (the rest may be archive tables, for example).
Why not allow rebuilding glibc-dependent indexes on the used tables, so
that the database can be opened for users sooner.

BTW now that we allow rebuilding only some of the indexes, it'd be great
to have a dry-run mode, where we just print which indexes will be rebuilt
without actually rebuilding them.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Julien Rouhaud
On Tue, Jul 2, 2019 at 9:19 AM Peter Eisentraut
 wrote:
>
> On 2019-07-01 22:46, Alvaro Herrera wrote:
> > On 2019-Jul-02, Thomas Munro wrote:
> >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud  wrote:
> >>> Even if that's just me being delusional, I'd still prefer Alvaro's
> >>> approach to have distinct switches for each collation system.
> >>
> >> Makes sense.  But why use the name "glibc" in the code and user
> >> interface?  The name of the collation provider in PostgreSQL is "libc"
> >> (for example in the CREATE COLLATION command), and the problem applies
> >> no matter who makes your libc.
> >
> > Makes sense.  "If your libc is glibc and you go across an upgrade over
> > version X, please use --include-rule=libc-collation"
>
> I think it might be better to put the logic of what indexes are
> collation affected etc. into the backend REINDEX command.  We are likely
> to enhance the collation version and dependency tracking over time,
> possibly soon, possibly multiple times, and it would be very cumbersome
> to have to keep updating reindexdb with this.  Moreover, since for
> performance you likely want to reindex by table, implementing a logic of
> "reindex all collation-affected indexes on this table" would be much
> easier to do in the backend.

That's a great idea, and would make the parallelism in reindexdb much
simpler.  There's however a downside, as users won't have a way to
benefit from index filtering until they upgrade to this version.  OTOH
glibc 2.28 is already there, and a hypothetical fancy reindexdb is far
from being released.




Re: [PATCH] Implement uuid_version()

2019-07-02 Thread Jose Luis Tallon

On 2/7/19 9:26, Peter Eisentraut wrote:

On 2019-06-30 14:50, Fabien COELHO wrote:

I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if
it is available in core?

That would probably require an extension version update dance in
pgcrypto.  I'm not sure if it's worth that.  Thoughts?


What I have devised for my upcoming patch series is to use a 
compatibility "shim" that calls the corresponding core code when the 
expected usage does not match the new names/signatures...


This way we wouldn't even need to version bump pgcrypto (full backwards 
compatibility -> no bump needed). Another matter is whether this should 
raise some "deprecation warning" or the like; I don't think we have any 
such mechanisms available yet.
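A minimal sketch of such a shim, assuming core grows a gen_random_uuid()
function that is visible to extension code (names illustrative, usual
backend includes assumed): pgcrypto's C function would simply delegate to
it, so the extension's SQL-level definition stays unchanged and no version
bump is needed.

PG_FUNCTION_INFO_V1(pg_random_uuid);

Datum
pg_random_uuid(PG_FUNCTION_ARGS)
{
	/* delegate to the in-core implementation */
	return gen_random_uuid(fcinfo);
}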



FWIW, I'm implementing an "alias" functionality for extensions, too, in 
order to achieve transparent (for the user) extension renames.


HTH


Thanks,

    / J.L.






Re: mcvstats serialization code is still shy of a load

2019-07-02 Thread Tomas Vondra

On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a slightly improved version of the serialization patch.


I reviewed this patch, and tested it on hppa and ppc.  I found one
serious bug: in the deserialization varlena case, you need

-   dataptr += MAXALIGN(len);
+   dataptr += MAXALIGN(len + VARHDRSZ);

(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
plus the Assert a few lines further down about dataptr having been
advanced by the correct amount fires.  (On one machine I tested on,
that happened during the core regression tests.  The other machine
got through regression, but trying to do "select * from pg_stats_ext;"
afterwards exhibited the crash.  I didn't investigate closely, but
I suspect the difference has to do with different MAXALIGN values,
4 and 8 respectively.)

The attached patch (a delta atop your v2) corrects that plus some
cosmetic issues.



Thanks.


If we're going to push this, it would be considerably less complicated
to do so before v12 gets branched --- not long after that, there will be
catversion differences to cope with.  I'm planning to make the branch
tomorrow (Monday), probably ~1500 UTC.  Just sayin'.



Unfortunately, I was travelling on Sunday and was quite busy on Monday, so
I've been unable to push this before the branching :-(

I'll push by the end of this week, once I get home.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Julien Rouhaud
On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra
 wrote:
>
> I wonder why this is necessary:
>
> pg_log_error("cannot reindex glibc dependent objects and a subset of 
> objects");
>
> What's the reasoning behind that? It seems like a valid use case to me -
> imagine you have a big database, but only a couple of tables are used by
> the application regularly (the rest may be archive tables, for example).
> Why not allow rebuilding glibc-dependent indexes on the used tables, so
> that the database can be opened for users sooner.

It just seemed wrong to me to allow partial processing for something
that's aimed at preventing corruption.  I'd think that if users are
knowledgeable enough to only reindex a subset of indexes/tables in
such cases, they can also discard indexes that don't get affected by a
collation lib upgrade.  I'm not strongly opposed to supporting it
though, as there indeed can be valid use cases.

> BTW now that we allow rebuilding only some of the indexes, it'd be great
> to have a dry-run mode, where we just print which indexes will be rebuilt
> without actually rebuilding them.

+1.  If we end up doing the filtering in the backend, we'd have to add
such an option to the REINDEX command, and actually issue all the
commands to retrieve the list.




Re: Index Skip Scan

2019-07-02 Thread Thomas Munro
On Fri, Jun 21, 2019 at 1:20 AM Jesper Pedersen
 wrote:
> Attached is v20, since the last patch should have been v19.

I took this for a quick spin today.  The DISTINCT ON support is nice
and I think it will be very useful.  I've signed up to review it and
will have more to say later.  But today I had a couple of thoughts
after looking into how src/backend/optimizer/plan/planagg.c works and
wondering how to do some more skipping tricks with the existing
machinery.

1.  SELECT COUNT(DISTINCT i) FROM t could benefit from this.  (Or
AVG(DISTINCT ...) or any other aggregate).  Right now you get a seq
scan, with the sort/unique logic inside the Aggregate node.  If you
write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get
a skip scan that is much faster in good cases.  I suppose you could
have a process_distinct_aggregates() in planagg.c that recognises
queries of the right form and generates extra paths a bit like
build_minmax_path() does.  I think it's probably better to consider
that in the grouping planner proper instead.  I'm not sure.

2.  SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if
you're allowed to go forwards.  Same for SELECT i, MAX(j) FROM t GROUP
BY i if you're allowed to go backwards.  Those queries are equivalent
to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC]
(though as Floris noted, the backwards version gives the wrong answers
with v20).  That does seem like a much more specific thing applicable
only to MIN and MAX, and I think preprocess_minmax_aggregates() could
be taught to handle that sort of query, building an index only scan
path with skip scan in build_minmax_path().

-- 
Thomas Munro
https://enterprisedb.com




d25ea01275 and partitionwise join

2019-07-02 Thread Amit Langote
Hi Tom,

I think an assumption of d25ea01275 breaks partitionwise join.  Sorry
it took me a while to report it.

In https://postgr.es/m/8168.1560446...@sss.pgh.pa.us, Tom wrote:
> I poked into this and found the cause.  For the sample query, we have
> an EquivalenceClass containing the expression
> COALESCE(COALESCE(Var_1_1, Var_2_1), Var_3_1)
> where each of the Vars belongs to an appendrel parent.
> add_child_rel_equivalences() needs to add expressions representing the
> transform of that to each child relation.  That is, if the children
> of table 1 are A1 and A2, of table 2 are B1 and B2, and of table 3
> are C1 and C2, what we'd like to add are the expressions
> COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_3_1)
> COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_3_1)
> COALESCE(COALESCE(Var_1_1, Var_B1_1), Var_3_1)
> COALESCE(COALESCE(Var_1_1, Var_B2_1), Var_3_1)
> COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C1_1)
> COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C2_1)
> However, what it's actually producing is additional combinations for
> each appendrel after the first, because each call also mutates the
> previously-added child expressions.  So in this example we also get
> COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_3_1)
> COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_3_1)
> COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_C1_1)
> COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_C2_1)
> COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_C1_1)
> COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_C2_1)
> With two appendrels involved, that's O(N^2) expressions; with
> three appendrels, more like O(N^3).
>
> This is by no means specific to FULL JOINs; you could get the same
> behavior with join clauses like "WHERE t1.a + t2.b + t3.c = t4.d".
>
> These extra expressions don't have any use, since we're not
> going to join the children directly to each other.

...unless partitionwise join thinks they can be joined.  Partitionwise
join can't handle 3-way full joins today, but only because it is itself
broken when trying to match a full join clause to the partition key due
to one side being a COALESCE expression.  Consider this example query:

-- p is defined as:
-- create table p (a int) partition by list (a);
-- create table p1 partition of p for values in (1);
-- create table p2 partition of p for values in (2);
explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
                                        QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=16416733.32..16628145.85 rows=84565012 width=4)
   Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
   ->  Merge Full Join  (cost=536957.40..1813748.77 rows=84565012 width=4)
         Merge Cond: (t4.a = (COALESCE(COALESCE(t1.a, t2.a), t3.a)))
         ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
               Sort Key: t4.a
               ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                     ->  Seq Scan on p1 t4  (cost=0.00..35.50 rows=2550 width=4)
                     ->  Seq Scan on p2 t4_1  (cost=0.00..35.50 rows=2550 width=4)
         ->  Materialize  (cost=536546.83..553128.21 rows=3316275 width=12)
               ->  Sort  (cost=536546.83..544837.52 rows=3316275 width=12)
                     Sort Key: (COALESCE(COALESCE(t1.a, t2.a), t3.a))
                     ->  Merge Full Join  (cost=14254.85..64024.48 rows=3316275 width=12)
                           Merge Cond: (t3.a = (COALESCE(t1.a, t2.a)))
                           ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
                                 Sort Key: t3.a
                                 ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                                       ->  Seq Scan on p1 t3  (cost=0.00..35.50 rows=2550 width=4)
                                       ->  Seq Scan on p2 t3_1  (cost=0.00..35.50 rows=2550 width=4)
                           ->  Sort  (cost=13844.29..14169.41 rows=130050 width=8)
                                 Sort Key: (COALESCE(t1.a, t2.a))
                                 ->  Merge Full Join  (cost=821.13..2797.38 rows=130050 width=8)
                                       Merge Cond: (t1.a = t2.a)
                                       ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
                                             Sort Key: t1.a
                                             ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                                                   ->  Seq Scan on p1 t1  (cost=0.00..35.50 rows=2550 width=4)
                                                   ->  Seq Scan on p2 t1_1  (cost=0.00..35.50 rows=2550 width=4)
                                       ->  Sort  (cost=410.57..423.32 rows=5100 width=4)

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-02 Thread Tomas Vondra

On Tue, Jul 02, 2019 at 10:45:44AM +0200, Julien Rouhaud wrote:

On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra
 wrote:


I wonder why this is necessary:

pg_log_error("cannot reindex glibc dependent objects and a subset of objects");

What's the reasoning behind that? It seems like a valid use case to me -
imagine you have a big database, but only a couple of tables are used by
the application regularly (the rest may be archive tables, for example).
Why not allow rebuilding glibc-dependent indexes on the used tables, so
that the database can be opened for users sooner.


It just seemed wrong to me to allow partial processing for something
that's aimed at preventing corruption.  I'd think that if users are
knowledgeable enough to only reindex a subset of indexes/tables in
such cases, they can also discard indexes that don't get affected by a
collation lib upgrade.  I'm not strongly opposed to supporting it
though, as there indeed can be valid use cases.



I don't know, it just seems like an unnecessary limitation.


BTW now that we allow rebuilding only some of the indexes, it'd be great
to have a dry-run mode, where we just print which indexes will be rebuilt
without actually rebuilding them.


+1.  If we end up doing the filtering in the backend, we'd have to add
such an option to the REINDEX command, and actually issue all the
commands to retrieve the list.


Hmmm, yeah. FWIW I'm not requesting v0 to have that feature, but it'd be
good to design the feature in a way that allows adding it later.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: pg_waldump and PREPARE

2019-07-02 Thread Julien Rouhaud
On Fri, Apr 26, 2019 at 5:38 AM Michael Paquier  wrote:
>
> On Thu, Apr 25, 2019 at 03:08:36PM -0400, Alvaro Herrera wrote:
> > On 2019-Apr-26, Fujii Masao wrote:
> >> I did that arrangement because the format of PREPARE TRANSACTION record,
> >> i.e., that struct, also needs to be accessed in backend and frontend.
> >> But, of course, if there is smarter way, I'm happy to adopt that!
> >
> > I don't know.  I spent some time staring at the code involved, and it
> > seems it'd be possible to improve just a little bit on cleanliness
> > grounds, with a lot of effort, but not enough practical value.
>
> Describing those records is something we should do.  There are other
> parsing routines in xactdesc.c for commit and abort records, so having
> that extra routine for prepare at the same place does not sound
> strange to me.
>
> +typedef xl_xact_prepare TwoPhaseFileHeader;
> I find this mapping implementation a bit lazy, and your
> newly-introduced xl_xact_prepare does not count for all the contents
> of the actual WAL record for PREPARE TRANSACTION.  Wouldn't it be
> better to put all the contents of the record in the same structure,
> and not only the 2PC header information?

This patch doesn't apply anymore, could you send a rebase?




Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-02 Thread Surafel Temesgen
Hi,
Here are some review comments
-  any numeric, string, date/time, network, or enum type,
+  any numeric, string, date/time, network, lsn, or enum type,
  or arrays of these types
   same as argument type
In the documentation it is referred to as the pg_lsn type rather than lsn alone
+Datum
+pg_lsn_larger(PG_FUNCTION_ARGS)
+{
+ XLogRecPtr lsn1 = PG_GETARG_LSN(0);
+ XLogRecPtr lsn2 = PG_GETARG_LSN(1);
+ XLogRecPtr result;
+
+ result = ((lsn1 > lsn2) ? lsn1 : lsn2);
+
+ PG_RETURN_LSN(result);
+}

Rather than using an additional variable, it's more readable and efficient
to return the argument itself, like we do for the date data type and in
other places.
regards
Surafel


Re: Choosing values for multivariate MCV lists

2019-07-02 Thread Tomas Vondra

On Mon, Jul 01, 2019 at 12:02:28PM +0100, Dean Rasheed wrote:

On Sat, 29 Jun 2019 at 14:01, Tomas Vondra  wrote:


>However, it looks like the problem is with mcv_list_items()'s use
>of %f to convert to text, which is pretty ugly.

>>>There's one issue with the signature, though - currently the function
>>>returns null flags as bool array, but values are returned as simple
>>>text value (formatted in array-like way, but still just a text).
>>>
>>IMO fixing this to return a text array is worth doing, even though it
>>means a catversion bump.
>>
Attached is a cleaned-up version of that patch. The main difference is
that instead of using construct_md_array() this uses ArrayBuildState to
construct the arrays, which is much easier. The docs don't need any
update because they were already using text[] for the parameter; the
code was inconsistent with it.
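For reference, the ArrayBuildState pattern in condensed form (illustrative
only, not the patch itself; assumes the usual includes such as
utils/array.h, utils/builtins.h and catalog/pg_type.h):

static Datum
build_text_array(char **values, int nvalues)
{
	ArrayBuildState *astate = NULL;
	int			i;

	for (i = 0; i < nvalues; i++)
		astate = accumArrayResult(astate,
								  CStringGetTextDatum(values[i]),
								  false,	/* not null */
								  TEXTOID,
								  CurrentMemoryContext);

	/* no elements accumulated: return an empty text[] */
	if (astate == NULL)
		return PointerGetDatum(construct_empty_array(TEXTOID));

	return makeArrayResult(astate, CurrentMemoryContext);
}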



Cool, this looks a lot neater and fixes the issues discussed with both
floating point values no longer being converted to text, and proper
text arrays for values.

One minor nitpick -- it doesn't seem to be necessary to build the
arrays *outfuncs and *fmgrinfo. You may as well just fetch the
individual column output function information at the point where it's
used (in the "if (!item->isnull[i])" block) rather than building those
arrays.



OK, thanks for the feedback. I'll clean-up the function lookup.




This does require a catversion bump, but as annoying as it is, I think
it's worth it (and there's also the thread discussing the serialization
issues). Barring objections, I'll get it committed later next week, once
I get back from PostgresLondon.

As I mentioned before, if we don't want any additional catversion bumps,
it's possible to pass the arrays through output functions - that would
allow us to keep the text output (but correct, unlike what we have now).



I think this is a clear bug fix, so I'd vote for fixing it properly
now, with a catversion bump.



I agree.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Index Skip Scan

2019-07-02 Thread David Rowley
On Tue, 2 Jul 2019 at 21:00, Thomas Munro  wrote:
> I took this for a quick spin today.  The DISTINCT ON support is nice
> and I think it will be very useful.  I've signed up to review it and
> will have more to say later.  But today I had a couple of thoughts
> after looking into how src/backend/optimizer/plan/planagg.c works and
> wondering how to do some more skipping tricks with the existing
> machinery.
>
> 1.  SELECT COUNT(DISTINCT i) FROM t could benefit from this.  (Or
> AVG(DISTINCT ...) or any other aggregate).  Right now you get a seq
> scan, with the sort/unique logic inside the Aggregate node.  If you
> write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get
> a skip scan that is much faster in good cases.  I suppose you could
> have a process_distinct_aggregates() in planagg.c that recognises
> queries of the right form and generates extra paths a bit like
> build_minmax_path() does.  I think it's probably better to consider
> that in the grouping planner proper instead.  I'm not sure.

I think to make that happen we'd need to do a bit of an overhaul in
nodeAgg.c to allow it to make use of presorted results instead of
having the code blindly sort rows for each aggregate that has a
DISTINCT or ORDER BY.  The planner would also then need to start
requesting paths with pathkeys that suit the aggregate and also
probably dictate the order the AggRefs should be evaluated to allow
all AggRefs to be evaluated that can be for each sort order.  Once
that part is done then the aggregates could then also request paths
with certain "UniqueKeys" (a feature I mentioned in [1]), however we'd
need to be pretty careful with that one since there may be other parts
of the query that require that all rows are fed in, not just 1 row per
value of "i", e.g SELECT COUNT(DISTINCT i) FROM t WHERE z > 0; can't
just feed through 1 row for each "i" value, since we need only the
ones that have "z > 0".  Getting the first part of this solved is much
more important than making skip scans work here, I'd say. I think we
need to be able to walk before we can run with DISTINCT  / ORDER BY
aggs.

> 2.  SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if
> you're allowed to go forwards.  Same for SELECT i, MAX(j) FROM t GROUP
> BY i if you're allowed to go backwards.  Those queries are equivalent
> to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC]
> (though as Floris noted, the backwards version gives the wrong answers
> with v20).  That does seem like a much more specific thing applicable
> only to MIN and MAX, and I think preprocess_minmax_aggregates() could
> be taught to handle that sort of query, building an index only scan
> path with skip scan in build_minmax_path().

For the MIN query you just need a path with Pathkeys: { i ASC, j ASC },
UniqueKeys: { i, j }; for the MAX query you just need j DESC.

The more I think about these UniqueKeys, the more I think they need to
be a separate concept to PathKeys. For example, UniqueKeys: { x, y }
should be equivalent to { y, x }, but with PathKeys, that's not the
case, since the order of each key matters. UniqueKeys equivalent
version of pathkeys_contained_in() would not care about the order of
individual keys, it would say things like, { a, b, c } is contained in
{ b, a }, since if the path is unique on columns { b, a } then it must
also be unique on { a, b, c }.

[1] 
https://www.postgresql.org/message-id/cakjs1f86fgoduunhiq25rkeues4qtqenxm1qbqjwrbozxvg...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Custom compression methods

2019-07-02 Thread Ildus K
On Mon, 1 Jul 2019 at 17:28, Alexander Korotkov 
wrote:

> On Mon, Jul 1, 2019 at 5:51 PM Alvaro Herrera 
> wrote:
> > On 2019-Jul-01, Alexander Korotkov wrote:
> >
> > > As I see it, we currently need to make a high-level decision on whether
> > > we need this [1].  I was going to bring this topic up at last PGCon,
> > > but I didn't manage to attend.  Is it worth bothering Ildus with
> > > continuous rebasing assuming we don't have this high-level decision
> > > yet?
> >
> > I agree that having to constantly rebase a patch that doesn't get acted
> > upon is a bit pointless.  I see a bit of a process problem here: if the
> > patch doesn't apply, it gets punted out of commitfest and reviewers
> > don't look at it.  This means the discussion goes unseen and no
> > decisions are made.  My immediate suggestion is to rebase even if other
> > changes are needed.
>
> OK, let's do this assuming Ildus didn't give up yet :)
>

No, I still didn't give up :)
I'm going to post a rebased version in a few days. I found that there are
new conflicts with the slice decompression; I'm not sure yet how to
resolve them.

Also, I was thinking there may be a point in adding the possibility to
compress any data that goes to some column regardless of the TOAST
threshold size. In our company we have types that could benefit from
compression even on the smallest blocks.

Since pluggable storage was committed, I think I should note that
compression methods can also be used by it and are not supposed to work
only with TOAST tables. Basically it's just an interface to call
compression functions which are related to some column.

Best regards,
Ildus Kurbangaliev


Re: progress report for ANALYZE

2019-07-02 Thread Julien Rouhaud
On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera  wrote:
>
> Here's a patch that implements progress reporting for ANALYZE.

Patch applies, code and doc compile cleanly.  I have a few comments:

@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
if (numrows > 0)
{
MemoryContext col_context,
-   old_context;
+ old_context;
+   const int   index[] = {
+   PROGRESS_ANALYZE_PHASE,
+   PROGRESS_ANALYZE_TOTAL_BLOCKS,
+   PROGRESS_ANALYZE_BLOCKS_DONE
+   };
+   const int64 val[] = {
+   PROGRESS_ANALYZE_PHASE_ANALYSIS,
+   0, 0
+   };
+
+   pgstat_progress_update_multi_param(3, index, val);
[...]
}
+   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the information about
the blocks that were scanned.  I'm not sure if we should stay
consistent here.

diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
/* Translate command name into command type code. */
if (pg_strcasecmp(cmd, "VACUUM") == 0)
cmdtype = PROGRESS_COMMAND_VACUUM;
+   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+   cmdtype = PROGRESS_COMMAND_ANALYZE;
else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;
else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)

it should be an "else if" here.
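i.e., the corrected chain would read (sketch):

    if (pg_strcasecmp(cmd, "VACUUM") == 0)
        cmdtype = PROGRESS_COMMAND_VACUUM;
    else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
        cmdtype = PROGRESS_COMMAND_ANALYZE;
    else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
        cmdtype = PROGRESS_COMMAND_CLUSTER;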

Everything else LGTM.




Re: New EXPLAIN option: ALL

2019-07-02 Thread Peter Eisentraut
On 2019-05-18 19:39, David Fetter wrote:
> On Wed, May 15, 2019 at 09:32:31AM -0400, Tom Lane wrote:
>> David Fetter  writes:
>>> I hope the patch is a little easier to digest as now attached.
>>
>> To be blunt, I find 500K worth of changes in the regression test
>> outputs to be absolutely unacceptable, especially when said changes
>> are basically worthless from a diagnostic standpoint.  There are at
>> least two reasons why this won't fly:
> 
> Here's a patch set with a much smaller change.  Will that fly?

This appears to be the patch of record for this commit fest.

I don't sense much enthusiasm for this change.  What is the exact
rationale for this proposal?

I think using a new keyword EXEC that is similar to an existing one
EXECUTE will likely just introduce a new class of confusion.  (ANALYZE
EXEC EXECUTE ...?)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: C testing for Postgres

2019-07-02 Thread Adam Berlin
> Just to clarify what Adam is proposing in this thread is *not* a fault
> injection framework.
>

Yes, thanks for clarifying, Ashwin.

Sorry Michael, this testing framework is more like these other frameworks:

Java with Junit + Hamcrest: http://hamcrest.org/JavaHamcrest/tutorial
Ruby with Rspec:
https://rspec.info/documentation/3.8/rspec-expectations/#Built-in_matchers
Javascript with Jasmine: https://jasmine.github.io/


>


Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-02 Thread Fabrízio de Royes Mello
On Tue, Jul 2, 2019 at 7:22 AM Surafel Temesgen 
wrote:
>
> Hi,
> Here are some review comments

Thanks for your review.

> -  any numeric, string, date/time, network, or enum type,
> +  any numeric, string, date/time, network, lsn, or enum type,
>   or arrays of these types
>same as argument type
> In the documentation it refereed as pg_lsn type rather than lsn alone

Fixed.

> +Datum
> +pg_lsn_larger(PG_FUNCTION_ARGS)
> +{
> + XLogRecPtr lsn1 = PG_GETARG_LSN(0);
> + XLogRecPtr lsn2 = PG_GETARG_LSN(1);
> + XLogRecPtr result;
> +
> + result = ((lsn1 > lsn2) ? lsn1 : lsn2);
> +
> + PG_RETURN_LSN(result);
> +}
>
> rather than using additional variable its more readable and effective to
return the argument
> itself like we do in date data type and other place
>

Fixed.

New version attached.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a8581d..b7f746b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14804,7 +14804,7 @@ NULL baz(3 rows)

max(expression)
   
-  any numeric, string, date/time, network, or enum type,
+  any numeric, string, date/time, network, pg_lsn, or enum type,
  or arrays of these types
   same as argument type
   Yes
@@ -14822,7 +14822,7 @@ NULL baz(3 rows)

min(expression)
   
-  any numeric, string, date/time, network, or enum type,
+  any numeric, string, date/time, network, pg_lsn, or enum type,
  or arrays of these types
   same as argument type
   Yes
diff --git a/src/backend/utils/adt/pg_lsn.c b/src/backend/utils/adt/pg_lsn.c
index 4c329a8..b4c6c23 100644
--- a/src/backend/utils/adt/pg_lsn.c
+++ b/src/backend/utils/adt/pg_lsn.c
@@ -171,6 +171,24 @@ pg_lsn_ge(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(lsn1 >= lsn2);
 }
 
+Datum
+pg_lsn_larger(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	lsn1 = PG_GETARG_LSN(0);
+	XLogRecPtr	lsn2 = PG_GETARG_LSN(1);
+
+	PG_RETURN_LSN((lsn1 > lsn2) ? lsn1 : lsn2);
+}
+
+Datum
+pg_lsn_smaller(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	lsn1 = PG_GETARG_LSN(0);
+	XLogRecPtr	lsn2 = PG_GETARG_LSN(1);
+
+	PG_RETURN_LSN((lsn1 < lsn2) ? lsn1 : lsn2);
+}
+
 /* btree index opclass support */
 Datum
 pg_lsn_cmp(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 044695a..242d843 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -146,6 +146,9 @@
 { aggfnoid => 'max(inet)', aggtransfn => 'network_larger',
   aggcombinefn => 'network_larger', aggsortop => '>(inet,inet)',
   aggtranstype => 'inet' },
+{ aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger',
+  aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)',
+  aggtranstype => 'pg_lsn' },
 
 # min
 { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller',
@@ -208,6 +211,9 @@
 { aggfnoid => 'min(inet)', aggtransfn => 'network_smaller',
   aggcombinefn => 'network_smaller', aggsortop => '<(inet,inet)',
   aggtranstype => 'inet' },
+{ aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller',
+  aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)',
+  aggtranstype => 'pg_lsn' },
 
 # count
 { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8733524..aa8674c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6219,6 +6219,9 @@
 { oid => '3564', descr => 'maximum value of all inet input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
+{ oid => '8125', descr => 'maximum value of all pg_lsn input values',
+  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 
 { oid => '2131', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
@@ -6283,6 +6286,9 @@
 { oid => '3565', descr => 'minimum value of all inet input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
+{ oid => '8126', descr => 'minimum value of all pg_lsn input values',
+  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 
 # count has two forms: count(any) and count(*)
 { oid => '2147',
@@ -8385,6 +8391,12 @@
 { oid => '3413', descr => 'hash',
   proname => 'pg_lsn_hash_extended', prorettype => 'int8',
   proargtypes => 'pg_lsn int8', prosrc => 'pg_lsn_hash_extended' },
+{ oid => '8123', descr => 'larger of two',
+  proname => 'pg_lsn_larger', prorettype => 'pg_lsn',
+  proargtyp

TopoSort() fix

2019-07-02 Thread Rui Hai Jiang
Hi Hackers,

I think I found an issue in the TopoSort() function.

As the comments say,

/* .
 * .. If there are any other processes
 * in the same lock group on the queue, set their number of
 * beforeConstraints to -1 to indicate that they should be
emitted
 * with their groupmates rather than considered separately.
 */

If the line "break;" exists, there is no chance to set beforeConstraints to
-1 for other processes in the same lock group.

So, I think we need delete the line "break;" . See the patch.
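To illustrate the pattern (a simplified sketch, not the actual TopoSort()
code; proc, leader and k are illustrative names):

for (k = 0; k < queue_size; k++)
{
	if (topoProcs[k] != NULL &&
		topoProcs[k] != proc &&
		topoProcs[k]->lockGroupLeader == leader)
	{
		beforeConstraints[k] = -1;	/* emit with their groupmates */
		/* a "break" here would leave the remaining groupmates unmarked */
	}
}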

I just took a look, and I found that all the following versions have this line.


postgresql-12beta2, postgresql-12beta1, postgresql-11.4,
postgresql-11.3,postgresql-11.0,
postgresql-10.9,postgresql-10.5, postgresql-10.0


Thanks,
Ruihai


TopoSort.patch
Description: Binary data


Re: Optimize partial TOAST decompression

2019-07-02 Thread Paul Ramsey
On Mon, Jul 1, 2019 at 6:46 AM Binguo Bao  wrote:
> > On Sat, Jun 29, 2019 at 9:48 PM, Andrey Borodin wrote:
>> I've taken a look at the code.
>> I think we should extract a function for computing max_compressed_size
>> and put it somewhere along with the pglz code, just in case something
>> about pglz changes later, so that the compression algorithm assumption
>> isn't forgotten.
>>
>> Also, I suggest just using 64-bit computation to avoid overflows.  And
>> I think it's worth checking whether max_compressed_size covers the
>> whole datum, and using the minimum of (max_compressed_size,
>> uncompressed_data_size).
>>
>> Also, you declared needsize and max_compressed_size too far from where
>> they are used.  But this will be solved by the function extraction
>> anyway.
>>
> Thanks for the suggestion.
> I've extracted a function for computing max_compressed_size and put it
> into pg_lzcompress.c.

This looks good to me. A little commentary around why
pglz_maximum_compressed_size() returns a universally correct answer
(there's no way the compressed size can ever be larger than this
because...) would be nice for peasants like myself.
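
For what it's worth, my understanding of the bound is roughly this (a
sketch with made-up names, assuming pglz's format of one control bit per
item and one output byte per literal byte, so N raw bytes can never need
more than N + ceil(N/8) compressed bytes):

/*
 * Sketch only, not the patch: upper bound on the number of compressed
 * bytes needed to reproduce the first 'rawsize' bytes of the datum.
 * The worst case is all-literal items: rawsize data bytes plus one
 * control bit per item, i.e. ceil(rawsize / 8) control bytes.  Matches
 * can only make the compressed form shorter (or equal), never longer.
 */
static int
max_compressed_size_for_prefix(int rawsize, int total_compressed_size)
{
    int     bound = rawsize + (rawsize + 7) / 8;

    /* can never need more than the whole compressed datum */
    return bound < total_compressed_size ? bound : total_compressed_size;
}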

If you're looking to continue down this code line in your next patch,
the next TODO item is a little more involved: a user-land (ala
PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow
the optimization of searching of large objects like JSONB types, and
so on, where the thing you are looking for is not at a known location
in the object. So, things like looking for a particular substring in a
string, or looking for a particular key in a JSONB. "Iterate until you
find the thing." would allow optimization of some code lines that
currently require full decompression of the objects.

P.




Re: progress report for ANALYZE

2019-07-02 Thread Anthony Nowocien
Hi,
In monitoring.sgml, "a" is missing in "row for ech backend that is
currently running that command[...]".
Anthony


On Tuesday, July 2, 2019, Julien Rouhaud  wrote:
> On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera 
wrote:
>>
>> Here's a patch that implements progress reporting for ANALYZE.
>
> Patch applies, code and doc and compiles cleanly.  I have few comments:
>
> @@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
> if (numrows > 0)
> {
> MemoryContext col_context,
> -   old_context;
> + old_context;
> +   const int   index[] = {
> +   PROGRESS_ANALYZE_PHASE,
> +   PROGRESS_ANALYZE_TOTAL_BLOCKS,
> +   PROGRESS_ANALYZE_BLOCKS_DONE
> +   };
> +   const int64 val[] = {
> +   PROGRESS_ANALYZE_PHASE_ANALYSIS,
> +   0, 0
> +   };
> +
> +   pgstat_progress_update_multi_param(3, index, val);
> [...]
> }
> +   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> +PROGRESS_ANALYZE_PHASE_COMPLETE);
> +
> If there weren't any rows but multiple blocks were scanned,
> PROGRESS_ANALYZE_PHASE_COMPLETE will still show the information about
> the blocks that were scanned.  I'm not sure if we should stay
> consistent here.
>
> diff --git a/src/backend/utils/adt/pgstatfuncs.c
> b/src/backend/utils/adt/pgstatfuncs.c
> index 05240bfd14..98b01e54fa 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
> /* Translate command name into command type code. */
> if (pg_strcasecmp(cmd, "VACUUM") == 0)
> cmdtype = PROGRESS_COMMAND_VACUUM;
> +   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
> +   cmdtype = PROGRESS_COMMAND_ANALYZE;
> else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
> cmdtype = PROGRESS_COMMAND_CLUSTER;
> else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
>
> it should be an "else if" here.
>
> Everything else LGTM.
>
>
>


Re: Hash join explain is broken

2019-07-02 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Tom, any comments? Otherwise I'll go ahead, and commit after a round or
>> two of polishing.

> Sorry for not getting to this sooner --- I'll try to look tomorrow.

I took a look, and I think this is going in the right direction.
We definitely need a test case corresponding to the live bug,
and I think the comments could use more work, and there are some
cosmetic things (I wouldn't add the new struct Hash field at the
end, for instance).

regards, tom lane




Re: [PATCH] Implement uuid_version()

2019-07-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-06-30 14:50, Fabien COELHO wrote:
>> I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if 
>> it is available in core?

> That would probably require an extension version update dance in
> pgcrypto.  I'm not sure if it's worth that.  Thoughts?

We have some previous experience with this type of thing when we migrated
contrib/tsearch2 stuff into core.  I'm too caffeine-deprived to remember
exactly what we did or how well it worked.  But it seems advisable to go
study that history, because we could easily make things a mess for users
if we fail to consider their upgrade experience.

regards, tom lane




Re: POC: converting Lists into arrays

2019-07-02 Thread Tom Lane
Oleksandr Shulgin  writes:
> Not related to the diff v6..v7, but shouldn't we additionally throw in a
> memset() with '\0' before calling pfree():

I don't see the point of that.  In debug builds CLOBBER_FREED_MEMORY will
take care of it, and in non-debug builds I don't see why we'd expend
the cycles.

regards, tom lane




Re: TopoSort() fix

2019-07-02 Thread Tom Lane
Rui Hai Jiang  writes:
> I think I found an issue in the TopoSort() function.

This indeed seems like a live bug introduced by a1c1af2a.
Robert?

regards, tom lane




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-07-02 Thread Tom Lane
Robert Haas  writes:
> Long story short, I agree with you that most people probably don't
> care about this very much, but I also agree with Andrew that some of
> the current choices we're making are pretty strange, and I'm not
> convinced as you are that it's impossible to make a principled choice
> between alternatives in all cases. The upstream data appears to
> contain some information about intent; it's not just a jumble of
> exactly-equally-preferred alternatives.

I agree that if there were an easy way to discount the IANA "backward
compatibility" zone names, that'd likely be a reasonable thing to do.
The problem is that those names aren't distinguished from others in
the representation we have available to us (ie, the actual
/usr/share/zoneinfo file tree).  I'm dubious that relying on
zone[1970].tab would improve matters substantially; it would fix
some cases, but I don't think it would fix all of them.  Resolving
all ambiguous zone-name choices is not the charter of those files.

regards, tom lane




Re: Conflict handling for COPY FROM

2019-07-02 Thread Alexey Kondratov

On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund  wrote:

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

I think generating a warning about it also sufficiently meets the purpose of
notifying the user about skipped records with the existing logging facility,
and we use it for a similar purpose in other places too. The difference
I see is the number of warnings that can be generated

Warnings seem useless for this purpose.  I'm with Andres: returning rows
would make this a fine feature.  If the user wants the rows in a table
as Andrew suggests, she can wrap the whole thing in an insert.


I agree with previous commentators that returning rows will make this 
feature more versatile. Still, having the possibility to simply skip 
conflicting/malformed rows is worth doing from my perspective. 
However, pushing every single skipped row to the client as a separate 
WARNING would be too much for a bulk import. So maybe just overall stats 
about the number of skipped rows would be enough?


Also, I would prefer having an option to ignore all errors, e.g. with 
ERROR_LIMIT set to -1, because it is rather difficult to estimate the 
number of future errors if you are playing with some badly structured 
data, while always setting it to 100500k looks ugly.


Anyway, below are some issues with existing code after a brief review of 
the patch:


1) Calculation of processed rows isn't correct (I've checked). You do it 
in two places, and


-            processed++;
+            if (!cstate->error_limit)
+                processed++;

is never incremented if ERROR_LIMIT is specified and no errors 
occurred/no constraints exist, so the result will always be 0. However, 
if a primary column with constraints exists, then processed is calculated 
correctly, since another code path is used:


+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was before the 
patch) for simplicity and in order to avoid such problems.



2) This ExecInsertIndexTuples call is now only executed if ERROR_LIMIT 
is specified and was exceeded, which doesn't seem correct, does it?


-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 && 
cstate->error_limit == 0)

                         recheckIndexes = ExecInsertIndexTuples(myslot,


3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data 
for column \"%s\" ",


+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2001    231    \N    \N"


Otherwise, the patch applies/compiles cleanly and the regression tests 
pass.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: New EXPLAIN option: ALL

2019-07-02 Thread David Fetter
On Tue, Jul 02, 2019 at 03:06:52PM +0100, Peter Eisentraut wrote:
> On 2019-05-18 19:39, David Fetter wrote:
> > On Wed, May 15, 2019 at 09:32:31AM -0400, Tom Lane wrote:
> >> David Fetter  writes:
> >>> I hope the patch is a little easier to digest as now attached.
> >>
> >> To be blunt, I find 500K worth of changes in the regression test
> >> outputs to be absolutely unacceptable, especially when said changes
> >> are basically worthless from a diagnostic standpoint.  There are at
> >> least two reasons why this won't fly:
> > 
> > Here's a patch set with a much smaller change.  Will that fly?
> 
> This appears to be the patch of record for this commit fest.
> 
> I don't sense much enthusiasm for this change.

Neither do I, so withdrawn.

I do hope we can go with EXPLAIN and PROFILE, as opposed to
EXPLAIN/EXPLAIN ANALYZE.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Add missing operator <->(box, point)

2019-07-02 Thread Nikita Glukhov

Attached 2nd version of the patches.

On 20.04.2019 16:41, Fabien COELHO wrote:


About the test, I'd suggest naming the result columns, e.g. "pt to box
dist" and "box to pt dist", otherwise it's unclear why everything is repeated.


Fixed.

On 02.07.2019 7:01, Tom Lane wrote:


[ warning, drive-by comment ahead ]

Fabien COELHO  writes:

I notice that other distance tests do not test for commutativity. Are they
also not implemented, or just not tested? If not implemented, I'd suggest
to add them in the same batch.

Yeah ... just looking at operators named <->, I see

regression=# select oid, oid::regoperator, oprcom, oprcode from pg_operator where 
oprname = '<->';
  oid  | oid  | oprcom |  oprcode
--+--++---
   517 | <->(point,point) |517 | point_distance
   613 | <->(point,line)  |  0 | dist_pl
   614 | <->(point,lseg)  |  0 | dist_ps
   615 | <->(point,box)   |  0 | dist_pb
   616 | <->(lseg,line)   |  0 | dist_sl
   617 | <->(lseg,box)|  0 | dist_sb
   618 | <->(point,path)  |  0 | dist_ppath
   706 | <->(box,box) |706 | box_distance
   707 | <->(path,path)   |707 | path_distance
   708 | <->(line,line)   |708 | line_distance
   709 | <->(lseg,lseg)   |709 | lseg_distance
   712 | <->(polygon,polygon) |712 | poly_distance
  1520 | <->(circle,circle)   |   1520 | circle_distance
  1522 | <->(point,circle)|   3291 | dist_pc
  3291 | <->(circle,point)|   1522 | dist_cpoint
  3276 | <->(point,polygon)   |   3289 | dist_ppoly
  3289 | <->(polygon,point)   |   3276 | dist_polyp
  1523 | <->(circle,polygon)  |  0 | dist_cpoly
  1524 | <->(line,box)|  0 | dist_lb
  5005 | <->(tsquery,tsquery) |  0 | pg_catalog.tsquery_phrase
(20 rows)

It's not clear to me why to be particularly more excited about
<->(box, point) than about the other missing cases here.

regards, tom lane


The original goal was to add support for ordering by distance to a point to
all geometric opclasses.  As you can see, GiST and SP-GiST box_ops have no
distance operator, while the more complex circle_ops and poly_ops have it:

SELECT
  amname,
  opcname,
  amopopr::regoperator AS dist_opr
FROM
  pg_opclass LEFT JOIN
  pg_amop ON amopfamily = opcfamily AND amoppurpose = 'o',
  pg_am,
  pg_type
WHERE
  opcmethod = pg_am.oid AND
  opcintype = pg_type.oid AND
  typcategory = 'G'
ORDER BY 1, 2;


 amname |  opcname  |  dist_opr
+---+
 brin   | box_inclusion_ops |
 gist   | box_ops   |
 gist   | circle_ops| <->(circle,point)
 gist   | point_ops | <->(point,point)
 gist   | poly_ops  | <->(polygon,point)
 spgist | box_ops   |
 spgist | kd_point_ops  | <->(point,point)
 spgist | poly_ops  | <->(polygon,point)
 spgist | quad_point_ops| <->(point,point)
(9 rows)

We could use commuted "const <-> var" operators for kNN searches, but the
current implementation requires the existence of "var <-> const" operators, and
order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op()
at /src/backend/optimizer/path/indxpath.c).



--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From cff9fbdcd9d3633a28665c0636e6d3a5a0e8512b Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 7 Mar 2019 20:22:49 +0300
Subject: [PATCH 1/3] Add operator <->(box, point)

---
 src/backend/utils/adt/geo_ops.c| 12 +
 src/include/catalog/pg_operator.dat|  5 +-
 src/include/catalog/pg_proc.dat|  3 ++
 src/test/regress/expected/geometry.out | 96 +-
 src/test/regress/sql/geometry.sql  |  2 +-
 5 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 373784f..d8ecfe3 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -2419,6 +2419,18 @@ dist_pb(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Distance from a box to a point
+ */
+Datum
+dist_bp(PG_FUNCTION_ARGS)
+{
+	BOX		   *box = PG_GETARG_BOX_P(0);
+	Point	   *pt = PG_GETARG_POINT_P(1);
+
+	PG_RETURN_FLOAT8(box_closept_point(NULL, box, pt));
+}
+
+/*
  * Distance from a lseg to a line
  */
 Datum
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index bacafa5..ae5ed1e 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -666,7 +666,10 @@
   oprresult => 'float8', oprcode => 'dist_ps' },
 { oid => '615', descr => 'distance between',
   oprname => '<->', oprleft => 'point', oprright => 'box',
-  oprresult => 'float8', oprcode => 'dist_pb' },
+  oprresult => 'float8', oprcom => '<->(box,point)', oprcode => 'dist_pb' },
+{ oid => '606', descr => 'distance between',
+  oprname => '<->', oprleft => 'box', oprright => 'po

Re: Add missing operator <->(box, point)

2019-07-02 Thread Alexander Korotkov
On Tue, Jul 2, 2019 at 9:19 PM Nikita Glukhov  wrote:
> We could use commuted "const <-> var" operators for kNN searches, but the
> current implementation requires the existence of "var <-> const" operators, 
> and
> order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op()
> at /src/backend/optimizer/path/indxpath.c).

But it's probably still worth just adding a commutator for every <->
operator and closing this question.  Otherwise, it may arise again once
we want to add some more kNN support to opclasses or something.  On
the other hand, are we already going to limit oid consumption?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH v5] Show detailed table persistence in \dt+

2019-07-02 Thread Tom Lane
David Fetter  writes:
> [ v7-0001-Show-detailed-relation-persistence-in-dt.patch ]

I looked this over and had a few suggestions, as per attached v8:

* The persistence description values ought to be translatable, as
is the usual practice in describe.c.  This is slightly painful
because it requires tweaking the translate_columns[] values in a
non-constant way, but it's not that bad.

* I dropped the "ELSE 'unknown'" bit in favor of just emitting NULL
if the persistence isn't recognized.  This is the same way that the
table-type CASE just above does it, and I see no reason to be different.
Moreover, there are message-style-guidelines issues with what to print
if you do want to print something; "unknown" doesn't cut it.

* I also dropped the logic for pre-9.1 servers, because the existing
precedent in describeOneTableDetails() is that we only consider
relpersistence for >= 9.1, and I don't see a real good reason to
deviate from that.  9.0 and before are long out of support anyway.

If there aren't objections, I think v8 is committable.

regards, tom lane

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 1c770b4..0e0af5f 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3632,7 +3632,8 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	static const bool translate_columns[] = {false, false, true, false, false, false, false};
+	int			cols_so_far;
+	bool		translate_columns[] = {false, false, true, false, false, false, false, false};
 
 	/* If tabtypes is empty, we default to \dtvmsE (but see also command.c) */
 	if (!(showTables || showIndexes || showViews || showMatViews || showSeq || showForeign))
@@ -3672,15 +3673,40 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	  gettext_noop("partitioned index"),
 	  gettext_noop("Type"),
 	  gettext_noop("Owner"));
+	cols_so_far = 4;
 
 	if (showIndexes)
+	{
 		appendPQExpBuffer(&buf,
-		  ",\n c2.relname as \"%s\"",
+		  ",\n  c2.relname as \"%s\"",
 		  gettext_noop("Table"));
+		cols_so_far++;
+	}
 
 	if (verbose)
 	{
 		/*
+		 * Show whether a relation is permanent, temporary, or unlogged.  Like
+		 * describeOneTableDetails(), we consider that persistence emerged in
+		 * v9.1, even though related concepts existed before.
+		 */
+		if (pset.sversion >= 90100)
+		{
+			appendPQExpBuffer(&buf,
+			  ",\n  CASE c.relpersistence WHEN 'p' THEN '%s' WHEN 't' THEN '%s' WHEN 'u' THEN '%s' END as \"%s\"",
+			  gettext_noop("permanent"),
+			  gettext_noop("temporary"),
+			  gettext_noop("unlogged"),
+			  gettext_noop("Persistence"));
+			translate_columns[cols_so_far] = true;
+		}
+
+		/*
+		 * We don't bother to count cols_so_far below here, as there's no need
+		 * to; this might change with future additions to the output columns.
+		 */
+
+		/*
 		 * As of PostgreSQL 9.0, use pg_table_size() to show a more accurate
 		 * size of a table, including FSM, VM and TOAST tables.
 		 */


Re: Add missing operator <->(box, point)

2019-07-02 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Jul 2, 2019 at 9:19 PM Nikita Glukhov  wrote:
>> We could use commuted "const <-> var" operators for kNN searches, but the
>> current implementation requires the existence of "var <-> const" operators, 
>> and
>> order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op()
>> at /src/backend/optimizer/path/indxpath.c).

> But probably it's still worth to just add commutator for every <->
> operator and close this question.

Yeah, I was just thinking that it was weird not to have the commutator
operators, independently of indexing considerations.  Seems like a
usability thing.

regards, tom lane




Re: [PATCH v5] Show detailed table persistence in \dt+

2019-07-02 Thread Alvaro Herrera
On 2019-Jul-02, Tom Lane wrote:

> * The persistence description values ought to be translatable, as
> is the usual practice in describe.c.  This is slightly painful
> because it requires tweaking the translate_columns[] values in a
> non-constant way, but it's not that bad.

LGTM.  I only fear that the cols_so_far thing is easy to break, and the
breakage will be easy to miss.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH v4] Add \warn to psql

2019-07-02 Thread Tom Lane
David Fetter  writes:
> [ v7-0001-Add-warn-to-psql.patch ]

I took a look at this.  I have no quibble with the proposed feature,
and the implementation is certainly simple enough.  But I'm unconvinced
about the proposed test scaffolding.  Spinning up a new PG instance is a
*hell* of a lot of overhead to pay for testing something that could be
tested as per attached.  Admittedly, the attached doesn't positively
prove which pipe each output string went down, but that does not strike
me as a concern large enough to justify adding a TAP test for.

I'd be happier about adding TAP infrastructure if it looked like it'd
be usable to test some of the psql areas that are unreachable by the
existing test methodology, particularly tab-complete.c and prompt.c.
But I don't see anything here that looks like it'll work for that.

I don't like what you did to command_checks_all, either --- it could
hardly say "bolted on after the fact" more clearly if you'd written
that in  text.  If we need an input-stream argument,
let's just add it in a rational place and adjust the callers.
There aren't that many of 'em, nor has the subroutine been around
all that long.

regards, tom lane

diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 4bcf0cc..9b4060f 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -4180,6 +4180,29 @@ drop table psql_serial_tab;
 \pset format aligned
 \pset expanded off
 \pset border 1
+-- \echo and allied features
+\echo this is a test
+this is a test
+\echo -n this is a test without newline
+this is a test without newline\echo more test
+more test
+\set foo bar
+\echo foo = :foo
+foo = bar
+\qecho this is a test
+this is a test
+\qecho -n this is a test without newline
+this is a test without newline\qecho more test
+more test
+\qecho foo = :foo
+foo = bar
+\warn this is a test
+this is a test
+\warn -n this is a test without newline
+this is a test without newline\warn more test
+more test
+\warn foo = :foo
+foo = bar
 -- tests for \if ... \endif
 \if true
   select 'okay';
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 26f436a..2bae6c5 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -771,6 +771,24 @@ drop table psql_serial_tab;
 \pset expanded off
 \pset border 1
 
+-- \echo and allied features
+
+\echo this is a test
+\echo -n this is a test without newline
+\echo more test
+\set foo bar
+\echo foo = :foo
+
+\qecho this is a test
+\qecho -n this is a test without newline
+\qecho more test
+\qecho foo = :foo
+
+\warn this is a test
+\warn -n this is a test without newline
+\warn more test
+\warn foo = :foo
+
 -- tests for \if ... \endif
 
 \if true


Re: [PATCH v5] Show detailed table persistence in \dt+

2019-07-02 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jul-02, Tom Lane wrote:
>> * The persistence description values ought to be translatable, as
>> is the usual practice in describe.c.  This is slightly painful
>> because it requires tweaking the translate_columns[] values in a
>> non-constant way, but it's not that bad.

> LGTM.  I only fear that the cols_so_far thing is easy to break, and the
> breakage will be easy to miss.

Yeah, but that's pretty true of all the translatability stuff in
describe.c.  I wonder if there's any way to set up tests for that.
The fact that the .po files lag so far behind the source code seems
like an impediment --- even if we made a test case that presumed
--enable-nls and tried to exercise this, the lack of translations
for the new words would get in the way for a long while.

regards, tom lane




Re: [PATCH v5] Show detailed table persistence in \dt+

2019-07-02 Thread Daniel Gustafsson
> On 2 Jul 2019, at 22:16, Tom Lane  wrote:

> even if we made a test case that presumed
> --enable-nls and tried to exercise this, the lack of translations
> for the new words would get in the way for a long while.

For testing though, couldn’t we have an autogenerated .po which has a unique
and predictable dummy value translation for every string (the string backwards
or something), which can be used for testing?  This is all hand-wavy since I
haven’t tried actually doing it, but it seems a better option than waiting for
.po files to be available.  Or am I missing the point of the value of the
discussed test?

cheers ./daniel



Re: [PATCH v5] Show detailed table persistence in \dt+

2019-07-02 Thread Alvaro Herrera
On 2019-Jul-02, Daniel Gustafsson wrote:

> > On 2 Jul 2019, at 22:16, Tom Lane  wrote:
> 
> > even if we made a test case that presumed
> > --enable-nls and tried to exercise this, the lack of translations
> > for the new words would get in the way for a long while.
> 
> For testing though, couldn’t we have an autogenerated .po which has a unique
> and predictable dummy value translation for every string (the string backwards
> or something), which can be used for testing?  This is all hand-wavy since I
> haven’t tried actually doing it, but it seems a better option than waiting for
> .po files to be available.  Or am I missing the point of the value of the
> discussed test?

Hmm, no, I think that's precisely it, and that sounds like a pretty good
starter idea ... but I wouldn't want to be the one to have to set this
up -- it seems pretty laborious.

Anyway I'm not objecting to the patch -- I agree that we're already not
testing translatability and that this patch shouldn't be forced to start
doing it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH v5] Show detailed table persistence in \dt+

2019-07-02 Thread Daniel Gustafsson
> On 2 Jul 2019, at 22:35, Alvaro Herrera  wrote:

> Anyway I'm not objecting to the patch -- I agree that we're already not
> testing translatability and that this patch shouldn't be forced to start
> doing it.

I forgot to add that to my previous email: the patch as it stands in v8 looks
good to me. I’ve missed having this on many occasions.

cheers ./daniel



Re: "long" type is not appropriate for counting tuples

2019-07-02 Thread Tom Lane
Peter Eisentraut  writes:
> Attached is a patch to implement this in a handful of cases that are
> particularly verbose right now.  I think those are easy wins.
> (Also a second patch that makes use of %zu for size_t where this was not
> yet done.)

I took a look through these and see nothing objectionable.  There are
probably more places that can be improved, but we need not insist on
getting every such place in one go.

Per Robert's position that variables ought to have well-defined widths,
there might be something to be said for not touching the variable
declarations that you changed from int64 to long long, and instead
casting them to long long in the sprintf calls.  But I'm not really
convinced that that's better than what you've done.

Marked CF entry as ready-for-committer.

regards, tom lane




Re: benchmarking Flex practices

2019-07-02 Thread Tom Lane
John Naylor  writes:
> 0001 is a small patch to remove some unneeded generality from the
> current rules. This lowers the number of elements in the yy_transition
> array from 37045 to 36201.

I don't particularly like 0001.  The two bits like this

-whitespace ({space}+|{comment})
+whitespace ({space}|{comment})

seem likely to create performance problems for runs of whitespace, in that
the lexer will now have to execute the associated action once per space
character not just once for the whole run.  Those actions are empty, but
I don't think flex optimizes for that, and it's really flex's per-action
overhead that I'm worried about.  Note the comment in the "Performance"
section of the flex manual:

Another area where the user can increase a scanner's performance (and
one that's easier to implement) arises from the fact that the longer
the tokens matched, the faster the scanner will run.  This is because
with long tokens the processing of most input characters takes place
in the (short) inner scanning loop, and does not often have to go
through the additional work of setting up the scanning environment
(e.g., `yytext') for the action.

There are a bunch of higher-order productions that use "{whitespace}*",
which is surely a bit redundant given the contents of {whitespace}.
But maybe we could address that by replacing "{whitespace}*" with
"{opt_whitespace}" defined as

opt_whitespace  ({space}*|{comment})

Not sure what impact if any that'd have on table size, but I'm quite sure
that {whitespace} was defined with an eye to avoiding unnecessary
lexer action cycles.

As for the other two bits that are like

-.  {
-   /* This is only needed for \ just 
before EOF */
+\\ {

my recollection is that those productions are defined that way to avoid a
flex warning about not all possible input characters being accounted for
in the  (resp. ) state.  Maybe that warning is
flex-version-dependent, or maybe this was just a worry and not something
that actually produced a warning ... but I'm hesitant to change it.
If we ever did get to flex's default action, that action is to echo the
current input character to stdout, which would be Very Bad.

As far as I can see, the point of 0002 is to have just one set of
flex rules for the various variants of quotecontinue processing.
That sounds OK, though I'm a bit surprised it makes this much difference
in the table size. I would suggest that "state_before" needs a less
generic name (maybe "state_before_xqs"?) and more than no comment.
Possibly more to the point, it's not okay to have static state variables
in the core scanner, so that variable needs to be kept in yyextra.
(Don't remember offhand whether it's any more acceptable in the other
scanners.)

regards, tom lane




Re: Fix two issues after moving to unified logging system for command-line utils

2019-07-02 Thread Peter Eisentraut
On 2019-07-01 16:18, Alexey Kondratov wrote:
> I have found two minor issues with unified logging system for 
> command-line programs (commited by Peter cc8d415117), while was rebasing 
> my pg_rewind patch:

Fixed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Adversarial case for "many duplicates" nbtree split strategy in v12

2019-07-02 Thread Peter Geoghegan
The single largest benefit of the v12 nbtree enhancement was its
adaptability with indexes where a portion of the key space contains
many duplicates. Successive page splits choose split points in a way
that leaves duplicates together on their own page, and eventually pack
pages full of duplicates.

I have thought up a specific case where the logic can be fooled into
consistently doing the wrong thing, leading to very poor space
utilization:

drop table if exists descnums;
create table descnums(nums int4);
create index bloat_desc on descnums (nums desc nulls first);
-- Fill first leaf page (leaf root page) with NULL values, to the
point where it almost splits:
insert into descnums select null from generate_series(0,400);
-- Insert integers, which will be treated as descending insertions within index:
insert into descnums select i from generate_series(0,1) i;
-- Observe if we had 50:50 page splits here:
create extension if not exists pgstattuple;
select avg_leaf_density, leaf_fragmentation from pgstatindex('bloat_desc');

The final output looks like this:

 avg_leaf_density | leaf_fragmentation
--+
 1.83 |  99.88
(1 row)

Although the case is contrived, it is clearly not okay that this is
possible -- avg_leaf_density should be about 50 here, which is what
you'll see on v11. You'll also see an avg_leaf_density that's at least
50 if you vary any of the details. For example, if you remove "nulls
first" then you'll get an avg_leaf_density of ~50. Or, if you make the
index ASC then the avg_leaf_density is almost exactly 90 for the usual
reason (the NULLs won't "get in the way" of consistently getting
rightmost splits that way). Note that I've deliberately arranged for
the page splits to be as ineffective as possible by almost filling a
leaf page with NULLs, leaving a tiny gap for all future non-NULL
integer insertions.

This is another case where a bimodal distribution causes trouble when
combined with auto-incrementing insertions -- it is slightly similar
to the TPC-C issue that the v12 work fixed IMV. You could say that the
real root of the problem here is one of two things, depending on your
perspective:

1. Arguably, nbtsplitloc.c is already doing the right thing here, and
the root of the issue is that _bt_truncate() lacks any way of
generating a new high key that is "mid way between" the value NULL in
the lastleft tuple and the integer in the firstright tuple during the
first split. If _bt_truncate() created a "mid point value" of around
INT_MAX/2 for the new high key during the first split, then everything
would work out -- we wouldn't keep splitting the same leftmost page
again and again. The behavior would stabilize in the same way as it
does in the ASC + NULLS LAST case, without hurting any other case that
already works well. This isn't an academic point; we might actually
need to do that in order to be able to pack the leaf page 90% full
with DESC insertions, which ought to be a future goal for
nbtsplitloc.c. But that's clearly not in scope for v12.

2. The other way you could look at it (which is likely to be the basis
of my fix for v12) is that nbtsplitloc.c has been fooled into treating
page splits as "many duplicate" splits, when in fact there are not
that many duplicates involved -- there just appears to be many
duplicates because they're so unevenly distributed on the page. It
would be fine for it to be wrong if there was some way that successive
page splits could course correct (see point 1), but that isn't
possible here because of the skew -- we get stuck with the same lousy
choice of split point again and again. (There also wouldn't be a
problem if the integers values were random, since we'd have just one
or two uneven splits at the start.)

I've already written a rough patch that fixes the issue by taking this
second view of the problem. The patch makes nbtsplitloc.c more
skeptical about finishing with the "many duplicates" strategy,
avoiding the problem -- it can just fall back on a 50:50 page split
when it looks like this is happening (the related "single value"
strategy must already so something similar in _bt_strategy()).
Currently, it simply considers if the new item on the page has an
offset number immediately to the right of the split point indicated by
the "many duplicates" strategy. We look for it within ~10 offset
positions to the right, since that strongly suggests that there aren't
that many duplicates after all. I may make the check more careful
still, for example by performing additional comparisons on the page to
make sure that there are in fact very few distinct values on the whole
page.
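
Schematically, with invented names (this is only an illustration of the
rule as described above, not the draft patch itself):

/*
 * If the incoming tuple would land within a handful of offsets to the
 * right of the split point chosen by the "many duplicates" strategy,
 * the "many duplicates" signal is probably spurious, so fall back on a
 * 50:50 split instead.
 */
static bool
many_dups_signal_is_suspect(int newitemoff, int manydups_splitoff)
{
    int     slop = 10;          /* "~10 offset positions" from above */

    return newitemoff >= manydups_splitoff &&
        newitemoff - manydups_splitoff <= slop;
}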

My draft fix doesn't cause any regressions in any of my test cases --
the fix barely affects the splits chosen for my real-world test data,
and TPC test data. As far as I know, I already have a comprehensive
fix. I will need to think about it much more carefully before
proceeding, though.

Thoughts?
--
Peter Geoghegan




Re: MSVC Build support with visual studio 2019

2019-07-02 Thread Michael Paquier
On Tue, Jul 02, 2019 at 02:10:11PM +0900, Michael Paquier wrote:
> OK, committed to HEAD for now after perltidy'ing the patch.  Let's see
> what the buildfarm has to say about it first.  Once we are sure that
> the thing is stable, I'll try to backpatch it.  This works on my own
> dev machines with VS 2015 and 2019, but who knows what hides in the
> shadows... 

The buildfarm did not have much to say, so backpatched down to 9.4,
adjusting things on the way.
--
Michael


signature.asc
Description: PGP signature


Re: Memory-Bounded Hash Aggregation

2019-07-02 Thread Tomas Vondra

Hi Jeff,

On Mon, Jul 01, 2019 at 12:13:53PM -0700, Jeff Davis wrote:

This is for design review. I have a patch (WIP) for Approach 1, and if
this discussion starts to converge on that approach I will polish and
post it.



Thanks for working on this.


Let's start at the beginning: why do we have two strategies -- hash
and sort -- for aggregating data? The two are more similar than they
first appear. A partitioned hash strategy writes randomly among the
partitions, and later reads the partitions sequentially; a sort will
write sorted runs sequentially, but then read among the runs
randomly during the merge phase. A hash is a convenient small
representation of the data that is cheaper to operate on; sort uses
abbreviated keys for the same reason.



What does "partitioned hash strategy" do? It's probably explained in one
of the historical discussions, but I'm not sure which one. I assume it
simply hashes the group keys and uses that to partition the data, and then
passes it to hash aggregate.
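
Something like the following toy sketch is what I have in mind (made-up
names, nothing to do with the actual patch):

/*
 * Toy model: when the in-memory hash table is full, route each
 * remaining input tuple to one of N spill partitions based on the hash
 * of its group key.  Each partition then holds only ~1/N of the
 * distinct groups, so a later pass can hash-aggregate one partition at
 * a time within work_mem.
 */
static int
spill_partition_for(unsigned int groupkey_hash, int npartitions)
{
    return (int) (groupkey_hash % (unsigned int) npartitions);
}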


Hash offers:

* Data is aggregated on-the-fly, effectively "compressing" the amount
 of data that needs to go to disk. This is particularly important
 when the data contains skewed groups (see below).

* Can output some groups after the first pass of the input data even
 if other groups spilled.

* Some data types only support hashing; not sorting.

Sort+Group offers:

* Only one group is accumulating at once, so if the transition state
 grows (like with ARRAY_AGG), it minimizes the memory needed.

* The input may already happen to be sorted.

* Some data types only support sorting; not hashing.

Currently, Hash Aggregation is only chosen if the optimizer believes
that all the groups (and their transition states) fit in
memory. Unfortunately, if the optimizer is wrong (often the case if the
input is not a base table), the hash table will
keep growing beyond work_mem, potentially bringing the entire system
to OOM. This patch fixes that problem by extending the Hash
Aggregation strategy to spill to disk when needed.



OK, makes sense.


Previous discussions:


https://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop

https://www.postgresql.org/message-id/1419326161.24895.13.camel%40jeff-desktop

https://www.postgresql.org/message-id/87be3bd5-6b13-d76e-5618-6db0a4db584d%40iki.fi

A lot was discussed, which I will try to summarize and address here.

Digression: Skewed Groups:

Imagine the input tuples have the following grouping keys:

 0, 1, 0, 2, 0, 3, 0, 4, ..., 0, N-1, 0, N

Group 0 is a skew group because it consists of 50% of all tuples in
the table, whereas every other group has a single tuple. If the
algorithm is able to keep group 0 in memory the whole time until
finalized, that means that it doesn't have to spill any group-0
tuples. In this example, that would amount to a 50% savings, and is a
major advantage of Hash Aggregation versus Sort+Group.



Right. I agree efficiently handling skew is important and may be crucial
for achieving good performance.


High-level approaches:

1. When the in-memory hash table fills, keep existing entries in the
hash table, and spill the raw tuples for all new groups in a
partitioned fashion. When all input tuples are read, finalize groups
in memory and emit. Now that the in-memory hash table is cleared (and
memory context reset), process a spill file the same as the original
input, but this time with a fraction of the group cardinality.

2. When the in-memory hash table fills, partition the hash space, and
evict the groups from all partitions except one by writing out their
partial aggregate states to disk. Any input tuples belonging to an
evicted partition get spilled to disk. When the input is read
entirely, finalize the groups remaining in memory and emit. Now that
the in-memory hash table is cleared, process the next partition by
loading its partial states into the hash table, and then processing
its spilled tuples.

3. Use some kind of hybrid[1][2] of hashing and sorting.



Unfortunately the second link does not work :-(


Evaluation of approaches:

Approach 1 is a nice incremental improvement on today's code. The
final patch may be around 1KLOC. It's a single kind of on-disk data
(spilled tuples), and a single algorithm (hashing). It also handles
skewed groups well because the skewed groups are likely to be
encountered before the hash table fills up the first time, and
therefore will stay in memory.



I'm not going to block Approach 1, although I'd really like to see
something that helps with array_agg.


Approach 2 is nice because it resembles the approach of Hash Join, and
it can determine whether a tuple should be spilled without a hash
lookup. Unfortunately, those upsides are fairly mild, and it has
significant downsides:

* It doesn't handle skew values well because it's likely to evict
 them.

* If we leave part of the hash table in memory, it's difficult to
 ensure that we will be able to actually use the space freed by
 eviction, be

contrib make check-world fail if data have been modified and there's vpath

2019-07-02 Thread didier
Hi,

Because the destination data files already exist, the line in
src/makefiles/pgxs.mk

    ln -s $< $@

fails, and "make clean" doesn't remove these links.  Using

    ln -sf

is an option, but it's not tested in configure; another option is an
"rm -f" first.

Regards
Didier




Re: Bug: ECPG: Cannot use CREATE AS EXECUTE statemnt

2019-07-02 Thread Michael Meskes
Matsumura-san,

> I noticed that CREATE AS EXECUTE with a using clause needs a new
> implementation in which all parameters in the using clause must be
> embedded into the expr-list of EXECUTE in text format, as follows,
> because there is no protocol interface for our purpose.
> It will take more time to implement.  Do you have any advice?
> ...

Unfortunately no, I have no advice. Originally all statements needed
this treatment. :)

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL





Re: [PATCH] memory leak in ecpglib

2019-07-02 Thread Michael Meskes
Hi,

> Memory leaks occur when the ecpg_update_declare_statement() is called
> the second time.
> ...

I'm going to commit this patch to HEAD; this way we can see if it works as
advertised. It does not hurt if it gets removed by a rewrite.

Thanks for finding the issue,

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL





Re: Attached partition not considering altered column properties of root partition.

2019-07-02 Thread Amit Langote
Hi Prabhat,

On Tue, Jul 2, 2019 at 5:12 PM Prabhat Sahu
 wrote:
>
> Hi,
>
> In the below test case, when I changed the storage option for the root
> partition, the newly attached partition did not include the changed
> storage option.  Is this expected behavior?

Thanks for the report.  This seems like a bug.  Documentation claims
that the child tables inherit column storage options from the parent
table.  That's actually enforced in only some cases.

1. If you create the child table as a child to begin with (that is,
not attach it as child after the fact):

create table parent (a text);
create table child () inherits (parent);
select attrelid::regclass, attname, attstorage from pg_attribute where
attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a';
 attrelid │ attname │ attstorage
──┼─┼
 parent   │ a   │ x
 child│ a   │ x
(2 rows)


2. If you change the parent's column's storage option, child's column
is recursively changed.

alter table parent alter a set storage main;
select attrelid::regclass, attname, attstorage from pg_attribute where
attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a';
 attrelid │ attname │ attstorage
──┼─┼
 parent   │ a   │ m
 child│ a   │ m
(2 rows)

However, we fail to enforce the rule when the child is attached after the fact:

create table child2 (a text);
alter table child2 inherit parent;
select attrelid::regclass, attname, attstorage from pg_attribute where
attrelid in ('parent'::regclass, 'child'::regclass,
'child2'::regclass) and attname = 'a';
 attrelid │ attname │ attstorage
──┼─┼
 parent   │ a   │ m
 child│ a   │ m
 child2   │ a   │ x
(3 rows)

To fix this, MergeAttributesIntoExisting() should check that the
attribute options of a child don't conflict with the parent, which the
attached patch implements.  Note that partitioning uses the same code
as inheritance, so the fix applies to it too.  After the patch:

create table p (a int, b text) partition by list (a);
create table p1 partition of p for values in (1);
select attrelid::regclass, attname, attstorage from pg_attribute where
attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b';
 attrelid │ attname │ attstorage
──┼─┼
 p│ b   │ x
 p1   │ b   │ x
(2 rows)

alter table p alter b set storage main;
select attrelid::regclass, attname, attstorage from pg_attribute where
attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b';
 attrelid │ attname │ attstorage
──┼─┼
 p│ b   │ m
 p1   │ b   │ m
(2 rows)

create table p2 (like p);
select attrelid::regclass, attname, attstorage from pg_attribute where
attrelid in ('p'::regclass, 'p1'::regclass, 'p2'::regclass) and
attname = 'b';
 attrelid │ attname │ attstorage
──┼─┼
 p│ b   │ m
 p1   │ b   │ m
 p2   │ b   │ x
(3 rows)

alter table p attach partition p2 for values in (2);
ERROR:  child table "p2" has different storage option for column "b" than parent
DETAIL:  EXTENDED versus MAIN

-- ok after changing p2 to match
alter table p2 alter b set storage main;
alter table p attach partition p2 for values in (2);

Thanks,
Amit


attstorage-inherit-bug.patch
Description: Binary data


Re: TopoSort() fix

2019-07-02 Thread Rui Hai Jiang
Could the attached patch fix this issue?  Or does anyone else plan to fix
it?

If people are busy and have no time, I can go ahead and fix it.  To fix
this issue, do we need a patch for each official branch?


Regards,
Ruihai

On Tue, Jul 2, 2019 at 11:23 PM Tom Lane  wrote:

> Rui Hai Jiang  writes:
> > I think I found an issue in the TopoSort() function.
>
> This indeed seems like a live bug introduced by a1c1af2a.
> Robert?
>
> regards, tom lane
>


Re: progress report for ANALYZE

2019-07-02 Thread Tatsuro Yamada

Hi Alvaro!

On 2019/06/22 3:52, Alvaro Herrera wrote:

Hello

Here's a patch that implements progress reporting for ANALYZE.


Sorry for the late reply.
My email address was changed to tatsuro.yamada...@nttcom.co.jp.

I have a question about your patch.
My ex-colleague Vinayak created a similar patch in 2017 [1], but he
couldn't get it committed for several reasons, one being that the
patch couldn't handle analyzing foreign tables. Therefore, I wonder
whether your patch is able to do that or not.

However, I actually think it's okay even if your patch can't handle
foreign tables, because the feature is useful for DBAs.

I'll review your patch this week. :)
 
[1] ANALYZE command progress checker

https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

Regards,
Tatsuro Yamada




Re: Run-time pruning for ModifyTable

2019-07-02 Thread Amit Langote
Hi David,

On Thu, Jun 27, 2019 at 2:28 PM David Rowley
 wrote:
> Deja vu from this time last year when despite everyone's best efforts
> (mostly Alvaro) we missed getting run-time pruning in for MergeAppend
> into the PG11 release.   This year it was ModifyTable, which is now
> possible thanks to Amit and Tom's modifications to the inheritance
> planner.
>
> I've attached what I have so far for this.

Thanks for working on this.  IIUC, the feature is to skip modifying a
given result relation if run-time pruning dictates that none of its
existing rows will match some dynamically computable quals.

>  I think it's mostly okay,
> but my brain was overheating a bit at the inheritance_planner changes.

I think we need to consider the fact that there is a proposal [1] to
get rid of inheritance_planner() as the way of planning UPDATE/DELETEs
on inheritance trees.  If we go that route, then a given partitioned
target table will be expanded at the bottom and so, there's no need
for ModifyTable to have its own run-time pruning info, because
Append/MergeAppend will have it.  Maybe, we will need some code in
ExecInitModifyTable() and ExecModifyTable() to handle the case where
run-time pruning, during plan tree initialization and plan tree
execution respectively, may have rendered modifying a given result
relation unnecessary.

A cursory look at the patch suggests that most of its changes will be
for nothing if [1] materializes.  What do you think about that?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/357.1550612935%40sss.pgh.pa.us




Re: Another way to fix inherited UPDATE/DELETE

2019-07-02 Thread Amit Langote
Hi Tom,

On Wed, Feb 20, 2019 at 6:49 AM Tom Lane  wrote:
> Obviously this'd be a major rewrite with no chance of making it into v12,
> but it doesn't sound too big to get done during v13.

Are you planning to work on this?

Thanks,
Amit




Re: Where is SSPI auth username determined for TAP tests?

2019-07-02 Thread Michael Paquier
On Sun, Jun 30, 2019 at 12:09:18PM -0400, Tom Lane wrote:
> We could apply the same hack on the source node, but I wonder if it
> wouldn't be better to make this less of a kluge.  I'm tempted to
> propose that "pg_regress --config-auth --user XXX" should understand
> XXX as the bootstrap superuser name, and then we could clean up
> 010_dump_connstr.pl by using that.

I have been reviewing that part, and the part to split the bootstrap
user from the set of extra roles created looks fine to me.  Now, it
seems to me that you can simplify 010_dump_connstr.pl as per the
attached because PostgresNode::Init can take care of the --config-auth
part with the correct options using auth_extra.  What do you think
about the cleanup attached?
--
Michael
diff --git a/src/bin/pg_dump/t/010_dump_connstr.pl b/src/bin/pg_dump/t/010_dump_connstr.pl
index 048ecf24eb..e9f0ff8211 100644
--- a/src/bin/pg_dump/t/010_dump_connstr.pl
+++ b/src/bin/pg_dump/t/010_dump_connstr.pl
@@ -177,15 +177,11 @@ my $restore_super = qq{regress_a'b\\c=d\\ne"f};
 # dbname/user connection parameters
 
 my $envar_node = get_new_node('destination_envar');
-$envar_node->init(extra =>
-	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]);
-$envar_node->run_log(
-	[
-		$ENV{PG_REGRESS},  '--config-auth',
-		$envar_node->data_dir, '--user',
-		$dst_bootstrap_super,  '--create-role',
-		$restore_super
-	]);
+$envar_node->init(
+	extra =>
+	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ],
+	auth_extra =>
+	  [ '--user', $dst_bootstrap_super, '--create-role', $restore_super ]);
 $envar_node->start;
 
 # make superuser for restore
@@ -210,15 +206,11 @@ is($stderr, '', 'no dump errors');
 $restore_super =~ s/"//g
   if $TestLib::windows_os;# IPC::Run mishandles '"' on Windows
 my $cmdline_node = get_new_node('destination_cmdline');
-$cmdline_node->init(extra =>
-	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]);
-$cmdline_node->run_log(
-	[
-		$ENV{PG_REGRESS},'--config-auth',
-		$cmdline_node->data_dir, '--user',
-		$dst_bootstrap_super,'--create-role',
-		$restore_super
-	]);
+$cmdline_node->init(
+	extra =>
+	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ],
+	auth_extra =>
+	  [ '--user', $src_bootstrap_super, '--create-role', $restore_super ]);
 $cmdline_node->start;
 $cmdline_node->run_log(
 	[ 'createuser', '-U', $dst_bootstrap_super, '-s', $restore_super ]);


signature.asc
Description: PGP signature


Re: [HACKERS] WIP: Aggregation push-down

2019-07-02 Thread Richard Guo
On Fri, Mar 1, 2019 at 12:01 AM Antonin Houska  wrote:

> Tom Lane  wrote:
>
> > Antonin Houska  writes:
> > > Michael Paquier  wrote:
> > >> Latest patch set fails to apply, so moved to next CF, waiting on
> > >> author.
> >
> > > Rebased.
> >
> > This is in need of rebasing again :-(.  I went ahead and pushed the 001
> > part, since that seemed fairly uncontroversial.
>
> ok, thanks.
>


Another rebase is needed for the patches.

Thanks
Richard


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-07-02 Thread Etsuro Fujita
On Tue, Jul 2, 2019 at 1:47 PM amul sul  wrote:
> Attached version is rebase atop of the latest master head(c74d49d41c), thanks.

Thanks!  Will review.

Best regards,
Etsuro Fujita




Re: POC: converting Lists into arrays

2019-07-02 Thread David Rowley
On Tue, 2 Jul 2019 at 11:27, Tom Lane  wrote:
> My previous patch would have had you replace this with a loop using
> an integer list-position index.  You can still do that if you like,
> but it's less change to convert the loop to a foreach(), drop the
> prev/next variables, and replace the list_delete_cell call with
> foreach_delete_current:
>
> foreach(cell, state->enterKeys)
> {
> TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
>
> if (need to delete)
> state->enterKeys = 
> foreach_delete_current(state->enterKeys,
> cell);
> }
>
> So I think this is a win, and attached is v7.

It's pretty nice to get rid of those. I also like how you've handled the
changes in SRFs.

I've now read over the entire patch and have noted down the following:

1. MergeAttributes() could do with a comment mentioning why you're not
using foreach() on the outer loop. I almost missed the
list_delete_nth_cell() call that's a few branches deep in the outer
loop.

2. In expandTupleDesc(), couldn't you just change the following:

int i;
for (i = 0; i < offset; i++)
{
if (aliascell)
aliascell = lnext(eref->colnames, aliascell);
}

to:

aliascell = offset < list_length(eref->colnames) ?
list_nth_cell(eref->colnames, offset) : NULL;

3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell() ?

4. Do you think it would be a good idea to document that the 'pos' arg
in list_insert_nth and co must be <= list_length()? I know you've
mentioned that in insert_new_cell, but that's static and
list_insert_nth is not. I think it would be better not to have to
chase down comments of static functions to find out how to use an
external function.

5. Why does enlarge_list() return List *? Can't it just return void?
I noticed this after looking at add_new_cell_after() and reading your
cautionary comment and then looking at lappend_cell(). At first, it
seemed that lappend_cell() could end up reallocating List to make way
for the new cell, but from looking at enlarge_list() it seems we
always maintain the original allocation of the header. So why bother
returning List * in that function?

6. Is there a reason to use memmove() in list_concat() rather than
just memcpy()? I don't quite believe the comment you've written. As
far as I can see you're not overwriting any useful memory so the order
of the copy should not matter.

7. The last part of the following comment might not age well.

/*
* Note: unlike the individual-list-cell deletion functions, we never make
* any effort to move the surviving list cells to new storage.  This is
* because none of them can move in this operation, so it's the same as
* the old implementation in terms of what callers may assume.
*/

The old comment about extending the list seems more fitting.

9. I see your XXX comment in list_qsort(), but wouldn't it be better
to just invent list_qsort_internal() as a static function and just
have it qsort the list in-place, then make list_qsort just return
list_qsort_internal(list_copy(list)); and keep the XXX comment so that
the fixup would just remove the list_copy()? That way, when it comes
to fixing that inefficiency we can just cut and paste the internal
implementation into list_qsort(). It'll be much less churn, especially
if you put the internal version directly below the external one.

10. I wonder if we can reduce a bit of pain for extension authors by
back-patching a macro that wraps an lnext() call, adding a dummy
argument for the List.  That way they don't have to deal with their
own preprocessor version-dependent code. The downside is that we'd need
to keep the macro into the future; however, it's just 1 line of code...
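
I.e. sparing them from having to carry something like this themselves
(illustrative only; the version cutoff is just a placeholder):

#if PG_VERSION_NUM >= 130000            /* List-as-array API */
#define lnext_compat(l, lc)     lnext(l, lc)
#else
#define lnext_compat(l, lc)     lnext(lc)
#endif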


I also did some more benchmarking of the patch. Basically, I patched
with the attached file (converted to .txt not to upset the CFbot) then
ran make installcheck. This was done on an AWS m5d.large instance.
The patch runs the planner 10 times then LOGs the average time of
those 10 runs. Taking the sum of those averages I see:

Master: 0.482479 seconds
Patched: 0.471949 seconds

Which makes the patched version 2.2% faster than master on that run.
I've resisted attaching the spreadsheet since there are almost 22k
data points per run.

Apart from the 10 points above, I think the patch is good to go.

I also agree with keeping the further improvements like getting rid of
the needless list_copy() in list_concat() calls as a followup patch. I
also agree with Tom about moving quickly with this one.  Reviewing it
in detail took me a long time; I'd really rather not do it again after
leaving it to rot for a while.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1bcfdd67e0..750320cb3b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -22,