Re: Postgres 11 release notes

2018-09-29 Thread Amit Langote
On Fri, Sep 28, 2018 at 7:24 PM Bruce Momjian  wrote:
> On Fri, Sep 28, 2018 at 10:21:16AM +0900, Amit Langote wrote:
> > On 2018/09/27 23:24, Alvaro Herrera wrote:
> > > On 2018-Sep-27, Amit Langote wrote:
> > >
> > >> Sorry I couldn't reply sooner, but the following of your proposed text
> > >> needs to be updated a bit:
> > >>
> > >> +   
> > >> +
> > >> + Having a "default" partition for storing data that does not 
> > >> match a
> > >> + partition key
> > >> +
> > >> +   
> > >>
> > >> I think "does not match a partition key" is not accurate.  Description of
> > >> default partitions further below in the release notes says this:
> > >>
> > >> "The default partition can store rows that don't match any of the other
> > >> defined partitions, and is searched accordingly."
> > >>
> > >> So, we could perhaps write it as:
> > >>
> > >> Having a "default" partition for storing data that does not match any of
> > >> the remaining partitions
> > >
> > > Yeah, I agree that "a partition key" is not the right term to use there
> > > (and that term is used in the press release text also).  However I don't
> > > think "remaining" is the right word there either, because it sounds as
> > > if you're removing something.
> > >
> > > For the Spanish translation of the press release, we ended up using the
> > > equivalent of "for the data that does not match any other partition".
> >
> > Yeah, "any other partition" is what the existing description uses too, so:
> >
> > Having a "default" partition for storing data that does not match any
> > other partition
>
> Uh, what text are you talking about?  I see this text in the release
> notes since May:
>
> The default partition can store rows that don't match any of the
> other defined partitions, and is searched accordingly.

I was commenting on Jonathan's patch upthread [1] whereby he
proposed to add a line about the default partition feature in the
major partitioning enhancements section at the top.

The existing text that you'd added when creating the release notes is fine.

> The press release?

Yeah, Alvaro pointed out that the press release has inaccurate text
about the default partition, but that's a separate issue, which I was
unaware of.

Thanks,
Amit

[1] 
https://www.postgresql.org/message-id/dcdbb9e9-cde2-fb78-fdb6-76d672d08dbc%40postgresql.org



Re: Optional message to user when terminating/cancelling backend

2018-09-29 Thread Pavel Stehule
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

I tested this patch, and everything looks well and functional. I reread the discussion
and I don't see any unresolved objections against this patch.

There are no warnings or crashes, and all tests pass. The new behavior is documented
well.

I'll mark this patch as ready for committer

The new status of this patch is: Ready for Committer


Re: Progress reporting for pg_verify_checksums

2018-09-29 Thread Fabien COELHO




The time() granularity to the second makes the result awkward on small
tests:

8/1554552 kB (0%, 8 kB/s)
1040856/1554552 kB (66%, 1040856 kB/s)
1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest reusing the "portability/instr_time.h" stuff, which allows
much better precision.


I still think it would make sense to use that instead of low-precision
time().


As the use-case of this is not for small tests,


Even for a long run, the error induced by the 1 second imprecision on both
points is significant at the beginning of the scan.



I don't think it is useful to make the code more complicated for this.


I do not think that it would be much more complicated to use the 
portability macros to get a precise time.
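
For instance, a minimal sketch (not the patch code, just to show the macros):

#include "postgres_fe.h"
#include "portability/instr_time.h"

/* the caller does INSTR_TIME_SET_CURRENT(start) once, when the scan begins */
static double
elapsed_seconds(instr_time start)
{
    instr_time  now;

    INSTR_TIME_SET_CURRENT(now);
    INSTR_TIME_SUBTRACT(now, start);    /* now -= start */
    return INSTR_TIME_GET_DOUBLE(now);  /* microsecond resolution or better */
}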



The display is exactly the same as in pg_basebackup (except the
bandwidth is shown as well), so right now I think it is more useful to be
consistent here.


Hmmm... units are units, and the display is wrong. The fact that other
commands are wrong as well does not look like a good argument to persist in
an error.


I've had a look around, and "kB" seems to be a common unit for 1024
bytes, e.g. in pg_test_fsync etc., unless I am mistaken?


I can only repeat my above comment: the fact that postgres is wrong 
elsewhere is not a good reason to persist in reproducing an error.



See the mother of all knowledge https://en.wikipedia.org/wiki/Kilobyte

 - SI (decimal, 1000): kB, MB, GB, ...
 - IEC (binary 1024): KiB, MiB, GiB, ...
 - JEDEC (binary 1024, used for memory): KB, MB, GB.

Summary:
 - 1 kB = 1000 bytes
 - 1 KB = 1 KiB = 1024 bytes

Decimal is used for storage (HDD, SSD), binary for memory. That is life, 
and IMHO Postgres code is not the place to invent new units.


Moreover, I still think that the progress report should use MB defined as
1,000,000 bytes.



I would be okay with a progress printing function shared between some
commands. It just needs some place. pg_rewind also has a --rewind option.


I guess you mean pg_rewind also has a --progress option.


Indeed.


I do agree it makes sense to refactor that, but I don't think this
should be part of this patch.


That's a fair point. I'd suggest putting the new progress function directly in
the common part, and other patches could take advantage of it for other
commands when someone feels like it.


Other comments:

Function toggle_progress_report has empty lines after "{" and before "}",
but this style is not used elsewhere; I think they should be removed.

The report_progress API should be ready to be used by another client
application, which suggests that global variables should be avoided.


Maybe:

  void report_progress(current, total, force)

and the scan start and last report times could be static variables inside the
function, initialized on the first call, which would suggest calling the
function at the beginning of the scan, probably with current == 0.


Or maybe:

  time_type report_progress(current, total, start, last, force)

Which would return the last time, and the caller has responsibility for
initializing a start time.
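
A rough sketch of the first variant (hypothetical code, not the actual patch;
current and total taken as byte counts here):

#include "postgres_fe.h"
#include "portability/instr_time.h"

static void
report_progress(int64 current, int64 total, bool force)
{
    static instr_time start_time;
    static instr_time last_report;
    instr_time  now;
    double      elapsed;
    double      since_last;

    INSTR_TIME_SET_CURRENT(now);

    /* first call, at the beginning of the scan: just remember the times */
    if (current == 0)
    {
        start_time = now;
        last_report = now;
        return;
    }

    elapsed = INSTR_TIME_GET_DOUBLE(now) - INSTR_TIME_GET_DOUBLE(start_time);
    since_last = INSTR_TIME_GET_DOUBLE(now) - INSTR_TIME_GET_DOUBLE(last_report);

    /* emit at most a few reports per second, unless forced */
    if (!force && since_last < 0.25)
        return;

    fprintf(stderr, "%lld/%lld kB (%d%%, %.0f kB/s)\r",
            (long long) (current / 1024), (long long) (total / 1024),
            total > 0 ? (int) (current * 100 / total) : 0,
            elapsed > 0.0 ? (double) current / 1024.0 / elapsed : 0.0);

    last_report = now;
}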


--
Fabien.



Re: [HACKERS] kqueue

2018-09-29 Thread Matteo Beccati
On 28/09/2018 14:19, Thomas Munro wrote:
> On Fri, Sep 28, 2018 at 11:09 AM Andres Freund  wrote:
>> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
>>> Matteo Beccati reported a 5-10% performance drop on a
>>> low-end Celeron NetBSD box which we have no explanation for, and we
>>> have no reports from server-class machines on that OS -- so perhaps we
>>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
>>> NetBSD until someone can figure out what needs to be fixed there
>>> (possibly on the NetBSD side)?
>>
>> Yea, I'm not too worried about that. It'd be great to test that, but
>> otherwise I'm also ok to just plonk that into the template.
> 
> Thanks for the review!  Ok, if we don't get a better idea I'll put
> this in src/template/netbsd:
> 
> CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"

A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh
install of NetBSD 8.0 again shows that kqueue is consistently slower
than unpatched master when running pgbench with tpcb-like workloads:

~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10

while on select-only benchmarks the difference is below the noise floor,
with both doing roughly the same ~30k tps.

Out of curiosity, I've installed FreeBSD on an identically specced VM,
and the select benchmark was ~75k tps for kqueue vs ~90k tps on
unpatched master, so maybe there's something I'm doing wrong when
benchmarking. Could you please provide proper instructions?


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: doc - improve description of default privileges

2018-09-29 Thread Fabien COELHO




The Owner column is redundant, because it's always all applicable
privileges.  (Having this column would give the impression that it's not
always all privileges, so it would be confusing.)


The reason I put the owner column is to show (1) the privileges that apply 
to the objects (i.e. what is under "ALL") and (2) whether public's 
privileges are the same or not, because there are subtle differences, so 
I think it is interesting to have them side by side somewhere.



Privileges should be listed using their full name (e.g., "SELECT"), not
their internal abbreviation letter.


Hmmm... the psql commands listed in the table output the abbreviated 
letters. Using the words would result in a large table, but maybe it 
could use multiline cells.



The psql commands seem out of place here.  If you want to learn about
how to use psql, you can go to the psql documentation.


The information about how to display the permissions is currently not
easily available; I had to test/look for it, and noticed that it is not
accessible for some objects, so ISTM that it is useful to have it
somewhere.


Basically your points suggest that the table is maybe in the wrong place
and could be improved.

About the place, there is no simple choice:

 - backslash commands are "psql" specific
 - abbreviated letters are aclitem function specific, which
   happens to be used by psql.
 - full names are SQL specific (GRANT)
 - default permissions are object specific and can be modified...

Which means that the information tends to be scattered everywhere and 
overall pretty unclear unless you have read all the pages, hence my 
proposal to put some unified summary somewhere with all the relevant 
information. Any choice will have its downside, and removing information 
to better suit one place means that my point of having some kind of 
summary in one place is lost, which is the initial motivation for this 
patch.


--
Fabien.



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-09-29 Thread Michael Paquier
On Fri, Sep 28, 2018 at 07:16:25PM -0400, Stephen Frost wrote:
> An alternative would be to go through the .ready files on crash-recovery
> and remove any .ready files that don't have corresponding WAL files, or
> if we felt that it was necessary, we could do that on every restart but
> do we really think we'd need to do that..?

Actually, what you are proposing here sounds much better to me.  That's
in the area of what has been done recently with RemoveTempXlogFiles() in
5fc1008e.  Any objections to doing something like that? 
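
Something like this sketch is what I have in mind (the function name and the
exact call site are made up, this is not a patch):

#include "postgres.h"

#include <sys/stat.h>

#include "access/xlog_internal.h"
#include "storage/fd.h"

/*
 * At the end of crash recovery, remove any archive_status/*.ready file
 * whose WAL segment does not exist anymore.
 */
static void
RemoveOrphanedArchiveStatusFiles(void)
{
    DIR        *dir;
    struct dirent *de;

    dir = AllocateDir(XLOGDIR "/archive_status");
    while ((de = ReadDir(dir, XLOGDIR "/archive_status")) != NULL)
    {
        char        xlogpath[MAXPGPATH];
        char        statuspath[MAXPGPATH];
        size_t      len = strlen(de->d_name);
        struct stat st;

        /* only look at "<segment>.ready" entries */
        if (len <= 6 || strcmp(de->d_name + len - 6, ".ready") != 0)
            continue;

        /* the WAL segment name is the file name minus the suffix */
        snprintf(xlogpath, sizeof(xlogpath), XLOGDIR "/%.*s",
                 (int) (len - 6), de->d_name);
        snprintf(statuspath, sizeof(statuspath),
                 XLOGDIR "/archive_status/%s", de->d_name);

        if (stat(xlogpath, &st) != 0 && errno == ENOENT)
        {
            ereport(LOG,
                    (errmsg("removing orphaned archive status file \"%s\"",
                            statuspath)));
            durable_unlink(statuspath, WARNING);
        }
    }
    FreeDir(dir);
}
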
--
Michael




Re: Online verification of checksums

2018-09-29 Thread Tomas Vondra
Hi,

On 09/26/2018 05:15 PM, Michael Banck wrote:
> ...
> 
> New version 5 attached.
> 

I've looked at v5, and the retry/recheck logic seems OK to me - I'd
still vote to keep it consistent with what pg_basebackup does (i.e.
doing the LSN check first, before looking at the checksum), but I don't
think it's a bug.

I'm not sure about the other issues brought up (ENOENT, short reads). I
haven't given it much thought.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] proposal: schema variables

2018-09-29 Thread Pavel Stehule
On Sat, Sep 22, 2018 at 8:00 AM, Pavel Stehule
wrote:

> Hi
>
> rebased against yesterday's changes in tab-complete.c
>

rebased against the latest changes in master

Regards

Pavel



> Regards
>
> Pavel
>


schema-variables-20180929-01.patch.gz
Description: application/gzip


Re: Online enabling of checksums

2018-09-29 Thread Tomas Vondra
Hi,

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.
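
For instance, such a tool may do nothing more than this (a sketch built on the
frontend control file helper; not a quote from any particular program):

#include "postgres_fe.h"

#include "catalog/pg_control.h"
#include "common/controldata_utils.h"

static bool
checksums_enabled(const char *datadir, const char *progname)
{
    bool        crc_ok;
    ControlFileData *control = get_controlfile(datadir, progname, &crc_ok);

    if (!crc_ok)
    {
        fprintf(stderr, "%s: pg_control CRC does not match\n", progname);
        exit(1);
    }

    /* nonzero means data checksums are enabled -- read once, at startup */
    return control->data_checksum_version > 0;
}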

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2018-09-29 Thread Tomas Vondra
Hi,

One more thought - when running similar tools on a live system, it's
usually a good idea to limit the impact by throttling the throughput. As
the verification runs in an independent process it can't reuse the
vacuum-like cost limit directly, but perhaps it could do something
similar? Like, limit the number of blocks read/second, or so?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2018-09-29 Thread Michael Paquier
On Sat, Sep 29, 2018 at 10:51:23AM +0200, Tomas Vondra wrote:
> One more thought - when running similar tools on a live system, it's
> usually a good idea to limit the impact by throttling the throughput. As
> the verification runs in an independent process it can't reuse the
> vacuum-like cost limit directly, but perhaps it could do something
> similar? Like, limit the number of blocks read/second, or so?

When it comes to such parameters, not using a number of blocks but
throttling with a value in bytes (kB or MB of course) speaks more to the
user.  The past experience with checkpoint_segments is one example of
that.  Converting that to a number of blocks internally would definitely
make the most sense.  +1 for this idea.
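
A minimal sketch of such throttling (hypothetical, not from the patch;
max_rate is the converted value in bytes per second):

#include "postgres_fe.h"
#include "portability/instr_time.h"

static instr_time scan_start;       /* set by throttle_init() */
static int64 total_bytes_read = 0;

/* call once, just before the scan starts */
static void
throttle_init(void)
{
    INSTR_TIME_SET_CURRENT(scan_start);
}

/* call after reading nbytes; sleeps if we are ahead of the target rate */
static void
throttle(int64 nbytes, int64 max_rate)
{
    instr_time  now;
    double      elapsed;
    double      target;

    total_bytes_read += nbytes;

    if (max_rate <= 0)
        return;                 /* throttling disabled */

    INSTR_TIME_SET_CURRENT(now);
    elapsed = INSTR_TIME_GET_DOUBLE(now) - INSTR_TIME_GET_DOUBLE(scan_start);

    /* how long the scan should have taken so far at the target rate */
    target = (double) total_bytes_read / (double) max_rate;

    if (target > elapsed)
        pg_usleep((long) ((target - elapsed) * 1000000.0));
}
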
--
Michael




Re: Online verification of checksums

2018-09-29 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sat, Sep 29, 2018 at 10:51:23AM +0200, Tomas Vondra wrote:
> > One more thought - when running similar tools on a live system, it's
> > usually a good idea to limit the impact by throttling the throughput. As
> > the verification runs in an independent process it can't reuse the
> > vacuum-like cost limit directly, but perhaps it could do something
> > similar? Like, limit the number of blocks read/second, or so?
> 
> When it comes to such parameters, not using a number of blocks but
> throttling with a value in bytes (kB or MB of course) speaks more to the
> user.  The past experience with checkpoint_segments is one example of
> that.  Converting that to a number of blocks internally would definitely
> make the most sense.  +1 for this idea.

While I agree this would be a nice additional feature to have, it seems
like something which could certainly be added later and doesn't
necessarily have to be included in the initial patch.  If Michael has
time to add that, great, if not, I'd rather have this as-is than not.

I do tend to agree with Michael that having the parameter be specified
as (or at least able to accept) a byte-based value is a good idea.  As
another feature idea, having this able to work in parallel across
tablespaces would be nice too.  I can certainly imagine some point where
this is a default process which scans the database at a slow pace across
all the tablespaces more-or-less all the time checking for corruption.

Thanks!

Stephen




Re: Online enabling of checksums

2018-09-29 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> While looking at the online checksum verification patch (which I guess
> will get committed before this one), it occurred to me that disabling
> checksums may need to be more elaborate, to protect against someone
> using the stale flag value (instead of simply switching to "off"
> assuming that's fine).
> 
> The signals etc. seem good enough for our internal stuff, but what if
> someone uses the flag in a different way? E.g. the online checksum
> verification runs as an independent process (i.e. not a backend) and
> reads the control file to find out if the checksums are enabled or not.
> So if we just switch from "on" to "off" that will break.
> 
> Of course, we may also say "Don't disable checksums while online
> verification is running!" but that's not ideal.

I'm not really sure what else we could say here..?  I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

Thanks!

Stephen




Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread David Hedberg
Hello,

I recently wanted a way to encrypt/decrypt backups while still
utilizing the parallel dump/restore functionality. I couldn't see a
way to do this so I experimented a bit with the directory backup
format. If there's in fact already a way to do this, please tell me
now :-)

The idea is to add a --pipe option to pg_dump / pg_restore where you
can specify a custom shell command that is used to write / read each
.dat-file. Usage examples include encryption with pgp and/or custom
compression pipelines. %p in the command is expanded to the path to
write to / read from. The pipe command is not applied to the toc.

The current version is attached. Could something like this be
acceptable for inclusion?
From 27f6c541be6546edfef62646f514fe1a92042705 Mon Sep 17 00:00:00 2001
From: David Hedberg 
Date: Sat, 29 Sep 2018 12:55:52 +0200
Subject: [PATCH] Add support for --pipe to pg_dump and pg_restore

---
 src/bin/pg_dump/compress_io.c | 97 ---
 src/bin/pg_dump/compress_io.h |  6 +-
 src/bin/pg_dump/pg_backup.h   |  2 +
 src/bin/pg_dump/pg_backup_directory.c | 14 ++--
 src/bin/pg_dump/pg_dump.c | 17 -
 src/bin/pg_dump/pg_restore.c  |  7 ++
 6 files changed, 121 insertions(+), 22 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..64c06d7eae 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -443,6 +443,9 @@ struct cfp
 static int	hasSuffix(const char *filename, const char *suffix);
 #endif
 
+static void
+expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath);
+
 /* free() without changing errno; useful in several places below */
 static void
 free_keep_errno(void *p)
@@ -464,24 +467,26 @@ free_keep_errno(void *p)
  * On failure, return NULL with an error code in errno.
  */
 cfp *
-cfopen_read(const char *path, const char *mode)
+cfopen_read(const char *path, const char *mode, const char *pipecmd)
 {
 	cfp		   *fp;
 
+	if (pipecmd)
+		fp = cfopen(path, mode, 0, pipecmd);
 #ifdef HAVE_LIBZ
-	if (hasSuffix(path, ".gz"))
-		fp = cfopen(path, mode, 1);
+	else if (hasSuffix(path, ".gz"))
+		fp = cfopen(path, mode, 1, NULL);
 	else
 #endif
 	{
-		fp = cfopen(path, mode, 0);
+		fp = cfopen(path, mode, 0, NULL);
 #ifdef HAVE_LIBZ
 		if (fp == NULL)
 		{
 			char	   *fname;
 
 			fname = psprintf("%s.gz", path);
-			fp = cfopen(fname, mode, 1);
+			fp = cfopen(fname, mode, 1, NULL);
 			free_keep_errno(fname);
 		}
 #endif
@@ -501,19 +506,19 @@ cfopen_read(const char *path, const char *mode)
  * On failure, return NULL with an error code in errno.
  */
 cfp *
-cfopen_write(const char *path, const char *mode, int compression)
+cfopen_write(const char *path, const char *mode, int compression, const char *pipecmd)
 {
 	cfp		   *fp;
 
 	if (compression == 0)
-		fp = cfopen(path, mode, 0);
+		fp = cfopen(path, mode, 0, pipecmd);
 	else
 	{
 #ifdef HAVE_LIBZ
 		char	   *fname;
 
 		fname = psprintf("%s.gz", path);
-		fp = cfopen(fname, mode, compression);
+		fp = cfopen(fname, mode, compression, pipecmd);
 		free_keep_errno(fname);
 #else
 		exit_horribly(modulename, "not built with zlib support\n");
@@ -530,11 +535,32 @@ cfopen_write(const char *path, const char *mode, int compression)
  * On failure, return NULL with an error code in errno.
  */
 cfp *
-cfopen(const char *path, const char *mode, int compression)
+cfopen(const char *path, const char *mode, int compression, const char *pipecmd)
 {
 	cfp		   *fp = pg_malloc(sizeof(cfp));
 
-	if (compression != 0)
+	if (pipecmd)
+	{
+		char cmd[MAXPGPATH];
+		char pmode[2];
+
+		if ( !(mode[0] == 'r' || mode[0] == 'w') ) {
+			exit_horribly(modulename, "Pipe does not support mode %s", mode);
+		}
+		pmode[0] = mode[0];
+		pmode[1] = '\0';
+
+		expand_shell_command(cmd, MAXPGPATH, pipecmd, path);
+
+		fp->compressedfp = NULL;
+		fp->uncompressedfp = popen(cmd, pmode);
+		if (fp->uncompressedfp == NULL)
+		{
+			free_keep_errno(fp);
+			fp = NULL;
+		}
+	}
+	else if (compression != 0)
 	{
 #ifdef HAVE_LIBZ
 		if (compression != Z_DEFAULT_COMPRESSION)
@@ -731,5 +757,54 @@ hasSuffix(const char *filename, const char *suffix)
   suffix,
   suffixlen) == 0;
 }
-
 #endif
+
+/*
+ * Expand a shell command
+ *
+ * Replaces %p in cmd with the path in filepath and writes the result to buf.
+ */
+static void
+expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath)
+{
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+
+	dp = buf;
+	endp = buf + bufsize - 1;
+	*endp = '\0';
+
+	for (sp = cmd; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: absolute path of file */
+					sp++;
+					strlcpy(dp, filepath, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%':
+					/* convert %% to a single % */
+					sp++;
+					if (dp < endp)
+						*dp++ = *sp;
+					break;
+				default:
+					/* otherwise treat the %

Re: Odd 9.4, 9.3 buildfarm failure on s390x

2018-09-29 Thread Andrew Dunstan




On 09/29/2018 01:36 AM, Andrew Gierth wrote:

  Mark> What should I try next?

What is the size of a C "int" on this platform?



4.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: speeding up planning with partitions

2018-09-29 Thread Dilip Kumar
On Fri, Sep 14, 2018 at 3:58 PM, Amit Langote
 wrote:
> Thanks a lot for your detailed review.
>
I was going through your patch (v3-0002) and I have some suggestions

1.
- if (nparts > 0)
+ /*
+ * For partitioned tables, we just allocate space for RelOptInfo's.
+ * pointers for all partitions and copy the partition OIDs from the
+ * relcache.  Actual RelOptInfo is built for a partition only if it is
+ * not pruned.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
  rel->part_rels = (RelOptInfo **)
- palloc(sizeof(RelOptInfo *) * nparts);
+ palloc0(sizeof(RelOptInfo *) * rel->nparts);
+ return rel;
+ }

I think we can delay allocating memory for rel->part_rels?  And we can
allocate in add_rel_partitions_to_query only
for those partitions which survive pruning.

2.
add_rel_partitions_to_query
...
+ /* Expand the PlannerInfo arrays to hold new partition objects. */
+ num_added_parts = scan_all_parts ? rel->nparts :
+ bms_num_members(partindexes);
+ new_size = root->simple_rel_array_size + num_added_parts;
+ root->simple_rte_array = (RangeTblEntry **)
+ repalloc(root->simple_rte_array,
+ sizeof(RangeTblEntry *) * new_size);
+ root->simple_rel_array = (RelOptInfo **)
+ repalloc(root->simple_rel_array,
+ sizeof(RelOptInfo *) * new_size);
+ if (root->append_rel_array)
+ root->append_rel_array = (AppendRelInfo **)
+ repalloc(root->append_rel_array,
+ sizeof(AppendRelInfo *) * new_size);
+ else
+ root->append_rel_array = (AppendRelInfo **)
+ palloc0(sizeof(AppendRelInfo *) *
+ new_size);

Instead of re-pallocing for every partitioned relation, can't we first
count the total number of surviving partitions and repalloc once?
(A rough sketch of what I mean is at the end of this mail.)

3.
+ /*
+ * And do prunin.  Note that this adds AppendRelInfo's of only the
+ * partitions that are not pruned.
+ */

prunin/pruning
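
For point 2, something like this sketch is what I have in mind
(live_partindexes is a made-up name for whatever per-rel set of surviving
partitions we keep; append_rel_array would be grown the same way):

#include "postgres.h"

#include "nodes/bitmapset.h"
#include "nodes/relation.h"

static void
expand_planner_arrays_once(PlannerInfo *root, List *partitioned_rels)
{
    int         num_added_parts = 0;
    int         new_size;
    ListCell   *lc;

    /* first pass: count surviving partitions across all partitioned rels */
    foreach(lc, partitioned_rels)
    {
        RelOptInfo *rel = (RelOptInfo *) lfirst(lc);

        num_added_parts += bms_num_members(rel->live_partindexes);
    }

    new_size = root->simple_rel_array_size + num_added_parts;

    /* then grow the arrays once */
    root->simple_rte_array = (RangeTblEntry **)
        repalloc(root->simple_rte_array,
                 sizeof(RangeTblEntry *) * new_size);
    root->simple_rel_array = (RelOptInfo **)
        repalloc(root->simple_rel_array,
                 sizeof(RelOptInfo *) * new_size);

    /* newly added RelOptInfo slots start out empty */
    MemSet(root->simple_rel_array + root->simple_rel_array_size, 0,
           sizeof(RelOptInfo *) * num_added_parts);

    root->simple_rel_array_size = new_size;
}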

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Postgres 11 release notes

2018-09-29 Thread Bruce Momjian
On Sat, Sep 29, 2018 at 04:28:13PM +0900, Amit Langote wrote:
> > The default partition can store rows that don't match any of the
> > other defined partitions, and is searched accordingly.
> 
> I was commenting on Jonathan's patch upthread [1] whereby he
> proposed to add a line about the default partition feature in the
> major partitioning enhancements section at the top.
> 
> The existing text that you'd added when creating the release notes is fine.
> 
> > The press release?
> 
> Yeah, Alvaro pointed out that the press release has inaccurate text
> about the default partition, but that's a separate issue, which I was
> unaware of.

Oh, OK, thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread Stephen Frost
Greetings,

* David Hedberg (david.hedb...@gmail.com) wrote:
> I recently wanted a way to encrypt/decrypt backups while still
> utilizing the parallel dump/restore functionality. I couldn't see a
> way to do this so I experimented a bit with the directory backup
> format. If there's in fact already a way to do this, please tell me
> now :-)

Supporting encryption/decryption is certainly a good idea but I'm not
sure that we want to punt like this and expect the user to provide a
shell script or similar to do it.  I would have thought we'd build in
encryption leveraging openssl (and, ideally, other providers, similar to
what we're working to do with SSL) directly.

> The idea is to add a --pipe option to pg_dump / pg_restore where you
> can specify a custom shell command that is used to write / read each
> .dat-file. Usage examples include encryption with pgp and/or custom
> compression pipelines. %p in the command is expanded to the path to
> write to / read from. The pipe command is not applied to the toc.

I would certainly think that we'd want to have support for custom format
dumps too..

> The current version is attached. Could something like this be
> acceptable for inclusion?

At least for my 2c, I'm not completely against it, but I'd much rather
see us providing encryption directly and for all of the formats we
support, doing intelligent things like encrypting the TOC for a custom
format dump independently so we can still support fast restore of
individual objects and such.  I'm also not entirely sure about how well
this proposed approach would work on Windows..

Thanks!

Stephen




Cygwin linking rules

2018-09-29 Thread Tom Lane
Most of the buildfarm is now happy with the changes I made to have
libpq + ecpg get src/port and src/common files via libraries ...
but lorikeet isn't.  It gets through the core regression tests fine
(so libpq, per se, works), but contrib/dblink fails:

! ERROR:  could not establish connection
! DETAIL:  libpq is incorrectly linked to backend functions

What this means is that libpq is calling the backend version of
src/common/link-canary.c rather than the frontend version.
Why would it do the right thing normally and the wrong thing in dblink?

I can think of a few theories but I lack the ability to investigate:

1. Maybe the dblink.dll build is pulling in libpq.a rather than
establishing a reference to libpq.dll.  If so, the wrong things would
happen because libpq.a won't contain the src/common/ files that
libpq needs.  (It seems like libpq.a is an active hazard given
this.  Why are we building it at all?)

2. Maybe we need a --version-script option or something equivalent
to get libpq.dll's references to be preferentially resolved internally
rather than to the backend.  But this doesn't really explain why it
worked properly before.

regards, tom lane



Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread Tom Lane
Stephen Frost  writes:
> * David Hedberg (david.hedb...@gmail.com) wrote:
>> The idea is to add a --pipe option to pg_dump / pg_restore where you
>> can specify a custom shell command that is used to write / read each
>> .dat-file. Usage examples include encryption with pgp and/or custom
>> compression pipelines. %p in the command is expanded to the path to
>> write to / read from. The pipe command is not applied to the toc.

> I would certainly think that we'd want to have support for custom format
> dumps too..

This seems like rather a kluge :-(.  In the context of encrypted dumps
in particular, I see no really safe way to pass an encryption key down
to the custom command --- either you put it in the command line to be
exec'd, or you put it in the process environment, and neither of those
are secure on all platforms.

The assumption that the TOC doesn't need encryption seems pretty
shaky as well.

So I think we'd be better off proceeding as Stephen envisions.
Maybe there are use-cases for the sort of thing David is proposing,
but I don't think encrypted dumps present a good argument for it.

regards, tom lane



Re: Online verification of checksums

2018-09-29 Thread Tomas Vondra



On 09/29/2018 02:14 PM, Stephen Frost wrote:
> Greetings,
> 
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> On Sat, Sep 29, 2018 at 10:51:23AM +0200, Tomas Vondra wrote:
>>> One more thought - when running similar tools on a live system, it's
>>> usually a good idea to limit the impact by throttling the throughput. As
>>> the verification runs in an independent process it can't reuse the
>>> vacuum-like cost limit directly, but perhaps it could do something
>>> similar? Like, limit the number of blocks read/second, or so?
>>
>> When it comes to such parameters, not using a number of blocks but
>> throttling with a value in bytes (kB or MB of course) speaks more to the
>> user.  The past experience with checkpoint_segments is one example of
>> that.  Converting that to a number of blocks internally would definitely
>> make the most sense.  +1 for this idea.
> 
> While I agree this would be a nice additional feature to have, it seems
> like something which could certainly be added later and doesn't
> necessarily have to be included in the initial patch.  If Michael has
> time to add that, great, if not, I'd rather have this as-is than not.
> 

True, although I don't think it'd be particularly difficult.

> I do tend to agree with Michael that having the parameter be specified
> as (or at least able to accept) a byte-based value is a good idea.

Sure, I was not really expecting it to be exposed as raw block count. I
agree it should be in byte-based values (i.e. just like --max-rate in
pg_basebackup).

> As another feature idea, having this able to work in parallel across
> tablespaces would be nice too.  I can certainly imagine some point where
> this is a default process which scans the database at a slow pace across
> all the tablespaces more-or-less all the time checking for corruption.
> 

Maybe, but that's certainly a non-trivial feature.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread David Fetter
On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
> Stephen Frost  writes:
> > * David Hedberg (david.hedb...@gmail.com) wrote:
> >> The idea is to add a --pipe option to pg_dump / pg_restore where
> >> you can specify a custom shell command that is used to write /
> >> read each .dat-file. Usage examples include encryption with pgp
> >> and/or custom compression pipelines. %p in the command is
> >> expanded to the path to write to / read from. The pipe command is
> >> not applied to the toc.
> 
> > I would certainly think that we'd want to have support for custom
> > format dumps too..
> 
> This seems like rather a kluge :-(.  In the context of encrypted
> dumps in particular, I see no really safe way to pass an encryption
> key down to the custom command --- either you put it in the command
> line to be exec'd, or you put it in the process environment, and
> neither of those are secure on all platforms.

As I understand it, those are the options for providing secrets in
general. At least in the case of encryption, one good solution would
be to use an asymmetric encryption scheme, i.e. one where encrypting
doesn't expose a secret in any way.

As to decryption, that's generally done with more caution in
environments where things are being routinely encrypted in the first
place.

> The assumption that the TOC doesn't need encryption seems pretty
> shaky as well.

That it does.

> So I think we'd be better off proceeding as Stephen envisions.
> Maybe there are use-cases for the sort of thing David is proposing,
> but I don't think encrypted dumps present a good argument for it.

Dumping over a network seems like a reasonable use case for this. I
know that we have remote ways to do this, but in some
environments--think FedRAMP, or similar compliance regime--setting up
remote access to do the dump can cause extra headaches. Being able
to encrypt them in the process would be helpful in situations I've
seen in the past week.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Online enabling of checksums

2018-09-29 Thread Tomas Vondra
On 09/29/2018 02:19 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
>> While looking at the online checksum verification patch (which I guess
>> will get committed before this one), it occurred to me that disabling
>> checksums may need to be more elaborate, to protect against someone
>> using the stale flag value (instead of simply switching to "off"
>> assuming that's fine).
>>
>> The signals etc. seem good enough for our internal stuff, but what if
>> someone uses the flag in a different way? E.g. the online checksum
>> verification runs as an independent process (i.e. not a backend) and
>> reads the control file to find out if the checksums are enabled or not.
>> So if we just switch from "on" to "off" that will break.
>>
>> Of course, we may also say "Don't disable checksums while online
>> verification is running!" but that's not ideal.
> 
> I'm not really sure what else we could say here..?  I don't particularly
> see an issue with telling people that if they disable checksums while
> they're running a tool that's checking the checksums that they're going
> to get odd results.
> 

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?
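
In code terms, the tool could do something like this (hypothetical helper,
not the actual patch):

#include "postgres_fe.h"

#include "storage/bufpage.h"

/*
 * Return true if a checksum mismatch on this page should be ignored:
 * the page LSN is newer than the checkpoint LSN we read from pg_control
 * at startup, so the page may have been (re)written after our snapshot
 * of the cluster state -- possibly after checksums were disabled.
 */
static bool
ignore_checksum_failure(const char *page, XLogRecPtr checkpoint_lsn)
{
    const PageHeaderData *hdr = (const PageHeaderData *) page;

    return PageXLogRecPtrGet(hdr->pd_lsn) > checkpoint_lsn;
}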


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Cygwin linking rules

2018-09-29 Thread Andrew Dunstan




On 09/29/2018 11:35 AM, Tom Lane wrote:

Most of the buildfarm is now happy with the changes I made to have
libpq + ecpg get src/port and src/common files via libraries ...
but lorikeet isn't.  It gets through the core regression tests fine
(so libpq, per se, works), but contrib/dblink fails:

! ERROR:  could not establish connection
! DETAIL:  libpq is incorrectly linked to backend functions

What this means is that libpq is calling the backend version of
src/common/link-canary.c rather than the frontend version.
Why would it do the right thing normally and the wrong thing in dblink?

I can think of a few theories but I lack the ability to investigate:

1. Maybe the dblink.dll build is pulling in libpq.a rather than
establishing a reference to libpq.dll.  If so, the wrong things would
happen because libpq.a won't contain the src/common/ files that
libpq needs.  (It seems like libpq.a is an active hazard given
this.  Why are we building it at all?)

2. Maybe we need a --version-script option or something equivalent
to get libpq.dll's references to be preferentially resolved internally
rather than to the backend.  But this doesn't really explain why it
worked properly before.






I will see if I can determine if 1) is the cause. I don't know enough, 
or in fact anything, about 2), so I don't know that I can help there 
without advice.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread David Hedberg
On Sat, Sep 29, 2018 at 5:56 PM, David Fetter  wrote:
> On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
>> Stephen Frost  writes:
>> > * David Hedberg (david.hedb...@gmail.com) wrote:
>> >> The idea is to add a --pipe option to pg_dump / pg_restore where
>> >> you can specify a custom shell command that is used to write /
>> >> read each .dat-file. Usage examples include encryption with pgp
>> >> and/or custom compression pipelines. %p in the command is
>> >> expanded to the path to write to / read from. The pipe command is
>> >> not applied to the toc.
>>
>> > I would certainly think that we'd want to have support for custom
>> > format dumps too..
>>
>> This seems like rather a kluge :-(.  In the context of encrypted
>> dumps in particular, I see no really safe way to pass an encryption
>> key down to the custom command --- either you put it in the command
>> line to be exec'd, or you put it in the process environment, and
>> neither of those are secure on all platforms.
>
> As I understand it, those are the options for providing secrets in
> general. At least in the case of encryption, one good solution would
> be to use an asymmetric encryption scheme, i.e. one where encrypting
> doesn't expose a secret in any way.
>
> As to decryption, that's generally done with more caution in
> environments where things are being routinely encrypted in the first
> place.
>

Yes; in my specific case the idea is to use public key encryption with
gpg. In that scenario the secret does not need to be on the server at
all.

>> The assumption that the TOC doesn't need encryption seems pretty
>> shaky as well.
>
> That it does.
>

I don't think there's any inherent reason it can't be applied to the
TOC as well. It's mostly an accident of me following the existing
compression code.

On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost  wrote:
> At least for my 2c, I'm not completely against it, but I'd much rather
> see us providing encryption directly and for all of the formats we
> support, doing intelligent things like encrypting the TOC for a custom
> format dump independently so we can still support fast restore of
> individual objects and such.  I'm also not entirely sure about how well
> this proposed approach would work on Windows..

I haven't tested it on Windows, but I did see that there's already a
popen function in src/port/system.c, so my guess is that it can work.

Generally, my thinking is that this can be pretty useful in general
besides encryption. For other formats the dumps can already be written
to standard output and piped through for example gpg or a custom
compression application of the administrator's choice, so in a sense
this functionality would merely add the same feature to the directory
format.

My main wish here is to be able to combine a parallel dump/restore with
encryption without having to first write the dump unencrypted and then
loop over and rewrite the files encrypted in an extra step. This can
surely be quite a large win as the size of the dumps grows larger.

/ David



Re: Implementing SQL ASSERTION

2018-09-29 Thread Joe Wildish
On 26 Sep 2018, at 12:36, Peter Eisentraut  
wrote:
> 
> On 25/09/2018 01:04, Joe Wildish wrote:
>> Having said all that: there are obviously going to be some expressions
>> that cannot be proven to have no potential for invalidating the assertion
>> truth. I guess this is the prime concern from a concurrency PoV?
> 
> Before we spend more time on this, I think we need to have at least a
> plan for that.  

Having thought about this some more: the answer could lie in using predicate
locks, and enforcing that the transaction be SERIALIZABLE whenever an ASSERTION
is triggered.

To make use of the predicate locks we'd do a transformation on the ASSERTION
expression. I believe that there is derivation, similar to the one mentioned
up-thread re: "managers and administrators", that would essentially push
predicates into the expression on the basis of the changed data. The semantics
of the expression would remain unchanged, but it would mean that when the
expression is rechecked, the minimal set of data is read and would therefore not
conflict with other DML statements that had triggered the same ASSERTION but had
modified unrelated data. Example:

CREATE TABLE t
 (n INTEGER NOT NULL,
  m INTEGER NOT NULL,
  k INTEGER NOT NULL,
 PRIMARY KEY (n, m));

CREATE ASSERTION sum_k_at_most_10 CHECK
  (NOT EXISTS
(SELECT * FROM
  (SELECT n, sum(k)
 FROM t
GROUP BY n)
 AS r(n, ks)
  WHERE ks > 10));

On an INSERT/DELETE/UPDATE of "t", we would transform the inner-most expression
of the ASSERTION to have a predicate of "WHERE n = NEW.n". In my experiments I
can see that doing so allows concurrent transactions to COMMIT that have
modified unrelated segments of "t" (assuming the planner uses Index Scan). The
efficacy of this would be dictated by the granularity of the SIREAD locks; my
understanding is that this can be as low as tuple-level in the case where Index
Scans are used (and this is borne out in my experiments - ie. you don't want a
SeqScan).

> Perhaps we should disallow cases that we can't
> handle otherwise.  But even that would need some analysis of which
> practical cases we can and cannot handle, how we could extend support in
> the future, etc.


The optimisation I mentioned up-thread, plus the one hypothesised here, both
rely on being able to derive the key of an expression from the underlying base
tables/other expressions. We could perhaps disallow ASSERTIONS that don't have
such properties?

Beyond that I think it starts to get difficult (impossible?) to know which
expressions are likely to be costly on the basis of static analysis. It could be
legitimate to have an ASSERTION defined over what turns out to be a small subset
of a very large table, for example.

-Joe






Re: Implementing SQL ASSERTION

2018-09-29 Thread Joe Wildish
Hi Andrew,

On 25 Sep 2018, at 01:51, Andrew Gierth  wrote:
> I haven't looked at the background of this, but if what you want to know
> is whether the aggregate function has the semantics of min() or max()
> (and if so, which) then the place to look is pg_aggregate.aggsortop.

Thanks for the pointer. I've had a quick look at pg_aggregate, and back
at my code, but I think there is more to it than just the sorting property.
Specifically we need to know about the aggregate function when combined with
connectors <, <=, < ANY, <= ANY, < ALL and <= ALL (and their equivalents
with ">" and ">="). Also, it looks like COUNT and SUM don't have a sortop
(the other aggregates I've catered for do though).

When I come to do the rework of the patch I'll take a more in-depth look
though, and see if this can be utilised.

> As for operators, you can only make assumptions about their meaning if
> the operator is a member of some opfamily that assigns it some
> semantics. 

I had clocked the BT semantics stuff when doing the PoC patch. I have used
the "get_op_btree_interpretation" function for determining operator meaning.

-Joe




Re: Online enabling of checksums

2018-09-29 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >> While looking at the online checksum verification patch (which I guess
> >> will get committed before this one), it occurred to me that disabling
> >> checksums may need to be more elaborate, to protect against someone
> >> using the stale flag value (instead of simply switching to "off"
> >> assuming that's fine).
> >>
> >> The signals etc. seem good enough for our internal stuff, but what if
> >> someone uses the flag in a different way? E.g. the online checksum
> >> verification runs as an independent process (i.e. not a backend) and
> >> reads the control file to find out if the checksums are enabled or not.
> >> So if we just switch from "on" to "off" that will break.
> >>
> >> Of course, we may also say "Don't disable checksums while online
> >> verification is running!" but that's not ideal.
> > 
> > I'm not really sure what else we could say here..?  I don't particularly
> > see an issue with telling people that if they disable checksums while
> > they're running a tool that's checking the checksums that they're going
> > to get odd results.
> 
> I don't know, to be honest. I was merely looking at the online
> verification patch and realized that if someone disables checksums it
> won't notice it (because it only reads the flag once, at the very
> beginning) and will likely produce bogus errors.
> 
> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> fix it. The checkpoint LSN is read from the same controlfile as the
> flag, so we know the checksums were enabled during that checkpoint. So
> if we ignore failures with a newer LSN, that should do the trick, no?
> 
> So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Thanks!

Stephen




Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread Stephen Frost
Greetings,

* David Hedberg (david.hedb...@gmail.com) wrote:
> On Sat, Sep 29, 2018 at 5:56 PM, David Fetter  wrote:
> > On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
> > As I understand it, those are the options for providing secrets in
> > general. At least in the case of encryption, one good solution would
> > be to use an asymmetric encryption scheme, i.e. one where encrypting
> > doesn't expose a secret in any way.
> >
> > As to decryption, that's generally done with more caution in
> > environments where things are being routinely encrypted in the first
> > place.
> 
> Yes; in my specific case the idea is to use public key encryption with
> gpg. In that scenario the secret does not need to be on the server at
> all.

Using public key encryption doesn't mean you get to entirely avoid the
question around how to handle secrets- you'll presumably want to
actually restore the dump at some point.

> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost  wrote:
> > At least for my 2c, I'm not completely against it, but I'd much rather
> > see us providing encryption directly and for all of the formats we
> > support, doing intelligent things like encrypting the TOC for a custom
> > format dump independently so we can still support fast restore of
> > individual objects and such.  I'm also not entirely sure about how well
> > this proposed approach would work on Windows..
> 
> I haven't tested it in windows, but I did see that there's already a
> popen function in src/port/system.c so my guess was going to be that
> it can work..

Perhaps, though these things tend to be trickier on Windows, at least
from what I've seen (I'm no Windows dev myself tho, to be clear).

> Generally, my thinking is that this can be pretty useful in general
> besides encryption. For other formats the dumps can already be written
> to standard output and piped through for example gpg or a custom
> compression application of the administrators choice, so in a sense
> this functionality would merely add the same feature to the directory
> format.

That's certainly not the same though.  One of the great advantages of
custom and directory format dumps is the TOC and the ability to
selectively extract data from them without having to read the entire
dump file.  You end up losing that if you have to pass the entire dump
through something else because you're using the pipe.

> My main wish here is to be able combine a parallel dump/restore with
> encryption without having to first write the dump encrypted and then
> loop over and rewrite the files encrypted in an extra step. This can
> surely be quite a large win as the size of the dumps grow larger..

That's great, and I think we agree that it'd be a very nice feature for
pg_dump/restore to support encryption, but done intelligently, across
the formats that pg_dump supports, with a secure way to pass the
secrets.

Thanks!

Stephen




Re: Cygwin linking rules

2018-09-29 Thread Andrew Dunstan




On 09/29/2018 12:09 PM, Andrew Dunstan wrote:



On 09/29/2018 11:35 AM, Tom Lane wrote:

Most of the buildfarm is now happy with the changes I made to have
libpq + ecpg get src/port and src/common files via libraries ...
but lorikeet isn't.  It gets through the core regression tests fine
(so libpq, per se, works), but contrib/dblink fails:

! ERROR:  could not establish connection
! DETAIL:  libpq is incorrectly linked to backend functions

What this means is that libpq is calling the backend version of
src/common/link-canary.c rather than the frontend version.
Why would it do the right thing normally and the wrong thing in dblink?

I can think of a few theories but I lack the ability to investigate:

1. Maybe the dblink.dll build is pulling in libpq.a rather than
establishing a reference to libpq.dll.  If so, the wrong things would
happen because libpq.a won't contain the src/common/ files that
libpq needs.  (It seems like libpq.a is an active hazard given
this.  Why are we building it at all?)

2. Maybe we need a --version-script option or something equivalent
to get libpq.dll's references to be preferentially resolved internally
rather than to the backend.  But this doesn't really explain why it
worked properly before.






I will see if I can determine if 1) is the cause. I don't know enough, 
or in fact anything, about 2), so I don't know that I can help there 
without advice.







It certainly looks like it's not linked to libpq.dll:

   Microsoft (R) COFF/PE Dumper Version 14.15.26726.0
   Copyright (C) Microsoft Corporation.  All rights reserved.


   Dump of file
   \cygwin64\home\andrew\\bf64\root\HEAD\inst\lib\postgresql\dblink.dll

   File Type: DLL

  Image has the following dependencies:

    postgres.exe
    cygcrypto-1.0.0.dll
    cygwin1.dll
    cygssl-1.0.0.dll
    KERNEL32.dll


I'll build an earlier version to do a comparison just to make sure we're 
seeing the right thing.



cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Implementing SQL ASSERTION

2018-09-29 Thread Joe Wildish
Hi David,

> On 26 Sep 2018, at 19:47, David Fetter  wrote:
> 
>> Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)".
> 
> So would DELETE(t), assuming n can be negative.

Oops, right you are. Bug in my implementation :-) 

> Is there some interesting and fairly easily documented subset of
> ASSERTIONs that wouldn't have the "can't prove" property?

We can certainly know at the time the ASSERTION is created if we
can use the transition table optimisation, as that relies upon
the expression being written in such a way that a key can be
derived for each expression.

We could warn or disallow the creation on that basis. Ceri & Widom
mention this actually in their papers, and their view is that most
real-world use cases do indeed allow themselves to be optimised
using the transition tables.

-Joe





Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread David Hedberg
Hi,

On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost  wrote:
> Greetings,
>
> * David Hedberg (david.hedb...@gmail.com) wrote:
>> On Sat, Sep 29, 2018 at 5:56 PM, David Fetter  wrote:
>> > On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
>> > As I understand it, those are the options for providing secrets in
>> > general. At least in the case of encryption, one good solution would
>> > be to use an asymmetric encryption scheme, i.e. one where encrypting
>> > doesn't expose a secret in any way.
>> >
>> > As to decryption, that's generally done with more caution in
>> > environments where things are being routinely encrypted in the first
>> > place.
>>
>> Yes; in my specific case the idea is to use public key encryption with
>> gpg. In that scenario the secret does not need to be on the server at
>> all.
>
> Using public key encryption doesn't mean you get to entirely avoid the
> question around how to handle secrets- you'll presumably want to
> actually restore the dump at some point.
>

You are right of course. But I don't see how it's more difficult to
pass the secret to the piped commands than it is to pass it to
postgres.

You wouldn't want to pass the secrets as options to the commands of
course. In the case of gpg you would probably let gpg store and handle
them, which seems to me about the same as letting postgres store them.

>> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost  wrote:
>> Generally, my thinking is that this can be pretty useful in general
>> besides encryption. For other formats the dumps can already be written
>> to standard output and piped through for example gpg or a custom
>> compression application of the administrators choice, so in a sense
>> this functionality would merely add the same feature to the directory
>> format.
>
> That's certainly not the same though.  One of the great advantages of
> custom and directory format dumps is the TOC and the ability to
> selectively extract data from them without having to read the entire
> dump file.  You end up losing that if you have to pass the entire dump
> through something else because you're using the pipe.
>

I can maybe see the problem here, but I apologize if I'm missing the point.

Since all the files are individually passed through separate instances
of the pipe, they can also be individually restored. I guess the
--list option could be (adapted to be) used to produce a clear text
TOC to further use in selective decryption of the rest of the archive?
Possibly combined with an option to not apply the pipeline commands to
the TOC during dump and/or restore, if there's any need for that.

I do think that I understand the advantages of having a TOC that
describes the exact format of the dump and how to restore it, and I am
in no way arguing against having encryption included natively in the
format as a default option.

But I think the pipe option, or one like it, could be used to easily
extend the format. Easily supporting a different compression
algorithm, a different encryption method or even a different storage
method like uploading the files directly to a bucket in S3. In this
way I think that it's similar to be able to write the other formats to
stdout; there are probably many different usages of it out there,
including custom compression or encryption.

If this is simply outside the scope of the directory or the custom
format, that is certainly understandable (and, to me, somewhat
regrettable :-) ).


Thank you for the answers,
David



Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread Andres Freund
Hi,

On 2018-09-29 14:51:33 +0200, David Hedberg wrote:
> I recently wanted a way to encrypt/decrypt backups while still
> utilizing the parallel dump/restore functionality. I couldn't see a
> way to do this so I experimented a bit with the directory backup
> format. If there's in fact already a way to do this, please tell me
> now :-)
> 
> The idea is to add a --pipe option to pg_dump / pg_restore where you
> can specify a custom shell command that is used to write / read each
> .dat-file. Usage examples include encryption with pgp and/or custom
> compression pipelines. %p in the command is expanded to the path to
> write to / read from. The pipe command is not applied to the toc.
> 
> The current version is attached. Could something like this be
> acceptable for inclusion?

Isn't that a bit unsatisfying because information about the tables and
their sizes leaks?

My suspicion is, and was, that we're probably at some point going to
want a format that supports the features the directory format does,
without requiring writes to multiple files...

- Andres



Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread Stephen Frost
Greetings,

* David Hedberg (david.hedb...@gmail.com) wrote:
> On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost  wrote:
> > * David Hedberg (david.hedb...@gmail.com) wrote:
> >> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost  wrote:
> >> Generally, my thinking is that this can be pretty useful in general
> >> besides encryption. For other formats the dumps can already be written
> >> to standard output and piped through for example gpg or a custom
> >> compression application of the administrators choice, so in a sense
> >> this functionality would merely add the same feature to the directory
> >> format.
> >
> > That's certainly not the same though.  One of the great advantages of
> > custom and directory format dumps is the TOC and the ability to
> > selectively extract data from them without having to read the entire
> > dump file.  You end up losing that if you have to pass the entire dump
> > through something else because you're using the pipe.
> 
> I can maybe see the problem here, but I apologize if I'm missing the point.
> 
> Since all the files are individually passed through separate instances
> of the pipe, they can also be individually restored. I guess the
> --list option could be (adopted to be) used to produce a clear text
> TOC to further use in selective decryption of the rest of the archive?

This can work for directory format, but it wouldn't work for custom
format.  For a custom format dump, we'd need a way to encrypt the TOC
independently of the rest, and we might even want to have the TOC
include individual keys for the different objects or similar.

> Possibly combined with an option to not apply the pipeline commands to
> the TOC during dump and/or restore, if there's any need for that.

That certainly doesn't seem to make things simpler or to be a very good
interface.

> But I think the pipe option, or one like it, could be used to easily
> extend the format. Easily supporting a different compression
> algorithm, a different encryption method or even a different storage
> method like uploading the files directly to a bucket in S3. In this
> way I think that it's similar to be able to write the other formats to
> stdout; there are probably many different usages of it out there,
> including custom compression or encryption.

Considering the difficulty in doing selective restores (one of the
primary reasons for doing a logical dump at all, imv) from a dump file
that has to be completely decrypted or decompressed (due to using a
custom compression method), I don't know that I really buy off on this
argument that it's very commonly done or that it's a particularly good
interface to use.

> If this is simply outside the scope of the directory or the custom
> format, that is certainly understandable (and, to me, somewhat
> regrettable :-) ).

What I think isn't getting through is that while this is an interesting
approach, it really isn't a terribly good one, regardless of how
flexible you view it to be.  The way to move this forward seems pretty
clearly to work on adding generalized encryption support to
pg_dump/restore that doesn't depend on calling external programs
underneath of the directory format with a pipe.

Thanks!

Stephen




Re: Cygwin linking rules

2018-09-29 Thread Marco Atzeri

Am 29.09.2018 um 19:03 schrieb Andrew Dunstan:



On 09/29/2018 12:09 PM, Andrew Dunstan wrote:



On 09/29/2018 11:35 AM, Tom Lane wrote:

Most of the buildfarm is now happy with the changes I made to have
libpq + ecpg get src/port and src/common files via libraries ...
but lorikeet isn't.  It gets through the core regression tests fine
(so libpq, per se, works), but contrib/dblink fails:

! ERROR:  could not establish connection
! DETAIL:  libpq is incorrectly linked to backend functions

What this means is that libpq is calling the backend version of
src/common/link-canary.c rather than the frontend version.
Why would it do the right thing normally and the wrong thing in dblink?

I can think of a few theories but I lack the ability to investigate:

1. Maybe the dblink.dll build is pulling in libpq.a rather than
establishing a reference to libpq.dll.  If so, the wrong things would
happen because libpq.a won't contain the src/common/ files that
libpq needs.  (It seems like libpq.a is an active hazard given
this.  Why are we building it at all?)

2. Maybe we need a --version-script option or something equivalent
to get libpq.dll's references to be preferentially resolved internally
rather than to the backend.  But this doesn't really explain why it
worked properly before.






I will see if I can determine if 1) is the cause. I don't know enough, 
or in fact anything, about 2), so I don't know that I can help there 
without advice.







It certainly looks like it's not linked to libpq.dll:

    Microsoft (R) COFF/PE Dumper Version 14.15.26726.0
    Copyright (C) Microsoft Corporation.  All rights reserved.


    Dump of file
    \cygwin64\home\andrew\\bf64\root\HEAD\inst\lib\postgresql\dblink.dll

    File Type: DLL

       Image has the following dependencies:

         postgres.exe
         cygcrypto-1.0.0.dll
         cygwin1.dll
         cygssl-1.0.0.dll
         KERNEL32.dll


I'll build an earlier version to do a comparison just to make sure we're 
seeing the right thing.



cheers

andrew


Building from git and using the attached patch, which is used for all 
Cygwin packages on the latest Cygwin:


$ uname -svrm
CYGWIN_NT-10.0 2.11.1(0.329/5/3) 2018-09-05 10:24 x86_64

I do not see the problem

== creating database "contrib_regression" ==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==
test paths... ok
test dblink   ... ok
== shutting down postmaster   ==


 $ objdump -x usr/lib/postgresql/dblink.dll |grep "DLL Name:"
DLL Name: postgres.exe
DLL Name: cygpq-5.dll
DLL Name: cygwin1.dll
DLL Name: KERNEL32.dll


I am wondering if I am testing the same thing.
---
$ git log |head
commit 8bddc864000f56d396621d4ad0f13e8e1872ddf5
Author: Stephen Frost 
Date:   Fri Sep 28 19:04:50 2018 -0400

Add application_name to connection authorized msg

The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so 
let's

add that as it can be very useful.
---



--- origsrc/postgresql-9.4.2/src/Makefile.shlib 2015-05-20 00:33:58.0 +0200
+++ src/Makefile.shlib  2015-05-27 23:01:09.379468300 +0200
@@ -267,7 +267,7 @@ endif
 ifeq ($(PORTNAME), cygwin)
   LINK.shared  = $(CC) -shared
   ifdef SO_MAJOR_VERSION
-shlib  = cyg$(NAME)$(DLSUFFIX)
+shlib  = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
   endif
   haslibarule   = yes
 endif
@@ -359,12 +359,9 @@ ifeq ($(PORTNAME), cygwin)
 # Cygwin case
 
 $(shlib): $(OBJS) | $(SHLIB_PREREQS)
-   $(CC) $(CFLAGS)  -shared -o $@  $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
+   $(CC) $(CFLAGS)  -shared -o $@ -Wl,--out-implib=$(stlib) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
 
-$(stlib): $(OBJS) | $(SHLIB_PREREQS)
-   rm -f $@
-   $(LINK.static) $@ $^
-   $(RANLIB) $@
+$(stlib): $(shlib) ;
 
 else
 
--- origsrc/postgresql-9.4.2/src/interfaces/libpq/Makefile  2015-05-20 00:33:58.0 +0200
+++ src/interfaces/libpq/Makefile   2015-05-27 22:56:43.193200600 +0200
@@ -45,7 +45,7 @@ OBJS += ip.o md5.o
 OBJS += encnames.o wchar.o
 
 ifeq ($(PORTNAME), cygwin)
-override shlib = cyg$(NAME)$(DLSUFFIX)
+override shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 
 ifeq ($(PORTNAME), win32)


Re: Cygwin linking rules

2018-09-29 Thread Andrew Dunstan




On 09/29/2018 01:03 PM, Andrew Dunstan wrote:



On 09/29/2018 12:09 PM, Andrew Dunstan wrote:



On 09/29/2018 11:35 AM, Tom Lane wrote:

Most of the buildfarm is now happy with the changes I made to have
libpq + ecpg get src/port and src/common files via libraries ...
but lorikeet isn't.  It gets through the core regression tests fine
(so libpq, per se, works), but contrib/dblink fails:

! ERROR:  could not establish connection
! DETAIL:  libpq is incorrectly linked to backend functions

What this means is that libpq is calling the backend version of
src/common/link-canary.c rather than the frontend version.
Why would it do the right thing normally and the wrong thing in dblink?

I can think of a few theories but I lack the ability to investigate:

1. Maybe the dblink.dll build is pulling in libpq.a rather than
establishing a reference to libpq.dll.  If so, the wrong things would
happen because libpq.a won't contain the src/common/ files that
libpq needs.  (It seems like libpq.a is an active hazard given
this.  Why are we building it at all?)

2. Maybe we need a --version-script option or something equivalent
to get libpq.dll's references to be preferentially resolved internally
rather than to the backend.  But this doesn't really explain why it
worked properly before.






I will see if I can determine if 1) is the cause. I don't know 
enough, or in fact anything, about 2), so I don't know that I can help 
there without advice.







It certainly looks like it's not linked to libpq.dll:

   Microsoft (R) COFF/PE Dumper Version 14.15.26726.0
   Copyright (C) Microsoft Corporation.  All rights reserved.


   Dump of file
\cygwin64\home\andrew\\bf64\root\HEAD\inst\lib\postgresql\dblink.dll

   File Type: DLL

      Image has the following dependencies:

        postgres.exe
        cygcrypto-1.0.0.dll
        cygwin1.dll
        cygssl-1.0.0.dll
        KERNEL32.dll


I'll build an earlier version to do a comparison just to make sure 
we're seeing the right thing.







Hmm. Getting the same result from REL_10_STABLE.

Not sure where to go from here.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Adding pipe support to pg_dump and pg_restore

2018-09-29 Thread David Hedberg
Hi,

On Sat, Sep 29, 2018 at 8:03 PM, Stephen Frost  wrote:
> Greetings,
>
> * David Hedberg (david.hedb...@gmail.com) wrote:
>> On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost  wrote:
>> > * David Hedberg (david.hedb...@gmail.com) wrote:
>> >> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost  wrote:
>> >> Generally, my thinking is that this can be pretty useful in general
>> >> besides encryption. For other formats the dumps can already be written
>> >> to standard output and piped through for example gpg or a custom
>> >> compression application of the administrators choice, so in a sense
>> >> this functionality would merely add the same feature to the directory
>> >> format.
>> >
>> > That's certainly not the same though.  One of the great advantages of
>> > custom and directory format dumps is the TOC and the ability to
>> > selectively extract data from them without having to read the entire
>> > dump file.  You end up losing that if you have to pass the entire dump
>> > through something else because you're using the pipe.
>>
>> I can maybe see the problem here, but I apologize if I'm missing the point.
>>
>> Since all the files are individually passed through separate instances
>> of the pipe, they can also be individually restored. I guess the
>> --list option could be (adopted to be) used to produce a clear text
>> TOC to further use in selective decryption of the rest of the archive?
>

I admit that my understanding of the custom format was naive (I have
never actually used it).

>> If this is simply outside the scope of the directory or the custom
>> format, that is certainly understandable (and, to me, somewhat
>> regrettable :-) ).
>
> What I think isn't getting through is that while this is an interesting
> approach, it really isn't a terribly good one, regardless of how
> flexible you view it to be.  The way to move this forward seems pretty
> clearly to work on adding generalized encryption support to
> pg_dump/restore that doesn't depend on calling external programs
> underneath of the directory format with a pipe.

I did get the message that it wasn't the optimal way of doing it, and
I have now also gotten the message that it's probably not really
wanted at all.


Thank you for your insights,
David



Re: executor relation handling

2018-09-29 Thread Tom Lane
David Rowley  writes:
> I've attached v10 which fixes this and aligns the WARNING text in
> ExecInitRangeTable() and addRangeTableEntryForRelation().

I started poking at this.  I thought that it would be a good cross-check
to apply just the "front half" of 0001 (i.e., creation and population of
the RTE lockmode field), and then to insert checks in each of the
"back half" places (executor, plancache, etc) that the lockmodes they
are computing today match what is in the RTE.

Those checks fell over many times in the regression tests.

There seem to be at least four distinct problems:

1. You set up transformRuleStmt to insert AccessExclusiveLock into
the "OLD" and "NEW" RTEs for a view.  This is surely wrong; we do
not want to take exclusive lock on a view just to run a query using
the view.  It should (usually, anyway) just be AccessShareLock.
However, because addRangeTableEntryForRelation insists that you
hold the requested lock type *now*, just changing the parameter
to AccessShareLock doesn't work.

I hacked around this for the moment by passing NoLock to
addRangeTableEntryForRelation and then changing rte->lockmode
after it returns, but man that's ugly.  It makes me wonder whether
addRangeTableEntryForRelation should be checking the lockmode at all.

I'm inclined to think we should remove the check for current lockmode
in addRangeTableEntryForRelation, and instead just assert that the
passed lockmode must be one of AccessShareLock, RowShareLock, or
RowExclusiveLock depending on whether the RTE is meant to represent
a plain source table, a FOR UPDATE/SHARE source table, or a target
table.  I don't think it's that helpful to be checking that the
caller got exactly the same lock, especially given the number of
places where the patch had to cheat already by using NoLock.

2. The "forUpdatePushedDown" override in AcquireRewriteLocks isn't
getting modeled in the RTE lockmode fields.  In other words, if we
have something like

CREATE VIEW vv AS SELECT * FROM tab1;
SELECT * FROM vv FOR UPDATE OF vv;

the checks fall over, because the tab1 RTE in the view's rangetable
just has AccessShareLock, but we need RowShareLock.  I fixed this
by having AcquireRewriteLocks actually modify the RTE if it is
promoting the lock level.  This is kinda grotty, but we were already
assuming that AcquireRewriteLocks could scribble on the query tree,
so it seems safe enough.

3. There remain some cases where the RTE says RowExclusiveLock but
the executor calculation indicates we only need AccessShareLock.
AFAICT, this happens only when we have a DO ALSO rule that results
in an added query that merely scans the target table.  The RTE used
for that purpose is just the original one, so it still has a lockmode
suitable for a target table.
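
A hypothetical example of such a setup (not taken from the regression tests),
where the rule's added query merely scans the target table:

    CREATE TABLE t1 (a int);
    CREATE TABLE t1_audit (total bigint);
    CREATE RULE t1_also AS ON INSERT TO t1
        DO ALSO INSERT INTO t1_audit SELECT count(*) FROM t1;

Here the rewritten also-query would only need AccessShareLock on t1 by the
executor's calculation, while the RTE carries RowExclusiveLock.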

We could probably hack the rewriter so that it changes the RTE lockmode
back down to AccessShareLock in these cases.  However, I'm inclined to
think that that is something *not* to do, and that we should let the
higher lockmode be used instead, for two reasons:

(1) Asking for both AccessShareLock and RowExclusiveLock on the same
table requires an extra trip through the shared lock manager, for little
value that I can see.

(2) If the DO ALSO rule is run before the main one, we'd be acquiring
AccessShareLock before RowExclusiveLock, resulting in deadlock hazard
due to lock upgrade.  (I think this may be a pre-existing bug, although
it could likely only manifest in corner cases such as where we're pulling
a plan tree out of plancache.  In most cases the first thing we'd acquire
on a rule target table is RowExclusiveLock in the parser, before any
rule rewriting could happen.)

4. I also notice some cases (all in FDW tests) where ExecOpenScanRelation
is choosing AccessShareLock although the RTE has RowShareLock.  These seem
to all be cases where we're implementing FOR UPDATE/SHARE via
ROW_MARK_COPY, so that this:

ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);

if (erm != NULL && erm->relation != NULL)
lockmode = NoLock;

leaves lockmode as AccessShareLock because erm->relation is NULL.
Again, I think this is probably OK, and it'd be better to use
RowShareLock for consistency with other places that might open
the rel.  It will be a change in externally-visible behavior though.

Thoughts?

regards, tom lane



Re: Cygwin linking rules

2018-09-29 Thread Tom Lane
Andrew Dunstan  writes:
> Not sure where to go from here.

What would happen if we stopped building libpq.a, so that the
linker didn't have any choice about what to do?

regards, tom lane



Re: MERGE SQL statement for PG12

2018-09-29 Thread Jaime Casanova
On Mon, 24 Sep 2018 at 05:15, Pavan Deolasee  wrote:
>
> A new version rebased against the current master is attached.
>

Hi Pavan,

A day after you posted this patch, commit
29c94e03c7d05d2b29afa1de32795ce178531246 removed ExecStoreTuple.
Am I right in believing that the change in
src/backend/executor/execMerge.c should be to use ExecStoreHeapTuple?

-   ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
+   ExecStoreHeapTuple(&tuple, mtstate->mt_existing, false);


-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Continue work on changes to recovery.conf API

2018-09-29 Thread Sergei Kornilov
Hello

thank you for review!

>>  - if present both standby.signal and recovery.signal - we use standby mode
>>  (this design from previous thread)
>
> Didn't find the discussion on that and don't understand the reasons, but
> seems OK and easy to change if we don't like it.
I meant this was already implemented in the previously proposed patch and I did 
not change it. I didn't find the discussion on that either...

>>  - recovery_target (immediate), recovery_target_name, recovery_target_time, 
>> recovery_target_xid, recovery_target_lsn are replaced to 
>> recovery_target_type and recovery_target_value (was discussed and changed in 
>> previous thread)
>
> I think this was the major point of contention. I reread the old
> thread, and it's still not clear why we need to change this. _type and
> _value look like an EAV system to me. GUC variables should be
> verifiable independent of another variable. The other idea of using
> only one variable seems better, but using two variables seems like a
> poor compromise between one and five.

> No, they MUST be independently verifiable. The interactions between the 
> check_xxx functions in this patch are utterly unsafe.

Understood, thank you.
I will implement the current set of five recovery_target* settings and would like 
to leave the reorganization for further patches.

I'm not sure about one thing. All recovery_target* settings change one internal 
recoveryTarget variable. Will the GUC infrastructure guarantee the behavior if a 
user erroneously sets multiple different recovery_target* settings? I mean, will 
check_* and assign_* be called in order, so that the latest config line wins?

>>  - trigger_file was renamed to promote_signal_file for clarify (my change, 
>> in prev thread was removed)
>
> OK to add the "promote" prefix, but I'd prefer to keep the "trigger"
> name. There is a lot of discussion and knowledge around that. Seems
> unnecessary to change.
Ok, will change to promote_trigger_file

>>  - pg_basebackup with -R (--write-recovery-conf) option will create 
>> standby.signal file and append primary_conninfo and primary_slot_name (if 
>> was used) to postgresql.auto.conf instead of writing new recovery.conf
>
> I wonder if that would cause any problems. The settings in
> postgresql.auto.conf are normally not PGC_POSTMASTER, otherwise you
> couldn't use ALTER SYSTEM to put them there. Maybe it's OK.
Actually, we can use ALTER SYSTEM to set PGC_POSTMASTER settings. I just tried 
"alter system set max_connections to 300;": postgresql.auto.conf was changed 
and it affected max_connections at the next database start.
Of course, we cannot reload PGC_POSTMASTER settings, but during startup we call 
the regular ParseConfigFile and so can override PGC_POSTMASTER.
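
For illustration, under this scheme a "pg_basebackup -R" run would create an
empty standby.signal file and append something like the following to
postgresql.auto.conf (invented example values):

    primary_conninfo = 'host=192.0.2.10 port=5432 user=replicator'
    primary_slot_name = 'standby_1'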

regards, Sergei



Re: libpq host/hostaddr/conninfo inconsistencies

2018-09-29 Thread Arthur Zakirov
Hello,

On Fri, Aug 24, 2018 at 11:22:47AM +0200, Fabien COELHO wrote:
> Attached is a rebase after 5ca00774.

I looked a little bit the patch. And I have a few notes.

> However, the actual capability is slightly different: specifying an ip
> address to "host" does work, without ensuing any name or reverse name
> look-ups, even if this is undocumented.

Agree it may have more details within the documentation.

> sh> psql "host=/tmp hostaddr=127.0.0.1"

Yeah, this example shows that the user may be confused by the output of
\conninfo. I think it is both a psql issue and a libpq issue: psql in
exec_command_conninfo() relies only on the PQhost() result. Could we add a
function PQhostType() to solve this issue?

> sh> psql "host=127.0.0.2 hostaddr=127.0.0.1"

I'm not sure that this is an issue. The user defined the host name and psql
shows it.

> sh> psql "hostaddr=127.0.0.1"

I cannot reproduce it. It gives me the message:

You are connected to database "artur" as user "artur" on host "127.0.0.1" at 
port "5432".

I think it is because of the environment (I didn't define the PGHOST
variable, for example). If so, depending on the PGHOST variable's value
("/tmp" or "127.0.0.1"), it is related to the first or the second issue.

> * Another issue with \conninfo is that if a host resolves to multiple ips,
> there is no way to know which was chosen and/or worked, although on errors
> some messages show the failing ip.

Could you explain that, please? You can use PQhost() to know the chosen host.

> * The documentation about host/hostaddr/port accepting lists is really
> added as an afterthought: the features are presented for one, and then the
> list is mentionned.

I cannot agree with you. When I was learning libpq, I found the description
of the host/hostaddr rules useful, and I disagree that it is good
to remove it (as the patch does).
Of course, this is only my point of view and others may have a different opinion.

> (3) checking that hostaddr non empty addresses are only accepted if the
> corresponding host is a name. The user must use the "host=ip" syntax
> to connect to an ip.

The patch gives me an error if I specify only hostaddr:

psql -d "hostaddr=127.0.0.1"
psql: host "/tmp" cannot have an hostaddr "127.0.0.1"

This is wrong, because I didn't specify host=/tmp.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: executor relation handling

2018-09-29 Thread Tom Lane
I wrote:
> I started poking at this.  I thought that it would be a good cross-check
> to apply just the "front half" of 0001 (i.e., creation and population of
> the RTE lockmode field), and then to insert checks in each of the
> "back half" places (executor, plancache, etc) that the lockmodes they
> are computing today match what is in the RTE.
> Those checks fell over many times in the regression tests.

Here's a version that doesn't fall over (for me), incorporating fixes
as I suggested for points 1 and 2, and weakening the back-half assertions
enough to let points 3 and 4 succeed.  I'm inclined to commit this to see
if the buildfarm can find any problems I missed.

Some cosmetic changes:

* I renamed the RTE field to rellockmode in the name of greppability.

* I rearranged the order of the arguments for
addRangeTableEntryForRelation to make it more consistent with the
other addRangeTableEntryXXX functions.

regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 4f1d365..7dfa327 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1415,6 +1415,7 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
 	rte.rtekind = RTE_RELATION;
 	rte.relid = relId;
 	rte.relkind = RELKIND_RELATION; /* no need for exactness here */
+	rte.rellockmode = AccessShareLock;
 
 	context.rtables = list_make1(list_make1(&rte));
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9176f62..4ad5868 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2549,6 +2549,7 @@ AddRelationNewConstraints(Relation rel,
 	pstate->p_sourcetext = queryString;
 	rte = addRangeTableEntryForRelation(pstate,
 		rel,
+		AccessShareLock,
 		NULL,
 		false,
 		true);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d06662b..2d5bc8a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -833,20 +833,21 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 
 	if (stmt->relation)
 	{
+		LOCKMODE	lockmode = is_from ? RowExclusiveLock : AccessShareLock;
+		RangeTblEntry *rte;
 		TupleDesc	tupDesc;
 		List	   *attnums;
 		ListCell   *cur;
-		RangeTblEntry *rte;
 
 		Assert(!stmt->query);
 
 		/* Open and lock the relation, using the appropriate lock type. */
-		rel = heap_openrv(stmt->relation,
-		  (is_from ? RowExclusiveLock : AccessShareLock));
+		rel = heap_openrv(stmt->relation, lockmode);
 
 		relid = RelationGetRelid(rel);
 
-		rte = addRangeTableEntryForRelation(pstate, rel, NULL, false, false);
+		rte = addRangeTableEntryForRelation(pstate, rel, lockmode,
+			NULL, false, false);
 		rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
 
 		tupDesc = RelationGetDescr(rel);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3d82edb..d5cb62d 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -528,6 +528,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	rte->rtekind = RTE_RELATION;
 	rte->relid = intoRelationAddr.objectId;
 	rte->relkind = relkind;
+	rte->rellockmode = RowExclusiveLock;
 	rte->requiredPerms = ACL_INSERT;
 
 	for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index cee0ef9..2fd17b2 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -567,7 +567,9 @@ RemoveRoleFromObjectPolicy(Oid roleid, Oid classid, Oid policy_id)
 			qual_expr = stringToNode(qual_value);
 
 			/* Add this rel to the parsestate's rangetable, for dependencies */
-			addRangeTableEntryForRelation(qual_pstate, rel, NULL, false, false);
+			addRangeTableEntryForRelation(qual_pstate, rel,
+		  AccessShareLock,
+		  NULL, false, false);
 
 			qual_parse_rtable = qual_pstate->p_rtable;
 			free_parsestate(qual_pstate);
@@ -589,8 +591,9 @@ RemoveRoleFromObjectPolicy(Oid roleid, Oid classid, Oid policy_id)
 			with_check_qual = stringToNode(with_check_value);
 
 			/* Add this rel to the parsestate's rangetable, for dependencies */
-			addRangeTableEntryForRelation(with_check_pstate, rel, NULL, false,
-		  false);
+			addRangeTableEntryForRelation(with_check_pstate, rel,
+		  AccessShareLock,
+		  NULL, false, false);
 
 			with_check_parse_rtable = with_check_pstate->p_rtable;
 			free_parsestate(with_check_pstate);
@@ -752,11 +755,13 @@ CreatePolicy(CreatePolicyStmt *stmt)
 
 	/* Add for the regular security quals */
 	rte = addRangeTableEntryForRelation(qual_pstate, target_table,
+		AccessShareLock,
 		NULL, false, false);
 	addRTEtoQuery(qual_pstate, rte, false, true, true);
 
 	/* Add for the with-check quals */
 	rte = addRangeTableEntryForRelation(with_check_pstate, target_table,
+		AccessShareLock,
 		NULL, false

Re: [HACKERS] proposal: schema variables

2018-09-29 Thread Pavel Stehule
On Sat, Sep 29, 2018 at 10:34, Pavel Stehule 
wrote:

>
>
> On Sat, Sep 22, 2018 at 8:00, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> rebased against yesterday changes in tab-complete.c
>>
>
> rebased against last changes in master
>

+ using the content of a schema variable for estimation
+ subtransaction support

I hope that the functionality is now almost complete. Please check it.
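
A minimal usage sketch, roughly following the syntax this patch series
proposes (nothing here is committed, details may differ; the orders table is
just an illustration):

    CREATE VARIABLE public.current_tenant AS integer;
    LET public.current_tenant = 42;
    SELECT * FROM orders WHERE tenant_id = public.current_tenant;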

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>
>> Regards
>>
>> Pavel
>>
>


schema-variables-20180929-02.patch.gz
Description: application/gzip


Re: Cygwin linking rules

2018-09-29 Thread Andrew Dunstan




On 09/29/2018 04:06 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Not sure where to go from here.

What would happen if we stopped building libpq.a, so that the
linker didn't have any choice about what to do?





I will test Marco's patch, which I think does that, tomorrow.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Implementing SQL ASSERTION

2018-09-29 Thread Andrew Gierth
> "Joe" == Joe Wildish  writes:

 >> I haven't looked at the background of this, but if what you want to
 >> know is whether the aggregate function has the semantics of min() or
 >> max() (and if so, which) then the place to look is
 >> pg_aggregate.aggsortop.

 Joe> Thanks for the pointer. I've had a quick look at pg_aggregate, and
 Joe> back at my code, but I think there is more to it than just the
 Joe> sorting property. Specifically we need to know about the aggregate
 Joe> function when combined with connectors <, <=, < ANY, <= ANY, < ALL
 Joe> and <= ALL (and their equivalents with ">" and ">=").

The presence of an aggsortop means "this aggregate function is
interchangeable with (select x from ... order by x using OP limit 1)",
with all of the semantic consequences that implies. Since OP must be the
"<" or ">" member of a btree index opclass, the semantics of its
relationships with other members of the same opfamily can be deduced
from that.

 Joe> Also, it looks like COUNT and SUM don't have a sortop

Right, because those currently have no semantics that PG needs to know
about or describe.
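
For illustration, a quick catalog query on any stock installation shows which
aggregates declare a sortop — the min()/max() family does, while count() and
sum() do not:

    SELECT a.aggfnoid::regprocedure AS aggregate, o.oprname AS sortop
    FROM pg_aggregate a
    JOIN pg_operator o ON o.oid = a.aggsortop
    WHERE a.aggsortop <> 0
    ORDER BY 1;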

-- 
Andrew (irc:RhodiumToad)



Re: Odd 9.4, 9.3 buildfarm failure on s390x

2018-09-29 Thread Andrew Gierth
> "Andrew" == Andrew Dunstan  writes:

 >> What is the size of a C "int" on this platform?

 Andrew> 4.

Hmm.

Because int being more than 32 bits is the simplest explanation for this
difference.

How about the output of this query:

with d(a) as (values ('----'::uuid),
 ('----'::uuid),
 ('3f3e3c3b-3a30-3938-3736-353433a2313e'::uuid))
  select d1.a, d2.a, uuid_cmp(d1.a,d2.a) from d d1, d d2
   order by d1.a asc, d2.a desc;

-- 
Andrew (irc:RhodiumToad)



Re: libpq host/hostaddr/conninfo inconsistencies

2018-09-29 Thread Thomas Munro
On Sat, Aug 25, 2018 at 7:25 AM Tom Lane  wrote:
> Fabien COELHO  writes:
> > Attached is a rebase after 5ca00774.
>
> I notice that the cfbot thinks that *none* of your pending patches apply
> successfully.  I tried this one locally and what I get is
>
> $ patch -p1 <~/libpq-host-ip-2.patch
> (Stripping trailing CRs from patch.)
> patching file doc/src/sgml/libpq.sgml
> (Stripping trailing CRs from patch.)
> patching file src/interfaces/libpq/fe-connect.c
>
> as compared to the cfbot report, in which every hunk is rejected:
>
> === applying patch ./libpq-host-ip-2.patch
> Hmm...  Looks like a unified diff to me...
> The text leading up to this was:
> --
> |diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> |index 5e7931ba90..086172d4f0 100644
> |--- a/doc/src/sgml/libpq.sgml
> |+++ b/doc/src/sgml/libpq.sgml
> --
> Patching file doc/src/sgml/libpq.sgml using Plan A...
> Hunk #1 failed at 964.
> Hunk #2 failed at 994.
> 2 out of 2 hunks failed--saving rejects to doc/src/sgml/libpq.sgml.rej
> Hmm...  The next patch looks like a unified diff to me...
> The text leading up to this was:
> --
> |diff --git a/src/interfaces/libpq/fe-connect.c 
> b/src/interfaces/libpq/fe-connect.c
> |index a8048ffad2..34025ba041 100644
> |--- a/src/interfaces/libpq/fe-connect.c
> |+++ b/src/interfaces/libpq/fe-connect.c
> --
> Patching file src/interfaces/libpq/fe-connect.c using Plan A...
> Hunk #1 failed at 908.
> Hunk #2 failed at 930.
> Hunk #3 failed at 943.
> Hunk #4 failed at 974.
> Hunk #5 failed at 1004.
> Hunk #6 failed at 1095.
> Hunk #7 failed at 2098.
> Hunk #8 failed at 2158.
> Hunk #9 failed at 6138.
> 9 out of 9 hunks failed--saving rejects to 
> src/interfaces/libpq/fe-connect.c.rej
> done
>
> So I'm speculating that the cfbot is using a version of patch(1) that
> doesn't have strip-trailing-CRs logic.  Which bemuses me, because
> I thought they all did.

Huh.  Yeah.  I have now switched it over to GNU patch.  It seems to be
happier with Fabien's patches so far, but will take a few minutes to
catch up with all of them.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] kqueue

2018-09-29 Thread Thomas Munro
On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati  wrote:
> On 28/09/2018 14:19, Thomas Munro wrote:
> > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund  wrote:
> >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> >>> Matteo Beccati reported a 5-10% performance drop on a
> >>> low-end Celeron NetBSD box which we have no explanation for, and we
> >>> have no reports from server-class machines on that OS -- so perhaps we
> >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> >>> NetBSD until someone can figure out what needs to be fixed there
> >>> (possibly on the NetBSD side)?
> >>
> >> Yea, I'm not too worried about that. It'd be great to test that, but
> >> otherwise I'm also ok to just plonk that into the template.
> >
> > Thanks for the review!  Ok, if we don't get a better idea I'll put
> > this in src/template/netbsd:
> >
> > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"
>
> A quick test on a 8 vCPU / 4GB RAM virtual machine running a fresh
> install of NetBSD 8.0 again shows that kqueue is consistently slower
> running pgbench vs unpatched master on tcp-b like pgbench workloads:
>
> ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10
>
> while on select only benchmarks the difference is below the noise floor,
> with both doing roughly the same ~30k tps.
>
> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> and the select-only benchmark was ~75k tps for kqueue vs ~90k tps on
> unpatched master, so maybe there's something I'm doing wrong when
> benchmarking. Could you please provide proper instructions?

Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
 Not sure if it's relevant, but do you happen to see gettimeofday()
showing up as a syscall, if you truss a backend running pgbench?
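
For reference, something along these lines should reproduce the workloads
described above (assuming scale factor 10 and 96 clients/threads, as in your
runs; adjust -T to taste):

    pgbench -i -s 10 bench
    pgbench -c 96 -j 96 -T 60 bench         # default (TPC-B-like) workload
    pgbench -c 96 -j 96 -T 60 -S bench      # select-only workload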

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Odd 9.4, 9.3 buildfarm failure on s390x

2018-09-29 Thread Tom Lane
Andrew Gierth  writes:
> Because int being more than 32 bits is the simplest explanation for this
> difference.

Curious to hear your reasoning behind that statement?  I hadn't gotten
further than "memcmp is broken" ... and neither of those theories is
tenable, because if they were true then a lot more things besides uuid
sorting would be falling over.

regards, tom lane



Re: Odd 9.4, 9.3 buildfarm failure on s390x

2018-09-29 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Andrew Gierth  writes:
 >> Because int being more than 32 bits is the simplest explanation for
 >> this difference.

 Tom> Curious to hear your reasoning behind that statement? I hadn't
 Tom> gotten further than "memcmp is broken" ... and neither of those
 Tom> theories is tenable, because if they were true then a lot more
 Tom> things besides uuid sorting would be falling over.

memcmp() returns an int, and guarantees only the sign of the result, so
((int32) memcmp()) may have the wrong value if int is wider than int32.
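
A tiny standalone illustration of that failure mode (hypothetical here, since
int is 4 bytes on this platform): a positive comparison result whose low 32
bits happen to be zero collapses to "equal" when narrowed.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* stand-in for a memcmp() result on a machine with a 64-bit int */
        long long wide = 0x100000000LL;     /* positive, i.e. "greater than" */
        int32_t narrowed = (int32_t) wide;  /* becomes 0, i.e. "equal" */

        printf("wide=%lld narrowed=%d\n", wide, (int) narrowed);
        return 0;
    }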

But yeah, it seems unlikely that it would break for uuid but not bytea
(or text in collate C).

-- 
Andrew (irc:RhodiumToad)



Re: SerializeParamList vs machines with strict alignment

2018-09-29 Thread Amit Kapila
On Mon, Sep 10, 2018 at 7:05 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Mon, Sep 10, 2018 at 8:58 AM Tom Lane  wrote:
> >> In particular, SerializeParamList does this:
> >>
> >> /* Write flags. */
> >> memcpy(*start_address, &prm->pflags, sizeof(uint16));
> >> *start_address += sizeof(uint16);
> >>
> >> immediately followed by this:
> >>
> >> datumSerialize(prm->value, prm->isnull, typByVal, typLen,
> >> start_address);
>
> > datumSerialize does this:
>
> > memcpy(*start_address, &header, sizeof(int));
> > *start_address += sizeof(int);
>
> > before calling EOH_flatten_into, so it seems to me it should be 4-byte 
> > aligned.
>
> But that doesn't undo the fact that you're now on an odd halfword
> boundary.  In the case I observed, EA_flatten_into was trying to
> store an int32 through a pointer whose hex value ended in E, which
> is explained by the "+= sizeof(uint16)".
>
> > Yeah, I think as suggested by you, start_address should be maxaligned.
>
> A localized fix would be for datumSerialize to temporarily palloc the
> space for EOH_flatten_into to write into, and then it could memcpy
> that to whatever random address it'd been passed.

Attached is a patch along these lines; let me know if you have
something else in mind.  I have manually verified this with the
force_parallel_mode=regress configuration in my development
environment.  I don't have access to alignment-sensitive hardware, so
I can't test in such an environment.  I will see if I can write a test
as well.
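
The relevant part of the change is roughly the following (a sketch of the
idea, not the exact patch text, using the variable names already present in
datumSerialize()): flatten the expanded object into a palloc'd — and therefore
maxaligned — buffer first, then memcpy the result to the possibly unaligned
destination.

    if (eoh)
    {
        /* flatten into a palloc'd (maxaligned) buffer, then copy */
        char   *tmp = (char *) palloc(header);

        EOH_flatten_into(eoh, (void *) tmp, header);
        memcpy(*start_address, tmp, header);
        pfree(tmp);
    }
    else
        memcpy(*start_address, DatumGetPointer(value), header);
    *start_address += header;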

>  Seems kind of
> inefficient though, especially since you'd likely have to do the same
> thing on the receiving side.
>

I am not sure what exactly you have in mind for the receiving side.
datumRestore does the following for pass-by-reference values:

/* Pass-by-reference case; copy indicated number of bytes. */
Assert(header > 0);
d = palloc(header);
memcpy(d, *start_address, header);
*start_address += header;
return PointerGetDatum(d);

Do we need any other special handling here?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_datum_serialization_v1.patch
Description: Binary data


Re: [HACKERS] [PATCH] Generic type subscripting

2018-09-29 Thread Pavel Stehule
On Sun, Sep 30, 2018 at 0:21, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Fri, 20 Jul 2018 at 23:32, Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> >
> > > On Thu, 26 Apr 2018 at 16:44, Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > >
> > > > On 22 March 2018 at 23:25, Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > > >
> > > > Here is the updated version of patch, rebased after recent conflicts
> and with
> > > > suggested documentation improvements.
> > >
> > > Another rebased version of the patch.
> >
> > I've noticed, that I never updated llvmjit code for the arrayref
> expressions,
> > and it's important to do so, since the patch introduces another layer of
> > flexibility. Hence here is the new version.
>
> Here is another rebased version, and a bit of history: the first
> prototypes of this patch were sent more than 3 years ago. Of course the
> patch evolved significantly over this period, and I take it as a good sign
> that it wasn't rejected and keeps moving through the commitfests. At the
> same time, the lack of attention makes things a bit frustrating. I have an
> impression that it's sort of a regular situation, and I wonder if there are
> any ideas (besides the well-known advice of putting some effort into
> reviewing patches from other people, since I'm already doing my best and
> enjoying this) on how to make progress in such cases?
>

This feature looks nice, and it can be great when some values of a
non-atomic type need to be updated.
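
For example, the kind of usage the patch is meant to enable for jsonb
(illustrative syntax as I understand the proposal; the documents table is
invented):

    SELECT ('{"a": {"b": 1}}'::jsonb)['a']['b'];
    UPDATE documents SET body['author']['name'] = '"Alice"' WHERE id = 1;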

Regards

Pavel