Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-09 Thread Michael Paquier
On Thu, Aug 09, 2018 at 08:32:32AM +0530, Andres Freund wrote:
> My point is that the documentation isn't sufficient. Not that there's an 
> active problem.

OK.  Attached is the latest version of the patch I have that I was
preparing for commit.

On top of isTempNamespaceInUse I have this note:
+ * Note: this can be used while scanning relations in pg_class to detect
+ * orphaned temporary tables or namespaces with a backend connected to a
+ * given database.

Would you be fine if I add an extra note like what's in
BackendIdGetProc?  Say "The result may be out of date quickly, so the
caller must be careful how to handle this information."
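
As an illustration of how a pg_class scanner might consume
isTempNamespaceInUse() while treating the answer as advisory, here is a
rough sketch (not the autovacuum hunk of the attached patch; classForm is
assumed to be the Form_pg_class of the relation being scanned):

	if (classForm->relpersistence == RELPERSISTENCE_TEMP)
	{
		if (isTempNamespaceInUse(classForm->relnamespace))
			continue;			/* owning backend is alive, leave it alone */

		/*
		 * The relation looks orphaned, but the answer may already be out of
		 * date, so take a lock and recheck before actually dropping it.
		 */
	}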
--
Michael
From 1e9ba9fa9b172bda1ea54b1f3be1b930973ff45f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 8 Aug 2018 19:45:31 +0200
Subject: [PATCH] Make autovacuum more aggressive to remove orphaned temp
 tables

Commit dafa084, added in 10, made the removal of orphaned temporary
tables more aggressive.  This commit goes a step further by adding a
flag in each backend's MyProc entry which tracks the temporary namespace
currently in use.  The flag is set when the namespace gets created and
can be reset if the temporary namespace has been created in a transaction
or sub-transaction which is aborted.  The flag value assignment is assumed
to be atomic, so this can be done in a lock-less fashion like other fields
already present in PGPROC such as databaseId or backendId.

Autovacuum uses this new flag to discard orphaned tables more
aggressively, by additionally checking which database a backend is
connected to and whether its temporary namespace is in use, so that
orphaned temporary relations are removed even if a backend reuses the
same slot as one which created temporary relations in the past.

Note that autovacuum workers scanning pg_class for orphaned temporary
relations would not see the new temporary namespace or any temporary
relations until the transaction creating them gets committed, so this
is fine even if the assignment is non-atomic.

The base idea of this patch comes from Robert Haas; the first version
was written by Tsunakawa Takayuki and heavily reviewed by me.

Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier, Kyotaro Horiguchi
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI
breakages on already released versions.
---
 src/backend/access/transam/twophase.c |  1 +
 src/backend/catalog/namespace.c   | 69 ++-
 src/backend/postmaster/autovacuum.c   | 20 +++-
 src/backend/storage/lmgr/proc.c   |  2 +
 src/include/catalog/namespace.h   |  1 +
 src/include/storage/proc.h|  3 ++
 6 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 0ae0722794..4aff6cf7f2 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -471,6 +471,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	proc->backendId = InvalidBackendId;
 	proc->databaseId = databaseid;
 	proc->roleId = owner;
+	proc->tempNamespaceId = InvalidOid;
 	proc->isBackgroundWorker = false;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 0f67a122ed..19400e7d76 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -47,7 +47,7 @@
 #include "parser/parse_func.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
-#include "storage/sinval.h"
+#include "storage/sinvaladt.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
@@ -3204,6 +3204,45 @@ isOtherTempNamespace(Oid namespaceId)
 	return isAnyTempNamespace(namespaceId);
 }
 
+/*
+ * isTempNamespaceInUse - is the given namespace owned and actively used
+ * by a backend?
+ *
+ * Note: this can be used while scanning relations in pg_class to detect
+ * orphaned temporary tables or namespaces with a backend connected to a
+ * given database.
+ */
+bool
+isTempNamespaceInUse(Oid namespaceId)
+{
+	PGPROC	   *proc;
+	int			backendId;
+
+	Assert(OidIsValid(MyDatabaseId));
+
+	backendId = GetTempNamespaceBackendId(namespaceId);
+
+	if (backendId == InvalidBackendId ||
+		backendId == MyBackendId)
+		return false;
+
+	/* Is the backend alive? */
+	proc = BackendIdGetProc(backendId);
+	if (proc == NULL)
+		return false;
+
+	/* Is the backend connected to the same database we are looking at? */
+	if (proc->databaseId != MyDatabaseId)
+		return false;
+
+	/* Does the backend own the temporary namespace? */
+	if (proc->tempNamespaceId != namespaceId)
+		return false;
+
+	/* all good to go */
+	return true;
+}
+
 /*
  * GetTempNamespaceBackendId - if the given namespace is a temporary-table
  * namespace (either my own, or another backend's), return the Ba

Typo in optimizer/README

2018-08-09 Thread Tatsuro Yamada

Hi,

Attached patch for fixing a typo in optimizer/README.

s/Partition-wise/Partitionwise/

It is the only place the keyword "Partition-wise" exists, so I created the 
patch.

Regards,

Tatsuro Yamada
NTT Open Source Software Center

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 15af9ceff5..53f02b5300 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1106,7 +1106,7 @@ PartitionSchemeData object.  This reduces memory consumed by
 PartitionSchemeData objects and makes it easy to compare the partition schemes
 of joining relations.
 
-Partition-wise aggregates/grouping
+Partitionwise aggregates/grouping
 --
 
 If the GROUP BY clause has contains all of the partition keys, all the rows


Re: Reopen logfile on SIGHUP

2018-08-09 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 7 Aug 2018 16:36:34 +0300, Alexander Kuzmenkov 
 wrote in 

> I think the latest v4 patch addresses the concerns raised
> upthread. I'm marking the commitfest entry as RFC.

Thank you, but it was forgetting the pg_ctl --help output.  I added it
and moved the logrotate entry in the docs to just above "kill" in
app-pg-ctl.html. Also added the "-s" option.

Furthermore, I did the following things.

- Since I'm not sure unlink is signal-handler safe on all
  supported platforms, I moved unlink() call out of
  checkLogrotateSignal() to SysLoggerMain, just before rotating
  log file.

- Refactored by moving the main stuff to syslogger.c.

- The diff is split into two files and renamed (but the version
  number continues at v5).

I'm not certain whether the signal file alone should be allowed to cause
rotation, but this patch doesn't allow it. A directly placed signal file
is removed without causing rotation.

Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From bee1093cc6c0361c138ebaa4920b7f62fdc4959c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 9 Aug 2018 16:09:46 +0900
Subject: [PATCH 1/2] New pg_ctl subcommand "logrotate"

We are able to force log file rotation by calling the pg_rotate_logfile()
SQL function or by sending SIGUSR1 directly to the syslogger. External
tools that don't want to log in as superuser need to know the PID of the
syslogger.

This patch provides a means to cause log rotation via pg_ctl.
---
 src/backend/postmaster/postmaster.c |  6 ++-
 src/backend/postmaster/syslogger.c  | 36 ++-
 src/bin/pg_ctl/pg_ctl.c | 89 -
 src/include/postmaster/syslogger.h  |  3 ++
 4 files changed, 120 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..84928a3017 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1268,6 +1268,9 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	RemovePromoteSignalFiles();
 
+	/* Do the same for logrotate signal file */
+	RemoveLogrotateSignalFiles();
+
 	/* Remove any outdated file holding the current log filenames. */
 	if (unlink(LOG_METAINFO_DATAFILE) < 0 && errno != ENOENT)
 		ereport(LOG,
@@ -5100,7 +5103,8 @@ sigusr1_handler(SIGNAL_ARGS)
 		signal_child(PgArchPID, SIGUSR1);
 	}
 
-	if (CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE) &&
+	if ((CheckLogrotateSignal() ||
+		 CheckPostmasterSignal(PMSIGNAL_ROTATE_LOGFILE)) &&
 		SysLoggerPID != 0)
 	{
 		/* Tell syslogger to rotate logfile */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 58b759f305..d4c41c9b7b 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -56,6 +56,9 @@
  */
 #define READ_BUF_SIZE (2 * PIPE_CHUNK_SIZE)
 
+/* Log rotation signal file path, relative to $PGDATA */
+#define LOGROTATE_SIGNAL_FILE	"logrotate"
+
 
 /*
  * GUC parameters.  Logging_collector cannot be changed after postmaster
@@ -384,11 +387,18 @@ SysLoggerMain(int argc, char *argv[])
 			}
 		}
 
+		/*
+		 * The logrotate signal file alone doesn't cause rotation. Remove the
+		 * file, if any, without setting rotation_requested.
+		 */
+		if (CheckLogrotateSignal())
+			unlink(LOGROTATE_SIGNAL_FILE);
+
 		if (rotation_requested)
 		{
 			/*
 			 * Force rotation when both values are zero. It means the request
-			 * was sent by pg_rotate_logfile.
+			 * was sent by pg_rotate_logfile or pg_ctl logrotate.
 			 */
 			if (!time_based_rotation && size_rotation_for == 0)
 size_rotation_for = LOG_DESTINATION_STDERR | LOG_DESTINATION_CSVLOG;
@@ -1422,6 +1432,30 @@ update_metainfo_datafile(void)
  * 
  */
 
+/*
+ * Check to see if a log rotation request has arrived. Should be
+ * called by postmaster after receiving SIGUSR1.
+ */
+bool
+CheckLogrotateSignal(void)
+{
+	struct stat stat_buf;
+
+	if (stat(LOGROTATE_SIGNAL_FILE, &stat_buf) == 0)
+		return true;
+
+	return false;
+}
+
+/*
+ * Remove the file signaling a log rotation request.
+ */
+void
+RemoveLogrotateSignalFiles(void)
+{
+	unlink(LOGROTATE_SIGNAL_FILE);
+}
+
 /* SIGHUP: set flag to reload config file */
 static void
 sigHupHandler(SIGNAL_ARGS)
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index ed2396aa6c..ae8fe75c1a 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -59,6 +59,7 @@ typedef enum
 	STOP_COMMAND,
 	RESTART_COMMAND,
 	RELOAD_COMMAND,
+	LOGROTATE_COMMAND,
 	STATUS_COMMAND,
 	PROMOTE_COMMAND,
 	KILL_COMMAND,
@@ -100,6 +101,7 @@ static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char backup_file[MAXPGPATH];
 static char promote_file[MAXPGPATH];
+static char logrotate_file[MAXPGPATH];
 
 #ifdef WIN32
 static DWORD pgctl_start_type = SERVICE_AUTO_START;
@@ -125,6 +127,7 @@ static void do_restart(void);
 static void do_reload(void);
 static void do_status(void);
 static void do_promote(void);
+sta

Re: Typo in optimizer/README

2018-08-09 Thread Heikki Linnakangas

On 09/08/18 10:22, Tatsuro Yamada wrote:

Attached patch for fixing a typo in optimizer/README.

s/Partition-wise/Partitionwise/

It is the only place the keyword "Partition-wise" exists, so I created the 
patch.


Thanks, fixed, and also one in a comment in postgres_fdw's regression 
test. I'm not sure which spelling is better, but I agree we should be 
consistent.


- Heikki



Re: REINDEX and shared catalogs

2018-08-09 Thread Michael Paquier
On Wed, Aug 08, 2018 at 07:36:22PM +, Bossart, Nathan wrote:
> On 8/8/18, 2:16 PM, "Michael Paquier"  wrote:
>> By no-backpatch, I only implied patching v12, but that would not be a
>> huge amount of efforts to get that into v11, so I can do that as well.
>> Any objections to doing that?
> 
> +1 for back-patching to v11.

OK, I have been able to grab some time and pushed this thing to HEAD and
REL_11_STABLE.  Thanks all!
--
Michael


signature.asc
Description: PGP signature


Re: Typo in optimizer/README

2018-08-09 Thread Tatsuro Yamada

Hi Heikki,

On 2018/08/09 16:45, Heikki Linnakangas wrote:

On 09/08/18 10:22, Tatsuro Yamada wrote:

Attached patch for fixing a typo in optimizer/README.

s/Partition-wise/Partitionwise/

It is the only place the keyword "Partition-wise" exists, so I created the 
patch.


Thanks, fixed, and also one in a comment in postgres_fdw's regression test. I'm 
not sure which spelling is better, but I agree we should be consistent.


Oh, I forgot to grep contrib directory. Thanks! :)

Tatsuro Yamada
NTT Open Source Software Center





Re: Scariest patch tournament, PostgreSQL 11 edition

2018-08-09 Thread David Rowley
On 9 August 2018 at 04:32, Alvaro Herrera  wrote:
>  499be013de6 │ Support partition pruning at execution time
> │ Alvaro Herrera   │  6

Perhaps adding the enable_partition_pruning GUC was a good idea after all.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-09 Thread Fabien COELHO



Hello Marina,


v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to report
client failures that do not cause an abort, and this depends on the level of
debugging).


Patch applies cleanly, compiles, global & local make check ok.

This patch improves/homogenizes logging & error reporting in pgbench, in 
preparation for another patch which will manage transaction restarts in 
some cases.


However ISTM that it is not as necessary as the previous one, i.e. we
could do without it to get the desired feature, so I see it more as a
refactoring done "in passing". I'm wondering whether it is really worth
it, because it adds some new complexity, so I'm not sure of the net
benefit.


Anyway, I still have quite a few comments/suggestions on this version.

* ErrorLevel

If ErrorLevel is used for things which are not errors, should its name
really include "Error"? Maybe "LogLevel"?


I'm at odds with the proposed levels. ISTM that pgbench internal errors
which warrant an immediate exit should be dubbed "FATAL", which would
leave the "ERROR" name for... errors, eg SQL errors. I'd suggest using an
INFO level for the PGBENCH_DEBUG function, and keeping LOG for main
program messages, so that all use cases are separate. Or maybe the
distinction between LOG/INFO is unclear, so INFO is not necessary.


I'm unsure about the "log_min_messages" variable name, I'd suggest 
"log_level".


I do not see the asserts on LOG >= log_min_messages as useful, because the 
level can only be LOG or DEBUG anyway.


This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but e stands for error
there), as it is called on errors but also on other things. Maybe
"pgbench_log"? Or simply "log" or "report", as it is really a local
function which does not need a prefix? That would mean that
"pgbench_simple_error", which is indeed only called on errors, could keep
its initial name "pgbench_error".


Alternatively, the debug/logging code could be left as it is (i.e. direct
print to stderr) and the function only called when there is some kind of
error, in which case it could be named with "error" in its name (or
elog/ereport...).


* PQExpBuffer

I still do not see a positive value in importing PQExpBuffer complexity
and cost into pgbench, as the resulting code is not very readable and it
adds malloc/free cycles, so I'd try to avoid using PQExpBuffer as much as
possible. ISTM that all usages could be avoided in the patch, and most
should be avoided even if PQExpBuffer ends up being imported because it is
really useful somewhere.


- to call pgbench_error from pgbench_simple_error, you can do a
pgbench_log_va(level, format, va_list) version called both from
pgbench_error & pgbench_simple_error (see the sketch after this list).


- for the PGBENCH_DEBUG function, do separate calls per type; the very
small amount of code duplication is worth it to avoid PQExpBuffer, IMO.


- for doCustom debug: I'd just leave the printf as it is, with a comment, as
it is really very internal stuff for debug. Or I'd just snprintf something
into a static buffer.


- for syntax_error: it should terminate, so it should call
pgbench_error(FATAL, ...). Likewise, I'd either keep the printf and then call
pgbench_error(FATAL, "syntax error found\n") as a final message,
or snprintf into a static buffer.

- for listAvailableScript: I'd simply call "pgbench_error(LOG" several
times, once per line.


I see building a string with a format (printfPQExpBuffer...) and then
calling the pgbench_error function with just a "%s" format on the result
as not very elegant, because the second format is somewhat hacked around.
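
As an illustration of the pgbench_log_va() idea above, here is a minimal
sketch; the LogLevel enum, the log_level variable and the function names
are placeholders I made up, not identifiers from the patch:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum LogLevel
{
	LEVEL_DEBUG,
	LEVEL_INFO,
	LEVEL_LOG,
	LEVEL_ERROR,
	LEVEL_FATAL
} LogLevel;

static LogLevel log_level = LEVEL_LOG;	/* minimum level actually printed */

/* shared worker: filter by level, print, and exit on internal fatal errors */
static void
pgbench_log_va(LogLevel level, const char *fmt, va_list ap)
{
	if (level >= log_level)
		vfprintf(stderr, fmt, ap);
	if (level >= LEVEL_FATAL)
		exit(1);
}

/* variadic wrapper; a pgbench_simple_error() could wrap the same worker */
static void
pgbench_error(LogLevel level, const char *fmt,...)
{
	va_list		ap;

	va_start(ap, fmt);
	pgbench_log_va(level, fmt, ap);
	va_end(ap);
}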


* bool client

I'm unconvinced by this added boolean just to switch the level on
encountered errors.


I'd suggest leaving lookupCreateVariable and putVariable* as they are:
call pgbench_error with a level which does not stop execution, and abort
if necessary from the callers with an "aborted because of
putVariable/eval/... error" message, as was done before.


pgbench_error calls pgbench_error. Hmmm, why not.

--
Fabien.



Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-09 Thread Andres Freund
On 2018-08-09 09:00:29 +0200, Michael Paquier wrote:
> On Thu, Aug 09, 2018 at 08:32:32AM +0530, Andres Freund wrote:
> > My point is that the documentation isn't sufficient. Not that there's an 
> > active problem.
> 
> OK.  Attached is the latest version of the patch I have that I was
> preparing for commit.
> 
> On top of isTempNamespaceInUse I have this note:
> + * Note: this can be used while scanning relations in pg_class to detect
> + * orphaned temporary tables or namespaces with a backend connected to a
> + * given database.
> 
> Would you be fine if I add an extra note like what's in
> BackendIdGetProc?  Say "The result may be out of date quickly, so the
> caller must be careful how to handle this information."

That's a caveat, not documentation of the memory ordering /
concurrency. You need a comment somewhere to the effect of "We are
guaranteed to see a recent enough value of ->tempNamespace because
anybody creating a temporary table acquires a lock - serving as a memory
barrier - during the creation of said table, after assigning the
tempNamespace variable. At the same time, before considering dropping a
temporary table as orphaned, we acquire a lock and recheck tempNamespace
afterwards".  Note that I'm not explaining the concrete model you have
here, but the way you could explain a theoretical one.
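
Spelled out as a rough sketch (the call sites and lock levels here are
illustrative assumptions, not the patch's actual code):

	/*
	 * Creating backend: assign MyProc->tempNamespaceId first, then create the
	 * temporary table, which acquires a heavyweight lock on it.  The lock
	 * acquisition acts as a memory barrier, so the assignment above is
	 * visible to anyone who locks afterwards.
	 */

	/* Dropping side: take the lock first, then recheck before dropping. */
	LockRelationOid(relid, AccessExclusiveLock);
	if (isTempNamespaceInUse(classForm->relnamespace))
	{
		UnlockRelationOid(relid, AccessExclusiveLock);
		return;					/* the owner is alive after all, leave it */
	}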

- Andres



9.6.10 build warning on Fedora 28

2018-08-09 Thread Devrim Gündüz

Hi,

Just noticed this on Fedora 28 box (GCC 8.1.1)

=
pgbench.c: In function 'ParseScript':
pgbench.c:2640:20: warning: '__builtin___sprintf_chk' may write a terminating
nul past the end of the destination [-Wformat-overflow=]
   sprintf(var, "$%d", cmd->argc);
^
In file included from /usr/include/stdio.h:862,
 from ../../../src/include/c.h:81,
 from ../../../src/include/postgres_fe.h:25,
 from pgbench.c:34:
/usr/include/bits/stdio2.h:33:10: note: '__builtin___sprintf_chk' output
between 3 and 13 bytes into a destination of size 12
   return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
  ^~
   __bos (__s), __fmt, __va_arg_pack ());
=

Regards,

-- 
Devrim Gündüz
EnterpriseDB: https://www.enterprisedb.com
PostgreSQL Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] Improve geometric types

2018-08-09 Thread Kyotaro HORIGUCHI
At Wed, 8 Aug 2018 14:39:54 +0200, Tomas Vondra  
wrote in <6ecb4f61-1fb1-08a1-31d6-e58e9c352...@2ndquadrant.com>
> 
> 
> On 08/03/2018 02:39 PM, Tomas Vondra wrote:
> > On 08/03/2018 06:40 AM, Kyotaro HORIGUCHI wrote:
> >> ...
> >>
> >> I'm not confident on replacing double to float8 partially in gist
> >> code. After the 0002 patch applied, I see most of problematic
> >> usage of double or bare arithmetic on dimentional values in
> >> gistproc.c.
> >>
> >>> static inline float
> >>> non_negative(float val)
> >>> {
> >>> if (val >= 0.0f)
> >>>     return val;
> >>> else
> >>>     return 0.0f;
> >>> }
> >>
> >> It is used as "non_negative(overlap)", where overlap is float4,
> >> which is calculated using float8_mi.  Float4 makes sense only if
> >> we need to store a large number of it to somewhere but they are
> >> just working varialbles. Couldn't we eliminate float4 that
> >> doesn't have a requirement to do so?
> >>
> > I'm not sure I follow. The patch does not modify non_negative() at
> > all, and we still call it like this:
> >      if (non_negative(overlap) < non_negative(context->overlap) ||
> >      (range > context->range &&
> >   non_negative(overlap) <= non_negative(context->overlap)))
> >      selectthis = true;
> > where all the "overlap" values are still float4. The only thing that
> > changed here is that instead of doing the arithmetic operations
> > directly we call float8_mi/float8_div to benefit from the float8
> > handling.
> > So I'm not sure how does the patch beaks this? And what do you mean by
> > 'eliminate float4'?
> > 
> 
> Kyotaro-san, can you explain what your concerns regarding this bit
> are? I'd like to get 0002 committed, but I'm not sure I understand
> your point about the changes in gist code, so I can't address
> them. And I certainly don't want to just ignore them ...

It doesn't break anything, so nothing must be done about this. I was
just a bit uneasy to see float4 used meaninglessly. Sorry for the
unnecessary argument.

In the end I don't object to committing it in this shape.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: 9.6.10 build warning on Fedora 28

2018-08-09 Thread David Rowley
On 9 August 2018 at 21:30, Devrim Gündüz  wrote:
> pgbench.c: In function 'ParseScript':
> pgbench.c:2640:20: warning: '__builtin___sprintf_chk' may write a terminating
> nul past the end of the destination [-Wformat-overflow=]
>sprintf(var, "$%d", cmd->argc);
> ^
> In file included from /usr/include/stdio.h:862,
>  from ../../../src/include/c.h:81,
>  from ../../../src/include/postgres_fe.h:25,
>  from pgbench.c:34:
> /usr/include/bits/stdio2.h:33:10: note: '__builtin___sprintf_chk' output
> between 3 and 13 bytes into a destination of size 12
>return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
>   ^~
>__bos (__s), __fmt, __va_arg_pack ());

Generally, new warnings from newer compilers are not fixed in older
branches. This particular warning was fixed in master in [1] and the
discussion was in [2].

The warning is pretty false anyway since cmd->argc won't be negative.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3a4b891964a531aa7d242a48fcd9e41379863ead
[2] 
https://www.postgresql.org/message-id/304a21ab-a9d6-264a-f688-912869c0d...@2ndquadrant.com
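
For what it's worth, a generic way to quiet this class of warning (a
sketch only, not necessarily what the referenced commit did) is to size
the buffer for the worst-case int and let the compiler see a bounded
snprintf:

#include <stdio.h>

int
main(void)
{
	int			argc_example = 2147483647;	/* any int value is possible here */
	char		var[13];		/* "$" + "-2147483648" + NUL = 13 bytes */

	/* snprintf cannot write past the buffer, unlike the flagged sprintf() */
	snprintf(var, sizeof(var), "$%d", argc_example);
	puts(var);
	return 0;
}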


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Typo in doc or wrong EXCLUDE implementation

2018-08-09 Thread KES
Bruce:
>Yes, it would work, but doing that only for equality would be surprising
>to many people

Why surprising? It is 
[documented](https://www.postgresql.org/docs/current/static/sql-createtable.html#sql-createtable-exclude):
>If all of the specified operators test for equality, this is equivalent to a 
>UNIQUE constraint, although an ordinary unique constraint will be faster.

Thus the UNIQUE constraint is just a particular case of an exclusion
constraint, is it not?

Tom
>It's less efficient (1) and less portable
Yes, portability matters, but more general SQL would be more efficient in
developer hours to support such an application compared to writing many
specific SQL statements (one SQL expression is better than two which do the
same job). Personally I would turn a blind eye to portability in favor of
using modern features (looking forward to inclusion constraints).

For speed efficiency (1), this particular case of exclusion constraint could
be implemented via btree-based uniqueness (just as uniqueness is implemented
via indexes under the hood; the implementation details don't matter to me).


08.08.2018, 16:51, "Tom Lane" :
> Bruce Momjian  writes:
>>  On Wed, Aug 8, 2018 at 01:55:53PM +0300, KES wrote:
>>>  If such exclusion constraint would be marked as UNIQUE we can use it for 
>>> FK while implementing temporal/bi-temporal tables.
>
>>  Yes, it would work, but doing that only for equality would be surprising
>>  to many people because exclusion constraints are more general than
>>  equality comparisons.
>
> In general, we should be discouraging people from using EXCLUDE syntax
> with simple equality operators, not encouraging them to do so. It's
> less efficient and less portable than a regular btree-based uniqueness
> constraint. So I think this proposal is a bad idea regardless of
> whether it'd be technically feasible or not.
>
> regards, tom lane



Re: Improve behavior of concurrent TRUNCATE

2018-08-09 Thread Michael Paquier
On Wed, Aug 08, 2018 at 03:38:57PM +, Bossart, Nathan wrote:
> Here are some comments for the latest version of the patch.

Thanks for the review, Nathan!

> -truncate_check_rel(Relation rel)
> +truncate_check_rel(Oid relid, Form_pg_class reltuple)
> 
> Could we use HeapTupleGetOid(reltuple) instead of passing the OID
> separately?

If that were a HeapTuple.  And it seems to me that we had better make
truncate_check_rel() depend directly on a Form_pg_class instead of
allowing the caller to pass anything and deforming the tuple within the
routine.

> - if (rel->rd_rel->relkind != RELKIND_RELATION &&
> - rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
> + if (reltuple->relkind != RELKIND_RELATION &&
> +
> + reltuple->relkind != RELKIND_PARTITIONED_TABLE)
> 
> There appears to be an extra empty line here.

Fixed.  I don't know where this has come from.

> +# Test for lock lookup conflicts with TRUNCATE when working on unowned
> +# relations, particularly catalogs should trigger an ERROR for all the
> +# scenarios present here.
> 
> If possible, it might be worth it to check that TRUNCATE still blocks
> when a role has privileges, too.

Good idea.  I have added more tests for that.  Doing a truncate on
pg_authid for installcheck was a very bad idea anyway (even if it failed
all the time), so I have switched to a custom table, with a GRANT
command to control the access with a custom role.

> Since the only behavior this patch changes is the order of locking and
> permission checks, my vote would be to back-patch.  However, since the
> REINDEX piece won't be back-patched, I can see why we might not here,
> either.

This patch is a bit more invasive than the REINDEX one to be honest, and
as this is getting qualified as an improvement, at least on this thread,
I would be inclined to be more restrictive and just patch HEAD, but not
v11.

Attached is an updated patch with all your suggestions added.
--
Michael
From 4402dd2dd8b7b2672e43ebb2407013520898e8bc Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 9 Aug 2018 12:25:55 +0200
Subject: [PATCH] Refactor TRUNCATE execution to avoid too early lock lookups

A caller of TRUNCATE can acquire locks before any permission lookup is
done, which can cause other sessions to block on the request, potentially
enabling DOS attacks, as even a non-privileged user can attempt a
truncation of a critical catalog table to block all incoming connection
attempts.

This commit fixes the case of TRUNCATE so that RangeVarGetRelidExtended is
used with a callback doing the necessary checks at an earlier stage,
avoiding locking issues, so that a failure happens for unprivileged users
instead of waiting on a lock.
---
 src/backend/commands/tablecmds.c  | 83 
 .../isolation/expected/truncate-conflict.out  | 99 +++
 src/test/isolation/isolation_schedule |  1 +
 .../isolation/specs/truncate-conflict.spec| 38 +++
 4 files changed, 203 insertions(+), 18 deletions(-)
 create mode 100644 src/test/isolation/expected/truncate-conflict.out
 create mode 100644 src/test/isolation/specs/truncate-conflict.spec

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..c9f157d764 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -300,7 +300,10 @@ struct DropRelationCallbackState
 #define child_dependency_type(child_is_partition)	\
 	((child_is_partition) ? DEPENDENCY_AUTO : DEPENDENCY_NORMAL)
 
-static void truncate_check_rel(Relation rel);
+static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
+static void truncate_check_activity(Relation rel);
+static void RangeVarCallbackForTruncate(const RangeVar *relation,
+			Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 bool is_partition, List **supOids, List **supconstr,
 int *supOidCount);
@@ -1336,15 +1339,25 @@ ExecuteTruncate(TruncateStmt *stmt)
 		bool		recurse = rv->inh;
 		Oid			myrelid;
 
-		rel = heap_openrv(rv, AccessExclusiveLock);
-		myrelid = RelationGetRelid(rel);
+		myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
+		   0, RangeVarCallbackForTruncate,
+		   NULL);
+
+		/* open the relation, we already hold a lock on it */
+		rel = heap_open(myrelid, NoLock);
+
 		/* don't throw error for "TRUNCATE foo, foo" */
 		if (list_member_oid(relids, myrelid))
 		{
 			heap_close(rel, AccessExclusiveLock);
 			continue;
 		}
-		truncate_check_rel(rel);
+
+		/*
+		 * RangeVarGetRelidExtended has done some basic checks with its
+		 * callback, and only session-level checks remain.
+		 */
+		truncate_check_activity(rel);
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
 		/* Log this relation only if needed for logical decoding */
@@ -1367,7 +1380,9 @@ ExecuteTruncate(TruncateStmt *stmt)
 
 /* find_all_inheritors already got lock */
 rel

Do all rows from a set have the same TupleDesc?

2018-08-09 Thread Raúl Marín Rodríguez
Hi,

I have a user-defined aggregate (Postgis' St_AsMVT) that receives rows as
input and I was wondering if all these tuples have the same TupleDesc.

In case it's important, here is how it's normally called:
```
SELECT St_AsMVT(q) FROM (
SELECT * FROM tilertest.tract9double
) q;
```

The idea behind this question is that, if this were true, I could cache
some information (number of attributes, oids, some internal stuff related
to each attribute) until the end of the aggregation to avoid having to
retrieve or calculate them for each one of the rows.

All the tests I've made seem to confirm this is true, but I was wondering
if
there is any guarantee.

Regards

-- 

*Raúl Marín Rodríguez *carto.com


Re: TupleTableSlot abstraction

2018-08-09 Thread Ashutosh Bapat
On Wed, Aug 8, 2018 at 5:07 PM, Ashutosh Bapat
 wrote:

Amit Khandekar offlist told me that the previous patch-set doesn't
apply cleanly on the latest head.

PFA patches rebased on 31380bc7c204e7cfa9c9e1c62947909e2b38577c

>
> 3. compile with LLVM and fix any compilation and regression errors.

 When I compiled the server with just 0003 applied with LLVM, the
 compilation went well, but there was a server crash. That patch
 changes the type of tts_nvalid from int32 to AttrNumber. I tried debugging
 the crash with a debug LLVM build, but couldn't complete the work.
 The attached patch attrnumber_llvm_type.patch is my incomplete attempt to
 fix that crash. I think we should make it easy to change the data
 types of the members in structures shared by JIT and non-JIT code, maybe
 by automatically creating both copies of the code somehow. I will get
 back to this after addressing other TODOs.

>

still a TODO

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


[FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Danylo Hlynskyi
Hi! Haven't found discussions on possible ways to encrypt indexes. Let's
start!

The problem
==

I'd like to encrypt some columns (first name, last name, SSN, etc...) in a
nondeterministic way. This can be done using `pgcrypto`, but then I lose
full-text-search (and any other indexing) capabilities on these fields.

Blind indexing also isn't a good enough option.

Obviously we don't want to create expression-based indexes that perform
decryption during index build. This would store plaintext inside index
buffers, and the decryption key would be included in database dumps.

We don't trust full-disk-encryption or any other transparent encryption,
because of possible SQL injections.

Solution 1 (possibly can be used even now)

- perform full-disk encryption
- perform encryption of column
- add decrypting expression-based index with decryption key
- limit ways on disclosing index internals. Ideally if no one except admin
can do that
- limit ways to read index definitions - so it's not possible for
application to uncover decryption key from database itself, it should know
it on it's own.

Solution 2 (feature request)

- full-disk encryption is optional
- data column is encrypted
- index is decrypted by construction, but each of its blocks is encrypted,
even in memory.
- lookups over the index decrypt index buffers lazily and close them ASAP
- make every query that has to touch encrypted column or encrypted index
require decryption key. This means, SELECT, DELETE, UPDATE, INSERT, VACUUM,
CLUSTER, CREATE INDEX, pg_dump, pg_restore all should have decryption key
supplied in order to be executed. This also means, that autovacuum daemon
can't work.


What do you think about both solutions? Is it hard to implement solution 2?


Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Andres Freund



On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi 
 wrote:
> Is it hard to implement solution 2?

Yes.

To the point that I'm fairly certain that an implementation would be considered
too costly to maintain (versus the benefit) if proposed.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



remove ancient pre-dlopen dynloader code

2018-08-09 Thread Peter Eisentraut
The non-dlopen dynloader code for several operating systems is in some
cases decades obsolete, and I have had some doubts that it would even
compile anymore.  Attached are patches for each operating system
removing the obsolete code, with references to when it became obsolete.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d9ee06fc45c31dd6e7857022c0775d1b3ccabd4a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 10 Jul 2018 15:48:24 +0200
Subject: [PATCH 1/5] Remove obsolete darwin dynloader code

not needed since macOS 10.3 (2003)
---
 src/backend/port/dynloader/darwin.c | 103 
 1 file changed, 103 deletions(-)

diff --git a/src/backend/port/dynloader/darwin.c 
b/src/backend/port/dynloader/darwin.c
index 93f19878f5..65fa0e39f2 100644
--- a/src/backend/port/dynloader/darwin.c
+++ b/src/backend/port/dynloader/darwin.c
@@ -1,24 +1,15 @@
 /*
  * Dynamic loading support for macOS (Darwin)
  *
- * If dlopen() is available (Darwin 10.3 and later), we just use it.
- * Otherwise we emulate it with the older, now deprecated, NSLinkModule API.
- *
  * src/backend/port/dynloader/darwin.c
  */
 #include "postgres.h"
 
-#ifdef HAVE_DLOPEN
 #include 
-#else
-#include 
-#endif
 
 #include "dynloader.h"
 
 
-#ifdef HAVE_DLOPEN
-
 void *
 pg_dlopen(const char *filename)
 {
@@ -34,7 +25,6 @@ pg_dlclose(void *handle)
 PGFunction
 pg_dlsym(void *handle, const char *funcname)
 {
-   /* Do not prepend an underscore: see dlopen(3) */
return dlsym(handle, funcname);
 }
 
@@ -43,96 +33,3 @@ pg_dlerror(void)
 {
return dlerror();
 }
-#else  /* !HAVE_DLOPEN */
-
-/*
- * These routines were taken from the Apache source, but were made
- * available with a PostgreSQL-compatible license.  Kudos Wilfredo
- * Sánchez .
- */
-
-static NSObjectFileImageReturnCode cofiff_result = NSObjectFileImageFailure;
-
-void *
-pg_dlopen(const char *filename)
-{
-   NSObjectFileImage image;
-
-   cofiff_result = NSCreateObjectFileImageFromFile(filename, &image);
-   if (cofiff_result != NSObjectFileImageSuccess)
-   return NULL;
-   return NSLinkModule(image, filename,
-   NSLINKMODULE_OPTION_BINDNOW |
-   
NSLINKMODULE_OPTION_RETURN_ON_ERROR);
-}
-
-void
-pg_dlclose(void *handle)
-{
-   NSUnLinkModule(handle, NSUNLINKMODULE_OPTION_NONE);
-}
-
-PGFunction
-pg_dlsym(void *handle, const char *funcname)
-{
-   NSSymbol symbol;
-   char   *symname = (char *) malloc(strlen(funcname) + 2);
-
-   if (!symname)
-   return NULL;
-
-   sprintf(symname, "_%s", funcname);
-   if (NSIsSymbolNameDefined(symname))
-   {
-   symbol = NSLookupAndBindSymbol(symname);
-
-   free(symname);
-   return (PGFunction) NSAddressOfSymbol(symbol);
-   }
-   else
-   {
-   free(symname);
-   return NULL;
-   }
-}
-
-char *
-pg_dlerror(void)
-{
-   NSLinkEditErrors c;
-   int errorNumber;
-   const char *fileName;
-   const char *errorString = NULL;
-
-   switch (cofiff_result)
-   {
-   case NSObjectFileImageSuccess:
-   /* must have failed in NSLinkModule */
-   NSLinkEditError(&c, &errorNumber, &fileName, 
&errorString);
-   if (errorString == NULL || *errorString == '\0')
-   errorString = "unknown link-edit failure";
-   break;
-   case NSObjectFileImageFailure:
-   errorString = "failed to open object file";
-   break;
-   case NSObjectFileImageInappropriateFile:
-   errorString = "inappropriate object file";
-   break;
-   case NSObjectFileImageArch:
-   errorString = "object file is for wrong architecture";
-   break;
-   case NSObjectFileImageFormat:
-   errorString = "object file has wrong format";
-   break;
-   case NSObjectFileImageAccess:
-   errorString = "insufficient permissions for object 
file";
-   break;
-   default:
-   errorString = "unknown failure to open object file";
-   break;
-   }
-
-   return (char *) errorString;
-}
-
-#endif /* HAVE_DLOPEN */
-- 
2.18.0

From a0040d8d7b2ace5e1dece9dcf5d4fb4a647d8ed4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 10 Jul 2018 15:50:28 +0200
Subject: [PATCH 2/5] Remove obsolete linux dynloader code

This has been obsolete probably since the late 1990s.
---
 conf

Re: remove ancient pre-dlopen dynloader code

2018-08-09 Thread Andres Freund
Hi,

On 2018-08-09 14:29:08 +0200, Peter Eisentraut wrote:
> The non-dlopen dynloader code for several operating systems is in some
> cases decades obsolete, and I have had some doubts that it would even
> compile anymore.  Attached are patches for each operating system
> removing the obsolete code, with references to when it became obsolete.

Cool, I encountered those files a couple times when grepping for
things. +1 for the removal.

Greetings,

Andres Freund



pgbench exit code

2018-08-09 Thread Peter Eisentraut
In pgbench, when an error occurs during a benchmark run (SQL error,
connection error, etc.) it just prints an error message and then
proceeds to print the run summary normally and exits with status 0.  So
this is quite inconvenient, especially from a script.

The attached patch changes this so that it exits with status 1 and also
prints a line below the summary advising that the results are incomplete.

The test suite had, curiously, encoded the previous behavior, checking
for exit status 0 vs 1 in various error situations.  In this patch, I
have just updated the expected results (which are all "1" now), but
perhaps we should remove listing that individually and just check for
nonzero in all cases somewhere centrally.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e269dccd449efffc83e53d7f1954818b44205935 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 3 Jul 2018 19:51:24 +0200
Subject: [PATCH] pgbench: Report errors during run better

When an error occurs during a benchmark run, exit with a nonzero exit
code and write a message at the end.  Previously, it would just print
the error message when it happened but then proceed to print the run
summary normally and exit with status 0.
---
 src/bin/pgbench/pgbench.c| 10 +++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 51 ++--
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c089..19bd6b033f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4830,6 +4830,8 @@ main(int argc, char **argv)
PGresult   *res;
char   *env;
 
+   int exit_code = 0;
+
progname = get_progname(argv[0]);
 
if (argc > 1)
@@ -5570,6 +5572,9 @@ main(int argc, char **argv)
(void) threadRun(thread);
 #endif /* ENABLE_THREAD_SAFETY 
*/
 
+   if (thread->state->state == CSTATE_ABORTED)
+   exit_code = 1;
+
/* aggregate thread level stats */
mergeSimpleStats(&stats.latency, &thread->stats.latency);
mergeSimpleStats(&stats.lag, &thread->stats.lag);
@@ -5594,7 +5599,10 @@ main(int argc, char **argv)
INSTR_TIME_SUBTRACT(total_time, start_time);
printResults(threads, &stats, total_time, conn_total_time, 
latency_late);
 
-   return 0;
+   if (exit_code != 0)
+   fprintf(stderr, "Run was aborted; the above results are 
incomplete.\n");
+
+   return exit_code;
 }
 
 static void *
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl 
b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 2fc021dde7..82e75c5f03 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -534,7 +534,7 @@ sub pgbench
# SQL
[
'sql syntax error',
-   0,
+   1,
[
qr{ERROR:  syntax error},
qr{prepared statement .* does not exist}
@@ -553,11 +553,11 @@ sub pgbench
 
# SHELL
[
-   'shell bad command',0,
+   'shell bad command',1,
[qr{\(shell\) .* meta-command failed}], q{\shell 
no-such-command}
],
[
-   'shell undefined variable', 0,
+   'shell undefined variable', 1,
[qr{undefined variable ":nosuchvariable"}],
q{-- undefined variable in shell
 \shell echo ::foo :nosuchvariable
@@ -597,81 +597,81 @@ sub pgbench
[qr{unexpected function name}], q{\set i noSuchFunction()}
],
[
-   'set invalid variable name', 0,
+   'set invalid variable name', 1,
[qr{invalid variable name}], q{\set . 1}
],
[
-   'set int overflow',   0,
+   'set int overflow',   1,
[qr{double to int overflow for 100}], q{\set i int(1E32)}
],
-   [ 'set division by zero', 0, [qr{division by zero}], q{\set i 1/0} ],
+   [ 'set division by zero', 1, [qr{division by zero}], q{\set i 1/0} ],
[
-   'set bigint out of range', 0,
+   'set bigint out of range', 1,
[qr{bigint out of range}], q{\set i 9223372036854775808 / -1}
],
[
'set undefined variable',
-   0,
+   1,
[qr{undefined variable "nosuchvariable"}],
q{\set i :nosuchvariable}
],
[ 'set unexpected char', 1, [qr{unexpected character .;.}], q{\set i ;} 
],
[
'set too many args',
-   0,
+   1,
[qr{too many functi

Re: commitfest 2018-07

2018-08-09 Thread Andrew Dunstan





Well, here we are at the end of July.


here's the current state of Commitfest 2018-07:


Status summary: Needs review: 83. Waiting on Author: 31. Ready for
Committer: 17. Committed: 54. Moved to next CF: 8. Rejected: 5. Returned
with Feedback: 2. Total: 200.







My apologies to the community for not wrapping up the commitfest in a 
more timely fashion. I have been somewhat overtaken by events that 
unexpectedly claimed my time. I'm currently working through the open 
items to determine a recommended disposition.



cheers


andrew



--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-08-09 Thread Etsuro Fujita

(2018/08/08 17:30), Kyotaro HORIGUCHI wrote:

Please find the attached.


Thanks for the patch, Horiguchi-san!


At Fri, 3 Aug 2018 11:48:38 +0530, Ashutosh Bapat  
wrote in

The purpose of the non-Var node is to avoid adding the attribute to the
relation descriptor. The idea is to create a new node which will act as a
placeholder for the table oid or row id (whatever) to be fetched from the
foreign server.

I think so too.


I don't understand why you think we need it to be
added to the relation descriptor.


I don't understand that either.


I chose to expand the tuple descriptor for the junk column added to
foreign relations. It might be better to have a new member in
ForeignScan, but I didn't do that so that we can backpatch it.


I've not looked at the patch closely yet, but I'm not sure that it's a 
good idea to expand the tuple descriptor of the target relation on the 
fly so that it contains the remotetableoid as a non-system attribute of 
the target table.  My concern is: is there not any risk in affecting 
some other part of the planner and/or the executor?  (I was a bit 
surprised that the patch passes the regression tests successfully.)


To avoid expanding the tuple descriptor, I'm wondering whether we could 
add a Param representing remotetableoid, not a Var undefined anywhere in 
the system catalogs, as mentioned above?



What the patch does are:

- This abuses ForeignScan->fdw_scan_tlist to store the additional
   junk columns for a foreign simple relation scan (that is, not a
   join).


I think that this issue was introduced in 9.3, which added postgres_fdw 
in combination with support for writable foreign tables, but 
fdw_scan_tlist was added to 9.5 as part of join-pushdown infrastructure, 
so my concern is that we might not be able to backpatch your patch to 
9.3 and 9.4.


That's it for now.

Best regards,
Etsuro Fujita



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-09 Thread Andrew Dunstan




On 08/09/2018 01:03 AM, Tom Lane wrote:

Peter Geoghegan  writes:

On Wed, Aug 8, 2018 at 7:40 PM, Tom Lane  wrote:

Oooh ... but pg_class wouldn't be big enough to get a parallel
index rebuild during that test, would it?

Typically not, but I don't think that we can rule it out right away.

Hmmm ... maybe we should temporarily stick in an elog(LOG) showing whether
a parallel build happened or not, so that we can check the buildfarm logs
next time we see that failure?


I don't know all that much about the buildfarm client code, and it's
late.

It doesn't really stick in any undocumented configuration changes,
AFAIK.  Possibly Andrew would have some more insight.




No, everything should be visible in the config. Hidden things are what I 
try to avoid.


The only things the configure()  adds are the prefix and pgport 
settings, and the cache-file. make() only adds a "-j jobs" if so 
configured. make_check() normally just runs "make NO_LOCALE=1 check".




cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-08-09 Thread David Rowley
On 8 August 2018 at 01:29, Andres Freund  wrote:
> On 2018-08-08 01:23:51 +1200, David Rowley wrote:
>> I'm not proposing that sessions running older snapshots can't see that
>> there's a new partition. The code I have uses PartitionIsValid() to
>> test if the partition should be visible to the snapshot. The
>> PartitionDesc will always contain details for all partitions stored in
>> pg_partition whether they're valid to the current snapshot or not.  I
>> did it this way as there's no way to invalidate the relcache based on
>> a point in transaction, only a point in time.
>
> I don't think that solves the problem that an arriving relcache
> invalidation would trigger a rebuild of rd_partdesc, while it actually
> is referenced by running code.
>
> You'd need to build infrastructure to prevent that.
>
> One approach would be to make sure that everything relying on
> rt_partdesc staying the same stores its value in a local variable, and
> then *not* free the old version of rt_partdesc (etc) when the refcount >
> 0, but delay that to the RelationClose() that makes refcount reach
> 0. That'd be the start of a framework for more such concurrenct
> handling.

I'm not so sure that not freeing the partdesc until the refcount reaches 0
is safe. As you'd expect, we hold a lock on a partitioned table
between the planner and executor, but the relation has been closed and
the refcount returns to 0, which means that when the relation is first
opened in the executor, the updated PartitionDesc is obtained.  A
non-concurrent attach would have been blocked in this case due to the
lock being held by the planner. Instead of using refcount == 0,
perhaps we can release the original partdesc only when there are no
locks held by us on the relation.

It's late here now, so I'll look at that tomorrow.

I've attached what I was playing around with. I think I'll also need
to change RelationGetPartitionDesc() to have it return the original
partdesc, if it's non-NULL.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


dont_destroy_original_partdesc_on_rel_inval.patch
Description: Binary data


Re: remove ancient pre-dlopen dynloader code

2018-08-09 Thread Tom Lane
Andres Freund  writes:
> On 2018-08-09 14:29:08 +0200, Peter Eisentraut wrote:
>> The non-dlopen dynloader code for several operating systems is in some
>> cases decades obsolete, and I have had some doubts that it would even
>> compile anymore.  Attached are patches for each operating system
>> removing the obsolete code, with references to when it became obsolete.

> Cool, I encountered those files a couple times when grepping for
> things. +1 for the removal.

LGTM, too.

regards, tom lane



Re: Do all rows from a set have the same TupleDesc?

2018-08-09 Thread Tom Lane
Raúl Marín Rodríguez  writes:
> I have a user-defined aggregate (Postgis' St_AsMVT) that receives rows as
> input and I was wondering if all these tuples have the same TupleDesc.

Yes, they should.  It's possible that the tuples themselves might have
visible inconsistencies (primarily, smaller t_natts in some than the
tuple descriptor says), but as long as you confine yourself to using
the standard tuple-disassembly functions that won't matter.

> The idea behind this question is that, if this were true, I could cache
> some information (number of attributes, oids, some internal stuff related
> to each attribute) until the end of the aggregation to avoid having to
> retrieve or calculate them for each one of the rows.

Yeah, caching such info is common.  It's traditional to code defensively
about the validity of the cache (see e.g. record_out), but I'm not sure
to what extent that's really necessary for aggregates.  In the case of
record_out, paranoia is probably justified because FmgrInfo structs for
I/O functions may get cached and re-used across statements, but I doubt
that can happen for an aggregate.
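
A minimal sketch of that caching pattern for a transition function,
assuming headers access/htup_details.h and utils/typcache.h; the state
struct and variable names here are illustrative, not PostGIS's actual ones:

typedef struct RowAggState
{
	TupleDesc	tupdesc;		/* cached descriptor for the input rows */
	int			natts;			/* cached attribute count */
} RowAggState;

	/* inside the transition function; "row" is the HeapTupleHeader argument */
	if (state->tupdesc == NULL)
	{
		Oid			tupType = HeapTupleHeaderGetTypeId(row);
		int32		tupTypmod = HeapTupleHeaderGetTypMod(row);

		/*
		 * This returns a copy, so no ReleaseTupleDesc() bookkeeping is
		 * needed; a real implementation would allocate it in the
		 * aggregate's memory context so it survives across calls.
		 */
		state->tupdesc = lookup_rowtype_tupdesc_copy(tupType, tupTypmod);
		state->natts = state->tupdesc->natts;
	}

	/* cheap defensive check, in the spirit of record_out's paranoia */
	Assert(HeapTupleHeaderGetTypeId(row) == state->tupdesc->tdtypeid);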

regards, tom lane



Re: Allow COPY's 'text' format to output a header

2018-08-09 Thread Cynthia Shang

> On Aug 8, 2018, at 2:57 PM, Simon Muller  wrote:
> 
> If there's a merge conflict against master, then it'd be good for an
> updated patch to be posted.
> 
> Thanks!
> 
> Stephen
> 
> Attached is an updated patch that should directly apply against current 
> master.
> 
> --
> Simon Muller
>  
> 

This patch looks good. I realized I should have changed the status back while 
we were discussing all this. It is now (and still is) ready for committer.

Thanks,
-Cynthia

Re: Optimizer misses big in 10.4 with BRIN index

2018-08-09 Thread Emre Hasegeli
> So I basically spent most of the time trying to create a reproducible case
> and I can say I failed. I can however reproduce this with specific large
> data set with specific data distribution, but not an artificial one.

The posted query plans that include the statistics prefer Bitmap Index
Scan.  This is not a reproduction of the originally posted case.

> ** Wait... What??? "Rows Removed by Index Recheck: 1643390" but data is
> almost sequential! Let's take a look at it.

I don't think it has anything to do with query planning.  Have you
tried the "pages_per_range" option of BRIN?



libpq should not look up all host addresses at once

2018-08-09 Thread Tom Lane
Whilst fooling with the patch for CVE-2018-10915, I got annoyed by
the fact that connectDBStart() does the DNS lookups for all supplied
hostnames at once, and fails if any of them are bad.  It was reasonable
to do the lookup there when we only allowed one hostname, but now that
"host" can be a list, this is really pretty stupid.  The whole point
of allowing multiple hostnames is redundancy and avoiding a single
point of failure; but the way this is written, *each* of your servers'
DNS servers is a single point of failure all by itself.  If any one of
them is down, you don't connect.  Plus, in the normal case where you
successfully connect to something before the very last host in the list,
the extra DNS lookups are wasted --- and DNS lookups aren't that cheap,
since they typically involve a network round trip.

So I think what this code should do is (1) look up each hostname as it
needs it, not all at once, and (2) proceed on to the next hostname
if it gets a DNS lookup failure, not fail the whole connection attempt
immediately.  As attached.

I'm tempted to call this a back-patchable bug fix, because the existing
behavior basically negates the entire value of the multi-hostname feature
once you consider the possibility of DNS server failures.  But given the
lack of field complaints, maybe that's an overreaction.

I'll put this in the September fest for review.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 986f74f..8200b39 100644
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*** static PGconn *makeEmptyPGconn(void);
*** 360,366 
  static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
  static void freePGconn(PGconn *conn);
  static void closePGconn(PGconn *conn);
! static void release_all_addrinfo(PGconn *conn);
  static void sendTerminateConn(PGconn *conn);
  static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
  static PQconninfoOption *parse_connection_string(const char *conninfo,
--- 360,366 
  static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
  static void freePGconn(PGconn *conn);
  static void closePGconn(PGconn *conn);
! static void release_conn_addrinfo(PGconn *conn);
  static void sendTerminateConn(PGconn *conn);
  static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
  static PQconninfoOption *parse_connection_string(const char *conninfo,
*** setKeepalivesWin32(PGconn *conn)
*** 1742,1751 
  static int
  connectDBStart(PGconn *conn)
  {
- 	char		portstr[MAXPGPATH];
- 	int			ret;
- 	int			i;
- 
  	if (!conn)
  		return 0;
  
--- 1742,1747 
*** connectDBStart(PGconn *conn)
*** 1765,1865 
  	resetPQExpBuffer(&conn->errorMessage);
  
  	/*
- 	 * Look up socket addresses for each possible host using
- 	 * pg_getaddrinfo_all.
- 	 */
- 	for (i = 0; i < conn->nconnhost; ++i)
- 	{
- 		pg_conn_host *ch = &conn->connhost[i];
- 		struct addrinfo hint;
- 		int			thisport;
- 
- 		/* Initialize hint structure */
- 		MemSet(&hint, 0, sizeof(hint));
- 		hint.ai_socktype = SOCK_STREAM;
- 		hint.ai_family = AF_UNSPEC;
- 
- 		/* Figure out the port number we're going to use. */
- 		if (ch->port == NULL || ch->port[0] == '\0')
- 			thisport = DEF_PGPORT;
- 		else
- 		{
- 			thisport = atoi(ch->port);
- 			if (thisport < 1 || thisport > 65535)
- 			{
- appendPQExpBuffer(&conn->errorMessage,
-   libpq_gettext("invalid port number: \"%s\"\n"),
-   ch->port);
- conn->options_valid = false;
- goto connect_errReturn;
- 			}
- 		}
- 		snprintf(portstr, sizeof(portstr), "%d", thisport);
- 
- 		/* Use pg_getaddrinfo_all() to resolve the address */
- 		ret = 1;
- 		switch (ch->type)
- 		{
- 			case CHT_HOST_NAME:
- ret = pg_getaddrinfo_all(ch->host, portstr, &hint, &ch->addrlist);
- if (ret || !ch->addrlist)
- 	appendPQExpBuffer(&conn->errorMessage,
- 	  libpq_gettext("could not translate host name \"%s\" to address: %s\n"),
- 	  ch->host, gai_strerror(ret));
- break;
- 
- 			case CHT_HOST_ADDRESS:
- hint.ai_flags = AI_NUMERICHOST;
- ret = pg_getaddrinfo_all(ch->hostaddr, portstr, &hint, &ch->addrlist);
- if (ret || !ch->addrlist)
- 	appendPQExpBuffer(&conn->errorMessage,
- 	  libpq_gettext("could not parse network address \"%s\": %s\n"),
- 	  ch->hostaddr, gai_strerror(ret));
- break;
- 
- 			case CHT_UNIX_SOCKET:
- #ifdef HAVE_UNIX_SOCKETS
- hint.ai_family = AF_UNIX;
- UNIXSOCK_PATH(portstr, thisport, ch->host);
- if (strlen(portstr) >= UNIXSOCK_PATH_BUFLEN)
- {
- 	appendPQExpBuffer(&conn->errorMessage,
- 	  libpq_gettext("Unix-domain socket path \"%s\" is too long (maximum %d bytes)\n"),
- 	  portstr,
- 	  (int) (UNIXSOCK_PATH_BUFLEN - 1));
- 	conn->options_valid = false;
- 	goto connect_errReturn;
- }
- 
- /*
-  * NULL hostname tells

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-09 Thread Marina Polyakova

On 09-08-2018 12:28, Fabien COELHO wrote:

Hello Marina,


Hello!


v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to 
report client failures that do not cause an abort, and what is reported 
depends on the level of debugging).


Patch applies cleanly, compiles, global & local make check ok.


:-)


This patch improves/homogenizes logging & error reporting in pgbench,
in preparation for another patch which will manage transaction
restarts in some cases.

However ISTM that it is not as necessary as the previous one, i.e. we
could do without it to get the desired feature, so I see it more as a
refactoring done "in passing", and I'm wondering whether it is really
worth it because it adds some new complexity, so I'm not sure of the
net benefit.


We discussed this starting with [1]:


IMO this patch is more controversial than the other ones.

It is not really related to the aim of the patch series, which could
do without, couldn't it?


I'd suggest that it should be an independent submission, unrelated to
the pgbench error management patch.


I suppose that this is related; because of my patch there may be a lot
of such code (see v7 in [1]):

-   fprintf(stderr,
-           "malformed variable \"%s\" value: \"%s\"\n",
-           var->name, var->svalue);
+   if (debug_level >= DEBUG_FAILS)
+   {
+       fprintf(stderr,
+               "malformed variable \"%s\" value: \"%s\"\n",
+               var->name, var->svalue);
+   }

-   if (debug)
+   if (debug_level >= DEBUG_ALL)
        fprintf(stderr, "client %d sending %s\n", st->id, sql);


I'm not sure that debug messages need to be kept once debugging is done,
if they are about debugging pgbench itself. That is debatable.


AFAICS it is not about debugging pgbench itself, but about more detailed
information that can be used to understand what exactly happened during
its run. In the case of errors this helps to distinguish failures and
errors by type (including which retry limit was violated and by how far
it was exceeded, for serialization/deadlock errors).

That's why it was suggested to make an error function which hides all
these things (see [2]):

There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with a
corresponding fprintf(stderr, ...).  I think it's time to do it like in
the main code, and wrap this with some function like log(level, msg).


Yep. I did not write that, but I agree with the "elog" suggestion to
switch

   if (...) { fprintf(...); exit/abort/continue/... }

to a simpler:

   elog(level, ...)

Anyway, I still have quite a few comments/suggestions on this version.


Thank you very much for them!


* ErrorLevel

If ErrorLevel is used for things which are not errors, its name should
not include "Error"? Maybe "LogLevel"?


On the one hand, this sounds better to me too. On the other hand, won't 
this conflict somehow with the error level codes in elog.h?..


/* Error level codes */
#define DEBUG5      10      /* Debugging messages, in categories of
                             * decreasing detail. */
#define DEBUG4      11
...


I'm at odds with the proposed levels. ISTM that pgbench internal
errors which warrant an immediate exit should be dubbed "FATAL",


Ok!


which
would leave the "ERROR" name for... errors, e.g. SQL errors. I'd suggest
to use an INFO level for the PGBENCH_DEBUG function, and to keep LOG
for main program messages, so that all use cases are separate. Or,
maybe the distinction between LOG/INFO is unclear, so INFO is not
necessary.


Messages about errors in SQL and meta commands are printed only if the 
option --debug-fails is used, so I'm not sure that they should have a 
higher error level than main program messages (ERROR vs LOG). As for an 
INFO level for the PGBENCH_DEBUG function - ISTM that some main program 
messages such as "dropping old tables...\n" or "... tuples (%d%%) done 
(elapsed %.2f s, remaining %.2f s)\n" could also use it. As for keeping 
all use cases separate - in the current version the LOG level also 
includes messages about clients being aborted.


I'm unsure about the "log_min_messages" variable name, I'd suggest 
"log_level".


I do not see the asserts on LOG >= log_min_messages as useful, because
the level can only be LOG or DEBUG anyway.


Ok!


This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but the "e" stands for
error there), as it is called on errors, but also on other things. Maybe
"pgbench_log"? Or just simply "log" or "report", as it is really a
local function, which does not need a prefix? That would 

libpq connection timeout mismanagement

2018-08-09 Thread Tom Lane
The patch that taught libpq about allowing multiple target hosts
modified connectDBComplete() with the intent of making the
connect_timeout (if specified) apply per-host, not to the complete
connection attempt.  It did not do a very good job though, because
the timeout only gets reset when connectDBComplete() itself detects
a timeout.  If PQconnectPoll advances to a new host due to some
other cause, the previous host's timeout continues to run, possibly
causing a premature timeout failure for the new one.

Another thing that I find pretty strange is that it is coded so that,
in the event of a timeout detected by connectDBComplete, we give up on the
current connhost entry and advance to the next host, ignoring any
additional addresses we might have for the current hostname.  This seems
at best poorly thought through.  There's no good reason for libpq to
assume that all the addresses returned by DNS point at the same machine,
or share the same network failure points in between.

Attached is an attempt to improve this.  I'll park it in the Sept fest.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 986f74f..429be74 100644
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*** connectDBComplete(PGconn *conn)
*** 1905,1910 
--- 1905,1912 
  	PostgresPollingStatusType flag = PGRES_POLLING_WRITING;
  	time_t		finish_time = ((time_t) -1);
  	int			timeout = 0;
+ 	int			last_whichhost = -2;	/* certainly different from whichhost */
+ 	struct addrinfo *last_addr_cur = NULL;
  
  	if (conn == NULL || conn->status == CONNECTION_BAD)
  		return 0;
*** connectDBComplete(PGconn *conn)
*** 1922,1929 
  			 */
  			if (timeout < 2)
  timeout = 2;
- 			/* calculate the finish time based on start + timeout */
- 			finish_time = time(NULL) + timeout;
  		}
  	}
  
--- 1924,1929 
*** connectDBComplete(PGconn *conn)
*** 1932,1937 
--- 1932,1952 
  		int			ret = 0;
  
  		/*
+ 		 * (Re)start the connect_timeout timer if it's active and we are
+ 		 * considering a different host than we were last time through.  If
+ 		 * we've already succeeded, though, needn't recalculate.
+ 		 */
+ 		if (flag != PGRES_POLLING_OK &&
+ 			timeout > 0 &&
+ 			(conn->whichhost != last_whichhost ||
+ 			 conn->addr_cur != last_addr_cur))
+ 		{
+ 			finish_time = time(NULL) + timeout;
+ 			last_whichhost = conn->whichhost;
+ 			last_addr_cur = conn->addr_cur;
+ 		}
+ 
+ 		/*
  		 * Wait, if necessary.  Note that the initial state (just after
  		 * PQconnectStart) is to wait for the socket to select for writing.
  		 */
*** connectDBComplete(PGconn *conn)
*** 1975,1992 
  		if (ret == 1)			/* connect_timeout elapsed */
  		{
  			/*
! 			 * Attempt connection to the next host, ignoring any remaining
! 			 * addresses for the current host.
  			 */
! 			conn->try_next_addr = false;
! 			conn->try_next_host = true;
  			conn->status = CONNECTION_NEEDED;
- 
- 			/*
- 			 * Restart the connect_timeout timer for the new host.
- 			 */
- 			if (timeout > 0)
- finish_time = time(NULL) + timeout;
  		}
  
  		/*
--- 1990,1999 
  		if (ret == 1)			/* connect_timeout elapsed */
  		{
  			/*
! 			 * Give up on current server/address, try the next one.
  			 */
! 			conn->try_next_addr = true;
  			conn->status = CONNECTION_NEEDED;
  		}
  
  		/*


Re: Improve behavior of concurrent TRUNCATE

2018-08-09 Thread Bossart, Nathan
On 8/9/18, 5:29 AM, "Michael Paquier"  wrote:
>> -truncate_check_rel(Relation rel)
>> +truncate_check_rel(Oid relid, Form_pg_class reltuple)
>> 
>> Could we use HeapTupleGetOid(reltuple) instead of passing the OID
>> separately?
>
> If that was a HeapTuple.  And it seems to me that we had better make
> truncate_check_rel() depend directly on a Form_pg_class instead of
> allowing the caller pass anything and deform the tuple within the
> routine.

Got it.  I agree, it makes sense to force the caller to provide a
Form_pg_class.

>> +# Test for lock lookup conflicts with TRUNCATE when working on unowned
>> +# relations, particularly catalogs should trigger an ERROR for all the
>> +# scenarios present here.
>> 
>> If possible, it might be worth it to check that TRUNCATE still blocks
>> when a role has privileges, too.
>
> Good idea.  I have added more tests for that.  Doing a truncate on
> pg_authid for installcheck was a very bad idea anyway (even if it failed
> all the time), so I have switched to a custom table, with a GRANT
> command to control the access with a custom role.

Good call.  Accidentally truncating pg_authid might have led to some
interesting test results...

>> Since the only behavior this patch changes is the order of locking and
>> permission checks, my vote would be to back-patch.  However, since the
>> REINDEX piece won't be back-patched, I can see why we might not here,
>> either.
>
> This patch is a bit more invasive than the REINDEX one to be honest, and
> as this is getting qualified as an improvement, at least on this thread,
> I would be inclined to be more restrictive and just patch HEAD, but not
> v11.

Alright.

> Attached is an updated patch with all your suggestions added.

Thanks!  This patch builds cleanly, the new tests pass, and my manual
testing hasn't uncovered any issues.  Notably, I cannot reproduce the
originally reported authentication issue by using TRUNCATE after this
change.  Beyond a few small comment wording suggestions below, it
looks good to me.

+   /*
+* RangeVarGetRelidExtended has done some basic checks with its
+* callback, and only remain session-level checks.
+*/

This is definitely a nitpick, but I might rephrase this to "...so only
session-level checks remain" or to "...but we still need to do
session-level checks."

+/*
+ * Set of extra sanity checks to check if a given relation is safe to
+ * truncate.  This is split with truncate_check_rel as
+ * RangeVarCallbackForTruncate can only call the former.
+ */
+static void
+truncate_check_activity(Relation rel)

It might be worth mentioning why RangeVarCallbackForTruncate can only
call truncate_check_rel() (we haven't opened the relation yet).

+# Test for lock lookup conflicts with TRUNCATE when working on unowned
+# relations, particularly catalogs should trigger an ERROR for all the
+# scenarios present here.

Since we are testing a few more things now, perhaps this could just be
"Test for locking conflicts with TRUNCATE commands."

+# TRUNCATE will directly fail

Maybe we could say something more like this:
If the role doesn't have privileges to TRUNCATE the table,
TRUNCATE should immediately fail without waiting for a lock.

+# TRUNCATE will block
+permutation "s1_begin" "s1_tab_lookup" "s2_grant" "s2_auth" "s2_truncate" 
"s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_truncate" "s1_tab_lookup" 
"s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_tab_lookup" "s2_truncate" 
"s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_truncate" "s1_begin" "s1_tab_lookup" 
"s1_commit" "s2_reset"

The second and fourth tests don't seem to actually block.  Perhaps we
could reword the comment to say something like this:
If the role has privileges to TRUNCATE the table, TRUNCATE
will block if another session holds a lock on the table.

Nathan



libpq should append auth failures, not overwrite

2018-08-09 Thread Tom Lane
I noticed that, although most error reports during libpq's connection
setup code append to conn->errorMessage, the ones in fe-auth.c and 
fe-auth-scram.c don't: they're all printfPQExpBuffer() not
appendPQExpBuffer().  This seems wrong to me.  It makes no difference
in simple cases with a single target server, but as soon as you have
multiple servers listed in "host", this coding makes it impossible
to tell which server rejected your login.

So I think we should basically s/printfPQExpBuffer/appendPQExpBuffer/g
anywhere those files touch conn->errorMessage, allowing any problems
with previous servers to be preserved in the eventually-reported message.
I won't bother posting an actual patch for that right now, but has anyone
got an objection?
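
To illustrate the behavioral difference in a vacuum, here is a small
standalone sketch against libpq's internal PQExpBuffer API (the message
texts here are made up, not the actual libpq wording):

#include <stdio.h>
#include "pqexpbuffer.h"

int
main(void)
{
	PQExpBufferData buf;

	initPQExpBuffer(&buf);

	/* printf-style calls reset the buffer: only the last message survives */
	printfPQExpBuffer(&buf, "could not connect to server \"%s\"\n", "hostA");
	printfPQExpBuffer(&buf, "password authentication failed for \"%s\"\n", "hostB");
	printf("with printf:\n%s", buf.data);

	/* append-style calls preserve earlier attempts, which is the point here */
	appendPQExpBuffer(&buf, "could not connect to server \"%s\"\n", "hostC");
	printf("with append:\n%s", buf.data);

	termPQExpBuffer(&buf);
	return 0;
}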

regards, tom lane



Re: Do all rows from a set have the same TupleDesc?

2018-08-09 Thread Raúl Marín Rodríguez
Hi Tom,

> Yeah, caching such info is common.  It's traditional to code defensively
> about the validity of the cache (see e.g. record_out), but I'm not sure
> to what extent that's really necessary for aggregates. In the case of
> record_out, paranoia is probably justified because FmgrInfo structs for
> I/O functions may get cached and re-used across statements, but I doubt
> that can happen for an aggregate.

The implementation of record_out is exactly what I was planning plus some
extra defenses. I'd consider the record type changing mid aggregation a
race condition, so I'll drop those checks.
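
For the archives, here is a rough sketch of the fn_extra caching pattern
being described, simplified from record_out (the struct and function
names are illustrative only, and the defensive type checks are dropped
as discussed above):

#include "postgres.h"
#include "fmgr.h"
#include "access/htup_details.h"
#include "access/tupdesc.h"
#include "utils/typcache.h"

/* per-call-site cache, kept in flinfo->fn_extra across aggregate calls */
typedef struct RowAggCache
{
	Oid			tupType;
	int32		tupTypmod;
	TupleDesc	tupdesc;		/* copy living in fn_mcxt */
} RowAggCache;

static TupleDesc
get_cached_tupdesc(FunctionCallInfo fcinfo, HeapTupleHeader rec)
{
	RowAggCache *cache = (RowAggCache *) fcinfo->flinfo->fn_extra;

	if (cache == NULL)
	{
		Oid			tupType = HeapTupleHeaderGetTypeId(rec);
		int32		tupTypmod = HeapTupleHeaderGetTypMod(rec);
		TupleDesc	tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
		MemoryContext oldcxt;

		/* keep the copy alive for the whole aggregation */
		oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
		cache = (RowAggCache *) palloc0(sizeof(RowAggCache));
		cache->tupType = tupType;
		cache->tupTypmod = tupTypmod;
		cache->tupdesc = CreateTupleDescCopy(tupdesc);
		MemoryContextSwitchTo(oldcxt);

		ReleaseTupleDesc(tupdesc);
		fcinfo->flinfo->fn_extra = (void *) cache;
	}

	return cache->tupdesc;
}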

Thanks a lot for the information.

Regards,


-- 

*Raúl Marín Rodríguez *carto.com


Re: POC for a function trust mechanism

2018-08-09 Thread Bruce Momjian
On Wed, Aug  8, 2018 at 01:15:38PM -0400, Tom Lane wrote:
> This is sort of a counter-proposal to Noah's discussion of search path
> security checking in <20180805080441.gh1688...@rfd.leadboat.com>.
> (There's no technical reason we couldn't do both things, but I think
> this'd be more useful to most people.)

Yes, the query from Noah below confirmed that schema qualification just
isn't a realistic approach:

CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS 'SELECT CASE
WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8
WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8
ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord($1, 3)
OPERATOR(pg_catalog./) @extschema@.earth()))
END';

Of course, with the limitations of backpatching and security-only
discussion, that was the best we could do in the past.

> The core idea here is to prevent security problems not by changing an
> application's rules for operator/function name resolution, but by
> detecting an attempted compromise and preventing the trojan-horse code
> from being executed.  Essentially, a user or application is to declare
> a list of roles that it trusts functions owned by, and the system will
> then refuse to execute functions owned by other not-trusted roles.
> So, if $badguy creates a trojan-horse operator and manages to capture
> a call from your SQL code, he'll nonetheless not be able to execute
> code as you.

Yes, this is the only reasonable approach I can think of.
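
To make the idea concrete, the call-time check could look roughly like
this (a sketch only, not the POC's actual code; role_is_trusted() is a
hypothetical helper that would consult the trusted-roles setting):

static void
check_function_trust(Oid funcowner)
{
	/* superuser-owned (notably built-in) functions are always trusted */
	if (superuser_arg(funcowner))
		return;

	if (!role_is_trusted(funcowner))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("function owner %u is not a trusted role",
						funcowner)));
}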

> To reduce the overhead of the mechanism and chance of unintentionally
> breaking things, superuser-owned functions (particularly, all built-in
> functions) are always trusted by everybody.  A superuser who wants to
> become you can do so trivially, with no need for a trojan horse, so
> this restriction isn't creating any new security hole.

Agreed.

> The things that we hadn't resolved, which is why this didn't get further
> than POC stage, were
> 
> (1) What's the mechanism for declaring trust?  In this POC, it's just
> a GUC that you can set to a list of role names, with $user for yourself
> and "public" if you want to trust everybody.  It's not clear if that's
> good enough, or if we want something a bit more locked-down.

Yes, works for me.

> (2) Is trust transitive?  Where and how would the list of trusted roles
> change?  Arguably, if you call a SECURITY DEFINER function, then once
> you've decided that you trust the function owner, actual execution of the
> function should use the function owner's list of trusted roles not yours.
> With the GUC approach, it'd be necessary for SECURITY DEFINER functions
> to implement this with a "SET trusted_roles" clause, much as they now
> have to do with search_path.  That's possible but it's again not very
> non-invasive, so we'd been speculating about automating this more.
> If we had, say, a catalog that provided the desired list of trusted roles
> for every role, then we could imagine implementing that context change
> automatically.  Likewise, stuff like autovacuum or REINDEX would want
> to run with the table owner's list of trusted roles, but the GUC approach
> doesn't really provide enough infrastructure to know what to do there.

I can't think of any other places we do transitive permissions, except
for role membership.  I don't see the logic in adding such transitivity
to function/operator calls, or even a per-function GUC.  I assume most
sites have a small number of extensions installed by a predefined group
of users, usually superusers. If there is a larger group, a group role
should be created and those people put in the role, and the group role
trusted.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-08-09 Thread Bruce Momjian
On Wed, Aug  8, 2018 at 10:27:59AM -0700, Peter Geoghegan wrote:
> On Wed, Aug 8, 2018 at 10:23 AM, Kyle Samson  wrote:
> > We found the exact same issue on a production database at TripAdvisor. We 
> > have no new information to add for debugging. Bumping this to up the 
> > priority on a patch version release getting out.
> 
> There is a batch of point releases that were just stamped + tagged.
> They should be released shortly.

They have been released:

https://www.postgresql.org/about/news/1878/

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: libpq should append auth failures, not overwrite

2018-08-09 Thread Michael Paquier
On Thu, Aug 09, 2018 at 11:44:27AM -0400, Tom Lane wrote:
> I noticed that, although most error reports during libpq's connection
> setup code append to conn->errorMessage, the ones in fe-auth.c and 
> fe-auth-scram.c don't: they're all printfPQExpBuffer() not
> appendPQExpBuffer().  This seems wrong to me.  It makes no difference
> in simple cases with a single target server, but as soon as you have
> multiple servers listed in "host", this coding makes it impossible
> to tell which server rejected your login.

+1.
--
Michael




Commitfest 2018-07 RFC items

2018-08-09 Thread Andrew Dunstan



As will be seen in the summary below, I think a good many of the RFC 
items in the commitfest probably aren't really ready for committer. More 
than half of them need more review, as I read the mailing list threads.


I propose to move them all to the next CF, but in many cases after 
changing the status to "needs review" or in one case "waiting on author".


One big wart remains in this list - the Fix LWLock degradation on NUMA 
patch. Some committer who is comfortable in this area needs to pick it 
up, or if we don't have such a person we should reject it instead of 
making it go round and round like the Flying Dutchman.



I'll deal with the non-RFC items in a separate email.


cheers


andrew



https://commitfest.postgresql.org/18/1525/ postgres.exe doesn't output 
crash dump when it crashes before main()

Still under discussion
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/669/ pgbench - allow to store 
select results into variables

Latest patch has not been reviewed
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1661/ Update psql tab completion 
for multiple commits

Latest patch has been reviewed very recently
Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/1085/ XML XPath default namespace 
support
No activity since January. I'm going to try to pick this one up in the 
next few weeks myself.

Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/1694/ Partition tree inspection 
functions

Latest patch needs review
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1166/ Fix LWLock degradation on NUMA
No real activity since November. Desperately needs a committer who is 
prepared to take it on, or if not we should reject it.

Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/1389/ verify ALTER TABLE SET NOT 
NULL by valid constraints

It's not clear that there is consensus that this is wanted.
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1319/ Add ALWAYS DEFERRED option 
for CONSTRAINTs and CONSTRAINT TRIGGERs

Latest patch needs further review
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1522/ pg_hba.conf : new auth option 
: clientcert=verify-full

Still some open discussion items
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1647/ GSSAPI encryption support
No activity since June. Needs a committer
Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/713/ Correct space parsing in 
to_timestamp()

Discussion ongoing
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1160/ Improve geometric types
Some parts committed
Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/1294/ Custom compression methods
Needs rebase
Recommendation: change to "waiting on author" and move

https://commitfest.postgresql.org/18/1587/ kNN for SP-GiST
Recent review
Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/1629/ Allow COPY's 'text' format to 
output a header

Recent Review
Recommendation: leave as RFC and move

https://commitfest.postgresql.org/18/1622/ A way to request the rotation 
of server logs

Latest patch needs review
Recommendation: change to "needs review" and move

https://commitfest.postgresql.org/18/1707/ pg_stat_statements_reset() 
update to reset specific statement statistics

Latest patch needs review
Recommendation: change to "needs review" and move


--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improve behavior of concurrent TRUNCATE

2018-08-09 Thread Michael Paquier
On Thu, Aug 09, 2018 at 03:27:04PM +, Bossart, Nathan wrote:
> Thanks!  This patch builds cleanly, the new tests pass, and my manual
> testing hasn't uncovered any issues.  Notably, I cannot reproduce the
> originally reported authentication issue by using TRUNCATE after this
> change.  Beyond a few small comment wording suggestions below, it
> looks good to me.

Thanks, I have updated the patch as you suggested.  Any more
improvements to it that you can foresee?

> The second and fourth tests don't seem to actually block.

Yeah, that's intentional.
--
Michael
From c19f8a362d4ddd72a7d5a5b0e12a1a45693a0aca Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 9 Aug 2018 18:27:49 +0200
Subject: [PATCH] Refactor TRUNCATE execution to avoid too early lock lookups

A caller of TRUNCATE could previously queue for its locks before its
permission checks ran, which can cause other sessions to block behind
that request.  This opens the door to denial-of-service attacks, as even
a non-privileged user can attempt a truncation of a critical catalog
table and block all incoming connection attempts.

This commit fixes the case of TRUNCATE so that RangeVarGetRelidExtended
is used with a callback doing the necessary checks at an earlier stage,
avoiding lock waits at that point, so that unprivileged users get a
failure instead of waiting on a lock.
---
 src/backend/commands/tablecmds.c  | 84 
 .../isolation/expected/truncate-conflict.out  | 99 +++
 src/test/isolation/isolation_schedule |  1 +
 .../isolation/specs/truncate-conflict.spec| 38 +++
 4 files changed, 204 insertions(+), 18 deletions(-)
 create mode 100644 src/test/isolation/expected/truncate-conflict.out
 create mode 100644 src/test/isolation/specs/truncate-conflict.spec

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..c327038a09 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -300,7 +300,10 @@ struct DropRelationCallbackState
 #define child_dependency_type(child_is_partition)	\
 	((child_is_partition) ? DEPENDENCY_AUTO : DEPENDENCY_NORMAL)
 
-static void truncate_check_rel(Relation rel);
+static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
+static void truncate_check_activity(Relation rel);
+static void RangeVarCallbackForTruncate(const RangeVar *relation,
+			Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 bool is_partition, List **supOids, List **supconstr,
 int *supOidCount);
@@ -1336,15 +1339,26 @@ ExecuteTruncate(TruncateStmt *stmt)
 		bool		recurse = rv->inh;
 		Oid			myrelid;
 
-		rel = heap_openrv(rv, AccessExclusiveLock);
-		myrelid = RelationGetRelid(rel);
+		myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
+		   0, RangeVarCallbackForTruncate,
+		   NULL);
+
+		/* open the relation, we already hold a lock on it */
+		rel = heap_open(myrelid, NoLock);
+
 		/* don't throw error for "TRUNCATE foo, foo" */
 		if (list_member_oid(relids, myrelid))
 		{
 			heap_close(rel, AccessExclusiveLock);
 			continue;
 		}
-		truncate_check_rel(rel);
+
+		/*
+		 * RangeVarGetRelidExtended has done some basic checks with its
+		 * callback, but session-level checks remain.
+		 */
+		truncate_check_activity(rel);
+
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
 		/* Log this relation only if needed for logical decoding */
@@ -1367,7 +1381,9 @@ ExecuteTruncate(TruncateStmt *stmt)
 
 /* find_all_inheritors already got lock */
 rel = heap_open(childrelid, NoLock);
-truncate_check_rel(rel);
+truncate_check_rel(RelationGetRelid(rel), rel->rd_rel);
+truncate_check_activity(rel);
+
 rels = lappend(rels, rel);
 relids = lappend_oid(relids, childrelid);
 /* Log this relation only if needed for logical decoding */
@@ -1450,7 +1466,8 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 ereport(NOTICE,
 		(errmsg("truncate cascades to table \"%s\"",
 RelationGetRelationName(rel;
-truncate_check_rel(rel);
+truncate_check_rel(relid, rel->rd_rel);
+truncate_check_activity(rel);
 rels = lappend(rels, rel);
 relids = lappend_oid(relids, relid);
 /* Log this relation only if needed for logical decoding */
@@ -1700,38 +1717,47 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 }
 
 /*
- * Check that a given rel is safe to truncate.  Subroutine for ExecuteTruncate
+ * Check that a given relation is safe to truncate.  Subroutine for
+ * ExecuteTruncate and RangeVarCallbackForTruncate.
  */
 static void
-truncate_check_rel(Relation rel)
+truncate_check_rel(Oid relid, Form_pg_class reltuple)
 {
 	AclResult	aclresult;
+	char	   *relname = NameStr(reltuple->relname);
 
 	/*
 	 * Only allow truncate on regular tables and partitioned tables (although,
 	 * the latter are only being included here for the 

Re: libpq should not look up all host addresses at once

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-09, Tom Lane wrote:

> Whilst fooling with the patch for CVE-2018-10915, I got annoyed by
> the fact that connectDBStart() does the DNS lookups for all supplied
> hostnames at once, and fails if any of them are bad.  It was reasonable
> to do the lookup there when we only allowed one hostname, but now that
> "host" can be a list, this is really pretty stupid.  The whole point
> of allowing multiple hostnames is redundancy and avoiding a single
> point of failure; but the way this is written, *each* of your servers'
> DNS servers is a single point of failure all by itself.  If any one of
> them is down, you don't connect.  Plus, in the normal case where you
> successfully connect to something before the very last host in the list,
> the extra DNS lookups are wasted --- and DNS lookups aren't that cheap,
> since they typically involve a network round trip.

I'm not very familiar with the libpq code structure, but I think
connectDBStart() is not used for synchronous connections, only
asynchronous, is that correct?  If that's the case, then perhaps the
reason this hasn't been more widely reported is simply that those two
features (async conns and multihost conninfo strings) are just not very
commonly used ... and even less so with failing DNS setups.

> I'm tempted to call this a back-patchable bug fix, because the existing
> behavior basically negates the entire value of the multi-hostname feature
> once you consider the possibility of DNS server failures.  But given the
> lack of field complaints, maybe that's an overreaction.

Well, since it's broken, then I don't think it serves anybody very well.
I vote to backpatch it to 10.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-09 Thread Peter Geoghegan
On Wed, Aug 8, 2018 at 10:08 PM, Peter Geoghegan  wrote:
>> Hmmm ... maybe we should temporarily stick in an elog(LOG) showing whether
>> a parallel build happened or not, so that we can check the buildfarm logs
>> next time we see that failure?
>
> I can do that tomorrow. Of course, it might be easier to just push the
> pending fix for the mapped relation bug, and see if that makes the
> issue go away. That would take longer, but it's also the only thing
> that's likely to be definitive anyway.

I'm going to push my fix for the relmapper.c parallel CREATE INDEX bug
tomorrow, and see what happens. I have a hard time imagining how there
could be a parallel index build on pg_class, now that Andrew has
indicated there is no reason to think that
"min_parallel_table_scan_size = 0" could slip in.

-- 
Peter Geoghegan



Re: PATCH: pgbench - option to build using ppoll() for larger connection counts

2018-08-09 Thread Andrew Dunstan



On 07/03/2018 07:52 PM, Andrew Dunstan wrote:



On 05/17/2018 01:23 AM, Thomas Munro wrote:

On Tue, Mar 27, 2018 at 9:23 AM, Rady, Doug  wrote:

pgbench11-ppoll-v12.patch

Hi Doug,

FYI this patch is trying and failing to use ppoll() on Windows:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.30




It's still failing -  see 



I'm setting this back to "Waiting on Author" until that's fixed.




The author hasn't replied, but the attached seems to have cured the 
bitrot so that it at least applies. Let's see what the cfbot makes of it 
and then possibly fix any Windows issues.



cheers

andrew

--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/configure b/configure
index 2665213..4579003 100755
--- a/configure
+++ b/configure
@@ -14916,7 +14916,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat ppoll pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 397f6bc..bd08068 100644
--- a/configure.in
+++ b/configure.in
@@ -1540,7 +1540,7 @@ PGAC_FUNC_WCSTOMBS_L
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
+AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat ppoll pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
 
 AC_REPLACE_FUNCS(fseeko)
 case $host_os in
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c..e23ba1a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -45,9 +45,18 @@
 #include 
 #include 
 #include 
+#ifndef PGBENCH_USE_SELECT			/* force use of select(2)? */
+#ifdef HAVE_PPOLL
+#define POLL_USING_PPOLL
+#include <poll.h>
+#endif
+#endif
+#ifndef POLL_USING_PPOLL
 #ifdef HAVE_SYS_SELECT_H
+#define POLL_USING_SELECT
 #include <sys/select.h>
 #endif
+#endif
 
 #ifdef HAVE_SYS_RESOURCE_H
 #include <sys/resource.h>		/* for getrlimit */
@@ -92,13 +101,19 @@ static int	pthread_join(pthread_t th, void **thread_return);
 
 /
  * some configurable parameters */
-
-/* max number of clients allowed */
+#ifdef POLL_USING_SELECT	/* using select(2) */
+#define SOCKET_WAIT_METHOD "select"
+typedef fd_set socket_set;
 #ifdef FD_SETSIZE
-#define MAXCLIENTS	(FD_SETSIZE - 10)
+#define MAXCLIENTS	(FD_SETSIZE - 10) /* system limited max number of clients allowed */
 #else
-#define MAXCLIENTS	1024
+#define MAXCLIENTS	1024		/* max number of clients allowed */
 #endif
+#else	/* using ppoll(2) */
+#define SOCKET_WAIT_METHOD "ppoll"
+typedef struct pollfd socket_set;
+#define MAXCLIENTS	-1		/* unlimited number of clients */
+#endif /* POLL_USING_SELECT */
 
 #define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
 
@@ -525,6 +540,13 @@ static void addScript(ParsedScript script);
 static void *threadRun(void *arg);
 static void setalarm(int seconds);
 static void finishCon(CState *st);
+static socket_set *alloc_socket_set(int count);
+static bool error_on_socket(socket_set *sa, int idx, PGconn *con);
+static void free_socket_set(socket_set *sa);
+static bool ignore_socket(socket_set *sa, int idx, PGconn *con);
+static void clear_socket_set(socket_set *sa, int count);
+static void set_socket(socket_set *sa, int fd, int idx);
+static int wait_on_socket_set(socket_set *sa, int nstate, int maxsock, int64 usec);
 
 
 /* callback functions for our flex lexer */
@@ -1143,6 +1165,7 @@ doConnect(void)
 			!have_password)
 		{
 			PQfinish(conn);
+			conn = NULL;
 			simple_prompt("Password: ", password, sizeof(password), false);
 			have_password = true;
 			new_pass = true;
@@ -4903,7 +4926,7 @@ main(int argc, char **argv)
 			case 'c':
 benchmarking_option_set = true;
 nclients = atoi(optarg);
-if (nclients <= 0 || nclients > MAXCLIENTS)
+if (nclients <= 0 || (MAXCLIENTS != -1 && nclients > MA

Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-09 Thread Michael Paquier
On Thu, Aug 09, 2018 at 02:29:54AM -0700, Andres Freund wrote:
> On 2018-08-09 09:00:29 +0200, Michael Paquier wrote:
>> Would you be fine if I add an extra note like what's in
>> BackendIdGetProc?  Say "The result may be out of date quickly, so the
>> caller must be careful how to handle this information."
> 
> That's a caveat, not a documentation of the memory ordering /
> concurrency.

I think that I am going to add this note anyway in the new routine.

> You somewhere need a comment to the effect of "We are
> guaranteed to see a recent enough value of ->tempNamespace because
> anybody creating a temporary table acquires a lock - serving as a memory
> barrier - during the creation of said table, after assigning the
> tempNamespace variable. At the same time, before considering dropping a
> temporary table as orphaned, we acquire a lock and recheck tempNamespace
> afterwards".  Note that I'm not explaining the concrete model you have
> here, but the way you could explain a theoretical one.

Okay, here is an idea...  When assigning the value, I currently have this:
+   /*
+* Mark MyProc as owning this namespace which other processes can use to
+* decide if a temporary namespace is in use or not.  We assume that
+* assignment of namespaceId is an atomic operation.  Even if it is not,
+* there is no visible temporary relations associated to it and the
+* temporary namespace creation is not committed yet, so none of its
+* contents should be seen yet if scanning pg_class or pg_namespace.
+*/
I actually tried to convey what you want to see with the last sentence
of that comment.  So that is awkward :)

I would propose to reword the last sentence of the patch as follows
then:
"Even if it is not atomic, the temporary relation which resulted in the
creation of this temporary namespace is still locked until the current
transaction commits, so it would not be accessible yet."

When resetting the value on abort I have that:
+   /*
+* Reset the temporary namespace flag in MyProc. The creation of
+* the temporary namespace has failed for some reason and should
+* not be seen by other processes as it has not been committed
+* yet, hence this would be fine even if not atomic, still we
+* assume that it is an atomic assignment.
+*/

Hence I would propose the following wording for this part:
"Reset the temporary namespace flag in MyProc.  We assume that this
operation is atomic, however it would be fine even if not atomic as the
temporary table which created this namespace is still locked until this
transaction aborts so it would not be visible yet."

Better ideas are of course welcome.
--
Michael




Re: libpq should not look up all host addresses at once

2018-08-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2018-Aug-09, Tom Lane wrote:
>> Whilst fooling with the patch for CVE-2018-10915, I got annoyed by
>> the fact that connectDBStart() does the DNS lookups for all supplied
>> hostnames at once, and fails if any of them are bad.  It was reasonable
>> to do the lookup there when we only allowed one hostname, but now that
>> "host" can be a list, this is really pretty stupid.

> I'm not very familiar with the libpq code structure, but I think
> connectDBStart() is not used for synchronous connections, only
> asynchronous, is that correct?

No, it's used for all connections.  It's trivial to demonstrate the
problem in, eg, psql:

$ psql "host=localhost"
[ connects OK ]
$ psql "host=localhost,bogus"
psql: could not translate host name "bogus" to address: Name or service not 
known

>> I'm tempted to call this a back-patchable bug fix, because the existing
>> behavior basically negates the entire value of the multi-hostname feature
>> once you consider the possibility of DNS server failures.  But given the
>> lack of field complaints, maybe that's an overreaction.

> Well, since it's broken, then I don't think it serves anybody very well.
> I vote to backpatch it to 10.

The only reason not to is that it involves a change in the contents of
struct PGconn.  But I suppose we could put the added fields at the end
in v10, if we're worried about clients looking into that struct (not
that they should be, but ...)

regards, tom lane



logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-09 Thread Tomas Vondra

Hi,

While investigating an issue with rewrite maps in logical decoding, I 
found it's pretty darn trivial to hit this:


ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"

This happens because we seem to open the mapping files and keep them 
open until the next checkpoint, at which point we fsync + close.


I suppose there are reasons why it's done this way, and admittedly the 
test that happens to trigger this is a bit extreme (essentially running 
pgbench concurrently with 'vacuum full pg_class' in a loop). I'm not 
sure it's extreme enough to deem it not an issue, because people using 
many temporary tables often deal with bloat by doing frequent vacuum 
full on catalogs.


Furthermore, the situation is made worse by opening the same file 
repeatedly, under different file descriptors.


I've added elog(LOG) to OpenTransientFile and CloseTransientFile - see 
the attached file.log, which is filtered to a single process. The 'cnt' 
field is showing numAllocatedDescs. There are 500 calls to 
OpenTransientFile, but only 8 calls to CloseTransientFile (all of them 
for pg_filenode.map).


But each of the mapping files is opened 9x, so we run out of file 
descriptors 10x faster. Moreover, I'm pretty sure simply increasing the 
file descriptor limit is not a solution - we'll simply end up opening 
the files over and over until hitting it again.


But wait, there's more - what happens after hitting the limit? We 
restart the decoding process, and end up getting this:


ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
LOCATION:  OpenTransientFile, fd.c:2161
LOG:  0: starting logical decoding for slot "s"
DETAIL:  streaming transactions committing after 1/6097DD48, reading
WAL from 1/60275848
LOCATION:  CreateDecodingContext, logical.c:414
LOG:  0: logical decoding found consistent point at 1/60275848
DETAIL:  Logical decoding will begin using saved snapshot.
LOCATION:  SnapBuildRestore, snapbuild.c:1872
ERROR:  XX000: subtransaction logged without previous top-level txn
record
LOCATION:  ReorderBufferAssignChild, reorderbuffer.c:790
LOG:  0: starting logical decoding for slot "s"
DETAIL:  streaming transactions committing after 1/60988088, reading
WAL from 1/60275848
LOCATION:  CreateDecodingContext, logical.c:414
LOG:  0: logical decoding found consistent point at 1/60275848
DETAIL:  Logical decoding will begin using saved snapshot.
LOCATION:  SnapBuildRestore, snapbuild.c:1872
ERROR:  XX000: subtransaction logged without previous top-level txn
record
LOCATION:  ReorderBufferAssignChild, reorderbuffer.c:790

I'd say this implies the "can't open file" is handled in a way that 
corrupts the mapping files, making it impossible to restart the 
decoding. AFAICS the only option at this point is to drop the 
subscription and start over.


I think the right solution here is (a) making sure we don't needlessly 
open the same mapping file over and over, (b) closing the files sooner, 
instead of waiting for the next checkpoint. I guess a small local cache 
of file descriptors would do both things.
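
As a generic illustration of that idea (a standalone sketch, not the
reorderbuffer.c code; the sizes and names are made up), a tiny cache
keyed by path name both avoids reopening the same mapping file and gives
a natural point at which to close everything once the batch of mappings
has been applied, instead of waiting for the next checkpoint:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FD_CACHE_SIZE 8

typedef struct
{
	char		path[1024];
	int			fd;
} CachedFd;

static CachedFd fd_cache[FD_CACHE_SIZE];
static int	fd_cache_used = 0;

/* close and forget every cached descriptor */
static void
cached_close_all(void)
{
	int			i;

	for (i = 0; i < fd_cache_used; i++)
		close(fd_cache[i].fd);
	fd_cache_used = 0;
}

/* return a descriptor for path, reusing one if the file is already open */
static int
cached_open(const char *path)
{
	int			i;
	int			fd;

	for (i = 0; i < fd_cache_used; i++)
		if (strcmp(fd_cache[i].path, path) == 0)
			return fd_cache[i].fd;

	/* keep the cache bounded: flush it before opening yet another file */
	if (fd_cache_used == FD_CACHE_SIZE)
		cached_close_all();

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	snprintf(fd_cache[fd_cache_used].path,
			 sizeof(fd_cache[fd_cache_used].path), "%s", path);
	fd_cache[fd_cache_used].fd = fd;
	fd_cache_used++;
	return fd;
}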


Not sure about the error handling. Even if we get rid of the file 
descriptor limit issue, I guess there are other ways this can fail here, 
so we probably need to improve that too.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2018-08-09 18:39:14.424 CEST [1681] LOG:  0: OpenTransientFile fd=16 cnt=0 fname=pg_logical/mappings/map-4000-4eb-1_556CE4F8-50b479-50b479
2018-08-09 18:39:14.424 CEST [1681] LOG:  0: OpenTransientFile fd=17 cnt=1 fname=pg_logical/mappings/map-4000-4eb-1_557CA3B8-50b479-50b47a
2018-08-09 18:39:14.424 CEST [1681] LOG:  0: OpenTransientFile fd=18 cnt=2 fname=pg_logical/mappings/map-4000-4eb-1_5582C658-50b479-50b592
2018-08-09 18:39:14.424 CEST [1681] LOG:  0: OpenTransientFile fd=19 cnt=3 fname=pg_logical/mappings/map-4000-4eb-1_55885DF0-50b479-50b60a
2018-08-09 18:39:14.424 CEST [1681] LOG:  0: OpenTransientFile fd=20 cnt=4 fname=pg_logical/mappings/map-4000-4eb-1_55961398-50b479-50b60c
2018-08-09 18:39:14.424 CEST [1681] LOG:  0: OpenTransientFile fd=21 cnt=5 fname=pg_logical/mappings/map-4000-4eb-1_55A7EA60-50b479-50b60d
2018-08-09 18:39:14.425 CEST [1681] LOG:  0: OpenTransientFile fd=22 cnt=6 fname=pg_logical/mappings/map-4000-4eb-1_55AB63B0-50b479-50b60e
2018-08-09 18:39:14.425 CEST [1681] LOG:  0: OpenTransientFile fd=23 cnt=7 fname=pg_logical/mappings/map-4000-4eb-1_55AED680-50b479-50b60f
2018-08-09 18:39:14.425 CEST [1681] LOG:  0: OpenTransientFile fd=24 cnt=8 fname=pg_logical/mappings/map-4000-4eb-1_55BD9030-50b479-50b610
2018-08-09 18:39:14.425 CEST [1681] LOG:  0: OpenTransi

Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-09 Thread Tom Lane
Tomas Vondra  writes:
> While investigating an issue with rewrite maps in logical decoding, I 
> found it's pretty darn trivial to hit this:
>  ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
>  file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"

> This happens because we seem to open the mapping files and keep them 
> open until the next checkpoint, at which point we fsync + close.

It sounds like whoever wrote that code was completely impervious to the
API spec for AllocateFile():

 * Note that files that will be open for any significant length of time
 * should NOT be handled this way, since they cannot share kernel file
 * descriptors with other files; there is grave risk of running out of FDs
 * if anyone locks down too many FDs.

Perhaps these files should be accessed through fd.c's VFD infrastructure
rather than ignoring it.  Reinventing it is especially not the way to go.

> Furthermore, the situation is made worse by opening the same file 
> repeatedly, under different file descriptor.

That's just stupid :-(

Note that the limit on maxAllocatedDescs is based on the kernel limit
on how many files a process can have open, so assuming you can increase
it is not going to end well.

regards, tom lane



Re: Some pgq table rewrite incompatibility with logical decoding?

2018-08-09 Thread Tomas Vondra

On 06/25/2018 07:48 PM, Jeremy Finzel wrote:



On Mon, Jun 25, 2018 at 12:41 PM, Andres Freund wrote:


Hi,

On 2018-06-25 10:37:18 -0500, Jeremy Finzel wrote:
> I am hoping someone here can shed some light on this issue - I apologize if
> this isn't the right place to ask this, but I'm almost certain some of you
> were involved in pgq's dev and might be able to answer this.
> 
> We are actually running 2 replication technologies on a few of our dbs,
> skytools and pglogical.  Although we are moving towards only using logical
> decoding-based replication, right now we have both for different purposes.
> 
> There seems to be a table rewrite happening on table pgq.event_58_1 that
> has happened twice, and it ends up in the decoding stream, resulting in the
> following error:
> 
> ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation
> OID"
> 
> In retracing what happened, we discovered that this relfilenode was
> rewritten.  But somehow, it is ending up in the logical decoding stream and
> is "undecodable".  This is pretty disastrous because the only way to fix it
> really is to advance the replication slot and lose data.
> 
> The only obvious table rewrite I can find in the pgq codebase is a truncate
> in pgq.maint_rotate_tables.sql.  But there isn't anything surprising
> there.  If anyone has any ideas as to what might cause this so that we
> could somehow mitigate the possibility of this happening again until we
> move off pgq, that would be much appreciated.

I suspect the issue might be that pgq does some updates to catalog
tables. Is that indeed the case?


I also suspected this.  The only case I found of this is that it is 
doing deletes and inserts to pg_autovacuum.  I could not find anything 
quickly otherwise but I'm not sure if I'm missing something in some of 
the C code.




I don't think that's true, for two reasons.

Firstly, I don't think pgq updates catalogs directly, it simply 
truncates the queue tables when rotating them (which updates the 
relfilenode in pg_class, of course).


Secondly, we're occasionally seeing this on systems that do not use pgq, 
but that do VACUUM FULL on custom "queue" tables. The symptoms are 
exactly the same ("ERROR: could not map filenode"). It's pretty damn 
rare and we don't have direct access to the systems, so investigation is 
difficult :-( Our current hypothesis is that it's somewhat related to 
subtransactions (because of triggers with exception blocks).


Jeremy, are you able to reproduce the issue locally, using pgq? That 
would be very valuable.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-09, Tomas Vondra wrote:

> I suppose there are reasons why it's done this way, and admittedly the test
> that happens to trigger this is a bit extreme (essentially running pgbench
> concurrently with 'vacuum full pg_class' in a loop). I'm not sure it's
> extreme enough to deem it not an issue, because people using many temporary
> tables often deal with bloat by doing frequent vacuum full on catalogs.

Actually, it seems to me that ApplyLogicalMappingFile is just leaking
the file descriptor for no good reason.  There's a different
OpenTransientFile call in ReorderBufferRestoreChanges that is not
intended to be closed immediately, but the other one seems a plain bug,
easy enough to fix.
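
For clarity, the shape of that fix would be roughly the following
(simplified sketch, not the actual commit; error handling is trimmed and
the OpenTransientFile argument list may differ between branches):

	int			fd;

	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\": %m", path)));

	/* ... read and apply each mapping entry from the file ... */

	/* close it now instead of leaking it until the next checkpoint */
	CloseTransientFile(fd);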

> But wait, there's more - what happens after hitting the limit? We restart
> the decoding process, and end up getting this:
> 
> ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
> file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
> LOCATION:  OpenTransientFile, fd.c:2161
> LOG:  0: starting logical decoding for slot "s"
> DETAIL:  streaming transactions committing after 1/6097DD48, reading
> WAL from 1/60275848
> LOCATION:  CreateDecodingContext, logical.c:414
> LOG:  0: logical decoding found consistent point at 1/60275848
> DETAIL:  Logical decoding will begin using saved snapshot.
> LOCATION:  SnapBuildRestore, snapbuild.c:1872
> ERROR:  XX000: subtransaction logged without previous top-level txn
> record

Hmm, I wonder if we introduced some bug in f49a80c481f7.

> I'd say this implies the "can't open file" is handled in a way that corrupts
> the mapping files, making it impossible to restart the decoding. AFAICS the
> only option at this point is to drop the subscription and start over.

Wow, that seems pretty serious.  Clearly that's a different bug that
must be fixed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



xact_start meaning when dealing with procedures?

2018-08-09 Thread hubert depesz lubaczewski
Hi
I just noticed that when I called a procedure that commits and rollbacks
- the xact_start in pg_stat_activity is not updated. Is it intentional?

I'm on newest 12devel, built today.

Best regards,

depesz




Re: Improve behavior of concurrent TRUNCATE

2018-08-09 Thread Bossart, Nathan
On 8/9/18, 11:31 AM, "Michael Paquier"  wrote:
> Thanks, I have updated the patch as you suggested.  Any more
> improvements to it that you can foresee?

Looks good to me.

Nathan



Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-09, Tom Lane wrote:

> It sounds like whoever wrote that code was completely impervious to the
> API spec for AllocateFile():
> 
>  * Note that files that will be open for any significant length of time
>  * should NOT be handled this way, since they cannot share kernel file
>  * descriptors with other files; there is grave risk of running out of FDs
>  * if anyone locks down too many FDs.
> 
> Perhaps these files should be accessed through fd.c's VFD infrastructure
> rather than ignoring it.  Reinventing it is especially not the way to go.

Ah, right.  Maybe ReorderBufferRestoreChanges should use
PathNameOpenFile / FileRead ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-09 Thread Tomas Vondra




On 08/09/2018 07:47 PM, Alvaro Herrera wrote:

On 2018-Aug-09, Tomas Vondra wrote:


I suppose there are reasons why it's done this way, and admittedly the test
that happens to trigger this is a bit extreme (essentially running pgbench
concurrently with 'vacuum full pg_class' in a loop). I'm not sure it's
extreme enough to deem it not an issue, because people using many temporary
tables often deal with bloat by doing frequent vacuum full on catalogs.


Actually, it seems to me that ApplyLogicalMappingFile is just leaking
the file descriptor for no good reason.  There's a different
OpenTransientFile call in ReorderBufferRestoreChanges that is not
intended to be closed immediately, but the other one seems a plain bug,
easy enough to fix.


But wait, there's more - what happens after hitting the limit? We restart
the decoding process, and end up getting this:

 ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
 file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
 LOCATION:  OpenTransientFile, fd.c:2161
 LOG:  0: starting logical decoding for slot "s"
 DETAIL:  streaming transactions committing after 1/6097DD48, reading
 WAL from 1/60275848
 LOCATION:  CreateDecodingContext, logical.c:414
 LOG:  0: logical decoding found consistent point at 1/60275848
 DETAIL:  Logical decoding will begin using saved snapshot.
 LOCATION:  SnapBuildRestore, snapbuild.c:1872
 ERROR:  XX000: subtransaction logged without previous top-level txn
 record


Hmm, I wonder if we introduced some bug in f49a80c481f7.



That's possible. I'm running on f85537a88d, i.e. with f49a80c481f7.


I'd say this implies the "can't open file" is handled in a way that corrupts
the mapping files, making it impossible to restart the decoding. AFAICS the
only option at this point is to drop the subscription and start over.


Wow, that seems pretty serious.  Clearly that's a different bug that
must be fixed.



Yep.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xact_start meaning when dealing with procedures?

2018-08-09 Thread Peter Eisentraut
On 09/08/2018 19:57, hubert depesz lubaczewski wrote:
> I just noticed that when I called a procedure that commits and rollbacks
> - the xact_start in pg_stat_activity is not updated. Is it intentional?

It's an artifact of the way this is computed.  The reported transaction
timestamp is the timestamp of the first top-level statement of the
transaction.  This assumes that transactions contain statements, not the
other way around, as is now possible with procedures.  I'm not sure what
an appropriate improvement would be here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC for a function trust mechanism

2018-08-09 Thread David Kohn
On Thu, Aug 9, 2018 at 12:11 PM Bruce Momjian  wrote:

> ...
>
> > The things that we hadn't resolved, which is why this didn't get further
> > than POC stage, were
> >
> > (1) What's the mechanism for declaring trust?  In this POC, it's just
> > a GUC that you can set to a list of role names, with $user for yourself
> > and "public" if you want to trust everybody.  It's not clear if that's
> > good enough, or if we want something a bit more locked-down.
>
> Yes, works for me.
>
> > (2) Is trust transitive?  Where and how would the list of trusted roles
> > change?  Arguably, if you call a SECURITY DEFINER function, then once
> > you've decided that you trust the function owner, actual execution of the
> > function should use the function owner's list of trusted roles not yours.
> > With the GUC approach, it'd be necessary for SECURITY DEFINER functions
> > to implement this with a "SET trusted_roles" clause, much as they now
> > have to do with search_path.  That's possible but it's again not very
> > non-invasive, so we'd been speculating about automating this more.
> > If we had, say, a catalog that provided the desired list of trusted roles
> > for every role, then we could imagine implementing that context change
> > automatically.  Likewise, stuff like autovacuum or REINDEX would want
> > to run with the table owner's list of trusted roles, but the GUC approach
> > doesn't really provide enough infrastructure to know what to do there.
>
> I can't think of any other places we do transitive permissions, except
> for role membership.  I don't see the logic in adding such transitivity
> to function/operator calls, or even a per-function GUC.  I assume most
> sites have a small number of extensions installed by a predefined group
> of users, usually superusers. If there is a larger group, a group role
> should be created and those people put in the role, and the group role
> trusted.
>

I am wondering how this will interact with the inheritance of roles. For
instance, if two users are members of the same role, and one creates a
function the expectation would be that other users in the same role will
not trust that function. However, do I trust functions that are owned by
the roles that I am a member of? Or do I have to list any nested roles
explicitly? If the former, I suppose we'd have to modify how alter function
set owner works. It is currently allowed for roles that you are a member of
(and would then create a security hole). However, not trusting functions
owned by roles that I am a member of seems to also be a bit
counterintuitive.
Best,
David


Re: libpq should not look up all host addresses at once

2018-08-09 Thread Chapman Flack
On 08/09/2018 11:05 AM, Tom Lane wrote:

> So I think what this code should do is (1) look up each hostname as it
> needs it, not all at once, and (2) proceed on to the next hostname
> if it gets a DNS lookup failure, not fail the whole connection attempt
> immediately.  As attached.

Would it be worth the complexity to be a little async about it,
fling a few DNS requests out, and try the hosts in the order the
responses come back?

-Chap



Re: xact_start meaning when dealing with procedures?

2018-08-09 Thread Vik Fearing
On 09/08/18 20:13, Peter Eisentraut wrote:
> On 09/08/2018 19:57, hubert depesz lubaczewski wrote:
>> I just noticed that when I called a procedure that commits and rollbacks
>> - the xact_start in pg_stat_activity is not updated. Is it intentional?
> 
> It's an artifact of the way this is computed.  The reported transaction
> timestamp is the timestamp of the first top-level statement of the
> transaction.  This assumes that transactions contain statements, not the
> other way around, like it is now possible.  I'm not sure what an
> appropriate improvement would be here.

That would just mean that query_start would be older than xact_start,
but that's okay because the displayed query would be a CALL so we'll
know what's going on.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2018-08-09 Thread Andrew Dunstan




On 01/24/2018 04:30 AM, Pavel Stehule wrote:


I am sending updated version.

Very much thanks for very precious review






Thomas,

I am unable to replicate the Linux failure seen in the cfbot on my 
Fedora machine. Both when building with libxml2 and without, after 
applying the latest patch the tests pass without error. Can you please 
investigate what's going on here?


cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Bear Giles
There are alternatives. If you know what you want to find, e.g., a search
by username or email address, you can store a strong hash of the value as
an indexed column. By "strong hash" I mean don't just use md5 or sha1, or
even one round with a salt. I can give you more details about how and why
offline.

So you might have a record with:

   id serial primary key,
   email_hash text not null indexed,
   first_name_hash text indexed,
   last_name_hash text indexed,
   phone_number_hash text indexed ,
   'wallet'_containing_all_encrypted_values text

and that allows you to search on email, first name, last name, or phone
number, or some combination on them. But no expressions. The hashing would
be done in your app, not the database. You also probably want to convert
everything to lowercase, maybe remove spaces, etc., before computing the
hash.
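
For illustration, a rough SQL sketch of that layout (hypothetical names; the
*_hash values are computed in the application, e.g. a keyed/salted hash over
the lowercased value, and only ever handed to the database as opaque text):

    CREATE TABLE account (
        id              serial PRIMARY KEY,
        email_hash      text NOT NULL,
        first_name_hash text,
        last_name_hash  text,
        phone_hash      text,
        wallet          bytea NOT NULL   -- all encrypted values, decrypted only in the app
    );

    CREATE INDEX ON account (email_hash);
    CREATE INDEX ON account (last_name_hash);
    CREATE INDEX ON account (phone_hash);

    -- Lookup: the application hashes the normalized e-mail first, then is
    -- prepared to get several rows back, decrypt each wallet, and do the
    -- final comparison itself.
    SELECT id, wallet FROM account WHERE email_hash = $1;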

You should be prepared to handle multiple matches. It's unlikely that an
email or phone number hash won't be unique but it's safest to always be
prepared for more than one match, decrypt the 'wallet', and then do a final
comparison. That also gives you a bit of protection from an attacker
creating an account and then changing the hash values to match someone
else. You can use that to support very limited expressions, e.g., also keep
a hash on the first three letters of their last name, but that will
compromise your security a bit since it allows an attacker to perform some
statistical analysis on the data.

Finally there's the general advice that hashes (and encrypted values)
should always have a version number of some sort. It could be something as
simple as 3$hash, or it could be a composite column or even a user-defined
type. The # indicates a lookup into a table, perhaps in your app, that
tells you which hashing algorithm and salt to use. It makes life a lot
easier if the security audit tells you that you need to change your
cipher/salt/key/whatever but you can't do it immediately since you don't
know everything you need in order to do it, e.g., the password that you
need in order to recompute the hash value. With that version number it's
easy to continue to accept the existing password so they can log in, and in
the background you quietly recompute the hash using the new
salt/algorithm/whatever and update their record. I've worked for some
pretty knowledgeable companies that have overlooked this.
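
A minimal sketch of carrying such a version next to each hash (hypothetical
names; the mapping from version to algorithm/salt is interpreted by the
application, not by the database):

    CREATE TABLE hash_scheme (
        version   smallint PRIMARY KEY,
        algorithm text NOT NULL,     -- e.g. which KDF and parameters the app should use
        notes     text
    );

    CREATE TABLE credential (
        user_id      int PRIMARY KEY,
        hash_version smallint NOT NULL REFERENCES hash_scheme (version),
        value_hash   text NOT NULL   -- or a single column holding '3$hash', as above
    );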

On Thu, Aug 9, 2018 at 6:05 AM, Andres Freund  wrote:

>
>
> On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi <
> abcz2.upr...@gmail.com> wrote:
> > Is it hard to implement solution 2?
>
> Yes.
>
> To the point that I'm fairly certain that an implementation would be
> considered too costly to maintain, versus the benefit of what's proposed.
>
> Andres
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>


Re: libpq should not look up all host addresses at once

2018-08-09 Thread Tom Lane
Chapman Flack  writes:
> On 08/09/2018 11:05 AM, Tom Lane wrote:
>> So I think what this code should do is (1) look up each hostname as it
>> needs it, not all at once, and (2) proceed on to the next hostname
>> if it gets a DNS lookup failure, not fail the whole connection attempt
>> immediately.  As attached.

> Would it be worth the complexity to be a little async about it,
> fling a few DNS requests out, and try the hosts in the order the
> responses come back?

It would be nice if an async connection request didn't have to block during
DNS lookups ... but I don't know of any portable library API for async DNS
requests, and it's most certainly not worth the trouble for us to write
our own version of getaddrinfo(3).

In practice, I think the async connection mode is mostly a legacy API
at this point anyway; surely most people who need that sort of behavior
are handling it nowadays by invoking libpq on a separate thread.  So I
just don't see it being worth a huge amount of work and maintenance
effort to get that to happen.  (Having said that, at least moving the
lookup from connectDBStart into PQconnectPoll, as this patch does,
is a step in the right direction.)

Now that I think about it, there may be some text in the libpq docs
claiming that the lookup happens in PQconnectStart not PQconnectPoll;
that would need adjustment.

regards, tom lane



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-08-09 Thread Peter Eisentraut
On 07/08/2018 15:29, Andres Freund wrote:
> I don't think that solves the problem that an arriving relcache
> invalidation would trigger a rebuild of rd_partdesc, while it actually
> is referenced by running code.

The problem is more generally that a relcache invalidation changes all
pointers that might be in use.  So it's currently not safe to trigger a
relcache invalidation (on tables) without some kind of exclusive lock.
One possible solution to this is outlined here:


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC for a function trust mechanism

2018-08-09 Thread Bruce Momjian
On Thu, Aug  9, 2018 at 02:12:41PM -0400, David Kohn wrote:
> On Thu, Aug 9, 2018 at 12:11 PM Bruce Momjian  wrote:
> I can't think of any other places we do transitive permissions, except
> for role membership.  I don't see the logic in adding such transitivity
> to function/operator calls, or even a per-function GUC.  I assume most
> sites have a small number of extensions installed by a predefined group
> of users, usually superusers. If there is a larger group, a group role
> should be created and those people put in the role, and the group role
> trusted.
> 
> 
> I am wondering how this will interact with the inheritance of roles. For
> instance, if two users are members of the same role, and one creates a 
> function
> the expectation would be that other users in the same role will not trust that
> function.

Well, right now, if you want to give members of a role rights to
something, you have to specifically grant rights to that role.  I would
assume the same thing would happen here --- if you want to trust a group
role, you have to mention that group role in the GUC list (not
function-level GUC).

Do we allow any GUC on a function?  Would not allowing this be confusing?

If we did transitive permissions, I could trust someone, and that person
could call a function of someone else they trust, and after a while you
don't know who you are trusting, which is why I think complex setups
like that are unwise.

> However, do I trust functions that are owned by the roles that I am a
> member of? Or do I have to list any nested roles explicitly? If the former, I
> suppose we'd have to modify how alter function set owner works. It is 
> currently
> allowed for roles that you are a member of (and would then create a security
> hole). However, not trusting functions owned by roles that I am a member of
> seems to also be a bit counterintuitive.

Well, if someone adds me to the 'bad' role, do I have any control over
that?  Seems someone adding me to their role is not something I am
requesting.  Let's look at the docs on GRANT ROLE:

   GRANT on Roles
   This variant of the GRANT command grants membership in a role to
   one or more other roles. Membership in a role is significant because
   it conveys the privileges granted to a role to each of its members.

   If WITH ADMIN OPTION is specified, the member can in turn grant
   membership in the role to others, and revoke membership in the role
   as well. Without the admin option, ordinary users cannot do that. A
   role is not considered to hold WITH ADMIN OPTION on itself, but it
   may grant or revoke membership in itself from a database session
   where the session user matches the role. Database superusers can
   grant or revoke membership in any role to anyone. Roles having
   CREATEROLE privilege can grant or revoke membership in any role
   that is not a superuser.

   Unlike the case with privileges, membership in a role cannot be
   granted to PUBLIC. Note also that this form of the command does
   not allow the noise word GROUP.

The point is that it is granting the role _access_ to something, not
establishing trust that the role accepts.  The WITH ADMIN OPTION would
allow ordinary users to add roles for whoever they want to attack.

Basically, as it is now, someone adding me to their role membership has
no downside for me.  To trust my own role membership adds a downside to
role membership that I don't think we want to do --- it makes role
membership too complex in what it grants _and_ trusts.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Typo in doc or wrong EXCLUDE implementation

2018-08-09 Thread Bruce Momjian
On Thu, Aug 9, 2018 at 01:11:05PM +0300, KES wrote:
> Bruce:
> >Yes, it would work, but doing that only for equality would be
> >surprising
>  to many people
>
> Why surprising? It is
> [documented](https://www.postgresql.org/docs/current/static/sql-create
> table.html#sql-createtable-exclude):
> >If all of the specified operators test for equality, this is
> >equivalent to a UNIQUE constraint, although an ordinary unique
> >constraint will be faster.
>
> Thus the UNIQUE constraint is just particular case of exclusion
> constraint, is not?

Well, for me a UNIQUE constraint guarantees each discrete value is
unique, while an exclusion constraint says discrete values, ranges, or geometric
types don't overlap.  I realize equality is a special case of discrete,
but having such cases be marked as UNIQUE seems too confusing.
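
To make the distinction concrete, a small hypothetical example (the
foreign-key point is the one the proposed documentation patch later in this
thread spells out):

    -- An equality-only exclusion constraint rejects duplicates much like
    -- UNIQUE would, just more slowly ...
    CREATE TABLE t1 (
        id int,
        EXCLUDE USING btree (id WITH =)
    );

    INSERT INTO t1 VALUES (1);
    INSERT INTO t1 VALUES (1);  -- fails: conflicting key value violates exclusion constraint

    -- ... but it is not treated as a unique constraint elsewhere; for example
    -- it cannot be the target of a foreign key:
    CREATE TABLE t2 (
        t1_id int REFERENCES t1 (id)  -- fails: no unique constraint matches the given keys
    );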

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Typo in doc or wrong EXCLUDE implementation

2018-08-09 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Aug 9, 2018 at 01:11:05PM +0300, KES wrote:
>> Why surprising? It is
>> [documented](https://www.postgresql.org/docs/current/static/sql-create
>> table.html#sql-createtable-exclude):
>>> If all of the specified operators test for equality, this is
>>> equivalent to a UNIQUE constraint, although an ordinary unique
>>> constraint will be faster.

>> Thus the UNIQUE constraint is just particular case of exclusion
>> constraint, is not?

> Well, for me a UNIQUE constraint guarantees each discrete value is
> unique, while an exclusion constraint says discrete values, ranges, or geometric
> types don't overlap.  I realize equality is a special case of discrete,
> but having such cases be marked as UNIQUE seems too confusing.

I think the OP is reading "equivalent" literally, as meaning that
an EXCLUDE with operators that act like equality is treated as being
the same as UNIQUE for *every* purpose.  We're not going there, IMO,
so probably we need to tweak the doc wording a little.  Perhaps
writing "functionally equivalent" would be better?  Or instead of
"is equivalent to", write "imposes the same restriction as"?

regards, tom lane



Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Nico Williams
On Thu, Aug 09, 2018 at 03:00:26PM +0300, Danylo Hlynskyi wrote:
> The problem
> ==
> 
> [...]
> 
> We don't trust full-disk-encryption or any other transparent encryption,
> because of possible SQL injections.

Can you elaborate on this?

> Solution 1 (possibly can be used even now)
> 
> - perform full-disk encryption
> - perform encryption of column
> - add decrypting expression-based index with decryption key
> - limit ways on disclosing index internals. Ideally if no one except admin
> can do that
> - limit ways to read index definitions - so it's not possible for
> application to uncover decryption key from database itself, it should know
> it on it's own.

But... you have to have the decryption key(s) in memory at all times to
enable any write operations.  And plaintext as well at various times.

What does this gain you that FDE doesn't?

> Solution 2 (feature request)
> 
> - full-disk encryption is optional
> - data column is encrypted
> - index is decrypted by construction, but each it's block is encrypted,
> even in memory.
> - lookups over index do lazy index buffer decrypt and close buffers ASAP
> - make every query that has to touch encrypted column or encrypted index
> require decryption key. This means, SELECT, DELETE, UPDATE, INSERT, VACUUM,
> CLUSTER, CREATE INDEX, pg_dump, pg_restore all should have decryption key
> supplied in order to be executed. This also means, that autovacuum daemon
> can't work.

Same response.

> What do you think about both solutions? Is it hard to implement solution 2?

They gain little or nothing over doing filesystem encryption in the OS
or even just plain FDE (see below).  They are not worthwhile.

You need to define your threat model.  What bad actors are you
protecting against?  What threats are you protecting against?

https://www.postgresql.org/message-id/20180622042337.GL4200%40localhost

Here are some threats you might choose to protect against:

1) passive attackers on the wire
2) active  attackers on the wire
3a) theft / compromise of storage devices
3b) compromise of decommissioned storage devices
3c) theft of running server
4) compromised backup storage
5) bad / compromised clients
6) bad / compromised DBAs or sysadmins
7) side channel exploits
8) ??

(1) and (2) are taken care of by TLS.

(3a) is taken care of by FDE in controllers, say, or by physical
security.

(3b) is taken care of by proper decommissioning, but FDE helps.

(3c) you can't protect against if you have keys in memory.  You could
use client-side crypto, but you'll have more clients to worry about than
servers.  Physical security is your best option.  (And really, you don't
get any way to protect against law enforcement taking the devices.)

(4) is taken care of by encrypting backups, which requires no changes to
PG to get.

(5) is taken care of (to some degree) by server-side logic (triggers,
...).

(6)...  You can't protect against sysadmins, really, nor DBAs, but you
can use crypto on the *client*-side to get some protection.  Since the
PG client is very thin and dumb, the PG client can't easily do this.
The idea is to encrypt values and MAC/sign rows to prevent DBAs/
sysadmins seeing sensitive data or tampering with your data.

(7) one deals with by using crypto implementations built with side
channel protection, though, really, this is a very difficult subject in
general, especially since Spectre.

Nico
-- 



Re: POC for a function trust mechanism

2018-08-09 Thread Nico Williams
On Wed, Aug 08, 2018 at 01:15:38PM -0400, Tom Lane wrote:
> This is sort of a counter-proposal to Noah's discussion of search path
> security checking in <20180805080441.gh1688...@rfd.leadboat.com>.
> (There's no technical reason we couldn't do both things, but I think
> this'd be more useful to most people.)

So, this is why I always fully-qualify all references to functions,
tables, etc.  I also always set a search_path on each function just in
case I accidentally leave a non-fully-qualified symbol.

I would like to have a way to request that all non-fully-qualified
symbols be resolved at function create/replace time and that the
resolution results be made permanent for the function.  If I have
several schemas in a search_path at function definition time, this would
not allow me to move dependencies around without replacing the
dependents -- that's OK for me.
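
For what it's worth, a small sketch of that practice (hypothetical names):

    -- Schema-qualify every reference and pin search_path on the function itself.
    CREATE FUNCTION app.get_balance(acct integer) RETURNS numeric
        LANGUAGE sql STABLE
        SET search_path = pg_catalog, app
        AS $$ SELECT a.balance FROM app.account AS a WHERE a.id = acct $$;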

Nico
-- 



Re: Documentaion fix.

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-09, Kyotaro HORIGUCHI wrote:

> # I noticed that the subject has typo..

Yeah.  I suggest never changing subject lines, because Gmail has the
nasty (mis-)feature of making such a response into a completely new
thread.  I don't know if Google paid mail service behaves in the same
way.

So if you change Subject, please do include a URL to the previous thread
if necessary, for the benefit of people reading on Gmail.  

I wouldn't worry about this for any other individual email provider, but
Gmail is by far the largest fraction of subscribers :-(  I guess this
shows just how much better Google made webmail systems than what existed
at the time.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC for a function trust mechanism

2018-08-09 Thread David Kohn
On Thu, Aug 9, 2018 at 3:04 PM Bruce Momjian  wrote:

>
>
> Well, right now, if you want to give members of a role rights to
> something, you have to specifically grant rights to that role.  I would
> assume the same thing would happen here --- if you want to trust a group
> role, you have to mention that group role in the GUC list (not
> function-level GUC).

Sure, but if I grant execute on a function to a role, members of that role
will be able to execute that function. Now, each member will (potentially)
need to update their trust list before doing that. Which seems a bit odd.
Or will I be able to modify some sort of default trust list of the
group role? If not, it seems like it could be an administrative nightmare;
if so, there are potential issues with who is allowed to modify the list of
trusted users that then gets inherited.

> ...
>
> Basically, as it is now, someone adding me to their role membership has
> no downside for me.  To trust my own role membership adds a downside to
> role membership that I don't think we want to do --- it makes role
> membership too complex in what it grants _and_ trusts.
>
Makes sense, and I can see how that could get out of hand in terms of
figuring out who you trust. I guess I don't know of other cases where this
concept of trusting comes about in our current permissions system? And it
seems to introduce a lot of odd cases where you end up with a sort of
permissions error or I guess a trust error in this case.

One possibility that might help this would be to only use this check if
a) the user who created the function isn't in the trust list and b) there
is a function with the same name and equivalent argument classes that would
be called if you weren't to call the untrusted user's function. So it is
only used for disambiguation.

Best,
David


Re: Typo in doc or wrong EXCLUDE implementation

2018-08-09 Thread David G. Johnston
On Thu, Aug 9, 2018 at 12:31 PM, Tom Lane  wrote:

> I think the OP is reading "equivalent" literally, as meaning that
> an EXCLUDE with operators that act like equality is treated as being
> the same as UNIQUE for *every* purpose.  We're not going there, IMO,
> so probably we need to tweak the doc wording a little.  Perhaps
> writing "functionally equivalent" would be better?  Or instead of
> "is equivalent to", write "imposes the same restriction as"?
>

Maybe something like:

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d936de3f23..7c31fe853b 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -928,12 +928,10 @@ WITH ( MODULUS numeric_literal, REM
   The EXCLUDE clause defines an exclusion
   constraint, which guarantees that if
   any two rows are compared on the specified column(s) or
-  expression(s) using the specified operator(s), not all of these
-  comparisons will return TRUE.  If all of the
-  specified operators test for equality, this is equivalent to a
-  UNIQUE constraint, although an ordinary unique constraint
-  will be faster.  However, exclusion constraints can specify
-  constraints that are more general than simple equality.
+  expression(s) using the specified operator(s), at least one of the
+  comparisons will return FALSE.
+  Exclusion constraints can (and should) be used to specify
+  expressions that do not involve simple equality.
   For example, you can specify a constraint that
   no two rows in the table contain overlapping circles
   (see ) by using the
@@ -968,6 +966,14 @@ WITH ( MODULUS numeric_literal, REM
   exclusion constraint on a subset of the table; internally this creates a
   partial index. Note that parentheses are required around the predicate.
  
+
+ 
+  PostgreSQL does not consider an exclusion
+  constraint to be a valid unique constraint for purposes of determining the
+  validity of a foreign key constraint.  For this reason, in addition to performance,
+  an exclusion constraint defined using only equality operators should be defined
+  as a UNIQUE constraint.
+ 
 



create-table-exclude-doc.diff
Description: Binary data


Re: Doc patch: add RECURSIVE to bookindex

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-01, Fabien COELHO wrote:

> Hello Daniel,
> 
> > Patch applies cleanly, doc build ok, works for me.
> 
> I have added it to the next CF and marked it as ready.

Pushed, thanks.

I applied it to 11 too.  I would have added it even further back, but it
didn't apply cleanly.

How about an index entry for "CSV"?

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: peripatus build failures....

2018-08-09 Thread Alvaro Herrera
On 2018-Jul-09, Larry Rosenman wrote:

> On Mon, Jul 09, 2018 at 05:25:50PM -0400, Tom Lane wrote:
> > I wrote:
> > > I'd been hesitant to back-patch dddfc4cb2 back in April; but now that
> > > it's survived some beta testing, I think that doing so seems like the
> > > most appropriate way to fix this.
> > 
> > Done.  Hopefully I didn't break anything; a lot of this code has mutated
> > to some extent since 9.3.  But I expect the buildfarm will point out any
> > problems.
> 
> The reason you might not have seen it on FreeBSD before is that FreeBSD
> 12 now uses lld (llvm's LD) to link, and it changed the default for -z. 

So, has the hack for -z notext been removed from the freebsd port build?

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC for a function trust mechanism

2018-08-09 Thread Bruce Momjian
On Thu, Aug  9, 2018 at 04:01:09PM -0400, David Kohn wrote:
> 
> 
> On Thu, Aug 9, 2018 at 3:04 PM Bruce Momjian  wrote:
> 
> 
> 
> Well, right now, if you want to give members of a role rights to
> something, you have to specifically grant rights to that role.  I would
> assume the same thing would happen here --- if you want to trust a group
> role, you have to mention that group role in the GUC list (not
> function-level GUC).
> 
> Sure, but if I grant execute on a function to a role, members of that role 
> will
> be able to execute that function. Now, each member will (potentially) need to
> update their trust list before doing that. Which seems a bit odd. Or will I be

Look at your wording above, "I grant execute" --- you are opening up
permissions to them.  There is no approval on their part that signifies
they should trust you.  If I open permissions for a file on my web
server, it doesn't mean people should trust me more than before.

> able to modify the some sort of default trust list of the group role? If not,
> it seems like it could be an administrative nightmare, if so there are
> potential issues with who is allowed to modify the list of trusted users that
> then gets inherited.

We certainly don't want to double-down on extending trust by allowing
someone to modify someone else's trusted role list.  Practically, if you
are opening up permissions to someone, you will need to create a group
that you both belong to first, and have them trust the group, or they
can trust you directly.  The benefit of a group role is that other
people can be added to that group without having to modify the trusted
role list.  Basically, the function caller is trusting whoever controls
membership to that group role.  This is different from having someone
trusting a role just because they were added to it (perhaps without
their knowledge).


> Basically, as it is now, someone adding me to their role membership has
> no downside for me.  To trust my own role membership adds a downside to
> role membership that I don't think we want to do --- it makes role
> membership too complex in what it grants _and_ trusts.
> 
> Makes sense, and I can see how that could get out of hand in terms of figuring
> out who you trust. I guess I don't know of other cases where this concept of
> trusting comes about in our current permissions system? And it seems to
> introduce a lot of odd cases where you end up with a sort of permissions error
> or I guess a trust error in this case. 

Yep.

> One possibility that might help this would be to only use the check this if a)
> the user who created the function isn't in the trust list and b) there is a
> function with the same name and equivalent argument classes that would be
> called if you weren't to call the untrusted user's function. So it is only 
> used
> for disambiguation. 

You can't do that because if you do, someone creating an ambiguous
function would cause a denial-of-service attack --- better to fail at
the time of the first function, when you are likely watching the output.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2018-08-09 Thread Ivan Kartyshov

Hello, thank you for your review.

Alexander Korotkov wrote on 2018-06-20 20:54:

Thinking about that more I found that adding vacuum mark as an extra
argument to LockAcquireExtended is also wrong.  It would be still hard
to determine if we should log the lock in LogStandbySnapshot().  We'll
have to add that flag to shared memory table of locks.  And that looks
like a kludge.

Therefore, I'd like to propose another approach: introduce new lock
level.  So, AccessExclusiveLock will be split into two
AccessExclusiveLocalLock and AccessExclusiveLock.  In spite of
AccessExclusiveLock, AccessExclusiveLocalLock will be not logged to
standby, and used for heap truncation.

I expect some resistance to my proposal, because fixing this "small
bug" doesn't deserve new lock level.  But current behavior of
hot_standby_feedback is buggy.  From user prospective,
hot_standby_feedback option is just non-worker, which causes master to
bloat without protection for standby queries from cancel.  We need to
fix that, but I don't see other proper way to do that except adding
new lock level...


Your proposal is very interesting; it made the patch smaller and cleaner.
So I updated the patch and made changes in accordance with the proposed
concept of a special AccessExclusiveLocalLock.


I don't yet understand what is the problem here and why this patch
should solve that.  As I get idea of this patch, GetOldestXmin() will
initialize xmin of walsender with lowest xmin of replication slot.
But xmin of replication slots will be anyway taken into account by
GetSnapshotData() called by vacuum.


How to test:
Make async replica, turn on feedback, reduce 
max_standby_streaming_delay

and aggressive autovacuum.


You forgot to mention, that one should setup the replication using
replication slot.  Naturally, if replication slot isn't exist, then
master shouldn't keep dead tuples for disconnected standby.  Because
master doesn't know if standby will reconnect or is it gone forever.


I tried to solve hot_standby_feedback without replication slots,
and found a small window in which vacuum on the master can truncate
heap pages that the standby still needs for a just-started transaction.

How to test:
Make async replica, turn on feedback and reduce
max_standby_streaming_delay.
Make autovacuum more aggressive.
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0



Test1:
Here we generate load on the master and simulate a long transaction
with repeated one-second seqscans on the replica (by calling pg_sleep
with a one-second duration every 6 seconds).
MASTER    REPLICA
 hot_standby = on
 max_standby_streaming_delay = 1s
 hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 1 AS value
FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
 start
 BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
 SELECT pg_sleep(value) FROM test;
 \watch 6



---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.



On the patched version, lazy_vacuum_truncation passed without fatal errors.



Only occasionally does an error occur, because this test is too synthetic:
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
It is caused by ResolveRecoveryConflictWithSnapshot being raised while
redoing some visibility flags; to avoid this conflict we can run test 2 or
increase max_standby_streaming_delay.



Test2:
Here we generate load on the master and simulate a long transaction
on the replica (by taking a LOCK on the table).
MASTER    REPLICA
 hot_standby = on
 max_standby_streaming_delay = 1s
 hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1)
id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
 start
 BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
 LOCK TABLE test IN ACCESS SHARE MODE;
 select * from test;
 \watch 6



---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy_vacuum_truncation passed without fatal errors.

I would like to hear your opinion on this implementation.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 73934e5..73975cc 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -854,8 +854,8 @@ ERROR:  could not serialize access due to read/write dependencies among transact


 
- Conflicts with the ACCESS EXCLUSIVE lock
- mode only.
+ Conflicts with the ACCESS EXCLUSIVE and
+ A

Re: peripatus build failures....

2018-08-09 Thread Larry Rosenman
It was never put into the build, and I have a PR open to remove the LLD_UNSAFE 
flag for 10.5 and the rest of today's releases. 

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229523

-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

On 8/9/18, 3:25 PM, "Alvaro Herrera"  wrote:

On 2018-Jul-09, Larry Rosenman wrote:

> On Mon, Jul 09, 2018 at 05:25:50PM -0400, Tom Lane wrote:
> > I wrote:
> > > I'd been hesitant to back-patch dddfc4cb2 back in April; but now that
> > > it's survived some beta testing, I think that doing so seems like the
> > > most appropriate way to fix this.
> > 
> > Done.  Hopefully I didn't break anything; a lot of this code has mutated
> > to some extent since 9.3.  But I expect the buildfarm will point out any
> > problems.
> 
> The reason you might not have seen it on FreeBSD before is that FreeBSD
> 12 now uses lld (llvm's LD) to link, and it changed the default for -z. 

So, has the hack for -z notext been removed from the freebsd port build?

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services






Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Bear Giles
Some regulatory standards require all UII, even all PII, information be
encrypted within the database, not just on encrypted media. That's to
reduce exposure even if someone gets access to a live server, e.g., via SQL
Injection. (The perennial #1 risk for software vulnerabilities.)

UII is uniquely identifiable information, e.g., SSN.

PII is personally identifiable information, e.g, email address, phone
number, address. It doesn't have to be enough to uniquely identify the
person, just enough to cut the number of possible individuals down to a
handful. That's a surprisingly large number of fields, e.g., knowing where
someone was born and their birthdate will get you close to their SSN for
anyone born after the IRS started requiring SSNs for all claimed
dependents. Knowing someone's birth date and city of residence will get you
down to a handful of individuals, often a single individual depending upon
their age and the size of their city. It's remarkably easy to uniquely
identify something like 75% of the population if you have the data from a
couple different sites and some way to correlate the records. (That's why
it's not good enough to just use the sha1 of an email address, etc.)

I know the government required UII encryption in its databases when I last
worked on a government contract, and I think they've required PII
encryption as well for years. I would be very surprised if HIPAA doesn't
require that as well for PII in addition to the medical info. I definitely
know PCI-DSS requires encryption of all information on the credit card
itself - you can keep the last few digits (I think 6 are allowed but four is
recommended) to facilitate searches. Of course companies could still have
the same information unencrypted in other columns or tables (except for the
CC number itself - and you *never* keep the CVN in any form on threat of
losing your ability to accept credit cards if you're caught) but they were
encouraged to encrypt it as well.

Anyway, legal requirements are "#0" on that list. Everything else *might*
happen but depending upon the nature of the data you *will* be audited for
compliance with regulations, either preemptively (e.g., VISA requires
periodic audits of anyone making more than $X in transactions per year) or
after a breach. One of my other past employers did the type of auditing
VISA requires and their promotional material was full of interviews with
former small business owners who lost their business after a breach. It
wasn't due to the loss itself, it's because any breach automatically
requires the strictest auditing for the next (4?) years, and that costs far
more than the average independent restaurant, auto repair shop, etc., can
afford. Obviously their business model is (in part) to scare people but
there are plenty of situations where you have to encrypt data within the
database and not just rely on encrypted media.




> Here are some threats you might choose to protect against:
>
> 1) passive attackers on the wire
> 2) active  attackers on the wire
> 3a) theft / compromise of storage devices
> 3b) compromise of decommissioned storage devices
> 3c) theft of running server
> 4) compromised backup storage
> 5) bad / compromised clients
> 6) bad / compromised DBAs or sysadmins
> 7) side channel exploits
> 8) ??
>
> (1) and (2) are taken care of by TLS.
>
> (3a) is taken care of by FDE in controllers, say, or by physical
> security.
>
> (3b) is taken care of by proper decommissioning, but FDE helps.
>
> (3c) you can't protect against if you have keys in memory.  You could
> use client-side crypto, but you'll have more clients to worry about than
> servers.  Physical security is your best option.  (And really, you don't
> get any way to protect against law enforcement taking the devices.)
>
> (4) is taken care of by encrypting backups, which requires no changes to
> PG to get.
>
> (5) is taken care of (to some degree) by server-side logic (triggers,
> ...).
>
> (6)...  You can't protect against sysadmins, really, nor DBAs, but you
> can use crypto on the *client*-side to get some protection.  Since the
> PG client is very thin and dumb, the PG client can't easily do this.
> The idea is to encrypt values and MAC/sign rows to prevent DBAs/
> sysadmins seeing sensitive data or tampering with your data.
>
> (7) one deals with by using crypto implementations built with side
> channel protection, though, really, this is a very difficult subject in
> general, especially since Spectre.
>
> Nico
> --
>
>


Re: Constraint documentation

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-07, Lætitia Avrot wrote:

> Hi Peter,
> 
> I understand what you're pointing at and I agree that it could be a good
> thing to be able to dump/restore a table without problem.
> 
> My point was that check constraints weren't supposed to be used that way
> theoretically (or maybe I'm mistaken?) so I thought maybe we should just
> inform the user that this kind of use of a check constraint is a misuse of
> that feature.

Tom Lane pointed out in another thread that the SQL standard lists
feature F673 "Reads SQL-data routine invocations in CHECK constraints"
which permits CHECK constraints to examine tables, so saying "you're not
supposed to do this", while correct from a Postgres perspective, would
be short-sighted ISTM, because we will make ourselves liars as soon as
we implement the feature.

I agree that we should point this out in *some* way, just not sure how.
Maybe something like "Postgres does not currently support CHECK
constraints containing queries, therefore we recommend to avoid them."
I would not mention pg_dump by name, just say dumps may not restore
depending on phase of moon.

(BTW I'm not sure of the term "other tables".  You could have a query
that references the same table ...)
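
For reference, a sketch of the kind of constraint being discussed
(hypothetical names).  PostgreSQL accepts it today, but since the CHECK
expression reads other data through a function, a dump may restore the rows
before the data the function depends on, and the restore can then fail:

    CREATE TABLE allowed_code (code text PRIMARY KEY);

    CREATE FUNCTION is_allowed(c text) RETURNS boolean
        LANGUAGE sql STABLE
        AS $$ SELECT EXISTS (SELECT 1 FROM allowed_code WHERE code = c) $$;

    CREATE TABLE item (
        code text CHECK (is_allowed(code))  -- "works", but is not a real integrity guarantee
    );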

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Repeatable Read Isolation in SQL running via background worker

2018-08-09 Thread Jeremy Finzel
I am using worker_spi as a model to run a SQL statement inside a background
worker.  From my browsing of the Postgres library, I believe that if I want
repeatable read isolation level, the proper way for me to attain this is to
add this line after StartTransactionCommand() in worker_spi_main:

XactIsoLevel = XACT_REPEATABLE_READ;

Or - am I mistaken?  Does PushActiveSnapshot already ensure I will get the
same snapshot of the data within this transaction?

Can anyone help me if this is accurate or if there are any other gotchas I
should be aware of?

The SQL statement will be run every minute for example, and each time with
this isolation level.  At least, that is my goal.

Any help is much appreciated.

Thanks,
Jeremy


Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Nico Williams
On Thu, Aug 09, 2018 at 02:34:07PM -0600, Bear Giles wrote:
> Some regulatory standards require all UII, even all PII, information be
> encrypted within the database, not just on encrypted media. That's to
> reduce exposure even if someone gets access to a live server, e.g., via SQL
> Injection. (The perennial #1 risk for software vulnerabilities.)

My preference for dealing with SQL Injection is to not provide direct
SQL access, but to use PostgREST exporting a schema that has only PG SQL
functions encapsulating all supported queries.  You just can't have
injection with such an access layer because you don't get to send SQL to
the server (because you don't get to send SQL to PostgREST).  It really
helps that PostgREST is written in Haskell.

That said, sure, if you have SQL Injection issues, then encrypting in
the database will do provided that there's no transparent way to access
the data (otherwise you've gained nothing).  That basically means you're
doing all the crypto on the client.

If you're doing all the crypto on the client, then your options for
indexing are very limited indeed.  To avoid offline dictionary attacks
you have to index MAC'ed values, effectively.  You can still do free
text indexing, provided you MAC each word.  MAC == message
authentication code, really, it's a keyed hash function, typically HMAC,
UMAC, or some such.  You could also index ciphertext, provided it has an
authentication tag, but you don't gain anything versus just indexing the
authentication tag.
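
As a rough illustration of the indexing shape only (hypothetical names; in
the setup described here the per-word MACs would be computed on the client
and sent to the server as opaque values):

    CREATE TABLE document (
        id         serial PRIMARY KEY,
        ciphertext bytea NOT NULL,
        token_macs text[] NOT NULL      -- MAC of each normalized word, client-supplied
    );

    CREATE INDEX ON document USING gin (token_macs);

    -- Free-text search: the client MACs the search term and asks for rows
    -- whose token array contains it.
    SELECT id FROM document WHERE token_macs @> ARRAY[$1];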

> I know the government required UII encryption in its databases when I last
> [...]

Usually regulations are not quite as prescriptive as that, though
there's always a discussion to be had with the regulators/auditors when
you deviate from the norm.  You're generally not precluded from having
better solutions than is the norm.

Nico
-- 



Re: Typo in doc or wrong EXCLUDE implementation

2018-08-09 Thread Vik Fearing
On 09/08/18 21:09, Bruce Momjian wrote:
> On Thu, Aug 9, 2018 at 01:11:05PM +0300, KES wrote:
>> Bruce:
>>> Yes, it would work, but doing that only for equality would be
>>> surprising
>>  to many people
>>
>> Why surprising? It is
>> [documented](https://www.postgresql.org/docs/current/static/sql-create
>> table.html#sql-createtable-exclude):
>>> If all of the specified operators test for equality, this is
>>> equivalent to a UNIQUE constraint, although an ordinary unique
>>> constraint will be faster.
>>
>> Thus the UNIQUE constraint is just particular case of exclusion
>> constraint, is not?
> 
> Well, for me a UNIQUE constraint guarantees each discrete value is
> unique, while an exclusion constraint says discrete values, ranges, or geometric
> types don't overlap.  I realize equality is a special case of discrete,
> but having such cases be marked as UNIQUE seems too confusing.

One of the things I'm currently trying to implement is the WITHOUT
OVERLAPS for UNIQUE constraints.

See SQL:2016 section 11.7 
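
For context, a rough sketch of what that looks like in the standard's syntax
(not valid PostgreSQL at the time of this discussion; the details are from
memory of the standard's temporal features rather than from this thread):

    CREATE TABLE room_booking (
        room        integer,
        booked_from timestamp,
        booked_to   timestamp,
        PERIOD FOR booking (booked_from, booked_to),
        UNIQUE (room, booking WITHOUT OVERLAPS)
    );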
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: PATCH: pgbench - option to build using ppoll() for larger connection counts

2018-08-09 Thread Andrew Dunstan



On 08/09/2018 12:45 PM, Andrew Dunstan wrote:



On 07/03/2018 07:52 PM, Andrew Dunstan wrote:



On 05/17/2018 01:23 AM, Thomas Munro wrote:
On Tue, Mar 27, 2018 at 9:23 AM, Rady, Doug  
wrote:

pgbench11-ppoll-v12.patch

Hi Doug,

FYI this patch is trying and failing to use ppoll() on Windows:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.30 






It's still failing -  see 



I'm setting this back to "Waiting on Author" until that's fixed.




The author hasn't replied, but the attached seems to have cured the 
bitrot so that it at least applies. Let's see what the cfbot makes of 
it and then possibly fix any Windows issues.







And there's still a Windows problem which I think is cured in the 
attached patch


cheers

andrew


--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/configure b/configure
index 2665213..4579003 100755
--- a/configure
+++ b/configure
@@ -14916,7 +14916,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat ppoll pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 397f6bc..bd08068 100644
--- a/configure.in
+++ b/configure.in
@@ -1540,7 +1540,7 @@ PGAC_FUNC_WCSTOMBS_L
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
+AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat ppoll pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
 
 AC_REPLACE_FUNCS(fseeko)
 case $host_os in
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c..3d378db 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -45,9 +45,18 @@
 #include 
 #include 
 #include 
+#ifndef PGBENCH_USE_SELECT			/* force use of select(2)? */
+#ifdef HAVE_PPOLL
+#define POLL_USING_PPOLL
+#include 
+#endif
+#endif
+#ifndef POLL_USING_PPOLL
+#define POLL_USING_SELECT
 #ifdef HAVE_SYS_SELECT_H
 #include 
 #endif
+#endif
 
 #ifdef HAVE_SYS_RESOURCE_H
 #include 		/* for getrlimit */
@@ -92,13 +101,19 @@ static int	pthread_join(pthread_t th, void **thread_return);
 
 /
  * some configurable parameters */
-
-/* max number of clients allowed */
+#ifdef POLL_USING_SELECT	/* using select(2) */
+#define SOCKET_WAIT_METHOD "select"
+typedef fd_set socket_set;
 #ifdef FD_SETSIZE
-#define MAXCLIENTS	(FD_SETSIZE - 10)
+#define MAXCLIENTS	(FD_SETSIZE - 10) /* system limited max number of clients allowed */
 #else
-#define MAXCLIENTS	1024
+#define MAXCLIENTS	1024		/* max number of clients allowed */
 #endif
+#else	/* using ppoll(2) */
+#define SOCKET_WAIT_METHOD "ppoll"
+typedef struct pollfd socket_set;
+#define MAXCLIENTS	-1		/* unlimited number of clients */
+#endif /* POLL_USING_SELECT */
 
 #define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
 
@@ -525,6 +540,13 @@ static void addScript(ParsedScript script);
 static void *threadRun(void *arg);
 static void setalarm(int seconds);
 static void finishCon(CState *st);
+static socket_set *alloc_socket_set(int count);
+static bool error_on_socket(socket_set *sa, int idx, PGconn *con);
+static void free_socket_set(socket_set *sa);
+static bool ignore_socket(socket_set *sa, int idx, PGconn *con);
+static void clear_socket_set(socket_set *sa, int count);
+static void set_socket(socket_set *sa, int fd, int idx);
+static int wait_on_socket_set(socket_set *sa, int nstate, int maxsock, int64 usec);
 
 
 /* callback functions for our flex lexer */
@@ -1143,6 +1165,7 @@ doConnect(void)
 			!have_password)
 		{
 			PQfinish(conn);
+			conn = NULL;
 			simple_prompt("Password: ", password, sizeof(password), false);
 			have_password = true;
 			new_pass = true;
@@ -4903,7 +4926,7 @@ main(int argc, char **argv)
 			case 'c':
 benchmarking_option_set = true

Commitfest 2018-07 WOA items

2018-08-09 Thread Andrew Dunstan



I don't propose to annotate every one of these. If they aren't mentioned 
below I intend to move them to the next CF without any change.



cheers


andrew


https://commitfest.postgresql.org/18/1644/ Add 
--include-table-data-where option to pg_dump, to export only a subset of 
table data

I'm not really clear what we're waiting on the author for.

https://commitfest.postgresql.org/18/1635/ libpq compression
No activity since early June. The author and the reviewers seem to be 
somewhat at odds.

I'm inclined to return this with feedback.

https://commitfest.postgresql.org/18/1252/ Foreign Key Arrays
Nothing since May
Suggest Return with feedback.

https://commitfest.postgresql.org/18/1710/ Row filtering for logical 
replication

No progress since March
Suggest Return with Feedback






CF 2018-07 Needs Review items

2018-08-09 Thread Andrew Dunstan



There has been some progress on quite a lot of these.


I'm going to move them all to the next CF.


Below are some notes on a few that seem to have stalled a bit.


cheers


andrew


https://commitfest.postgresql.org/18/1518/ lc_messages parameter doesn't 
work on Windows

This is a bug fix so we shouldn't drop it. But nobody is picking it up :-(

https://commitfest.postgresql.org/18/1641/ Try to produce compiler warnings for 
incorrect usage of printf("%m")
Conversation seems to have died back in May. Wait another CF to see if it's 
still alive.

https://commitfest.postgresql.org/18/1500/ csv output format for psql
Progress seems to have stalled. This one is up my street so I will take a look 
when I get a chance.



--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: POC for a function trust mechanism

2018-08-09 Thread David Kohn
>
> We certainly don't want to double-down on extending trust by allowing
> someone to modify someone else's trusted role list.  Practically, if you
> are opening up permissions to someone, you will need to create a group
> that you both belong to first, and have them trust the group, or they
> can trust you directly.  The benefit of a group role is that other
> people can be added to that group without having to modify the trusted
> role list.  Basically, the function caller is trusting whoever controls
> membership to that group role.  This is different from having someone
> trusting a role just because they were added to it (perhaps without
> their knowledge).
>

I think if you trust a group role you are implicitly trusting any members
of that group, because one can always alter the function owner from
themselves to the group role, because they are a member of that group. So
what you'd need to do is create a special group role that only owned the
functions and then not make anyone an actual member of that group, but you
could trust that group role. Then a separate group role that everyone would
be a member of and you'd do grants from the first role to the second.
So for what you proposed, if you are opening up permissions to someone by
using a role that you are both members of, then you implicitly open up
permissions from them to you as well.

Anyway, I guess all of this seems to introduce a lot more complexity into
an already complex permissions management system...is this all about the
public schema? Can we just make create function/operator etc something you
have to grant even in the public schema? It seems like that could be
significantly more user friendly than this. Or otherwise, would functions
owned by the database or schema owner be exempt from this? Because there
are many setups where people try to avoid superuser usage by creating
database or schema owner users who can do things like create function,
which normal users can then use. Would checks be skipped if the function
call is schema qualified because then there's no reasonable way to think
that someone is being fooled about which function they are executing?

Best,
David


Re: Commitfest 2018-07 WOA items

2018-08-09 Thread Alvaro Herrera
On 2018-Aug-09, Andrew Dunstan wrote:

> https://commitfest.postgresql.org/18/1252/ Foreign Key Arrays
> Nothing since May
> Suggest Return with feedback.

Please keep it around for my sake.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC for a function trust mechanism

2018-08-09 Thread Isaac Morland
On 9 August 2018 at 18:18, David Kohn  wrote:

Anyway, I guess all of this seems to introduce a lot more complexity into
> an already complex permissions management system...is this all about the
> public schema? Can we just make create function/operator etc something you
> have to grant even in the public schema? It seems like that could be
> significantly more user friendly than this.
>

Already true, if you do:

REVOKE CREATE ON SCHEMA public FROM PUBLIC;

Which I do, in all my databases, and which is probably a good idea in most
scenarios.


> Or otherwise, would functions owned by the database or schema owner be
> exempt from this? Because there are many setups where people try to avoid
> superuser usage by creating database or schema owner users who can do
> things like create function, which normal users can then use. Would checks
> be skipped if the function call is schema qualified because then there's no
> reasonable way to think that someone is being fooled about which function
> they are executing?
>

At present, permissions are completely separate from ownership: your
ability to use an object does not depend on who owns what (I believe you
can even revoke your own rights to use your own stuff). I suspect changing
this is probably not a good idea.


Re: Commitfest 2018-07 RFC items

2018-08-09 Thread Fabien COELHO



Hello Andrew,

https://commitfest.postgresql.org/18/669/ pgbench - allow to store select 
results into variables



Latest patch has not been reviewed
Recommendation: change to "needs review" and move


AFAICS the latest patch is a trivial rebase after some perl automatic
indentation changes; there was no new content per se.


--
Fabien



Re: pgbench exit code

2018-08-09 Thread Fabien COELHO



Hello Peter,


In pgbench, when an error occurs during a benchmark run (SQL error,
connection error, etc.) it just prints an error message and then
proceeds to print the run summary normally and exits with status 0.  So
this is quite inconvenient, especially from a script.


Yep. I'm fine with changing this behavior... last time I suggested
something like that, or probably something more drastic like exiting
immediately when there is little sense to go on, it was not approved; some
people want the report anyway.


Your approach of not changing the output too much but changing the exit 
status and adding a warning may get through more easily.


Note that there is some logic in distinguishing between different types of
errors (before the bench starts vs. after the bench has started), so I'd
suggest that the exit status should be different.



The attached patch changes this so that it exits with status 1 and also
prints a line below the summary advising that the results are incomplete.

The test suite had, curiously, encoded the previous behavior,


Sure, the point of the tests is to check for unwanted regressions. It does 
not mean that the behavior is ideal for all use cases.


Patch applies cleanly, compiles, global & local "make check" are ok.

The abort is by a client, but the code seems to only check the first 
client of a thread. ISTM that if a second or later client aborts it may 
not be detected? Probably an intermediate aggregation at the thread level 
is needed, or maybe a global variable, or, as errors are counted somewhere, 
it may be enough just to check that the count is non-zero?


 -   [ $status ? qr{^$} : qr{processed: 0/1} ],
 +   [],


The patch removes a check that there was some output and that no 
transactions were processed. ISTM it should be kept. If a different exit 
status is chosen on abort, that would make it easy to keep.


Probably there should be some documentation changes as well?

Note that the patch probably interferes in small ways with Marina's patch 
for retrying on serialization or deadlock errors, see:


https://commitfest.postgresql.org/18/1645/

--
Fabien.



Re: CF 2018-07 Needs Review items

2018-08-09 Thread Tom Lane
Andrew Dunstan  writes:
> https://commitfest.postgresql.org/18/1641/ Try to produce compiler warnings 
> for incorrect usage of printf("%m")
> Conversation seems to have died back in May. Wait another CF to see if it's 
> still alive.

That's certainly still alive as far as I'm concerned.  But I was not
expecting it to get looked at in this CF, since it was a new item.

regards, tom lane



Re: POC for a function trust mechanism

2018-08-09 Thread Bruce Momjian
On Thu, Aug  9, 2018 at 06:18:16PM -0400, David Kohn wrote:
> We certainly don't want to double-down on extending trust by allowing
> someone to modify someone else's trusted role list.  Practically, if you
> are opening up permissions to someone, you will need to create a group
> that you both belong to first, and have them trust the group, or they
> can trust you directly.  The benefit of a group role is that other
> people can be added to that group without having to modify the trusted
> role list.  Basically, the function caller is trusting whoever controls
> membership to that group role.  This is different from having someone
> trusting a role just because they were added to it (perhaps without
> their knowledge).
> 
> 
> I think if you trust a group role you are implicitly trusting any members of
> that group, because one can always alter the function owner from themselves to
> the group role, because they are a member of that group. So what you'd need to
> do is create a special group role that only owned the functions and then not
> make anyone an actual member of that group, but you could trust that group
> role. Then a separate group role that everyone would be a member of and you'd

Good point.  I think you are right that if you trust a role, you trust
whoever controls role membership to add only trustworthy people --- that
seems obvious because any member can create functions you will trust,
even if they don't change the function owner.
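As a rough sketch of that scenario (the role and function names are invented,
and this assumes the caller is a member of trusted_grp and that the group can
create in the schema):

CREATE FUNCTION sneaky() RETURNS int LANGUAGE sql AS 'SELECT 1';
ALTER FUNCTION sneaky() OWNER TO trusted_grp;
-- allowed, since the caller owns the function and is a member of trusted_grp;
-- anyone trusting trusted_grp would then implicitly trust sneaky() as well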

> do grants from the first role to the second. 
> So for what you proposed, if you are opening up permissions to someone by
> using a role that you are both members of, then you implicitly open up
> permissions from them to you as well.

I don't see the value in that.  My point is that you can't trust anyone
in a role you are a member of, by default --- you have to make that
decision as a user.

> Anyway, I guess all of this seems to introduce a lot more complexity into an
> already complex permissions management system...is this all about the public
> schema? Can we just make create function/operator etc something you have to
> grant even in the public schema? It seems like that could be significantly
> more user friendly than this. Or otherwise, would functions owned by the
> database or

I think 95% of users are probably creating these things as the
accessing user or super-user.

> schema owner be exempt from this? Because there are many setups where people
> try to avoid superuser usage by creating database or schema owner users who
> can do things like create function, which a normal users can now use. Would
> checks be skipped if the function call is schema qualified because then
> there's no reasonable way to think that someone is being fooled about which
> function they are executing?

I think most setups will be pretty simple.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



A dubious message related to SP-GiST compress method.

2018-08-09 Thread Kyotaro HORIGUCHI
I ran across the following code in spgutils.c.

> if (OidIsValid(cache->config.leafType) &&
>   cache->config.leafType != atttype)
> {
>   if (!OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>  errmsg("compress method must not defined when leaf type is 
> different from input type")));
>   fillTypeDesc(&cache->attLeafType, cache->config.leafType);

IIUC the fourth line looks inconsistent with the message in the 7th
line. The compress method is required in that case. Isn't "must not
defined" a typo of "must be defined"?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4a9b5da268..6d59b316ae 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -133,7 +133,7 @@ spgGetCache(Relation index)
 			if (!OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		 errmsg("compress method must not defined when leaf type is different from input type")));
+		 errmsg("compress method must be defined when leaf type is different from input type")));
 
 			fillTypeDesc(&cache->attLeafType, cache->config.leafType);
 		}


Re: PATCH: pgbench - option to build using ppoll() for larger connection counts

2018-08-09 Thread Andrew Dunstan




On 08/09/2018 05:46 PM, Andrew Dunstan wrote:



On 08/09/2018 12:45 PM, Andrew Dunstan wrote:



On 07/03/2018 07:52 PM, Andrew Dunstan wrote:



On 05/17/2018 01:23 AM, Thomas Munro wrote:
On Tue, Mar 27, 2018 at 9:23 AM, Rady, Doug  
wrote:

pgbench11-ppoll-v12.patch

Hi Doug,

FYI this patch is trying and failing to use ppoll() on Windows:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.30 






It's still failing -  see 



I'm setting this back to "Waiting on Author" until that's fixed.




The author hasn't replied, but the attached seems to have cured the 
bitrot so that it at least applies. Let's see what the cfbot makes of 
it and then possibly fix any Windows issues.







And there's still a Windows problem which I think is cured in the 
attached patch





and the CFBot agrees.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-08-09 Thread Andres Freund
Hi,

On 2018-08-09 20:57:35 +0200, Peter Eisentraut wrote:
> On 07/08/2018 15:29, Andres Freund wrote:
> > I don't think that solves the problem that an arriving relcache
> > invalidation would trigger a rebuild of rd_partdesc, while it actually
> > is referenced by running code.
> 
> The problem is more generally that a relcache invalidation changes all
> pointers that might be in use.

I don't think that's quite right. We had better already be OK with
superfluous invals that do not change anything, because there are already
sources of those (just think of vacuum, analyze, relation extension,
whatnot).


> So it's currently not safe to trigger a relcache invalidation (on
> tables) without some kind of exclusive lock.

I don't think that's true in as general a sense as you're stating it.
It's not OK to send relcache invalidations for things that people rely
on and that cannot be updated in place, because of the dangling pointer
issue etc.

The fact that currently it is not safe to *change* partition related
stuff without an AEL and how to make it safe is precisely what I was
talking about in the thread. It won't be a general solution, but the
infrastructure I'm talking about should get us closer.


> One possible solution to this is outlined here:
> 

I don't see anything in here that addresses the issue structurally?

Greetings,

Andres Freund



Re: Commitfest 2018-07 WOA items

2018-08-09 Thread Andrew Dunstan




On 08/09/2018 06:19 PM, Alvaro Herrera wrote:

On 2018-Aug-09, Andrew Dunstan wrote:


https://commitfest.postgresql.org/18/1252/ Foreign Key Arrays
Nothing since May
Suggest Return with feedback.

Please keep it around for my sake.




OK. I spent some time looking at it today, and it's a bit of a mess, but 
over to you.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Why do we expand tuples in execMain.c?

2018-08-09 Thread Andrew Dunstan




On 08/08/2018 09:15 PM, Andres Freund wrote:


On August 9, 2018 1:33:17 AM GMT+05:30, Andrew Dunstan 
 wrote:


On 08/08/2018 12:20 AM, Andres Freund wrote:

Hi,

I noticed
if (HeapTupleHeaderGetNatts(tuple.t_data) <
    RelationGetDescr(erm->relation)->natts)
{
    copyTuple = heap_expand_tuple(&tuple,
                                  RelationGetDescr(erm->relation));
}
else
{
    /* successful, copy tuple */
    copyTuple = heap_copytuple(&tuple);
}

in EvalPlanQualFetchRowMarks, and I'm somewhat confused why it's there?

If it's required here, why isn't it required in dozens of other places?





Not dozens, I think, since you can't have short records for catalog
tables, which account for most of the calls to heap_copytuple().

I will look at the remainder of cases (less than 10) and reply in a day
or two.

But why is it needed at all? And why isn't the deforming code at the site 
where we access the columns sufficient?




Yeah, I think you're right, and this is vestigial code that I neglected 
to remove when cleaning up from development.


I'll fix it.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Doc patch for index access method function

2018-08-09 Thread Tatsuro Yamada

Hi,

Attached is a patch fixing the documentation in "61.2. Index Access Method
Functions" and "61.6. Index Cost Estimation Functions".

I added the parameter "double *indexPages", introduced by commit 5262f7a4f, to
the documentation and also added an explanation of it. Please read and revise
it because I'm a non-native English speaker.

Regards,

Tatsuro Yamada
NTT Open Source Software Center

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index c72c522175..2ef638dc7d 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -385,7 +385,8 @@ amcostestimate (PlannerInfo *root,
 Cost *indexStartupCost,
 Cost *indexTotalCost,
 Selectivity *indexSelectivity,
-double *indexCorrelation);
+double *indexCorrelation,
+double *indexPages);
 
Estimate the costs of an index scan.  This function is described fully
in , below.
@@ -1155,7 +1156,8 @@ amcostestimate (PlannerInfo *root,
 Cost *indexStartupCost,
 Cost *indexTotalCost,
 Selectivity *indexSelectivity,
-double *indexCorrelation);
+double *indexCorrelation,
+double *indexPages);
 
 
The first three parameters are inputs:
@@ -1197,7 +1199,7 @@ amcostestimate (PlannerInfo *root,
   
 
   
-   The last four parameters are pass-by-reference outputs:
+   The last five parameters are pass-by-reference outputs:
 

 
@@ -1236,6 +1238,15 @@ amcostestimate (PlannerInfo *root,
   
  
 
+
+
+ *indexPages
+ 
+  
+   Set to number of index leaf pages
+  
+ 
+

   
 
@@ -1283,6 +1294,11 @@ amcostestimate (PlannerInfo *root,
table.
   
 
+  
+   The indexPages should be set to the number of leaf pages.  This is used
+   to adjust the estimate for the cost of the disk access. 
+  
+
   
When loop_count is greater than one, the returned numbers
should be averages expected for any one scan of the index.


Re: Constraint documentation

2018-08-09 Thread Pantelis Theodosiou
On Thu, Aug 9, 2018 at 10:32 PM, Alvaro Herrera 
wrote:

> On 2018-Aug-07, Lætitia Avrot wrote:
>
> > Hi Peter,
> >
> > I understand what you're pointing at and I agree that it could be a good
> > thing to be able to dump/restore a table without problem.
> >
> > My point was that check constraints weren't supposed to be used that way
> > theoretically (or maybe I'm mistaken?) so I thought maybe we should just
> > inform the user that this kind of use of a check constraint is a misuse
> of
> > that feature.
>
> Tom Lane pointed out in another thread that the SQL standard lists
> feature F673 "Reads SQL-data routine invocations in CHECK constraints"
> which permits CHECK constraints to examine tables, so saying "you're not
> supposed to do this", while correct from a Postgres perspective, would
> be short-sighted ISTM, because we will make ourselves liars as soon as
> we implement the feature.
>
> I agree that we should point this out in *some* way, just not sure how.
> Maybe something like "Postgres does not currently support CHECK
> constraints containing queries, therefore we recommend to avoid them."
> I would not mention pg_dump by name, just say dumps may not restore
> depending on phase of moon.
>
> (BTW I'm not sure of the term "other tables".  You could have a query
> that references the same table ...)
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

I like this:

> "Postgres does not currently support CHECK constraints containing
queries, therefore we recommend to avoid them."

Perhaps adding:

> CHECK constraints are currently meant to be used as *row constraints* only.
> Use UNIQUE or EXCLUDE constraints, if possible, for constraints that involve
> many or all rows of a table, and FOREIGN KEY constraints for cross-table
> constraints.
> More complex constraints will be available when ASSERTIONs are implemented.

And then adding some warning about using functions in CHECK constraints to
bypass current limitations.
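For concreteness, the pattern being warned about looks roughly like this
(a sketch; the table and function names are invented):

CREATE TABLE customers (id int PRIMARY KEY);

CREATE FUNCTION customer_exists(cid int) RETURNS boolean
    LANGUAGE sql STABLE
    AS 'SELECT EXISTS (SELECT 1 FROM customers WHERE id = cid)';

CREATE TABLE orders (
    customer_id int CHECK (customer_exists(customer_id))
);
-- accepted today, but pg_dump may restore orders and its constraint before
-- the customers data, so the dump can fail to restore; a FOREIGN KEY is the
-- supported way to express this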

Pantelis Theodosiou


Re: Postgres 11 release notes

2018-08-09 Thread Masahiko Sawada
Hi,

I found that the release notes say "Add pgtrgm function
strict_word_similarity() to compute the similarity of whole words", but
I think "pgtrgm" should be "pg_trgm". The attached patch fixes it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_11_release_note.patch
Description: Binary data


NLS handling fixes.

2018-08-09 Thread Kyotaro HORIGUCHI
Hello. I found that the backend's .po file has entries for GUC
descriptions, but we never see them anywhere.

The cause is that GetConfigOptionByNum forgets to retrieve
translations for texts that have been marked with gettext_noop.

Regarding GUCs, the group names, short descriptions and long descriptions
have translations, so the attached first patch (fix_GUC_nls.patch) lets
the translations appear.
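The easiest place to see the symptom is the pg_settings view, which IIUC is
built on top of GetConfigOptionByNum. A minimal sketch, assuming lc_messages
points at a locale that has translations:

SELECT name, short_desc FROM pg_settings WHERE name = 'work_mem';
-- before the patch, short_desc stays in English even under a translated locale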


Besides GUCs, I found another misuse of gettext_noop in
pg_GSS_recvauth. (2nd fix_GSSerr_nls.patch)

view_query_is_auto_updatable() and most of its callers are making the
same mistake in a similar way. All call sites require only the translated
message, but passing translated messages around doesn't seem good, so the
attached third patch adds _() to all the required places.
(3rd fix_view_updt_nls.patch, 5th fix_vacuumdb_nls.patch)

psql makes a slightly different mistake. \gdesc seems to intend the output
column names to be NLS strings, but they actually aren't. DescribeQuery
uses PrintQueryResults, but that is intended to be used only for SendQuery.
Replacing it with printQuery lets \gdesc print NLS strings, but I'm not
sure it is the right way to fix this. (4th, fix psql_nls.patch)


plperl/plpgsql/tcl have another kind of problem with NLS. They install
some GUC parameters, and their descriptions actually have translations,
but in *their own* .po files, so GetConfigOptionByNum cannot get them.
I don't have an idea for fixing this for now.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b05fb209bb..7b023c0064 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8419,13 +8419,13 @@ GetConfigOptionByNum(int varnum, const char **values, bool *noshow)
 		values[2] = NULL;
 
 	/* group */
-	values[3] = config_group_names[conf->group];
+	values[3] = gettext(config_group_names[conf->group]);
 
 	/* short_desc */
-	values[4] = conf->short_desc;
+	values[4] = gettext(conf->short_desc);
 
 	/* extra_desc */
-	values[5] = conf->long_desc;
+	values[5] = gettext(conf->long_desc);
 
 	/* context */
 	values[6] = GucContext_Names[conf->context];
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index cecd104b4a..43a93e493a 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1227,7 +1227,7 @@ pg_GSS_recvauth(Port *port)
 		{
 			gss_delete_sec_context(&lmin_s, &port->gss->ctx, GSS_C_NO_BUFFER);
 			pg_GSS_error(ERROR,
-		 gettext_noop("accepting GSS security context failed"),
+		 gettext("accepting GSS security context failed"),
 		 maj_stat, min_stat);
 		}
 
@@ -1253,7 +1253,7 @@ pg_GSS_recvauth(Port *port)
 	maj_stat = gss_display_name(&min_stat, port->gss->name, &gbuf, NULL);
 	if (maj_stat != GSS_S_COMPLETE)
 		pg_GSS_error(ERROR,
-	 gettext_noop("retrieving GSS user name failed"),
+	 gettext("retrieving GSS user name failed"),
 	 maj_stat, min_stat);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..2ac7eae84b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10807,7 +10807,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 ereport(ERROR,
 		(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		 errmsg("WITH CHECK OPTION is supported only on automatically updatable views"),
-		 errhint("%s", view_updatable_error)));
+		 errhint("%s", _(view_updatable_error))));
 		}
 	}
 
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 7d4511c585..ffb71c0ea7 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -502,7 +502,7 @@ DefineView(ViewStmt *stmt, const char *queryString,
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("WITH CHECK OPTION is supported only on automatically updatable views"),
-	 errhint("%s", view_updatable_error)));
+	 errhint("%s", _(view_updatable_error))));
 	}
 
 	/*
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index b56995925b..1894e812e6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1640,7 +1640,14 @@ DescribeQuery(const char *query, double *elapsed_msec)
 			}
 
 			if (OK && results)
-OK = PrintQueryResults(results);
+			{
+printQueryOpt myopt = pset.popt;
+
+myopt.nullPrint = NULL;
+myopt.title = NULL;
+myopt.translate_header = true;
+printQuery(results, &myopt, pset.queryFout, false, pset.logfile);
+			}
 
 			termPQExpBuffer(&buf);
 		}
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 60f8b1c394..9408b8f869 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -371,7 +371,7 @@ vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
 	{
 		if (stage != ANALYZE_NO_STAGE)
 			printf(_("%s: processing database \"%s\": %s\n"),
-   progname, PQdb(conn), stage_messages[stage]);
+   progname, PQdb(conn), _(stage_messages[stage])