Re: Connection slots reserved for replication

2019-01-02 Thread Oleksii Kliukin
On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
> Hi,
> 
> On Thu, 6 Dec 2018 at 00:55, Petr Jelinek  
> wrote:
> > > Does excluding WAL senders from the max_connections limit and including 
> > > max_wal_senders in MaxBackends also imply that we need to add 
> > > max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, 
> > > requiring its value on the replica to be not lower than the one on the 
> > > primary?
> > >
> >
> > I think it does, we need the proc slots for walsenders on the standby
> > same way we do for normal backends.
> 
> You are absolutely right. Attaching the new version of the patch.

Thank you. I've checked that the replica correctly complains when its value of 
max_wal_senders is lower than the one on the primary at v6. 

As stated in my previous comment, I think we should retain the specific error 
message on exceeding max_wal_senders, instead of showing the generic "too many 
clients already".  Attached is the patch that fixes this small thing. I've also 
rebased it against master and took the liberty of naming it v7.  It makes me 
wonder why we don't apply the same level of detail to the regular 
out-of-connections message and show the actual value of max_connections in the 
error text.

The code diff to the previous version is rather small:


diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index d1a8113cb6..df073673f5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2273,8 +2273,8 @@ InitWalSenderSlot(void)
Assert(MyWalSnd == NULL);
 
/*
-* Find a free walsender slot and reserve it. If this fails, we must be
-* out of WalSnd structures.
+* Find a free walsender slot and reserve it. This must not fail due
+* to the prior check for free walsenders at InitProcess.
 */
for (i = 0; i < max_wal_senders; i++)
{
@@ -2310,13 +2310,7 @@ InitWalSenderSlot(void)
break;
}
}
-   if (MyWalSnd == NULL)
-   ereport(FATAL,
-   (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
-errmsg("number of requested standby 
connections "
-   "exceeds max_wal_senders 
(currently %d)",
-   max_wal_senders)));
-
+   Assert(MyWalSnd != NULL);
/* Arrange to clean up at walsender exit */
on_shmem_exit(WalSndKill, 0);
 }

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 33387fb71b..fe33fc42e3 100644
@@ -341,6 +353,12 @@ InitProcess(void)
 * in the autovacuum case?
 */
SpinLockRelease(ProcStructLock);
+   if (am_walsender)
+   ereport(FATAL,
+   (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
+errmsg("number of requested standby 
connections "
+   "exceeds 
max_wal_senders (currently %d)",
+   max_wal_senders)));
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 errmsg("sorry, too many clients already")));


Cheers,
Oleksii

diff --git a/doc/src/sgml/high-availability.sgml 
b/doc/src/sgml/high-availability.sgml
index d8fd195da0..36ea513273 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2172,6 +2172,11 @@ LOG:  database system is ready to accept read only 
connections
  max_locks_per_transaction
 

+   
+
+ max_wal_senders
+
+   

 
  max_worker_processes
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 8d9d40664b..18665895cb 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -717,13 +717,13 @@ psql: could not connect to server: No such file or 
directory

 SEMMNI
 Maximum number of semaphore identifiers (i.e., sets)
-at least ceil((max_connections + 
autovacuum_max_workers + max_worker_processes + 5) / 16) plus room 
for other applications
+at least ceil((max_connections + 
autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 
16) plus room for other applications

 

 SEMMNS
 Maximum number of semaphores system-wide
-ceil((max_connections + autovacuum_max_workers + 
max_worker_processes + 5) / 16) * 17 plus room for other 
applications
+ceil((max_connections + autovacuum_max_workers + 
max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for 
other applications

 

@@ -780,13 +780,13 @@ psql: could not connect to server: No such file or 
dir

Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-01-02 Thread Nikolay Shaplov
In a message of Wednesday, 2 January 2019 0:05:10 MSK, Alvaro Herrera 
wrote:
> One thing I would like to revise here is to avoid unnecessary API change
> -- for example the RelationHasCascadedCheckOption macro does not really
> need to be renamed because it only applies to views, so there's no
> possible conflict with other relation types.  We can keep the original
> name and add a StaticAssert that the given relation is indeed a view.
> This should keep churn somewhat low.  Of course, this doesn't work for
> some options where you need a different one for different relation
> kinds, such as fillfactor, but that's unavoidable.

My intention was to make the API names more consistent. If you are addressing a 
view-specific option, it is better to address it via a View[Something] macro or 
function. Relation[Something] seems to be a bad name, since we are dealing 
not with relations in general, but with a view relation.

This is an internal API, right? If we change it everywhere, then it is changed 
and nothing will be broken?

Maybe it can lead to problems I am unable to see, but since I still can't see 
them, I can't make any judgment about it.

If you insist (you have much more experience than I do), I will do as you advise, 
but I still do not understand.



Re: Connection slots reserved for replication

2019-01-02 Thread Petr Jelinek
Hi,

On 02/01/2019 10:21, Oleksii Kliukin wrote:
> On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
>> Hi,
>>
>> On Thu, 6 Dec 2018 at 00:55, Petr Jelinek  
>> wrote:
 Does excluding WAL senders from the max_connections limit and including 
 max_wal_senders in MaxBackends also imply that we need to add 
 max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, 
 requiring its value on the replica to be not lower than the one on the 
 primary?

>>>
>>> I think it does, we need the proc slots for walsenders on the standby
>>> same way we do for normal backends.
>>
>> You are absolutely right. Attaching the new version of the patch.
> 
> Thank you. I've checked that the replica correctly complains when its value 
> of max_wal_senders is lower than the one on the primary at v6. 
> 
> As stated in my previous comment, I think we should retain the specific error 
> message on exceeding max_wal_senders, instead of showing the generic "too 
> many clients already".  Attached is the patch that fixes this small thing. 
> I've also rebased it against master and took the liberty of naming it v7.  
> It makes me wonder why we don't apply the same level of detail to the 
> regular out-of-connections message and show the actual value of 
> max_connections in the error text.
> 

+1

The patch generally looks good, but the documentation of max_wal_senders
needs updating. In config.sgml we say:

> WAL sender processes count towards the total number
> of connections, so this parameter's value must be less than
>  minus
> .

This is now misleading.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: FETCH FIRST clause WITH TIES option

2019-01-02 Thread Surafel Temesgen
On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra 
wrote:

> >
> > The attached patch include all the comment given by Tomas and i
> > check sql standard about LIMIT and this feature
> >
>
> Unfortunately, it seems the "limit" regression tests fail for some
> reason - the output mismatches the expected results for some reason. It
> seems as if the WITH TIES code affects ordering of the results within
> the group. See the attached file.
>
>
Yes, the reason is that the order of the returned rows is not always the same. I
removed the other columns from the result set to get a constant result.
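
For reference, here is a minimal sketch of the syntax the patch adds (table
and data are made up for illustration):

CREATE TABLE scores (name text, score int);
INSERT INTO scores VALUES ('a', 10), ('b', 9), ('c', 9), ('d', 8);

-- Returns 'a', 'b' and 'c': the third row ties with the second on "score",
-- so WITH TIES includes it as well.  An ORDER BY clause is required.
SELECT * FROM scores
ORDER BY score DESC
FETCH FIRST 2 ROWS WITH TIES;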

Regards

Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+The WITH TIES option is used to return any additional rows
+that tie for the last place in the result set according to the ORDER BY
+clause (an ORDER BY clause is required in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 6792f9e86c..4fc3f61b26 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,

Re: Connection slots reserved for replication

2019-01-02 Thread Oleksii Kliukin


On Wed, Jan 2, 2019, at 11:02, Petr Jelinek wrote:

> The patch generally looks good, but the documentation of max_wal_senders
> needs updating. In config.sgml we say:
> 
> > WAL sender processes count towards the total number
> > of connections, so this parameter's value must be less than
> >  minus
> > .
> 
> This is now misleading.

Thank you. Attached is the new version (called v8) with the following changes to 
the documentation:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305add..6634ce8cdc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -697,8 +697,7 @@ include_dir 'conf.d'
 

 The default value is three connections. The value must be less
-than max_connections minus
-.
+than max_connections.
 This parameter can only be set at server start.

   
@@ -3453,24 +3452,25 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows



-Specifies the maximum number of concurrent connections from
-standby servers or streaming base backup clients (i.e., the
-maximum number of simultaneously running WAL sender
-processes). The default is 10. The value 0 means replication is
-disabled. WAL sender processes count towards the total number
-of connections, so this parameter's value must be less than
- minus
-.
-Abrupt streaming client disconnection might leave an orphaned
-connection slot behind until
-a timeout is reached, so this parameter should be set slightly
-higher than the maximum number of expected clients so disconnected
-clients can immediately reconnect.  This parameter can only
-be set at server start.
+Specifies the maximum number of concurrent connections from standby
+servers or streaming base backup clients (i.e., the maximum number of
+simultaneously running WAL sender processes). The default is 10. The
+value 0 means replication is disabled.  Abrupt streaming client
+disconnection might leave an orphaned connection slot behind until a
+timeout is reached, so this parameter should be set slightly higher
+than the maximum number of expected clients so disconnected clients
+can immediately reconnect.  This parameter can only be set at server
+start.
 Also, wal_level must be set to
 replica or higher to allow connections from standby
 servers.

+
+   
+ When running a standby server, you must set this parameter to the
+ same or higher value than on the master server. Otherwise, queries
+ will not be allowed in the standby server.
+

   
 
Cheers,
Oleksii
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305add..6634ce8cdc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -697,8 +697,7 @@ include_dir 'conf.d'
 

 The default value is three connections. The value must be less
-than max_connections minus
-.
+than max_connections.
 This parameter can only be set at server start.

   
@@ -3453,24 +3452,25 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows



-Specifies the maximum number of concurrent connections from
-standby servers or streaming base backup clients (i.e., the
-maximum number of simultaneously running WAL sender
-processes). The default is 10. The value 0 means replication is
-disabled. WAL sender processes count towards the total number
-of connections, so this parameter's value must be less than
- minus
-.
-Abrupt streaming client disconnection might leave an orphaned
-connection slot behind until
-a timeout is reached, so this parameter should be set slightly
-higher than the maximum number of expected clients so disconnected
-clients can immediately reconnect.  This parameter can only
-be set at server start.
+Specifies the maximum number of concurrent connections from standby
+servers or streaming base backup clients (i.e., the maximum number of
+simultaneously running WAL sender processes). The default is 10. The
+value 0 means replication is disabled.  Abrupt streaming client
+disconnection might leave an orphaned connection slot behind until a
+timeout is reached, so this parameter should be set slightly higher
+than the maximum number of expected clients so disconnected clients
+can immediately reconnect.  This parameter can only be set at server
+start.
 Also, wal_level must be set to
 replica or higher to allow connections from standby
 servers.

+
+   
+ When running a standby server, you 

Re: Connection slots reserved for replication

2019-01-02 Thread Alexander Kukushkin
Hi,

On Wed, 2 Jan 2019 at 12:17, Oleksii Kliukin  wrote:

> Thank you. Attached is the new version (called v8) with the following changes 
> to the documentation:

Thank you for jumping on it. Your changes look good to me.


I was also thinking about changing the value in PG_CONTROL_VERSION,
because we added a new field to the control file, but decided to
leave this change to the committer.

Regards,
--
Alexander Kukushkin



Re: pg_dump multi VALUES INSERT

2019-01-02 Thread Surafel Temesgen
Hi,
Thank you for looking at it
On Mon, Dec 31, 2018 at 12:38 PM David Rowley 
wrote:

> Just looking at the v5 patch, it seems not to handle 0 column tables
> correctly.
>
> For example:
>
> # create table t();
> # insert into t default values;
> # insert into t default values;
>
> $ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql
>
> # \i dump.sql
> [...]
> INSERT 0 1
> psql:dump.sql:35: ERROR:  syntax error at or near ")"
> LINE 1: );
> ^
>

The attached patch contains a fix for it.
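
For reference, the general shape of what --insert-multi aims to emit is several
rows per INSERT statement instead of one statement per row (a rough sketch,
not the literal pg_dump output):

-- with --inserts --insert-multi=3 (sketch):
INSERT INTO t VALUES (1, 'one'), (2, 'two'), (3, 'three');

-- with plain --inserts, for comparison:
INSERT INTO t VALUES (1, 'one');
INSERT INTO t VALUES (2, 'two');
INSERT INTO t VALUES (3, 'three');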
Regards
Surafel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 2015410a42..ee94d1d293 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --insert-multi
+  
+   
+Specify the number of rows per INSERT command.
+This will make the dump file smaller than --inserts
+and faster to reload, but on error the data of the whole
+affected INSERT statement is lost, rather than just a single row.
+   
+  
+ 
+
  
   --load-via-partition-root
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 4a2e122e2d..73a243ecb0 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -72,6 +72,7 @@ typedef struct _restoreOptions
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
@@ -144,6 +145,7 @@ typedef struct _dumpOptions
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_multiple;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 341b1a51f2..3176a71262 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -313,6 +313,7 @@ main(int argc, char **argv)
 	int			plainText = 0;
 	ArchiveFormat archiveFormat = archUnknown;
 	ArchiveMode archiveMode;
+	char   *p;
 
 	static DumpOptions dopt;
 
@@ -359,6 +360,7 @@ main(int argc, char **argv)
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, &dopt.dump_inserts, 1},
+		{"insert-multi", required_argument, NULL, 8},
 		{"lock-wait-timeout", required_argument, NULL, 2},
 		{"no-tablespaces", no_argument, &dopt.outputNoTablespaces, 1},
 		{"quote-all-identifiers", no_argument, "e_all_identifiers, 1},
@@ -557,6 +559,27 @@ main(int argc, char **argv)
 dosync = false;
 break;
 
+			case 8:			/* inserts values number */
+errno = 0;
+dopt.dump_inserts_multiple = strtol(optarg, &p, 10);
+if (p == optarg || *p != '\0')
+{
+	write_msg(NULL, "argument of --insert-multi must be a number\n");
+	exit_nicely(1);
+}
+if (errno == ERANGE)
+{
+	write_msg(NULL, "argument of --insert-multi exceeds integer range.\n");
+	exit_nicely(1);
+}
+if (dopt.dump_inserts_multiple < 0)
+{
+	write_msg(NULL, "argument of --insert-multi must be positive number\n");
+	exit_nicely(1);
+}
+
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -607,8 +630,9 @@ main(int argc, char **argv)
 	if (dopt.if_exists && !dopt.outputClean)
 		exit_horribly(NULL, "option --if-exists requires option -c/--clean\n");
 
-	if (dopt.do_nothing && !(dopt.dump_inserts || dopt.column_inserts))
-		exit_horribly(NULL, "option --on-conflict-do-nothing requires option --inserts or --column-inserts\n");
+	if (dopt.do_nothing && !(dopt.dump_inserts || dopt.column_inserts ||
+		dopt.dump_inserts_multiple))
+		exit_horribly(NULL, "option --on-conflict-do-nothing requires option --inserts , --insert-multi or --column-inserts\n");
 
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
@@ -877,6 +901,7 @@ main(int argc, char **argv)
 	ropt->use_setsessauth = dopt.use_setsessauth;
 	ropt->disable_dollar_quoting = dopt.disable_dollar_quoting;
 	ropt->dump_inserts = dopt.dump_inserts;
+	ropt->dump_inserts_multiple = dopt.dump_inserts_multiple;
 	ropt->no_comments = dopt.no_comments;
 	ropt->no_publications = dopt.no_publications;
 	ropt->no_security_labels = dopt.no_security_labels;
@@ -967,6 +992,7 @@ help(const char *progname)
 	printf(_("  --exclude-table-data=TABLE   do NOT dump data for the named table(s)\n"));
 	printf(_("  --if-exists  use IF EXISTS when dropping objects\n"));
 	printf(_("  --insertsdump data as INSERT commands, rather than COPY\n"));
+	printf(_("  --insert-multi   number of values per INSERT command\n"));
 	printf(_("  --load-via-partition-rootload partitions via the root table\n"));
 	printf(_("  --no-commentsdo not dump com

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2019-01-02 Thread Stephen Frost
Greetings,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> On 2018-Dec-14, Stephen Frost wrote:
> 
> > > My vote goes to put the keyword inside of and exclusively in the
> > > parenthesized option list.
> > 
> > I disagree with the idea of exclusively having concurrently be in the
> > parentheses.  'explain buffers' is a much less frequently used option
> > (though that might, in part, be because it's a bit annoying to write out
> > explain (analyze, buffers) select...; I wonder if we could have a way to
> > say "if I'm running analyze, I always want buffers"...),
> 
> I'm skeptical.  I think EXPLAIN ANALYZE is more common because it has
> more than one decade of advantage compared to the more detailed option
> list.  Yes, it's also easier, but IMO it's a brain thing (muscle
> memory), not a fingers thing.

I would argue that it's both.

> > but concurrently reindexing a table (or index..) is going to almost
> > certainly be extremely common, perhaps even more common than *not*
> > reindexing concurrently.
> 
> Well, users can use the reindexdb utility and save some keystrokes.

That's a really poor argument as those unix utilities are hardly ever
used, in my experience.

> Anyway we don't typically add redundant ways to express the same things.
> Where we have them, it's just because the old way was there before, and
> we added the extensible way later.  Adding two in the first appearance
> of a new feature seems absurd to me.

SQL allows many, many different ways to express the same thing.  I agree
that we haven't done that much in our utility commands, but I don't see
that as an argument against doing so, just that we haven't (previously)
really had the need- because most of the time we don't have a bunch of
different options where we want to have a list.

> After looking at the proposed grammar again today and in danger of
> repeating myself, IMO allowing the concurrency keyword to appear outside
> the parens would be a mistake.  Valid commands:
> 
>   REINDEX (VERBOSE, CONCURRENTLY) TABLE foo;
>   REINDEX (CONCURRENTLY) INDEX bar;

This discussion hasn't changed my opinion, and, though I'm likely
repeating myself as well, I also agree with the down-thread comment that
this ship really has already sailed.
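
For the record, the two grammar shapes under discussion look like this (a
sketch; neither form is committed at this point):

-- options only inside a parenthesized list:
REINDEX (VERBOSE, CONCURRENTLY) TABLE foo;
-- the keyword also allowed in the bare syntax:
REINDEX TABLE CONCURRENTLY foo;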

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Noah Misch  writes:
> > Either of those solutions sounds fine.  Like last time, I'll vote for 
> > explicitly
> > verifying leakproofness.
> 
> Yeah, I'm leaning in that direction as well.  Other than comparisons
> involving strings, it's not clear that we'd gain much from insisting
> on leakproofness in general, and it seems like it might be rather a
> large burden to put on add-on data types.

While I'd actually like it if we required leakproofness for what we
ship, I agree that we shouldn't blindly assume that add-on data types
are always leakproof and that then requires that we explicitly verify
it.  Perhaps an argument can be made that there are some cases where
what we ship can't or shouldn't be leakproof for usability but, ideally,
those would be relatively rare exceptions that don't impact common
use-cases.
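
For reference, a catalog query along these lines shows which btree comparison
support functions are currently marked leakproof (purely illustrative, not
part of any proposed patch):

SELECT am.amname, opf.opfname, p.proname, p.proleakproof
FROM pg_amproc ap
JOIN pg_opfamily opf ON opf.oid = ap.amprocfamily
JOIN pg_am am ON am.oid = opf.opfmethod
JOIN pg_proc p ON p.oid = ap.amproc
WHERE am.amname = 'btree' AND ap.amprocnum = 1
ORDER BY 1, 2;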

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: add_partial_path() may remove dominated path but still in use

2019-01-02 Thread Kohei KaiGai
On Mon, 31 Dec 2018 at 22:25, Amit Kapila wrote:
>
> On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai  wrote:
> >
> > On Mon, 31 Dec 2018 at 13:10, Amit Kapila wrote:
> > >
> > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai  wrote:
> > > > On Sun, 30 Dec 2018 at 4:12, Tom Lane wrote:
> > > > >
> > > > > Kohei KaiGai  writes:
> > > > > > On Sat, 29 Dec 2018 at 1:44, Tom Lane wrote:
> > > > > >> However, first I'd like to know why this situation is arising in 
> > > > > >> the first
> > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > >> attempted to make some Gather paths before we have all the partial 
> > > > > >> paths
> > > > > >> for the relation they're for.  Why is that a good thing to do?  It 
> > > > > >> seems
> > > > > >> like such Gathers are necessarily being made with incomplete 
> > > > > >> information,
> > > > > >> and we'd be better off to fix things so that none are made till 
> > > > > >> later.
> > > > >
> > > > > > Because of the hook location, Gather-node shall be constructed with 
> > > > > > built-in
> > > > > > and foreign partial scan node first, then extension gets a chance 
> > > > > > to add its
> > > > > > custom paths (partial and full).
> > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next 
> > > > > > to the
> > > > > > generate_gather_paths().
> > > > >
> > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > in which extensions are allowed to add partial paths, and that
> > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > >
> > > +1.  This idea sounds sensible to me.
> > >
> > > > >
> > > > I have almost same opinion, but the first hook does not need to be
> > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we 
> > > > can
> > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > may generate a Gather-path on top of the best partial-path.
> > > >
> > >
> > > Won't it be confusing for users if we allow both partial and full
> > > paths in first hook and only full paths in the second hook?
> > > Basically, in many cases, the second hook won't be of much use.  What
> > > advantage you are seeing in allowing both partial and full paths in
> > > the first hook?
> > >
> > Two advantages. The first one is, it follows same manner of
> > set_foreign_pathlist()
> > which allows to add both of full and partial path if FDW supports 
> > parallel-scan.
> > The second one is practical. During the path construction, extension needs 
> > to
> > check availability to run (e.g, whether operators in WHERE-clause is 
> > supported
> > on GPU device...), calculate its estimated cost and so on. Not a small
> > portion of
> > them are common for both of full and partial path. So, if the first
> > hook accepts to
> > add both kind of paths at once, extension can share the common properties.
> >
>
> You have a point, though I am not sure how much difference it can
> create for cost computation as ideally, both will have different
> costing model.  I understand there are some savings by avoiding some
> common work, is there any way to cache the required information?
>
I have no idea for a clean way.
We may be able to have an opaque pointer for extension usage; however,
it may be problematic if multiple extensions use the hook.

> > Probably, the second hook is only used for a few corner case where an 
> > extension
> > wants to manipulate path-list already built, like pg_hint_plan.
> >
>
> Okay, but it could be some work for extension authors who are using
> the current hook, not sure they would like to divide the work between
> first and second hook.
>
I guess they wouldn't divide their code, but would choose one of the two.
In the case of PG-Strom, even if there were two hooks around this point, it would
use the first hook only, as long as calling add_path() there is not prohibited.
However, some adjustments are required. Its current implementation builds a
GatherPath node on top of a partial CustomScanPath, because set_rel_pathlist_hook()
is called after generate_gather_paths().
Once we can use the first hook, there is no need to make a GatherPath ourselves,
because the PostgreSQL core will make that path if the partial custom path has a
reasonable enough cost. This adjustment is likely the preferable one.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: using expression syntax for partition bounds

2019-01-02 Thread Peter Eisentraut
On 26/11/2018 05:58, Amit Langote wrote:
> On 2018/11/09 14:38, Amit Langote wrote:
>> Rebased due to change in addRangeTableEntryForRelation's API.
> 
> Rebased again due to changes in psql's describe output for partitioned tables.

Review:

Is "partition bound" the right term?  For list partitioning, it's not
really a bound.  Maybe it's OK.

Keep the ordering of EXPR_KIND_PARTITION_BOUND in the various switch
statements consistent.

I don't see any treatment of non-immutable expressions.  There is a test
case with a non-immutable cast that was removed, but that's all.  There
was considerable discussion earlier in the thread on whether
non-immutable expressions should be allowed.  I'm not sure what the
resolution was, but in any case there should be documentation and tests
of the outcome.
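
For instance, something along these lines is the kind of case that needs a
documented and tested outcome (a hypothetical example of non-immutable
expressions in a bound):

CREATE TABLE log (ts timestamptz) PARTITION BY RANGE (ts);
-- now() is stable, not immutable; should this be evaluated once at DDL time
-- or rejected outright?
CREATE TABLE log_current PARTITION OF log
  FOR VALUES FROM (now()) TO (now() + interval '1 month');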

The collation handling might need another look.  The following is
allowed without any diagnostic:

CREATE TABLE t2 (
a text COLLATE "POSIX"
) PARTITION BY RANGE (a);
CREATE TABLE t2a PARTITION OF t2 FOR VALUES FROM ('foo' COLLATE "C") TO
('xyz');

I think the correct behavior would be that an explicit collation in the
partition bound expression is an error.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Using vim for developing porstres wiki article

2019-01-02 Thread James Coleman
> I tried using vim for postgres development some years ago, but I did
> not manage to do it for a long time, as I quit the job etc.
>
> Now I am trying again, but I've lost my .vimrc and notes and had to start from
> the very beginning. I vaguely remember what tools I've been using, but I have
> to google for them as they are not listed anywhere, and I have to meet all the
> problems I've met before, again.
>
> So I decided to write it down in a wiki article while I am restoring the
> configuration, so I do not have to remember it all a third time if I lose
> .vimrc again. :-)

I like expanding the small section in the Developer FAQ into a more
detailed article.

But the new article is missing several things relative to the old
instructions, and I don't think the changes should have been made to
that page until this was more fully baked.

A few specific thoughts:

1. The new article makes it more difficult to get started since every
setting would need to be copied separately. I think there should be a
cohesive block of options that we recommend for copy/paste along with
inline comments explaining what each one does.

2. There's a bit of conflation between general Vim setup and Postgres
specific development. The old section I think was mostly geared toward
someone who uses Vim but wants the Postgres-specific parts, and that's
a valuable use case. Perhaps we could split the article into a section
on general Vim setup (for example, turning on syntax) and a section on
"if you also already use Vim, there's a way to do project-specific
settings and the ones you should use".

3. Several of the old specified options didn't make it into the new
article's details and are a loss. I noticed this particularly since
just 2 or 3 days ago I myself had edited this section to add the
softtabstop=0 option (the Vim default) so that if soft tabs are
enabled in someone's general Vim config then hitting the tab key won't
result in inserting 2 spaces while working in the Postgres source.

Thanks,
James Coleman



Re: [PATCH] check for ctags utility in make_ctags

2019-01-02 Thread Peter Eisentraut
On 01/01/2019 17:44, Nikolay Shaplov wrote:
> +if [ ! $(command -v ctags) ]
> +then
> +  echo "'ctags' utility is not found" 1>&2
> +  echo "Please install 'ctags' to run make_ctags" 1>&2
> +  exit 1
> +fi

This assumes that the ctags and etags programs are part of packages of
the same name.  I don't think that is always the case.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: chained transactions

2019-01-02 Thread Peter Eisentraut
On 26/12/2018 09:20, Fabien COELHO wrote:
> I try to avoid global variables when possible as a principle, because I 
> paid to learn that they are bad:-) Maybe I'd do a local struct, declare an 
> instance within the function, and write two functions to dump/restore the 
> transaction status variables into the referenced struct. Maybe this is not 
> worth the effort.

Those are reasonable alternatives, I think, but it's a bit overkill in
this case.

>>> About the tests: I'd suggest to use more options on the different tests,
>>> eg SERIALIZABLE, READ ONLY… Also ISTM that tests should show
>>> transaction_read_only value as well.
>>
>> OK, updated a bit.  (Using READ ONLY doesn't work because then you can't
>> write anything inside the transaction.)
> 
> Sure. Within a read-only tx, it could check that transaction_read_only is 
> on, and still on when chained, though.

I think the tests prove the point that the values are set and unset and
reset in various scenarios.  We don't need to test every single
combination, that's not the job of this patch.

> The documentation should probably tell in the compatibility sections that 
> AND CHAIN is standard conforming.

Good point.  Updated that.

> In the automatic rollback tests, maybe I'd insert 3 just once, and use 
> another value for the chained transaction.

done

> Tests may eventually vary BEGIN/START, COMMIT/END, ROLLBACK/ABORT.

Those work on the parser level, so don't really affect this patch.  It
would make the tests more confusing to read, I think.

> As you added a SAVEPOINT, maybe I'd try rollbacking to it.

That would error out and invalidate the subsequent tests, so it's not
easily possible.  We could write a totally separate test for it, but I'm
not sure what that would be proving.

Updated patches attached.
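
For anyone following along, a quick sketch of what the patch provides (the
characteristics carried over are isolation level, read/write status and
deferrability, per the patch):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1;
COMMIT AND CHAIN;              -- starts a new transaction, still SERIALIZABLE
SHOW transaction_isolation;    -- reports "serializable"
ROLLBACK AND CHAIN;            -- again chained with the same characteristics
SHOW transaction_isolation;    -- still "serializable"
COMMIT;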

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ec4e153976f21d48f66bde1403d569ce63f8638e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 2 Jan 2019 15:09:04 +0100
Subject: [PATCH v4 1/2] Transaction chaining

Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which
start new transactions with the same transaction characteristics as the
just finished one, per SQL standard.
---
 doc/src/sgml/ref/abort.sgml|  14 +-
 doc/src/sgml/ref/commit.sgml   |  19 +-
 doc/src/sgml/ref/end.sgml  |  14 +-
 doc/src/sgml/ref/rollback.sgml |  19 +-
 src/backend/access/transam/xact.c  |  73 +++-
 src/backend/catalog/sql_features.txt   |   2 +-
 src/backend/nodes/copyfuncs.c  |   1 +
 src/backend/nodes/equalfuncs.c |   1 +
 src/backend/parser/gram.y  |  19 +-
 src/backend/tcop/utility.c |   4 +-
 src/bin/psql/tab-complete.c|   8 +-
 src/include/access/xact.h  |   4 +-
 src/include/nodes/parsenodes.h |   1 +
 src/test/regress/expected/transactions.out | 191 +
 src/test/regress/sql/transactions.sql  |  63 +++
 15 files changed, 408 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/ref/abort.sgml b/doc/src/sgml/ref/abort.sgml
index 21799d2a83..0372913365 100644
--- a/doc/src/sgml/ref/abort.sgml
+++ b/doc/src/sgml/ref/abort.sgml
@@ -21,7 +21,7 @@
 
  
 
-ABORT [ WORK | TRANSACTION ]
+ABORT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
 
  
 
@@ -51,6 +51,18 @@ Parameters
  
 

+
+   
+AND CHAIN
+
+ 
+  If AND CHAIN is specified, a new transaction is
+  immediately started with the same transaction characteristics (see ) as the just finished one.  Otherwise,
+  no new transaction is started.
+ 
+
+   
   
  
 
diff --git a/doc/src/sgml/ref/commit.sgml b/doc/src/sgml/ref/commit.sgml
index b2e8d5d180..96a018e6aa 100644
--- a/doc/src/sgml/ref/commit.sgml
+++ b/doc/src/sgml/ref/commit.sgml
@@ -21,7 +21,7 @@
 
  
 
-COMMIT [ WORK | TRANSACTION ]
+COMMIT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
 
  
 
@@ -48,6 +48,18 @@ Parameters
  
 

+
+   
+AND CHAIN
+
+ 
+  If AND CHAIN is specified, a new transaction is
+  immediately started with the same transaction characteristics (see ) as the just finished one.  Otherwise,
+  no new transaction is started.
+ 
+
+   
   
  
 
@@ -79,9 +91,8 @@ Examples
   Compatibility
 
   
-   The SQL standard only specifies the two forms
-   COMMIT and COMMIT
-   WORK. Otherwise, this command is fully conforming.
+   The command COMMIT conforms to the SQL standard.  The
+   form COMMIT TRANSACTION is a PostgreSQL extension.
   
  
 
diff --git a/doc/src/sgml/ref/end.sgml b/doc/src/sgml/ref/end.sgml
index 7523315f34..8b8f4f0dbb 100644
--- a/doc/src/sgml/ref/end.sgml
+++ b/doc/src/sgml/ref/end.sgml
@@ -21,7 +21,7 @@
 
  
 
-END [ WORK | TRANSACTION ]
+END [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
 
  
 
@@ -50,6 +50,18 @@ Parameters
  

Re: chained transactions

2019-01-02 Thread Peter Eisentraut
On 26/12/2018 09:47, Fabien COELHO wrote:
> I'm wary of changing the SPI_commit and SPI_rollback interfaces which are 
> certainly being used outside the source tree and could break countless 
> code, and it seems quite unclean that commit and rollback would do 
> anything else but committing or rollbacking.

These are new as of PG11 and are only used by PL implementations that
support transaction control in procedures, of which there are very few.
We could write separate functions for the "and chain" variants, but I
hope that eventually all PLs will support chaining (because that's
really what you ought to be using in procedures), and so then the
non-chaining interfaces would end up being unused.

> ISTM that it should be kept as is and only managed from the PL/pgsql 
> exec_stmt_* functions, which have to be adapted anyway. That would 
> minimise changes and not break existing code.

But we want other PLs to be able to use this too.

> If SPI_* functions are modified, which I would advise against, I find 
> keeping the next assignment in the chained case doubtful:
> 
>   _SPI_current->internal_xact = false;

This is correct as is.  The internal_xact flag prevents
CommitTransactionCommand() and AbortCurrentTransaction() from releasing
SPI memory, so it only needs to be set around those calls.  Afterwards
it's unset again so that a top-level transaction end will end up freeing
SPI memory.

Maybe something like internal_xact_end would have been a clearer name.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

2019-01-02 Thread Tomas Vondra



On 1/2/19 11:51 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra wrote:
> 
> >
> >     The attached patch include all the comment given by Tomas and i
> >     check sql standard about LIMIT and this feature
> >
> 
> Unfortunately, it seems the "limit" regression tests fail for some
> reason - the output mismatches the expected results for some reason. It
> seems as if the WITH TIES code affects ordering of the results within
> the group. See the attached file.
> 
> 
> Yes, the reason is that the order of the returned rows is not always the
> same. I removed the other columns from the result set to get a constant result
> 

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

  limitCount + (avgGroupSize/2);

or perhaps

  Max(avgGroupSize, limitCount + (avgGroupSize/2))

The 1/2 is there because we don't know where the group starts.

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.
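
To illustrate with a quick sketch (assuming the WITH TIES patch is applied;
the table is just generate_series data):

CREATE TABLE t AS
  SELECT i % 1000 AS grp, i AS val
  FROM generate_series(1, 1000000) AS s(i);
ANALYZE t;

-- Per the reasoning above, the Limit node would be estimated at the requested
-- 10 rows, while all ~1000 rows of the first "grp" group actually tie and get
-- returned.
EXPLAIN ANALYZE
SELECT * FROM t ORDER BY grp FETCH FIRST 10 ROWS WITH TIES;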

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain plans with information about (modified) gucs

2019-01-02 Thread Peter Eisentraut
On 14/12/2018 12:41, Tomas Vondra wrote:
> 1) names of the options
> 
> I'm not particularly happy with calling the option "gucs" - it's an
> acronym and many users have little idea what GUC stands for. So I think
> a better name would be desirable, but I'm not sure what would that be.
> Options? Parameters?

"settings"

(see pg_settings, SET)
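
For illustration, the option would then read naturally alongside the existing
ones, e.g. (hypothetical invocation, assuming the option ends up being named
"settings" and lists only GUCs changed from their defaults):

SET work_mem = '256MB';
EXPLAIN (ANALYZE, SETTINGS) SELECT count(*) FROM pg_class;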

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain plans with information about (modified) gucs

2019-01-02 Thread Tomas Vondra


On 1/2/19 4:20 PM, Peter Eisentraut wrote:
> On 14/12/2018 12:41, Tomas Vondra wrote:
>> 1) names of the options
>>
>> I'm not particularly happy with calling the option "gucs" - it's an
>> acronym and many users have little idea what GUC stands for. So I think
>> a better name would be desirable, but I'm not sure what would that be.
>> Options? Parameters?
> 
> "settings"
> 
> (see pg_settings, SET)
> 

Yup, that seems like a better choice. Thanks.

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GiST VACUUM

2019-01-02 Thread Heikki Linnakangas

On 28/10/2018 19:32, Andrey Borodin wrote:

Hi everyone!


On 2 Oct 2018, at 6:14, Michael Paquier wrote:
Andrey, your latest patch does not apply.  I am moving this to the next
CF, waiting for your input.


I'm doing preps for CF.
Here's rebased version.


Thanks, I had another look at these.

In patch #1, to do the vacuum scan in physical order:

* The starting NSN was not acquired correctly for unlogged and temp 
relations. They don't use WAL, so their NSN values are based on the 
'unloggedLSN' counter, rather than the current WAL insert pointer. So we must 
use gistGetFakeLSN() rather than GetInsertRecPtr() for them. Fixed that.


* Adjusted comments a little bit, mostly by copy-pasting the 
better-worded comments from the corresponding nbtree code, and ran pgindent.


I think this is ready to be committed, except that I didn't do any 
testing. We discussed the need for testing earlier. Did you write some 
test scripts for this, or how have you been testing?



Patch #2:

* Bitmapset stores 32-bit signed integers, but BlockNumber is unsigned. 
So this will fail with an index larger than 2^31 blocks. That's perhaps 
academic; I don't think anyone will try to create a 16 TB GiST index 
any time soon. But it feels wrong to introduce an arbitrary limitation 
like that.


* I was surprised that the bms_make_empty() function doesn't set the 
'nwords' field to match the allocated size. Perhaps that was 
intentional, so that you don't need to scan the empty region at the end, 
when you scan through all matching bits? Still, seems surprising, and 
needs a comment at least.


* We're now scanning all internal pages in the 2nd phase. Not only those 
internal pages that contained downlinks to empty leaf pages. That's 
probably OK, but the comments need updating on that.


* In general, comments need to be fixed and added in several places. For 
example, there's no clear explanation on what the "delete XID", stored 
in pd_prune_xid, means. (I know what it is, because I'm familiar with 
the same mechanism in B-tree, but it's not clear from the code itself.)


These can be fixed, they're not show-stoppers, but patch #2 isn't quite 
ready yet.


- Heikki



Re: GiST VACUUM

2019-01-02 Thread Heikki Linnakangas

On 02/01/2019 17:35, Heikki Linnakangas wrote:

On 28/10/2018 19:32, Andrey Borodin wrote:

Hi everyone!


On 2 Oct 2018, at 6:14, Michael Paquier wrote:
Andrey, your latest patch does not apply.  I am moving this to the next
CF, waiting for your input.


I'm doing preps for CF.
Here's rebased version.


Thanks, I had another look at these.


Here are new patch versions, with the fixes I mentioned. Forgot to 
attach these earlier.


- Heikki
>From 46456dbf1d07aaf3e6963035a02aaa060decace3 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Sun, 28 Oct 2018 22:20:58 +0500
Subject: [PATCH 1/2] Physical GiST scan in VACUUM v18-heikki

---
 src/backend/access/gist/gist.c   |   8 +-
 src/backend/access/gist/gistvacuum.c | 430 +++
 2 files changed, 247 insertions(+), 191 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index a2cb84800e8..d42e810c6b3 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -38,7 +38,7 @@ static bool gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
  bool unlockbuf, bool unlockleftchild);
 static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
 GISTSTATE *giststate, List *splitinfo, bool releasebuf);
-static void gistvacuumpage(Relation rel, Page page, Buffer buffer,
+static void gistprunepage(Relation rel, Page page, Buffer buffer,
 			   Relation heapRel);
 
 
@@ -261,7 +261,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 	 */
 	if (is_split && GistPageIsLeaf(page) && GistPageHasGarbage(page))
 	{
-		gistvacuumpage(rel, page, buffer, heapRel);
+		gistprunepage(rel, page, buffer, heapRel);
 		is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
 	}
 
@@ -1544,11 +1544,11 @@ freeGISTstate(GISTSTATE *giststate)
 }
 
 /*
- * gistvacuumpage() -- try to remove LP_DEAD items from the given page.
+ * gistprunepage() -- try to remove LP_DEAD items from the given page.
  * Function assumes that buffer is exclusively locked.
  */
 static void
-gistvacuumpage(Relation rel, Page page, Buffer buffer, Relation heapRel)
+gistprunepage(Relation rel, Page page, Buffer buffer, Relation heapRel)
 {
 	OffsetNumber deletable[MaxIndexTuplesPerPage];
 	int			ndeletable = 0;
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 5948218c779..4fb32bf76bf 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -21,6 +21,34 @@
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
 
+/* Working state needed by gistbulkdelete */
+typedef struct
+{
+	IndexVacuumInfo *info;
+	IndexBulkDeleteResult *stats;
+	IndexBulkDeleteCallback callback;
+	void	   *callback_state;
+	GistNSN		startNSN;
+	BlockNumber totFreePages;	/* true total # of free pages */
+} GistVacState;
+
+static void gistvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
+			   IndexBulkDeleteCallback callback, void *callback_state);
+static void gistvacuumpage(GistVacState *vstate, BlockNumber blkno,
+			   BlockNumber orig_blkno);
+
+IndexBulkDeleteResult *
+gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
+			   IndexBulkDeleteCallback callback, void *callback_state)
+{
+	/* allocate stats if first time through, else re-use existing struct */
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+
+	gistvacuumscan(info, stats, callback, callback_state);
+
+	return stats;
+}
 
 /*
  * VACUUM cleanup: update FSM
@@ -28,104 +56,36 @@
 IndexBulkDeleteResult *
 gistvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 {
-	Relation	rel = info->index;
-	BlockNumber npages,
-blkno;
-	BlockNumber totFreePages;
-	double		tuplesCount;
-	bool		needLock;
-
 	/* No-op in ANALYZE ONLY mode */
 	if (info->analyze_only)
 		return stats;
 
-	/* Set up all-zero stats if gistbulkdelete wasn't called */
+	/*
+	 * If gistbulkdelete was called, we need not do anything, just return the
+	 * stats from the latest gistbulkdelete call.  If it wasn't called, we
+	 * still need to do a pass over the index, to obtain index statistics.
+	 */
 	if (stats == NULL)
+	{
 		stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+		gistvacuumscan(info, stats, NULL, NULL);
+	}
 
 	/*
-	 * Need lock unless it's local to this backend.
+	 * It's quite possible for us to be fooled by concurrent page splits into
+	 * double-counting some index tuples, so disbelieve any total that exceeds
+	 * the underlying heap's count ... if we know that accurately.  Otherwise
+	 * this might just make matters worse.
 	 */
-	needLock = !RELATION_IS_LOCAL(rel);
-
-	/* try to find deleted pages */
-	if (needLock)
-		LockRelationForExtension(rel, ExclusiveLock);
-	npages = RelationGetNumberOfBlocks(rel);
-	if (needLock)
-		UnlockRelationForExtension(rel, ExclusiveLock);
-
-	totFreePages = 0;
-	tuplesCount = 0;
-	for (blkno = GIST

backslash-dot quoting in COPY CSV

2019-01-02 Thread Daniel Verite
 Hi,

The doc on COPY CSV says about the backslash-dot sequence:

  To avoid any misinterpretation, a \. data value appearing as a
  lone entry on a line is automatically quoted on output, and on
  input, if quoted, is not interpreted as the end-of-data marker

However this quoting does not happen when \. is already part
of a quoted field. Example:

COPY (select 'somevalue', E'foo\n\\.\nbar') TO STDOUT CSV;

outputs:

somevalue,"foo
\.
bar"

which conforms to the CSV rules, by which we are not allowed
to replace \. by anything AFAICS.
The trouble is, when trying to import this back with COPY FROM,
it will error out at the backslash-dot and not import anything.
Furthermore, if these data are meant to be embedded into a
script, it creates a potential risk of SQL injection.
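
A self-contained sketch of that round trip (file path and table are made up;
writing a server-side file needs appropriate permissions):

CREATE TABLE demo (a text, b text);
COPY (SELECT 'somevalue', E'foo\n\\.\nbar') TO '/tmp/demo.csv' CSV;

-- The lone \. line inside the quoted field is still taken as the end-of-data
-- marker, so this import fails instead of restoring the two columns:
COPY demo FROM '/tmp/demo.csv' CSV;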

Is it a known issue? I haven't found previous discussions on this.
It looks to me like the ability of backslash-dot to be an end-of-data
marker should be neutralizable for CSV. When the data is not embedded,
it's not needed anyway, and when it's embedded, we could surely think
of alternatives.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Tom Lane
Mark  writes:
> I have a long running job which deletes all of the common_student table and
> then repopulates it. It takes long time to load all the other data and
> commit the transaction. I didn't think the delete inside the transaction
> would have any effect until it is commited or rolled back.
> I will have to rewrite the application so it updates the existing rows
> rather than deleting all and then inserting.

Hmm ... I'm not sure if that will actually make things better.  The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:

 * We count delete-in-progress rows as still live, using
 * the same reasoning given above; but we don't bother to
 * include them in the sample.

The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:

 * Insert-in-progress rows are not counted.  We assume
 * that when the inserting transaction commits or aborts,
 * it will send a stats message to increment the proper
 * count.  This works right only if that transaction ends
 * after we finish analyzing the table; if things happen
 * in the other order, its stats update will be
 * overwritten by ours.  However, the error will be large
 * only if the other transaction runs long enough to
 * insert many tuples, so assuming it will finish after us
 * is the safer option.

Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample.  So a long-running update transaction will
cause all the rows it changes to be excluded from the sample until
commit.  This seems like a bad thing, and it definitely means that
adjusting your app as you're contemplating won't help.
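
To make the scenario concrete (a sketch with a stand-in table, since
common_student's real schema isn't shown in the thread):

CREATE TABLE common_student (id int, school_id int, name text);
INSERT INTO common_student
  SELECT g, g % 100, 'name ' || g FROM generate_series(1, 100000) g;

-- Session 1: long-running reload inside one transaction.
BEGIN;
DELETE FROM common_student;            -- rows become delete-in-progress
INSERT INTO common_student
  SELECT g, g % 100, 'name ' || g FROM generate_series(1, 100000) g;
-- ... stays open for a long time before COMMIT ...

-- Session 2, meanwhile:
ANALYZE common_student;
-- With the current logic, neither the delete-in-progress nor the
-- insert-in-progress rows make it into the sample, so the per-column
-- statistics (and hence estimates like rows=1) can be badly off until
-- session 1 commits and the table is analyzed again.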

In order to handle the bulk-update scenario sanely, it seems like
we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
to be included.  And it looks like, for consistency with the row-counting
logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
is a slightly annoying conclusion, because it means we're accumulating
stats that we know are likely to soon be obsolete, but I think sampling
INSERT_IN_PROGRESS tuples instead would lead to very strange results
in some cases.

In short, I think we want to do this to the DELETE_IN_PROGRESS logic:

if 
(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
deadrows += 1;
else
+   {
+   sample_it = true;
liverows += 1;
+   }

with suitable adjustment of the adjacent comment.

Thoughts?

regards, tom lane



Re: [PATCH] check for ctags utility in make_ctags

2019-01-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 01/01/2019 17:44, Nikolay Shaplov wrote:
>> +if [ ! $(command -v ctags) ]
>> +then
>> +  echo "'ctags' utility is not found" 1>&2
>> +  echo "Please install 'ctags' to run make_ctags" 1>&2
>> +  exit 1
>> +fi

> This assumes that the ctags and etags programs are part of packages of
> the same name.  I don't think that is always the case.

In fact, that's demonstrably not so: on my RHEL6 and Fedora boxes,
/usr/bin/etags isn't owned by any package, because it's a symlink
managed by the "alternatives" system.  It points to /usr/bin/etags.emacs
which is owned by the emacs-common package.  So dropping the advice
about how to fix the problem seems like a good plan.

regards, tom lane



Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Tom,

Thanks for your reply.

Am I correct in my understanding that any row that has been modified (i.e.
UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS so it will not be included
in the sample?

I'm going to rework the application so there is less time between the
DELETE and the COMMIT so I will only see the problem if ANALYZE runs during
this smaller time window. Looks like your patch will help if this happens.

Then again, it also seems no one has had a problem with its current
behaviour (for 11 years!).

Thanks,

Mark

On Wed, 2 Jan 2019 at 16:11 Tom Lane  wrote:

> Mark  writes:
> > I have a long running job which deletes all of the common_student table
> and
> > then repopulates it. It takes long time to load all the other data and
> > commit the transaction. I didn't think the delete inside the transaction
> > would have any effect until it is commited or rolled back.
> > I will have to rewrite the application so it updates the existing rows
> > rather than deleting all and then inserting.
>
> Hmm ... I'm not sure if that will actually make things better.  The root
> of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:
>
>  * We count delete-in-progress rows as still live,
> using
>  * the same reasoning given above; but we don't bother
> to
>  * include them in the sample.
>
> The "reasoning given above" is a reference to what happens for
> INSERT_IN_PROGRESS tuples:
>
>  * Insert-in-progress rows are not counted.  We assume
>  * that when the inserting transaction commits or
> aborts,
>  * it will send a stats message to increment the proper
>  * count.  This works right only if that transaction
> ends
>  * after we finish analyzing the table; if things
> happen
>  * in the other order, its stats update will be
>  * overwritten by ours.  However, the error will be
> large
>  * only if the other transaction runs long enough to
>  * insert many tuples, so assuming it will finish
> after us
>  * is the safer option.
>
> Now the problem with this, from your perspective, is that *neither*
> INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
> ANALYZE's data sample.  So a long-running update transaction will
> cause all the rows it changes to be excluded from the sample until
> commit.  This seems like a bad thing, and it definitely means that
> adjusting your app as you're contemplating won't help.
>
> In order to handle the bulk-update scenario sanely, it seems like
> we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
> to be included.  And it looks like, for consistency with the row-counting
> logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
> is a slightly annoying conclusion, because it means we're accumulating
> stats that we know are likely to soon be obsolete, but I think sampling
> INSERT_IN_PROGRESS tuples instead would lead to very strange results
> in some cases.
>
> In short, I think we want to do this to the DELETE_IN_PROGRESS logic:
>
> if
> (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
> deadrows += 1;
> else
> +   {
> +   sample_it = true;
> liverows += 1;
> +   }
>
> with suitable adjustment of the adjacent comment.
>
> Thoughts?
>
> regards, tom lane
>


Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Tom Lane
Mark  writes:
> Am I correct in my understanding that any row that has been modified (i.e.
> UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS so it will not be included
> in the sample?

An update will mark the existing tuple as delete-in-progress and then
insert a new tuple (row version) that's insert-in-progress.

A concurrent ANALYZE scan will definitely see the old tuple (modulo
sampling considerations) but it's timing-dependent which state it sees it
in --- it could still be "live" when we see it, or already
delete-in-progress.  ANALYZE might or might not see the new tuple at all,
depending on timing and where the new tuple gets placed.  So "count/sample
delete-in-progress but not insert-in-progress" seems like a good rule to
minimize the timing sensitivity of the results.  It's not completely
bulletproof, but I think it's better than what we're doing now.

> I'm going to rework the application so there is less time between the
> DELETE and the COMMIT so I will only see the problem if ANALYZE runs during
> this smaller time window.

Yeah, that's about the best you can do from the application side.

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-02 Thread Tomas Vondra
On 4/1/18 9:26 AM, David Rowley wrote:
> On 16 March 2018 at 04:01, Tom Lane  wrote:
>> I hadn't been paying much attention to this thread, but I've now taken
>> a quick look at the 2018-02-19 patch, and I've got to say I do not like
>> it much.  The changes in createplan.c in particular seem like hack-and-
>> slash rather than anything principled or maintainable.
> 
> Thanks for looking at this.  I didn't manage to discover any other
> working solutions to when the Vars can be replaced. If we don't do
> this in createplan.c then it's going to cause problems in places such
> as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c
> gets hold of the plan.
> 
>> The core issue here is that Paths involving the appendrel and higher
>> will contain Vars referencing the appendrel's varno, whereas the child
>> is set up to emit Vars containing its own varno, and somewhere we've got
>> to match those up.  I don't think though that this problem is exactly
>> specific to single-member Appends, and so I would rather we not invent a
>> solution that's specific to that.  A nearly identical issue is getting
>> rid of no-op SubqueryScan nodes.  I've long wished we could simply not
>> emit those in the first place, but it's really hard to do because of
>> the fact that Vars inside the subquery have different varnos from those
>> outside.  (I've toyed with the idea of globally flattening the rangetable
>> before we start planning, not at the end, but haven't made it happen yet;
>> and anyway that would be only one step towards such a goal.)
> 
> I'm not quite sure why you think the solution I came up with is
> specific to single-member Appends. The solution merely uses
> single-member Append paths as a proxy path for the solution which I've
> tried to make generic to any node type. For example, the patch also
> resolves the issue for MergeAppend, so certainly nothing in there is
> specific to single-member Appends.  I could have made the proxy any
> other path type, it's just that you had suggested Append would be
> better than inventing ProxyPath, which is what I originally proposed.
> 

I think a natural question is how the approach devised in this thread
(based on single-member Append paths) could be used to deal with no-op
Subquery nodes. I don't see an obvious way to do that, and I guess
that's what Tom meant by "specific to single-member Appends".

>> It might be worth looking at whether we couldn't fix the single-member-
>> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
>> get rid of them.  That's not the most beautiful solution perhaps, but
>> it'd be very localized and low-risk.
> 
> It might be possible, but wouldn't that only solve 1 out of 2
> problems. The problem of the planner not generating the most optimal
> plan is ignored with this. For example, it does not make much sense to
> bolt a Materialize node on top of an IndexScan node in order to
> provide the IndexScan with mark/restore capabilities... They already
> allow that.
> 

Yeah, I think we can't do all the magic in setrefs.c if we want the
decisions to affect plan shape in any significant way - the decision
whether an Append node has only a single node needs to happen while
constructing the path or shortly after it (before we build other paths
on top of it). And before costing the path.

So setrefs.c seems too late for that :-(

I suppose this limitation was acceptable for no-op Subqueries, but I'm
not sure the same thing applies to single-member Appends.

>> In general setrefs.c is the right place to deal with variable-matching
>> issues.  So even if you don't like that specific idea, it'd probably be
>> worth thinking about handling this by recording instructions telling
>> setrefs what to do, instead of actually doing it at earlier stages.
> 
> From what I can see, setrefs.c is too late. ERRORs are generated
> before setrefs.c gets hold of the plan if we don't replace Vars.
> 
> I'm not opposed to finding a better way to do this.
> 

AFAICS the patch essentially does two things: (a) identifies Append
paths with a single member and (b) ensures the Vars are properly mapped
when the Append node is skipped when creating the plan.

I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
perhaps we could move at least (b) to setrefs.c?

I'd say mapping the Vars is the most fragile and error-prone piece of
this patch. It's a bit annoying that it invents a new infrastructure
to do that, translates expressions in various places, establishes new
rules for tlists (having to call finalize_plan_tlist when calling
build_path_tlist before create_plan_recurse), etc.

But we already do such mappings (not exactly the same, but similar) for
other purposes - from setrefs.c. So why can't we do it the same way here?

FWIW I haven't tried to do any of that, and I haven't looked at this
patch for quite a long time, so perhaps I'm missing an obvious obstacle
preventing us from doing that ...


regards

-- 
Tomas Vondra

Re: Implicit make rules break test examples

2019-01-02 Thread Tom Lane
Donald Dong  writes:
> I think #3 is a better choice since it's less invasive and would
> potentially avoid similar problems in the future. I think it may be
> worth the risk of breaking some extensions. I moved the rule to
> Makefile.global and added $(X) in case it's set.

Yeah, I think #3 is the best choice too.

I'm not quite sure about the $(X) addition --- it makes the output
file not agree with what gmake thinks the target is.  However, I
observe other places doing the same thing, so let's try that and
see what the buildfarm thinks.

> I also updated the order in Makefile.linux in the same patch since
> they have the same cause. I'm not sure if changes are necessary for
> other platforms, and I am not able to test it, unfortunately.

That's what we have a buildfarm for.  Pushed, we'll soon find out.

regards, tom lane



Re: pgbench doc fix

2019-01-02 Thread Alvaro Herrera
On 2018-Dec-03, Tatsuo Ishii wrote:

> To:
> ---
> -M querymode
> --protocol=querymode
> 
> Protocol to use for submitting queries to the server:
> 
> simple: use simple query protocol.
> 
> extended: use extended query protocol.
> 
> prepared: use extended query protocol with prepared statements.
> 
> Because in "prepared" mode pgbench reuses the parse analysis
> result for the second and subsequent query iteration, pgbench runs
> faster in the prepared mode than in other modes.

Looks good to me.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Noah Misch  writes:
>>> Either of those solutions sounds fine.  Like last time, I'll vote for 
>>> explicitly
>>> verifying leakproofness.

>> Yeah, I'm leaning in that direction as well.  Other than comparisons
>> involving strings, it's not clear that we'd gain much from insisting
>> on leakproofness in general, and it seems like it might be rather a
>> large burden to put on add-on data types.

> While I'd actually like it if we required leakproofness for what we
> ship, I agree that we shouldn't blindly assume that add-on data types
> are always leakproof and that then requires that we explicitly verify
> it.  Perhaps an argument can be made that there are some cases where
> what we ship can't or shouldn't be leakproof for usability but, ideally,
> those would be relatively rare exceptions that don't impact common
> use-cases.

Seems like we have consensus that MinMaxExpr should verify leakproofness
rather than just assume it, so I'll go fix that.
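
For concreteness, a rough sketch of how such a check might look in
contain_leaked_vars_walker(), assuming we use the type cache to find the
comparison function (names and details from memory, not the final patch):

    case T_MinMaxExpr:
        {
            /*
             * MinMaxExpr executes the default btree comparison function
             * for its datatype, so treat it as leaky unless that function
             * is marked leakproof.  (Needs utils/typcache.h and
             * utils/lsyscache.h.)
             */
            MinMaxExpr *mmexpr = (MinMaxExpr *) node;
            TypeCacheEntry *typentry;

            typentry = lookup_type_cache(mmexpr->minmaxtype,
                                         TYPECACHE_CMP_PROC);
            if (!OidIsValid(typentry->cmp_proc) ||
                !get_func_leakproof(typentry->cmp_proc))
                return true;
        }
        break;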

What's your opinion on the question of whether to try to make text_cmp
et al leakproof?  I notice that texteq/textne are (correctly) marked
leakproof, so perhaps the usability issue isn't as pressing as I first
thought; but it remains true that there are fairly common cases where
the current marking is going to impede optimization.  I also realized
that in the wake of 586b98fdf, we have to remove the leakproofness
marking of name_cmp and name inequality comparisons --- which I did
at d01e75d68, but that's potentially a regression in optimizability
of catalog queries, so it's not very nice.

Also, I believe that Peter's work towards making text equality potentially
collation-sensitive will destroy the excuse for marking texteq/textne
leakproof if we're going to be 100% rigid about that, and that would be
a very big regression.

So I'd like to get to a point where we're comfortable marking these
functions leakproof despite the possibility of corner-case failures.
We could just decide that the existing failure cases in varstr_cmp are
not usefully exploitable for information leakage, or perhaps we could
dumb down the error messages some more to make them even less so.
It'd also be nice to have some articulatable policy that encompasses
a choice like this.

Thoughts?

regards, tom lane



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-02 Thread Jesper Pedersen

Hi,

On 12/29/18 10:03 AM, Peter Eisentraut wrote:

On 21/12/2018 11:12, Jesper Pedersen wrote:

Here is a patch that passes the -j option from pg_upgrade down to
vacuumdb if supported.


It's not clear to me that that is what is usually wanted.

The point of the post-upgrade analyze script is specifically to do the
analyze in a gentle fashion.  Mixing in parallelism would seem to defeat
that a bit.



Well, that really depends. The user passed -j to pg_upgrade in order for 
the upgrade to happen faster, so maybe they would expect, as I would, 
that the ANALYZE phase would happen in parallel too.


At least there should be a "notice" about what the script will do in 
this case.


Best regards,
 Jesper



Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2019-01-02 Thread Peter Eisentraut
On 22/11/2018 14:10, Christoph Berg wrote:
> It's nice that PG_REGRESS_DIFF_OPTS exists, but the diffs are often
> coming from automated build logs where setting the variable after the
> fact doesn't help.
> 
> Please consider the attached patch, extension packagers will thank
> you.

Committed.

While we're considering the pg_regress output, what do you think about
replacing the ==... separator with a standard diff separator like
"diff %s %s %s\n".  This would make the file behave more like a proper
diff file, for use with other tools.  And it shows the diff options
used, for clarity.  See attached patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 00380be627fd1df6dd0474380d31fd76fcb644de Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 2 Jan 2019 21:24:51 +0100
Subject: [PATCH] Use standard diff separator for regression.diffs

Instead of ==..., use the standard separator for a multi-file
diff, which is, per POSIX,

"diff %s %s %s\n", , , 

This makes regression.diffs behave more like a proper diff file, for
use with other tools.  And it shows the diff options used, for
clarity.
---
 src/test/regress/pg_regress.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 2c469413a3..9786e86211 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1453,20 +1453,23 @@ results_differ(const char *testname, const char 
*resultsfile, const char *defaul
 * Use the best comparison file to generate the "pretty" diff, which we
 * append to the diffs summary file.
 */
-   snprintf(cmd, sizeof(cmd),
-"diff %s \"%s\" \"%s\" >> \"%s\"",
-pretty_diff_opts, best_expect_file, resultsfile, 
difffilename);
-   run_diff(cmd, difffilename);
 
-   /* And append a separator */
+   /* Write diff header */
difffile = fopen(difffilename, "a");
if (difffile)
{
fprintf(difffile,
-   
"\n==\n\n");
+   "diff %s %s %s\n",
+   pretty_diff_opts, best_expect_file, 
resultsfile);
fclose(difffile);
}
 
+   /* Run diff */
+   snprintf(cmd, sizeof(cmd),
+"diff %s \"%s\" \"%s\" >> \"%s\"",
+pretty_diff_opts, best_expect_file, resultsfile, 
difffilename);
+   run_diff(cmd, difffilename);
+
unlink(diff);
return true;
 }
-- 
2.20.1



Re: psql display of foreign keys

2019-01-02 Thread Peter Eisentraut
On 06/12/2018 23:56, Michael Paquier wrote:
> On Thu, Dec 06, 2018 at 01:26:30PM -0300, Alvaro Herrera wrote:
>> That means I can just get pg_partition_root() done first, and try to
>> write the queries using that instead of WITH RECURSIVE.
> 
> FWIW, I was just writing a patch about this one, so I was going to spawn
> a new thread today :)
> 
> Let's definitely avoid WITH RECURSIVE if we can.

I'm setting this to "Waiting on Author", awaiting a new version based on
pg_partition_root() once that one is done.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2019-01-02 Thread Tom Lane
Peter Eisentraut  writes:
> While we're considering the pg_regress output, what do you think about
> replacing the ==... separator with a standard diff separator like
> "diff %s %s %s\n".  This would make the file behave more like a proper
> diff file, for use with other tools.  And it shows the diff options
> used, for clarity.  See attached patch.

I'm confused by this patch.  Doesn't moving the diff call like that
break the logic completely?

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Noah Misch  writes:
> >>> Either of those solutions sounds fine.  Like last time, I'll vote for 
> >>> explicitly
> >>> verifying leakproofness.
> 
> >> Yeah, I'm leaning in that direction as well.  Other than comparisons
> >> involving strings, it's not clear that we'd gain much from insisting
> >> on leakproofness in general, and it seems like it might be rather a
> >> large burden to put on add-on data types.
> 
> > While I'd actually like it if we required leakproofness for what we
> > ship, I agree that we shouldn't blindly assume that add-on data types
> > are always leakproof and that then requires that we explicitly verify
> > it.  Perhaps an argument can be made that there are some cases where
> > what we ship can't or shouldn't be leakproof for usability but, ideally,
> > those would be relatively rare exceptions that don't impact common
> > use-cases.
> 
> Seems like we have consensus that MinMaxExpr should verify leakproofness
> rather than just assume it, so I'll go fix that.
> 
> What's your opinion on the question of whether to try to make text_cmp
> et al leakproof?  I notice that texteq/textne are (correctly) marked
> leakproof, so perhaps the usability issue isn't as pressing as I first
> thought; but it remains true that there are fairly common cases where
> the current marking is going to impede optimization.  I also realized
> that in the wake of 586b98fdf, we have to remove the leakproofness
> marking of name_cmp and name inequality comparisons --- which I did
> at d01e75d68, but that's potentially a regression in optimizability
> of catalog queries, so it's not very nice.

Well, as mentioned, I'd really be happier to have more things be
leakproof, when they really are leakproof.  What we've done in some
places, and I'm not sure how practical this is elsewhere, is to show
data when we know the user is allowed to see it anyway, to aid in
debugging and such (I'm thinking here specifically of
BuildIndexValueDescription(), which will just return a NULL in the case
where the user doesn't have permission to view all of the columns
involved).  As these are error cases, we're generally happy to consider
spending a bit of extra time to figure that out; is there something
similar we could do for these cases where we'd really like to report
useful information to the user, but only if we think they're probably
allowed to see it?

> Also, I believe that Peter's work towards making text equality potentially
> collation-sensitive will destroy the excuse for marking texteq/textne
> leakproof if we're going to be 100% rigid about that, and that would be
> a very big regression.

That could be a serious problem, I agree.

> So I'd like to get to a point where we're comfortable marking these
> functions leakproof despite the possibility of corner-case failures.
> We could just decide that the existing failure cases in varstr_cmp are
> not usefully exploitable for information leakage, or perhaps we could
> dumb down the error messages some more to make them even less so.
> It'd also be nice to have some articulatable policy that encompasses
> a choice like this.

I'd rather not say "well, these are mostly leakproof and therefore it's
good enough" unless those corner-case failures you're referring to are
really "this system call isn't documented to ever fail in a way we can't
handle, but somehow it did and we're blowing up because of it."

At least, in the cases where we're actually leaking knowledge that we
shouldn't be.  If what we're leaking is some error being returned where
all we're returning is an error code and not the actual data then that
doesn't seem like it's really much of a leak to me..?  I'm just glancing
through varstr_cmp and perhaps I'm missing something but it seems like
everywhere we're returning an error, at least from there, it's an error
code of some kind being returned and not the data that was passed in to
the function.  I didn't spend a lot of time hunting through it though.

Thanks!

Stephen




Re: log bind parameter values on error

2019-01-02 Thread Peter Eisentraut
On 15/12/2018 00:04, Alexey Bashtanov wrote:
> I'd like to propose a patch to log bind parameter values not only when 
> logging duration,
> but also on error (timeout in particular) or in whatever situation the 
> statement normally gets logged.
> This mostly could be useful when the request originator doesn't log them 
> either, so it's hard
> to reproduce the problem.

That's a reasonable problem to solve.

> So I decided to cache textual representations on bind stage,
> which is especially easy if the client uses text protocol.

That sounds like a plausible approach.  Have you done any performance
measurements?

In your patch, I would organize the changes to the portal API a bit
differently.  Don't change the signature of CreatePortal().  Get rid of
PortalSetCurrentTop() and PortalClearCurrentTop().  Just have a global
variable CurrentPortal and set it directly.  And then change
GetCurrentPortalBindParameters() to take a Portal as argument.  This can
all happen inside postgres.c without changing the Portal APIs.

In fact, maybe don't use the Portal structure at all and just store the
saved textualized values inside postgres.c in a static variable.
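
To sketch that last idea (all names here are placeholders, not a proposal
for the actual API):

    /* in postgres.c: keep the textualized bind parameters for the error path */
    static char *current_param_text = NULL;

    static void
    remember_param_text(const char *text)
    {
        /* MessageContext is reset once per client message, which is about
         * the lifetime we want for these values */
        current_param_text = text ?
            MemoryContextStrdup(MessageContext, text) : NULL;
    }

    static const char *
    get_current_param_text(void)
    {
        return current_param_text;
    }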

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Log PostgreSQL version number on startup

2019-01-02 Thread Peter Eisentraut
On 21/11/2018 15:46, Christoph Berg wrote:
> 2018-11-21 15:19:47.394 CET [24453] LOG:  starting PostgreSQL 12devel on 
> x86_64-pc-linux-gnu, compiled by gcc (Debian 8.2.0-9) 8.2.0, 64-bit

Do we want to do the whole version string, or just "PostgreSQL 12devel"?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> So I'd like to get to a point where we're comfortable marking these
>> functions leakproof despite the possibility of corner-case failures.
>> We could just decide that the existing failure cases in varstr_cmp are
>> not usefully exploitable for information leakage, or perhaps we could
>> dumb down the error messages some more to make them even less so.
>> It'd also be nice to have some articulatable policy that encompasses
>> a choice like this.

> I'd rather not say "well, these are mostly leakproof and therefore it's
> good enough" unless those corner-case failures you're referring to are
> really "this system call isn't documented to ever fail in a way we can't
> handle, but somehow it did and we're blowing up because of it."

Well, that's pretty much what we've got here.

1. As Noah noted, every failure case in varstr_cmp is ideally a can't
happen case, since if it could happen on valid data then that data
couldn't be put into a btree index.

2. AFAICS, all the error messages in question just report that a system
operation failed, with some errno string or equivalent; none of the
original data is reported.  (Obviously, we'd want to add comments
discouraging people from changing that ...)

Conceivably, an attacker could learn the length of some long string
by noting a palloc failure report --- but we've already accepted an
equivalent hazard in texteq or byteaeq, I believe, and anyway it's
pretty hard to believe that an attacker could control such failures
well enough to weaponize it.

So the question boils down to whether you think that somebody could
infer something else useful about the contents of a string from
the strerror (or ICU u_errorName()) summary of a system function
failure.  This seems like a pretty thin argument to begin with,
and then the presumed difficulty of making such a failure happen
repeatably makes it even harder to credit as a useful information
leak.

So I'm personally satisfied that we could mark text_cmp et al as
leakproof, but I'm not sure how we define a project policy that
supports such a determination.

regards, tom lane



Arrays of domain returned to client as non-builtin oid describing the array, not the base array type's oid

2019-01-02 Thread James Robinson
I was surprised yesterday by a difference between querying domains as scalars 
versus domains as arrays. As we're all generally aware, when a domain is 
queried and projected as a scalar in a result set, it is described 
over-the-wire as that column having the oid of the domain's base type, NOT the 
oid of the domain itself. This helps out many clients and their applications, 
but confuses a few who want to use domains as 'tagged types' to register new 
client-side type mappings against. Changing that behavior seems to be asked for 
every now and then and rejected as breaking more than it would help, and it can 
be worked around by making a whole new type sharing much of the config of the 
base type.

But when arrays of the domain are returned to the client, the column is 
described on the wire with the oid of the domain's array type, instead of the 
oid of the base type's array type. This seems inconsistent to me, even though 
it can be worked around in SQL by a cast of either the element type when 
building the array, or casting the resulting array type.

Example SQL:

create database test;

\c test

create domain required_text text
check (trim(value) = value and length(value) > 0) not null;

create table people
(
name required_text
);

insert into people values ('Joe'), ('Mary'), ('Jane');

And then client-side interaction using python/psycopg2 (sorry, am ignorant of 
how to get psql itself to show the protocol-level oids):

import psycopg2
con = psycopg2.connect('dbname=test')
cur = con.cursor()

# Scalar behaviours first: a query of the domain or the base type return the 
base type's oid:
>>> cur.execute('select name from people')
>>> cur.description
(Column(name='name', type_code=25, display_size=None, internal_size=-1, 
precision=None, scale=None, null_ok=None),)
>>> cur.execute('select name::text from people')
>>> cur.description
(Column(name='name', type_code=25, display_size=None, internal_size=-1, 
precision=None, scale=None, null_ok=None),)

Arrays of the base type (forced through explicit cast of either the element or 
the array):
>>> cur.execute('select array_agg(name::text) from people')
>>> cur.description
(Column(name='array_agg', type_code=1009, display_size=None, internal_size=-1, 
precision=None, scale=None, null_ok=None),)
>>> cur.execute('select array_agg(name)::text[] from people')
>>> cur.description
(Column(name='array_agg', type_code=1009, display_size=None, internal_size=-1, 
precision=None, scale=None, null_ok=None),)

Arrays of the domain, showing the new array type:
cur.execute('select array_agg(name) from people')
>>> cur.description
(Column(name='array_agg', type_code=2392140, display_size=None, 
internal_size=-1, precision=None, scale=None, null_ok=None),)

Interesting bits from my pg_type -- 2392140 is indeed the oid of the array type 
for the domain.

test=# select oid, typname, typcategory, typelem from pg_type where typname in 
( '_text', '_required_text');
   oid   |    typname     | typcategory | typelem 
---------+----------------+-------------+---------
    1009 | _text          | A           |      25
 2392140 | _required_text | A           | 2392141

So -- do others find this inconsistent, or is it just me and I should work on 
having psycopg2 be able to learn the type mapping itself if I don't want to do 
SQL-side casts? I'll argue that if scalar projections erase the domain's oid, 
then array projections ought to as well.
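
For anyone who'd rather check this without psycopg2, a minimal libpq program
along the same lines prints the wire-level type OIDs directly (connection
string and error handling kept deliberately simple; just a sketch against the
"test" database created above):

    /* build with: cc ftype.c -lpq (include/lib paths from pg_config) */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=test");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* scalar projection: reported with the base type's oid (25 = text) */
        res = PQexec(conn, "select name from people");
        printf("scalar: %u\n", PQftype(res, 0));
        PQclear(res);

        /* array of the domain: reported with the domain's array type oid */
        res = PQexec(conn, "select array_agg(name) from people");
        printf("array:  %u\n", PQftype(res, 0));
        PQclear(res);

        PQfinish(conn);
        return 0;
    }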

Thanks!
James

-
James Robinson
ja...@jlr-photo.com
http://jlr-photo.com/





Re: pg_dump multi VALUES INSERT

2019-01-02 Thread David Rowley
On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen  wrote:
> On Mon, Dec 31, 2018 at 12:38 PM David Rowley  
> wrote:
>> Just looking at the v5 patch, it seems not to handle 0 column tables 
>> correctly.

> The attach patch contain a fix for it

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So it looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
rows per INSERT, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.

I've not looked at the rest of the patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: log bind parameter values on error

2019-01-02 Thread Alexey Bashtanov




That sounds like a plausible approach.  Have you done any performance
measurements?


No I haven't yet


In your patch, I would organize the changes to the portal API a bit
differently.  Don't change the signature of CreatePortal().

okay

Get rid of PortalSetCurrentTop() and PortalClearCurrentTop().


I'll remove them from Portal API, but possibly have them in postgres.c
if you don't mind, just to avoid code duplication.
Renamed and maybe even inlined.


Just have a global
variable CurrentPortal and set it directly.  And then change
GetCurrentPortalBindParameters() to take a Portal as argument.  This can
all happen inside postgres.c without changing the Portal APIs.

Okay, will do

In fact, maybe don't use the Portal structure at all and just store the
saved textualized values inside postgres.c in a static variable.


Unlike the SQL text, parameters may use much more memory, so I'd keep them
in the portal memory context to make sure the memory is released earlier 
rather than later.

Tracking an individual variable's lifetime like we do with debug_query_string
sounds doable but a bit non-straightforward to me;
see e.g. the tricks we do with transaction commands.

Also, I'd like to avoid forming the error string early, as we may 
need to combine
the values differently in the future, e.g. when logging in various formats 
or languages.

One-by-one pfree-ing doesn't look tempting either.

Do you think it would be acceptable to leave them cached in the parameters 
structure?


Best,
  Alex




Re: pg11.1: dsa_area could not attach to segment

2019-01-02 Thread Thomas Munro
Hi Justin,

On Tue, Jan 1, 2019 at 11:17 AM Justin Pryzby  wrote:
> dsa_area could not attach to segment

/*
 * If we are reached by dsa_free or dsa_get_address,
there must be at
 * least one object allocated in the referenced
segment.  Otherwise,
 * their caller has a double-free or access-after-free
bug, which we
 * have no hope of detecting.  So we know it's safe to
access this
 * array slot without holding a lock; it won't change
underneath us.
 * Furthermore, we know that we can see the latest
contents of the
 * slot, as explained in check_for_freed_segments, which those
 * functions call before arriving here.
 */
handle = area->control->segment_handles[index];

/* It's an error to try to access an unused slot. */
if (handle == DSM_HANDLE_INVALID)
elog(ERROR,
 "dsa_area could not attach to a
segment that has been freed");

segment = dsm_attach(handle);
if (segment == NULL)
elog(ERROR, "dsa_area could not attach to segment");

Hmm.  We observed a valid handle, but then couldn't attach to it,
which could indicate that the value we saw was stale (and the theory
stated above has a hole?), or the segment was in the process of being
freed and we have a use-after-free problem leading to this race, or
something else along those lines.  If you can reproduce this on a dev
system, it'd be good to see the backtrace and know which of several
call paths led here, perhaps by changing that error to PANIC.  I'll
try that too.
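
(For the record, the local change for that experiment can be as small as
promoting the elevel at the second check, e.g.:

    segment = dsm_attach(handle);
    if (segment == NULL)
        elog(PANIC, "dsa_area could not attach to segment");

so that a core file and backtrace are produced at the point of failure.)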

-- 
Thomas Munro
http://www.enterprisedb.com



RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-01-02 Thread Chengchao Yu
Greetings,

Happy new year!

We would like to follow up again on this issue and the fix proposal. Could someone 
give some suggestions on the fix proposal, or other ideas for fixing this issue?

Looking forward to your feedback!


Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/


From: Chengchao Yu 
Sent: Wednesday, December 19, 2018 2:51 PM
To: pgsql-hack...@postgresql.org
Cc: Prabhat Tripathi ; Sunil Kamath 
; Michal Primke ; Bhavin 
Gandhi 
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs

Greetings,

Just following up on this issue and the fix proposal. We really would like 
to have this issue fixed in PG. Could someone give some suggestions on the fix 
proposal, or other ideas for fixing this issue?

Looking forward to your feedback!


Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

From: Chengchao Yu
Sent: Friday, November 30, 2018 1:00 PM
To: 'Pg Hackers' 
mailto:pgsql-hack...@postgresql.org>>
Cc: Prabhat Tripathi mailto:pt...@microsoft.com>>; Sunil 
Kamath mailto:sunil.kam...@microsoft.com>>; Michal 
Primke mailto:mpri...@microsoft.com>>
Subject: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs


Greetings,



Recently, we hit a few occurrences of a deadlock when an IO failure (including disk 
full and random remote disk IO failures) happens in single-user mode. We found that the 
issue exists on both Linux and Windows in multiple Postgres versions.



Here are the steps to repro on Linux (as Windows repro is similar):


1.   Get latest PostgreSQL code, build and install the executables.



$ git clone 
https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install


2.   Run initdb to initialize a PG database folder.



$ /path/to/postgres/bin/initdb -D /path/to/data


3.   Because the unable-to-write-relation-data scenario is difficult to hit 
naturally, even when reserved space is turned off, I have prepared a small patch (see 
attachment "emulate-error.patch") to force an error when PG tries to write data 
to relation files. We can just apply the patch, so there is no need to spend 
effort flooding the disk with data.



$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install


4.   Connect to the newly initialized database cluster in single-user 
mode, create a table, insert some data into the table, and then do a checkpoint or 
directly give EOF. At that point we hit the deadlock issue and the process will not exit 
until we kill it.



Do a checkpoint explicitly:



$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> checkpoint;
> EOF



PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate 
exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation 
base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;



2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when 
writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation 
base/12368/1247



Or directly give an EOF:



$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> EOF



PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate 
exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation 
base/12368/1247


5.   Moreover, when we try

Re: [PATCH] check for ctags utility in make_ctags

2019-01-02 Thread Michael Paquier
On Wed, Jan 02, 2019 at 11:35:46AM -0500, Tom Lane wrote:
> In fact, that's demonstrably not so: on my RHEL6 and Fedora boxes,
> /usr/bin/etags isn't owned by any package, because it's a symlink
> managed by the "alternatives" system.  It points to /usr/bin/etags.emacs
> which is owned by the emacs-common package.  So dropping the advice
> about how to fix the problem seems like a good plan.

+1, let's keep it simple.  I would just use "ctags/etags not found"
as error message.
--
Michael




Re: Logical decoding for operations on zheap tables

2019-01-02 Thread Amit Kapila
On Mon, Dec 31, 2018 at 9:56 AM Amit Kapila  wrote:
>
> To support logical decoding for zheap operations, we need a way to
> ensure zheap tuples can be registered as change streams.   One idea
> could be that we make ReorderBufferChange aware of another kind of
> tuples as well, something like this:
>
..
>
> Apart from this, we need to define different decode functions for
> zheap operations as the WAL data is different for heap and zheap, so
> same functions can't be used to decode.
>
> I have written a very hacky version to support zheap Insert operation
> based on the above idea.
>

I went ahead and tried to implement the decoding for Delete operation
as well based on the above approach and the result is attached.

>
> The yet another approach could be that in the decode functions after
> forming zheap tuples from WAL, we can convert them to heap tuples.  I
> have not tried that, so not sure if it can work, but it seems to me if
> we can avoid tuple conversion overhead, it will be good.
>

While implementing the decoding for the delete operation, I noticed that
the main changes required are to write a decode function and
additional WAL (like the old tuple), which is required anyway even if we
pursue this approach, so I think it might be better to go with the
approach where we don't need tuple conversion (aka something similar
to what is done in the attached patch).

Note - This patch is based on pluggable-zheap branch
(https://github.com/anarazel/postgres-pluggable-storage)

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


decode_zops_2.patch
Description: Binary data


Re: Refactoring the checkpointer's fsync request queue

2019-01-02 Thread Thomas Munro
On Wed, Jan 2, 2019 at 11:40 AM Thomas Munro
 wrote:
> For the 0001 patch, I'll probably want to reconsider the naming a bit
> ("simple -> "specialized", "generic", ...?), refine (ability to turn
> off the small vector optimisation?  optional MemoryContext?  ability
> to extend without copying or zero-initialising at the same time?
> comparators with a user data parameter?  two-value comparators vs
> three-value comparators?  qsort with inline comparator?  etc etc), and
> remove some gratuitous C++ cargo cultisms, and maybe also instantiate
> the thing centrally for some common types (I mean, perhaps 0002 should
> use a common uint32_vector rather than defining its own
> segnum_vector?).

Here's a new version that fixes a couple of stupid bugs (mainly a
broken XXX_lower_bound(), which I replaced with the standard algorithm
I see in many sources).
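
For reference, that "standard algorithm" is just the usual half-open binary
search returning the first position whose element is not less than the key;
a self-contained illustration over plain ints (names are illustrative, not
the generated XXX_lower_bound()):

    #include <stddef.h>

    /* Return the index of the first element >= key, or nelems if none. */
    static size_t
    int_lower_bound(const int *base, size_t nelems, int key)
    {
        size_t      lo = 0;
        size_t      hi = nelems;

        while (lo < hi)
        {
            size_t      mid = lo + (hi - lo) / 2;

            if (base[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }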

I couldn't resist the urge to try porting pg_qsort() to this style.
It seems to be about twice as fast as the original at sorting integers
on my machine with -O2.  I suppose people aren't going to be too
enthusiastic about yet another copy of qsort in the tree, but maybe
this approach (with a bit more work) could replace the Perl code-gen
for tuple sorting.  Then the net number of copies wouldn't go up, but
this could be used for more things too, and it fits with the style of
simplehash.h and simplevector.h.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-parameterized-vectors-and-sorting-searching-s-v6.patch
Description: Binary data


0002-Refactor-the-fsync-machinery-to-support-future-SM-v6.patch
Description: Binary data


Re: emacs configuration for new perltidy settings

2019-01-02 Thread Noah Misch
On Thu, Jul 12, 2012 at 12:35:26AM +0300, Peter Eisentraut wrote:
> This might be useful for some people.  Here is an emacs configuration
> for perl-mode that is compatible with the new perltidy settings.  Note
> that the default perl-mode settings produce indentation that will be
> completely shredded by the new perltidy settings.
> 
> (defun pgsql-perl-style ()
>   "Perl style adjusted for PostgreSQL project"
>   (interactive)
>   (setq tab-width 4)
>   (setq perl-indent-level 4)
>   (setq perl-continued-statement-offset 4)
>   (setq perl-continued-brace-offset 4)

(Later, commit 56fb890 changed perl-continued-statement-offset to 2.)  This
indents braces (perltidy aligns the brace with "if", but perl-mode adds
perl-continued-statement-offset + perl-continued-brace-offset = 6 columns):

if (-s "src/backend/snowball/stopwords/$lang.stop")
  {
  $stop = ", StopWords=$lang";
  }

If I run perltidy on 60d9979, then run perl-mode indent, the diff between the
perltidy run and perl-mode indent run is:
 129 files changed, 8468 insertions(+), 8468 deletions(-)
If I add (perl-continued-brace-offset . -2):
 119 files changed, 3515 insertions(+), 3515 deletions(-)
If I add (perl-indent-continued-arguments . 4) as well:
 86 files changed, 2626 insertions(+), 2626 deletions(-)
If I add (perl-indent-parens-as-block . t) as well:
 65 files changed, 2373 insertions(+), 2373 deletions(-)

That's with GNU Emacs 24.5.1.  Versions 24.3.1 and 21.4.1 show similar trends,
though 21.4.1 predates perl-indent-continued-arguments and
perl-indent-parens-as-block.

I'm attaching the patch to make it so, along with a patch that illustrates my
testing method.  "sh reindent-perl.sh" will test emacs.samples using your
Emacs installation.  (I don't plan to push the testing patch.)
diff --git a/.dir-locals.el b/.dir-locals.el
index eff4671..ab6208b 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -9,7 +9,7 @@
(indent-tabs-mode . nil)))
  (perl-mode . ((perl-indent-level . 4)
(perl-continued-statement-offset . 2)
-   (perl-continued-brace-offset . 4)
+   (perl-continued-brace-offset . -2)
(perl-brace-offset . 0)
(perl-brace-imaginary-offset . 0)
(perl-label-offset . -2)
diff --git a/src/tools/editors/emacs.samples b/src/tools/editors/emacs.samples
index a7152b0..529c98a 100644
--- a/src/tools/editors/emacs.samples
+++ b/src/tools/editors/emacs.samples
@@ -47,10 +47,13 @@
   (interactive)
   (setq perl-brace-imaginary-offset 0)
   (setq perl-brace-offset 0)
-  (setq perl-continued-brace-offset 4)
   (setq perl-continued-statement-offset 2)
+  (setq perl-continued-brace-offset (- perl-continued-statement-offset))
   (setq perl-indent-level 4)
   (setq perl-label-offset -2)
+  ;; Next two aren't marked safe-local-variable, so .dir-locals.el omits them.
+  (setq perl-indent-continued-arguments 4)
+  (setq perl-indent-parens-as-block t)
   (setq indent-tabs-mode t)
   (setq tab-width 4))
 
diff --git a/reindent-perl.el b/reindent-perl.el
new file mode 100644
index 000..17ff125
--- /dev/null
+++ b/reindent-perl.el
@@ -0,0 +1,45 @@
+;; Import with-demoted-errors from Emacs 24.5.1 (present in 23.1+).
+(when (not (fboundp 'macroexp-progn))
+  (defun macroexp-progn (exps)
+	"Return an expression equivalent to `(progn ,@EXPS)."
+	(if (cdr exps) `(progn ,@exps) (car exps))))
+
+(when (not (fboundp 'with-demoted-errors))
+  (defmacro with-demoted-errors (format &rest body)
+	"Run BODY and demote any errors to simple messages.
+FORMAT is a string passed to `message' to format any error message.
+It should contain a single %-sequence; e.g., \"Error: %S\".
+
+If `debug-on-error' is non-nil, run BODY without catching its errors.
+This is to be used around code which is not expected to signal an error
+but which should be robust in the unexpected case that an error is signaled.
+
+For backward compatibility, if FORMAT is not a constant string, it
+is assumed to be part of BODY, in which case the message format
+used is \"Error: %S\"."
+	(let ((err (make-symbol "err"))
+		  (format (if (and (stringp format) body) format
+	(prog1 "Error: %S"
+	  (if format (push format body))))))
+	  `(condition-case ,err
+		   ,(macroexp-progn body)
+		 (error (message ,format ,err) nil)))))
+
+
+(load (expand-file-name "./src/tools/editors/emacs.samples"))
+(setq enable-local-variables :all)		; not actually needed, given emacs.samples
+
+(while (setq fname (read-from-minibuffer "File name: "))
+  (set-buffer "*scratch*")
+  (find-file fname)
+  (message "%s"
+		   (list
+			(current-buffer)
+			perl-indent-level
+			tab-width
+			(if (boundp 'perl-indent-continued-arguments)
+perl-indent-continued-arguments "no-such-var")
+			(if (boundp 'perl-indent-parens-as-block)
+perl-indent-parens-as-block "no-such-var")))
+  (with-demoted-errors (indent-region (point