Re: dsa_allocate() failure

2018-11-27 Thread Thomas Munro
On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
 wrote:
> Hmm.  I will see if I can come up with a many-partition torture test
> reproducer for this.

No luck.  I suppose one theory that could link both failure modes
would be a buffer overrun, where in the non-shared case it trashes a
pointer that is later dereferenced, and in the shared case it writes
past the end of allocated 4KB pages and corrupts the intrusive btree
that lives in spare pages to track available space.

--
Thomas Munro
http://www.enterprisedb.com



Re: Tid scan improvements

2018-11-27 Thread Edmund Horner
On Thu, 22 Nov 2018 at 20:41, David Rowley  wrote:
> I've now had a look over the latest patches and I've found a few more
> things.  Many of these are a bit nitpicky, but certainly not all. I
> also reviewed 0004 this time.


Whew!  A lot more things to look at.

I've tried to address most of what you've raised, and attach yet
another set of patches.  There are a few things that I'm not settled
on, discussed below under Big Items.

CC'd Tomas, if he wants to check what I've done with the TidRange
array allocation.


* Big Items *

> 0001:
>
> 1. The row estimates are not quite right.  This causes the row
> estimation to go the wrong way for isgt.
>
> For example, the following gets 24 rows instead of 26.
>
> postgres=# create table t (a int);
> CREATE TABLE
> postgres=# insert into t select generate_Series(1,100);
> INSERT 0 100
> postgres=# analyze t;
> postgres=# explain analyze select * from t where ctid >= '(0,75)';
>  QUERY PLAN
> -
>  Seq Scan on t  (cost=0.00..2.25 rows=24 width=4) (actual
> time=0.046..0.051 rows=26 loops=1)
>Filter: (ctid >= '(0,75)'::tid)
>Rows Removed by Filter: 74
>  Planning Time: 0.065 ms
>  Execution Time: 0.074 ms
> (5 rows)
>
> The < and <= cases are not quite right either. < should have 1 fewer
> tuple than the calculated average tuples per page, and <= should have
> the same (assuming no gaps)
>
> I've attached a small delta patch that I think improves things here.

Thanks, I've incorporated your patch.  I think the logic for iseq and
isgt makes sense now.

Since we only have the total number of tuples and the total number of
pages, and no real statistics, this might be the best we can
reasonably do.  There's still a noticeable rowcount error for the last
page, and slighter rowcount errors for other pages.  We estimate
density = ntuples/npages for all pages; but in a densely populated
table, the last page will hold on average only half as many tuples
as the earlier pages.

I guess we *could* estimate density = ntuples/(npages - 0.5) for all
but the last page; and half that for the last.  But that adds
complexity, and you'd still only get a good row count when the last
page was about half full.

I implemented this anyway, and it does improve row counts a bit.  I'll
include it in the next patch set and you can take a look.

I also spent some time today agonising over how visibility would affect
things, but did not come up with anything useful to add to our
formulas.

> 3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
> allocation until it reaches the required size. See how
> MakeSharedInvalidMessagesArray() does it.  Doing it this way ensures
> we always have a power of two sized array which is much nicer if we
> ever reach the palloc() limit as if the array is sized at the palloc()
> limit / 2 + 1, then if we try to double it'll fail.  Of course, it's
> unlikely to be a problem here, but... the question would be how to
> decide on the initial size.

I've tried to change things that way, but we still need to deal with
excessive numbers of items.

I've defined a constant MaxTidRanges = MaxAllocSize/sizeof(TidRange),
and raise an error if the required size exceeds that.

> 4. "at" needs shifted left a couple of words
>
> /*
> * If the lower bound was already or above at the maximum block
> * number, then there is no valid range.
> */
>
> but I don't see how it could be "or above".  The ctid type does not
> have the room for that. Although, that's not to say you should test if
> (block == MaxBlockNumber), the >= seems better for the code. I'm just
> complaining about the comment.

We have to deal with TIDs entered by the user, which can include
invalid ones like (4294967295,0).  MaxBlockNumber is 4294967294.

> 12. expected_comparison_operator is a bit long a name:
>
> IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
>
> How about just expected_opno?
>
> 14. This is not great:
>
> [horrible macros in tidpath.c]
>
> The 4 macros for >, >=, < and <= are only used by IsTidRangeClause()
> which means IsTidComparison() could get called up to 4 times. Most of
> the work it does would be redundant in that case.  Maybe it's better
> to rethink that?

Yeah.  I've rewritten all this as two functions, IsTidEqualClause and
IsTidRangeClause, which each check the opno, with a helper function
IsTidBinaryExpression that checks everything else.

> 18. I'd expect the following not to produce a sort above the Tid Scan.
>
> postgres=# set enable_seqscan=0;
> SET
> postgres=# explain select * from t inner join t t1 on t.ctid = t1.ctid
> where t.ctid < '(0,10)' ;
>   QUERY PLAN
> ---
>  Merge Join  (cost=108.65..109.28 rows=9 width=8)
>Merge Cond: (t.ctid = t1.ctid)
>-

Re: pg11.1 jit segv

2018-11-27 Thread Andres Freund
Hi,

On 2018-11-26 22:56:09 -0600, Justin Pryzby wrote:
> On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> > Could you check that the attached patch this also fixes your original
> > issue? Going through the code to see if there's other occurances of
> > this.
> 
> Confirmed that fixes my crash.

Thanks a lot for narrowing down your crash to something I can reproduce!


Here's a more complete patch, with a testcase.

Tom, the test creates a 1100-column table (using \set ECHO none +
\gexec), but with a small row. Currently it's not dropped after the
test, as I thought it might be worthwhile for it to be tested by
pg_dump/upgrade etc too. You're probably the person most concerned with
test runtimes, ... Any concerns about that? The table creation is
quick*, on the order of 30ms.

Greetings,

Andres Freund


*at least as long as there are no default values and the table's not
dropped; we seem to have somewhat of an O(N^2) situation going on when
dropping a table with many columns that have default values - we
re-build the cache entry after each dropped default value. But as the
max is 1600 columns, that's not too bad.
From fda7a2b41cba265f6dc3212f65a9da5f3518cf7c Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 27 Nov 2018 00:22:02 -0800
Subject: [PATCH v1] Fix jit compilation issue on very wide tables.

Reported-By: Justin Pryzby
Author: Andres Freund
Discussion: https://postgr.es/m/20181115223959.gb10...@telsasoft.com
Backpatch: 11, just as jit compilation was
---
 src/backend/jit/llvm/llvmjit_deform.c  | 12 +---
 src/test/regress/expected/create_table.out | 10 ++
 src/test/regress/expected/sanity_check.out |  1 +
 src/test/regress/sql/create_table.sql  | 10 ++
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 4111bf0a54b..d168eae0873 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -248,10 +248,16 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 			v_infomask2,
 			"maxatt");
 
+	/*
+	 * Need to zext, as getelementptr otherwise treats hoff as a signed 8bit
+	 * integer, which'd yield a negative offset for t_hoff > 127.
+	 */
 	v_hoff =
-		l_load_struct_gep(b, v_tuplep,
-		  FIELDNO_HEAPTUPLEHEADERDATA_HOFF,
-		  "t_hoff");
+		LLVMBuildZExt(b,
+	  l_load_struct_gep(b, v_tuplep,
+		FIELDNO_HEAPTUPLEHEADERDATA_HOFF,
+		""),
+	  LLVMInt32Type(), "t_hoff");
 
 	v_tupdata_base =
 		LLVMBuildGEP(b,
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index e92748c1ea0..b26b4e7b6d9 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -261,6 +261,16 @@ ERROR:  relation "as_select1" already exists
 CREATE TABLE IF NOT EXISTS as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
 NOTICE:  relation "as_select1" already exists, skipping
 DROP TABLE as_select1;
+-- create an extra wide table to test for issues related to that
+-- (temporarily hide query, to avoid the long CREATE TABLE stmt)
+\set ECHO none
+INSERT INTO extra_wide_table(firstc, lastc) VALUES('first col', 'last col');
+SELECT firstc, lastc FROM extra_wide_table;
+  firstc   |  lastc   
+---+--
+ first col | last col
+(1 row)
+
 -- check that tables with oids cannot be created anymore
 CREATE TABLE withoid() WITH OIDS;
 ERROR:  syntax error at or near "OIDS"
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c77060d36c1..009a89fc1ad 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -44,6 +44,7 @@ dupindexcols|t
 e_star|f
 emp|f
 equipment_r|f
+extra_wide_table|f
 f_star|f
 fast_emp4000|t
 float4_tbl|f
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index 90cc1a578f3..c6f048f8c2a 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -277,6 +277,16 @@ CREATE TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
 CREATE TABLE IF NOT EXISTS as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
 DROP TABLE as_select1;
 
+-- create an extra wide table to test for issues related to that
+-- (temporarily hide query, to avoid the long CREATE TABLE stmt)
+\set ECHO none
+SELECT 'CREATE TABLE extra_wide_table(firstc text, '|| array_to_string(array_agg('c'||i||' bool'),',')||', lastc text);'
+FROM generate_series(1, 1100) g(i)
+\gexec
+\set ECHO all
+INSERT INTO extra_wide_table(firstc, lastc) VALUES('first col', 'last col');
+SELECT firstc, lastc FROM extra_wide_table;
+
 -- check that tables with oids cannot be created anymore
 CREATE TABLE withoid() WITH OIDS;
 CREATE TABLE withoid() WITH (oids);
-- 
2.18.0.rc2.dirty



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Magnus Hagander
On Tue, Nov 27, 2018 at 4:46 AM Andres Freund  wrote:

> Hi,
>
> On 2018-11-27 12:20:13 +0900, Michael Paquier wrote:
> > On Mon, Nov 26, 2018 at 10:13:34PM -0500, David Steele wrote:
> > > Non-exclusive backups have been available since 9.6 and several
> third-party
> > > solutions support this mode, in addition to pg_basebackup.
> >
> > I think that two releases is not actually that much time to just nuke
> > the exclusive backup interface.  I would be fine if the docs show the
> > deprecation more aggressively using a warning section, and we could add
> > an explicit WARNING message about the matter directly when calling
> > pg_start_backup and pg_stop_backup.
>
> That was my gut reaction as well, but I think David's argument about
> existing scripts / workflows being broken due the the recovery.conf
> is a good reason to be more aggressive here.
>


Yeah, I'm in the same boat here -- it feels like it's a bit too short, but
since we're breaking things for people *anyway*, it's probably better to
break both at once than to have all those people have their things broken
multiple times.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


RE: Protect syscache from bloating with negative cache entries

2018-11-27 Thread Ideriha, Takeshi
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>I haven't looked into the code but I'm going to do it later.

Hi, I've taken a look at 0001 patch. Reviewing the rest of patch will be later.

    if (!IsParallelWorker())
+   {
        stmtStartTimestamp = GetCurrentTimestamp();
+
+       /* Set this timestamp as aproximated current time */
+       SetCatCacheClock(stmtStartTimestamp);
+   }
    else
Just a confirmation.
At first I thought that catcacheclock is not updated while a parallel
worker is active.  But in that case catcacheclock is updated by the
parent, so no problem occurs.

+   int tupsize = 0;

    /* negative entries have no tuple associated */
    if (ntp)
    {
        int i;
+       int tupsize;

+       ct->size = tupsize;

@@ -1906,17 +2051,24 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
    ct->dead = false;
    ct->negative = negative;
    ct->hash_value = hashValue;
+   ct->naccess = 0;
+   ct->lastaccess = catcacheclock;
+   ct->size = tupsize;
tupsize is declared twice, inside and outside of the if scope, but it
doesn't seem you need to do so.
And ct->size = tupsize is executed twice, once in the if block and once
outside the if-else block.

+static inline TimestampTz
+GetCatCacheClock(void)

This function is not called by anyone in this version of the patch. In
the previous version it was called by plancache.
Will further patches focus only on catcache? In that case this one can
be removed.

There are some typos.
+   int size;   /* palloc'ed size off this tuple */
typo: off->of

+   /* Set this timestamp as aproximated current time */
typo: aproximated->approximated

+ * GUC variable to define the minimum size of hash to cosider entry eviction.
typo: cosider -> consider

+   /* initilize catcache reference clock if haven't done yet */ 
typo:initilize -> initialize

Regards,
Takeshi Ideriha


Re: Continue work on changes to recovery.conf API

2018-11-27 Thread Peter Eisentraut
On 25/11/2018 21:39, Andres Freund wrote:
> On 2018-11-25 13:24:15 -0500, Stephen Frost wrote:
>> - User performs a backup with pg_basebackup -R
>> - Replica is then promoted to be a primary
>> - User performs a backup with pg_basebackup -R on the new primary
>> - Duplicate entries end up in postgresql.auto.conf.  Ew.
> Why don't we put recovery entries into postgresql.recovery.conf or such?
> And overwrite rather than append?

Adding more such sub-configuration files would be easy to do and has
some merit, but the devil is in the details.  In what order would those
files be read?  Who is supposed to write to it, is it reserved for
pg_basebackup use only?  If you choose to use this particular name, is
it not used when not in recovery?  Does this file belong in the data
directory or the configuration directory?  Which choice will offend
packagers the least?  Etc.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-27 Thread Peter Eisentraut
On 26/11/2018 21:30, Sergei Kornilov wrote:
> - recovery_target = immediate was replaced with recovery_target_immediate 
> bool GUC

Why?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: tab-completion debug print

2018-11-27 Thread Kyotaro HORIGUCHI
Hello.

At Mon, 26 Nov 2018 07:08:53 +0100, David Fetter  wrote in 
<20181126060853.gp...@fetter.org>
> On Sun, Nov 25, 2018 at 11:21:51PM -0500, Tom Lane wrote:
> > Kyotaro HORIGUCHI  writes:
> > >> On Fri, Nov 23, 2018 at 04:32:31PM -0500, Tom Lane wrote:
> > >>> Hm.  I can see the value of instrumenting tab-complete when you're 
> > >>> trying
> > >>> to debug why it did something, but this output format seems pretty terse
> > >>> and unreadable.
> > 
> > > The reason for the minimal output was it won't disturb console so
> > > much when stderr is not redirected. It could be richer if we
> > > premise redirection.
> > 
> > It seems to me this behavior would be completely unusable if it's not
> > redirected; I don't understand why you're even considering that as an
> > interesting option.  libreadline is necessarily active when we're doing
> > this, and it will get very confused (or at least produce a very confused
> > display) if anything else is emitting text onto the active console line.

The mess here started because I forgot about the -L option of psql. I
wouldn't have done that if I had known of it.

> Integrating with libreadline would be a *much* bigger project for
> reasons starting with the ones you state.
> 
> > Maybe somebody who never makes any typing errors at all and doesn't
> > really need to see what they typed could use it like that, but I for
> > one would find it quite useless.
> > In fact, I was thinking of proposing that the output shouldn't go to
> > stderr in the first place.  If you do it like that, then you're also
> > confusing this info with query error output, which does little for
> > usability either.

So, we can use the existing log file feature of psql.

> > Speaking of query error output, it wouldn't be a bad thing if this
> > mode could show any errors from the tab-completion queries.  I
> > suppose people look into the postmaster log for that right now; but
> > if this were all going to some separate log file, I'd vote for
> > showing the constructed queries and their results or errors.
> 
> I briefly thought about logging this to the postmaster log, but psql
> is not the server, and doesn't, as a rule, have access to the same
> kinds of things the server does, so it's not really the right thing.
> On a busy server, because we don't yet have ways to carve off streams
> of logs, these ones could get lost in the noise.

Agreed.

> This brings up an interesting idea, though. It seems like we're
> backing into a logging facility for psql with this feature. How about
> just sending stderr to some file(s) in /var/log/psql/ , or if we're
> getting super fancy, to the syslog family?

syslog seems a bit overdone for psql.  As mentioned above, this
version emits logs into the file specified by the -L option.  Some
other information is also emitted into the file, but it doesn't seem
to be a problem.

psql -L ~/psqldebug.log postgres

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 32a4dd7adecab70e464222af41552b22332679b4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 27 Nov 2018 14:57:46 +0900
Subject: [PATCH] Tab-completion debug log

With this patch, psql built with TABCOMPLETION_DEBUG defined emits a
tab-completion debug log into the file specified by the -L option.
---
 src/bin/psql/tab-complete.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9dbd555166..a8758b22ee 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -60,6 +60,44 @@ extern char *filename_completion_function();
 #define completion_matches rl_completion_matches
 #endif
 
+/*
+ * By enabling the following definition every completion attempt emits a log
+ * line into the log file if any. Every log line consists of the source line
+ * number where it is made, completion context and the completion result.
+ */
+#ifdef TABCOMPLETION_DEBUG
+#ifdef HAVE_RL_COMPLETION_MATCHES
+#define org_completion_matches rl_completion_matches
+#else
+#define org_completion_matches completion_matches
+#endif
+
+#undef completion_matches
+#define completion_matches(t, f) completion_debug(__LINE__, (t), (f))
+
+static char **completion_debug(int line,
+			   const char *text, rl_compentry_func_t *func)
+{
+	char **list = org_completion_matches(text, func);
+
+	if (pset.logfile)
+	{
+		/* Emit completion log */
+
+		/* Enclose empty list with brackets since it is an intermediate state
+		 * which is immediately followed by a non-empty list.
+		 */
+		fprintf(pset.logfile, "%s:%d:%s\"%s\" -> (", __FILE__, line, list ? "" : "[", text);
+		for (int i = 0; list && list[i]; ++i)
+			fprintf(pset.logfile, "%s\"%s\"", i ? ", " : "", list[i]);			
+		fprintf(pset.logfile, ")%s\n", list ? "": "]");
+		fflush(pset.logfile);
+	}
+
+	return list;
+}
+#endif
+
 /* word break characters */
 #define WORD_BREAKS		"\t\n@$><=;|&{() "
 
-- 
2.16.3



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-27 Thread Sergei Kornilov
Hello

>>  - recovery_target = immediate was replaced with recovery_target_immediate 
>> bool GUC
>
> Why?
Due to this comment: 
https://www.postgresql.org/message-id/20181126172118.GY3415%40tamriel.snowman.net
> I've not been following this very closely, but seems like
> recovery_target_string is a bad idea.. Why not just make that
> 'recovery_target_immediate' and make it a boolean? Having a GUC that's
> only got one valid value seems really odd.

regards, Sergei



Planning time of Generic plan for a table partitioned into a lot

2018-11-27 Thread Kato, Sho
Hi,
I found that making a generic plan for SELECT/UPDATE/DELETE on a table
partitioned into thousands of partitions is slow.
Especially, UPDATE/DELETE statements are too slow.

I'm afraid I could not come up with a good idea myself, but how can I
shorten the creation time of a generic plan?

The results are as follows.

*setup*

postgres=# create table t(id int) partition by range(id);
CREATE TABLE
postgres=# \o /dev/null
postgres=# select 'create table t_' || x || ' partition of t for values from (' 
|| x || ') to (' || x+1 || ')'from generate_series(1, 8192) x;
postgres=# \gexec
postgres-# analyze;
ANALYZE

*explain analyze results*

postgres=# set plan_cache_mode = force_generic_plan;
SET
postgres=# prepare select_stmt(int) as select * from t where id = $1;
PREPARE
postgres=# explain analyze execute select_stmt(8192);
   QUERY PLAN   


 Append  (cost=0.00..343572.48 rows=106496 width=4) (actual time=0.015..0.015 
rows=0 loops=1)
   Subplans Removed: 8191
   ->  Seq Scan on t_8192  (cost=0.00..41.88 rows=13 width=4) (actual 
time=0.013..0.013 rows=0 loops=1)
 Filter: (id = $1)
 Planning Time: 206.415 ms
 Execution Time: 0.742 ms
(6 rows)

postgres=# prepare update_stmt(int) as update t set id = id + 1 where id = $1;
PREPARE
postgres=# explain analyze execute update_stmt(8192);
   QUERY PLAN
-
 Update on t  (cost=0.00..343306.24 rows=106496 width=10) (actual 
time=39.502..39.503 rows=0 loops=1)
   Update on t_1
   Update on t_2
   ...
   ->  Seq Scan on t_1  (cost=0.00..41.91 rows=13 width=10) (actual 
time=0.025..0.026 rows=0 loops=1)
 Filter: (id = $1)
   ->  Seq Scan on t_2  (cost=0.00..41.91 rows=13 width=10) (actual 
time=0.004..0.005 rows=0 loops=1)
 Filter: (id = $1)
   ...
 Planning Time: 14357.504 ms
 Execution Time: 397.652 ms
(24579 rows)

postgres=# prepare delete_stmt(int) as delete from t where id = $1;
PREPARE
postgres=# explain analyze execute delete_stmt(8192);
   QUERY PLAN

 Delete on t  (cost=0.00..343040.00 rows=106496 width=6) (actual 
time=51.628..51.628 rows=0 loops=1)
   Delete on t_1
   Delete on t_2
   ...
   ->  Seq Scan on t_1  (cost=0.00..41.88 rows=13 width=6) (actual 
time=0.025..0.026 rows=0 loops=1)
 Filter: (id = $1)
   ->  Seq Scan on t_2  (cost=0.00..41.88 rows=13 width=6) (actual 
time=0.014..0.015 rows=0 loops=1)
 Filter: (id = $1)
   ...
 Planning Time: 14225.908 ms
 Execution Time: 405.605 ms
(24579 rows)

Of course, with plan_cache_mode = force_custom_plan it is not a problem,
because unnecessary paths are pruned by the speeding-up-planning-with-
partitions patch[1].

However, if plan_cache_mode is force_generic_plan, the generic plan is
made at the first execution of the prepared statement.
If plan_cache_mode is auto (the default), the generic plan is made at
the sixth execution.
So, with the default setting, performance gets lower at the sixth
execution.
Even if we do not improve the creation of generic plans, when the number
of partitions is large it might be better to recommend force_custom_plan.
Thoughts?

[1]: 
https://www.postgresql.org/message-id/9d7c5112-cb99-6a47-d3be-cf1ee6862...@lab.ntt.co.jp

Regards,
Sho Kato




Re: vacuum and autovacuum - is it good to configure the threshold at TABLE LEVEL?

2018-11-27 Thread amul sul
Hopefully, this[1] will help you.

[1]: https://www.percona.com/blog/2018/08/10/tuning-autovacuum-in-postgresql-and-autovacuum-internals/

regards,
Amul
On Tue, Nov 27, 2018 at 11:50 AM rajan  wrote:
>
> Hi,
>
> Please suggest me on the following,
>
> 1. Is it better to configure autovacuum threshold at table level?
> 2. Is there any discussions in this forum which I can refer for
> understanding vacuum/autovacuum?
>
> Thanks in advance.
> Rajan.
>
>
>
> -
> --
> Thanks,
> Rajan.
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
>



Re: Protect syscache from bloating with negative cache entries

2018-11-27 Thread Kyotaro HORIGUCHI
Thank you for reviewing.

At Thu, 15 Nov 2018 11:02:10 +, "Ideriha, Takeshi" 
 wrote in 
<4E72940DA2BF16479384A86D54D0988A6F1F4165@G01JPEXMBKW04>
> Hello, thank you for updating the patch.
> 
> 
> >From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> >At Thu, 4 Oct 2018 04:27:04 +, "Ideriha, Takeshi"
> > wrote in
> ><4E72940DA2BF16479384A86D54D0988A6F1BCB6F@G01JPEXMBKW04>
> >> >As a *PoC*, in the attached patch (which applies to current master),
> >> >size of CTups are counted as the catcache size.
> >> >
> >> >It also provides pg_catcache_size system view just to give a rough
> >> >idea of how such view looks. I'll consider more on that but do you have 
> >> >any opinion
> >on this?
> >> >
> >...
> >> Great! I like this view.
> >> One of the extreme idea would be adding all the members printed by
> >> CatCachePrintStats(), which is only enabled with -DCATCACHE_STATS at this
> >moment.
> >> All of the members seems too much for customers who tries to change
> >> the cache limit size But it may be some of the members are useful
> >> because for example cc_hits would indicate that current cache limit size 
> >> is too small.
> >
> >The attached introduces four features below. (But the features on relcache 
> >and
> >plancache are omitted).
> I haven't looked into the code but I'm going to do it later.
> 
> Right now it seems to me that focusing on catalog cache invalidation and its
> stats is a quick route to committing this feature.
> 
> >1. syscache stats collector (in 0002)
> >
> >Records syscache status consists of the same columns above and "ageclass"
> >information. We could somehow triggering a stats report with signal but we 
> >don't want
> >take/send/write the statistics in signal handler. Instead, it is turned on 
> >by setting
> >track_syscache_usage_interval to a positive number in milliseconds.
> 
> I agree. Ageclass is important for tweaking prune_min_age.
> Collecting stats at every stats change is heavy.
> 
> >2. pg_stat_syscache view.  (in 0002)
> >
> >This view shows catcache statistics. Statistics is taken only on the 
> >backends where
> >syscache tracking is active.
> >
> >>  pid  | application_name |relname |cache_name
> >|   size   |ageclass | nentries
> >>
> >--+--++---+--
> >+-+---
> >>  9984 | psql | pg_statistic   | 
> >> pg_statistic_relid_att_inh_index  |
> >12676096 | {30,60,600,1200,1800,0} | {17660,17310,55870,0,0,0}
> >
> >Age class is the basis of the catcache truncation mechanism and shows the
> >distribution based on elapsed time since last access. As I didn't come up
> >with an appropriate way, it is represented as two arrays.  Ageclass stores
> >the maximum age for each class in seconds.  Nentries holds the entry counts
> >corresponding to the same elements in ageclass.  In the above example,
> >
> > age class  : # of entries in the cache
> >   up to   30s  : 17660
> >   up to   60s  : 17310
> >   up to  600s  : 55870
> >   up to 1200s  : 0
> >   up to 1800s  : 0
> >   more longer  : 0
> >
> > The ageclass is {0, 0.05, 0.1, 1, 2, 3}th multiples of  cache_prune_min_age 
> > on the
> >backend.
> 
> I just thought that the pair of ageclass and nentries can be represented as
> json or a multi-dimensional array, but virtually they are all the same and
> can be converted to each other using some functions. So I'm not sure which
> representation is the better one.

A multi-dimensional array in any style sounds reasonable.  Maybe an
array is preferable in system views, as it is a more basic type than
JSON.  In the attached, it looks like the following:
=# select * from pg_stat_syscache  where ntuples > 100;
-[ RECORD 1 ]--
pid | 1817
relname | pg_class
cache_name  | pg_class_oid_index
size| 2048
ntuples | 189
searches| 1620
hits| 1431
neg_hits| 0
ageclass| {{30,189},{60,0},{600,0},{1200,0},{1800,0},{0,0}}
last_update | 2018-11-27 19:22:00.74026+09


> >3. non-transactional GUC setting (in 0003)
> >
> >It allows setting GUC variable set by the action GUC_ACTION_NONXACT(the name
> >requires condieration) survive beyond rollback. It is required by remote guc 
> >setting to
> >work sanely. Without the feature a remote-set value within a trasction will 
> >disappear
> >involved in rollback. The only local interface for the NONXACT action is
> >set_config(name, value, is_local=false, is_nonxact = true). 
> >pg_set_backend_guc()
> >below works on this feature.
> 
> TBH, I'm not familiar with this area and I may be missing something.
> In order to change another backend's GUC value, is ignoring transactional
> behavior always necessary? When the transaction of a GUC setting fails and
> is rolled back, if the error message is supposed to be reported, I thought
> just retrying the transaction would be enough.

The target ba

Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread David Steele
On 11/26/18 11:04 PM, Robert Haas wrote:
> On Mon, Nov 26, 2018 at 10:13 PM David Steele  wrote:
>> I propose we remove the deprecated exclusive backup mode of
>> pg_start_backup()/pg_stop_backup() for Postgres 12.
> 
> -1. I don't have a problem with deprecating exclusive backup mode
> eventually, but I don't see any good reason to do it this soon.
> 
> It's not like the problems with exclusive backup are so serious that
> you can't work around them.  If you know which machine is your master,
> you can arrange to remove backup_label on reboot (only) on the master
> (only). 

I do not recommend that users write automated scripts to delete files
out of $PGDATA.  Accidents happen that way.

If this is our recommendation on how the mitigate the problem with
exclusive backup then I think it proves the point that it is broken and
should be removed.

> Sure, a lot of people probably do this wrong, but it's
> infeasible to disallow all the things that people might use
> incorrectly without making the system useless.

But we have already fixed this problem -- it only remains to remove the
old method.

> There must be hundreds or thousands of backup scripts written by
> individual users that still use exclusive mode floating around out
> there.  Forcing all of those to be updated or scrapped will annoy
> users to no benefit.

But we are OK with forcing them to scrap their recovery scripts?

And there is a benefit -- Postgres will be able to restart if it crashes
during a backup.

Regards,
-- 
-David
da...@pgmasters.net



Re: csv format for psql

2018-11-27 Thread Daniel Verite
Tom Lane wrote:

> what I did instead was just to make
> csv_print_field force field quoting if any of these cases could
> possibly apply.  That will result in excess quoting in some
> cases, but I think that's fine, since they're all pretty uncommon.

Indeed.

> (BTW, it seems only chance that the server's CSV mode isn't also
> subject to this case, but since it always quotes empty fields,
> I think we're OK there.)
> 
> Pushed with that correction and some other last-minute review.

Thanks!


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Continue work on changes to recovery.conf API

2018-11-27 Thread David Steele
On 11/27/18 3:59 AM, Peter Eisentraut wrote:
> On 25/11/2018 21:39, Andres Freund wrote:
>> On 2018-11-25 13:24:15 -0500, Stephen Frost wrote:
>>> - User performs a backup with pg_basebackup -R
>>> - Replica is then promoted to be a primary
>>> - User performs a backup with pg_basebackup -R on the new primary
>>> - Duplicate entries end up in postgresql.auto.conf.  Ew.
>> Why don't we put recovery entries into postgresql.recovery.conf or such?
>> And overwrite rather than append?
> 
> Adding more such sub-configuration files would be easy to do and has
> some merit, but the devil is in the details.  In what order would those
> files be read?  Who is supposed to write to it, is it reserved for
> pg_basebackup use only?  If you choose to use this particular name, is
> it not used when not in recovery?  Does this file belong in the data
> directory or the configuration directory?  Which choice will offend
> packagers the least?  Etc.

I would prefer a specific file that will be auto-included into
postgresql.conf when present but be ignored when not present.  Some
settings are generally ephemeral (recovery_target_time) and it would be
nice for them to go away.  When recovery is complete the file would be
renamed to .done just as recovery.conf is now.

I had been thinking about keeping the recovery.conf file name but on
reflection is seems best to make a clean break.  I like Andres'
suggestion of postgresql.recovery.conf.

I would not be in favor of this file being for pg_basebackup only.  If
it has documented and consistent behavior then any restore solution
should be able to use it.

To be clear, the aim is to give tools a place to write recovery GUCs
that are considered temporary and/or replaceable.  Users are still free
to write permanent recovery GUCs into whatever file they would like,
e.g. restore_command.
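
A tool-written file of this kind might look like the following sketch.
The file name, its auto-inclusion, and the rename-to-.done behavior are
the proposal under discussion, not existing server behavior; only the
parameter names themselves are the new recovery GUCs:

```
# postgresql.recovery.conf (hypothetical), written by a restore tool;
# auto-included when present, renamed to .done when recovery completes
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
recovery_target_time = '2018-11-27 00:00:00+00'
recovery_target_action = 'promote'
```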

Regards,
-- 
-David
da...@pgmasters.net



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-27 Thread Peter Eisentraut
On 27/11/2018 10:10, Sergei Kornilov wrote:
> Hello
> 
>>>  - recovery_target = immediate was replaced with recovery_target_immediate 
>>> bool GUC
>>
>> Why?
> Due this comment: 
> https://www.postgresql.org/message-id/20181126172118.GY3415%40tamriel.snowman.net
>> I've not been following this very closely, but seems like
>> recovery_target_string is a bad idea.. Why not just make that
>> 'recovery_target_immediate' and make it a boolean? Having a GUC that's
>> only got one valid value seems really odd.

It is a bit odd, but that's the way it's been, and I don't see a reason
to change it as part of this fix.  We are attempting to fix the way the
GUC parameters are parsed, not change the name and meaning of the
parameters themselves.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Michael Paquier
On Thu, Nov 22, 2018 at 01:16:09PM +0900, Michael Paquier wrote:
> No, pgarch_readyXLog() should still look after .ready files as those are
> here for this purpose, but we could have an additional check to see if
> the segment linked with it actually exists and can be archived.  This
> check could happen in pgarch.c code before calling the archive command
> gets called (just before pgarch_ArchiverCopyLoop and after
> XLogArchiveCommandSet feels rather right, and that it should be cheap
> enough to call stat()).

s/pgarch_ArchiverCopyLoop/pgarch_archiveXlog/.

Attached is a patch based on the idea upthread.
Thoughts?
--
Michael
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 844b9d1b0e..1cc788ca83 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -28,6 +28,7 @@
 #include <fcntl.h>
 #include <signal.h>
 #include <time.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/wait.h>
 #include <unistd.h>
@@ -427,6 +428,9 @@ pgarch_ArchiverCopyLoop(void)
 
 		for (;;)
 		{
+			struct stat	stat_buf;
+			char		pathname[MAXPGPATH];
+
 			/*
 			 * Do not initiate any more archive commands after receiving
 			 * SIGTERM, nor after the postmaster has died unexpectedly. The
@@ -456,6 +460,25 @@ pgarch_ArchiverCopyLoop(void)
 return;
 			}
 
+			/*
+			 * In the event of a system crash, archive status files may be
+			 * left behind as their removals are not durable, cleaning up
+			 * orphan entries here is the cheapest method.  So check that
+			 * the segment trying to be archived still exists.
+			 */
+			snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+			if (stat(pathname, &stat_buf) != 0)
+			{
+char		xlogready[MAXPGPATH];
+
+StatusFilePath(xlogready, xlog, ".ready");
+if (durable_unlink(xlogready, WARNING) == 0)
+	ereport(WARNING,
+			(errmsg("removed orphan archive status file %s",
+	xlogready)));
+return;
+			}
+
 			if (pgarch_archiveXlog(xlog))
 			{
 /* successful */


signature.asc
Description: PGP signature
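
The cleanup logic in the attached patch (stat() the segment before
invoking the archive command, and durably unlink an orphaned .ready
file) can be sketched roughly in Python for illustration; the actual
implementation is the C patch above:

```python
import os

def remove_orphan_ready(wal_dir, status_dir):
    """Remove .ready status files whose WAL segment no longer exists,
    then fsync the status directory so the unlinks survive a crash
    (roughly what durable_unlink() achieves in the patch)."""
    removed = []
    for name in sorted(os.listdir(status_dir)):
        if not name.endswith(".ready"):
            continue
        segment = name[: -len(".ready")]
        if not os.path.exists(os.path.join(wal_dir, segment)):
            os.unlink(os.path.join(status_dir, name))
            removed.append(name)
    if removed:
        fd = os.open(status_dir, os.O_RDONLY)  # fsync parent directory
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return removed
```

The directory fsync is the durable part: without it, a crash right
after the unlink can bring the orphaned status file back.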


Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-27 Thread Peter Eisentraut
On 27/11/2018 13:29, Peter Eisentraut wrote:
> On 27/11/2018 10:10, Sergei Kornilov wrote:
>> Hello
>>
>>>>  - recovery_target = immediate was replaced with recovery_target_immediate 
>>>> bool GUC
>>>
>>> Why?
>> Due this comment: 
>> https://www.postgresql.org/message-id/20181126172118.GY3415%40tamriel.snowman.net
>>> I've not been following this very closely, but seems like
>>> recovery_target_string is a bad idea.. Why not just make that
>>> 'recovery_target_immediate' and make it a boolean? Having a GUC that's
>>> only got one valid value seems really odd.
> 
> It is a bit odd, but that's the way it's been, and I don't see a reason
> to change it as part of this fix.  We are attempting to fix the way the
> GUC parameters are parsed, not change the name and meaning of the
> parameters themselves.

The attached seems to be the simplest way to fix this.  (Needs
documentation updates, test updates, error message refinement.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d5cb3f3a30bd045283c9848bd39ff012c817d908 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 27 Nov 2018 13:43:54 +0100
Subject: [PATCH] WIP Only allow one recovery target setting

---
 src/backend/utils/misc/guc.c | 29 +++--
 src/include/access/xlog.h|  2 +-
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6497393c03..dd2caee95b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11065,14 +11065,11 @@ check_recovery_target(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target(const char *newval, void *extra)
 {
+   if (recoveryTarget)
+   elog(ERROR, "multiple recovery targets specified");
+
if (newval && strcmp(newval, "") != 0)
recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
-   else
-   /*
-* Reset recoveryTarget to RECOVERY_TARGET_UNSET to proper handle user
-* setting multiple recovery_target with blank value on last.
-*/
-   recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11098,13 +11095,14 @@ check_recovery_target_xid(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_xid(const char *newval, void *extra)
 {
+   if (recoveryTarget)
+   elog(ERROR, "multiple recovery targets specified");
+
if (newval && strcmp(newval, "") != 0)
{
recoveryTarget = RECOVERY_TARGET_XID;
recoveryTargetXid = *((TransactionId *) extra);
}
-   else
-   recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11161,13 +11159,14 @@ check_recovery_target_time(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_time(const char *newval, void *extra)
 {
+   if (recoveryTarget)
+   elog(ERROR, "multiple recovery targets specified");
+
if (newval && strcmp(newval, "") != 0)
{
recoveryTarget = RECOVERY_TARGET_TIME;
recoveryTargetTime = *((TimestampTz *) extra);
}
-   else
-   recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11186,13 +11185,14 @@ check_recovery_target_name(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_name(const char *newval, void *extra)
 {
+   if (recoveryTarget)
+   elog(ERROR, "multiple recovery targets specified");
+
if (newval && strcmp(newval, "") != 0)
{
recoveryTarget = RECOVERY_TARGET_NAME;
recoveryTargetName = (char *) newval;
}
-   else
-   recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11240,13 +11240,14 @@ check_recovery_target_lsn(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_lsn(const char *newval, void *extra)
 {
+   if (recoveryTarget)
+   elog(ERROR, "multiple recovery targets specified");
+
if (newval && strcmp(newval, "") != 0)
{
recoveryTarget = RECOVERY_TARGET_LSN;
recoveryTargetLSN = *((XLogRecPtr *) extra);
}
-   else
-   recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f3a7ba4d42..1fd4aec3c7 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -79,7 +79,7 @@ extern HotStandbyState standbyState;
  */
 typedef enum
 {
-   RECOVERY_TARGET_UNSET,
+   RECOVERY_TARGET_UNSET = 0,
RECOVERY_TARGET_XID,
RECOVERY_TARGET_TIME,
RECOVERY_TARGET_NAME,
-- 
2.19.1



Re: Inadequate executor locking of indexes

2018-11-27 Thread David Rowley
On Tue, 27 Nov 2018 at 19:00, Amit Langote
 wrote:
> On 2018/11/27 6:19, David Rowley wrote:
> > On Mon, 26 Nov 2018 at 18:57, Amit Langote
> >> That's an interesting point.  Although, considering the concerns that Tom
> >> raised about the same index relation being locked such that lock-strength
> >> upgrade occurs (his example containing two CTEs), we'll have to find a way
> >> to do the ModifyTable run-time pruning such that result relations and
> >> their indexes (possibly under multiple ModifyTable nodes) are locked with
> >> RowExclusiveLock before they're locked with AccessShareLock lock as scan
> >> relations.  For example, we might be able to find a way to do the
> >> ModifyTable run-time pruning in InitPlan before initializing plan trees.
> >
> > I thought my idea of the processing the rangetable at the end of
> > planning to determine the strongest lock per relation would have
> > solved that.
>
> Yeah, would be nice to make that work.

Here's a very rough and incomplete patch just to demo what I had in
mind. The finalize_lockmodes() is likely more or less complete, just
minus me testing it works.  What's mostly missing is changing all the
places that grab index locks to use the new idxlockmode field. I've
really just changed index scan and index only scan and just stared a
bit at nodeModifyTable.c wondering what I should do with that
operation != CMD_DELETE test before the ExecOpenIndices() call.

The patch also includes code to determine the strongest rellockmode
per relation.  Perhaps it's not really worth doing that since parse
analyze could still cause some lock upgrade hazards. The code that's
there just fixes things for the executor, so really would only have an
effect when executing cached plans.
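
The "strongest lock per relation" computation described above can be
sketched in outline; this is illustrative Python with made-up lock
level constants, whereas the real patch works on the PlannerInfo
rangetable in C:

```python
# PostgreSQL lock modes are ordered by strength (see lock.h), so the
# strongest of several requests is simply the maximum.  These values
# are a hypothetical subset, for illustration only.
ACCESS_SHARE = 1
ROW_SHARE = 2
ROW_EXCLUSIVE = 3

def finalize_lockmodes(rangetable):
    """Sketch of processing the rangetable at the end of planning:
    compute the strongest lock requested for each relation and apply
    it to every rangetable entry for that relation, so the executor
    never takes a weak lock it would later need to upgrade."""
    strongest = {}
    for relid, lockmode in rangetable:
        strongest[relid] = max(strongest.get(relid, 0), lockmode)
    return [(relid, strongest[relid]) for relid, _ in rangetable]
```

A relation scanned in one place and modified in another thus gets
RowExclusiveLock everywhere, avoiding the lock-upgrade hazard.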

If this looks like a good path to go in, then I can produce something
a bit more finished. I'm just a bit unsure when exactly I can do that
as I'm on leave and have other commitments to take care of.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


wip_idxlockmode_and_lock_upgrade_fix.patch
Description: Binary data


postgres_fdw: oddity in costing aggregate pushdown paths

2018-11-27 Thread Etsuro Fujita
Hi,

While working on [1], I noticed that since we don't set the selectivity
and cost of the local_conds (i.e., fpinfo->local_conds_sel and
fpinfo->local_conds_cost) properly in add_foreign_grouping_paths and
foreign_grouping_ok, estimate_path_cost_size produces considerably
underestimated results when use_remote_estimate is enabled.  I think
this would cause problems in later planning steps.  So, why not add
something like this to add_foreign_grouping_paths, as done in other
functions such as postgresGetForeignJoinPaths?

--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5496,6 +5496,19 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
if (!foreign_grouping_ok(root, grouped_rel, extra->havingQual))
return;

+   /*
+* Compute the selectivity and cost of the local_conds, so we don't have
+* to do it over again for each path.  The best we can do for these
+* conditions is to estimate selectivity on the basis of local statistics.
+*/
+   fpinfo->local_conds_sel = clauselist_selectivity(root,
+fpinfo->local_conds,
+0,
+JOIN_INNER,
+NULL);
+
+   cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
+
/* Estimate the cost of push down */
estimate_path_cost_size(root, grouped_rel, NIL, NIL, &rows,
&width, &startup_cost, &total_cost);

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5BBC4140.50403%40lab.ntt.co.jp



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-11-27 Thread Michael Banck
Hi,

Am Freitag, den 19.10.2018, 22:50 +0900 schrieb Michael Paquier:
> On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> Thanks.  This is now committed after some tweaks to the comments, a bit
> earlier than I thought first.

I found an issue with this [d55241af7, "Use whitelist to choose files
scanned with pg_verify_checksums"] commit, namely, it makes
pg_verify_checksums no longer scan non-default tablespaces. So if you
have all of your data in tablespaces, it will more-or-less immediately
return with success.

I've extended the test suite to induce corruption in a table located in
a non-default tablespace, see the attached patch.

It fails like this, i.e. it does not detect the corruption:

t/002_actions.pl .. 14/42 
#   Failed test 'fails with corrupted data in non-default tablespace status 
(got 0 vs expected 1)'
#   at t/002_actions.pl line 87.

#   Failed test 'fails with corrupted data in non-default tablespace stdout 
/(?^:Bad checksums:.*1)/'
#   at t/002_actions.pl line 87.
#   'Checksum scan completed
# Data checksum version: 1
# Files scanned:  1102
# Blocks scanned: 2861
# Bad checksums:  0
# '
# doesn't match '(?^:Bad checksums:.*1)'

#   Failed test 'fails with corrupted data in non-default tablespace stderr 
/(?^:checksum verification failed)/'
#   at t/002_actions.pl line 87.
#   ''
# doesn't match '(?^:checksum verification failed)'
# Looks like you failed 3 tests of 42.
t/002_actions.pl .. Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/42 subtests 

The problem is that "PG_12_201811201" is not considered a valid relation
file by isRelFileName(), so it skips it and the rest of the tablespace.

I had a quick look at fixing this but did not manage to immediately come
up with a solution, so posting here for now.
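
For illustration, here is a rough Python model of the whitelist
behavior; it is a simplified re-implementation, not the actual
isRelFileName():

```python
import re

def is_rel_file_name(name):
    """Rough model of pg_verify_checksums' whitelist (simplified for
    illustration): a relfilenode number, optionally with a fork
    suffix (_fsm, _vm, _init) or a segment suffix (.N)."""
    return re.fullmatch(r"\d+(_(fsm|vm|init))?(\.\d+)?", name) is not None

# A tablespace directory contains a version subdirectory such as
# PG_12_201811201; the whitelist rejects that name, so a scan that
# only descends into whitelisted entries skips the whole tablespace.
print(is_rel_file_name("16384"))            # True
print(is_rel_file_name("PG_12_201811201"))  # False
```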


Michael

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz

diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 0e1725d9f2..fd64de050e 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 36;
+use Test::More tests => 42;
 
 # Initialize node with checksums enabled.
 my $node = get_new_node('node_checksum');
@@ -54,31 +54,7 @@ command_fails(['pg_verify_checksums',  '-D', $pgdata],
 			  "fails with online cluster");
 
 # Create table to corrupt and get its relfilenode
-$node->safe_psql('postgres',
-	"SELECT a INTO corrupt1 FROM generate_series(1,1) AS a;
-	ALTER TABLE corrupt1 SET (autovacuum_enabled=false);");
-
-my $file_corrupted = $node->safe_psql('postgres',
-	"SELECT pg_relation_filepath('corrupt1')");
-my $relfilenode_corrupted =  $node->safe_psql('postgres',
-	"SELECT relfilenode FROM pg_class WHERE relname = 'corrupt1';");
-
-# Set page header and block size
-my $pageheader_size = 24;
-my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
-$node->stop;
-
-# Checksums are correct for single relfilenode as the table is not
-# corrupted yet.
-command_ok(['pg_verify_checksums',  '-D', $pgdata,
-	'-r', $relfilenode_corrupted],
-	"succeeds for single relfilenode with offline cluster");
-
-# Time to create some corruption
-open my $file, '+<', "$pgdata/$file_corrupted";
-seek($file, $pageheader_size, 0);
-syswrite($file, '\0\0\0\0\0\0\0\0\0');
-close $file;
+my $relfilenode_corrupted = create_corruption($node, 'corrupt1', 'pg_default');
 
 # Global checksum checks fail
 $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
@@ -95,6 +71,72 @@ $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
 		  [qr/checksum verification failed/],
 		  'fails for corrupted data on single relfilenode');
 
+# Drop corrupt table again and make sure there is no more corruption
+$node->start;
+$node->safe_psql('postgres', 'DROP TABLE corrupt1;');
+$node->stop;
+$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+'succeeds again: '.$node->data_dir);
+
+# Create table to corrupt in a non-default tablespace and get its relfilenode
+my $tablespace_dir = $node->data_dir."/../ts_corrupt_dir";
+mkdir ($tablespace_dir);
+$node->start;
+$node->safe_psql('postgres', "CREATE TABLESPACE ts_corrupt LOCATION '".$tablespace_dir."';");
+$relfilenode_corrupted = create_corruption($node, 'corrupt2', 'ts_corrupt');
+$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+		  1,
+		  [qr/Bad checksums:.*1/],
+		  [qr/checksum verification failed/],
+		  'fails with corrupted data in non-defau

Re: [RFC] Removing "magic" oids

2018-11-27 Thread Andrew Dunstan



On 11/26/18 5:50 PM, Andres Freund wrote:



>>>> With some adjustments to the tests to remove problematic cases (e.g.
>>>> postgres_fdw's ft_pg_type) the tests pass. The exception is
>>>> HEAD->HEAD. The change is that the LOs are not dumped in the same
>>>> order pre and post upgrade. I can change the tests to allow for a
>>>> greater fuzz factor - generally when the source and target are the
>>>> same we don't allow any fuzz.  Or if we care we could do a better job
>>>> of dumping LOs in a consistent order.
>>>
>>> So you'd want to dump large objects in oid order or such? Probably
>>> comparatively not a huge overhead, but also not nothing? We don't really
>>> force ordering in other places in pg_dump afaik.
>>
>> Well, all other data is dumped in a consistent order, and the tests rely on
>> this. If we don't care about that for LOs I can accommodate it. I don't have
>> a terribly strong opinion about the desirability of making LOs keep the same
>> behaviour.
>
> I don't think it's true that other comparable metadata is dumped in a
> really consistent order. What changes is the order of data in a system
> catalog (pg_largeobject_metadata), and we don't enforce that the order
> stays the same in e.g. pg_class either.  I guess we could add a
> not-actually-needed sort to getBlobs(), with a comment saying that we
> only need that for better comparability and note that it's less needed
> for other types of objects due to the dependency ordering that pg_dump
> does for most object types.




As of now, I have simply got around the issue in the buildfarm module
by allowing up to 50 lines of fuzz in the pre-upgrade and post-upgrade
diffs when the source and destination branch are the same.
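
The ordering idea mentioned above (sorting large objects by OID in
getBlobs() so that repeated dumps compare cleanly) amounts to something
like this illustrative Python sketch; the entry shape is hypothetical:

```python
def sort_blobs_for_dump(blobs):
    """Order large-object entries by OID so repeated dumps emit them
    in a stable order, making pre-/post-upgrade dump comparison exact.
    This is the 'not-actually-needed sort': correctness of the dump
    does not require it, only comparability of its output."""
    return sorted(blobs, key=lambda blob: blob["oid"])
```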



cheers


andrew


--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Python versions (was Re: RHEL 8.0 build)

2018-11-27 Thread Peter Eisentraut
On 25/11/2018 23:14, Tom Lane wrote:
> Andres Freund  writes:
>> On 2018-11-24 15:49:25 -0500, Tom Lane wrote:
>>> There's been some preliminary discussion about starting to default to
>>> python3, but given this project's inherent conservatism, I don't expect
>>> that to happen for some years yet.  In any case, whenever we do pull
>>> that trigger we'd surely do so only in HEAD not released branches, so
>>> buildfarm owners will need to deal with the case for years more.
> 
>> Why don't we probe for python2 in addition to python by default? That
>> ought to make RHEL 8 work, without making the switch just yet.
> 
> I'm unexcited about that because that *would* be expressing a version
> preference --- and one that's on the wrong side of history.

I think it would be appropriate to probe in the order

python python3 python2

This would satisfy most scenarios that are valid under PEP 394.
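
The probe order can be illustrated with a small Python sketch;
configure itself would do this via autoconf program checks, and the
helper below is hypothetical:

```python
import shutil

def find_python(candidates=("python", "python3", "python2")):
    """Probe for a Python interpreter in the order proposed above and
    return the path of the first one found on PATH, or None if no
    candidate exists."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```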

> Also, I noticed on a fresh FreeBSD 12.0 installation that what
> I've got is
> 
> $ ls /usr/bin/pyth*
> ls: /usr/bin/pyth*: No such file or directory
> $ ls /usr/local/bin/pyth*
> /usr/local/bin/python2.7
> /usr/local/bin/python2.7-config
> /usr/local/bin/python3.6
> /usr/local/bin/python3.6-config
> /usr/local/bin/python3.6m
> /usr/local/bin/python3.6m-config
> 
> So there are modern platforms on which "python2" isn't going to make
> it work automatically either.

I don't think this is a setup we need to support.  You are probably
supposed to install a meta package that will provide a "python" or
"python3" binary.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Peter Eisentraut
On 27/11/2018 04:46, Andres Freund wrote:
> That was my gut reaction as well, but I think David's argument about
> existing scripts / workflows being broken due to the recovery.conf
> is a good reason to be more aggressive here.

But backup scripts are not affected by the recovery.conf changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SSL tests failing with "ee key too small" error on Debian SID

2018-11-27 Thread Peter Eisentraut
On 26/11/2018 01:35, Michael Paquier wrote:
> When going up to 2k, it takes longer to generate the keys than to run
> the tests, so keeping them in the tree looks like a pretty good gain to
> me.

Another concern might be that repeatedly generating certificates might
drain entropy unnecessarily.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 02:09:05PM +0100, Michael Banck wrote:
> I had a quick look at fixing this but did not manage to immediately come
> up with a solution, so posting here for now.

If you look at another thread, the patch posted on the top would
actually solve this issue:
https://www.postgresql.org/message-id/20181021134206.ga14...@paquier.xyz

Your problem could also be solved with the minimalistic patch attached,
which fixes on the way the problems with temporary files present in
PGDATA.  Based on the stale status of the other thread I am unsure what
should be done though.
--
Michael
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index f0e09bea20..148aa511f6 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -21,6 +21,7 @@
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
+#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -189,9 +190,22 @@ scan_directory(const char *basedir, const char *subdir)
 		char		fn[MAXPGPATH];
 		struct stat st;
 
-		if (!isRelFileName(de->d_name))
+		if (strcmp(de->d_name, ".") == 0 ||
+			strcmp(de->d_name, "..") == 0)
 			continue;
 
+		/* Skip temporary files */
+		if (strncmp(de->d_name,
+	PG_TEMP_FILE_PREFIX,
+	strlen(PG_TEMP_FILE_PREFIX)) == 0)
+			continue;
+
+		/* Skip temporary folders */
+		if (strncmp(de->d_name,
+	PG_TEMP_FILES_DIR,
+	strlen(PG_TEMP_FILES_DIR)) == 0)
+			return;
+
 		snprintf(fn, sizeof(fn), "%s/%s", path, de->d_name);
 		if (lstat(fn, &st) < 0)
 		{
@@ -206,6 +220,13 @@ scan_directory(const char *basedir, const char *subdir)
 	   *segmentpath;
 			BlockNumber segmentno = 0;
 
+			/*
+			 * Only normal relation files can be analyzed.  Note that this
+			 * skips temporary relations.
+			 */
+			if (!isRelFileName(de->d_name))
+continue;
+
 			/*
 			 * Cut off at the segment boundary (".") to get the segment number
 			 * in order to mix it into the checksum. Then also cut off at the
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 0e1725d9f2..fd64de050e 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 36;
+use Test::More tests => 42;
 
 # Initialize node with checksums enabled.
 my $node = get_new_node('node_checksum');
@@ -54,31 +54,7 @@ command_fails(['pg_verify_checksums',  '-D', $pgdata],
 			  "fails with online cluster");
 
 # Create table to corrupt and get its relfilenode
-$node->safe_psql('postgres',
-	"SELECT a INTO corrupt1 FROM generate_series(1,1) AS a;
-	ALTER TABLE corrupt1 SET (autovacuum_enabled=false);");
-
-my $file_corrupted = $node->safe_psql('postgres',
-	"SELECT pg_relation_filepath('corrupt1')");
-my $relfilenode_corrupted =  $node->safe_psql('postgres',
-	"SELECT relfilenode FROM pg_class WHERE relname = 'corrupt1';");
-
-# Set page header and block size
-my $pageheader_size = 24;
-my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');
-$node->stop;
-
-# Checksums are correct for single relfilenode as the table is not
-# corrupted yet.
-command_ok(['pg_verify_checksums',  '-D', $pgdata,
-	'-r', $relfilenode_corrupted],
-	"succeeds for single relfilenode with offline cluster");
-
-# Time to create some corruption
-open my $file, '+<', "$pgdata/$file_corrupted";
-seek($file, $pageheader_size, 0);
-syswrite($file, '\0\0\0\0\0\0\0\0\0');
-close $file;
+my $relfilenode_corrupted = create_corruption($node, 'corrupt1', 'pg_default');
 
 # Global checksum checks fail
 $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
@@ -95,6 +71,72 @@ $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
 		  [qr/checksum verification failed/],
 		  'fails for corrupted data on single relfilenode');
 
+# Drop corrupt table again and make sure there is no more corruption
+$node->start;
+$node->safe_psql('postgres', 'DROP TABLE corrupt1;');
+$node->stop;
+$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+'succeeds again: '.$node->data_dir);
+
+# Create table to corrupt in a non-default tablespace and get its relfilenode
+my $tablespace_dir = $node->data_dir."/../ts_corrupt_dir";
+mkdir ($tablespace_dir);
+$node->start;
+$node->safe_psql('postgres', "CREATE TABLESPACE ts_corrupt LOCATION '".$tablespace_dir."';");
+$relfilenode_corrupted = create_corruption($node, 'corrupt2', 'ts_corrupt');
+$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+		  1,
+		  [qr/Bad checksums:.*1/],
+		  [qr/checksum verification failed/],
+		  'fails with corrupted data in non-default tablespace');
+
+# Drop corrupt table again and make sure there is no more corruption
+$node->start;
+$nod

Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 02:21:49PM +0100, Peter Eisentraut wrote:
> On 27/11/2018 04:46, Andres Freund wrote:
>> That was my gut reaction as well, but I think David's argument about
>> existing scripts / workflows being broken due to the recovery.conf
>> is a good reason to be more aggressive here.
> 
> But backup scripts are not affected by the recovery.conf changes.

In any of my own backup scripts (yeah!), I don't have any dependency on
that either.  Or perhaps pgBackRest has a dependency in this area?
--
Michael


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Simon Riggs
On Tue, 27 Nov 2018 at 03:13, David Steele  wrote:


> The deprecated exclusive mode promises to make a difficult problem
> simple but doesn't live up to that promise. That's why it was replaced
> externally in 9.6 and why pg_basebackup has not used exclusive backups
> since it was introduced in 9.2.
>
> Non-exclusive backups have been available since 9.6 and several
> third-party solutions support this mode, in addition to pg_basebackup.
>
> The recently introduced recovery changes will break current automated
> solutions so this seems like a good opportunity to make improvements on
> the backup side as well.
>
> I'll submit a patch for the 2019-01 commitfest.
>

-1 for removal, in this release

It's not there because anyone likes it; it's there because removing it
is a risk.

Recent changes are a reason to keep it, not a reason to remove it.

-- 
Simon Riggs  http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-27 Thread Sergei Kornilov
Hi

> The attached seems to be the simplest way to fix this. (Needs
> documentation updates, test updates, error message refinement.)

Thank you!
I added documentation changes, tests, and a rephrased error message.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index db1a2d4..7cbea6e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3221,7 +3221,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   recovery_target_lsn, recovery_target_name,
   recovery_target_time, or recovery_target_xid
   can be used; if more than one of these is specified in the configuration
-  file, the last entry will be used.
+  file, an error will be raised.
   These parameters can only be set at server start.
  
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6497393..290a157 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11065,14 +11065,11 @@ check_recovery_target(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target(const char *newval, void *extra)
 {
+	if (recoveryTarget)
+	elog(ERROR, "cannot specify multiple recovery targets");
+
 	if (newval && strcmp(newval, "") != 0)
 		recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
-	else
-		/*
-		 * Reset recoveryTarget to RECOVERY_TARGET_UNSET to proper handle user
-		 * setting multiple recovery_target with blank value on last.
-		 */
-		recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11098,13 +11095,14 @@ check_recovery_target_xid(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_xid(const char *newval, void *extra)
 {
+	if (recoveryTarget)
+	elog(ERROR, "cannot specify multiple recovery targets");
+
 	if (newval && strcmp(newval, "") != 0)
 	{
 		recoveryTarget = RECOVERY_TARGET_XID;
 		recoveryTargetXid = *((TransactionId *) extra);
 	}
-	else
-		recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11161,13 +11159,14 @@ check_recovery_target_time(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_time(const char *newval, void *extra)
 {
+	if (recoveryTarget)
+	elog(ERROR, "cannot specify multiple recovery targets");
+
 	if (newval && strcmp(newval, "") != 0)
 	{
 		recoveryTarget = RECOVERY_TARGET_TIME;
 		recoveryTargetTime = *((TimestampTz *) extra);
 	}
-	else
-		recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11186,13 +11185,14 @@ check_recovery_target_name(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_name(const char *newval, void *extra)
 {
+	if (recoveryTarget)
+	elog(ERROR, "cannot specify multiple recovery targets");
+
 	if (newval && strcmp(newval, "") != 0)
 	{
 		recoveryTarget = RECOVERY_TARGET_NAME;
 		recoveryTargetName = (char *) newval;
 	}
-	else
-		recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
@@ -11240,13 +11240,14 @@ check_recovery_target_lsn(char **newval, void **extra, GucSource source)
 static void
 assign_recovery_target_lsn(const char *newval, void *extra)
 {
+	if (recoveryTarget)
+		elog(ERROR, "can not specify multiple recovery targets");
+
 	if (newval && strcmp(newval, "") != 0)
 	{
 		recoveryTarget = RECOVERY_TARGET_LSN;
 		recoveryTargetLSN = *((XLogRecPtr *) extra);
 	}
-	else
-		recoveryTarget = RECOVERY_TARGET_UNSET;
 }
 
 static bool
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f3a7ba4..1fd4aec 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -79,7 +79,7 @@ extern HotStandbyState standbyState;
  */
 typedef enum
 {
-	RECOVERY_TARGET_UNSET,
+	RECOVERY_TARGET_UNSET = 0,
 	RECOVERY_TARGET_XID,
 	RECOVERY_TARGET_TIME,
 	RECOVERY_TARGET_NAME,
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index f6f2e8b..cba367c 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 
 # Create and test a standby from given backup, with a certain recovery target.
 # Choose $until_lsn later than the transaction commit that causes the row
@@ -115,31 +115,44 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
 test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
 	"5000", $lsn5);
 
-# Multiple targets
-# Last entry has priority (note that an array respects the order of items
-# not hashes).
+# multiple recovery targets are disallowed
+sub test_recovery_multiple_targets
+{
+	my $test_name   = shift;
+	my $node_name   = shift;
+	my $node_master = shift;
+	my $recovery_params = shift;
+
+	my $node_standby = get_new_node($node_name);
+	$node_standby->init_from_backup($node_master, 'my_backup',
+		has_restoring => 1);
+
+	foreach my $param_item (@$recovery_params)
+	{
+		$node_

Re: SSL tests failing with "ee key too small" error on Debian SID

2018-11-27 Thread Peter Eisentraut
On 01/10/2018 14:18, Kyotaro HORIGUCHI wrote:
> The attached second patch just changes key size to 2048 bits and
> "ee key too small" are eliminated in 001_ssltests_master, but
> instead I got "ca md too weak" error. This is eliminated by using
> sha256 instead of sha1 in cas.config. (third attached)

I have applied these configuration changes and created a new set of test
files with them.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SSL tests failing with "ee key too small" error on Debian SID

2018-11-27 Thread Peter Eisentraut
On 01/10/2018 14:18, Kyotaro HORIGUCHI wrote:
> By the way I got (with both 1.0.2k and 1.1.1) a "tlsv1 alert
> unknown ca" error from 002_scram.pl. It is fixed for me by the
> fourth attached, but I'm not sure why we haven't had such a
> complaint. (Does it happen only for me?)

I haven't seen it.  Do the tests print that out or does it appear in the
logs?  Which test complains?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Continue work on changes to recovery.conf API

2018-11-27 Thread Peter Eisentraut
On 27/11/2018 13:21, David Steele wrote:
> I would prefer a specific file that will be auto-included into
> postgresql.conf when present but be ignored when not present.  Some
> settings are generally ephemeral (recovery_target_time) and it would be
> nice for them to go away.  When recovery is complete the file would be
> renamed to .done just as recovery.conf is now.

That might be a useful facility, but it wouldn't really address the
pg_basebackup -R issue, because that creates settings that you don't
want going away in this manner.  You'd then need two separate such
files, one for targeted recovery that goes away when recovery ends, and
one that is automatically included that pg_basebackup can overwrite at will.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SSL tests failing with "ee key too small" error on Debian SID

2018-11-27 Thread Tom Lane
Peter Eisentraut  writes:
> On 01/10/2018 14:18, Kyotaro HORIGUCHI wrote:
>> The attached second patch just changes key size to 2048 bits and
>> "ee key too small" are eliminated in 001_ssltests_master, but
>> instead I got "ca md too weak" error. This is eliminated by using
>> sha256 instead of sha1 in cas.config. (third attached)

> I have applied these configuration changes and created a new set of test
> files with them.

Buildfarm critters aren't going to be happy unless you back-patch that.

regards, tom lane



Re: pgsql: Integrate recovery.conf into postgresql.conf

2018-11-27 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 27/11/2018 13:29, Peter Eisentraut wrote:
> > On 27/11/2018 10:10, Sergei Kornilov wrote:
> >>>>   - recovery_target = immediate was replaced with 
> >>>>  recovery_target_immediate bool GUC
> >>>
> >>> Why?
> >> Due to this comment: 
> >> https://www.postgresql.org/message-id/20181126172118.GY3415%40tamriel.snowman.net
> >>> I've not been following this very closely, but seems like
> >>> recovery_target_string is a bad idea.. Why not just make that
> >>> 'recovery_target_immediate' and make it a boolean? Having a GUC that's
> >>> only got one valid value seems really odd.
> > 
> > It is a bit odd, but that's the way it's been, and I don't see a reason
> > to change it as part of this fix.  We are attempting to fix the way the
> > GUC parameters are parsed, not change the name and meaning of the
> > parameters themselves.

I disagree quite a bit- we're whacking around how recovery works and
this has been a wart since it went in because the contemplated later
changes to have more than one value be accepted there never
materialized, so we might as well change it while we're changing
everything else and clean it up.

If we really want to change it later we can do that, but we've evidently
decided to go the opposite direction and have multiple GUCs instead, so
let's be consistent.

> The attached seems to be the simplest way to fix this.  (Needs
> documentation updates, test updates, error message refinement.)

Sure, this approach seems fine to me and is perhaps a bit cleaner.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 02:21:49PM +0100, Peter Eisentraut wrote:
> > On 27/11/2018 04:46, Andres Freund wrote:
> >> That was my gut reaction as well, but I think David's argument about
> >> existing scripts / workflows being broken due to the recovery.conf
> >> is a good reason to be more aggressive here.
> > 
> > But backup scripts are not affected by the recovery.conf changes.
> 
> In any of my own backup scripts (yeah!), I don't have any dependency to
> that either.  Or perhaps pgBackRest has a dependency in this area?

If you don't consider your recovery scripts and your backup scripts to
be related then I've really got to wonder how you're regularly testing
your backups to make sure that they're actually valid.

If you aren't regularly testing your backups then I've got little
sympathy.

To be clear, pgbackrest doesn't have any dependency here- but it, like
all of the other 3rd party backup solutions and any restore solution
that a user has come up with, are going to have to be changed to deal
with the changes in how recovery works, so this is a good time to make
these changes.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> There must be hundreds or thousands of backup scripts written by
> individual users that still use exclusive mode floating around out
> there.  Forcing all of those to be updated or scrapped will annoy
> users to no benefit.

How about automated recovery scripts?

We've decided it's fine to break all of those which exist out there.

I'm concerned, seriously, that people don't have anywhere near the
concern about the recovery side of things as they do about the backup
side of things and that's really concerning.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Peter Eisentraut
On 27/11/2018 15:45, Stephen Frost wrote:
>>> But backup scripts are not affected by the recovery.conf changes.
>> In any of my own backup scripts (yeah!), I don't have any dependency to
>> that either.  Or perhaps pgBackRest has a dependency in this area?
> If you don't consider your recovery scripts and your backup scripts to
> be related then I've really got to wonder how you're regularly testing
> your backups to make sure that they're actually valid.

The sort of installations that continue to use the exclusive backup mode
probably have the following tooling: a 20-line shell script to make the
backup and either a 10-line shell script or a similarly sized README or
wiki page to do the recovery.  Changing the latter for the recovery.conf
changes is probably a 3-line change.  Changing the former for the
removal of exclusive backups would require major changes.  (Try writing
a shell script that keeps a psql session open while it takes the backup
from the file system.  It's possible, but it requires significantly more
logic.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 27/11/2018 15:45, Stephen Frost wrote:
> >>> But backup scripts are not affected by the recovery.conf changes.
> >> In any of my own backup scripts (yeah!), I don't have any dependency to
> >> that either.  Or perhaps pgBackRest has a dependency in this area?
> > If you don't consider your recovery scripts and your backup scripts to
> > be related then I've really got to wonder how you're regularly testing
> > your backups to make sure that they're actually valid.
> 
> The sort of installations that continue to use the exclusive backup mode
> probably have the following tooling: a 20-line shell script to make the
> backup and either a 10-line shell script or a similarly sized README or
> wiki page to do the recovery.  Changing the latter for the recovery.conf
> changes is probably a 3-line change.  Changing the former for the
> removal of exclusive backups would require major changes.  (Try writing
> a shell script that keeps a psql session open while it takes the backup
> from the file system.  It's possible, but it requires significantly more
> logic.)

They're also the sort of installations which don't have reliable backups
and don't have any clue about the danger they are in due to the current
bug/issue/whatever we have with exclusive backups.

No, I don't agree that it's sensible to continue to march on as if
nothing is wrong.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Andreas Karlsson

On 11/27/18 3:46 PM, Stephen Frost wrote:
> I'm concerned, seriously, that people don't have anywhere near the
> concern about the recovery side of things as they do about the backup
> side of things and that's really concerning.


I agree with your larger point, but in this case the two breakages do 
not seem equal. As far as I gather the removal of recovery.conf will in 
practice result in a longer downtime when your recovery scripts breaks 
but not any data loss. While if we remove the exclusive backup mode then 
some people's backup scripts will break and if they do not properly 
monitor their backups they risk data loss.


I think we should use more caution when data loss is at stake rather 
than "just" downtime. So personally I am in favor of updating the manual 
with warnings (right now it does not even say if exclusive or 
non-exclusive is the default) and adding a deprecation warning when 
people use the exclusive mode.


Best regards,
Andreas



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* Andreas Karlsson (andr...@proxel.se) wrote:
> On 11/27/18 3:46 PM, Stephen Frost wrote:
> >I'm concerned, seriously, that people don't have anywhere near the
> >concern about the recovery side of things as they do about the backup
> >side of things and that's really concerning.
> 
> I agree with your larger point, but in this case the two breakages do not
> seem equal. As far as I gather the removal of recovery.conf will in practice
> result in a longer downtime when your recovery scripts breaks but not any
> data loss. While if we remove the exclusive backup mode then some people's
> backup scripts will break and if they do not properly monitor their backups
> they risk data loss.

Let's please try to remember that this is across a major version upgrade
and is removing a broken capability that's already been deprecated for a
couple years.

If they aren't monitoring their backup scripts today, and aren't
regularly performing restores of those backups, they're already risking
data loss.

> I think we should use more caution when data loss is at stake rather than
> "just" downtime. So personally I am in favor of updating the manual with
> warnings (right now it does not even say if exclusive or non-exclusive is
> the default) and adding a deprecation warning when people use the exclusive
> mode.

Waiting isn't going to change any of these factors, nor will throwing
warnings about the exclusive mode if people aren't monitoring their
scripts.

If we really want to keep the exclusive backup mode around, then, as
Magnus said on a nearby thread, it needs to be fixed.  If it's fixed and
just works and everyone's scripts work, then great, then we can
un-deprecate it and move on.

If we aren't going to fix it then let's remove it.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Andreas Karlsson

On 11/27/18 4:11 PM, Stephen Frost wrote:
>> I agree with your larger point, but in this case the two breakages do not
>> seem equal. As far as I gather the removal of recovery.conf will in practice
>> result in a longer downtime when your recovery scripts breaks but not any
>> data loss. While if we remove the exclusive backup mode then some people's
>> backup scripts will break and if they do not properly monitor their backups
>> they risk data loss.
> 
> Let's please try to remember that this is across a major version upgrade
> and is removing a broken capability that's already been deprecated for a
> couple years.


I know that and you know that, but even our own manual uses the exclusive 
mode in some of the examples, e.g: "pg_start_backup('label_goes_here')"[1].


Your point that people who do not monitor their backups won't 
notice warnings is fair, but even so I think we should give people more 
time to migrate when even our own documentation isn't always clear about 
the exclusive mode being deprecated.


1. 
https://www.postgresql.org/docs/11/functions-admin.html#FUNCTIONS-ADMIN-BACKUP


Andreas



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-11-27 Thread Michael Banck
Hi,

Am Dienstag, den 27.11.2018, 22:52 +0900 schrieb Michael Paquier:
> On Tue, Nov 27, 2018 at 02:09:05PM +0100, Michael Banck wrote:
> > I had a quick look at fixing this but did not manage to immediately come
> > up with a solution, so posting here for now.
> 
> If you look at another thread, the patch posted on the top would
> actually solve this issue:
> https://www.postgresql.org/message-id/20181021134206.ga14...@paquier.xyz

Oh, I kinda followed that thread a bit, but I think that patch fixes
things more by matter of moving code around, at least I haven't noticed
that tablespaces were explicitly claimed to be fixed in that thread.

> Your problem could also be solved with the minimalistic patch attached,
> which fixes on the way the problems with temporary files present in PGDATA;
> something like the attached could be used...

Thanks!

> Based on the stale status
> of the other thread I am unsure what should be done though.

As pg_verify_checksums appears to be broken with respect to tablespaces
in REL_11_STABLE (so I think 11.1, but not 11.0) as well, I think this
merits a short-term minimal invasive fix (i.e. the patch you posted,
just that there's no TAP testsuite for pg_verify_checksums in v11
unfortunately) on its own, regardless of the wider changes proposed in
the other thread.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Peter Eisentraut
On 27/11/2018 16:02, Stephen Frost wrote:
> They're also the sort of installations which don't have reliable backups
> and don't have any clue about the danger they are in due to the current
> bug/issue/whatever we have with exclusive backups.
> 
> No, I don't agree that it's sensible to continue to march on as if
> nothing is wrong.

It's fair to argue that this facility is broken and needs to be removed.
 But that needs to be its own argument.

The argument in this subthread is that since we're already making
changes in an adjacent area, removing this feature will have a low impact.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Continue work on changes to recovery.conf API

2018-11-27 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 27/11/2018 13:21, David Steele wrote:
> > I would prefer a specific file that will be auto-included into
> > postgresql.conf when present but be ignored when not present.  Some
> > settings are generally ephemeral (recovery_target_time) and it would be
> > nice for them to go away.  When recovery is complete the file would be
> > renamed to .done just as recovery.conf is now.
> 
> That might be a useful facility, but it wouldn't really address the
> pg_basebackup -R issue, because that creates settings that you don't
> want going away in this manner.  You'd then need two separate such
> files, one for targeted recovery that goes away when recovery ends, and
> one that is automatically included that pg_basebackup can overwrite at will.

I've been thinking about this also and I agree that there's some
challenges when it comes to having another file- what happens if someone
does an ALTER SYSTEM on primary_conninfo while it's in the
'recovery.auto.conf' (or whatever) file?  Does that go into
postgresql.auto.conf, or somewhere else?  What if they do a RESET?

Then there's the other side of things- do we really want things like
recovery_target_time being kept around in postgresql.auto.conf after a
promotion?  Do we want to keep appending primary_conninfo into
postgresql.auto.conf?  I haven't looked but I'm also concerned that
something like ALTER SYSTEM RESET isn't really prepared to find
duplicates in postgresql.auto.conf...

Maybe thinking through what we want to have happen in each scenario
would be good.  If you perform a pg_basebackup -R and there's already
something set in postgresql.auto.conf for primary_conninfo- what should
happen?  If we reach the end of recovery and promote while
postgresql.auto.conf has recovery_target_time set, what should happen?
If third-party tools want to do what pg_basebackup -R does and modify
things in postgresql.auto.conf, how should they do that?

Here's my thoughts on answers to the above:

- pg_basebackup -R should INSERT .. ON CONFLICT UPDATE the settings that
  it wants to set in postgresql.auto.conf

- When we reach the end of recovery and promote, we should DELETE from
  the postgresql.auto.conf the recovery target settings.

- External tools should either be told that they can modify
  postgresql.auto.conf and given guidance on how to do so, or we should
  provide a tool which allows them to do so (or maybe both).

As we already have a section for recovery target settings that clearly
has them as independent, hopefully this will make sense to users.  Open
to other ideas too, of course, but I don't think we can continue to just
append things to the end of postgresql.auto.conf when pg_basebackup is
run with -R.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread David Steele
On 11/27/18 9:54 AM, Peter Eisentraut wrote:
> On 27/11/2018 15:45, Stephen Frost wrote:
>>>> But backup scripts are not affected by the recovery.conf changes.
>>> In any of my own backup scripts (yeah!), I don't have any dependency to
>>> that either.  Or perhaps pgBackRest has a dependency in this area?
>> If you don't consider your recovery scripts and your backup scripts to
>> be related then I've really got to wonder how you're regularly testing
>> your backups to make sure that they're actually valid.
> 
> The sort of installations that continue to use the exclusive backup mode
> probably have the following tooling: a 20-line shell script to make the
> backup and either a 10-line shell script or a similarly sized README or
> wiki page to do the recovery.  Changing the latter for the recovery.conf
> changes is probably a 3-line change.  

Really?  We have gone from a file that can be safely overwritten
(recovery.conf) to a file that might need to be parsed to figure out if
the settings already exist (postgresql.auto.conf).  Of course, you can
just continue to append to the file but that seems pretty grotty to me.

This will also require changes to all HA solutions or backup solutions
that deal with recovery.  I think you underestimate how big a change
this is.  I've been thinking about how to write code for it and it is
significantly more complex -- also more flexible so I'm happy about that.

> Changing the former for the
> removal of exclusive backups would require major changes.  (Try writing
> a shell script that keeps a psql session open while it takes the backup
> from the file system.  It's possible, but it requires significantly more
> logic.)

We provide pg_basebackup with core and it is a solid tool for doing
backups.  Do we really want to encourage the use of bash scripts to do
what we already have a tool for?  If users want to do something more
complex than pg_basebackup then bash is certainly not the tool for that.

Regards,
-- 
-David
da...@pgmasters.net



Re: shared-memory based stats collector

2018-11-27 Thread Tomas Vondra

On 11/27/18 9:59 AM, Kyotaro HORIGUCHI wrote:
>> ...
>>
>> For the main workload there's pretty much no difference, but for selects
>> from the stats catalogs there's ~20% drop in throughput. In absolute
>> numbers this means drop from ~670tps to ~550tps. I haven't investigated
>> this, but I suppose this is due to dshash seqscan being more expensive
>> than reading the data from file.
> 
> Thanks for finding that. The three seqscan loops in
> pgstat_vacuum_stat cannot take such a long time, I think. I'll
> investigate it.



OK. I'm not sure this is related to pgstat_vacuum_stat - the slowdown 
happens while querying the catalogs, so why would that trigger vacuum of 
the stats? I may be missing something, of course.


FWIW, the "query statistics" test simply does this:

  SELECT * FROM pg_stat_all_tables;
  SELECT * FROM pg_stat_all_indexes;
  SELECT * FROM pg_stat_user_indexes;
  SELECT * FROM pg_stat_user_tables;
  SELECT * FROM pg_stat_sys_tables;
  SELECT * FROM pg_stat_sys_indexes;

and the slowdown happened even when it was running on its own (nothing else
running on the instance). Which mostly rules out concurrency issues with
the hash table locking etc.



>> I don't think any of this is an issue in practice, though. The important
>> thing is that there's no measurable impact on the regular workload.
>>
>> Now, a couple of comments regarding individual parts of the patch.
>>
>>
>> 0001-0003
>> -
>>
>> I do think 0001 - 0003 are ready, with some minor cosmetic issues:
>>
>> 1) I'd rephrase the last part of dshash_seq_init comment more like this:
>>
>> * If consistent is set for dshash_seq_init, all the hash table
>> * partitions are locked in the requested mode (as determined by the
>> * exclusive flag), and the locks are held until the end of the scan.
>> * Otherwise the partition locks are acquired and released as needed
>> * during the scan (up to two partitions may be locked at the same time).
> 
> Replaced with this.
> 
>> Maybe it should briefly explain what the consistency guarantees are (and
>> aren't), but considering we're not materially changing the existing
>> behavior it's probably not really necessary.
> 
> Mmm. actually sequential scan is a new thing altogether, but..



Sure, there are new pieces. But does it significantly change consistency 
guarantees when reading the stats? I don't think so - there was no 
strict consistency guaranteed before (due to data interleaved with 
inquiries, UDP throwing away packets under load, etc.). Based on the 
discussion in this thread that seems to be the consensus.



>> 2) I think the dshash_find_extended() signature should be more like
>> dshash_find(), i.e. just appending parameters instead of moving them
>> around unnecessarily. Perhaps we should add
> 
> Sure. It seems to have done by my off-lined finger;p Fixed.



;-)



>> 0004 (+0005 and 0007)
>> -
>>
>> This seems fine, but I have my doubts about two changes - removing of
>> stats_temp_directory and the IDLE_STATS_UPDATE_TIMEOUT thingy.
>>
>> There's a couple of issues with the stats_temp_directory. Firstly, I
>> don't understand why it's spread over multiple parts of the patch. The
>> GUC is removed in 0004, the underlying variable is removed in 0005 and
>> then the docs are updated in 0007. If we really want to do this, it
>> should happen in a single patch.
> 
> Sure.
> 
>> But the main question is - do we really want to do that? I understand
>> this directory was meant for the stats data we're moving to shared
>> memory, so removing it seems natural. But clearly it's used by
>> pg_stat_statements - 0005 fixes that, of course, but I wonder if there
>> are other extensions using it to store files?
>> It's not just about how intensive I/O to those files is, but this also
>> means the files will now be included in backups / pg_rewind, and maybe
>> that's not really desirable?
>>
>> Maybe it's fine but I'm not quite convinced about it ...
> 
> It was also in my mind. Anyway sorry for the strange separation.
> 
> I was confused about pgstat_stat_directory (the names are
> actually very confusing..). In addition to that, pg_stat_statements
> does *not* use the variable stats_temp_directory, but using
> PG_STAT_TMP_DIR. pgstat_stat_directory was used only by
> basebackup.c.
> 
> The GUC base variable pgstat_temp_directory is not extern'ed so
> we can just remove it along with the GUC
> definition. pgstat_stat_directory (it actually stores *temporary*
> stats directory) was extern'ed in pgstat.h and PG_STAT_TMP_DIR is
> defined in pgstat.h. They are not removed in the new version.
> Finally 0005 no longer breaks any other bins, contribs and
> external extensions.



Great. I'll take a look.


I'm not sure I understand what IDLE_STATS_UPDATE_TIMEOUT does. You've
described it as

This adds a new timeout IDLE_STATS_UPDATE_TIMEOUT. This works
similarly to IDLE_IN_TRANSACTION_SESSION_TIMEOUT. It fires in
at most PGSTAT_STAT_MIN_INTERVAL(500)ms to clean up pending
statistics update.

but I'm not sure what pending updates you mean? Aren't we updating
the stats at the end of each transaction? At least that's what we've
been doing before, so may

Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread David Steele
On 11/27/18 10:29 AM, Peter Eisentraut wrote:
> On 27/11/2018 16:02, Stephen Frost wrote:
>> They're also the sort of installations which don't have reliable backups
>> and don't have any clue about the danger they are in due to the current
>> bug/issue/whatever we have with exclusive backups.
>>
>> No, I don't agree that it's sensible to continue to march on as if
>> nothing is wrong.
> 
> It's fair to argue that this facility is broken and needs to be removed.
>  But that needs to be its own argument.

I believe I made this argument in the OP.

> The argument in this subthread is that since we're already making
> changes in an adjacent area, removing this feature will have a low impact.

Fair enough, but I think the argument against exclusive backups stands
on its own.  Also, I don't see backup and restore as adjacent.  I see
them as thoroughly intertwined.

-- 
-David
da...@pgmasters.net



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread David Steele
On 11/27/18 8:56 AM, Simon Riggs wrote:
> On Tue, 27 Nov 2018 at 03:13, David Steele wrote:
>> The deprecated exclusive mode promises to make a difficult problem
>> simple but doesn't live up to that promise. That's why it was replaced
>> externally in 9.6 and why pg_basebackup has not used exclusive backups
>> since it was introduced in 9.2.
>>
>> Non-exclusive backups have been available since 9.6 and several
>> third-party solutions support this mode, in addition to pg_basebackup.
>>
>> The recently introduced recovery changes will break current automated
>> solutions so this seems like a good opportunity to make improvements on
>> the backup side as well.
>>
>> I'll submit a patch for the 2019-01 commitfest.
> 
> -1 for removal, in this release
> 
> It's not there because anyone likes it, it's there because removing it
> is a risk
> 
> Recent changes are the reason to keep it, not a reason to remove.

How so?

-- 
-David
da...@pgmasters.net



Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 11/27/18 10:29 AM, Peter Eisentraut wrote:
> > On 27/11/2018 16:02, Stephen Frost wrote:
> >> They're also the sort of installations which don't have reliable backups
> >> and don't have any clue about the danger they are in due to the current
> >> bug/issue/whatever we have with exclusive backups.
> >>
> >> No, I don't agree that it's sensible to continue to march on as if
> >> nothing is wrong.
> > 
> > It's fair to argue that this facility is broken and needs to be removed.
> >  But that needs to be its own argument.

We made that argument, and had agreement on it, and that's why it's
deprecated today.  The only question here is how long we keep this
deprecated and broken capability around and given that we're completely
breaking everything around recovery, now is as good a time as any since
users should be looking at how they do backup and restore when they
migrate to v12 due to these changes.

> > The argument in this subthread is that since we're already making
> > changes in an adjacent area, removing this feature will have a low impact.
> 
> Fair enough, but I think the argument against exclusive backups stands
> on its own.  Also, I don't see backup and restore as adjacent.  I see
> them as thoroughly intertwined.

Agreed.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Simon Riggs
On Tue, 27 Nov 2018 at 14:45, Stephen Frost  wrote:

> Greetings,
>
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Tue, Nov 27, 2018 at 02:21:49PM +0100, Peter Eisentraut wrote:
> > > On 27/11/2018 04:46, Andres Freund wrote:
> > >> That was my gut reaction as well, but I think David's argument about
> > >> existing scripts / workflows being broken due to the recovery.conf
> > >> is a good reason to be more aggressive here.
> > >
> > > But backup scripts are not affected by the recovery.conf changes.
> >
> > In any of my own backup scripts (yeah!), I don't have any dependency to
> > that either.  Or perhaps pgBackRest has a dependency in this area?
>
> If you don't consider your recovery scripts and your backup scripts to
> be related then I've really got to wonder how you're regularly testing
> your backups to make sure that they're actually valid.
>
> If you aren't regularly testing your backups then I've got little
> sympathy.
>
> To be clear, pgbackrest doesn't have any dependency here- but it, like
> all of the other 3rd party backup solutions and any restore solution
> that a user has come up with, are going to have to be changed to deal
> with the changes in how recovery works, so this is a good time to make
> these changes.
>

If those tools are updated and the changes all work, that will be good.

That isn't the same thing as an argument to remove things in this release.

I propose waiting until next release.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* Andreas Karlsson (andr...@proxel.se) wrote:
> On 11/27/18 4:11 PM, Stephen Frost wrote:
> >>I agree with your larger point, but in this case the two breakages do not
> >>seem equal. As far as I gather the removal of recovery.conf will in practice
> >>result in a longer downtime when your recovery scripts breaks but not any
> >>data loss. While if we remove the exclusive backup mode then some people's
> >>backup scripts will break and if they do not properly monitor their backups
> >>they risk data loss.
> >
> >Let's please try to remember that this is across a major version upgrade
> >and is removing a broken capability that's already been deprecated for a
> >couple years.
> 
> I know that and you know that, but even our own manual use the exclusive
> mode in some of the examples, e.g: "pg_start_backup('label_goes_here')"[1].

Thankfully, that'll get nicely cleaned up by removing the exclusive
backup mode.

> Your point about that people who do not monitor their backups wont notice
> warnings is fair, but even so I think we should give people more time to
> migrate when even our own documentation isn't always clear about the
> exclusive mode being deprecated.

I don't buy off on this argument- we also have things like this:

https://www.postgresql.org/docs/11/backup-file.html

Where we show a simple tar command as a way to backup the database, but
then we caveat it.  Similarly, we make it clear that users shouldn't be
using the exclusive mode backups:

https://www.postgresql.org/docs/11/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP

If users are only reading the system functions description and thinking
that's enough to figure out how to do backups, and didn't read the
chapters of documentation we have about how to perform a backup, then
chances are very good their existing backup systems are broken, and
that's all the more reason to remove the exclusive mode because at least
then it won't look like things are working when they aren't.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Andres Freund
Hi,

On 2018-11-26 23:04:35 -0500, Robert Haas wrote:
> It's not like the problems with exclusive backup are so serious that
> you can't work around them.  If you know which machine is your master,
> you can arrange to remove backup_label on reboot (only) on the master
> (only). Sure, a lot of people probably do this wrong, but it's
> infeasible to disallow all the things that people might use
> incorrectly without making the system useless.

That doesn't protect against postgres, not machine, [crash-]restarts
though.

Greetings,

Andres Freund



Re: Continue work on changes to recovery.conf API

2018-11-27 Thread Andres Freund
Hi,

On 2018-11-27 15:36:59 +0100, Peter Eisentraut wrote:
> That might be a useful facility, but it wouldn't really address the
> pg_basebackup -R issue, because that creates settings that you don't
> want going away in this manner.

Why is that / which are those?  It's not like it worked like that <
2dedf4d9.

Greetings,

Andres Freund



Re: Sequential UUID Generation

2018-11-27 Thread Tomas Vondra

On 11/19/18 2:08 PM, Uday Bhaskar V wrote:
I tried the below function, which can be used as a default for a column.
But every time we need to create 2 sequences: the 1st takes care of the
first 8 bytes and the 2nd takes care of the 2nd part of the UUID. I have
not tested index and space utilization; I have to examine this. This
might not be completely unique in nature, but I am still trying for the best.




I got a bit bored a couple of days ago, so I wrote an extension 
generating UUIDs based on either a sequence or timestamp. See


  https://blog.2ndquadrant.com/sequential-uuid-generators/

for an explanation and some simple test results.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
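The extension described in the post generates UUIDs whose leading bytes advance with a sequence (or timestamp) while the remaining bytes stay random, so consecutive inserts land on nearby index pages. A rough illustration of the sequence-based variant (the 2-byte prefix width and 65536 block size here are assumptions for the sketch, not the extension's actual parameters):

```python
import os
import uuid

def sequential_uuid(counter, block_size=65536):
    # Derive a 2-byte prefix from the sequence counter; a whole block of
    # `block_size` consecutive values shares one prefix, and the prefix
    # wraps around after 65536 blocks so old index pages get reused.
    prefix = ((counter // block_size) % 65536).to_bytes(2, "big")
    # The remaining 14 bytes stay random, preserving unguessability
    # within a block.
    return uuid.UUID(bytes=prefix + os.urandom(14))
```

Values generated within one block compare as neighbors in a btree index, which is exactly the locality the benchmark results in the post measure.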



Re: Continue work on changes to recovery.conf API

2018-11-27 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-11-27 15:36:59 +0100, Peter Eisentraut wrote:
> > That might be a useful facility, but it wouldn't really address the
> > pg_basebackup -R issue, because that creates settings that you don't
> > want going away in this manner.
> 
> Why is that / which are those?  It's not like it worked like that <
> 2dedf4d9.

primary_conninfo and restore_command are the main ones, and the idea is
that you'd like those to be specified even when you aren't in recovery,
so that you can have a symmetric config for when a given system does
become a replica.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Python versions (was Re: RHEL 8.0 build)

2018-11-27 Thread Andres Freund
Hi,

On 2018-11-27 14:14:24 +0100, Peter Eisentraut wrote:
> On 25/11/2018 23:14, Tom Lane wrote:
> > Andres Freund  writes:
> >> On 2018-11-24 15:49:25 -0500, Tom Lane wrote:
> >>> There's been some preliminary discussion about starting to default to
> >>> python3, but given this project's inherent conservatism, I don't expect
> >>> that to happen for some years yet.  In any case, whenever we do pull
> >>> that trigger we'd surely do so only in HEAD not released branches, so
> >>> buildfarm owners will need to deal with the case for years more.
> > 
> >> Why don't we probe for python2 in addition to python by default? That
> >> ought to make RHEL 8 work, without making the switch just yet.
> > 
> > I'm unexcited about that because that *would* be expressing a version
> > preference --- and one that's on the wrong side of history.
> 
> I think it would be appropriate to probe in the order
> 
> python python3 python2
> 
> This would satisfy most scenarios that are valid under PEP 394.

ISTM we should first go for python, python2, python3 in all branches,
and then have a separate master only commit that changes over the order
to prefer python3 over 2. I don't think preferring 3 over 2 would be
appropriate for past branches, but it'll surely become more common to
run into distros without "python" installed.

Greetings,

Andres Freund
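The probe order under discussion can be mimicked outside configure with a simple PATH search; the candidate list is the point being debated, not fixed behavior:

```python
import shutil

def choose_python(candidates=("python", "python2", "python3")):
    # Return the first interpreter name found on PATH, mirroring the
    # back-branch order Andres suggests; PEP 394 permits systems to
    # ship any subset of these names.
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None
```

On a RHEL 8-style system with only "python3" installed, the default order above falls through to it, while older distros resolve at "python".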



Re: pg11.1 jit segv

2018-11-27 Thread Andres Freund
On 2018-11-27 00:26:55 -0800, Andres Freund wrote:
> Hi,
> 
> On 2018-11-26 22:56:09 -0600, Justin Pryzby wrote:
> > On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> > > Could you check that the attached patch this also fixes your original
> > > issue? Going through the code to see if there's other occurances of
> > > this.
> > 
> > Confirmed that fixes my crash.
> 
> Thanks a lot for narrowing down your crash to something I can reproduce!
> 
> 
> Here's a more complete patch, with a testcase.
> 
> Tom, the test creates a 1100-column table (using \set ECHO none +
> gexec), but with a small row. Currently it's not dropped after the
> table, as I thought it might be worthwhile to be tested by
> pg_dump/upgrade etc too. You're probably the person most concerned with
> test runtimes, ... Any concerns about that? The table creation is
> quick*, on the order of 30ms.

And pushed. Justin, thanks again for reporting the bug and then
narrowing it down to a reproducible test case! Would've been much harder
to diagnose without that.

I'll look into your comments patch in a bit.

Greetings,

Andres Freund



jit comments typos (Re: pg11.1 jit segv)

2018-11-27 Thread Justin Pryzby
On Tue, Nov 27, 2018 at 10:24:52AM -0800, Andres Freund wrote:
> And pushed. Justin, thanks again for reporting the bug and then
> narrowing it down to a reproducible test case! Would've been much harder
> to diagnose without that.
> 
> I'll look into your comments patch in a bit.

Thanks for implementing and patching it :)

And thanks for remembering the patch, and reminding me.

Here's an updated copy with additional hunks.

Justin
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ec4a250..83e4e05 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -1873,7 +1873,7 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot)
 
 	/*
 	 * Should probably fixed at some point, but for now it's easier to allow
-	 * buffer and heap tuples to be used interchangably.
+	 * buffer and heap tuples to be used interchangeably.
 	 */
 	if (slot->tts_ops == &TTSOpsBufferHeapTuple &&
 		op->d.fetch.kind == &TTSOpsHeapTuple)
diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 4111bf0..ba238f1 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -103,7 +103,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	funcname = llvm_expand_funcname(context, "deform");
 
 	/*
-	 * Check which columns do have to exist, so we don't have to check the
+	 * Check which columns have to exist, so we don't have to check the
 	 * rows natts unnecessarily.
 	 */
 	for (attnum = 0; attnum < desc->natts; attnum++)
@@ -292,7 +292,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	}
 
 	/*
-	 * Check if's guaranteed the all the desired attributes are available in
+	 * Check if it's guaranteed that all the desired attributes are available in
 	 * tuple. If so, we can start deforming. If not, need to make sure to
 	 * fetch the missing columns.
 	 */
@@ -377,7 +377,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 		/*
 		 * If this is the first attribute, slot->tts_nvalid was 0. Therefore
-		 * reset offset to 0 to, it be from a previous execution.
+		 * also reset offset to 0, it may be from a previous execution.
 		 */
 		if (attnum == 0)
 		{
@@ -407,7 +407,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 		/*
 		 * Check for nulls if necessary. No need to take missing attributes
-		 * into account, because in case they're present the heaptuple's natts
+		 * into account, because if they're present, the heaptuple's natts
 		 * would have indicated that a slot_getmissingattrs() is needed.
 		 */
 		if (!att->attnotnull)
@@ -494,13 +494,13 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 			(known_alignment < 0 || known_alignment != TYPEALIGN(alignto, known_alignment)))
 		{
 			/*
-			 * When accessing a varlena field we have to "peek" to see if we
+			 * When accessing a varlena field, we have to "peek" to see if we
 			 * are looking at a pad byte or the first byte of a 1-byte-header
 			 * datum.  A zero byte must be either a pad byte, or the first
-			 * byte of a correctly aligned 4-byte length word; in either case
+			 * byte of a correctly aligned 4-byte length word; in either case,
 			 * we can align safely.  A non-zero byte must be either a 1-byte
 			 * length word, or the first byte of a correctly aligned 4-byte
-			 * length word; in either case we need not align.
+			 * length word; in either case, we need not align.
 			 */
 			if (att->attlen == -1)
 			{
@@ -594,8 +594,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 		else if (att->attnotnull && attguaranteedalign && known_alignment >= 0)
 		{
 			/*
-			 * If the offset to the column was previously known a NOT NULL &
-			 * fixed width column guarantees that alignment is just the
+			 * If the offset to the column was previously known, a NOT NULL &
+			 * fixed-width column guarantees that alignment is just the
 			 * previous alignment plus column width.
 			 */
 			Assert(att->attlen > 0);
@@ -636,8 +636,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	   LLVMBuildGEP(b, v_tts_nulls, &l_attno, 1, ""));
 
 		/*
-		 * Store datum. For byval datums copy the value, extend to Datum's
-		 * width, and store. For byref types, store pointer to data.
+		 * Store datum. For byval datums: copy the value, extend to Datum's
+		 * width, and store. For byref types: store pointer to data.
 		 */
 		if (att->attbyval)
 		{
diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index b33a321..2ad29be 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -9,7 +9,7 @@
  * for an external function is found - not guaranteed! - the index will then
  * be used to judge their instruction count / inline worthiness. After doing
  * so for all external functions, all the referenced functions (and
- * p

Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Bossart, Nathan
On 11/21/18, 10:16 PM, "Michael Paquier"  wrote:
>> At Fri, 02 Nov 2018 14:47:08 +, Nathan Bossart
>>  wrote in
>> <154117002849.5569.14588306221618961668.p...@coridan.postgresql.org>:
>>> Granted, any added delay from this patch is unlikely to be noticeable
>>> unless your archiver is way behind and archive_status has a huge
>>> number of files.  However, I have seen cases where startup is stuck on
>>> other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
>>> very long time, so perhaps it is worth considering.
>
> What's the scale of the pg_wal partition and the amount of time things
> were stuck?  I would imagine that the sync phase hurts the most, and a
> fast startup time for crash recovery is always important.

I don't have exact figures to share, but yes, a huge number of calls
to sync_file_range() and fsync() can use up a lot of time.  Presumably
Postgres processes files individually instead of using sync() because
sync() may return before writing is done.  Also, sync() would affect
non-Postgres files.  However, it looks like Linux actually does wait
for writing to complete before returning from sync() [0].

For RemovePgTempFiles(), the documentation above the function
indicates that skipping temp file cleanup during startup would
actually be okay because collisions with existing temp file names
should be handled by OpenTemporaryFile().  I assume this cleanup is
done during startup because there isn't a great alternative besides
offloading the work to a new background worker or something.

On 11/27/18, 6:35 AM, "Michael Paquier"  wrote:
> Attached is a patch shaped based on the idea upthread.
> Thoughts?

I took a look at this patch.

+   /*
+    * In the event of a system crash, archive status files may be
+    * left behind as their removals are not durable, cleaning up
+    * orphan entries here is the cheapest method.  So check that
+    * the segment trying to be archived still exists.
+    */
+   snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+   if (stat(pathname, &stat_buf) != 0)
+   {

Don't we also need to check that errno is ENOENT here?

+   StatusFilePath(xlogready, xlog, ".ready");
+   if (durable_unlink(xlogready, WARNING) == 0)
+       ereport(WARNING,
+               (errmsg("removed orphan archive status file %s",
+                       xlogready)));
+   return;

IIUC any time that the file does not exist, we will attempt to unlink
it.  Regardless of whether unlinking fails or succeeds, we then
proceed to give up archiving for now, but it's not clear why.  Perhaps
we should retry unlinking a number of times (like we do for
pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
to move on to the next .ready file if durable_unlink() succeeds.

Nathan

[0] http://man7.org/linux/man-pages/man2/sync.2.html
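Putting the review points together, the control flow being proposed looks roughly like this (a simplified sketch with hypothetical names, not the patch itself; PostgreSQL's durable_unlink() also fsyncs the parent directory, which the sketch imitates):

```python
import errno
import os

def remove_orphan_status_file(xlog_dir, status_dir, xlog):
    # Only treat the status file as orphaned when the WAL segment is
    # truly gone (ENOENT); any other stat() failure is a real error,
    # per the review comment above.
    try:
        os.stat(os.path.join(xlog_dir, xlog))
        return False  # segment still exists, nothing to clean up
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
    # Durably unlink the orphaned .ready file: remove it, then fsync
    # the containing directory so the removal survives a crash.
    os.unlink(os.path.join(status_dir, xlog + ".ready"))
    dirfd = os.open(status_dir, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
    return True
```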



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Andres Freund
Hi,

On 2018-11-27 20:43:06 +, Bossart, Nathan wrote:
> I don't have exact figures to share, but yes, a huge number of calls
> to sync_file_range() and fsync() can use up a lot of time.  Presumably
> Postgres processes files individually instead of using sync() because
> sync() may return before writing is done.  Also, sync() would affect
> non-Postgres files.  However, it looks like Linux actually does wait
> for writing to complete before returning from sync() [0].

sync() has absolutely no way to report errors. So, we're never going to
be able to use it.  Besides, even postgres' temp files would be a good
reason to not use it.

Greetings,

Andres Freund
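The error-reporting difference is concrete: fsync() fails per file descriptor, so the caller learns which file is affected, whereas sync() returns nothing useful. A sketch of the per-file approach (an illustration, not the actual SyncDataDirectory() code):

```python
import os

def flush_files(paths):
    # fsync each file individually; unlike sync(), every failure is
    # attributable to a specific path and surfaces as an OSError.
    failed = []
    for p in paths:
        fd = os.open(p, os.O_RDONLY)
        try:
            os.fsync(fd)
        except OSError:
            failed.append(p)
        finally:
            os.close(fd)
    return failed
```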



Re: tab-completion debug print

2018-11-27 Thread David Fetter
On Tue, Nov 27, 2018 at 06:06:06PM +0900, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> At Mon, 26 Nov 2018 07:08:53 +0100, David Fetter  wrote in 
> <20181126060853.gp...@fetter.org>
> > On Sun, Nov 25, 2018 at 11:21:51PM -0500, Tom Lane wrote:
> > > Kyotaro HORIGUCHI  writes:
> > > >> On Fri, Nov 23, 2018 at 04:32:31PM -0500, Tom Lane wrote:
> > > >>> Hm.  I can see the value of instrumenting tab-complete when you're 
> > > >>> trying
> > > >>> to debug why it did something, but this output format seems pretty 
> > > >>> terse
> > > >>> and unreadable.
> > > 
> > > > The reason for the minimal output was it won't disturb console so
> > > > much when stderr is not redirected. It could be richer if we
> > > > premise redirection.
> > > 
> > > It seems to me this behavior would be completely unusable if it's not
> > > redirected; I don't understand why you're even considering that as an
> > > interesting option.  libreadline is necessarily active when we're doing
> > > this, and it will get very confused (or at least produce a very confused
> > > display) if anything else is emitting text onto the active console line.
> 
> The mess here started because I forgot about the -L option of psql. I
> wouldn't have done that if I had known of it.

Am I understanding correctly that currently, -L specifies a separate
log that only shows the output of queries?

If we're using that log, we'll need to rethink how best to describe
the new sub-feature, and possibly rethink how -L should function.
Tom's suggestion about recording both the catalog queries and their
result set/error messages seems in scope.

Speaking of scope, do we want something like \L (analogous to \o) for
this, or would a thing like that be fundamentally distinct?

> psql -L ~/psqldebug.log postgres

Excellent!

Do we still want this as a compile-time option, or does it make more
sense as a run-time option? I'm thinking that with \L, it might make
sense as a run-time option.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Bossart, Nathan
On 11/27/18, 2:46 PM, "Andres Freund"  wrote:
> On 2018-11-27 20:43:06 +, Bossart, Nathan wrote:
>> I don't have exact figures to share, but yes, a huge number of calls
>> to sync_file_range() and fsync() can use up a lot of time.  Presumably
>> Postgres processes files individually instead of using sync() because
>> sync() may return before writing is done.  Also, sync() would affect
>> non-Postgres files.  However, it looks like Linux actually does wait
>> for writing to complete before returning from sync() [0].
>
> sync() has absolutely no way to report errors. So, we're never going to
> be able to use it.  Besides, even postgres' temp files would be a good
> reason to not use it.

Ah, I see.  Thanks for clarifying.

Nathan



Re: tab-completion debug print

2018-11-27 Thread Tom Lane
David Fetter  writes:
> Do we still want this as a compile-time option, or does it make more
> sense as a run-time option? I'm thinking that with \L, it might make
> sense as a run-time option.

This seems to me to be strictly a developer debugging feature.

regards, tom lane



Re: SSL tests failing with "ee key too small" error on Debian SID

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 09:37:17AM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 01/10/2018 14:18, Kyotaro HORIGUCHI wrote:
>>> The attached second patch just changes key size to 2048 bits and
>>> "ee key too small" are eliminated in 001_ssltests_master, but
>>> instead I got "ca md too weak" error. This is eliminated by using
>>> sha256 instead of sha1 in cas.config. (third attached)
> 
>> I have applied these configuration changes and created a new set of test
>> files with them.
> 
> Buildfarm critters aren't going to be happy unless you back-patch that.

Thanks for applying that, Peter.
--
Michael


signature.asc
Description: PGP signature


Re: Handling of REGRESS_OPTS in MSVC for regression tests

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 10:27:17AM +0900, Michael Paquier wrote:
> Okay, let's do so by supporting correctly NO_INSTALLCHECK.  My other
> refactoring work can also live with that.  Returning an empty list via
> fetchTests() and bypass a run if nothing is present looks fine to me.
> One extra thing is that modules/commit_ts and modules/test_rls_hooks are
> missing NO_INSTALLCHECK, so we would need to add that part to be
> completely correct.  I would also be inclined to back-patch both parts,
> however for my stuff I could live with this patch only on HEAD, and
> anybody willing to use installcheck on commit_ts and test_rls_hooks may
> be surprised to not be able to do that anymore after the minor release.
> It still looks incorrect to me to be able to run installcheck on
> those modules, though...
> 
> Attached is a proposal of patch, which works as I would expect with
> modulescheck and contribcheck.  How does that look?

If possible, I would like to move on with this stuff.  To keep things
green in the buildfarm all the time, I would like to do that with two
independent steps:
1) Add NO_INSTALLCHECK to modules/commit_ts and modules/test_rls_hook.
2) Add support for NO_INSTALLCHECK in the MSVC scripts.

Are there any objections?  I could do a back-patch as well to keep
things consistent across branches if there are voices in favor of that,
but that's not necessary for the upcoming Makefile cleanup with the new
set of PGXS options.
--
Michael


signature.asc
Description: PGP signature


Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 08:43:06PM +, Bossart, Nathan wrote:
> Don't we also need to check that errno is ENOENT here?

Yep.

> IIUC any time that the file does not exist, we will attempt to unlink
> it.  Regardless of whether unlinking fails or succeeds, we then
> proceed to give up archiving for now, but it's not clear why.  Perhaps
> we should retry unlinking a number of times (like we do for
> pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
> to move on to the next .ready file if durable_unlink() succeeds.

Both suggestions sound reasonable to me.  (durable_unlink is not called
on HEAD in pgarch_archiveXlog).  How about 3 retries with an in-between
wait of 1s?  That's consistent with what pgarch_ArchiverCopyLoop does,
still I am not completely sure if we actually want to be consistent for
the purpose of removing orphaned ready files.
--
Michael


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 09:45:04AM -0500, Stephen Frost wrote:
> If you don't consider your recovery scripts and your backup scripts to
> be related then I've really got to wonder how you're regularly testing
> your backups to make sure that they're actually valid.

Base backups can be perfectly self-contained as long as they include all
the WAL segments needed to recover up to the end-of-backup record.
That's what pg_basebackup does with its default options
(--wal-method=stream in particular).

> If you aren't regularly testing your backups then I've got little
> sympathy.

Fortunately they do, hundreds of times on a daily basis ;)

> To be clear, pgbackrest doesn't have any dependency here- but it, like
> all of the other 3rd party backup solutions and any restore solution
> that a user has come up with, are going to have to be changed to deal
> with the changes in how recovery works, so this is a good time to make
> these changes.

My point is that base backups do not have a mandatory dependency with
recovery.conf all the time as they can perfectly be restored if they are
standalone backups.  I can see a dependency with recovery.conf once you
have a base backup which needs to be fed with WAL segments from an
external archive, or when using a base backup to create a standby.
--
Michael


signature.asc
Description: PGP signature


Re: tab-completion debug print

2018-11-27 Thread David Fetter
On Tue, Nov 27, 2018 at 03:54:55PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > Do we still want this as a compile-time option, or does it make more
> > sense as a run-time option? I'm thinking that with \L, it might make
> > sense as a run-time option.
> 
> This seems to me to be strictly a developer debugging feature.

I would have thought so, but as our tab completion gets bigger--and
it's been steadily doing that--we may find ourselves in a situation
where it'd be handy to debug things in a production environment, with
all the restrictions that implies.

Can we conceive of a circumstance where the check for -L/\L would be
significant?  I've seen people type pretty quickly, but not thus far
fast enough to notice a cache miss.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 04:26:45PM +0100, Michael Banck wrote:
> Oh, I kinda followed that thread a bit, but I think that patch fixes
> things more by matter of moving code around, at least I haven't noticed
> that tablespaces were explicitly claimed to be fixed in that thread.

That may have been an unconscious move ;)
It seems to me the root issue is that the original implementation of
pg_verify_checksums naively assumed that files can be skipped without
first checking their types, which I got confused by in d55241af.

You should definitely be mentioned as the reporter anyway, and get
author credits for the additional test.

> As pg_verify_checksums appears to be broken with respect to tablespaces
> in REL_11_STABLE (so I think 11.1, but not 11.0) as well, I think this
> merits a short-term minimal invasive fix (i.e. the patch you posted,
> just that there's no TAP testsuite for pg_verify_checksums in v11
> unfortunately) on its own, regardless of the wider changes proposed in
> the other thread.

Yes, let's do so rather sooner than later if there are no objections
from others.  I am going to ping the other thread about what's discussed
here additionally in case folks missed what you reported.  Fixing the
temp file issue which has been reported by Andres first is an additional
bonus.
--
Michael


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Mon, Nov 19, 2018 at 10:17:19PM -0500, Stephen Frost wrote:
> Let's try to not conflate these two issues though, they're quite
> independent.

This is a poke about a recent issue raised by Michael Banck here:
https://www.postgresql.org/message-id/f1543332405.17247.9.ca...@credativ.de
And for which I am suggesting a minimal fix, which is extracted from a
patch at the most-top of this thread:
https://www.postgresql.org/message-id/20181127213625.gm1...@paquier.xyz

If there are no objections, I would like to fix the actual issue.  Then
I'll rebase a patch on top of what has been fixed for this thread for
what I proposed in the base backup code.
--
Michael


signature.asc
Description: PGP signature


Re: pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string

2018-11-27 Thread Justin Pryzby
I'm resending a mail from June:
https://www.postgresql.org/message-id/flat/87sh5doya9.fsf%40news-spur.riddles.org.uk#83c3d1a183217204939252d56804f247

This is a maybe-trivial change to an ERROR string which may be worth making.

On PG10:
|[pryzbyj@database ~]$ psql postgres -c 'DROP DATABASE x; CREATE DATABASE x';
|ERROR:  DROP DATABASE cannot be executed from a function or multi-command 
string

On PG11.1:
|[pryzbyj@telsasoft2015 server]$ psql postgres -c 'DROP DATABASE x; CREATE 
DATABASE x';
|ERROR:  DROP DATABASE cannot run inside a transaction block

I spent at least a few minutes trying to do things like: psql -c 
'begin;commit;create;drop'
..before figuring it out due to the misleading error message, despite
having written the earlier message quoted below.

On Sat, Jun 23, 2018 at 04:06:37PM -0500, Justin Pryzby wrote:
> in pg10:
> 
>   ts=# begin;VACUUM FULL pg_toast.pg_toast_2619;
>   BEGIN
>   ERROR:  25001: VACUUM cannot run inside a transaction block
>   LOCATION:  PreventTransactionChain, xact.c:3167
> => sounds fine
> 
>   $ psql postgres -c 'SELECT 1; VACUUM pg_statistic'
>   ERROR:  VACUUM cannot be executed from a function or multi-command 
> string
> => sounds fine
> 
> In pg11b1:
> 
>   pryzbyj=# begin;VACUUM FULL pg_toast.pg_toast_2619;
>   BEGIN
>   ERROR:  25001: VACUUM cannot run inside a transaction block
>   LOCATION:  PreventInTransactionBlock, xact.c:3163
> => sounds fine
> 
>   [pryzbyj@dev ~]$ psql -c 'SELECT 1; VACUUM pg_statistic'
>   ERROR:  VACUUM cannot run inside a transaction block
> => Error message seems off??
> 
> I couldn't figure out how to \set VERBOSITY verbose inside a psql command 
> (??),
> but strace shows for v10:
> SERROR\0VERROR\0C25001\0MVACUUM cannot be executed from a function or 
> multi-command string\0Fxact.c\0L3187\0RPreventTransactionChain
> 
> And for v11:
> SERROR\0VERROR\0C25001\0MVACUUM cannot run inside a transaction 
> block\0Fxact.c\0L3163\0RPreventInTransactionBlock
> 
> Function renamed by commit 04700b685f31508036456bea4d92533e5ceee9d6.
> 
> So behavior change maybe caused by 6eb52da3948dc8bc7c8a61cbacac14823b670c58 ?
> 
> Justin
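For reference, the distinction Justin expects comes from how xact.c's PreventInTransactionBlock() (formerly PreventTransactionChain) picks its message. The shape of the pg10-era check is roughly the following (a simplification for illustration, not the actual PostgreSQL code):

```python
def prevent_in_transaction_block(stmt_type, is_top_level, in_transaction_block):
    # An explicit BEGIN block gets the "transaction block" message; a
    # statement buried in a function or multi-command string gets the
    # more specific message Justin found helpful in pg10.
    if in_transaction_block:
        raise RuntimeError(f"{stmt_type} cannot run inside a transaction block")
    if not is_top_level:
        raise RuntimeError(
            f"{stmt_type} cannot be executed from a function or multi-command string")
    # Otherwise the statement is allowed to proceed.
```

The pg11 regression reported here is that the multi-command case now takes the first branch, producing the generic message.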



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Bossart, Nathan
On 11/27/18, 3:20 PM, "Michael Paquier"  wrote:
> On Tue, Nov 27, 2018 at 08:43:06PM +, Bossart, Nathan wrote:
>> IIUC any time that the file does not exist, we will attempt to unlink
>> it.  Regardless of whether unlinking fails or succeeds, we then
>> proceed to give up archiving for now, but it's not clear why.  Perhaps
>> we should retry unlinking a number of times (like we do for
>> pgarch_archiveXlog()) when durable_unlink() fails and simply "break"
>> to move on to the next .ready file if durable_unlink() succeeds.
>
> Both suggestions sound reasonable to me.  (durable_unlink is not called
> on HEAD in pgarch_archiveXlog).  How about 3 retries with a in-between
> wait of 1s?  That's consistent with what pgarch_ArchiverCopyLoop does,
> still I am not completely sure if we actually want to be consistent for
> the purpose of removing orphaned ready files.

That sounds good to me.  I was actually thinking of using the same
retry counter that we use for pgarch_archiveXlog(), but on second
thought, it is probably better to have two independent retry counters
for these two unrelated operations.

Nathan



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 09:49:29PM +, Bossart, Nathan wrote:
> That sounds good to me.  I was actually thinking of using the same
> retry counter that we use for pgarch_archiveXlog(), but on second
> thought, it is probably better to have two independent retry counters
> for these two unrelated operations.

What I had in mind, in case what I wrote was unclear, was two different
variables, possibly with the same value, as archiving failures and
orphan file removal failures are two different concepts.
--
Michael


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread David Steele
On 11/27/18 4:25 PM, Michael Paquier wrote:
> On Tue, Nov 27, 2018 at 09:45:04AM -0500, Stephen Frost wrote:
>> If you don't consider your recovery scripts and your backup scripts to
>> be related then I've really got to wonder how you're regularly testing
>> your backups to make sure that they're actually valid.
> 
> Base backups can be perfectly self-contained as long as they include all
> the WAL segments needed to recover up to the end-of-backup record.
> That's what pg_basebackup does with its default options
> (--wal-method=stream in particular).

This is true, of course, but it misses one of the major benefits of
file-level backups which is PITR.

>> If you aren't regularly testing your backups then I've got little
>> sympathy.
> 
> Fortunately they do, hundreds of time on a daily basis ;)

Nice!

>> To be clear, pgbackrest doesn't have any dependency here- but it, like
>> all of the other 3rd party backup solutions and any restore solution
>> that a user has come up with, are going to have to be changed to deal
>> with the changes in how recovery works, so this is a good time to make
>> these changes.
> 
> My point is that base backups do not have a mandatory dependency with
> recovery.conf all the time as they can perfectly be restored if they are
> standalone backups.  I can see a dependency with recovery.conf once you
> have a base backup which needs to be fed with WAL segments from an
> external archive, or when using a base backup to create a standby.

If you want to do PITR -- which is the default in most situations --
then some interaction with recovery.conf is needed.  I think you would
be hard-pressed to find a prominent HA or backup solution that doesn't
do so.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: Remove Deprecated Exclusive Backup Mode

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 09:45:04AM -0500, Stephen Frost wrote:
> > If you don't consider your recovery scripts and your backup scripts to
> > be related then I've really got to wonder how you're regularly testing
> > your backups to make sure that they're actually valid.
> 
> Base backups can be perfectly self-contained as long as they include all
> the WAL segments needed to recover up to the end-of-backup record.
> That's what pg_basebackup does with its default options
> (--wal-method=stream in particular).

This really doesn't change my opinion at all.  Sure, some custom scripts
might have been written to operate in this way and now they'll need to
be adjusted to work the exact same way pg_basebackup works today and use
the non-exclusive mode, but they'll be better off for it and won't have
the concern about what happens if the system is inadvertently restarted
during a backup.

I'd also say it's likely that places which know enough to build such a
solution aren't ones that we really need to worry about having an issue
adjusting their scripts.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Mon, Nov 19, 2018 at 10:17:19PM -0500, Stephen Frost wrote:
> > Let's try to not conflate these two issues though, they're quite
> > independent.
> 
> This is a poke about a recent issue raised by Michael Banck here:
> https://www.postgresql.org/message-id/f1543332405.17247.9.ca...@credativ.de
> And for which I am suggesting a minimal fix, which is extracted from a
> patch at the most-top of this thread:
> https://www.postgresql.org/message-id/20181127213625.gm1...@paquier.xyz
> 
> If there are no objections, I would like to fix the actual issue.  Then
> I'll rebase a patch on top of what has been fixed for this thread for
> what I proposed in the base backup code.

So the as-committed "whitelist" approach did, in fact, end up excluding
huge portions of the database from being checked, that is, everything
inside of tablespaces.

This doesn't exactly change my opinion regarding this discussion and I'd
rather we revert the "whitelist" patch and use the very minimal patch
from here:

https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Handling of REGRESS_OPTS in MSVC for regression tests

2018-11-27 Thread Andrew Dunstan



On 11/27/18 4:10 PM, Michael Paquier wrote:

On Tue, Nov 27, 2018 at 10:27:17AM +0900, Michael Paquier wrote:

Okay, let's do so by supporting correctly NO_INSTALLCHECK.  My other
refactoring work can also live with that.  Returning an empty list via
fetchTests() and bypassing a run if nothing is present looks fine to me.
One extra thing is that modules/commit_ts and modules/test_rls_hooks are
missing NO_INSTALLCHECK, so we would need to add that part to be
completely correct.  I would also be inclined to back-patch both parts,
however for my stuff I could live with this patch only on HEAD, and
anybody willing to use installcheck on commit_ts and test_rls_hooks may
be surprised to not be able to do that anymore after the minor release.
It still looks incorrect to me, though, to be able to run installcheck
on those modules...

Attached is a proposal of patch, which works as I would expect with
modulescheck and contribcheck.  How does that look?

If possible, I would like to move on with this stuff.  To keep things
green in the buildfarm all the time, I would like to do that with two
independent steps:
1) Add NO_INSTALLCHECK to modules/commit_ts and modules/test_rls_hook.
2) Add support for NO_INSTALLCHECK in the MSVC scripts.

Are there any objections?  I could do a back-patch as well to keep
things consistent across branches if there are voices in favor of that,
but that's not necessary for the upcoming Makefile cleanup with the new
set of PGXS options.



I think you should just proceed with the changes above. I just had a 
quick look at the patch you posted before, and it looks sane enough.



I don't see a need to backpatch.


cheers


andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 05:45:41PM -0500, Stephen Frost wrote:
> This doesn't exactly change my opinion regarding this discussion and I'd
> rather we revert the "whitelist" patch and use the very minimal patch
> from here:
> 
> https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz

Believe it or not, we have spent so much energy on this stuff that I
am ready to give up on the whitelist patch and focus on other business.
This doesn't change a couple of things though, so it is not *just* a
simple revert with the patch you mention applied:
- Adding a test for tablespaces makes sense.
- skipfile should be called after making sure that we work on a file.
- temporary files and temporary paths should be ignored.
- it is necessary to exclude EXEC_BACKEND files.
--
Michael


signature.asc
Description: PGP signature


Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

2018-11-27 Thread Thomas Munro
On Tue, Nov 27, 2018 at 2:53 PM Haribabu Kommi  wrote:
> Thanks for the updated patch. It looks good.
> I marked it in the commitfest as ready for committer.

Pushed.  Thanks for the reviews!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 05:45:41PM -0500, Stephen Frost wrote:
> > This doesn't exactly change my opinion regarding this discussion and I'd
> > rather we revert the "whitelist" patch and use the very minimal patch
> > from here:
> > 
> > https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz
> 
> Believe me or not, but we have spent so much energy on this stuff that I
> am ready to give up on the whitelist patch and focus on other business.

I would have hoped that you'd see why I was concerned from the start
about this now that we have a released version of pg_verify_checksums
in 11.1 that is paper-bag-worthy.

> This doesn't change a couple of things though, so it is not *just* a
> simple revert with the patch you mention applied:
> - Adding a test for tablespaces makes sense.

I agree with this, of course.

> - skipfile should be called after making sure that we work on a file.

It's not clear to me what this is referring to, exactly...?  Can you
clarify?

> - temporary files and temporary paths should be ignored.

There's example code for how to do this in basebackup.c

> - it is necessary to exclude EXEC_BACKEND files.

Agreed, they should be added to the exclude list.

Thanks!

Stephen


signature.asc
Description: PGP signature


Minor typo

2018-11-27 Thread Stephen Frost
Greetings,

While reviewing a bit of code around full page images, I came across a
typo and fixed it in the attach.

Sending it here in case anyone feels that we should do more than just
correct the word..?  Perhaps for non-native English speakers seeing
"whose" used here is confusing?

If I don't hear any concerns then I'll go ahead and push it some time
tomorrow; it's a pretty minor change, of course, and we can always
adjust it further if someone raises an issue later too.

Thanks!

Stephen
From fb9b0f00396179af895c13e44bff98eee8302894 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 27 Nov 2018 18:37:59 -0500
Subject: [PATCH] Fix typo

Change:

When wal_compression is enabled, a full page image which "hole" was
removed is additionally compressed using PGLZ compression algorithm.

to:

When wal_compression is enabled, a full page image whose "hole" was
removed is additionally compressed using PGLZ compression algorithm.

As the correct word to use here is "whose", not "which".
---
 src/include/access/xlogrecord.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index 863781937e..ccf92def93 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -114,7 +114,7 @@ typedef struct XLogRecordBlockHeader
  * XLOG record's CRC, either).  Hence, the amount of block data actually
  * present is BLCKSZ - the length of "hole" bytes.
  *
- * When wal_compression is enabled, a full page image which "hole" was
+ * When wal_compression is enabled, a full page image whose "hole" was
  * removed is additionally compressed using PGLZ compression algorithm.
  * This can reduce the WAL volume, but at some extra cost of CPU spent
  * on the compression during WAL logging. In this case, since the "hole"
-- 
2.17.1



signature.asc
Description: PGP signature


Re: Minor typo

2018-11-27 Thread Daniel Gustafsson
> On 28 Nov 2018, at 00:43, Stephen Frost  wrote:

> Sending it here in case anyone feels that we should do more than just
> correct the word..?  Perhaps for non-native English speakers seeing
> "whose" used here is confusing?

Being a non-native English speaker I think it’s fine and, in my own bias,
commonly understood.

+1 on raising the “does this make sense for a non-native speaker” question,
though; making the documentation (regardless of whether it’s docs, code
comments or READMEs) accessible is very important.

cheers ./daniel


Re: [PATCH] Tiny CREATE STATISTICS tab-completion cleanup

2018-11-27 Thread Tomas Vondra
On 11/27/18 12:55 AM, Tomas Vondra wrote:
> Hi,
> 
> On 11/26/18 5:49 PM, Dagfinn Ilmari Mannsåker wrote:
>> Hi Hackers,
>>
>> As I was hacking on the CREATE TABLE tab completions, I noticed that the
>> CREATE STATISTICS completion was checking manually for the start and end
>> of the parenthesised list instead of using the "(*)" wildcard (because
>> it predates that functionality).  Attached is a patch that updates it to
>> use the modern syntax.
>>
> 
> Makes sense. At first I was wondering why it was not modified by the
> patch introducing the "(*)" wildcard, but I see 121213d9 only cared
> about VACUUM, EXPLAIN and ANALYZE.
> 
> The patch seems fine to me, I'll get it committed.
> 

Pushed. Thanks for the patch.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Minor typo

2018-11-27 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 28 Nov 2018, at 00:43, Stephen Frost  wrote:
>> Sending it here in case anyone feels that we should do more than just
>> correct the word..?  Perhaps for non-native English speakers seeing
>> "whose" used here is confusing?

> Being a non-native English speaker I think it’s fine and, in my own bias,
> commonly understood.

I agree that "which" is just wrong here, but "whose" seems a bit malaprop
as well, and it's not the only poor English in that sentence.  Maybe
rewrite the whole sentence to avoid the problem?

  * When wal_compression is enabled and a "hole" is removed from a full
  * page image, the page image is compressed using PGLZ compression.

(BTW, is this trying to say that we don't apply compression if the page
contains no hole?  If so, why not?)

regards, tom lane



Re: Minor typo

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 07:18:39PM -0500, Tom Lane wrote:
>   * When wal_compression is enabled and a "hole" is removed from a full
>   * page image, the page image is compressed using PGLZ compression.
> 
> (BTW, is this trying to say that we don't apply compression if the page
> contains no hole?  If so, why not?)

Compression can be applied on a page with or without a hole.  What this
sentence means is that the equivalent of a double level of compression
can be applied on top of the hole being removed, if the hole can be
removed.
--
Michael


signature.asc
Description: PGP signature


Re: Minor typo

2018-11-27 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Daniel Gustafsson  writes:
> >> On 28 Nov 2018, at 00:43, Stephen Frost  wrote:
> >> Sending it here in case anyone feels that we should do more than just
> >> correct the word..?  Perhaps for non-native English speakers seeing
> >> "whose" used here is confusing?
> 
> > Being a non-native English speaker I think it’s fine and, in my own bias,
> > commonly understood.
> 
> I agree that "which" is just wrong here, but "whose" seems a bit malaprop
> as well, and it's not the only poor English in that sentence.  Maybe
> rewrite the whole sentence to avoid the problem?
> 
>   * When wal_compression is enabled and a "hole" is removed from a full
>   * page image, the page image is compressed using PGLZ compression.

Yeah, that seems a bit cleaner.

> (BTW, is this trying to say that we don't apply compression if the page
> contains no hole?  If so, why not?)

Sure seems to be saying that, and later on..

 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED, an
 * XLogRecordBlockCompressHeader struct follows.

Yet XLogCompressBackupBlock() sure looks like it'll happily compress a
page even if it doesn't have a hole.  The replay logic certainly seems
to only check if BKPIMAGE_IS_COMPRESSED is set.

I'm thinking there's quite a bit of room for improvement here...  I
wonder if the idea originally was "the page is 'compressed', in some
way, if it either had a hole or was actually compressed"..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Minor typo

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 07:18:39PM -0500, Tom Lane wrote:
> >   * When wal_compression is enabled and a "hole" is removed from a full
> >   * page image, the page image is compressed using PGLZ compression.
> > 
> > (BTW, is this trying to say that we don't apply compression if the page
> > contains no hole?  If so, why not?)
> 
> Compression can be applied on a page with or without a hole.  What this
> sentence means is that the equivalent of a double level of compression
> can be applied on top of the hole being removed, if the hole can be
> removed.

That isn't at all what I got from that.

A rewrite of this really should avoid talking about removing the hole as
if it's 'compression' because, first of all, it isn't, and second, now
that we have *actual* compression happening, it's terribly confusing to
talk about removing the hole using that terminology.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 06:27:57PM -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> Believe it or not, we have spent so much energy on this stuff that I
>> am ready to give up on the whitelist patch and focus on other business.
> 
> I would have hoped that you'd see why I was concerned from the start
> about this now that we have a released version of pg_verify_checksums
> in 11.1 that is paper-bag-worthy.

That's unrelated.  Let's be clear, I still don't like the fact that we
don't use a whitelist approach, per the arguments raised upthread.  The
tool also clearly lacked testing and coverage from the beginning and has
never been able to work on Windows, which was a very bad decision.
Still, we need to live with that, and I take my share of the blame; I am
willing to make this stuff work better for all.

>> - skipfile should be called after making sure that we work on a file.
> 
> It's not clear to me what this is referring to, exactly...?  Can you
> clarify?

Please see 0002 attached, which moves the call to skipfile() where I
think it should go.

>> - it is necessary to exclude EXEC_BACKEND files.
> 
> Agreed, they should be added to the exclude list.

Base backups are impacted as well, as this causes spurious warnings.

Attached are two patches to fix all the mess:
- 0001 is a revert of the whitelist, minus the set of regression tests
checking after corrupted files and empty files.
- 0002 is a fix for all the issues reported on this thread, with tests
added (including the tablespace test from Michael Banck):
-- Base backups gain EXEC_BACKEND files in their warning filters.
-- pg_verify_checksums gains the same files.
-- temporary files are filtered out.
-- pg_verify_checksums performs filtering checks only on regular files,
not on paths.

0001 and 0002 need to be merged as 0001 would cause the buildfarm to
turn red on Windows if applied alone.  Can you now see my point?
--
Michael
From ad07e7c4ef9c40e2fc13ff59bb51a071ab69ebbe Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 28 Nov 2018 09:32:44 +0900
Subject: [PATCH 1/2] Switch pg_verify_checksums back to a blacklist

This basically reverts commit d55241af705667d4503638e3f77d3689fd6be31,
leaving around a portion of the regression tests still adapted with
empty relation files, and corrupted cases.
---
 .../pg_verify_checksums/pg_verify_checksums.c | 77 ---
 src/bin/pg_verify_checksums/t/002_actions.pl  | 17 
 2 files changed, 17 insertions(+), 77 deletions(-)

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index f0e09bea20..1bc020ab6c 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -15,7 +15,6 @@
 
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
-#include "common/relpath.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
@@ -50,69 +49,27 @@ usage(void)
	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
 }
 
-/*
- * isRelFileName
- *
- * Check if the given file name is authorized for checksum verification.
- */
+static const char *const skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
 static bool
-isRelFileName(const char *fn)
+skipfile(const char *fn)
 {
-	int			pos;
+	const char *const *f;
 
-	/*--
-	 * Only files including data checksums are authorized for verification.
-	 * This is guessed based on the file name by reverse-engineering
-	 * GetRelationPath() so make sure to update both code paths if any
-	 * updates are done.  The following file name formats are allowed:
-	 * <digits>
-	 * <digits>.<segment>
-	 * <digits>_<forkname>
-	 * <digits>_<forkname>.<segment>
-	 *
-	 * Note that temporary files, beginning with 't', are also skipped.
-	 *
-	 *--
-	 */
-
-	/* A non-empty string of digits should follow */
-	for (pos = 0; isdigit((unsigned char) fn[pos]); ++pos)
-		;
-	/* leave if no digits */
-	if (pos == 0)
-		return false;
-	/* good to go if only digits */
-	if (fn[pos] == '\0')
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
 		return true;
 
-	/* Authorized fork files can be scanned */
-	if (fn[pos] == '_')
-	{
-		int			forkchar = forkname_chars(&fn[pos + 1], NULL);
-
-		if (forkchar <= 0)
-			return false;
-
-		pos += forkchar + 1;
-	}
-
-	/* Check for an optional segment number */
-	if (fn[pos] == '.')
-	{
-		int			segchar;
-
-		for (segchar = 1; isdigit((unsigned char) fn[pos + segchar]); ++segchar)
-			;
-
-		if (segchar <= 1)
-			return false;
-		pos += segchar;
-	}
-
-	/* Now this should be the end */
-	if (fn[pos] != '\0')
-		return false;
-	return true;
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
 }
 
 static void
@@ -189,7 +146,7 @@ scan_directory(const char *basedir, const char *subdir)
 		char		fn[MAXPGPATH];
 		struct stat st;
 
-		if (!isRelFileName(de->d_name))
+		if (skipfile(de->d_name))
 			continue;
 
 		snprintf(fn, sizeof(fn), "%s/%s", path

Re: Handling of REGRESS_OPTS in MSVC for regression tests

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 05:59:34PM -0500, Andrew Dunstan wrote:
> I think you should just proceed with the changes above. I just had a quick
> look at the patch you posted before, and it looks sane enough.

Thanks for the feedback, Andrew.  Let's wait a couple of days and see if
anybody has any objections to the patch set.

> I don't see a need to backpatch.

Okay, no problem with that.
--
Michael


signature.asc
Description: PGP signature


Re: Minor typo

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 07:51:03PM -0500, Stephen Frost wrote:
> That isn't at all what I got from that.
> 
> A rewrite of this really should avoid talking about removing the hole as
> if it's 'compression' because, first of all, it isn't, and second, now
> that we have *actual* compression happening, it's terribly confusing to
> talk about removing the hole using that terminology.

You may want to be careful about other comments in xloginsert.c or such
then.  Removing a page hole is mentioned in a couple of places as a form
of compression if I recall correctly.
--
Michael


signature.asc
Description: PGP signature


Re: MERGE SQL statement for PG12

2018-11-27 Thread Amit Langote
Hi Pavan,

Thanks for continuing to work on this.

On 2018/11/27 20:18, Pavan Deolasee wrote:
> Ok. I will try that approach again. In the meanwhile, I am posting a
> rebased version. There had been quite a lot changes on partitioning side
> and that caused non-trivial conflicts. I noticed a couple of problems
> during the rebase, but I haven't attempted to address them fully yet, since
> I think a detailed review at some point might require us to change that
> anyways.
> 
> - Noticed that partColsUpdated is not set correctly in case of MERGE since
> we never get to call expand_partitioned_rtentry() for the partition table
> in case of MERGE. This right now does not cause significant problem, since
> we initialise the required states by other means. But we should fix this.

Regarding this, you may want to take a look at the following patch that
refactors things in this area.

https://commitfest.postgresql.org/20/1778/

Among other changes (such as completely redesigned inheritance_planner),
expand_partitioned_rtentry is now called after entering make_one_rel()
with the patch, which is much later than currently.  That allows us to
initialize RTEs and RelOptInfos for only the partitions that are left
after pruning.  As of now, it's called at a point in subquery_planner
where it's too soon to prune, so expand_partitioned_rtentry ends up
building RTEs for *all* partitions.  I think that is something we're going
to have to change in the long run anyway, so perhaps something you should
consider when designing related parts of MERGE.  I will try to look at
that portion of your patch maybe next month.

Thanks,
Amit




Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data

2018-11-27 Thread Tomas Vondra


On 11/24/18 12:20 AM, Tomas Vondra wrote:
> ...
> 
> OK, here's an updated patch, tweaking the reorderbuffer part. I plan
> to push this sometime mid next week.
> 

Pushed and backpatched to 9.4- (same as e9edc1ba).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Function to promote standby servers

2018-11-27 Thread Ian Barwick

On 11/19/2018 01:22 PM, Michael Paquier wrote:

On Fri, Oct 26, 2018 at 01:51:24PM +0900, Michael Paquier wrote:

Or we could use "the function returns true immediately after initiating
the promotion by sending the promotion signal to the postmaster"?  As a
native speaker which one feels more natural to you?


Sorry for the time it took.  After pondering on it, I have committed the
improvements from your version.


Thanks, looks good (and apologies for the delay in responding from my
side).


Regards


Ian Barwick


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
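For reference, the committed function can be driven both ways; as the quoted wording notes, with wait set to false it returns true immediately after sending the promotion signal to the postmaster, while the defaults wait for promotion to complete:

```sql
-- Wait (up to the default 60 seconds) for promotion to complete:
SELECT pg_promote();

-- Return true immediately after signalling the postmaster,
-- without waiting for the standby to finish promoting:
SELECT pg_promote(wait => false);
```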



Re: Function to promote standby servers

2018-11-27 Thread Michael Paquier
On Wed, Nov 28, 2018 at 10:06:34AM +0900, Ian Barwick wrote:
> Thanks, looks good (and apologies for the delay in responding from my
> side).

Thanks for double-checking, Ian.  I took my time as well ;)

(Hope to see you face-to-face in a couple of days around Akihabara.)
--
Michael


signature.asc
Description: PGP signature


Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data

2018-11-27 Thread Andres Freund
Hi,

On 2018-11-28 02:04:18 +0100, Tomas Vondra wrote:
> 
> On 11/24/18 12:20 AM, Tomas Vondra wrote:
> > ...
> > 
> > OK, here's an updated patch, tweaking the reorderbuffer part. I plan
> > to push this sometime mid next week.
> > 
> 
> Pushed and backpatched to 9.4- (same as e9edc1ba).

Backpatching seems on the more aggressive end of things for an
optimization. Could you at least announce that beforehand next time?

Greetings,

Andres Freund



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Nov 27, 2018 at 06:27:57PM -0500, Stephen Frost wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> >> Believe it or not, we have spent so much energy on this stuff that I
> >> am ready to give up on the whitelist patch and focus on other business.
> > 
> > I would have hoped that you'd see why I was concerned from the start
> > about this now that we have a released version of pg_verify_checksums
> > in 11.1 that is paper-bag-worthy.
> 
> That's unrelated.  Let's be clear, I still don't like the fact that we
> don't use a whitelist approach, per the arguments raised upthread.  The
> tool also clearly lacked testing and coverage from the beginning and has
> never been able to work on Windows, which was a very bad decision.
> Still we need to live with that and I take my share of the blame,
> willing to make this stuff work better for all.

I don't agree that it's unrelated and I'm disappointed that you feel it
is, but I'm not going to belabor the point.

> >> - skipfile should be called after making sure that we work on a file.
> > 
> > It's not clear to me what this is referring to, exactly...?  Can you
> > clarify?
> 
> Please see 0002 attached, which moves the call to skipfile() where I
> think it should go.

Alright, on a quick glance that seems ok.

> >> - it is necessary to exclude EXEC_BACKEND files.
> > 
> > Agreed, they should be added to the exclude list.
> 
> Base backups are impacted as well as this causes spurious warnings.

Right- could you please also add comments around the two lists to make
other hackers aware that the list shows up in two places and that any
changes should be made in both places..?

> Attached are two patches to fix all the mess:
> - 0001 is a revert of the whitelist, minus the set of regression tests
> checking after corrupted files and empty files.
> - 0002 is a fix for all the issues reported on this thread, with tests
> added (including the tablespace test from Michael Banck):
> -- Base backups gain EXEC_BACKEND files in their warning filters.
> -- pg_verify_checksums gains the same files.
> -- temporary files are filtered out.
> -- pg_verify_checksums performs filtering checks only on regular files,
> not on paths.
> 
> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to
> turn red on Windows if applied alone.  Can you know see my point?

Yes, I think they could be merged to address that, though I'm not sure
that it's necessarily a huge deal either, if they're going to be pushed
together.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-27 Thread Michael Paquier
On Tue, Nov 27, 2018 at 08:17:12PM -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> Please see 0002 attached, which moves the call to skipfile() where I
>> think it should go.
> 
> Alright, on a quick glance that seems ok.

Thanks.

>> Base backups are impacted as well, as this causes spurious warnings.
> 
> Right- could you please also add comments around the two lists to make
> other hackers aware that the list shows up in two places and that any
> changes should be made in both places..?

Good point.  Added in my local branch.

>> Attached are two patches to fix all the mess:
>> - 0001 is a revert of the whitelist, minus the set of regression tests
>> checking after corrupted files and empty files.
>> - 0002 is a fix for all the issues reported on this thread, with tests
>> added (including the tablespace test from Michael Banck):
>> -- Base backups gain EXEC_BACKEND files in their warning filters.
>> -- pg_verify_checksums gains the same files.
>> -- temporary files are filtered out.
>> -- pg_verify_checksums performs filtering checks only on regular files,
>> not on paths.
>> 
>> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to
> turn red on Windows if applied alone.  Can you now see my point?
> 
> Yes, I think they could be merged to address that, though I'm not sure
> that it's necessairly a huge deal either, if they're going to be pushed
> together.

This avoids noise failures when bisecting a regression, which matters in
some cases.  Still, perhaps you are right that splitting into two
commits would keep the history cleaner.

Let's wait a bit and see if others have extra opinions to offer.
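
For readers following along, the filtering described above amounts to
roughly the sketch below.  This is not the actual pg_verify_checksums
code: the skipped file names come from the PostgreSQL source, but the
helper itself is invented for illustration.

```c
/*
 * Hypothetical sketch of data-directory filtering for checksum
 * verification: only regular files are candidates, a small list of
 * non-checksummed files is skipped, and temporary files (which carry
 * the pgsql_tmp prefix) are ignored.
 */
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

static const char *const skip[] = {
	"pg_filenode.map", "pg_internal.init", "PG_VERSION", NULL
};

static bool
should_verify(const char *path)
{
	struct stat st;
	const char *name;

	/* perform checks only on regular files, not on paths in general */
	if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
		return false;

	name = strrchr(path, '/');
	name = name ? name + 1 : path;

	for (int i = 0; skip[i]; i++)
		if (strcmp(name, skip[i]) == 0)
			return false;

	/* temporary files are filtered out by their fixed prefix */
	if (strncmp(name, "pgsql_tmp", strlen("pgsql_tmp")) == 0)
		return false;

	return true;
}
```

The same name list would then be shared (or at least kept in sync) with
the base-backup warning filter, which is the duplication being discussed
above.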
--
Michael




"pg_ctl: the PID file ... is empty" at end of make check

2018-11-27 Thread Thomas Munro
Hello,

Today I saw a one-off case of $SUBJECT, on macOS.  I can't reproduce
it, but I noticed exactly the same thing on longfin the other day:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-25%2005%3A39%3A04

Anyone know what that's about?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data

2018-11-27 Thread Petr Jelinek
Hi,

On 28/11/2018 02:14, Andres Freund wrote:
> Hi,
> 
> On 2018-11-28 02:04:18 +0100, Tomas Vondra wrote:
>>
>> On 11/24/18 12:20 AM, Tomas Vondra wrote:
>>> ...
>>>
>>> OK, here's an updated patch, tweaking the reorderbuffer part. I plan
>>> to push this sometime mid next week.
>>>
>>
>> Pushed and backpatched to 9.4- (same as e9edc1ba).
> 
> Backpatching seems on the more aggressive end of things for an
> optimization. Could you at least announce that beforehand next time?
> 

Well, it may be an optimization, but from what I've seen the problems
arising from this can easily prevent logical replication from working
altogether, as the reorder buffer hits OOM on bigger tables. So ISTM
that it does warrant a backpatch.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-11-27 Thread Tatsuro Yamada

Hi,

On 2018/11/26 11:05, Tatsuro Yamada wrote:

Hi,

Attached patches are following:

* tab_completion_alter_index_set_statistics.patch
     - Add column name completion after ALTER COLUMN
     - Avoid schema completion after SET STATISTICS

* fix_manual_of_alter_index.patch
     - The ALTER INDEX ALTER COLUMN syntax accepts not only a column
   number but also a column name, so I fixed the manual.

* tab_completion_alter_table_set_statistics.patch
     - Avoid schema completion after SET STATISTICS


I couldn't include the patch details in my previous email, so I am
writing more explanation in this one.


* tab_completion_alter_index_set_statistics.patch
===
There are two problems.  You can use this DDL to set up before testing:
  #create table hoge (a integer, b integer);
  #create index ind_hoge on hoge (a, (a + b), (a * b));

  1) Can't get column names

# alter index ind_hoge alter column ... but can't complete.

  2) I expected no completion after "SET STATISTICS", since an integer
 should follow, but tab-completion gave schema names

# alter index ind_hoge alter column expr SET STATISTICS 
information_schema.  pg_catalog.  pg_temp_1.   pg_toast.
pg_toast_temp_1. public.

Applied the patch:

  1) We can get column names after "alter column".

# alter index ind_hoge alter column 
a  expr   expr1

  2) Fix!
# alter index ind_hoge alter column expr SET STATISTICS 
===


* fix_manual_of_alter_index_v2.patch (Attached on this email)
===
https://www.postgresql.org/docs/devel/sql-alterindex.html

The patch fixes the syntax on above manual.
  s/column_number/column_number \| column_name/
And also it adds an explanation for column_name.

Syntax of ALTER INDEX ALTER COLUMN SET STATISTICS on the manual is below:

  ALTER INDEX [ IF EXISTS ] name ALTER [ COLUMN ] column_number
  SET STATISTICS integer

But we can use not only column number but also column name like this:

  # alter index ind_hoge alter column 2 SET STATISTICS 100;
  ALTER INDEX
  # alter index ind_hoge alter column expr SET STATISTICS 100;
  ALTER INDEX
===


* tab_completion_alter_table_set_statistics.patch
===
For now, tab-completion gives schema names here, which is not appropriate.

  # alter table hoge alter column a set statistics 
  information_schema.  pg_catalog.  pg_temp_1.   pg_toast.  
  pg_toast_temp_1. public.

Applied the patch:

  # alter table hoge alter column a set statistics 
===

Any feedback is welcome. :)

Thanks,
Tatsuro Yamada
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml
index 6d34dbb74e..1b5b1f7ef1 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -27,7 +27,7 @@ ALTER INDEX name ATTACH PARTITION <
 ALTER INDEX name DEPENDS ON EXTENSION extension_name
 ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter = value [, ... ] )
 ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )
-ALTER INDEX [ IF EXISTS ] name ALTER [ COLUMN ] column_number
+ALTER INDEX [ IF EXISTS ] name ALTER [ COLUMN ] column_number | column_name
 SET STATISTICS integer
 ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
 SET TABLESPACE new_tablespace [ NOWAIT ]
@@ -137,14 +137,12 @@ ALTER INDEX ALL IN TABLESPACE name

 

-ALTER [ COLUMN ] column_number SET STATISTICS integer
+ALTER [ COLUMN ] column_number | column_name SET STATISTICS integer
 
  
   This form sets the per-column statistics-gathering target for
   subsequent  operations, though can
   be used only on index columns that are defined as an expression.
-  Since expressions lack a unique name, we refer to them using the
-  ordinal number of the index column.
  The target can be set in the range 0 to 10000; alternatively, set it
   to -1 to revert to using the system default statistics
   target ().
@@ -175,6 +173,15 @@ ALTER INDEX ALL IN TABLESPACE name
   
  
 
+ 
+  column_name
+  
+   
+The name of an existing index column.
+   
+  
+ 
+
  
   column_number
   

