Re: partition tree inspection functions

2018-10-19 Thread Michael Paquier
On Fri, Oct 19, 2018 at 01:05:52PM +0900, Amit Langote wrote:
> As I said above, the price of removing relhassubclass might be a bit
> steep.  So, the other alternative I mentioned before is to set
> relhassubclass correctly even for indexes if only for pg_partition_tree to
> be able to use find_inheritance_children unchanged.

Playing devil's advocate a bit more...  Another alternative here would
be to remove the fast path using relhassubclass from
find_inheritance_children and instead have its callers check for it :)

Anyway, it seems that you are right here.  Just setting relhassubclass
for partitioned indexes feels more natural with what's on HEAD now.
Even if I'd like to see all those hypothetical columns in pg_class go
away, that cannot happen without a close look at the performance
impact.
--
Michael


signature.asc
Description: PGP signature


Re: Protect syscache from bloating with negative cache entries

2018-10-19 Thread Kyotaro HORIGUCHI
Hello. Thank you for the comment.

At Thu, 4 Oct 2018 04:27:04 +, "Ideriha, Takeshi" 
 wrote in 
<4E72940DA2BF16479384A86D54D0988A6F1BCB6F@G01JPEXMBKW04>
> >As a *PoC*, in the attached patch (which applies to current master), size of 
> >CTups are
> >counted as the catcache size.
> >
> >It also provides pg_catcache_size system view just to give a rough idea of 
> >how such
> >view looks. I'll consider more on that but do you have any opinion on this?
> >
...
> Great! I like this view.
> One extreme idea would be adding all the members printed by
> CatCachePrintStats(), which is only enabled with -DCATCACHE_STATS at
> the moment.
> All of those members seem too much for customers who try to change the
> cache limit size, but some of them may be useful; for example, cc_hits
> would indicate that the current cache limit size is too small.

The attached patches introduce the four features below. (The features
on relcache and plancache are omitted.)

1. syscache stats collector (in 0002)

Records syscache status, consisting of the same columns as above plus
"ageclass" information. We could somehow trigger a stats report with a
signal, but we don't want to take/send/write the statistics in a
signal handler. Instead, tracking is turned on by setting
track_syscache_usage_interval to a positive number of milliseconds.

2. pg_stat_syscache view.  (in 0002)

This view shows catcache statistics. Statistics are collected only on
backends where syscache tracking is active.

>  pid  | application_name |relname |cache_name 
> |   size   |ageclass | nentries  
> --+--++---+--+-+---
>  9984 | psql | pg_statistic   | pg_statistic_relid_att_inh_index  
> | 12676096 | {30,60,600,1200,1800,0} | {17660,17310,55870,0,0,0}

The age class is the basis of the catcache truncation mechanism and shows
the distribution based on elapsed time since last access. As I
didn't come up with a more appropriate way, it is represented as two
arrays.  Ageclass stores the maximum age for each class in
seconds. Nentries holds the number of entries corresponding to the same
element in ageclass. In the above example,

 age class  : # of entries in the cache
   up to   30s  : 17660
   up to   60s  : 17310
   up to  600s  : 55870
   up to 1200s  : 0
   up to 1800s  : 0
   longer   : 0

 The ageclass boundaries are the {0, 0.05, 0.1, 1, 2, 3} multiples of
 cache_prune_min_age on the backend.

3. non-transactional GUC setting (in 0003)

This allows a GUC variable set with the action
GUC_ACTION_NONXACT (the name needs consideration) to survive beyond a
rollback. It is required for remote GUC setting to work
sanely. Without this feature, a value set remotely within a transaction
would disappear on rollback. The only local interface for
the NONXACT action is set_config(name, value, is_local=false,
is_nonxact=true). pg_set_backend_guc() below works on top of this
feature.

4. pg_set_backend_guc() function.

Of course, recording syscache statistics consumes a significant
amount of time, so it cannot usually be left turned on. On the other
hand, since the feature is controlled by a GUC, we would need access to
the active client connection to turn it on or off (which we don't
have). Instead, I provide a means to change GUC variables in
another backend.

pg_set_backend_guc(pid, name, value) sets the GUC variable "name"
on the backend "pid" to "value".



With the above tools, we can inspect the catcache statistics of a
seemingly bloated process.

A. Find a bloated process pid using ps or something.

B. Turn on syscache stats on the process.
  =# select pg_set_backend_guc(9984, 'track_syscache_usage_interval', '1');

C. Examine the statistics.

=# select pid, relname, cache_name, size from pg_stat_syscache order by size 
desc limit 3;
 pid  |   relname|cache_name|   size   
--+--+--+--
 9984 | pg_statistic | pg_statistic_relid_att_inh_index | 32154112
 9984 | pg_cast  | pg_cast_source_target_index  | 4096
 9984 | pg_operator  | pg_operator_oprname_l_r_n_index  | 4096


=# select * from pg_stat_syscache where cache_name = 
'pg_statistic_relid_att_inh_index'::regclass;
-[ RECORD 1 ]-
pid | 9984
relname | pg_statistic
cache_name  | pg_statistic_relid_att_inh_index
size| 11026176
ntuples | 77950
searches| 77950
hits| 0
neg_hits| 0
ageclass| {30,60,600,1200,1800,0}
nentries| {17630,16950,43370,0,0,0}
last_update | 2018-10-17 15:58:19.738164+09


> >> Another option is that users only specify the total memory target size
> >> and postgres dynamically change each CatCache memory target size according 
> >> to a
> >certain metric.
> >> (, which still seems difficult and expensive to develop per benefit)
> >> What do you think about this?

Problem about partitioned table

2018-10-19 Thread Mehman Jafarov
Hi everyone,

I have a problem with a partitioned table in PostgreSQL.
I am currently using version 10. I created the partitioned table in a test
environment but ran into some problems with partitioned table constraints.
I googled about it all last week, and from the official site I understood
that version 11 was about to be released and that this feature would be
supported.
From the version 11 documentation:
"*Add support for PRIMARY KEY, FOREIGN KEY, indexes, and triggers on
partitioned tables*"
I installed and configured version 11 yesterday, as soon as it was
released, and tested it. Unfortunately I still didn't succeed.
Either I don't understand the new feature, or this case is actually not
supported.
Please help me with this problem.

In my test environment the *CASE* is as follows (I used declarative
partitioning).

I had an *er_doc_to_user_relation* table. I partitioned this table by
list on the column *state*, and created two partitions as follows:

   CREATE TABLE xx.er_doc_to_user_state_1_3
     PARTITION OF xx.er_doc_to_user_relation
     (oid, created_date, state, status, updated_date, branch_oid,
      state_update_date, user_position, fk_action_owner,
      fk_action_owner_org, fk_document, fk_flow, fk_org,
      fk_prew_doc_user_rel, fk_user, fk_workflow, fk_action_login_type)
     FOR VALUES IN (1, 3);

   CREATE TABLE xx.er_doc_to_user_state_2_4_9
     PARTITION OF xx.er_doc_to_user_relation
     (oid, created_date, state, status, updated_date, branch_oid,
      state_update_date, user_position, fk_action_owner,
      fk_action_owner_org, fk_document, fk_flow, fk_org,
      fk_prew_doc_user_rel, fk_user, fk_workflow, fk_action_login_type)
     FOR VALUES IN (2, 4, 9);

After that I created constraints and indexes for each partition
manually. Everything is OK up to here. The problem appears when I try
to create a constraint on another table that references the
*er_doc_to_user_relation* table.
Case 1: Try to create a foreign key constraint referencing the parent
table *er_doc_to_user_relation*:

   ALTER TABLE xx.er_doc_workflow_action
     ADD CONSTRAINT fk_doc_work_act FOREIGN KEY (fk_to_user_doc_rel)
     REFERENCES xx.er_doc_to_user_relation (oid) MATCH SIMPLE
     ON UPDATE NO ACTION
     ON DELETE NO ACTION;

The following error occurred:

   ERROR: cannot reference partitioned table "er_doc_to_user_relation"
   SQL state: 42809

Because that is not supported, I tried the second case as follows.

Case 2: Try to create a foreign key constraint referencing each
partition separately (*er_doc_to_user_state_1_3*,
*er_doc_to_user_state_2_4_9*):

   ALTER TABLE xx.er_doc_workflow_action
     ADD CONSTRAINT fk_doc_work_act_1_3 FOREIGN KEY (fk_to_user_doc_rel)
     REFERENCES xx.er_doc_to_user_state_1_3 (oid) MATCH SIMPLE
     ON UPDATE NO ACTION
     ON DELETE NO ACTION;

The following error occurred:

   ERROR: insert or update on table "er_doc_workflow_action" violates
   foreign key constraint "fk_doc_work_act_1_3"
   DETAIL: Key (fk_to_user_doc_rel)=(3hjbzok1mn100g) is not present in
   table "er_doc_to_user_state_1_3".
   SQL state: 23503

I think this error is expected, because the referenced oid is in the
other partition, so the constraint can't validate all the data.
If I try to create the foreign key constraint on the second partition
instead, the same error occurs for the same reason.

  Note: I want to create the constraint on only a single column pair
(*fk_to_user_doc_rel - oid*).

BIG QUESTION IS THAT

How can I solve this problem?  What are your recommendations?

*PLEASE HELP ME !!!*

-- 
*Best Regards,*
*Mehman Jafarov*
*DBA Administrator at CyberNet LLC*


Re: Optimze usage of immutable functions as relation

2018-10-19 Thread Anthony Bykov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hello, 
I was trying to review your patch, but I couldn't build it:

prepjointree.c: In function ‘pull_up_simple_function’:
prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this 
function); did you mean ‘PGFunction’?
  Assert(!contain_vars_of_level((Node *) functions, 0));

Was it a typo?

The new status of this patch is: Waiting on Author


Re: relhassubclass and partitioned indexes

2018-10-19 Thread Amit Langote
Thanks for commenting.

On 2018/10/19 15:17, Michael Paquier wrote:
> On Fri, Oct 19, 2018 at 01:45:03AM -0400, Tom Lane wrote:
>> Amit Langote  writes:
>>> Should relhassubclass be set/reset for partitioned indexes?
>>
>> Seems like a reasonable idea to me, at least the "set" end of it.
>> We don't ever clear relhassubclass for tables, so maybe that's
>> not necessary for indexes either.

We *do* reset it opportunistically during analyze of an inheritance
parent; the following code in acquire_inherited_sample_rows() can reset it:

 /*
  * Check that there's at least one descendant, else fail.  This could
  * happen despite analyze_rel's relhassubclass check, if table once had a
  * child but no longer does.  In that case, we can clear the
  * relhassubclass field so as not to make the same mistake again later.
  * (This is safe because we hold ShareUpdateExclusiveLock.)
  */
 if (list_length(tableOIDs) < 2)
 {
 /* CCI because we already updated the pg_class row in this command */
 CommandCounterIncrement();
 SetRelationHasSubclass(RelationGetRelid(onerel), false);
 ereport(elevel,
 (errmsg("skipping analyze of \"%s.%s\" inheritance tree ---
this inheritance tree contains no child tables",
 get_namespace_name(RelationGetNamespace(onerel)),
RelationGetRelationName(onerel))));
 return 0;
 }


Similarly, perhaps we can make REINDEX on a partitioned index reset the
flag if it doesn't find any children.  We won't be able to do that today
though, because:

static void
ReindexPartitionedIndex(Relation parentIdx)
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("REINDEX is not yet implemented for partitioned
indexes")));
}

> No objections to the proposal.  Allowing find_inheritance_children to
> find index trees for partitioned indexes could be actually useful for
> extensions like pg_partman.

Thanks.  Attached is a patch to set relhassubclass when an index partition is
added to a partitioned index.

Regards,
Amit
>From f0c01ab41b35a5f21a90b0294d8216da78eb8882 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 19 Oct 2018 17:05:00 +0900
Subject: [PATCH 1/2] Set relhassubclass on index parents

---
 src/backend/catalog/index.c|  5 
 src/backend/commands/indexcmds.c   |  4 +++
 src/test/regress/expected/indexing.out | 46 +-
 src/test/regress/sql/indexing.sql  | 11 ++--
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f1ef4c319a..62cc6a5bb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -996,8 +996,13 @@ index_create(Relation heapRelation,
 
/* update pg_inherits, if needed */
if (OidIsValid(parentIndexRelid))
+   {
StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
 
+   /* Also, set the parent's relhassubclass. */
+   SetRelationHasSubclass(parentIndexRelid, true);
+   }
+
/*
 * Register constraint and dependencies for the index.
 *
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 3975f62c00..c392352871 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2608,6 +2608,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
systable_endscan(scan);
relation_close(pg_inherits, RowExclusiveLock);
 
+   /* If we added an index partition to parent, set its relhassubclass. */
+   if (OidIsValid(parentOid))
+   SetRelationHasSubclass(parentOid, true);
+
if (fix_dependencies)
{
ObjectAddress partIdx;
diff --git a/src/test/regress/expected/indexing.out 
b/src/test/regress/expected/indexing.out
index 225f4e9527..710b32192f 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1,25 +1,35 @@
 -- Creating an index on a partitioned table makes the partitions
 -- automatically get the index
 create table idxpart (a int, b int, c text) partition by range (a);
+-- relhassubclass of a partitioned index is false before creating its partition
+-- it will be set below once partitions get created
+create index check_relhassubclass_of_this on idxpart (a);
+select relhassubclass from pg_class where relname = 
'check_relhassubclass_of_this';
+ relhassubclass 
+
+ f
+(1 row)
+
+drop index check_relhassubclass_of_this;
 create table idxpart1 partition of idxpart for values from (0) to (10);
 create table idxpart2 partition of idxpart for values from (10) to (100)
partition by range (b);
 create table idxpart21 partition of idxpart2 for values from (0) to (100);
 create index on idxpart (a);
-select relname, relkind, inhparent::regclass
+select relname, relkind, relhassubclass, inhparent::regclass
 from pg_class left join pg_index ix on (indexrelid = 

Re: partition tree inspection functions

2018-10-19 Thread Amit Langote
On 2018/10/19 16:47, Michael Paquier wrote:
> On Fri, Oct 19, 2018 at 01:05:52PM +0900, Amit Langote wrote:
>> As I said above, the price of removing relhassubclass might be a bit
>> steep.  So, the other alternative I mentioned before is to set
>> relhassubclass correctly even for indexes if only for pg_partition_tree to
>> be able to use find_inheritance_children unchanged.
> 
> Playing devil's advocate a bit more...  Another alternative here would
> be to remove the fast path using relhassubclass from
> find_inheritance_children and instead have its callers check for it :)

Yeah, we could make it the responsibility of the callers of
find_all_inheritors and find_inheritance_children to check relhassubclass
as an optimization and remove any reference to relhassubclass from
pg_inherits.c.  Although we can write such a patch, it seems like it'd be
bigger than the patch to ensure the correct value of relhassubclass for
indexes, which I just posted on the other thread [1].

> Anyway, it seems that you are right here.  Just setting relhassubclass
> for partitioned indexes feels more natural with what's on HEAD now.
> Even if I'd like to see all those hypothetical columns in pg_class go
> away, that cannot happen without a close look at the performance
> impact.

Okay, I updated the patch on this thread.

Since the updated patch depends on the correct value of relhassubclass
being set for indexes, this patch should be applied on top of the other
patch.  I've attached here both.

Another change I made is something Robert and Alvaro seem to agree about
-- to use regclass instead of oid type as input/output columns.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/85d50b48-1b59-ae6c-e984-dd0b2926be3c%40lab.ntt.co.jp
>From f0c01ab41b35a5f21a90b0294d8216da78eb8882 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 19 Oct 2018 17:05:00 +0900
Subject: [PATCH 1/2] Set relhassubclass on index parents

---
 src/backend/catalog/index.c|  5 
 src/backend/commands/indexcmds.c   |  4 +++
 src/test/regress/expected/indexing.out | 46 +-
 src/test/regress/sql/indexing.sql  | 11 ++--
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f1ef4c319a..62cc6a5bb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -996,8 +996,13 @@ index_create(Relation heapRelation,
 
/* update pg_inherits, if needed */
if (OidIsValid(parentIndexRelid))
+   {
StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
 
+   /* Also, set the parent's relhassubclass. */
+   SetRelationHasSubclass(parentIndexRelid, true);
+   }
+
/*
 * Register constraint and dependencies for the index.
 *
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 3975f62c00..c392352871 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2608,6 +2608,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
systable_endscan(scan);
relation_close(pg_inherits, RowExclusiveLock);
 
+   /* If we added an index partition to parent, set its relhassubclass. */
+   if (OidIsValid(parentOid))
+   SetRelationHasSubclass(parentOid, true);
+
if (fix_dependencies)
{
ObjectAddress partIdx;
diff --git a/src/test/regress/expected/indexing.out 
b/src/test/regress/expected/indexing.out
index 225f4e9527..710b32192f 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1,25 +1,35 @@
 -- Creating an index on a partitioned table makes the partitions
 -- automatically get the index
 create table idxpart (a int, b int, c text) partition by range (a);
+-- relhassubclass of a partitioned index is false before creating its partition
+-- it will be set below once partitions get created
+create index check_relhassubclass_of_this on idxpart (a);
+select relhassubclass from pg_class where relname = 
'check_relhassubclass_of_this';
+ relhassubclass 
+
+ f
+(1 row)
+
+drop index check_relhassubclass_of_this;
 create table idxpart1 partition of idxpart for values from (0) to (10);
 create table idxpart2 partition of idxpart for values from (10) to (100)
partition by range (b);
 create table idxpart21 partition of idxpart2 for values from (0) to (100);
 create index on idxpart (a);
-select relname, relkind, inhparent::regclass
+select relname, relkind, relhassubclass, inhparent::regclass
 from pg_class left join pg_index ix on (indexrelid = oid)
left join pg_inherits on (ix.indexrelid = inhrelid)
where relname like 'idxpart%' order by relname;
- relname | relkind |   inhparent
--+-+
- idxpart | p   | 
- idxpart1| r   | 
- idxpart1_a_idx  | i   

Re: Function to promote standby servers

2018-10-19 Thread Laurenz Albe
Michael Paquier wrote:
> +   /* wait for up to a minute for promotion */
> +   for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i)
> +   {
> +   if (!RecoveryInProgress())
> +   PG_RETURN_BOOL(true);
> +
> +   pg_usleep(1000000L / WAITS_PER_SECOND);
> +   }
> I would recommend to avoid pg_usleep and instead use a WaitLatch() or
> similar to generate a wait event.  The wait can then also be seen in
> pg_stat_activity, which is useful for monitoring purposes.  Using
> RecoveryInProgress is indeed doable, and that's more simple than what I
> thought first.

Agreed, done.

I have introduced a new wait event, because I couldn't find one that fit.

> Something I missed to mention in the previous review: the timeout should
> be manually enforceable, with a default at 60s.

Ok, added as a new parameter "wait_seconds".

> Is the function marked as strict?  Per the code it should be, I am not
> able to test now though.

Yes, it is.

> You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in
> system_views.sql, or any users could trigger a promotion, no?

You are right *blush*.
Fixed.

Yours,
Laurenz Albe
From 08951fea7c526450d9a632ef0e6e246dd9dba307 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 19 Oct 2018 13:24:29 +0200
Subject: [PATCH] Add pg_promote() to promote standby servers

---
 doc/src/sgml/func.sgml | 21 ++
 doc/src/sgml/high-availability.sgml|  2 +-
 doc/src/sgml/recovery-config.sgml  |  3 +-
 src/backend/access/transam/xlog.c  |  6 --
 src/backend/access/transam/xlogfuncs.c | 82 ++
 src/backend/catalog/system_views.sql   |  8 +++
 src/backend/postmaster/pgstat.c|  3 +
 src/include/access/xlog.h  |  6 ++
 src/include/catalog/pg_proc.dat|  4 ++
 src/include/pgstat.h   |  3 +-
 src/test/recovery/t/004_timeline_switch.pl |  6 +-
 src/test/recovery/t/009_twophase.pl|  6 +-
 12 files changed, 138 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5193df3366..88121cdc66 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18731,6 +18731,9 @@ SELECT set_config('log_statement_stats', 'off', false);

 pg_terminate_backend

+   
+pg_promote
+   
 

 signal
@@ -18790,6 +18793,16 @@ SELECT set_config('log_statement_stats', 'off', false);
 however only superusers can terminate superuser backends.

   
+  
+   
+pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60)
+
+   boolean
+   Promote a physical standby server.  This function is restricted to
+superusers by default, but other users can be granted EXECUTE to run
+the function.
+   
+  
  
 

@@ -18827,6 +18840,14 @@ SELECT set_config('log_statement_stats', 'off', false);
 subprocess.

 
+   
+pg_promote can only be called on standby servers.
+If the argument wait is true,
+the function waits until promotion is complete or wait_seconds
+seconds have passed, otherwise the function returns immediately after sending
+the promotion signal to the postmaster.
+   
+
   
 
   
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index ebcb3daaed..f8e036965c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1472,7 +1472,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 

 To trigger failover of a log-shipping standby server,
-run pg_ctl promote or create a trigger
+run pg_ctl promote, call pg_promote(), or create a trigger
 file with the file name and path specified by the trigger_file
 setting in recovery.conf. If you're planning to use
 pg_ctl promote to fail over, trigger_file is
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 92825fdf19..d06cd0b08e 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -439,7 +439,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  
   Specifies a trigger file whose presence ends recovery in the
   standby.  Even if this value is not set, you can still promote
-  the standby using pg_ctl promote.
+  the standby using pg_ctl promote or calling
+  pg_promote().
   This setting has no effect if standby_mode is off.
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78ffc..62fc418893 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -78,12 +78,6 @@
 
 extern uint32 bootstrap_data_checksum_version;
 
-/* File path names (all relative to $PGDATA) */
-#define RECOVERY_COMMAND_FILE	"recovery.conf"
-#define RECOVERY_COMMAND_DONE	"recovery.done"
-#d

Speeding up text_position_next with multibyte encodings

2018-10-19 Thread Heikki Linnakangas
Attached is a patch to speed up text_position_setup/next(), in some 
common cases with multibyte encodings.


text_position_next() uses the Boyer-Moore-Horspool search algorithm, 
with a skip table. Currently, with a multi-byte encoding, we first 
convert both input strings to arrays of wchars, and run the algorithm on 
those arrays.


This patch removes the mb->wchar conversion, and instead runs the 
single-byte version of the algorithm directly on the inputs, even with 
multi-byte encodings. That avoids the mb->wchar conversion altogether, 
when there is no match. Even when there is a match, we don't need to 
convert the whole input string to wchars. It's enough to count the 
characters up to the match, using pg_mblen(). And when 
text_position_setup/next() are used as part of split_part() or replace() 
functions, we're not actually even interested in the character position, 
so we can skip that too.


Avoiding the large allocation is nice, too. That was actually why I 
started to look into this: a customer complained that text_position fails 
with strings larger than 256 MB.


Using the byte-oriented search on multibyte strings might return false 
positives, though. To make that work, after finding a match, we verify 
that the match doesn't fall in the middle of a multi-byte character. 
However, as an important special case, that cannot happen with UTF-8, 
because it has the property that the byte sequence of one character 
never appears as part of another character. I think some other encodings 
might have the same property, but I wasn't sure.


This is a win in most cases. One case is slower: calling position() with 
a large haystack input, where the match is near the end of the string. 
Counting the characters up to that point is slower than the mb->wchar 
conversion. I think we could avoid that regression too, if we had a more 
optimized function to count characters. Currently, the code calls 
pg_mblen() in a loop, whereas in pg_mb2wchar_with_len(), the similar 
loop through all the characters is a tight loop within the function.


Thoughts?

- Heikki
>From 1efb5ace7cf9da63f300942f15f9da2fddfb4de5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 19 Oct 2018 15:12:39 +0300
Subject: [PATCH 1/1] Use single-byte Boyer-Moore-Horspool search even with
 multibyte encodings.

The old implementation first converted the input strings to arrays of
wchars, and performed the search on those. However, the conversion is expensive,
and for a large input string, consumes a lot of memory.

Avoid the conversion, and instead use the single-byte algorithm even with
multibyte encodings. That has a couple of problems, though. Firstly, we
might get fooled if the needle string's byte sequence appears embedded
inside a different string. We might incorrectly return a match in the
middle of a multi-byte character. The second problem is that the
text_position function needs to return the position of the match, counted
in characters, rather than bytes. We can work around both of those problems
by an extra step after finding a match. Walk the string one character at a
time, starting from the beginning, until we reach the position of the match.
We can then check if the match was found at a valid character boundary,
which solves the first problem, and we can count the characters, so that we
can return the character offset. Walking the characters is faster than the
wchar conversion, especially in the case that there are no matches and we
can avoid it altogether. Also, avoiding the large allocation allows these
functions to work with inputs larger than 256 MB, even with multibyte
encodings.

For UTF-8, we can even skip the step to verify that the match lands on a
character boundary, because in UTF-8, the byte sequence of one character
cannot appear in the middle of a different character. I think many other
encodings have the same property, but I wasn't sure, so I only enabled
that optimization for UTF-8.

Most of the callers of the text_position_setup/next functions were actually
not interested in the position of the match, counted in characters. To
cater for them, refactor the text_position_next() interface into two parts:
searching for the next match (text_position_next()), and returning the
current match's position as a pointer (text_position_get_match_ptr()) or
as a character offset (text_position_get_match_pos()). Most callers of
text_position_setup/next are not interested in the character offsets, so
getting the pointer to the match within the original string is a more
convenient API for them. It also allows skipping the character counting
with UTF-8 encoding altogether.
---
 src/backend/utils/adt/varlena.c | 475 +---
 1 file changed, 253 insertions(+), 222 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a5e812d026..0c8c8b9a4f 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -43,18 +43,33 @@ int			byt

ERROR's turning FATAL in BRIN regression tests

2018-10-19 Thread John Naylor
Hi all,

I ran into a surprising behavior while hacking on the FSM delay patch.

I changed the signature of a freespace.c function that the BRIN code
calls, and this change by itself doesn't cause a crash. With the full
FSM patch, causing BRIN errors in manual queries in psql doesn't cause
a crash. However, during the BRIN regression tests, the queries that
purposely cause errors result in FATAL instead, causing a crash.

For example, brin_summarize_new_values() eventually leads to calling
index_open():

ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("\"%s\" is not an index",
RelationGetRelationName(r))));


But the elevel is 21 (FATAL) in this partial stack trace:

#0  0x5623aff8aed6 in proc_exit_prepare (code=1) at
/home/john/pgdev/postgresql/src/backend/storage/ipc/ipc.c:209
__func__ = "proc_exit_prepare"
#1  0x5623aff8adbb in proc_exit (code=1) at
/home/john/pgdev/postgresql/src/backend/storage/ipc/ipc.c:107
__func__ = "proc_exit"
#2  0x5623b01435c3 in errfinish (dummy=0) at
/home/john/pgdev/postgresql/src/backend/utils/error/elog.c:540
edata = 0x5623b06f63c0 
elevel = 21
oldcontext = 0x5623b2121b10
econtext = 0x0
__func__ = "errfinish"
#3  0x5623afbb9ab1 in index_open (relationId=52389, lockmode=4) at
/home/john/pgdev/postgresql/src/backend/access/index/indexam.c:159
__errno_location = 
r = 0x7f03773e1750
__func__ = "index_open"


If I comment out the tests that purposely cause errors, the BRIN tests
fail normally, but then another test in the same parallel group
crashes, causing all later tests to fail.

--- 1 
! psql: FATAL:  the database system is in recovery mode

and

*** 65,67 
--- 65,74 
  -- Modify fillfactor in existing index
  alter index spgist_point_idx set (fillfactor = 90);
  reindex index spgist_point_idx;
+ WARNING:  terminating connection because of crash of another server process
+ DETAIL:  The postmaster has commanded this server process to roll
back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
+ HINT:  In a moment you should be able to reconnect to the database
and repeat your command.
+ server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+ connection to server was lost


I'm thoroughly stumped -- anyone have an idea where to look next?

Thanks,
-John Naylor



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> Fine by me.

Thanks.  This is now committed after some tweaks to the comments, a bit
earlier than I thought first.
--
Michael


signature.asc
Description: PGP signature


Re: lowering pg_regress privileges on Windows

2018-10-19 Thread Andrew Dunstan



On 10/18/2018 08:25 PM, Thomas Munro wrote:

On Fri, Oct 19, 2018 at 1:13 PM Michael Paquier  wrote:

On Thu, Oct 18, 2018 at 08:31:11AM -0400, Andrew Dunstan wrote:

The attached ridiculously tiny patch solves the problem whereby while we can
run Postgres on Windows safely from an Administrator account, we can't
run the regression tests from the same account, since it fails on the
tablespace test, the tablespace directory having been set up without first
having lowered privileges. The solution is to lower pg_regress' privileges
in the same way that we do with other binaries. This is useful in setups
like Appveyor where running under any other account is ... difficult. For
the cfbot Thomas has had to make the script hack the schedule file to omit
the tablespace test. This would make that redundant.

I propose to backpatch this. It's close enough to a bug and the risk is
almost infinitely small.

+1.  get_restricted_token() refactoring has been done down to
REL9_5_STABLE.  With 9.4 and older you would need to copy again this
full routine into pg_regress.c, which is in my opinion not worth
worrying about.

FWIW here is a successful Appveyor build including the full test
schedule (CI patch attached in case anyone is interested).  Woohoo!
Thanks for figuring that out Andrew.  I will be very happy to remove
that wart from my workflows.

https://ci.appveyor.com/project/macdice/postgres/builds/19626669



Excellent. I'll apply back to 9.5 as Michael suggests.

Having got past that hurdle I encountered another one in the same area. 
pg_upgrade gives up its privileges and is then unable to write things 
like log files and analyze scripts.


The attached patch cures the problem, but it doesn't seem like the best 
cure. Maybe there is a more secure way to do it. Essentially it saves 
out the ACLs for the current directory and its subdirectories and then 
allows everyone to write to them, right before running pg_upgrade. When 
pg_upgrade is done it restores the saved ACLs.


Maybe someone who understands more about how this all works can suggest 
a less blunt force approach.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index ce5c976c16..0f25b44d0a 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -569,11 +569,14 @@ sub upgradecheck
 	$ENV{PGDATA} = "$data";
 	print "\nSetting up new cluster\n\n";
 	standard_initdb() or exit 1;
+	system('icacls . /save savedacls /t');
+	system('icacls . /grant "*S-1-1-0":(OI)(CI)M');
 	print "\nRunning pg_upgrade\n\n";
 	@args = (
 		'pg_upgrade', '-d', "$data.old", '-D', $data, '-b',
 		$bindir,  '-B', $bindir);
 	system(@args) == 0 or exit 1;
+	system('icacls . /restore savedacls');
 	print "\nStarting new cluster\n\n";
 	@args = ('pg_ctl', '-l', "$logdir/postmaster2.log", 'start');
 	system(@args) == 0 or exit 1;


WAL archive (archive_mode = always) ?

2018-10-19 Thread Adelino Silva
Hi,

What is the advantage of using archive_mode = always on a slave server
compared to archive_mode = on (shared WAL archive)?
I only see duplication of WAL files; what is the purpose of this feature?

Many thanks in advance,
Adelino.


Re: removing unnecessary get_att*() lsyscache functions

2018-10-19 Thread Michael Paquier
On Thu, Oct 18, 2018 at 09:57:00PM +0200, Peter Eisentraut wrote:
> I noticed that get_attidentity() isn't really necessary because the
> information can be obtained from an existing tuple descriptor in each
> case.

This one is also recent, so it looks fine to remove it.

> Also, get_atttypmod() hasn't been used since 2004.

GitHub is not actually reporting any areas where this is used either.

> I propose the attached patches to remove these two functions.

> - if (get_attidentity(RelationGetRelid(rel), attnum))
> + if (TupleDescAttr(RelationGetDescr(rel), attnum - 1)->attidentity)

I find this style heavy; saving the Form_pg_attribute into a separate
variable would be more readable in my opinion.
--
Michael


signature.asc
Description: PGP signature


Re: WAL archive (archive_mode = always) ?

2018-10-19 Thread Michael Paquier
On Fri, Oct 19, 2018 at 03:00:15PM +0100, Adelino Silva wrote:
> What is the advantage of using archive_mode = always on a slave server
> compared to archive_mode = on (shared WAL archive)?
> I only see duplication of WAL files; what is the purpose of this
> feature?

Some users like having redundancy in their backups and archives, so that
everything is stored in multiple locations.  archive_mode = always helps
in meeting these needs.
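
For illustration, with archive_mode = always a standby runs its own
archive_command on every WAL segment it receives, so it can keep an
independent archive.  A minimal sketch of such a configuration (the
archive path is illustrative):

```
# postgresql.conf on the standby -- /mnt/standby_archive is an example path
archive_mode = always
archive_command = 'test ! -f /mnt/standby_archive/%f && cp %p /mnt/standby_archive/%f'
```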
--
Michael


signature.asc
Description: PGP signature


Re: ERROR's turning FATAL in BRIN regression tests

2018-10-19 Thread Tom Lane
John Naylor  writes:
> I changed the signature of a freespace.c function that the BRIN code
> calls, and this change by itself doesn't cause a crash. With the full
> FSM patch, causing BRIN errors in manual queries in psql doesn't cause
> a crash. However, during the BRIN regression tests, the queries that
> purposely cause errors result in FATAL instead, causing a crash.

Sounds like something's messing up the backend's exception stack,
so that elog.c has noplace to throw the error to.  See the
promote-ERROR-to-FATAL logic therein.

regards, tom lane



Re: Problem about partitioned table

2018-10-19 Thread Adrian Klaver

On 10/19/18 2:03 AM, Mehman Jafarov wrote:

Hi everyone,

I have a problem with a partitioned table in PostgreSQL.
Actually I use version 10. I created the partitioned table in a test 
environment but faced some problems with a partitioned table constraint.
I googled it last week, and from the official site I learned that 
version 11 would be released and that this feature would be supported, as 
I understood it.

 From the version 11 documentation:
"Add support for PRIMARY KEY, FOREIGN KEY, indexes, and triggers on 
partitioned tables"
I installed and configured it yesterday, as the new version 11 was just 
released, and tested it. Unfortunately I did not succeed.
Either I don't understand the new feature, or this case is actually not 
supported.

Please help me about the problem.


As you found out:

https://www.postgresql.org/docs/11/static/ddl-partitioning.html

5.10.2.3. Limitations

"While primary keys are supported on partitioned tables, foreign keys 
referencing partitioned tables are not supported. (Foreign key 
references from a partitioned table to some other table are supported.)"





   Note: I want to create the constraint on only a one-to-one column 
(fk_to_user_doc_rel - oid)


BIG QUESTION IS THAT

How can I solve this problem?  What are your recommendations?


Well a FK is a form of a trigger, so maybe create your own trigger on 
the child table(s).
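
For example, a rough sketch of such a trigger (all table and column names
here are hypothetical, and this only covers the referencing side -- deletes
on the partitioned table would need their own trigger, and it lacks the
locking guarantees of a real FK):

```sql
-- Emulate the FK check with a constraint trigger, since foreign keys
-- referencing partitioned tables are not supported.
CREATE FUNCTION check_user_doc_fk() RETURNS trigger AS $$
BEGIN
    PERFORM 1 FROM user_doc WHERE id = NEW.fk_to_user_doc_rel;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'no row with id % in user_doc',
            NEW.fk_to_user_doc_rel
            USING ERRCODE = 'foreign_key_violation';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER user_doc_fk
    AFTER INSERT OR UPDATE ON child_table
    FOR EACH ROW EXECUTE PROCEDURE check_user_doc_fk();
```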




PLEASE HELP ME!!!

--
Best Regards,
Mehman Jafarov
DBA Administrator at CyberNet LLC



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> > Fine by me.
> 
> Thanks.  This is now committed after some tweaks to the comments, a bit
> earlier than I thought first.

I just saw this committed and I'm trying to figure out why we are
creating yet-another-list when it comes to deciding what should be
checksum'd and what shouldn't be.

Specifically, pg_basebackup (or, really,
src/backend/replication/basebackup.c) has 'is_checksummed_file' which
operates differently from pg_verify_checksum with this change, and that
seems rather wrong.

Maybe we need to fix both places but I *really* don't like this approach
of "well, we'll just guess if the file should have a checksum or not"
and it certainly seems likely that we'll end up forgetting to update
this when we introduce things in the future which have checksums (which
seems pretty darn likely to happen).

I also categorically disagree with the notion that it's ok for
extensions to dump files into our tablespace directories or that we
should try to work around random code dumping extra files in the
directories which we maintain- it's not like this additional code will
actually protect us from that, after all, and it's foolish to think we
really could protect against that.

Basically, I think this commit should be reverted; we should go back to
having a blacklist, update it in both places (and add comments to both
places to make it clear that this list exists in two different places),
and add code to handle the temp tablespace case explicitly.

Even better would be to find a way to expose the list and the code for
walking the directories and identifying the files which contain
checksums, so that we're only doing that in one place instead of
multiple different places.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Banck
Hi,

Am Freitag, den 19.10.2018, 10:36 -0400 schrieb Stephen Frost:
> Greetings,
> 
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> > > Fine by me.
> > 
> > Thanks.  This is now committed after some tweaks to the comments, a bit
> > earlier than I thought first.
> 
> I just saw this committed and I'm trying to figure out why we are
> creating yet-another-list when it comes to deciding what should be
> checksum'd and what shouldn't be.
> 
> Specifically, pg_basebackup (or, really,
> src/backend/replication/basebackup.c) has 'is_checksummed_file' which
> operates differently from pg_verify_checksum with this change, and that
> seems rather wrong.

To be fair, the list in src/backend/replication/basebackup.c was a copy-
paste from the one in pg_verify_checksums (or from other parts of the 
online activation patch).

I agree it makes sense to have both in sync or, better yet, factored out
in a central place, but I don't currently have further opinions on
whether it should be a black- or whitelist.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Tom Lane
Michael Banck  writes:
> Am Freitag, den 19.10.2018, 10:36 -0400 schrieb Stephen Frost:
>> I just saw this committed and I'm trying to figure out why we are
>> creating yet-another-list when it comes to deciding what should be
>> checksum'd and what shouldn't be.

> To be fair, the list in src/backend/replication/basebackup.c was a copy-
> paste from the one in pg_verify_checksums (or from other parts of the 
> online activation patch).

Seems like a job for a new(?) module in src/common/.

regards, tom lane



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Michael Banck  writes:
> > Am Freitag, den 19.10.2018, 10:36 -0400 schrieb Stephen Frost:
> >> I just saw this committed and I'm trying to figure out why we are
> >> creating yet-another-list when it comes to deciding what should be
> >> checksum'd and what shouldn't be.
> 
> > To be fair, the list in src/backend/replication/basebackup.c was a copy-
> > paste from the one in pg_verify_checksums (or from other parts of the 
> > online activation patch).
> 
> Seems like a job for a new(?) module in src/common/.

Yeah, that would be nice...  Even nicer would be a way for non-core PG
tools to be able to use it (we have basically the same thing in
pgbackrest, unsurprisingly), though I realize that's a much larger step.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Seems like a job for a new(?) module in src/common/.

> Yeah, that would be nice...  Even nicer would be a way for non-core PG
> tools to be able to use it (we have basically the same thing in
> pgbackrest, unsurprisingly), though I realize that's a much larger step.

We install libpgcommon.a (and, as of HEAD, libpgcommon_shlib.a) for
precisely that sort of reason.

regards, tom lane



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Seems like a job for a new(?) module in src/common/.
> 
> > Yeah, that would be nice...  Even nicer would be a way for non-core PG
> > tools to be able to use it (we have basically the same thing in
> > pgbackrest, unsurprisingly), though I realize that's a much larger step.
> 
> We install libpgcommon.a (and, as of HEAD, libpgcommon_shlib.a) for
> precisely that sort of reason.

Hmm, yeah, we might be able to use that for this..  One challenge we've
been thinking about has been dealing with multiple major versions of PG
with one build of pgbackrest since we'd really like to do things like
inspect the control file, but that changes between versions.  We do have
multi-version code in some places (quite a bit in pg_dump..) so perhaps
we could have libpgcommon support also.

Probably a topic for another thread.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-19 10:36:59 -0400, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> > > Fine by me.
> > 
> > Thanks.  This is now committed after some tweaks to the comments, a bit
> > earlier than I thought first.
> 
> I just saw this committed and I'm trying to figure out why we are
> creating yet-another-list when it comes to deciding what should be
> checksum'd and what shouldn't be.

I'm not sure it's fair to blame Michael here. He's picking up slack,
because the original authors of the tool didn't even bother to reply to
the issue for quite a while. pg_verify_checksum was obviously never
tested on Windows; it just didn't have tests showing that.


> I also categorically disagree with the notion that it's ok for
> extensions to dump files into our tablespace directories or that we
> should try to work around random code dumping extra files in the
> directories which we maintain- it's not like this additional code will
> actually protect us from that, after all, and it's foolish to think we
> really could protect against that.

I'm unconvinced. There already are extensions doing so, and it's not
like we've given them any sort of reasonable alternative. You can't just
create relations in a "registered" / "supported" kinda way right now, so
uh, yea, extension gotta make do.  And that has worked for many years.

Also, as made obvious here, it's pretty clear that there's platform
differences about which files exists and which don't, so it's not that a
blacklist approach automatically is more maintainable.

And I fail to see why a blacklist is architecturally better. There's
plenty of cases where we might want to create temporary files,
non-checksummed data or such that we'd need to teach a blacklist about,
but there's far fewer cases where we add new checksummed files. Basically
never since checksums have been introduced. And if checksums were
introduced for something new, say SLRUs, we'd certainly use
pg_verify_checksum during development.

Greetings,

Andres Freund



[Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-10-19 Thread Alexey Kondratov

Hi hackers,

Currently Postgres has options for continuous WAL file archiving, which 
is quite often used along with a master-replica setup. OK, then the worst 
happens and it's time to get your old master back and synchronize it 
with the new master (ex-replica) using pg_rewind. However, the required 
WAL files may already be archived, and pg_rewind will fail. You can copy 
these files manually, but it is difficult to work out which ones you need. 
Anyway, it complicates building a failover system with automatic failure 
recovery.


I expect that it would be a good idea to allow pg_rewind to look for a 
restore_command in the target data directory's recovery.conf, or to pass 
it as a command line argument. Then pg_rewind can use it to get missing 
WAL files from the archive. I had a few talks with DBAs and came to the 
conclusion that this is a highly requested feature.


I prepared a proof of concept patch (please, find attached), which does 
exactly what I described above. I played with it a little and it seems 
to be working, tests were accordingly updated to verify this archive 
retrieval functionality too.


Patch is relatively simple except for one part: if we want to parse 
recovery.conf (with all possible includes, etc.) and get the 
restore_command, then we should use the guc-file.l parser, which is 
heavily linked to the backend, e.g. in the error reporting part. So I 
copied it and made a frontend-safe version, guc-file-fe.l. Personally, I 
don't think it's a good idea, but nothing else came to mind. It is also 
possible to leave only one option -- passing restore_command as a command 
line argument.


What do you think?


--

Alexey Kondratov

Postgres Professional: https://www.postgrespro.com

Russian Postgres Company

diff --combined src/bin/pg_rewind/Makefile
index a22fef1352,2bcfcc61af..00
--- a/src/bin/pg_rewind/Makefile
+++ b/src/bin/pg_rewind/Makefile
@@@ -20,7 -20,6 +20,7 @@@ LDFLAGS_INTERNAL += $(libpq_pgport
  
  OBJS	= pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \
  	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o logging.o \
 +	guc-file-fe.o \
  	$(WIN32RES)
  
  EXTRA_CLEAN = xlogreader.c
diff --combined src/bin/pg_rewind/RewindTest.pm
index 8dc39dbc05,1dce56d035..00
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@@ -40,7 -40,6 +40,7 @@@ use Config
  use Exporter 'import';
  use File::Copy;
  use File::Path qw(rmtree);
 +use File::Glob;
  use IPC::Run qw(run);
  use PostgresNode;
  use TestLib;
@@@ -249,41 -248,6 +249,41 @@@ sub run_pg_rewin
  "--no-sync"
  			],
  			'pg_rewind remote');
 +	}
 +	elsif ($test_mode eq "archive")
 +	{
 +
 +		# Do rewind using a local pgdata as source and
 +		# specified directory with target WALs archive.
 +		my $wals_archive_dir = "${TestLib::tmp_check}/master_wals_archive";
 +		my $test_master_datadir = $node_master->data_dir;
 +		my @wal_files = glob "$test_master_datadir/pg_wal/000*";
 +		my $restore_command;
 +
 +		rmtree($wals_archive_dir);
 +		mkdir($wals_archive_dir) or die;
 +
 +		# Move all old master WAL files to the archive.
 +		# Old master should be stopped at this point.
 +		foreach my $wal_file (@wal_files)
 +		{
 +			move($wal_file, "$wals_archive_dir/") or die;
 +		}
 +
 +		$restore_command = "cp $wals_archive_dir/\%f \%p";
 +
 +		# Stop the new master and be ready to perform the rewind.
 +		$node_standby->stop;
 +		command_ok(
 +			[
 +'pg_rewind',
 +"--debug",
 +"--source-pgdata=$standby_pgdata",
 +"--target-pgdata=$master_pgdata",
 +"--no-sync",
 +"-R", $restore_command
 +			],
 +			'pg_rewind archive');
  	}
  	else
  	{
diff --combined src/bin/pg_rewind/parsexlog.c
index 11a9c26cd2,40028471bf..00
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@@ -12,7 -12,6 +12,7 @@@
  #include "postgres_fe.h"
  
  #include 
 +#include 
  
  #include "pg_rewind.h"
  #include "filemap.h"
@@@ -46,10 -45,7 +46,10 @@@ static char xlogfpath[MAXPGPATH]
  typedef struct XLogPageReadPrivate
  {
  	const char *datadir;
 +	const char *restoreCommand;
  	int			tliIndex;
 +	XLogRecPtr  oldrecptr;
 +	TimeLineID  oldtli;
  } XLogPageReadPrivate;
  
  static int SimpleXLogPageRead(XLogReaderState *xlogreader,
@@@ -57,10 -53,6 +57,10 @@@
     int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
     TimeLineID *pageTLI);
  
 +static bool RestoreArchivedWAL(const char *path, const char *xlogfname, 
 +	off_t expectedSize, const char *restoreCommand,
 +	const char *lastRestartPointFname);
 +
  /*
   * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
   * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
@@@ -68,19 -60,15 +68,19 @@@
   */
  void
  extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
 -			   XLogRecPtr endpoint)
 +			   ControlFileData *targetCF, const char *restore_command)
  {
  	XLogRecord *record;
 +	XLogRecPtr endpoint = targetCF->checkPoint;
  	XLog

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 10:36:59 -0400, Stephen Frost wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> > > On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> > > > Fine by me.
> > > 
> > > Thanks.  This is now committed after some tweaks to the comments, a bit
> > > earlier than I thought first.
> > 
> > I just saw this committed and I'm trying to figure out why we are
> > creating yet-another-list when it comes to deciding what should be
> > checksum'd and what shouldn't be.
> 
> I'm not sure it's fair to blame Michael here. He's picking up slack,
> because the original authors of the tool didn't even bother to reply to
> the issue for quite a while. pg_verify_checksum was obviously never
> tested on windows, it just didn't have tests showing that.

I wasn't really going for 'blame' here, just that we've now got two
different methods for doing the same thing, and those will give
different answers; presumably this issue impacts pg_basebackup too,
since the logic there wasn't changed.

> > I also categorically disagree with the notion that it's ok for
> > extensions to dump files into our tablespace diretories or that we
> > should try to work around random code dumping extra files in the
> > directories which we maintain- it's not like this additional code will
> > actually protect us from that, after all, and it's foolish to think we
> > really could protect against that.
> 
> I'm unconvinced. There already are extensions doing so, and it's not
> like we've given them any sort of reasonable alternative. You can't just
> create relations in a "registered" / "supported" kinda way right now, so
> uh, yea, extension gotta make do.  And that has worked for many years.

I don't agree that any of this is an argument for accepting random code
writing into arbitrary places in the data directory or tablespace
directories as being something which we support or which we write code
to work-around.

If this is a capability that extension authors need then I'm all for
having a way for them to have that capability in a clearly defined and
documented way- and one which we could actually write code to handle and
work with.

> Also, as made obvious here, it's pretty clear that there's platform
> differences about which files exists and which don't, so it's not that a
> blacklist approach automatically is more maintainable.

Platform differences are certainly something we can manage and we have
code in a few different places for doing exactly that.  A platform
difference is a heck of a lot more reasonable than trying to guess at
what random code that we've never seen before has done and try to work
around that.

> And I fail to see why a blacklist is architecturally better. There's
> plenty cases where we might want to create temporary files,
> non-checksummed data or such that we'd need a teach a blacklist about,
> but there's far fewer cases where add new checksummed files. Basically
> never since checksums have been introduced. And if checksums were
> introduced for something new, say slrus, we'd ceertainly use
> pg_verify_checksum during development.

In cases where we need something temporary, we're going to need to be
prepared to clean those up and we had better make it very plain that
they're temporary and easy to write code for.  Anything we aren't
prepared to blow away on a crash-restart should be checksum'd and in an
ideal world, we'd be looking to reduce the blacklist to only those
things which are temporary.  Of course, we may need different code for
calculating the checksum of different types of files, which moves it
from really being a 'blacklist' to a list of file-types and their
associated checksum'ing mechanism.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-19 13:52:28 -0400, Stephen Frost wrote:
> > > I also categorically disagree with the notion that it's ok for
> > > extensions to dump files into our tablespace diretories or that we
> > > should try to work around random code dumping extra files in the
> > > directories which we maintain- it's not like this additional code will
> > > actually protect us from that, after all, and it's foolish to think we
> > > really could protect against that.
> > 
> > I'm unconvinced. There already are extensions doing so, and it's not
> > like we've given them any sort of reasonable alternative. You can't just
> > create relations in a "registered" / "supported" kinda way right now, so
> > uh, yea, extension gotta make do.  And that has worked for many years.
> 
> I don't agree that any of this is an argument for accepting random code
> writing into arbitrary places in the data directory or tablespace
> directories as being something which we support or which we write code
> to work-around.
> 
> If this is a capability that extension authors need then I'm all for
> having a way for them to have that capability in a clearly defined and
> documented way- and one which we could actually write code to handle and
> work with.

cstore e.g. does this, and it's already out there. So yes, we should
provide better infrastructure. But for now, we gotta deal with reality,
unless we just want to gratuitously break things.


> > And I fail to see why a blacklist is architecturally better. There's
> > plenty of cases where we might want to create temporary files,
> > non-checksummed data or such that we'd need to teach a blacklist about,
> > but there's far fewer cases where we add new checksummed files. Basically
> > never since checksums have been introduced. And if checksums were
> > introduced for something new, say SLRUs, we'd certainly use
> > pg_verify_checksum during development.
> 
> In cases where we need something temporary, we're going to need to be
> prepared to clean those up and we had better make it very plain that
> they're temporary and easy to write code for.  Anything we aren't
> prepared to blow away on a crash-restart should be checksum'd and in an
> ideal world, we'd be looking to reduce the blacklist to only those
> things which are temporary.

There are pending patches that add support for pg_verify_checksums running
against a running cluster. We'll not just need to recognize files that
are there after a graceful shutdown, but anything that a cluster
can have there while running.


> Of course, we may need different code for
> calculating the checksum of different types of files, which moves it
> from really being a 'blacklist' to a list of file-types and their
> associated checksum'ing mechanism.

Which pretty much is a whitelist.  Given that we need a
filetype->checksumtype mapping or something, how is a blacklist a
reasonable approach for this?

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 13:52:28 -0400, Stephen Frost wrote:
> > > > I also categorically disagree with the notion that it's ok for
> > > > extensions to dump files into our tablespace directories or that we
> > > > should try to work around random code dumping extra files in the
> > > > directories which we maintain- it's not like this additional code will
> > > > actually protect us from that, after all, and it's foolish to think we
> > > > really could protect against that.
> > > 
> > > I'm unconvinced. There already are extensions doing so, and it's not
> > > like we've given them any sort of reasonable alternative. You can't just
> > > create relations in a "registered" / "supported" kinda way right now, so
> > > uh, yea, extension gotta make do.  And that has worked for many years.
> > 
> > I don't agree that any of this is an argument for accepting random code
> > writing into arbitrary places in the data directory or tablespace
> > directories as being something which we support or which we write code
> > to work-around.
> > 
> > If this is a capability that extension authors need then I'm all for
> > having a way for them to have that capability in a clearly defined and
> > documented way- and one which we could actually write code to handle and
> > work with.
> 
> cstore e.g. does this, and it's already out there. So yes, we should
> provide better infrastructure. But for now, we gotta deal with reality,
> unless we just want to gratuitously break things.

No, I don't agree that we are beholden to an external extension that
decided to start writing into directories that clearly belong to PG.

How do we even know that what cstore does in the tablespace directory
wouldn't get caught up in the checksum file pattern-matching that this
commit put in?  What if there was an extension which did create files
that would match, what would we do then?  I'm happy to go create one, if
that'd help make the point that this "pattern whitelist" isn't actually
a solution but is really rather just a hack that'll break in the future.

> > > And I fail to see why a blacklist is architecturally better. There's
> > > plenty of cases where we might want to create temporary files,
> > > non-checksummed data or such that we'd need to teach a blacklist about,
> > > but there's far fewer cases where we add new checksummed files. Basically
> > > never since checksums have been introduced. And if checksums were
> > > introduced for something new, say SLRUs, we'd certainly use
> > > pg_verify_checksum during development.
> > 
> > In cases where we need something temporary, we're going to need to be
> > prepared to clean those up and we had better make it very plain that
> > they're temporary and easy to write code for.  Anything we aren't
> > prepared to blow away on a crash-restart should be checksum'd and in an
> > ideal world, we'd be looking to reduce the blacklist to only those
> > things which are temporary.
> 
> There are pending patches that add support for pg_verify_checksums running
> against a running cluster. We'll not just need to recognize files that
> are there after a graceful shutdown, but anything that a cluster
> can have there while running.

Of course- the same is true with a crash/restart case, so I'm not sure
what you're getting at here.  I wasn't ever thinking of only the
graceful shutdown case- certainly pg_basebackup operates when the
cluster is running also and that's more what I was thinking about from
the start and which is part of what started the discussion about this
commit as that was entirely ignored in this change.

> > Of course, we may need different code for
> > calculating the checksum of different types of files, which moves it
> > from really being a 'blacklist' to a list of file-types and their
> > associated checksum'ing mechansim.
> 
> Which pretty much is a whitelist.  Given that we need a
> filetype->checksumtype mapping or something, how is a blacklist a
> reasonable approach for this?

We only need the mapping for special files- call that list of special
files a whitelist or a blacklist or a bluelist, it doesn't matter,
what's relevant here is that we don't skip over anything -- we account
for everything and make sure that everything that would survive a
crash/restart is checked or at least accounted for if we can't, yet,
check it.

Thanks!

Stephen




Re: Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread J Chapman Flack
On 10/19/2018 01:17 PM, Andres Freund wrote:
> On 2018-10-19 10:36:59 -0400, Stephen Frost wrote:
>
>> I also categorically disagree with the notion that it's ok for
>> extensions to dump files into our tablespace directories 
> 
> I'm unconvinced. There already are extensions doing so, and it's not
> like we've given them any sort of reasonable alternative. You can't just
> create relations in a "registered" / "supported" kinda way right now, so
> uh, yea, extension gotta make do.  And that has worked for many years.

For some of us following along at home, this got interesting right here.
Could someone elaborate on what extensions do this, for what purpose?
And what would it mean to create relations in a "registered" /
"supported" kinda way? Has that been the topic of a past discussion
I could catch up with?

-Chap



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
On 2018-10-19 14:08:19 -0400, J Chapman Flack wrote:
> On 10/19/2018 01:17 PM, Andres Freund wrote:
> > On 2018-10-19 10:36:59 -0400, Stephen Frost wrote:
> >
> >> I also categorically disagree with the notion that it's ok for
> >> extensions to dump files into our tablespace directories 
> > 
> > I'm unconvinced. There already are extensions doing so, and it's not
> > like we've given them any sort of reasonable alternative. You can't just
> > create relations in a "registered" / "supported" kinda way right now, so
> > uh, yea, extension gotta make do.  And that has worked for many years.
> 
> For some of us following along at home, this got interesting right here.
> Could someone elaborate on what extensions do this, for what purpose?
> And what would it mean to create relations in a "registered" /
> "supported" kinda way? Has that been the topic of a past discussion
> I could catch up with?

An FDW, like cstore, might be doing it, to store data, for example. The
registered / supported thing would be to have a catalog representation
of on-disk files that is represented in the catalogs somehow. Right now
you can't really do that because you need a proper relation (table,
index, ...) to get a relfilenode.

Greetings,

Andres Freund



Re: Optimze usage of immutable functions as relation

2018-10-19 Thread Aleksandr Parfenov
Hi,

Thank you for the review.
I fixed a typo and some comments. Please find the new version attached.

---
Best regards,
Parfenov Aleksandr

On Fri, 19 Oct 2018 at 16:40, Anthony Bykov  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> Hello,
> I was trying to review your patch, but I couldn't install it:
>
> prepjointree.c: In function ‘pull_up_simple_function’:
> prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this
> function); did you mean ‘PGFunction’?
>   Assert(!contain_vars_of_level((Node *) functions, 0));
>
> Was it a typo?
>
> The new status of this patch is: Waiting on Author
>
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index cd6e119..4a9d74a 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -23,7 +23,9 @@
  */
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "catalog/pg_proc.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -35,6 +37,8 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct pullup_replace_vars_context
@@ -86,6 +90,8 @@ static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte,
    bool deletion_ok);
 static Node *pull_up_simple_values(PlannerInfo *root, Node *jtnode,
 	  RangeTblEntry *rte);
+static Node *pull_up_simple_function(PlannerInfo *root, Node *jtnode,
+	  RangeTblEntry *rte);
 static bool is_simple_values(PlannerInfo *root, RangeTblEntry *rte,
  bool deletion_ok);
 static bool is_simple_union_all(Query *subquery);
@@ -598,6 +604,54 @@ inline_set_returning_functions(PlannerInfo *root)
 	}
 }
 
+static bool
+is_simple_stable_function(RangeTblEntry *rte)
+{
+	Form_pg_type type_form;
+	RangeTblFunction *tblFunction = linitial_node(RangeTblFunction, rte->functions);
+	FuncExpr   *expr = (FuncExpr *) tblFunction->funcexpr;
+	HeapTuple tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(expr->funcresulttype));
+	bool		result = false;
+
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for type %u", expr->funcresulttype);
+
+	type_form = (Form_pg_type) GETSTRUCT(tuple);
+
+	if (type_form->typtype == TYPTYPE_BASE &&
+		!type_is_array(expr->funcresulttype))
+	{
+		Form_pg_proc func_form;
+		ListCell   *arg;
+		bool		has_nonconst_input = false;
+		bool		has_null_input = false;
+
+		ReleaseSysCache(tuple);
+		tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(expr->funcid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for function %u", expr->funcid);
+		func_form = (Form_pg_proc) GETSTRUCT(tuple);
+
+		foreach(arg, expr->args)
+		{
+			if (IsA(lfirst(arg), Const))
+				has_null_input |= ((Const *) lfirst(arg))->constisnull;
+			else
+				has_nonconst_input = true;
+		}
+
+		result = func_form->prorettype != RECORDOID &&
+			func_form->prokind == PROKIND_FUNCTION &&
+			!func_form->proretset &&
+			func_form->provolatile == PROVOLATILE_IMMUTABLE &&
+			!has_null_input &&
+			!has_nonconst_input;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
+
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -728,6 +782,11 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
 			is_simple_values(root, rte, deletion_ok))
 			return pull_up_simple_values(root, jtnode, rte);
 
+		if (rte->rtekind == RTE_FUNCTION &&
+			list_length(rte->functions) == 1 &&
+			is_simple_stable_function(rte))
+			return pull_up_simple_function(root, jtnode, rte);
+
 		/* Otherwise, do nothing at this node. */
 	}
 	else if (IsA(jtnode, FromExpr))
@@ -1710,6 +1769,98 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
 	return NULL;
 }
 
+static Node *
+pull_up_simple_function(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
+{
+	Query	   *parse = root->parse;
+	int			varno = ((RangeTblRef *) jtnode)->rtindex;
+	List	   *functions_list;
+	List	   *tlist;
+	AttrNumber	attrno;
+	pullup_replace_vars_context rvcontext;
+	ListCell   *lc;
+
+	Assert(rte->rtekind == RTE_FUNCTION);
+	Assert(list_length(rte->functions) == 1);
+
+	/*
+	 * Need a modifiable copy of the functions list to hack on, just in case it's
+	 * multiply referenced.
+	 */
+	functions_list = copyObject(rte->functions);
+
+	/*
+	 * The FUNCTION RTE can't contain any Vars of level zero, let alone any that
+	 * are join aliases, so no need to flatten join alias Vars.
+	 */
+	Assert(!contain_vars_of_level((Node *) functions_list, 0));
+
+	/*
+	 * Set up required context data for pullup_replace_vars.  In particular,
+	 * we have to make the FUNCTION lis

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
On 2018-10-19 14:11:03 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2018-10-19 13:52:28 -0400, Stephen Frost wrote:
> > > > > I also categorically disagree with the notion that it's ok for
> > > > > extensions to dump files into our tablespace directories or that we
> > > > > should try to work around random code dumping extra files in the
> > > > > directories which we maintain- it's not like this additional code will
> > > > > actually protect us from that, after all, and it's foolish to think we
> > > > > really could protect against that.
> > > > 
> > > > I'm unconvinced. There already are extensions doing so, and it's not
> > > > like we've given them any sort of reasonable alternative. You can't just
> > > > create relations in a "registered" / "supported" kinda way right now, so
> > > > uh, yea, extension gotta make do.  And that has worked for many years.
> > > 
> > > I don't agree that any of this is an argument for accepting random code
> > > writing into arbitrary places in the data directory or tablespace
> > > directories as being something which we support or which we write code
> > > to work-around.
> > > 
> > > If this is a capability that extension authors need then I'm all for
> > > having a way for them to have that capability in a clearly defined and
> > > documented way- and one which we could actually write code to handle and
> > > work with.
> > 
> > cstore e.g. does this, and it's already out there. So yes, we should
> > provide better infrastructure. But for now, we gotta deal with reality,
> > unless we just want to gratuitously break things.
> 
> No, I don't agree that we are beholden to an external extension that
> decided to start writing into directories that clearly belong to PG.

Did it have an alternative?


> How do we even know that what cstore does in the tablespace directory
> wouldn't get caught up in the checksum file pattern-matching that this
> commit put in?

You listen to people?


> What if there was an extension which did create files that would
> match, what would we do then?  I'm happy to go create one, if that'd
> help make the point that this "pattern whitelist" isn't actually a
> solution but is really rather just a hack that'll break in the future.

Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
ever doubted that. How on earth is that a useful point for anything? The
point isn't that it's a good idea for extensions to do so, but that it
has happened because we didn't provide a better alternative.


> > > > And I fail to see why a blacklist is architecturally better. There's
> > > > plenty cases where we might want to create temporary files,
> > > > non-checksummed data or such that we'd need to teach a blacklist about,
> > > > but there's far fewer cases where we add new checksummed files. Basically
> > > > never since checksums have been introduced. And if checksums were
> > > > introduced for something new, say slrus, we'd certainly use
> > > > pg_verify_checksum during development.
> > > 
> > > In cases where we need something temporary, we're going to need to be
> > > prepared to clean those up and we had better make it very plain that
> > > they're temporary and easy to write code for.  Anything we aren't
> > > prepared to blow away on a crash-restart should be checksum'd and in an
> > > ideal world, we'd be looking to reduce the blacklist to only those
> > > things which are temporary.
> > 
> > There's pending patches that add support for pg_verify_checksums running
> > against a running cluster. We'll not just need to recognize files that
> > are there after a graceful shutdown, but with anything that a cluster
> > can have there while running.
> 
> Of course- the same is true with a crash/restart case, so I'm not sure
> what you're getting at here.

pg_verify_checksum doesn't support running on a crashed cluster, and I'm
not sure it'd make sense to teach it to do so (there's not really much it
could do to discern whether a block is a torn page that replay will fix,
or corrupted).


> I wasn't ever thinking of only the graceful shutdown case- certainly
> pg_basebackup operates when the cluster is running also and that's
> more what I was thinking about from the start and which is part of
> what started the discussion about this commit as that was entirely
> ignored in this change.

It was mainly ignored in the original pg_verify_checksums change, not in
the commit discussed here. That was broken. That built separate logic.


Both basebackup and verify checksums already only scan certain
directories. They *already* are encoding directory structure
information.



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 14:11:03 -0400, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > cstore e.g. does this, and it's already out there. So yes, we should
> > > provide better infrastructure. But for now, we gotta deal with reality,
> > > unless we just want to gratuituously break things.
> > 
> > No, I don't agree that we are beholden to an external extension that
> > decided to start writing into directories that clearly belong to PG.
> 
> Did it have an alternative?

Yes, work with us to come up with a way to accomplish what they wanted
without creating unknown/untracked files in our tablespaces.

Or, have their own mechanism for storing files on other filesystems.

And likely other options.  Saying that we should account for their
putting arbitrary files into the PG tablespace directories because they
"didn't have any other choice" seems pretty ridiculous, imv.

> > How do we even know that what cstore does in the tablespace directory
> > wouldn't get caught up in the checksum file pattern-matching that this
> > commit put in?
> 
> You listen to people?
> 
> > What if there was an extension which did create files that would
> > match, what would we do then?  I'm happy to go create one, if that'd
> > help make the point that this "pattern whitelist" isn't actually a
> > solution but is really rather just a hack that'll break in the future.
> 
> Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
> ever doubted that. How on earth is that a useful point for anything? The
> point isn't that it's a good idea for extensions to do so, but that it
> has happened because we didn't provide a better alternative.

The point is that you're making a case because you happen to know of an
extension which does this, but neither of us know about every extension
out there and I, at least, don't believe we should be coming up with
hacks to try and avoid looking at files that some extension has dumped
into the PG tablespace directories because that's never been a
documented or supported thing to do.

> > > > > And I fail to see why a blacklist is architecturally better. There's
> > > > > plenty cases where we might want to create temporary files,
> > > > > non-checksummed data or such that we'd need to teach a blacklist about,
> > > > > but there's far fewer cases where we add new checksummed files. Basically
> > > > > never since checksums have been introduced. And if checksums were
> > > > > introduced for something new, say slrus, we'd certainly use
> > > > > pg_verify_checksum during development.
> > > > 
> > > > In cases where we need something temporary, we're going to need to be
> > > > prepared to clean those up and we had better make it very plain that
> > > > they're temporary and easy to write code for.  Anything we aren't
> > > > prepared to blow away on a crash-restart should be checksum'd and in an
> > > > ideal world, we'd be looking to reduce the blacklist to only those
> > > > things which are temporary.
> > > 
> > > There's pending patches that add support for pg_verify_checksums running
> > > against a running cluster. We'll not just need to recognize files that
> > > are there after a graceful shutdown, but with anything that a cluster
> > > can have there while running.
> > 
> > Of course- the same is true with a crash/restart case, so I'm not sure
> > what you're getting at here.
> 
> pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> not sure it'd make sense to teach it to so (there's not really much it
> could do to discern whether a block is a torn page that replay will fix,
> or corrupted).

... and this isn't at all relevant, because pg_basebackup does run on a
running cluster.

> > I wasn't ever thinking of only the graceful shutdown case- certainly
> > pg_basebackup operates when the cluster is running also and that's
> > more what I was thinking about from the start and which is part of
> > what started the discussion about this commit as that was entirely
> > ignored in this change.
> 
> It was mainly ignored in the original pg_verify_checksums change, not in
> the commit discussed here. That was broken. That built separate logic.

As pointed out elsewhere on this thread, the logic was the same between
the two before this commit...  The code in pg_basebackup came from the
prior pg_verify_checksums code.  Certainly, some mention of the code
existing in two places, at least, should have been in the comments.

Also, just to be clear, as I tried to say up-thread, I'm not trying to
lay blame here or suggest that Michael shouldn't have been trying to fix
the issue that came up, but I don't agree that this is the way to fix
it, and, in any case, we need to make sure that pg_basebackup behaves
correctly as well.  As mentioned elsewhere, that'd be best done by
adding something to libpgcommon and then changing both pg_basebackup and
pg_verify_checksums to use that.

> Both basebackup an 

Google Code In Mentorship

2018-10-19 Thread Apurv Shah
Hello Sir/ Madam,

I participated last year in the Google Code-in Competition and I kept track
of the participating organizations. I feel that your organization fits my
interests and skill sets perfectly and is the best addition to the
participating organizations. I wish to mentor in the GCI this year. am
Apurv Shah, a freshman at UMass Amherst pursuing a Bachelor's Degree in
Computer Science. I have experience with SQL Server and C language along
with Python and Django. I hope I can mentor for your organization. I have
attached my resume for more information.


-- 
Best Regards,
Apurv Shah.




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-19 14:47:51 -0400, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2018-10-19 14:11:03 -0400, Stephen Frost wrote:
> > > * Andres Freund (and...@anarazel.de) wrote:
> > > > cstore e.g. does this, and it's already out there. So yes, we should
> > > > provide better infrastructure. But for now, we gotta deal with reality,
> > > > unless we just want to gratuitously break things.
> > > 
> > > No, I don't agree that we are beholden to an external extension that
> > > decided to start writing into directories that clearly belong to PG.
> > 
> > Did it have an alternative?
> 
> Yes, work with us to come up with a way to accomplish what they wanted
> without creating unknown/untracked files in our tablespaces.
> 
> Or, have their own mechanism for storing files on other filesystems.
> 
> And likely other options.  Saying that we should account for their
> putting arbitrary files into the PG tablespace directories because they
> "didn't have any other choice" seems pretty ridiculous, imv.
> 
> > > How do we even know that what cstore does in the tablespace directory
> > > wouldn't get caught up in the checksum file pattern-matching that this
> > > commit put in?
> > 
> > You listen to people?
> > 
> > > What if there was an extension which did create files that would
> > > match, what would we do then?  I'm happy to go create one, if that'd
> > > help make the point that this "pattern whitelist" isn't actually a
> > > solution but is really rather just a hack that'll break in the future.
> > 
> > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
> > ever doubted that. How on earth is that a useful point for anything? The
> > point isn't that it's a good idea for extensions to do so, but that it
> > has happened because we didn't provide a better alternative.
> 
> The point is that you're making a case because you happen to know of an
> extension which does this, but neither of us know about every extension
> out there and I, at least, don't believe we should be coming up with
> hacks to try and avoid looking at files that some extension has dumped
> into the PG tablespace directories because that's never been a
> documented or supported thing to do.

It's not a hack if it's a quite defensible choice on its own, and the
presence of such extensions is just one further argument in one
direction (for explicit whitelisting here).


> > > > > > And I fail to see why a blacklist is architecturally better. There's
> > > > > > plenty cases where we might want to create temporary files,
> > > > > > non-checksummed data or such that we'd need to teach a blacklist 
> > > > > > about,
> > > > > > but there's far fewer cases where we add new checksummed files. 
> > > > > > Basically
> > > > > > never since checksums have been introduced. And if checksums were
> > > > > > introduced for something new, say slrus, we'd certainly use
> > > > > > pg_verify_checksum during development.
> > > > > 
> > > > > In cases where we need something temporary, we're going to need to be
> > > > > prepared to clean those up and we had better make it very plain that
> > > > > they're temporary and easy to write code for.  Anything we aren't
> > > > > prepared to blow away on a crash-restart should be checksum'd and in 
> > > > > an
> > > > > ideal world, we'd be looking to reduce the blacklist to only those
> > > > > things which are temporary.
> > > > 
> > > > There's pending patches that add support for pg_verify_checksums running
> > > > against a running cluster. We'll not just need to recognize files that
> > > > are there after a graceful shutdown, but with anything that a cluster
> > > > can have there while running.
> > > 
> > > Of course- the same is true with a crash/restart case, so I'm not sure
> > > what you're getting at here.
> > 
> > pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> > not sure it'd make sense to teach it to do so (there's not really much it
> > could do to discern whether a block is a torn page that replay will fix,
> > or corrupted).
> 
> ... and this isn't at all relevant, because pg_basebackup does run on a
> running cluster.

I wasn't ever denying that or anything close to it?  My point is that
pg_verify_checksum needs much more filtering than it has now to deal
with that, because it needs to handle all files that could be present,
not just files that could be present after a graceful shutdown.

pg_basebackup's case is *not* really comparable because basebackup.c
already performed a lot of filtering before noChecksumFiles is applied.


> > > I wasn't ever thinking of only the graceful shutdown case- certainly
> > > pg_basebackup operates when the cluster is running also and that's
> > > more what I was thinking about from the start and which is part of
> > > what started the discussion about this commit as that was entirely
> > > ignored in this change.
> > 
> > It was mainly ignored in the original pg_verify_checksums change, no

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 14:47:51 -0400, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
> > > ever doubted that. How on earth is that a useful point for anything? The
> > > point isn't that it's a good idea for extensions to do so, but that it
> > > has happened because we didn't provide a better alternative.
> > 
> > The point is that you're making a case because you happen to know of an
> > extension which does this, but neither of us know about every extension
> > out there and I, at least, don't believe we should be coming up with
> > hacks to try and avoid looking at files that some extension has dumped
> > into the PG tablespace directories because that's never been a
> > documented or supported thing to do.
> 
> It's not a hack if it's a quite defensible choice on its own, and the
> presence of such extensions is just one further argument in one
> direction (for explicit whitelisting here).

You've not convinced me that an extension dropping files into our
tablespace directories is either something we should accept or that we
should code around such cases, nor that we are doing anything but
putting a hack in place to deal with what is pretty clearly a hack in
its own right, imv.

> > > > Of course- the same is true with a crash/restart case, so I'm not sure
> > > > what you're getting at here.
> > > 
> > > pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> > > not sure it'd make sense to teach it to do so (there's not really much it
> > > could do to discern whether a block is a torn page that replay will fix,
> > > or corrupted).
> > 
> > ... and this isn't at all relevant, because pg_basebackup does run on a
> > running cluster.
> 
> I wasn't ever denying that or anything close to it?  My point is that
> pg_verify_checksum needs much more filtering than it has now to deal
> with that, because it needs to handle all files that could be present,
> not just files that could be present after a graceful shutdown.

Perhaps it doesn't today but surely one goal of pg_verify_checksum is to
be able to run it on a running cluster eventually.

> pg_basebackup's case is *not* really comparable because basebackup.c
> already performed a lot of filtering before noChecksumFiles is applied.

This all really just points out that we should have the code for
handling this somewhere common that both pg_verify_checksum and
pg_basebackup can leverage without duplicating all of it.

The specific case that started all of this certainly looks pretty
clearly like a case that both need to deal with.

> > As pointed out elsewhere on this thread, the logic was the same between
> > the two before this commit...  The code in pg_basebackup came from the
> > prior pg_verify_checksums code.  Certainly, some mention of the code
> > existing in two places, at least, should have been in the comments.
> 
> But the filter for basebackup only comes after the pre-existing
> filtering that the basebackup.c code already does. All of:
> 
> /*
>  * List of files excluded from backups.
>  */
> static const char *excludeFiles[] =
> {
>   /* Skip auto conf temporary file. */
>   PG_AUTOCONF_FILENAME ".tmp",
> 
>   /* Skip current log file temporary file */
>   LOG_METAINFO_DATAFILE_TMP,
> 
>   /* Skip relation cache because it is rebuilt on startup */
>   RELCACHE_INIT_FILENAME,
> 
>   /*
>* If there's a backup_label or tablespace_map file, it belongs to a
>* backup started by the user with pg_start_backup().  It is *not* 
> correct
>* for this backup.  Our backup_label/tablespace_map is injected into 
> the
>* tar separately.
>*/
>   BACKUP_LABEL_FILE,
>   TABLESPACE_MAP,
> 
>   "postmaster.pid",
>   "postmaster.opts",
> 
>   /* end of list */
>   NULL
> };
> 
> is not applied, for example.  Nor is:
> 
>   /* Skip temporary files */
>   if (strncmp(de->d_name,
>   PG_TEMP_FILE_PREFIX,
>   strlen(PG_TEMP_FILE_PREFIX)) == 0)
>   continue;
> 
> Nor is
>   /* Exclude temporary relations */
>   if (isDbDir && looks_like_temp_rel_name(de->d_name))
>   {
>   elog(DEBUG2,
>"temporary relation file \"%s\" excluded from 
> backup",
>de->d_name);
> 
>   continue;
>   }
> 
> So, yea, they had:
> 
> static const char *const skip[] = {
>   "pg_control",
>   "pg_filenode.map",
>   "pg_internal.init",
>   "PG_VERSION",
>   NULL,
> };
> 
> in common, but not more.  All the above exclusions are strictly
> necessary.

... but all of that code doesn't change that pg_basebackup also needs to
ignore the EXEC_BACKEND created config_exec_params/.new files.

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
On 2018-10-19 15:57:28 -0400, Stephen Frost wrote:
> > > > > Of course- the same is true with a crash/restart case, so I'm not sure
> > > > > what you're getting at here.
> > > > 
> > > > pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> > > > not sure it'd make sense to teach it to do so (there's not really much it
> > > > could do to discern whether a block is a torn page that replay will fix,
> > > > or corrupted).
> > > 
> > > ... and this isn't at all relevant, because pg_basebackup does run on a
> > > running cluster.
> > 
> > I wasn't ever denying that or anything close to it?  My point is that
> > pg_verify_checksum needs much more filtering than it has now to deal
> > with that, because it needs to handle all files that could be present,
> > not just files that could be present after a graceful shutdown.
> 
> Perhaps it doesn't today but surely one goal of pg_verify_checksum is to
> be able to run it on a running cluster eventually.

I was saying *precisely* that above.  I give up.


> > pg_basebackup's case is *not* really comparable because basebackup.c
> > already performed a lot of filtering before noChecksumFiles is applied.
> 
> This all really just points out that we should have the code for
> handling this somewhere common that both pg_verify_checksum and
> pg_basebackup can leverage without duplicating all of it.

I never argued against that.  My point is that your argument that they
started out the same isn't true.


> The specific case that started all of this certainly looks pretty
> clearly like a case that both need to deal with.

Yep.


> > > As pointed out elsewhere on this thread, the logic was the same between
> > > the two before this commit...  The code in pg_basebackup came from the
> > > prior pg_verify_checksums code.  Certainly, some mention of the code
> > > existing in two places, at least, should have been in the comments.
> > 
> > But the filter for basebackup only comes after the pre-existing
> > filtering that the basebackup.c code already does. All of:
> > 
> > /*
> >  * List of files excluded from backups.
> >  */
> > static const char *excludeFiles[] =
> > {
> > /* Skip auto conf temporary file. */
> > PG_AUTOCONF_FILENAME ".tmp",
> > 
> > /* Skip current log file temporary file */
> > LOG_METAINFO_DATAFILE_TMP,
> > 
> > /* Skip relation cache because it is rebuilt on startup */
> > RELCACHE_INIT_FILENAME,
> > 
> > /*
> >  * If there's a backup_label or tablespace_map file, it belongs to a
> >  * backup started by the user with pg_start_backup().  It is *not* 
> > correct
> >  * for this backup.  Our backup_label/tablespace_map is injected into 
> > the
> >  * tar separately.
> >  */
> > BACKUP_LABEL_FILE,
> > TABLESPACE_MAP,
> > 
> > "postmaster.pid",
> > "postmaster.opts",
> > 
> > /* end of list */
> > NULL
> > };
> > 
> > is not applied, for example.  Nor is:
> > 
> > /* Skip temporary files */
> > if (strncmp(de->d_name,
> > PG_TEMP_FILE_PREFIX,
> > strlen(PG_TEMP_FILE_PREFIX)) == 0)
> > continue;
> > 
> > Nor is
> > /* Exclude temporary relations */
> > if (isDbDir && looks_like_temp_rel_name(de->d_name))
> > {
> > elog(DEBUG2,
> >  "temporary relation file \"%s\" excluded from 
> > backup",
> >  de->d_name);
> > 
> > continue;
> > }
> > 
> > So, yea, they had:
> > 
> > static const char *const skip[] = {
> > "pg_control",
> > "pg_filenode.map",
> > "pg_internal.init",
> > "PG_VERSION",
> > NULL,
> > };
> > 
> > in common, but not more.  All the above exclusions are strictly
> > necessary.
> 
> ... but all of that code doesn't change that pg_basebackup also needs to
> ignore the EXEC_BACKEND created config_exec_params/.new files.

Right.


> You're right, pg_verify_checksums, with the assumption that it only runs
> on a cleanly shut-down cluster, doesn't need the temp-file-skipping
> logic, today, but it's going to need it tomorrow, isn't it?

No, it needs it today, as explained below in the email you're replying
to.


> > FWIW, as far as I can tell, pg_verify_checksum appears to be broken in a
> > lot of (unfortunately) pretty normal scenarios right now. Not just on
> > windows. Besides the narrow window of crashing while a .tmp file is
> > present (and then shutting down normally after a crash restart), it also
> > has the much of wider window of crashing while *any* backend has
> > temporary files/relations.  As crash-restarts don't release temp files,
> > they'll still be present after the crash, and a single graceful shutdown
> > afterwards will leave them in place. No?
> 
> We do go through and do some cleanup at crash/restart

We don't clean up temp files tho.

 * NOTE: we could, but don't, call this during a post-backend-crash restart
 * cycle.  The argument for not doing it is that someone might want to examine
 * the temp files for debugging purposes.  This does however mean that
 * OpenTemporaryFile had better allow for collision with an existing temp
 * file name.

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 15:57:28 -0400, Stephen Frost wrote:
> > Perhaps it doesn't today but surely one goal of pg_verify_checksum is to
> > be able to run it on a running cluster eventually.
> 
> I was saying *precisely* that above.  I give up.

I'm glad we agree on that (I thought we did before, in fact, so it was
odd to see it coming up again).

> > > pg_basebackup's case is *not* really comparable because basebackup.c
> > > already performed a lot of filtering before noChecksumFiles is applied.
> > 
> > This all really just points out that we should have the code for
> > handling this somewhere common that both pg_verify_checksum and
> > pg_basebackup can leverage without duplicating all of it.
> 
> I never argued against that.  My point is that your argument that they
> started out the same isn't true.

I overstated the relationship, clearly, but it hardly matters, as you
say.

> > The specific case that started all of this certainly looks pretty
> > clearly like a case that both need to deal with.
> 
> Yep.

... and it's in the part of the code that was actually copied and was
the same.

> > > FWIW, as far as I can tell, pg_verify_checksum appears to be broken in a
> > > lot of (unfortunately) pretty normal scenarios right now. Not just on
> > > windows. Besides the narrow window of crashing while a .tmp file is
> > > present (and then shutting down normally after a crash restart), it also
> > > has the much wider window of crashing while *any* backend has
> > > temporary files/relations.  As crash-restarts don't release temp files,
> > > they'll still be present after the crash, and a single graceful shutdown
> > > afterwards will leave them in place. No?
> > 
> > We do go through and do some cleanup at crash/restart
> 
> We don't clean up temp files tho.
> 
>  * NOTE: we could, but don't, call this during a post-backend-crash restart
>  * cycle.  The argument for not doing it is that someone might want to examine
>  * the temp files for debugging purposes.  This does however mean that
>  * OpenTemporaryFile had better allow for collision with an existing temp
>  * file name.

Then, yes, we should go through and fix pg_verify_checksums to work
correctly in this case, likely using more-or-less the same code that
pg_basebackup has.  After that, we can add the remaining code to check
the last checkpoint and skip pages which have a more recent LSN...
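The "skip pages which have a more recent LSN" idea mentioned here can be sketched as follows.  The page-header layout is simplified to just the LSN halves and the checksum; this is an illustration of the technique, not the actual pg_verify_checksums code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified page header: the first 8 bytes of a PostgreSQL page hold
 * the LSN of the last WAL record that touched the page, stored as two
 * 32-bit halves.  (Layout reduced for illustration.)
 */
typedef struct PageHeaderSketch
{
    uint32_t    lsn_hi;         /* high half of the page LSN */
    uint32_t    lsn_lo;         /* low half of the page LSN */
    uint16_t    checksum;       /* page checksum */
} PageHeaderSketch;

/*
 * A page whose LSN is newer than the last checkpoint may be mid-write
 * while the cluster is online, so its on-disk checksum cannot be
 * trusted to match; skip verifying it instead of reporting a false
 * failure.
 */
bool
should_verify_page(const PageHeaderSketch *page, uint64_t checkpoint_lsn)
{
    uint64_t    page_lsn = ((uint64_t) page->lsn_hi << 32) | page->lsn_lo;

    return page_lsn <= checkpoint_lsn;
}
```

Hooked into the per-page verification loop, this would let online verification tolerate concurrent writes, at the price of silently skipping recently modified pages.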

> > As you say, some of those are likely to have checksums, which should be
> > handled by pg_basebackup and pg_verify_checksums, and that goes back to
> > the point I was making up-thread that we want to make sure an account
> > for everything.  With this pattern-based approach, we could easily end
> > up forgetting to add the correct new pattern into pg_verify_checksums.
> 
> If you're adding checksums for something, you better test it.  I don't buy
> this.  In contrast it's much more likely that there's a file that's
> short-lived that you won't easily test against pg_verify_checksum
> running in that moment.

The only justification for *not* doing this is that some extension
author might have dumped files into our tablespace directory, something
we've never claimed to support nor generally worried about in all the
time that I can recall before this.

As for saying that someone is obviously going to use pg_verify_checksums
to check their checksum code- I simply don't agree that we should be
happy to rely on that assumption when the only reason for any of this is
that some extension decided to do something that wasn't supported and
likely has issues in other parts of the code anyway (pg_basebackup would
happily copy those files too, even though there's obviously no code for
making sure it's a consistent copy or any such in pg_basebackup or
core).

> > Playing this further and assuming that extensions dropping files into
> > tablespace directories is acceptable, what are we supposed to do when
> > there's some direct conflict between a file that we want to create in a
> > PG tablespace directory and a file that some extension dropped there?
> > Create yet another subdirectory which we call
> > "THIS_IS_REALLY_ONLY_FOR_PG"?
> 
> Then it's a buggy extension. And we error out.

Extensions writing into directories they shouldn't be makes them buggy
even if the core code isn't likely to write to a particular filename,
imv.

> > What about two different extensions wanting to create files with the
> > same names in the tablespace directories..?
> > 
> > Experimentation is fine, of course, this is open source code and people
> > should feel free to play with it, but we are not obligated to avoid
> > breaking things when an extension author, through their experimentation,
> > does something which is clearly not supported, like dropping files into
> > PG's tablespace directories.  Further, when it comes to our user's data,
> > we should be taking a strict approach and accounting for everything,
something that this whitelist-of-patterns-based approach to finding
files to verify the checksums on doesn't do.

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-19 16:35:46 -0400, Stephen Frost wrote:
> The only justification for *not* doing this is that some extension
> author might have dumped files into our tablespace directory, something
> we've never claimed to support nor generally worried about in all the
> time that I can recall before this.

No, it's really not the only reason. As I said, testing is much less
likely to find cases where we're checksumming a short-lived file even
though we shouldn't, than making sure that checksummed files are
actually checksummed.


> > > Playing this further and assuming that extensions dropping files into
> > > tablespace directories is acceptable, what are we supposed to do when
> > > there's some direct conflict between a file that we want to create in a
> > > PG tablespace directory and a file that some extension dropped there?
> > > Create yet another subdirectory which we call
> > > "THIS_IS_REALLY_ONLY_FOR_PG"?
> > 
> > Then it's a buggy extension. And we error out.
> 
> Extensions writing into directories they shouldn't be makes them buggy
> even if the core code isn't likely to write to a particular filename,
> imv.

I'll just stop talking to you about this for now.


> > > What about two different extensions wanting to create files with the
> > > same names in the tablespace directories..?
> > > 
> > > Experimentation is fine, of course, this is open source code and people
> > > should feel free to play with it, but we are not obligated to avoid
> > > breaking things when an extension author, through their experimentation,
> > > does something which is clearly not supported, like dropping files into
> > > PG's tablespace directories.  Further, when it comes to our user's data,
> > > we should be taking a strict approach and accounting for everything,
> > > something that this whitelist-of-patterns-based approach to finding
> > > files to verify the checksums on doesn't do.
> > 
> > It's not economically feasible to work on extensions that will only be
> > usable a year down the road.
> 
> I listed out multiple other solutions to this problem which you
> summarily ignored.

Except that none of them really achieves the goals you can achieve by
having the files in the database (like DROP DATABASE working, for
starters).


> It's unfortunate that those other solutions weren't used and that,
> instead, this extension decided to drop files into PG's tablespace
> directories, but that doesn't mean we should condone or implicitly
> support that action.

This just seems pointlessly rigid. Our extension related APIs aren't
generally very good or well documented. Most non-trivial extensions that
add interesting things to the postgres ecosystem are going to need a few
not so pretty things to get going.


> I stand by my position that this patch should be reverted (with no
> offense or ill-will towards Michael, of course, I certainly appreciate
> his efforts to address the issues with pg_verify_checksums) and that we
> should move more of this code to identify files to verify the checksums
> on into libpgcommon and then use it from both pg_basebackup and
> pg_verify_checksums, to the extent possible, but that we make sure to
> account for all of the files in our tablespace and database directories.

What does that do, except break things that currently work? Sure, work
on a patch that fixes the architectural concerns, but what's the point
in reverting it until that's done?


> To the extent that this is an issue for extension authors, perhaps it
> would encourage them to work with us to have supported mechanisms
> instead of using hacks like dropping files into our tablespace
> directories and such instead of using another approach to manage files
> across multiple filesystems.  I'd be kind of surprised if they really
> had an issue doing that and hopefully everyone would feel better about
> what these extensions are doing once they start using a mechanism that
> we actually support.

Haribabu has a patch, but it's on-top of pluggable storage, so not
exactly a small prerequisite.

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 16:35:46 -0400, Stephen Frost wrote:
> > The only justification for *not* doing this is that some extension
> > author might have dumped files into our tablespace directory, something
> > we've never claimed to support nor generally worried about in all the
> > time that I can recall before this.
> 
> No, it's really not the only reason. As I said, testing is much less
> likely to find cases where we're checksumming a short-lived file even
> though we shouldn't, than making sure that checksummed files are
> actually checksummed.

While I agree that we might possibly end up trying to checksum a
short-lived and poorly-named file, I'd rather that than risk missing the
checking of a file which does have checksums that should have been
checked and claiming that we checked all of the checksums.

> > > > What about two different extensions wanting to create files with the
> > > > same names in the tablespace directories..?
> > > > 
> > > > Experimentation is fine, of course, this is open source code and people
> > > > should feel free to play with it, but we are not obligated to avoid
> > > > breaking things when an extension author, through their experimentation,
> > > > does something which is clearly not supported, like dropping files into
> > > > PG's tablespace directories.  Further, when it comes to our user's data,
> > > > we should be taking a strict approach and accounting for everything,
> > > > something that this whitelist-of-patterns-based approach to finding
> > > > files to verify the checksums on doesn't do.
> > > 
> > > It's not economically feasible to work on extensions that will only be
> > > usable a year down the road.
> > 
> > I listed out multiple other solutions to this problem which you
> > summarily ignored.
> 
> Except that none of them really achieves the goals you can achieve by
> having the files in the database (like DROP DATABASE working, for
> starters).

The only reason things like DROP DATABASE "work" in this example case is
exactly that it assumes that all of the files which exist are ones which
we put there.  Or, to put it another way, if we're only going to
checksum files based on some whitelist of files we expect to be there,
shouldn't we go back and make things like DROP DATABASE only remove
those files that we expect to be there and not random other files that
we have no knowledge of..?

> > It's unfortunate that those other solutions weren't used and that,
> > instead, this extension decided to drop files into PG's tablespace
> > directories, but that doesn't mean we should condone or implicitly
> > support that action.
> 
> This just seems pointlessly rigid. Our extension related APIs aren't
> generally very good or well documented. Most non-trivial extensions that
> add interesting things to the postgres ecosystem are going to need a few
> not so pretty things to get going.

PostGIS is a fantastic example of an extension that is far from trivial,
extends PG in ways which are clearly supported, and has worked with PG
to improve things in core to be able to then use in the extension.

> > I stand by my position that this patch should be reverted (with no
> > offense or ill-will towards Michael, of course, I certainly appreciate
> > his efforts to address the issues with pg_verify_checksums) and that we
> > should move more of this code to identify files to verify the checksums
> > on into libpgcommon and then use it from both pg_basebackup and
> > pg_verify_checksums, to the extent possible, but that we make sure to
> > account for all of the files in our tablespace and database directories.
> 
> What does that do, except break things that currently work? Sure, work
> on a patch that fixes the architectural concerns, but what's the point
> in reverting it until that's done?

As you pointed out previously, the current code *doesn't* work, before
or after this change, and we clearly need to rework this to move things
into libpgcommon and also fix pg_basebackup.  Reverting this would at
least get us back to having similar code between this and pg_basebackup,
and then it'll be cleaner and clearer to have one patch which moves that
similar logic into libpgcommon and fixes the missing exceptions for the
EXEC_BACKEND case.

Keeping the patch doesn't do anything for the pg_basebackup case, and
confuses the issue by having these filename-pattern-whitelists which
weren't there before and that should be removed, imv.

Thanks!

Stephen




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-19 17:32:58 -0400, Stephen Frost wrote:
> As you pointed out previously, the current code *doesn't* work, before
> or after this change, and we clearly need to rework this to move things
> into libpgcommon and also fix pg_basebackup.  Reverting this would at
> least get us back to having similar code between this and pg_basebackup,
> and then it'll be cleaner and clearer to have one patch which moves that
> similar logic into libpgcommon and fixes the missing exceptions for the
> EXEC_BACKEND case.
> 
> Keeping the patch doesn't do anything for the pg_basebackup case, and
> confuses the issue by having these filename-pattern-whitelists which
> weren't there before and that should be removed, imv.

The current state has pg_verify_checksum work on windows for casual
usage, which includes the regression tests that previously were broken.
Whereas reverting it has the advantage that a fixup diff would be a few
lines smaller.  If you want to push a revert together with a fix, be my
guest, but until that time it seems unhelpful to revert.

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 17:32:58 -0400, Stephen Frost wrote:
> > As you pointed out previously, the current code *doesn't* work, before
> > or after this change, and we clearly need to rework this to move things
> > into libpgcommon and also fix pg_basebackup.  Reverting this would at
> > least get us back to having similar code between this and pg_basebackup,
> > and then it'll be cleaner and clearer to have one patch which moves that
> > similar logic into libpgcommon and fixes the missing exceptions for the
> > EXEC_BACKEND case.
> > 
> > Keeping the patch doesn't do anything for the pg_basebackup case, and
> > confuses the issue by having these filename-pattern-whitelists which
> > weren't there before and that should be removed, imv.
> 
> The current state has pg_verify_checksum work on windows for casual
> usage, which includes the regression tests that previously were broken.
> Whereas reverting it has the advantage that a fixup diff would be a few
> lines smaller.  If you want to push a revert together with a fix, be my
> guest, but until that time it seems unhelpful to revert.

Clearly, pg_basebackup *doesn't* work though, so it's at best only half
a fix and only because our regression tests don't cover the
pg_basebackup case (which is a sad state of affairs, to say the least,
but an independent issue).

I'm not in any specific rush on this and I hope to hear Michael's
thoughts once he's had a chance to review our discussion; it's his
commit, after all.  Michael's initial patch is in the archives also, so
my suggestion would be to revert this one and then take Michael's prior
patch and add to it the fix of pg_basebackup, in the same manner, and
commit those changes together with additional comments to note that the
code exists in both places.

We could then work on moving the logic into libpgcommon, in master.

Thanks!

Stephen




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andrew Dunstan




On 10/19/2018 05:32 PM, Stephen Frost wrote:


As you pointed out previously, the current code *doesn't* work, before
or after this change, and we clearly need to rework this to move things
into libpgcommon and also fix pg_basebackup.  Reverting this would at
least get us back to having similar code between this and pg_basebackup,
and then it'll be cleaner and clearer to have one patch which moves that
similar logic into libpgcommon and fixes the missing exceptions for the
EXEC_BACKEND case.

Keeping the patch doesn't do anything for the pg_basebackup case, and
confuses the issue by having these filename-pattern-whitelists which
weren't there before and that should be removed, imv.




I don't think just reverting it is really acceptable. The patch was a 
response to buildfarm breakage, and moreover was published and discussed 
before it was applied. If you don't like it I think you need to publish 
a better solution that will not involve reintroducing the buildfarm 
error. I don't have a strong opinion about the mechanism. The current 
conversation does seem to me to be generating more heat than light, TBH. 
But I do have a strong opinion about not having to enable/disable the 
TAP test in question constantly.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Tom Lane
Andrew Dunstan  writes:
> I don't think just reverting it is really acceptable.

+several.  I do not mind somebody writing and installing a better fix.
I do object to turning the buildfarm red again.

regards, tom lane



Contribution to postgresql

2018-10-19 Thread Nicky Larson
Hi,

I am using postgresql at work, and I would like to contribute.

From the todo list, I chose:

Allow log_min_messages to be specified on a per-module basis

Is this feature still wanted?  This would be my first contribution to
PostgreSQL; is it an easy one?


Thanks,
Saïd


Re: WAL archive (archive_mode = always) ?

2018-10-19 Thread Jeff Janes
On Fri, Oct 19, 2018 at 10:00 AM Adelino Silva <
adelino.j.si...@googlemail.com> wrote:

> Hi,
>
> What is the advantage to use archive_mode = always in a slave server
> compared to archive_mode = on (shared WAL archive) ?
> I only see duplication of Wal files, what is the purpose of this feature ?
>

You might not have a shared wal archive in the first place.  For example,
your only access to the master is through the streaming replication
protocol, but you want to maintain a local WAL archive anyway so you can
PITR locally for testing or debugging purposes.
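Concretely, such a standby could keep a local archive with settings along these lines (the archive path and copy command are illustrative):

```
# postgresql.conf on the standby -- archive even while in recovery
archive_mode = always
archive_command = 'test ! -f /mnt/standby_archive/%f && cp %p /mnt/standby_archive/%f'
```

With archive_mode = on, a standby does not run archive_command at all; always makes it archive every WAL segment it receives via streaming or restores from archive.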

Cheers,

Jeff


vacuum fails with "could not open statistics file" "Device or resource busy"

2018-10-19 Thread Andres Freund
Hi,

buildfarm member lorikeet had an interesting buildfarm failure in the
logical decoding test. The failure looks unrelated to logical decoding,
but might be more widely relevant:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2018-10-19%2011%3A22%3A34
  VACUUM FULL pg_class;
+ WARNING:  could not open statistics file "pg_stat_tmp/global.stat": Device or resource busy

Now that animal is cygwin based, so maybe it's just some random
weirdness. But ISTM it could also indicate a bug of some sort.

Were it native windows, I'd assume we'd keep a file open with the wrong
flags or such...

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Fri, Oct 19, 2018 at 06:43:18PM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I don't think just reverting it is really acceptable.
> 
> +several.  I do not mind somebody writing and installing a better fix.
> I do object to turning the buildfarm red again.

I did not expect this thread to turn into a war zone.  Anyway, there are
a couple of things I agree with on this thread:
- I agree with Andres point here:
https://postgr.es/m/20181019171747.4uithw2sjkt6m...@alap3.anarazel.de
A blacklist is fundamentally more difficult to maintain as there are
way more things added in a data folder which do not have data checksums
than things which have checksums.  So using a blacklist approach looks
unmaintainable in the long term.  Future patches related to enabling
online checksum verification make me worry if we keep the code like
that.  I can also easily imagine that anybody willing to use the
pluggable storage API would like to put new files in tablespace-related
data folders, relying on "base/" being the default system tablespace.
- I agree with Stephen's point that we should decide if a file has
checksums or not in a single place, and that we should use the same
logic for base backups and pg_verify_checksums.
- I agree with not doing a simple revert to not turn the buildfarm red
again.  This is annoying for animal maintainers.  Andrew has done very
nice work in manually disabling those tests temporarily.
- The base backup logic deciding if a file has checksums looks broken to
me: it misses files generated by EXEC_BACKEND, and any instance of
Postgres using an extension with custom files and data checksums has its
backups broken.  cstore_fdw has been mentioned above, and I recall that
Heroku for example enables data checksums.  If you combine both, it
basically means that such an instance cannot take base backups anymore
while it was able to do so with pre-10 with default options.  That's not
cool.

So what I think we ought to do is the following:
- Start a new thread, this one about TAP tests is not adapted.
- Add in src/common/relpath.c the API from d55241af called
isRelFileName(), make use of it in the base backup code, and basically
remove is_checksummed_file() and the checksum skip list.
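For readers following along, the whitelist approach amounts to accepting only names of the shape <digits>[_fsm|_vm|_init][.<digits>].  A sketch of such a check, modeled on (but not identical to) the isRelFileName() from d55241af:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Whitelist check: does "name" look like a relation data file?
 * Accepted shape: <relfilenode digits>, optionally "_fsm", "_vm" or
 * "_init" for the extra forks, optionally ".<segment digits>".
 * (A sketch of the API discussed above, not the committed code.)
 */
bool
looks_like_rel_filename(const char *name)
{
    const char *p = name;

    /* the relfilenode digits are mandatory */
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;

    /* optional fork suffix */
    if (strncmp(p, "_fsm", 4) == 0)
        p += 4;
    else if (strncmp(p, "_vm", 3) == 0)
        p += 3;
    else if (strncmp(p, "_init", 5) == 0)
        p += 5;

    /* optional segment number */
    if (*p == '.')
    {
        p++;
        if (!isdigit((unsigned char) *p))
            return false;
        while (isdigit((unsigned char) *p))
            p++;
    }

    return *p == '\0';
}
```

Anything failing this test -- EXEC_BACKEND config files, extension-created files, stray editor droppings -- simply isn't a checksummed relation file and gets skipped, which is why the whitelist stays maintainable where a blacklist would not.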
--
Michael




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-20 12:39:55 +0900, Michael Paquier wrote:
> - I agree with Stephen's point that we should decide if a file has
> checksums or not in a single place, and that we should use the same
> logic for base backups and pg_verify_checksums.

To be clear, I wholeheartedly agree on that.  We probably only want to
tackle that for "is checksummable file" in a backpatchable fashion, but
we probably should move to having more common infrastructure like that.


> So what I think we ought to do is the following:
> - Start a new thread, this one about TAP tests is not adapted.
> - Add in src/common/relpath.c the API from d55241af called
> isRelFileName(), make use of it in the base backup code, and basically
> remove is_checksummed_file() and the checksum skip list.

I think it probably shouldn't quite be that as an API.  The code should
not just check whether the file matches a pattern, but also importantly
needs to exclude files that are in a temp tablespace. isRelFileName()
doesn't quite describe an API meaning that.  I assume we should keep
something like isRelFileName() but use an API ontop of that that also
exclude temp files / relations.
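The layering described here could look like this: keep a bare name-shape check, and build the caller-facing predicate on top of it plus a temp-relation exclusion.  The naming is modeled on the backend's looks_like_temp_rel_name(); the details are an assumption, not committed code:

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Temporary relations live in files named "t<backendID>_<relfilenode>"
 * (plus the usual fork/segment suffixes).  This sketch checks only the
 * "t<digits>_<digit>" prefix; modeled on the backend's
 * looks_like_temp_rel_name(), details are an assumption.
 */
bool
is_temp_rel_filename(const char *name)
{
    int         pos = 0;

    if (name[pos] != 't')
        return false;
    pos++;

    if (!isdigit((unsigned char) name[pos]))
        return false;
    while (isdigit((unsigned char) name[pos]))
        pos++;

    if (name[pos] != '_')
        return false;
    pos++;

    return isdigit((unsigned char) name[pos]) != 0;
}
```

A caller-facing "has checksums" predicate would then combine a relation-name check with this exclusion, with temp tablespaces handled separately at the directory-walk level.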

Greetings,

Andres Freund



Re: Function to promote standby servers

2018-10-19 Thread Laurenz Albe
I wrote:
> Fixed.

Here is another version, with a fix in pg_proc.dat, an improved comment
and "wait_seconds" exercised in the regression test.

Yours,
Laurenz Albe
From a2a7f9fd1b23ad102d11992b22158dab8b5451d5 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Sat, 20 Oct 2018 06:21:00 +0200
Subject: [PATCH] Add pg_promote() to promote standby servers

---
 doc/src/sgml/func.sgml | 21 ++
 doc/src/sgml/high-availability.sgml|  2 +-
 doc/src/sgml/recovery-config.sgml  |  3 +-
 src/backend/access/transam/xlog.c  |  6 --
 src/backend/access/transam/xlogfuncs.c | 83 ++
 src/backend/catalog/system_views.sql   |  8 +++
 src/backend/postmaster/pgstat.c|  3 +
 src/include/access/xlog.h  |  6 ++
 src/include/catalog/pg_proc.dat|  4 ++
 src/include/pgstat.h   |  3 +-
 src/test/recovery/t/004_timeline_switch.pl |  6 +-
 src/test/recovery/t/009_twophase.pl|  6 +-
 12 files changed, 139 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5193df3366..88121cdc66 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18731,6 +18731,9 @@ SELECT set_config('log_statement_stats', 'off', false);

 pg_terminate_backend

+   
+pg_promote
+   
 

 signal
@@ -18790,6 +18793,16 @@ SELECT set_config('log_statement_stats', 'off', false);
 however only superusers can terminate superuser backends.

   
+  
+   
+pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60)
+
+   boolean
+   Promote a physical standby server.  This function is restricted to
+superusers by default, but other users can be granted EXECUTE to run
+the function.
+   
+  
  
 

@@ -18827,6 +18840,14 @@ SELECT set_config('log_statement_stats', 'off', false);
 subprocess.

 
+   
+pg_promote can only be called on standby servers.
+If the argument wait is true,
+the function waits until promotion is complete or wait_seconds
+seconds have passed, otherwise the function returns immediately after sending
+the promotion signal to the postmaster.
+   
+
   
 
   
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index ebcb3daaed..f8e036965c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1472,7 +1472,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 

 To trigger failover of a log-shipping standby server,
-run pg_ctl promote or create a trigger
+run pg_ctl promote, call pg_promote(), or create a trigger
 file with the file name and path specified by the trigger_file
 setting in recovery.conf. If you're planning to use
 pg_ctl promote to fail over, trigger_file is
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 92825fdf19..d06cd0b08e 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -439,7 +439,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  
   Specifies a trigger file whose presence ends recovery in the
   standby.  Even if this value is not set, you can still promote
-  the standby using pg_ctl promote.
+  the standby using pg_ctl promote or calling
+  pg_promote().
   This setting has no effect if standby_mode is off.
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78ffc..62fc418893 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -78,12 +78,6 @@
 
 extern uint32 bootstrap_data_checksum_version;
 
-/* File path names (all relative to $PGDATA) */
-#define RECOVERY_COMMAND_FILE	"recovery.conf"
-#define RECOVERY_COMMAND_DONE	"recovery.done"
-#define PROMOTE_SIGNAL_FILE		"promote"
-#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"
-
 
 /* User-settable parameters */
 int			max_wal_size_mb = 1024; /* 1 GB */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 9731742978..e97d1b63bc 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -35,6 +36,7 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 
+#include 
 
 /*
  * Store label file and tablespace map during non-exclusive backups.
@@ -697,3 +699,84 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+/*
+ * Promote a standby server.
+ *
+ * A result of "true" means that promotion has been completed
+ * (or initiated if "wait" is false).
+ */
+Datum
+pg_prom
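Going by the documentation hunk earlier in the patch, the new function would be used from SQL on a standby like this:

```sql
-- On a physical standby; waits up to 60 seconds for promotion by default
SELECT pg_promote();

-- Fire-and-forget: just send the promotion signal and return immediately
SELECT pg_promote(wait => false);
```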

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

On Fri, Oct 19, 2018 at 23:40 Michael Paquier  wrote:

> On Fri, Oct 19, 2018 at 06:43:18PM -0400, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> I don't think just reverting it is really acceptable.
> >
> > +several.  I do not mind somebody writing and installing a better fix.
> > I do object to turning the buildfarm red again.
>
> I did not expect this thread to turn into a war zone.  Anyway, there are
> a couple of things I agree with on this thread:
> - I agree with Andres point here:
> https://postgr.es/m/20181019171747.4uithw2sjkt6m...@alap3.anarazel.de
> A blacklist is fundamentally more difficult to maintain as there are
> way more things added in a data folder which do not have data checksums
> than things which have checksums.  So using a blacklist approach looks
> unmaintainable in the long term.  Future patches related to enabling
> online checksum verification make me worry if we keep the code like
> that.  I can also easily imagine that anybody willing to use the
> pluggable storage API would like to put new files in tablespace-related
> data folders, relying on "base/" being the default system tablespace
>

If we are going to decide that we only deal with files in our directories
matching these whitelisted patterns, then shouldn’t we apply similar logic
in things like DROP DATABASE and any other cases where we perform actions
in a recursive manner across our database and table space directories?

Should we really be removing arbitrary files that we know nothing about,
after all?

What about pg_basebackup?  Shall we update it to only stream through files
matching these patterns as those are the only files we consider ourselves
to be aware of?

I don’t buy off on any argument that presents pluggable storage as not
including some way for us to track and be aware of what files are
associated with that pluggable storage mechanism and which of those files
have checksums and how to verify them if they do. In other words, I sure
hope we don’t accept an approach like cstore *fdw* uses for pluggable
storage where the core system has no idea whatsoever about what these
random files dropped into our tablespace directories are.

- I agree with Stephen's point that we should decide if a file has
> checksums or not in a single place, and that we should use the same
> logic for base backups and pg_verify_checksums.


Despite it being a lot of the discussion, I don’t think there was ever
disagreement on this point.

- I agree with not doing a simple revert to not turn the buildfarm red
> again.  This is annoying for animal maintainers.  Andrew has done a very
> nice work in disabling manually those tests temporarily.


This is a red herring, and always was, so I’m rather unimpressed at how it
keeps coming up- no, I’m not advocating that we should just make the build
farm red and just leave it that way.  Yes, we should fix this case, and fix
pg_basebackup, and maybe even try to add some regression tests which test
this exact same case in pg_basebackup, but making the build farm green is
*not* the only thing we should care about.

> - The base backup logic deciding if a file has checksums looks broken to
> me: it misses files generated by EXEC_BACKEND, and any instance of
> Postgres using an extension with custom files and data checksums has its
> backups broken.  cstore_fdw has been mentioned above, and I recall that
> Heroku for example enables data checksums.  If you combine both, it
> basically means that such an instance cannot take base backups anymore
> while it was able to do so with pre-10 with default options.  That's not
> cool.


This is incorrect: pg_basebackup will still back up and keep files which
fail checksum checks (logic which David Steele and I pushed for when the
checksumming logic was added to pg_basebackup, as I recall), but it’ll
complain and warn about these files ...

As it *should*, even if we weren’t doing checksum checks, because these are
random files that have been dropped into a PG directory that PG doesn’t
know anything about, which aren’t handled through WAL and therefore there’s
no way to know if they’ll be at all valid when they’re copied, or that
backing them up is at all useful.

Backups are absolutely important and we wouldn’t want a backup to be
aborted or failed due to checksum failures, in general, but the user should
be made aware of these failures.  This approach of only taking
responsibility for the files we know of through these patterns could also
imply that pg_basebackup shouldn’t back up files that don’t match; but
that strikes me as similarly risky, as any mistake in this whitelist of
known files could then result in an inconsistent backup that’s missing
things that should have been included.

As for which approach is easier to maintain, I don’t see one as being
meaningfully easier to maintain than the other, code-wise, while having
these patterns looks to me to carry a lot more risk, with the only
rather nebulous gain be

Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Fri, Oct 19, 2018 at 08:50:04PM -0700, Andres Freund wrote:
> On 2018-10-20 12:39:55 +0900, Michael Paquier wrote:
>> So what I think we ought to do is the following:
>> - Start a new thread, this one about TAP tests is not adapted.
>> - Add in src/common/relpath.c the API from d55241af called
>> isRelFileName(), make use of it in the base backup code, and basically
>> remove is_checksummed_file() and the checksum skip list.
> 
> I think it probably shouldn't quite be that as an API.  The code should
> not just check whether the file matches a pattern, but also importantly
> needs to exclude files that are in a temp tablespace. isRelFileName()
> doesn't quite describe an API meaning that.  I assume we should keep
> something like isRelFileName() but use an API ontop of that that also
> exclude temp files / relations.

From what I can see we would need to check for a couple of patterns if
we go to this extent:
- Look for PG_TEMP_FILE_PREFIX and exclude those.
- looks_like_temp_rel_name(), which checks for temp file names.  This is
similar to isRelFileName except that it works on temporary files.
Moving it to relpath.c and renaming it IsTempRelFileName is an idea.
But this one would not be necessary as isRelFileName discards temporary
relations, no?
- parse_filename_for_nontemp_relation() is also too similar to
isRelFileName().

At the end, do we really need to do anything more than adding some
checks on PG_TEMP_FILE_PREFIX?  I am not sure that there is much need
for a global API like isChecksummedFile for only those two places.  I
have already a patch doing the work of moving isRelFileName() into
src/common/.  Adding one check on PG_TEMP_FILE_PREFIX is not much work
on top of it.
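For illustration, here is a rough sketch of what such a combined check could look like, with a PG_TEMP_FILE_PREFIX test layered on top of the relation-name whitelist.  The helper names and the exact filename grammar are assumptions modeled on the isRelFileName() idea from d55241af, not the actual committed API:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: a checksummed relation file name is "<digits>",
 * optionally followed by a fork suffix ("_fsm", "_vm" or "_init") and/or
 * a segment number (".<digits>").
 */
static bool
is_rel_file_name(const char *fn)
{
	/* base relfilenode: one or more digits */
	if (!isdigit((unsigned char) *fn))
		return false;
	while (isdigit((unsigned char) *fn))
		fn++;

	/* optional fork suffix */
	if (*fn == '_')
	{
		fn++;
		if (strncmp(fn, "init", 4) == 0)
			fn += 4;
		else if (strncmp(fn, "fsm", 3) == 0)
			fn += 3;
		else if (strncmp(fn, "vm", 2) == 0)
			fn += 2;
		else
			return false;
	}

	/* optional segment number */
	if (*fn == '.')
	{
		fn++;
		if (!isdigit((unsigned char) *fn))
			return false;
		while (isdigit((unsigned char) *fn))
			fn++;
	}

	return *fn == '\0';
}

/* Layered on top: also exclude anything under the temp-file prefix. */
static bool
is_checksummed_file_name(const char *fn)
{
	if (strncmp(fn, "pgsql_tmp", strlen("pgsql_tmp")) == 0)	/* PG_TEMP_FILE_PREFIX */
		return false;
	return is_rel_file_name(fn);
}
```

Note that temporary relation files ("t<digits>_<digits>", what looks_like_temp_rel_name() matches) never start with a digit, so the relation-name check already rejects them, per the point above.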
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-20 13:30:46 +0900, Michael Paquier wrote:
> On Fri, Oct 19, 2018 at 08:50:04PM -0700, Andres Freund wrote:
> > On 2018-10-20 12:39:55 +0900, Michael Paquier wrote:
> >> So what I think we ought to do is the following:
> >> - Start a new thread, this one about TAP tests is not adapted.
> >> - Add in src/common/relpath.c the API from d55241af called
> >> isRelFileName(), make use of it in the base backup code, and basically
> >> remove is_checksummed_file() and the checksum skip list.
> > 
> > I think it probably shouldn't quite be that as an API.  The code should
> > not just check whether the file matches a pattern, but also importantly
> > needs to exclude files that are in a temp tablespace. isRelFileName()
> > doesn't quite describe an API meaning that.  I assume we should keep
> > something like isRelFileName() but use an API ontop of that that also
> > exclude temp files / relations.
> 
> From what I can see we would need to check for a couple of patterns if
> we go to this extent:
> - Look for PG_TEMP_FILE_PREFIX and exclude those.
> - looks_like_temp_rel_name(), which checks for temp file names.  This is
> similar to isRelFileName except that it works on temporary files.
> Moving it to relpath.c and renaming it IsTempRelFileName is an idea.
> But this one would not be necessary as isRelFileName discards temporary
> relations, no?

I think it's not good to have those necessarily intermingled in an exposed
function.


> At the end, do we really need to do anything more than adding some
> checks on PG_TEMP_FILE_PREFIX?

I think we also need the exclusion list basebackup.c has that generally
skips files. They might be excluded anyway, but I think it'd be safer to
make sure.

> I am not sure that there is much need for a global API like
> isChecksummedFile for only those two places.

Seems likely that other tools would want to have access too.

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

On Sat, Oct 20, 2018 at 00:31 Michael Paquier  wrote:

> On Fri, Oct 19, 2018 at 08:50:04PM -0700, Andres Freund wrote:
> > On 2018-10-20 12:39:55 +0900, Michael Paquier wrote:
> >> So what I think we ought to do is the following:
> >> - Start a new thread, this one about TAP tests is not adapted.
> >> - Add in src/common/relpath.c the API from d55241af called
> >> isRelFileName(), make use of it in the base backup code, and basically
> >> remove is_checksummed_file() and the checksum skip list.
> >
> > I think it probably shouldn't quite be that as an API.  The code should
> > not just check whether the file matches a pattern, but also importantly
> > needs to exclude files that are in a temp tablespace. isRelFileName()
> > doesn't quite describe an API meaning that.  I assume we should keep
> > something like isRelFileName() but use an API ontop of that that also
> > exclude temp files / relations.
>
> From what I can see we would need to check for a couple of patterns if
> we go to this extent:
> - Look for PG_TEMP_FILE_PREFIX and exclude those.
> - looks_like_temp_rel_name(), which checks for temp file names.  This is
> similar to isRelFileName except that it works on temporary files.
> Moving it to relpath.c and renaming it IsTempRelFileName is an idea.
> But this one would not be necessary as isRelFileName discards temporary
> relations, no?
> - parse_filename_for_nontemp_relation() is also too similar to
> isRelFileName().
>
> At the end, do we really need to do anything more than adding some
> checks on PG_TEMP_FILE_PREFIX?  I am not sure that there is much need
> for a global API like isChecksummedFile for only those two places.  I
> have already a patch doing the work of moving isRelFileName() into
> src/common/.  Adding one check on PG_TEMP_FILE_PREFIX is not much work
> on top of it.


I’m not at my computer at the moment so I may not be entirely following the
question here, but to be clear: whatever we do here will have downstream
impact on other tools, such as pgbackrest, and therefore we definitely
want to have the code in libpgcommon so that these external tools can
leverage it and know that they’re doing what PG does.

I’d also like to give David Steele a chance to comment on the specific API,
and any other backup tool authors, which I don’t think we should be
rushing into anyway, and I would think we’d only put it into master.

Thanks!

Stephen


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Sat, Oct 20, 2018 at 12:25:19AM -0400, Stephen Frost wrote:
> On Fri, Oct 19, 2018 at 23:40 Michael Paquier  wrote:
>> - I agree with not doing a simple revert to not turn the buildfarm red
>> again.  This is annoying for animal maintainers.  Andrew has done a very
>> nice work in disabling manually those tests temporarily.
> 
> This is a red herring, and always was, so I’m rather unimpressed at how it
> keeps coming up- no, I’m not advocating that we should just make the build
> farm red and just leave it that way.  Yes, we should fix this case, and fix
> pg_basebackup, and maybe even try to add some regression tests which test
> this exact same case in pg_basebackup, but making the build farm green is
> *not* the only thing we should care about.

Well, the root of the problem was that pg_verify_checksums was
committed without any tests of its own.  If we had had those tests from
the start, then we would not be having this discussion post-release,
still trying to figure out whether whitelisting or blacklisting is
appropriate.

The validation of checksums in base backups was added in 4eb77d50, a
couple of days before pg_verify_checksums was introduced.  This added
corruption-related tests in src/bin/pg_basebackup, which is a good
thing.  However, the feature was designed so that checksum mismatches
are ignored after 5 failures, which actually *masked* the fact that
EXEC_BACKEND files like CONFIG_EXEC_PARAMS should have been skipped
instead of getting checksum failures.  And that's a bad thing.  So this
gives, in my opinion, a good argument for using a whitelist.
--
Michael




Re: Function to promote standby servers

2018-10-19 Thread Michael Paquier
On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote:
> Here is another version, with a fix in pg_proc.dat, an improved comment
> and "wait_seconds" exercised in the regression test.

Thanks for the new version.  This looks pretty good to me.  I'll see if
I can review it once and then commit.

> - WAIT_EVENT_SYNC_REP
> + WAIT_EVENT_SYNC_REP,
> + WAIT_EVENT_PROMOTE
>  } WaitEventIPC;

Those are kept in alphabetical order.  Individual wait events are also
documented with a description.
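Concretely, keeping the list alphabetical would mean placing the new value before the existing one, along these lines (a sketch of the quoted hunk only; the real WaitEventIPC enum has many more members):

```c
/* Sketch of the reordered hunk; not the complete enum definition. */
typedef enum
{
	WAIT_EVENT_PROMOTE,			/* new event, placed in alphabetical order */
	WAIT_EVENT_SYNC_REP
} WaitEventIPC;
```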
--
Michael




Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-10-19 Thread Peter Geoghegan
On Thu, Oct 18, 2018 at 1:44 PM Andres Freund  wrote:
> I wonder if it'd make sense to hack up a patch that logs when evicting a
> buffer while already holding another lwlock. That shouldn't be too hard.

I tried this. It looks like we're calling FlushBuffer() with more than
a single LWLock held (not just the single buffer lock) somewhat *less*
with the patch. This is a positive sign for the patch, but also means
that I'm no closer to figuring out what's going on.

I tested a case with a 1GB shared_buffers + a TPC-C database sized at
about 10GB. I didn't want the extra LOG instrumentation to influence
the outcome.

-- 
Peter Geoghegan



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Sat, Oct 20, 2018 at 12:41:04AM -0400, Stephen Frost wrote:
> I’d also like to give David Steele a chance to comment on the specific API,
> and any other backup tools authors, which I don’t think we should be
> rushing into anyway and I would think we’d only put into master..

Getting David input would be nice!
--
Michael




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

On Sat, Oct 20, 2018 at 00:43 Michael Paquier  wrote:

> On Sat, Oct 20, 2018 at 12:25:19AM -0400, Stephen Frost wrote:
> > On Fri, Oct 19, 2018 at 23:40 Michael Paquier 
> wrote:
> >> - I agree with not doing a simple revert to not turn the buildfarm red
> >> again.  This is annoying for animal maintainers.  Andrew has done a very
> >> nice work in disabling manually those tests temporarily.
> >
> > This is a red herring, and always was, so I’m rather unimpressed at how
> it
> > keeps coming up- no, I’m not advocating that we should just make the
> build
> > farm red and just leave it that way.  Yes, we should fix this case, and
> fix
> > pg_basebackup, and maybe even try to add some regression tests which test
> > this exact same case in pg_basebackup, but making the build farm green is
> > *not* the only thing we should care about.
>
> Well, the root of the problem was that pg_verify_checksums was
> committed without any tests of its own.  If we had had those tests from
> the start, then we would not be having this discussion post-release,
> still trying to figure out whether whitelisting or blacklisting is
> appropriate.
>
> The validation of checksums in base backups was added in 4eb77d50, a
> couple of days before pg_verify_checksums was introduced.  This added
> corruption-related tests in src/bin/pg_basebackup, which is a good
> thing.  However, the feature was designed so that checksum mismatches
> are ignored after 5 failures, which actually *masked* the fact that
> EXEC_BACKEND files like CONFIG_EXEC_PARAMS should have been skipped
> instead of getting checksum failures.  And that's a bad thing.  So this
> gives, in my opinion, a good argument for using a whitelist.


That counter of checksum failures should have been getting reset between
files...  that’s certainly what I had understood was intended.  The idea of
the counter is to not flood the log with errors when a file is discovered
that’s full of checksum failures (as can happen if large chunks of the file
got replaced with something else, for example), but it should still be
reporting on each file that does have failures in it.
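The behavior described here could be sketched as follows; the function name, the cap of five warnings, and the message format are illustrative assumptions, not the actual basebackup.c code:

```c
#include <stdio.h>

#define WARNINGS_PER_FILE 5		/* assumed cap, for illustration */

/*
 * Count checksum mismatches for one file.  The counter is a local, so it
 * resets for each file; at most WARNINGS_PER_FILE individual block
 * warnings are logged to avoid flooding, and a per-file summary is always
 * emitted when anything failed.  block_ok[i] is nonzero if block i passed.
 */
static int
verify_file_checksums(const char *filename, const int *block_ok, int nblocks)
{
	int			failures = 0;

	for (int blkno = 0; blkno < nblocks; blkno++)
	{
		if (!block_ok[blkno])
		{
			failures++;
			if (failures <= WARNINGS_PER_FILE)
				fprintf(stderr,
						"WARNING: checksum mismatch in file \"%s\", block %d\n",
						filename, blkno);
		}
	}

	if (failures > 0)
		fprintf(stderr, "WARNING: file \"%s\": %d total checksum failure(s)\n",
				filename, failures);
	return failures;
}
```

Because the counter lives on the stack of the per-file call, a file full of bad blocks is summarized without drowning the log, while every subsequent file is still checked and reported independently.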

I don’t see how having a whitelist changes that or would have impacted that
logic to make it correct initially or not.

I’m also trying to understand how this checksum logging limit is getting
hit in the tests while they’re still passing.  If this was an intended
failure check, then surely there’s a test that’s intended to be successful,
and it should have complained about this file too; or perhaps that’s what’s
missing.  I’m happy to look into this later this weekend.  Certainly seems
like something here isn’t really making sense, though.

Thanks!

Stephen


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Sat, Oct 20, 2018 at 12:41:04AM -0400, Stephen Frost wrote:
> I’d also like to give David Steele a chance to comment on the specific API,
> and any other backup tools authors, which I don’t think we should be
> rushing into anyway and I would think we’d only put into master..

By the way, we need to do something for the checksum verification code
in base backups for v11 as well.  If you enable checksums and take a
base backup of a build with EXEC_BACKEND, then this creates spurious
checksums failures.  That's a bug.  So while I agree that having a
larger robust API is fine for HEAD, I would most likely not back-patch
it.  This is why I would suggest as a first step for HEAD and v11 to use
a whitelist for base backups, to check for temporary tablespaces in
pg_verify_checksums, to move isRelFileName into src/common/ and to keep
the change minimalistic.
--
Michael




Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

On Sat, Oct 20, 2018 at 00:58 Michael Paquier  wrote:

> On Sat, Oct 20, 2018 at 12:41:04AM -0400, Stephen Frost wrote:
> > I’d also like to give David Steele a chance to comment on the specific
> API,
> > and any other backup tools authors, which I don’t think we should be
> > rushing into anyway and I would think we’d only put into master..
>
> By the way, we need to do something for the checksum verification code
> in base backups for v11 as well.  If you enable checksums and take a
> base backup of a build with EXEC_BACKEND, then this creates spurious
> checksums failures.  That's a bug.  So while I agree that having a
> larger robust API is fine for HEAD, I would most likely not back-patch
> it.  This is why I would suggest as a first step for HEAD and v11 to use
> a whitelist for base backups, to check for temporary tablespaces in
> pg_verify_checksums, to move isRelFileName into src/common/ and to keep
> the change minimalistic.


I’m all for keeping the back-patched changes minimal, and updating the
blacklist as your original patch on this thread did would certainly be
that.  Even adding the logic to skip temp files as pg_basebackup does
would be simpler and based on existing, well-tested and extensively used
code, unlike this new pattern-based file whitelist approach.

I have to say that I can’t recall hearing much in the way of complaints
about pg_basebackup copying all the random cstore files, or the new
checksum validation logic complaining about them, and such when doing
backups and I wonder if that is because people simply don’t use the two
together much, making me wonder how much of an issue this really is or
would be with the account-for-everything approach I’ve been advocating for.

Thanks!

Stephen



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-20 01:07:43 -0400, Stephen Frost wrote:
> I have to say that I can’t recall hearing much in the way of complaints
> about pg_basebackup copying all the random cstore files

Why would somebody complain about that? It's usually desirable.


> or the new checksum validation logic complaining about them, and such
> when doing backups and I wonder if that is because people simply don’t
> use the two together much, making me wonder how much of an issue this
> really is or would be with the account-for-everything approach I’ve
> been advocating for.

I mean obviously pg_verify_checksum simply hasn't been actually tested
much with plain postgres without extensions, given all the weaknesses
identified in this thread.

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Andres Freund
Hi,

On 2018-10-20 00:25:19 -0400, Stephen Frost wrote:
> If we are going to decide that we only deal with files in our directories
> matching these whitelisted patterns, then shouldn’t we apply similar logic
> in things like DROP DATABASE and any other cases where we perform actions
> in a recursive manner across our database and table space directories?

I'm honestly not sure if you're just trolling at this point.  Why would
anybody reasonable be concerned about DROP DATABASE dropping the
directory for the database? You're not honestly suggesting that anybody
would write an extension or anything like that that stores data in the
wrong database's directory, right?  Other iterations like fsyncing
files, copying the entire template database directory, etc are similarly
harmless or positive.


Greetings,

Andres Freund



Re: Multi-insert into a partitioned table with before insert row trigger causes server crash on latest HEAD

2018-10-19 Thread David Rowley
(Returns from leave and beyond the reaches of the internet)

On 18 October 2018 at 07:45, Peter Eisentraut
 wrote:
> On 16/10/2018 06:33, Ashutosh Sharma wrote:
>> I think, the root cause of this problem is that CopyFrom() is using
>> the stale value of *has_before_insert_row_trig* to determine if the
>> current partition is okay for multi-insert or not i.e.
>> has_before_insert_row_trig used to determine multi-insert condition
>> for the current partition actually belongs to old partition. I think,
>> *has_before_insert_row_trig* needs to updated before CopyFrom()
>> evaluates if the current partition is good to go for multi insert or
>> not. Attached is the patch based on this. I've also added the relevant
>> test-case for it. Peter, David, Could you please have a look into the
>> attached patch and share your thoughts. Thank you.
>
> I have committed your fix and test, moving some code around a bit.  Thanks.

Thanks for pushing that fix.

Originally, my patch in [1] could only set leafpart_use_multi_insert to
true within the `if (insertMethod == CIM_MULTI_CONDITIONAL)` test, so it
wouldn't have suffered from this problem.

I'm not sure that doubling up the `insertMethod ==
CIM_MULTI_CONDITIONAL` test is the cleanest fix. Personally, I liked
the way it was in the v6 edition of the patch, but I'm used to getting
outvoted.

[1] 
https://www.postgresql.org/message-id/CAKJS1f9f8yuj04X_rffNu2JPbvhy+YP_aVH6iwCTJ1OL=yw...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

On Sat, Oct 20, 2018 at 01:16 Andres Freund  wrote:

> Hi,
>
> On 2018-10-20 00:25:19 -0400, Stephen Frost wrote:
> > If we are going to decide that we only deal with files in our directories
> > matching these whitelisted patterns, then shouldn’t we apply similar
> logic
> > in things like DROP DATABASE and any other cases where we perform actions
> > in a recursive manner across our database and table space directories?
>
> I'm honestly not sure if you're just trolling at this point.  Why would
> anybody reasonable be concerned about DROP DATABASE dropping the
> directory for the database? You're not honestly suggesting that anybody
> would write an extension or anything like that that stores data in the
> wrong database's directory, right?  Other iterations like fsyncing
> files, copying the entire template database directory, etc are similarly
> harmless or positive.


No, I’m not trolling; what I was trying to do is make the point that this
is moving us away from having a very clear idea of what’s PG’s
responsibility and what we feel comfortable operating on, and toward this
new half-and-half stance where we’ll happily nuke files in a directory
that we didn’t create, and back them up even if we have no idea whether
they’ll be consistent at all when restored, but we won’t try to check the
checksums on them or do some other set of operations on them.

I suspect it’s pretty clear already, but just to make it plain: I really
don’t like the half-and-half approach, and it seems we’re being backed into
it because it happens to fit some specific cases, not because there was
any real thought or design put into supporting these use cases or being
able to extend PG in this way.  I do also see real risks with a
whitelisting kind of approach, namely that we end up missing things, and I
get that you don’t see that risk, but just stating that doesn’t change my
opinion on it.  Based on what you said upthread, this whole thing also
seems like it may be going away anyway, because there’s real design being
done to do this properly and allow PG to be extended in a way that we will
know what files are associated with what extensions or storage mechanisms,
and that seems like just another reason why we shouldn’t be moving to
explicitly support random files being dropped into PG directories.

Thanks,

Stephen


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Stephen Frost
Greetings,

On Sat, Oct 20, 2018 at 01:11 Andres Freund  wrote:

> Hi,
>
> On 2018-10-20 01:07:43 -0400, Stephen Frost wrote:
> > I have to say that I can’t recall hearing much in the way of complaints
> > about pg_basebackup copying all the random cstore files
>
> Why would somebody complain about that? It's usually desirable.


Even though they’re as likely as not to be invalid or corrupted...?  Maybe
things have moved forward here, and I know there’s been discussion about
it, but last I heard those files weren’t WAL-logged and therefore the
result of copying them from a running server was indeterminate.  Yes,
sometimes they’ll be fine, but you could say the same about regular PG
relations too, and yet we certainly wouldn’t be accepting of that.  It
certainly seems reasonable that people would complain about pg_basebackup
misbehaving when a backup it took results in an invalid restore, though it
tends to be a lot rarer to get complaints about partial failures like a
corrupt or partial file being copied during a backup; but then that’s part
of why we stress so much about trying to make sure we don’t do that, as it
can be hard to detect.

People certainly did complain about unlogged tables being backed up, and
that was just because they took up space in the backup, and time during
backup and restore, only to be nuked when the server is started.

> or the new checksum validation logic complaining about them, and such
> > when doing backups and I wonder if that is because people simply don’t
> > use the two together much, making me wonder how much of an issue this
> > really is or would be with the account-for-everything approach I’ve
> > been advocating for.
>
> I mean obviously pg_verify_checksum simply hasn't been actually tested
> much with plain postgres without extensions, given all the weaknesses
> identified in this thread.


No, it hasn’t, but pg_basebackup has been around quite a while and has
always copied everything, as best as I can recall anyway.

Thanks,

Stephen



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-10-19 Thread Michael Paquier
On Sat, Oct 20, 2018 at 02:03:32AM -0400, Stephen Frost wrote:
> On Sat, Oct 20, 2018 at 01:11 Andres Freund  wrote:
>> or the new checksum validation logic complaining about them, and such
>>> when doing backups and I wonder if that is because people simply don’t
>>> use the two together much, making me wonder how much of an issue this
>>> really is or would be with the account-for-everything approach I’ve
>>> been advocating for.
>>
>> I mean obviously pg_verify_checksum simply hasn't been actually tested
>> much with plain postgres without extensions, given all the weaknesses
>> identified in this thread.
> 
> No, it hasn’t, but pg_basebackup has been around quite a while and has
> always copied everything, as best as I can recall anyway.

At this point, let's create a new thread with a description of what has
been discussed and what we'd like to do for HEAD and v11.  I got
something in mind which would result in a minimal patch.  Let's start
from that.
--
Michael

